# Contents

- [Installation](#installation)
- [Experiment setup](#experiment-setup)
- [Running the experiment](#running-the-experiment-locally)
- [Running on Google Colab](#running-the-experiment-google-colab)
- [MLDev features](#advanced-usage)
  - [Computable expressions](#using-expressions-in-the-experiment-configuration)
  - [Custom types in experiment](#using-custom-types)
  - [Jupyter integration](#jupyter-integration)
  - [Running with Tensorboard](#using-tensorboard-on-google-colab)
  - [Telegram notifications](#telegram-notifications)
  - [Collaboration Tool](#collaboration-tool)
  - [Running as Gitlab CI job](#running-pipelines-through-gitlab)

# Installation

## Pre-requisites

You will need the following:

1. A Git account at a publicly accessible repository hosting provider (GitHub, GitLab or your own); a GitLab account is needed for pipeline automation
2. A Google account to run experiments in the cloud
3. A Google account to save experiment data
4. An ngrok.io account for Tensorboard access
5. A Telegram account to receive notifications from the bot

## Install system packages

This is needed if you are going to run mldev on raw ``ubuntu:18.04`` or other Docker container images.

A list of system packages to be installed is provided in [``install_reqs.sh``](https://gitlab.com/mlrep/mldev/-/blob/develop/install_reqs.sh).
This script is run during installation if needed.

## Install mldev

Download the latest version of the install script to your local machine and run it.

```shell script
$ curl https://gitlab.com/mlrep/mldev/-/raw/develop/install_mldev.sh -o install_mldev.sh
$ chmod +x ./install_mldev.sh
$ ./install_mldev.sh base
```

You may be asked for ``root`` privileges if there are [system packages to be installed](#install-system-packages).

Wait a couple of minutes until the installation finishes, and then you are almost ready to use the tool. Congrats!

## Alternative sources

It is also possible to install mldev from [the PyPI repository](https://pypi.org):

```shell script
$ pip install mldev
```

## Configuration files

MLdev may use a ``config.yaml`` file for its configuration.
See this wiki page for details on [mldev configuration files](https://gitlab.com/mlrep/mldev/-/wikis/Конфигурационный-файл-и-переменные-по-умолчанию).

## Installing extras

mldev provides the following extra dependencies, which can be installed if needed:

- **bot** - installs a Telegram notification bot, configured as the ``!NotificationService`` service
- **controller** - installs dependencies for Flask-based model serving, used as the ``!ModelController`` service
- **tensorboard** - tracks experiment progress using Tensorboard, added as the ``!TensorBoardService`` service
- **dvc** - adds data version controlled stages using DVC and git, configured as ``!Stage``
- **base** - the most basic version, uses ``!BasicStage`` without version control
- **jupyter** - adds stages that execute Jupyter Notebook code, configured as ``!JupyterStage``
- **collab** - adds a collaboration tool that lets researchers work together

You can install them like this:

```shell script
$ ./install_mldev.sh base bot dvc tensorboard controller jupyter collab
```

Once installed, you can enable them in ``config.yaml``:

```yaml
extras:
  base: mldev.experiment_objects
  bot: mldev_bot.bot_service
  tensorboard: mldev_tensorboard.tensorboard_service
  dvc: mldev_dvc.dvc_stage
  jupyter: mldev_jupyter.ipython
  controller: mldev_controller.controller_service
  collab: mldev_collab.collab
```

Note that the values here refer to the modules that contain definitions of the corresponding stages and services.

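As a rough illustration of the mapping above, the following Python sketch checks whether the modules named in an ``extras`` mapping can be located by the import system. This is not how MLdev itself loads extras; the helper name ``find_missing_extras`` and the hard-coded dictionary are hypothetical, and a real check would read the values from your own ``config.yaml``.

```python
import importlib.util


def find_missing_extras(extras):
    """Return the names of extras whose module cannot be located."""
    missing = []
    for name, module_path in extras.items():
        try:
            spec = importlib.util.find_spec(module_path)
        except (ImportError, ValueError):
            # Raised when a parent package of a dotted path is not installed
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical subset of the extras mapping shown above
    extras = {
        "base": "mldev.experiment_objects",
        "bot": "mldev_bot.bot_service",
    }
    print("Extras that are not importable:", find_missing_extras(extras))
```

An empty result suggests that every listed module is installed in the current Python environment; it does not verify that the module defines the expected stage or service.
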
See examples of each in [template-default](https://gitlab.com/mlrep/template-default)
and in the [test config.yaml](https://gitlab.com/mlrep/mldev/-/blob/develop/test/config.yaml).

# Experiment setup

An experiment consists of stages, services, pipelines and any other custom objects you would like to add.

**Stages** are units of the experiment that need their inputs and outputs versioned, for example, for reproducibility reasons.
MLdev currently supports DVC stages.

**Pipelines** are sequences of stages and services that can be run as a single experiment.

**Services** are anything that does not produce results but helps run the experiment.
For instance, a Tensorboard instance running alongside the experiment to display training progress is a service.

The experiment description is usually located in the ``experiment.yml`` file.
You can specify another file using the ``-f`` switch like this:

```shell script
$ mldev run -f 
```

Check some examples in [templates](https://gitlab.com/mlrep/).

The steps below describe how to create a new experiment with MLdev.

You can get help on MLdev commands at any time by typing ``mldev --help``.

## Step 1. Create a separate project for the experiment

Initialize mldev in the current folder:

```shell script
$ mldev init 
```

You can use one of the templates (learn about mldev templates [here](https://gitlab.com/mlrep/mldev/-/wikis/Mldev-Templates)).

Here are valid examples of using a template.

```shell script
# Get a template from MLdev by name
$ mldev init -t template-default 

# Get a template from GitHub by full URL
# Like https://github.com/user/project (no .git suffix!)
$ mldev init -t https://github.com/ 

# Get a template from GitLab by full URL
# Like https://gitlab.com/user/project (no .git suffix!)
$ mldev init -t https://gitlab.com/ 
```

If you omit the ``-t`` flag, the template is set to ``template-default`` automatically.

During initialization you may be asked for a new URL for the project and for your Git login and email.
This is needed to put your new experiment code under version control.
If you do not need this, add the ``--no-commit`` switch like this:

```shell script
$ mldev init --no-commit 
```

You can also reuse an existing folder where your experiment is stored.
Set ``-r`` (reuse) as shown here:

```shell script
$ mldev init -r 
```

At this step DVC might ask you for the Google Drive folder id used to store experiment data.
This is usually an alphanumeric string, the last part of the Google Drive folder URL.

For example, in the URL ``https://drive.google.com/drive/folders/1SqqJB9eDk822GNgyJ2PQWMy_IG1AFhWO``
the requested folder id is ``1SqqJB9eDk822GNgyJ2PQWMy_IG1AFhWO``.

See [DVC configuration](https://gitlab.com/mlrep/mldev/-/wikis/DVC_configuration) for more details.

Do not forget to step into the new experiment folder!

```shell script
$ cd 
```

## Step 2. Configure your experiment

Configure your experiment with [experiment.yml](https://gitlab.com/mlrep/mldev/-/wikis/new-format-of-experiment.yml-configuration#monitoring-services).

## Step 3. Commit changes

By default MLdev adds and commits the necessary configuration files to the new Git repo.
You may review them before pushing:

```shell script
$ git log
$ git push
```

Done!

# Running the experiment (locally)

If you have followed all the steps above, you can now run your experiment:

```shell script
$ mldev run
```

You can add the ``--no-commit`` switch to skip adding results to version control.

# Running the experiment (Google Colab)

Open a Google Colab notebook, then add and run the following lines at the beginning (assuming a GitLab account).
Note that the commands start with an exclamation mark.

The example uses GitLab; if you use another hosting provider, change `gitlab.com` in the line below accordingly.

```shell script
!git clone https://:@gitlab.com//.git
```

Then install mldev on Google Colab (follow Experiment setup, Step 1, on Google Colab).

Add and run these lines to run the experiment:

```shell script
cd 
!mldev run
```

# Advanced usage

## Using expressions in the experiment configuration

Instead of constant strings, computable Python expressions can be used in the experiment configuration.

In this example, we pass the full path of the output as a command line parameter to the script:

```yaml
prepare: &prepare_stage !Stage
  name: prepare
  params:
    size: 1
  outputs:
    - !path
      path: "./data"
      files:
        - "first.txt"
    - !path
      path: "./logs"
  script:
    - >
      python3 src/prepare.py
      --to \"${self.outputs[0].get_files()[0]}\"
      --log \"${path(self.outputs[1])}\"
```

Here everything inside ``${...}`` is an **expression**. Expressions are evaluated at the time they are used (at runtime)
using a restricted subset of the Python language. Expressions are required to return a value.

In the example above, the following script will be run for the stage (note the prepended ``/home/user/projects/template-mldev``):

```shell script
$ python3 src/prepare.py \
    --to "/home/user/projects/template-mldev/data/first.txt" \
    --log "/home/user/projects/template-mldev/logs"
```

The following variables are available to expressions:

- ``self`` points to the current ``Stage``, ``Pipeline`` or ``Service``
- ``root`` points to the yaml document itself
- ``env`` contains a dictionary of the environment variables available to mldev as well as ``environ``
  from the mldev config, see [mldev configuration](#configuration-files) for more details.

These functions can be used:

- ``json(obj)`` converts a dictionary, list or scalar value to a JSON representation and escapes it.
- ``path(str)`` expands ``str`` to the full path
- ``params(obj)`` expands a ``dict`` to a sequence of ``--key "value"`` pairs, and uses a bare ``--key`` for true booleans

## Using custom types

MLdev provides the following custom types, or tags in YAML terms, which can be used to describe an experiment:

- ``!path`` creates a ``FilePath`` object
- ``!function`` imports a Python function or class, which can be used in expressions

More details on configuring stages and services are given in the
[reference doc for experiment.yml](https://gitlab.com/mlrep/mldev/-/wikis/new-format-of-experiment.yml-configuration#monitoring-services).

**Security considerations**

Using ``!function`` allows importing any Python function or class that the Python import system can load.

## Jupyter integration

> applies to version 0.4.dev0 and higher
>
> uses ipython 7.16+ and needs compatible ipynb format, works with Jupyter 6.4+

The MLDev module ``mldev_jupyter`` makes it possible to run Jupyter notebooks in MLDev pipelines.

A pipeline inside the notebook ``.ipynb`` is added in a separate Markdown cell, as in the example below.

````
```yaml
my_pipeline:
  - hello
#%mldev nb_context
```
````

Here ``#%mldev nb_context`` marks this ``yaml`` block as an MLDev pipeline definition.
The ``pipeline`` block lists cell names in the required execution order.

Cell names are set as comments inside notebook code cells, as shown here:

```python
print("Hello world!")
#%mldev hello
```

Then link the notebook to your experiment definition like this:

```yaml
all_notebook: !GenericPipeline
  runs:
    - !JupyterStage
      name: all_ipython
      notebook_pipeline: path/to/test_notebook.all_cells

my_pipeline: !GenericPipeline
  runs:
    - !JupyterStage
      name: my_ipython
      notebook_pipeline: path/to/test_notebook.my_pipeline
```

Here ``my_pipeline`` is the name of the pipeline from the ``nb_context`` block in the notebook.

**Run all cells** in the notebook by using the ``all_cells`` keyword instead of the pipeline name ``my_pipeline``.

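Before referencing cell names in a pipeline, it can help to list which names a notebook actually defines. The Python sketch below scans code cells for the ``#%mldev`` marker described above and maps each name to its cell source. It is an illustration only, not the ``mldev_jupyter`` implementation; the function name ``named_code_cells`` and the in-memory notebook are hypothetical, and the sketch assumes the standard notebook JSON layout with a top-level ``cells`` list.

```python
import re

# Matches the cell-name marker described above, e.g. "#%mldev hello"
MARKER = re.compile(r"#%mldev\s+(\S+)")


def named_code_cells(notebook):
    """Map each marker name to the source of the code cell that carries it."""
    cells = {}
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines
        if isinstance(source, list):
            source = "".join(source)
        match = MARKER.search(source)
        if match:
            cells[match.group(1)] = source
    return cells


if __name__ == "__main__":
    # A tiny in-memory notebook standing in for a real .ipynb file
    notebook = {
        "cells": [
            {"cell_type": "code",
             "source": ['print("Hello world!")\n', "#%mldev hello"]},
            {"cell_type": "markdown", "source": ["Some notes"]},
        ]
    }
    print(sorted(named_code_cells(notebook)))
```

For a notebook on disk, the same function could be applied to the dictionary returned by ``json.load`` on the ``.ipynb`` file.
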
## Telegram notifications

To use the Telegram notification bot you need to obtain your personal token. Please visit:
[Telegram token](https://gitlab.com/mlrep/mldev/-/wikis/instructions-for-tokens-obtaining-for-monitoring-services#telegram-bot)

- Open Telegram with your account via the link [Telegram](https://web.telegram.org)
- Open your Telegram bot and type `/start`
- Enjoy the logs in Telegram while having a cup of tea!

## Using Tensorboard on Google Colab

To use Tensorboard you need to obtain an ngrok token. Please visit:
[ngrok token](https://gitlab.com/mlrep/mldev/-/wikis/instructions-for-tokens-obtaining-for-monitoring-services#tensorboard)

- Open your ngrok dashboard via the link [Ngrok](https://ngrok.com)
- or add and run these commands in the Google Colab notebook:

```shell script
!chmod ugo+x ./src/ngrok_urls.sh
!./src/ngrok_urls.sh
```

## Model demo

Optionally, to publish your model with the Flask controller, please visit:
[Flask Controller Instruction](https://gitlab.com/mlrep/mldev/-/wikis/Flask-Controller-Instructions)

## Collaboration Tool

> applies to version 0.5 and higher
>
> uses Git v2.17.1+

The MLDev module ``mldev_collab`` lets researchers work together.
For example, the module tracks changes made by different researchers working on the same experiment and merges them automatically.
All of this is done with familiar Git commands (commit, merge, pull, push), and there is no need to change how you work with the experiment or with Git.

For more details on how to use the collaboration tool, please refer to [this wiki page](https://gitlab.com/mlrep/mldev/-/wikis/mldev-collab-tool).

## Running pipelines through Gitlab

*This functionality is still under development and will be here soon*

It is possible to run an experiment via GitLab CI/CD. Detailed instructions will be provided when available.

See [template-default](https://gitlab.com/mlrep/template-default/.gitlab-ci.yml) for an example.