Installation

Pre-requisites

You will need the following:

  1. An account at a publicly accessible Git hosting provider such as GitHub or GitLab, or on your own server (a GitLab account is needed for pipeline automation)

  2. A Google account - to run experiments in the cloud

  3. A Google account to save experiment data

  4. An ngrok.io account for Tensorboard access

  5. A Telegram account to receive notifications from the bot

Install system packages

This step is needed if you are going to run mldev on a bare ubuntu:18.04 image or another Docker container image.

The system packages to be installed are listed in install_reqs.sh. The installer runs this script when needed.

Install mldev

Download the latest version of the install script to your local machine and run it.

$ curl https://gitlab.com/mlrep/mldev/-/raw/develop/install_mldev.sh -o install_mldev.sh 
$ chmod +x ./install_mldev.sh
$ ./install_mldev.sh base

You may be asked for root privileges if there are system packages to be installed.

Wait a couple of minutes for the installation to finish, and you are almost ready to use the tool. Congrats!

Alternative sources

It is also possible to install mldev from the PyPI repository:

$ pip install mldev

Configuration files

MLdev can read its configuration from a config.yaml file. See the wiki page on mldev configuration files for details.

Installing extras

mldev provides the following extra dependencies, which can be installed if needed:

  • bot - installs a Telegram notification bot, configured as !NotificationService service

  • controller - installs dependencies for Flask-based model serving, use as !ModelController service

  • tensorboard - for tracking experiment progress using Tensorboard, add as !TensorBoardService service

  • dvc - adds data version controlled stages using DVC and git, configure as !Stage

  • base - is the most basic version, uses !BasicStage without version control

  • jupyter - adds stages supporting Jupyter Notebooks code execution, configure as !JupyterStage

  • collab - adds a collaboration tool that provides the capability for researchers to work together

You can install them like this:

$ ./install_mldev.sh base bot dvc tensorboard controller jupyter collab

Once installed, you can enable them in config.yaml:

extras:
  base: mldev.experiment_objects
  bot: mldev_bot.bot_service
  tensorboard: mldev_tensorboard.tensorboard_service
  dvc: mldev_dvc.dvc_stage
  jupyter: mldev_jupyter.ipython
  controller: mldev_controller.controller_service
  collab: mldev_collab.collab

Note that the values here refer to the modules that contain the definitions of the corresponding stages and services.

See examples of each in template-default and in the test config.yaml.
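
Because each value names a Python module, one quick way to confirm that an extra is installed is to check whether Python can locate its module. The sketch below is an illustration and not part of mldev: the helper name is_importable is hypothetical, and the module names are copied from the extras section above.

```python
import importlib.util

# Module names copied from the extras section of config.yaml shown above.
EXTRAS = {
    "base": "mldev.experiment_objects",
    "bot": "mldev_bot.bot_service",
    "tensorboard": "mldev_tensorboard.tensorboard_service",
    "dvc": "mldev_dvc.dvc_stage",
    "jupyter": "mldev_jupyter.ipython",
    "controller": "mldev_controller.controller_service",
    "collab": "mldev_collab.collab",
}


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the module (hypothetical helper)."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised when the parent package of a dotted name is missing
        # or fails to import.
        return False


if __name__ == "__main__":
    for extra, module_name in EXTRAS.items():
        status = "found" if is_importable(module_name) else "missing"
        print(f"{extra}: {status}")
```

Note that looking up a dotted name imports its parent package, so run the check in the same environment where mldev is installed.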

Experiment setup

An experiment consists of stages, services, pipelines and any other custom objects you would like to add.

Stages are units of the experiment whose inputs and outputs need to be versioned, for example for reproducibility. MLdev currently supports DVC stages.

Pipelines are sequences of stages and services that can be run as a single experiment.

Services are anything that does not produce results but helps run the experiment. For instance, a Tensorboard instance running alongside to display training progress is a service.

The experiment description is usually located in the experiment.yml file. You can specify another file using the -f switch like this:

$ mldev run -f <path/to/experiment.yml> <pipeline name>

See the templates for some examples.

The steps below explain how to create a new experiment with MLdev. You can get help on MLdev commands at any time by typing mldev --help

Step 1. Create a separate project for the experiment

Initialize mldev in a new folder:

$ mldev init <your_folder>

You can use one of the templates (see the documentation on mldev templates). Here are valid examples of using a template:

# Get a template from MLdev by name
$ mldev init <your_folder> -t template-default

# Get a template from GitHub by full URL
# Like https://github.com/user/project  (no .git suffix!)
$ mldev init <your_folder> -t https://github.com/<path-to-template>

# Get a template from GitLab by full URL
# Like https://gitlab.com/user/project  (no .git suffix!)
$ mldev init <your_folder> -t https://gitlab.com/<path-to-template>

Or simply omit the -t flag; the template is then set to template-default automatically.

During initialization you may be asked about a new URL for the project and your Git login and email. This is needed to put your new experiment code under version control. If you do not need this, add the --no-commit switch like this:

$ mldev init --no-commit <your_folder>

You can also reuse an existing folder where your experiment is stored. Set -r (reuse) as shown here:

$ mldev init -r <your_folder>

At this step, DVC might ask you for the id of the Google Drive folder used to store experiment data. This is an alphanumeric string, the last part of the Google Drive folder URL. For example, in the URL https://drive.google.com/drive/folders/1SqqJB9eDk822GNgyJ2PQWMy_IG1AFhWO the requested folder id is 1SqqJB9eDk822GNgyJ2PQWMy_IG1AFhWO

See DVC configuration for more details.
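
If you prefer not to copy the folder id by hand, it can be taken from the URL programmatically. The following is a minimal sketch using only the Python standard library; the function name drive_folder_id is hypothetical, and it assumes the id is the last path segment of the folder URL, as described above.

```python
from urllib.parse import urlparse


def drive_folder_id(url: str) -> str:
    """Return the last path segment of a Google Drive folder URL."""
    # urlparse separates the path from any query string such as ?usp=sharing.
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


if __name__ == "__main__":
    url = "https://drive.google.com/drive/folders/1SqqJB9eDk822GNgyJ2PQWMy_IG1AFhWO"
    print(drive_folder_id(url))
```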

Do not forget to step into the new experiment folder!

$ cd <your_folder>

Step 2. Configure your experiment

Configure your experiment with experiment.yml.

Step 3. Commit changes

By default MLdev will add and commit necessary configuration files to the new Git repo. You may review them before pushing:

$ git log
$ git push

Done!

Running the experiment (locally)

If you have followed all the steps above, you can now run your experiment:

$ mldev run

You can add the --no-commit switch to skip adding results to version control.

Running the experiment (Google Colab)

Open a Google Colab notebook, then add and run the following line at the beginning. Note that shell commands in a notebook start with an exclamation mark. The example assumes a GitLab account; if you use another hosting provider, replace gitlab.com in the line below with its address.

!git clone https://<user>:<password>@gitlab.com/<user>/<experiment repo>.git

Then install mldev in Google Colab (repeat Experiment setup, Step 1, in the Colab notebook).

Add and run these lines to run the experiment:

cd <your_project_folder>
!mldev run

Advanced usage

Using expressions in the experiment configuration

Instead of constant strings, computed Python expressions can be used in the experiment configuration.

In this example, we use the full path of the output as a command line parameter in the script:

prepare: &prepare_stage !Stage
  name: prepare
  params:
    size: 1
  outputs:
    - !path
      path: "./data"
      files:
        - "first.txt"
    - !path
      path: "./logs"
  script:
    - >
      python3 src/prepare.py 
              --to \"${self.outputs[0].get_files()[0]}\" 
              --log \"${path(self.outputs[1])}\"

Here everything inside ${...} is an expression. Expressions are evaluated when they are used (at runtime) using a restricted subset of the Python language. Every expression must return a value.

In the example above, the following script is run for the stage (note the prepended /home/user/projects/template-mldev):

$ python3 src/prepare.py \
          --to "/home/user/projects/template-mldev/data/first.txt" \
          --log "/home/user/projects/template-mldev/logs"

The following variables are available to expressions:

  • self points to the current Stage, Pipeline or Service

  • root points to the yaml document itself

  • env contains a dictionary of environment variables available to mldev as well as environ from the mldev config; see mldev configuration for more details.

These functions can be used:

  • json(obj) converts a dictionary, list or scalar value to a JSON representation and escapes it.

  • path(str) expands str to the full path

  • params(obj) expands dict to a sequence of --key "value" pairs, uses --key for true booleans
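
To make the description of params(obj) concrete, here is a small Python sketch of the behavior stated above. It is not MLdev's implementation: the name params_sketch is hypothetical, and skipping false booleans is an assumption, since the text only says how true booleans are handled.

```python
def params_sketch(obj: dict) -> str:
    """Expand a dict into a string of --key "value" pairs (illustrative only)."""
    parts = []
    for key, value in obj.items():
        if value is True:
            # True booleans become a bare flag, as described above.
            parts.append(f"--{key}")
        elif value is False:
            # Assumption: false booleans are omitted entirely.
            continue
        else:
            parts.append(f'--{key} "{value}"')
    return " ".join(parts)


if __name__ == "__main__":
    print(params_sketch({"size": 1, "verbose": True}))
    # prints: --size "1" --verbose
```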

Using custom types

MLdev provides the following custom types, or tags in YAML terms, which can be used to describe an experiment:

  • !path creates a FilePath object

  • !function imports a python function or class, which can be used in expressions

More details on configuring stages and services are given in the reference documentation for experiment.yml.

Security considerations

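Using !function allows importing any Python function or class that the Python import system can load. An experiment file can therefore run arbitrary code, so review experiment files from untrusted sources before running them.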
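
To illustrate why this matters, the sketch below shows how a generic import by dotted name works in Python. It is not MLdev's code: load_by_dotted_name is a hypothetical helper, and how !function resolves names internally is not specified here. The point is only that such a mechanism can reach any importable callable, including ones that run shell commands.

```python
import importlib


def load_by_dotted_name(dotted: str):
    """Import a module and return one of its attributes (hypothetical helper)."""
    module_name, _, attr = dotted.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attribute', got {dotted!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


if __name__ == "__main__":
    sqrt = load_by_dotted_name("math.sqrt")
    print(sqrt(9))  # a harmless function
    # The same mechanism also reaches os.system, which executes shell commands.
    print(callable(load_by_dotted_name("os.system")))
```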

Jupyter integration

Applies to version 0.4.dev0 and higher.

Uses IPython 7.16+ and needs a compatible ipynb format; works with Jupyter 6.4+.

The MLDev module mldev_jupyter supports running Jupyter notebooks in MLDev pipelines.

A pipeline inside the .ipynb notebook is defined in a separate Markdown cell, as in the example below.

```yaml
my_pipeline:
  - hello

#%mldev nb_context
```

Here #%mldev nb_context marks this YAML block as an MLDev pipeline definition. The pipeline block lists cell names in the required execution order.

Cell names are set as comments inside notebook code cells, as shown here:

print("Hello world!")

#%mldev hello
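
To check which cells of a notebook carry a name, the marker comments can be found with a few lines of Python. This is a sketch for inspection only, not how mldev_jupyter works internally; the helper names are hypothetical, and it assumes the standard ipynb JSON layout with a top-level "cells" list.

```python
import re
from typing import Optional

# A marker line looks like "#%mldev hello" on a line of its own.
MARKER = re.compile(r"^#%mldev[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def cell_name(source: str) -> Optional[str]:
    """Return the name from a #%mldev marker in a cell's source, if any."""
    match = MARKER.search(source)
    return match.group(1) if match else None


def named_cells(notebook: dict) -> dict:
    """Map marker names to the indices of the code cells that carry them."""
    names = {}
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):
            # ipynb files may store the source as a list of lines.
            source = "".join(source)
        name = cell_name(source)
        if name is not None:
            names[name] = index
    return names


if __name__ == "__main__":
    print(cell_name('print("Hello world!")\n\n#%mldev hello'))
```

To inspect a real notebook, load the .ipynb file with json.load and pass the resulting dictionary to named_cells.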

Then link the notebook to your experiment definition like this:

all_notebook: !GenericPipeline
  runs:
  - !JupyterStage
    name: all_ipython
    notebook_pipeline: path/to/test_notebook.all_cells

my_pipeline: !GenericPipeline
  runs:
  - !JupyterStage
    name: my_ipython
    notebook_pipeline: path/to/test_notebook.my_pipeline 

Here my_pipeline is the name of the pipeline from the nb_context block in the notebook.

To run all cells in the notebook, use the all_cells keyword instead of a pipeline name such as my_pipeline.

Telegram notifications

To use the Telegram notification bot, you need to obtain a personal token. Please visit: Telegram token

Open Telegram with your account via the link: Telegram. Then open your Telegram bot and type:

/start

Enjoy logs via Telegram while having a cup of tea!

Using Tensorboard on Google Colab

To use Tensorboard, you need to obtain an ngrok token. Please visit: ngrok token

  • Open your ngrok dashboard via the link: Ngrok

  • or add and run these commands in the Google Colab notebook:

!chmod ugo+x ./src/ngrok_urls.sh
!./src/ngrok_urls.sh

Model demo

Optional: to publish your model with the Flask controller, please visit: Flask Controller Instruction

Collaboration Tool

Applies to version 0.5 and higher.

Uses Git v2.17.1+.

The MLDev module mldev_collab lets researchers work together. For example, the module tracks changes made by different researchers working on the same experiment and merges them automatically. All of this is achieved using familiar Git commands (commit, merge, pull, push), and there's no need to change the way you work with the experiment or with Git.

For more details on how to use the collaboration tool, see the collaboration tool documentation.

Running pipelines through Gitlab

This functionality is still under development and will be available soon.

It is possible to run an experiment via Gitlab CI/CD. Detailed instructions will be provided when available. See template-default for an example.