Installation
Pre-requisites
You will need the following:
A Git account at a publicly accessible repo hosting provider, GitHub/GitLab or your own (a GitLab account is needed for pipeline automation)
A Google account, to run experiments in the cloud and to save experiment data
An ngrok.io account for Tensorboard access
A Telegram account to receive notifications from the bot
Install system packages
This is needed if you are going to run mldev on a raw ubuntu:18.04 or other Docker container image.
A list of system packages to be installed is provided in install_reqs.sh.
This script is run during installation if needed.
Install mldev
Get the latest version of our install file to your local machine and run it.
$ curl https://gitlab.com/mlrep/mldev/-/raw/develop/install_mldev.sh -o install_mldev.sh
$ chmod +x ./install_mldev.sh
$ ./install_mldev.sh core
You may be asked for root privileges if there are system packages to be installed.
Installation takes a couple of minutes. Once it finishes, you are almost ready to use our tool, congrats!
Alternative sources
It is also possible to install mldev from the test PyPI repository
$ pip install -i https://test.pypi.org/simple mldev
Note: this source is subject to test PyPI repo constraints on availability
Configuration files
MLdev may use a config.yaml file for its configuration. Check this wiki page for details on mldev configuration files.
Installing extras
mldev provides the following extra dependencies, which can be installed if needed:
bot - installs a Telegram notification bot, configured as the !NotificationService service
controller - installs dependencies for Flask-based model serving, use as the !ModelController service
tensorboard - for tracking experiment progress using Tensorboard, add as the !TensorBoardService service
dvc - adds data version controlled stages using DVC and git, configure as !Stage
base - is the most basic version, uses !BasicStage without version control
jupyter - adds stages supporting Jupyter Notebooks code execution, configure as !JupyterStage
You can install them like this
$ ./install_mldev.sh core bot dvc tensorboard controller jupyter
When installed, you can add them in config.yaml:
extras:
  base: mldev.experiment_objects
  bot: mldev_bot.bot_service
  tensorboard: mldev_tensorboard.tensorboard_service
  dvc: mldev_dvc.dvc_stage
  jupyter: mldev_jupyter.ipython
  controller: mldev_controller.controller_service
Note that the values here refer to modules that contain definitions of the corresponding stages and services.
See examples of each in template-default and in test config.yaml
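Because each value under extras is a Python module path, one quick sanity check before running an experiment is to confirm that those modules can be found. The sketch below is not part of mldev: the helper name missing_extras is hypothetical, and the mapping is passed in as a plain dictionary rather than read from config.yaml.

```python
from importlib.util import find_spec


def missing_extras(extras):
    """Return the names of extras whose module cannot be located.

    `extras` maps an extra's name to a dotted module path, mirroring the
    `extras:` mapping in config.yaml. This helper is hypothetical.
    """
    missing = []
    for name, module_path in extras.items():
        try:
            # Note: for a dotted path, find_spec imports the parent package.
            found = find_spec(module_path) is not None
        except ModuleNotFoundError:
            # Raised when the parent package of a dotted path is absent.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Calling missing_extras with the mapping from your config.yaml returns an empty list when every listed module is installed.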
Experiment setup
An experiment consists of stages, services, pipelines and any other custom objects you would like to add.
Stages are units of the experiment that need their inputs and outputs versioned, for example for reproducibility reasons. MLdev currently supports DVC stages.
Pipelines are sequences of stages and services that can be run as a single experiment.
Services are anything that does not produce results but helps to run the experiment. For instance, a Tensorboard running alongside to display training progress is a service.
The experiment description is usually located in the experiment.yml file.
You can specify another file using the -f switch like this:
$ mldev run -f <path/to/experiment.yml> <pipeline name>
Check some examples in templates
Below are step-by-step instructions for setting up a new experiment with mldev.
You can get help on MLdev commands at any time by typing mldev --help
Step 1. Create a separate project for the experiment
Initialize mldev in the current folder:
$ mldev init <your_folder>
You can use one of the templates (learn about mldev templates here). Here are valid examples of using a template.
# Get a template from MLdev by name
$ mldev init <your_folder> -t template-default
# Get a template from GitHub by full URL
# Like https://github.com/user/project (no .git suffix!)
$ mldev init <your_folder> -t https://github.com/<path-to-template>
# Get a template from GitLab by full URL
# Like https://gitlab.com/user/project (no .git suffix!)
$ mldev init <your_folder> -t https://gitlab.com/<path-to-template>
Or just omit the -t flag; the template will be set to template-default automatically.
During initialization you may be asked about a new URL for the project
and your Git login and email.
This is needed to put your new experiment code under version control.
If you do not need this, add the --no-commit switch like this:
$ mldev init --no-commit <your_folder>
You can also reuse an existing folder where your experiment is stored.
Set -r (reuse) as shown here:
$ mldev init -r <your_folder>
At this step DVC might ask you for the Google Drive folder id for storing experiment data.
This is usually an alphanumeric string, the last part of the Google Drive folder URL.
For example, in the URL https://drive.google.com/drive/folders/1SqqJB9eDk822GNgyJ2PQWMy_IG1AFhWO
the requested folder id is 1SqqJB9eDk822GNgyJ2PQWMy_IG1AFhWO
See DVC configuration for more details.
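If you prefer not to copy the id by hand, the "last part of the URL" rule can be expressed in a few lines of Python. This is a small illustrative helper, not something mldev or DVC provides; the function name drive_folder_id is hypothetical.

```python
from urllib.parse import urlparse


def drive_folder_id(url):
    """Return the last path segment of a Google Drive folder URL."""
    # urlparse separates the path from any query string such as ?usp=sharing
    path = urlparse(url).path.rstrip("/")
    folder_id = path.rsplit("/", 1)[-1]
    if not folder_id:
        raise ValueError(f"no folder id found in {url!r}")
    return folder_id


url = "https://drive.google.com/drive/folders/1SqqJB9eDk822GNgyJ2PQWMy_IG1AFhWO"
print(drive_folder_id(url))  # 1SqqJB9eDk822GNgyJ2PQWMy_IG1AFhWO
```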
Do not forget to step into the new experiment folder!
$ cd <your_folder>
Step 2. Configure your experiment
Configure your experiment with experiment.yml
Step 3. Commit changes
By default MLdev will add and commit necessary configuration files to the new Git repo. You may review them before pushing:
$ git log
$ git push
Done!
Running the experiment (locally)
If you have followed all the steps above, you can now run your experiment:
$ mldev run
You can add the --no-commit switch to skip adding results to version control.
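If you launch experiments from a Python script rather than a shell, the command line can be assembled programmatically. The sketch below only uses the switches mentioned in this document (-f and --no-commit); the helper names are hypothetical, and the relative order of the switches is an assumption.

```python
import subprocess


def build_run_command(experiment_file=None, pipeline=None, no_commit=False):
    """Assemble the argument list for `mldev run` (hypothetical helper)."""
    cmd = ["mldev", "run"]
    if no_commit:
        cmd.append("--no-commit")  # assumption: accepted before -f
    if experiment_file is not None:
        cmd += ["-f", experiment_file]
    if pipeline is not None:
        cmd.append(pipeline)
    return cmd


def run_experiment(**kwargs):
    """Run mldev and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_run_command(**kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.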
Running the experiment (Google Colab)
Open a Google Colab notebook, then add and run the following lines at the beginning (assuming a GitLab account).
Note that commands start with an exclamation mark. The example uses GitLab; if you use another hosting provider, replace gitlab.com in the line below with your provider's host.
!git clone https://<user>:<password>@gitlab.com/<user>/<experiment repo>.git
Then install mldev on Google Colab (please follow Experiment setup, Step 1, on Google Colab).
Add and run these lines to run the experiment:
cd <your_project_folder>
!mldev run
Advanced usage
Using expressions in the experiment configuration
Instead of constant strings, computable Python expressions can be used in the experiment configuration.
In this example, we use the full path of the output as a command line parameter in the script:
prepare: &prepare_stage !Stage
  name: prepare
  params:
    size: 1
  outputs:
    - !path
      path: "./data"
      files:
        - "first.txt"
    - !path
      path: "./logs"
  script:
    - >
      python3 src/prepare.py
      --to \"${self.outputs[0].get_files()[0]}\"
      --log \"${path(self.outputs[1])}\"
Here everything inside ${...} is an expression.
Expressions are evaluated at the time they are used (at runtime) using a restricted subset of the Python language.
Expressions are required to return a value.
In the example above the following script will be run for the stage (note the prepended /home/user/projects/template-mldev):
$ python3 src/prepare.py \
--to "/home/user/projects/template-mldev/data/first.txt" \
--log "/home/user/projects/template-mldev/logs"
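To make the substitution concrete, here is a minimal sketch of how ${...} placeholders could be expanded. It is not mldev's implementation: FilePath and Stage are hypothetical stand-ins for the real objects, render is a made-up helper, and plain eval with empty builtins is used only for illustration (it is not a safe sandbox and is not the restricted evaluator mldev uses).

```python
import os
import re

# Matches ${...}; this simple pattern does not support "}" inside expressions.
EXPR = re.compile(r"\$\{(.+?)\}")


class FilePath:
    """Hypothetical stand-in for the object created by the !path tag."""

    def __init__(self, path, files=()):
        self.path = path
        self.files = list(files)

    def get_files(self):
        # Full path of every listed file.
        return [os.path.abspath(os.path.join(self.path, f)) for f in self.files]


class Stage:
    """Hypothetical stand-in for a stage that owns a list of outputs."""

    def __init__(self, outputs):
        self.outputs = outputs


def path(value):
    """Expand a FilePath-like object or a string to a full path."""
    return os.path.abspath(getattr(value, "path", value))


def render(template, stage):
    """Replace each ${...} with the value of the expression inside it."""
    names = {"__builtins__": {}, "self": stage, "path": path}
    return EXPR.sub(lambda m: str(eval(m.group(1), names)), template)


stage = Stage([FilePath("./data", ["first.txt"]), FilePath("./logs")])
print(render('--to "${self.outputs[0].get_files()[0]}" --log "${path(self.outputs[1])}"', stage))
```

Run from /home/user/projects/template-mldev, this prints the same --to and --log arguments as the expanded command above.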
The following variables are available to expressions:
self - points to the current Stage, Pipeline or Service
root - points to the yaml document itself
env - contains a dictionary of environment variables available to mldev as well as environ from the mldev config, see mldev configuration for more details.
These functions can be used:
json(obj) - converts a dictionary, list or scalar value to a JSON representation and escapes it.
path(str) - expands str to the full path
params(obj) - expands dict to a sequence of --key "value" pairs, uses --key for true booleans
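As an illustration of the params(obj) behaviour described above, the following sketch turns a dictionary into command line switches. It is written from the description only and is not mldev's code; in particular, skipping false and None values and leaving embedded quotes unescaped are assumptions.

```python
def params(obj):
    """Expand a dict into `--key "value"` pairs; true booleans become `--key`."""
    parts = []
    for key, value in obj.items():
        if value is True:
            parts.append(f"--{key}")
        elif value is False or value is None:
            continue  # assumption: false and missing values are omitted
        else:
            # Note: values containing double quotes are not escaped here.
            parts.append(f'--{key} "{value}"')
    return " ".join(parts)


print(params({"size": 1, "verbose": True}))  # --size "1" --verbose
```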
Using custom types
MLdev provides the following custom types, or tags in YAML terms, which can be used to describe an experiment:
!path - creates a FilePath object
!function - imports a Python function or class, which can be used in expressions
More details on configuring stages and services are given in reference doc for experiment.yml
Security considerations
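Using !function allows importing any Python function or class that the Python import subsystem can load. This means an experiment file can execute arbitrary code, so only run experiment files from sources you trust.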
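The reason this matters is easier to see in code. The sketch below shows a generic dotted-name import of the kind a tag like !function could perform; it is not mldev's implementation, and load_object is a hypothetical name. Nothing in the mechanism distinguishes a harmless helper from a dangerous one.

```python
from importlib import import_module


def load_object(dotted_name):
    """Import `package.module.attr` and return the attribute."""
    module_name, _, attr = dotted_name.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attr', got {dotted_name!r}")
    return getattr(import_module(module_name), attr)


sqrt = load_object("math.sqrt")    # a harmless function
system = load_object("os.system")  # would run shell commands if called
print(sqrt(9))  # 3.0
```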
Jupyter integration
Applies to version 0.4.dev0 and higher.
Uses ipython 7.16+ and needs a compatible ipynb format; works with Jupyter 6.4+.
The MLDev module mldev_jupyter allows running Jupyter notebooks in MLDev pipelines.
A pipeline inside the .ipynb notebook is defined in a separate Markdown cell, as in the example below.
```yaml
my_pipeline:
- hello
#%mldev nb_context
```
Here #%mldev nb_context marks this yaml block as an MLDev pipeline definition.
The pipeline block lists cell names in the required execution order.
Cell names are set as comments inside notebook code cells, as shown here:
print("Hello world!")
#%mldev hello
Then link the notebook to your experiment definition like this
all_notebook: !GenericPipeline
  runs:
    - !JupyterStage
      name: all_ipython
      notebook_pipeline: path/to/test_notebook.all_cells
my_pipeline: !GenericPipeline
  runs:
    - !JupyterStage
      name: my_ipython
      notebook_pipeline: path/to/test_notebook.my_pipeline
Here my_pipeline is the name of the pipeline from the nb_context block in the notebook.
To run all cells in the notebook, use the all_cells keyword instead of the pipeline name my_pipeline.
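To check which cells in a notebook carry a #%mldev name before wiring it into a pipeline, the notebook file can be inspected directly, since .ipynb files are JSON. The sketch below only illustrates the naming convention described in this section; it is not how mldev_jupyter is implemented, and the function names are hypothetical.

```python
import json
import re

# A line of the form "#%mldev <name>" inside a code cell.
MARKER = re.compile(r"^#%mldev[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def named_cells(notebook):
    """Map each `#%mldev <name>` marker to the source of its code cell."""
    cells = {}
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # the nb_context block lives in a Markdown cell
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a string or a list of lines
            source = "".join(source)
        match = MARKER.search(source)
        if match:
            cells[match.group(1)] = source
    return cells


def load_named_cells(path):
    """Read an .ipynb file and return its named code cells."""
    with open(path, encoding="utf-8") as fh:
        return named_cells(json.load(fh))
```

For the notebook in the example above, the returned dictionary would contain a single key, hello.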
Telegram notifications
To use the Telegram notification bot you will need to obtain your personal token. Please visit: Telegram token
Open Telegram using your account via the link Telegram. Open your Telegram bot and type
/start
Enjoy logs via Telegram while having a cup of tea!
Using Tensorboard on Google Colab
To use Tensorboard you will need to obtain an ngrok token. Please visit: ngrok token
Open your ngrok dashboard via the link: Ngrok
or add and run these commands in the Google Colab notebook:
!chmod ugo+x ./src/ngrok_urls.sh
!./src/ngrok_urls.sh
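The public Tensorboard address can also be looked up from Python. The sketch below assumes that an ngrok agent is already running in the same Colab session and that its local API is reachable at the default address http://127.0.0.1:4040; it is an independent illustration and makes no claim about what ngrok_urls.sh does internally.

```python
import json
from urllib.request import urlopen

# Assumption: the ngrok agent's local API listens on its default address.
NGROK_API = "http://127.0.0.1:4040/api/tunnels"


def public_urls(payload):
    """Extract public URLs from the JSON text returned by the tunnels endpoint."""
    return [t["public_url"] for t in json.loads(payload).get("tunnels", [])]


def fetch_public_urls(api=NGROK_API, timeout=5):
    """Query the running ngrok agent and return its public URLs."""
    with urlopen(api, timeout=timeout) as response:
        return public_urls(response.read().decode("utf-8"))
```

Calling fetch_public_urls() raises an error from urllib if no agent is listening at that address.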
Model demo
Optional: to publish your model with the Flask controller, please visit: Flask Controller Instruction
Running pipelines through Gitlab
This functionality is still under development and will be available soon.
It is possible to run an experiment via Gitlab CI/CD. Detailed instructions will be provided when available. See template-default for an example.