Using InfinStor MLflow Projects

The InfinStor MLflow Projects plugin runs MLflow projects in the cloud. It can also run InfinStor transforms as MLflow projects in the cloud. A cloud run can use a single EC2 VM or scale out across an EMR cluster.

Summary

  • The InfinStor pip packages infinstor-mlflow-plugin and infinstor must be installed
  • The user must be logged in to their InfinStor account, using the command:
    • python -m infinstor_mlflow_plugin.login
  • The MLFLOW_TRACKING_URI environment variable must be set to infinstor://infinstor.com/
  • The MLproject's conda.yaml file must be edited to include the pip packages infinstor, infinstor-mlflow-plugin and boto3
  • Projects can be run locally using:
    • mlflow run .
  • Projects can be run in a single VM in the cloud using the command:
    • mlflow run -b infinstor-backend --backend-config '{"instance_type": "t3.xlarge"}' .
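The prerequisites in this summary can be checked from Python before launching a run. The following is a minimal sketch, not part of the InfinStor tooling: the function name preflight is hypothetical, and the import name infinstor is an assumption based on the pip package name (infinstor_mlflow_plugin is taken from the login command). It does not check whether the login step has been completed.

```python
import importlib.util
import os

# Tracking URI as given in the summary above.
EXPECTED_URI = "infinstor://infinstor.com/"

# Import names are assumptions: "infinstor_mlflow_plugin" appears in the login
# command; "infinstor" is assumed to match the pip package name.
REQUIRED_MODULES = ("infinstor", "infinstor_mlflow_plugin")


def preflight(env=None, find_spec=importlib.util.find_spec):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []

    uri = env.get("MLFLOW_TRACKING_URI")
    if uri != EXPECTED_URI:
        problems.append(
            f"MLFLOW_TRACKING_URI is {uri!r}, expected {EXPECTED_URI!r}"
        )

    # find_spec returns None when a top-level module cannot be located.
    for name in REQUIRED_MODULES:
        if find_spec(name) is None:
            problems.append(f"module {name} is not importable")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("problem:", problem)
```

Running the script prints one line per unmet prerequisite and nothing when the environment looks ready.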

Logging in

First, log in to the InfinStor service as follows:

python -m infinstor_mlflow_plugin.login

mlflow CLI

The InfinStor MLflow server is selected through the environment variable MLFLOW_TRACKING_URI. For example, in bash:

export MLFLOW_TRACKING_URI=infinstor://infinstor.com/
mlflow experiments list

It may be convenient to add the line 'export MLFLOW_TRACKING_URI=infinstor://infinstor.com/' to an init script such as ~/.bashrc.
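When MLflow is driven from a Python script or notebook rather than from a shell, the same variable can be set in the process environment before MLflow is used. The sketch below uses only the standard library; the helper names ensure_tracking_uri and uses_infinstor are hypothetical and exist only for illustration.

```python
import os
from urllib.parse import urlparse

# Tracking URI as given in the text above.
INFINSTOR_URI = "infinstor://infinstor.com/"


def ensure_tracking_uri(env=os.environ):
    """Set MLFLOW_TRACKING_URI if it is unset and return the value in effect."""
    return env.setdefault("MLFLOW_TRACKING_URI", INFINSTOR_URI)


def uses_infinstor(uri):
    """Return True when the URI uses the infinstor:// scheme."""
    return urlparse(uri).scheme == "infinstor"


if __name__ == "__main__":
    uri = ensure_tracking_uri()
    print("tracking URI:", uri, "| InfinStor scheme:", uses_infinstor(uri))
```

Because setdefault leaves an existing value alone, a URI already exported in the shell takes precedence over the default in the script.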

Running mlflow project on the local machine and tracking using the InfinStor service

The following example runs the xgboost example included with MLflow. To track with InfinStor Starter's MLflow service, the pip packages infinstor, infinstor-mlflow-plugin and boto3 must be added to the project's conda.yaml file. Here is the conda.yaml file for the MLflow xgboost example, edited to add those three packages to the pip package list.

name: xgboost-example
channels:
  - defaults
  - anaconda
  - conda-forge
dependencies:
  - python=3.6
  - xgboost
  - pip
  - pip:
      - mlflow>=1.6.0
      - matplotlib
      - infinstor
      - infinstor-mlflow-plugin
      - boto3
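Before running a project, it can be useful to confirm that its conda.yaml lists the three required pip packages. The following is a rough sketch, assuming the simple layout shown above: it scans the file line by line instead of using a real YAML parser, so it will not handle every valid conda.yaml. The function names are hypothetical.

```python
import re

# Packages that the text above says must be in the pip list.
REQUIRED = {"infinstor", "infinstor-mlflow-plugin", "boto3"}


def pip_packages(conda_yaml_text):
    """Collect package names listed under the '- pip:' entry.

    This is a line-based scan, not a YAML parser.
    """
    names = set()
    pip_indent = None  # indentation of the '- pip:' line once we have seen it
    for line in conda_yaml_text.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if stripped == "- pip:":
            pip_indent = indent
            continue
        if pip_indent is not None:
            if stripped.startswith("- ") and indent > pip_indent:
                # Drop any version specifier such as ">=1.6.0".
                entry = stripped[2:].strip()
                names.add(re.split(r"[<>=!~\s]", entry, maxsplit=1)[0])
            elif stripped:
                pip_indent = None  # a less-indented line ends the pip list
    return names


def missing_packages(conda_yaml_text):
    """Return the required packages that are absent from the pip list."""
    return REQUIRED - pip_packages(conda_yaml_text)
```

For example, missing_packages(open("conda.yaml").read()) returns an empty set for the file shown above, and a set naming the absent packages otherwise.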

Here is a complete example: MLFLOW_TRACKING_URI is set, login is completed, and the xgboost example project is run locally, with MLflow tracking handled by the InfinStor service.

export MLFLOW_TRACKING_URI=infinstor://infinstor.com/
python -m infinstor_mlflow_plugin.login
mlflow run .

Once the project has completed successfully, visit https://mlflowui.infinstor.com/ to view the tracking information.

Running an MLflow project in a Single VM in the Cloud using the InfinStor backend

export MLFLOW_TRACKING_URI=infinstor://infinstor.com/
python -m infinstor_mlflow_plugin.login
mlflow run -b infinstor-backend --backend-config '{"instance_type": "t3.xlarge"}' .

Note the parameter -b infinstor-backend, which tells the mlflow library to use the InfinStor backend to run the project. Note also the configuration item instance_type, which tells the infinstor-backend to start an instance of the specified type.
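When the instance type varies between runs, quoting the JSON by hand on the command line is error-prone. The sketch below builds the same mlflow run command from Python, using json.dumps for the --backend-config value. The function name infinstor_run_command is hypothetical, and instance_type is the only configuration key used because it is the only one described here.

```python
import json
import shlex


def infinstor_run_command(project_uri=".", instance_type="t3.xlarge"):
    """Build the argument list for a single-VM run on the InfinStor backend."""
    # json.dumps produces the JSON string expected by --backend-config.
    backend_config = json.dumps({"instance_type": instance_type})
    return [
        "mlflow", "run",
        "-b", "infinstor-backend",
        "--backend-config", backend_config,
        project_uri,
    ]


if __name__ == "__main__":
    # shlex.join (Python 3.8+) renders the list as a shell-quoted command line.
    print(shlex.join(infinstor_run_command()))
```

The returned list can be passed to subprocess.run(cmd, check=True) once MLFLOW_TRACKING_URI is set and login is complete. Passing a list avoids shell quoting issues.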

The MLflow UI now shows a new experiment, infinstor-mlflow-1vm. This experiment records a new run every time a VM is started in EC2 to run a project launched with the infinstor-backend.