Using InfinStor MLflow Projects¶
The InfinStor MLflow Projects plugin runs MLflow projects in the cloud. It can also run InfinStor transforms as MLflow projects in the cloud. A cloud run can use a single EC2 VM or scale out across an EMR cluster.
Summary¶
- The InfinStor pip packages infinstor-mlflow-plugin and infinstor must be installed.
- The user must be logged in to their InfinStor account using the command:
  - python -m infinstor_mlflow_plugin.login
- The MLFLOW_TRACKING_URI environment variable must be set to infinstor://mlflow.infinstor.com/
- The MLproject's conda.yaml file must be edited to include the pip packages infinstor, infinstor-mlflow-plugin and boto3.
- Projects can be run locally using:
  - mlflow run .
- Projects can be run in a single VM in the cloud using the command:
  - mlflow run -b infinstor-backend --backend-config '{"instance_type": "t3.xlarge"}' .
Logging in¶
First, log in to the InfinStor service as follows:
python -m infinstor_mlflow_plugin.login
mlflow CLI¶
The mlflow CLI is pointed at the InfinStor MLflow server through the environment variable MLFLOW_TRACKING_URI. For example, in bash:
export MLFLOW_TRACKING_URI=infinstor://mlflow.infinstor.com/
mlflow experiments list
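Because every later command depends on MLFLOW_TRACKING_URI, it can help to check the variable before invoking the CLI. The following is a minimal sketch using only the Python standard library; the function name tracking_uri_problem is hypothetical and not part of any InfinStor or MLflow package, and the check only inspects the URI's shape (it does not contact the service).

```python
import os
from urllib.parse import urlparse

# Scheme used by the tracking URI shown in this guide.
EXPECTED_SCHEME = "infinstor"


def tracking_uri_problem(environ=os.environ):
    """Return a short description of the problem, or None if the URI looks right.

    Hypothetical helper: it only checks the form of MLFLOW_TRACKING_URI.
    """
    uri = environ.get("MLFLOW_TRACKING_URI")
    if not uri:
        return "MLFLOW_TRACKING_URI is not set"
    parsed = urlparse(uri)
    if parsed.scheme != EXPECTED_SCHEME:
        return f"expected scheme {EXPECTED_SCHEME!r}, got {parsed.scheme!r}"
    if not parsed.netloc:
        return "tracking URI has no host"
    return None


if __name__ == "__main__":
    problem = tracking_uri_problem()
    print(problem or "MLFLOW_TRACKING_URI looks correct")
```

Passing a plain dictionary instead of os.environ makes the check easy to exercise without changing the shell environment.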
Running an MLflow project on the local machine and tracking it using the InfinStor service¶
The following example runs the xgboost example included with mlflow. To track runs with InfinStor Starter's MLflow service, the pip packages infinstor, infinstor-mlflow-plugin and boto3 must be added to the project's conda.yaml file. Here is the conda.yaml file for the mlflow xgboost example, edited so that these three packages appear in the pip package list.
name: xgboost-example
channels:
  - defaults
  - anaconda
  - conda-forge
dependencies:
  - python=3.6
  - xgboost
  - pip
  - pip:
    - mlflow>=1.6.0
    - matplotlib
    - infinstor
    - infinstor-mlflow-plugin
    - boto3
export MLFLOW_TRACKING_URI=infinstor://mlflow.infinstor.com/
python -m infinstor_mlflow_plugin.login
mlflow run .
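Forgetting one of the three required pip packages in conda.yaml is an easy mistake, so a quick check before running can save a failed run. The sketch below is a line-based heuristic that uses only the standard library; the name missing_packages is hypothetical, and the scan does not parse YAML properly or distinguish conda entries from pip entries, so treat it as a convenience check rather than validation.

```python
import re

# Packages this guide says must be present in the pip list.
REQUIRED = ("infinstor", "infinstor-mlflow-plugin", "boto3")

# Matches a YAML list entry such as "- mlflow>=1.6.0" and captures the
# package name that precedes any version specifier or trailing colon.
ENTRY = re.compile(r"^\s*-\s*([A-Za-z0-9_.\-]+)")


def missing_packages(conda_yaml_text, required=REQUIRED):
    """Return the required package names that no list entry mentions."""
    listed = set()
    for line in conda_yaml_text.splitlines():
        match = ENTRY.match(line)
        if match:
            listed.add(match.group(1).lower())
    return [name for name in required if name not in listed]


if __name__ == "__main__":
    with open("conda.yaml", encoding="utf-8") as handle:
        missing = missing_packages(handle.read())
    print("missing:", ", ".join(missing) if missing else "none")
```

Run from the project directory, the script reads conda.yaml and lists any of the three packages it could not find.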
Running an MLflow project in a Single VM in the Cloud using the InfinStor backend¶
export MLFLOW_TRACKING_URI=infinstor://mlflow.infinstor.com/
python -m infinstor_mlflow_plugin.login
mlflow run -b infinstor-backend --backend-config '{"instance_type": "t3.xlarge"}' .
Note the parameter -b infinstor-backend, which tells the mlflow library to use the InfinStor backend to run the project. Note also the configuration item instance_type, which tells the infinstor-backend to start an EC2 instance of the specified type.
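The --backend-config value is a JSON string, and quoting it by hand in a shell is error-prone. As a minimal sketch, the same command line can be assembled in Python with the standard json and shlex modules; the function name infinstor_run_command is hypothetical, and only the instance_type key shown above is assumed to be understood by the backend.

```python
import json
import shlex


def infinstor_run_command(instance_type, project_uri="."):
    """Build the argument list for the mlflow run command shown above.

    Hypothetical helper: it only assembles arguments and runs nothing.
    """
    # Serialize the backend configuration so the quoting is always valid JSON.
    backend_config = json.dumps({"instance_type": instance_type})
    return [
        "mlflow", "run",
        "-b", "infinstor-backend",
        "--backend-config", backend_config,
        project_uri,
    ]


if __name__ == "__main__":
    argv = infinstor_run_command("t3.xlarge")
    # shlex.join produces a shell-safe string suitable for copy and paste.
    print(shlex.join(argv))
```

The returned list can also be passed directly to subprocess.run(argv, check=True), which avoids shell quoting altogether.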
The MLflow UI now shows the following
Note that there is a new experiment named infinstor-mlflow-1vm. This experiment records a new run every time a VM is started in EC2 to run a project launched through the infinstor-backend.