Running InfinStor Transforms in a Docker container
Transform Environments - conda and Docker
When InfinStor transports a transform and executes it on the cheapest available cloud instance, it creates the correct environment for the transform there.
InfinStor transforms always have an environment captured and stored along with the transform.
The environment may be a conda environment or a Docker environment.
InfinStor supports three types of environment capture mechanisms:
- conda environment captured from the jupyterlab location
- docker environment specified by the Data Scientist
- copy environment from another transform
The Data Scientist can choose a conda environment from those available to the JupyterLab instance.
The conda command used to capture the environment is 'conda env export --no-builds'. The conda environment is re-created at the execution node by the MLflow library. This mechanism does not work in some cases, e.g. if the environment requires native libraries.
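To see what the capture step records, the same export can be run by hand. The following is a minimal sketch, not InfinStor's own capture code: the helper names conda_export_command and export_conda_env are hypothetical, and the sketch assumes the environment is selected by name with conda's --name option.

```python
import shutil
import subprocess


def conda_export_command(env_name):
    """Build the argv for exporting a named conda environment without build strings.

    Hypothetical helper for illustration; only the conda flags are real.
    """
    return ["conda", "env", "export", "--no-builds", "--name", env_name]


def export_conda_env(env_name):
    """Run the export and return the YAML text, or None if conda is not on PATH."""
    if shutil.which("conda") is None:
        return None
    result = subprocess.run(
        conda_export_command(env_name),
        check=True,            # raise if conda exits with an error
        capture_output=True,   # collect stdout instead of printing it
        text=True,             # decode output as str
    )
    return result.stdout
```

The --no-builds flag drops platform-specific build strings from the exported package list, which makes the environment more likely to resolve on a different machine. It does not help with native libraries installed outside conda, which is the case the Docker environment addresses.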
The Data Scientist can either manually copy/paste dockerfile contents into the UI or use the wizard to specify the base image, apt packages and pip packages for the docker environment.
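The three wizard inputs map directly onto Dockerfile instructions. The sketch below illustrates that mapping under stated assumptions: render_dockerfile is a hypothetical helper, not part of the InfinStor UI or API, and the layout it emits simply mirrors the example Dockerfile in this section.

```python
import shlex


def render_dockerfile(base_image, apt_packages=(), pip_packages=()):
    """Render Dockerfile text from a base image, apt packages and pip packages.

    Hypothetical helper for illustration; it is not InfinStor's wizard.
    """
    lines = [f"FROM {base_image}"]
    if apt_packages:
        # Refresh the package index before installing native packages.
        lines.append("RUN apt update")
        lines.append("RUN apt install " + " ".join(apt_packages) + " -y")
    if pip_packages:
        # Quote each spec so characters such as '>' survive the shell.
        quoted = [shlex.quote(spec) for spec in pip_packages]
        lines.append("RUN pip install " + " ".join(quoted))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(
        "pytorch/pytorch",
        apt_packages=["libgdcm-tools", "git"],
        pip_packages=["nibabel", "git+https://github.com/JoHof/lungmask", "infinstor"],
    ))
```

The text produced this way can be pasted into the UI in the same manner as hand-written Dockerfile contents.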
Docker environments are necessary in the following cases:
- The environment requires native libraries
- The pip package is installed using a reference to a git tree such as 'pip install git+https://github.com/JoHof/lungmask'
Example Dockerfile for creating a custom Docker container at the execution location
FROM pytorch/pytorch
RUN apt update
RUN apt install libgdcm-tools git emacs -y
RUN pip install nibabel 'git+https://github.com/Project-MONAI/MONAI@0.2.0' SimpleITK 'git+https://github.com/JoHof/lungmask' dicom2nifti scikit-image infinstor infinstor-mlflow-plugin jupyterlab-infinstor
In the above example, note the following details:
- The base Docker image is pytorch/pytorch.
- Native packages libgdcm-tools, git and emacs are installed using apt.
- pip is used to install the required pip packages nibabel, 'git+https://github.com/Project-MONAI/MONAI@0.2.0', SimpleITK, 'git+https://github.com/JoHof/lungmask', dicom2nifti, scikit-image, infinstor, infinstor-mlflow-plugin, jupyterlab-infinstor
- The InfinStor required packages infinstor, infinstor-mlflow-plugin and jupyterlab-infinstor are added to the pip install command
Note that the Dockerfile environment is only valid for Cloud execution, e.g. singlevm. When the transform is executed inline, it is always executed in the Data Scientist's local Python kernel. The assumption here is that the Data Scientist has all required native libraries installed locally.
Copy Environment from another transform
The environment for a transform being captured can simply be copied from an existing transform.