Running InfinStor Transforms in a Docker container
Transform Environments - conda and Docker
By default, InfinStor transforms are executed in a conda environment captured from the Data Scientist's JupyterLab server machine, e.g. the Data Scientist's laptop or a cloud-hosted JupyterLab server. The environment is captured with the command 'conda env export --no-builds' and is re-created at the execution node by the MLflow library. This approach does not work in some cases, for example:
- The environment requires native libraries
- A pip package is installed from a reference to a git tree, such as 'pip install git+https://github.com/JoHof/lungmask'
In such cases, InfinStor transforms can be run in a custom Docker container created from a specification provided at the time of transform capture. The specification is written in Dockerfile format, but it is embedded in the JupyterLab cell, enclosed in a triple-quote block.
Note that the Dockerfile environment applies only to Cloud execution, e.g. singlevm. When the transform is executed inline, it always runs in the Data Scientist's local Python kernel. The assumption here is that the Data Scientist has all required native libraries installed.
Example Dockerfile for creating a custom Docker container at the execution location
    FROM pytorch/pytorch
    RUN apt update
    RUN apt install libgdcm-tools git emacs -y
    RUN pip install nibabel 'git+https://github.com/Project-MONAI/MONAI@0.2.0' SimpleITK 'git+https://github.com/JoHof/lungmask' dicom2nifti scikit-image infinstor infinstor-mlflow-plugin jupyterlab-infinstor
In the above example, note the following details:
- The base Docker image is pytorch/pytorch.
- Native packages libgdcm-tools, git and emacs are installed using apt.
- pip is used to install the required pip packages nibabel, 'git+https://github.com/Project-MONAI/MONAI@0.2.0', SimpleITK, 'git+https://github.com/JoHof/lungmask', dicom2nifti, scikit-image, infinstor, infinstor-mlflow-plugin and jupyterlab-infinstor.
- The packages required by InfinStor (infinstor, infinstor-mlflow-plugin and jupyterlab-infinstor) are included in the pip install command.
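Since the InfinStor packages have to be part of the pip install command, it can be handy to check a Dockerfile for them before capturing the transform. The following is a minimal sketch of such a check, not part of InfinStor itself: the function name missing_infinstor_packages and the simple line-by-line parsing are assumptions made for illustration.

```python
import shlex

# Packages that the example above adds for InfinStor.
REQUIRED = ("infinstor", "infinstor-mlflow-plugin", "jupyterlab-infinstor")


def missing_infinstor_packages(dockerfile_text):
    """Return the required packages not named in any 'RUN pip install' line.

    Hypothetical helper: it only understands single-line
    'RUN pip install ...' instructions with unpinned package names.
    """
    installed = set()
    for line in dockerfile_text.splitlines():
        tokens = shlex.split(line)  # honours the single quotes around git+ URLs
        if tokens[:3] == ["RUN", "pip", "install"]:
            installed.update(tokens[3:])
    return [pkg for pkg in REQUIRED if pkg not in installed]


example = """FROM pytorch/pytorch
RUN apt update
RUN pip install nibabel infinstor infinstor-mlflow-plugin
"""
print(missing_infinstor_packages(example))  # ['jupyterlab-infinstor']
```

The sketch does not handle backslash line continuations, chained shell commands, or version specifiers such as 'infinstor==...'; a real check would need to cover those.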
Specifying the Dockerfile contents in a JupyterLab cell before capturing the transform
The Dockerfile contents must be placed in a triple-quote block in the JupyterLab cell before the transform is captured. Here is an example dir-by-dir transform cell with the Dockerfile specified between two %infinstor-dockerfile marker lines:
    import os
    import shutil

    # This transform is called for each directory in the chosen data
    def infin_transform_dir_by_dir(input_dir, output_dir, **kwargs):
        print('input_dir=' + input_dir + ', output_dir=' + output_dir)
        for onefile in os.listdir(input_dir):
            if (os.path.isfile(os.path.join(input_dir, onefile))):
                shutil.copy(os.path.join(input_dir, onefile), os.path.join(output_dir, onefile))

    """
    %infinstor-dockerfile
    FROM pytorch/pytorch
    RUN apt update
    RUN apt install libgdcm-tools git emacs -y
    RUN pip install nibabel 'git+https://github.com/Project-MONAI/MONAI@0.2.0' SimpleITK 'git+https://github.com/JoHof/lungmask' dicom2nifti scikit-image infinstor infinstor-mlflow-plugin jupyterlab-infinstor
    %infinstor-dockerfile
    """
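To illustrate how an embedded Dockerfile of this shape can be separated from the surrounding Python code, here is a minimal sketch. It is not InfinStor's actual capture logic, which this page does not describe: the function name extract_dockerfile and the rule of taking the text between the first two marker lines are assumptions for illustration.

```python
from typing import Optional

MARKER = "%infinstor-dockerfile"


def extract_dockerfile(cell_source: str) -> Optional[str]:
    """Return the text between the first two marker lines, or None.

    Hypothetical helper; it assumes each marker sits alone on its line.
    """
    lines = cell_source.splitlines()
    marker_rows = [i for i, line in enumerate(lines) if line.strip() == MARKER]
    if len(marker_rows) < 2:
        return None  # no complete Dockerfile block in this cell
    start, end = marker_rows[0], marker_rows[1]
    return "\n".join(lines[start + 1:end]).strip()


# A shortened version of the cell shown above.
cell = '''import os

def infin_transform_dir_by_dir(input_dir, output_dir, **kwargs):
    pass

"""
%infinstor-dockerfile
FROM pytorch/pytorch
RUN apt update
%infinstor-dockerfile
"""
'''

print(extract_dockerfile(cell))
```

Because the Dockerfile sits inside a triple-quote string, the Python kernel treats it as an unused string literal, so the cell still runs unchanged when the transform is executed inline.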