Running InfinStor Transforms in a Docker container

Transform Environments - conda and Docker

When InfinStor transports a transform to the cheapest available cloud instance and executes it there, it creates the correct environment for the transform on that instance.

Every InfinStor transform has an environment that is captured and stored along with the transform.

The environment may be a conda environment or a docker environment.

InfinStor supports three types of environment capture mechanisms:

  • conda environment captured from the jupyterlab location
  • docker environment specified by the Data Scientist
  • copy environment from another transform

Conda Environment

The Data Scientist can choose a conda environment from those available to the jupyterlab.

The conda command used to capture the environment is 'conda env export --no-builds'. The conda environment is re-created at the execution node by the MLflow library. This mechanism does not work in some cases, e.g. if the environment requires native libraries; such cases call for a Docker environment.
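
As a minimal sketch of the capture step, the export command quoted above can be run from Python with subprocess. The helper names conda_export_command and capture_conda_env are hypothetical and are not part of InfinStor or MLflow; the sketch assumes the conda executable is on the PATH, and it only illustrates the command, not how InfinStor invokes it internally.

```python
import subprocess


def conda_export_command(env_name=None):
    """Build the export command quoted in the text.

    env_name is optional; when omitted, conda exports the active environment.
    """
    cmd = ["conda", "env", "export", "--no-builds"]
    if env_name:
        cmd += ["--name", env_name]
    return cmd


def capture_conda_env(env_name=None):
    """Run the export and return the environment YAML as a string.

    Hypothetical helper: requires conda to be installed and on the PATH.
    """
    result = subprocess.run(
        conda_export_command(env_name),
        check=True,           # raise if conda exits with an error
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode bytes to str
    )
    return result.stdout
```

The --no-builds option omits build strings from the exported package list, which makes the specification less tied to the platform it was captured on.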

Docker Environment

The Data Scientist can either manually copy/paste dockerfile contents into the UI or use the wizard to specify the base image, apt packages and pip packages for the docker environment.

Docker environments are necessary in the following cases:

  • The environment requires native libraries
  • A pip package is installed from a reference to a git tree, such as 'pip install git+'

Example Dockerfile for creating a custom Docker container at the execution location

FROM pytorch/pytorch

RUN apt update && apt install -y libgdcm-tools git emacs

RUN pip install nibabel 'git+' SimpleITK 'git+' dicom2nifti scikit-image infinstor infinstor-mlflow-plugin jupyterlab-infinstor

In the above example, note the following details:

  • The base docker container is pytorch/pytorch
  • Native packages libgdcm-tools, git and emacs are installed using apt.
  • pip is used to install the required pip packages nibabel, 'git+', SimpleITK, 'git+', dicom2nifti, scikit-image, infinstor, infinstor-mlflow-plugin, jupyterlab-infinstor
  • The InfinStor required packages infinstor, infinstor-mlflow-plugin and jupyterlab-infinstor are added to the pip install command
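
The three wizard inputs (base image, apt packages, pip packages) map directly onto this Dockerfile structure. The following is a rough sketch of that mapping, not InfinStor's actual wizard implementation: render_dockerfile and INFINSTOR_PACKAGES are hypothetical names, and the choice to append the InfinStor packages automatically is an assumption made for illustration.

```python
import shlex

# Packages the text lists as required by InfinStor in the pip install command.
INFINSTOR_PACKAGES = ("infinstor", "infinstor-mlflow-plugin", "jupyterlab-infinstor")


def render_dockerfile(base_image, apt_packages=(), pip_packages=()):
    """Render Dockerfile text from a base image, apt packages and pip packages.

    Hypothetical helper: appends the InfinStor packages if they are missing.
    """
    pip_all = list(pip_packages)
    for pkg in INFINSTOR_PACKAGES:
        if pkg not in pip_all:
            pip_all.append(pkg)

    lines = [f"FROM {base_image}"]
    if apt_packages:
        # Update and install in one RUN so both steps share a single layer.
        apt_args = " ".join(shlex.quote(p) for p in apt_packages)
        lines.append(f"RUN apt update && apt install -y {apt_args}")
    # shlex.quote leaves plain package names unchanged and quotes anything unusual.
    pip_args = " ".join(shlex.quote(p) for p in pip_all)
    lines.append(f"RUN pip install {pip_args}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(
        "pytorch/pytorch",
        apt_packages=["libgdcm-tools", "git", "emacs"],
        pip_packages=["nibabel", "SimpleITK", "dicom2nifti", "scikit-image"],
    ))
```

The generated text can be pasted into the UI in the same way as a hand-written Dockerfile.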

Note that the Dockerfile environment is used only for Cloud execution, e.g. singlevm. When the transform is executed inline, it always runs in the Data Scientist's local python kernel. The assumption here is that the Data Scientist has all required native libraries installed locally.
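
Because inline execution relies on whatever is installed locally, it can help to check the local kernel before running a transform inline. The sketch below is one possible check using only the Python standard library; missing_modules is a hypothetical helper, not an InfinStor API. It tests whether Python modules can be found for import and does not verify the native libraries themselves.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this kernel.

    Note: import names can differ from pip package names
    (for example, scikit-image is imported as skimage).
    """
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # Import names corresponding to some packages from the example Dockerfile.
    required = ["nibabel", "SimpleITK", "dicom2nifti", "skimage"]
    missing = missing_modules(required)
    if missing:
        print("Not importable in the local kernel:", ", ".join(missing))
    else:
        print("All required modules were found.")
```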

Copy Environment from another transform

The environment for a transform being captured can simply be copied from an existing transform.