
Cloud Hosted Jupyterlab for Data Scientists

InfinStor can host jupyterlab instances in the cloud for data scientists. Our service provides the jupyterhub for free, and we create a jupyterlab VM in your AWS/Azure account for each user. Idle VMs are stopped so that you don't pay for compute you are not using. Our service has built-in authentication using Cognito.

One thing that is unique about our approach is that we don't have a kubernetes layer in our management stack. That means when a user becomes idle, we can shut down the VM immediately. We do not have issues such as a single active user preventing a large kubernetes host from being released.

Here's a high-level architecture diagram.

Architecture

Note that we use completely stock jupyterlab version 3 in our service. Any extensions, add-ons, etc. that work with open source jupyterlab will work with our cloud hosted jupyterlab.

Enable Jupyterlab for Data Scientist

The admin account is responsible for enabling Jupyterlab for each Data Scientist in the administrator's account. This is accomplished by browsing to Users -> Manage Users in the InfinStor dashboard (https://service.infinstor.com/). In the table of users, there is a Jupyterlab column with a slider. Clicking on the slider pops up a flyout and the admin can choose the instance type for that Data Scientist.

Data Scientist access to jupyterlab

Data Scientists may access their jupyterlab by browsing to https://jupyterhub.infinstor.com/ and logging in using their InfinStor username and password.

Notes:

  • The first time that the Data Scientist accesses their jupyterlab, the operation involves creating a new VM of the admin-specified type. This can take 30 minutes or more.
  • After 15 minutes of idle time, the jupyterlab instance is shut down.
  • The Data Scientist will not lose any content (notebooks, extensions, pip packages, etc.) in the jupyterlab when the instance is shut down.
  • The Data Scientist can restart the jupyterlab instance by browsing to https://jupyterhub.infinstor.com/.
  • Restarts of the lab should take no more than a few minutes.

Installing Add-ons to the jupyterlab, e.g. jupyterlab-git

The jupyterlab notebook server process is started using the python binary /opt/jupyterhub/bin/python3.

Pip packages may be added to the jupyter server using the following command:

# /opt/conda/bin/python -m pip install <package_to_be_installed>

For example, to install the jupyterlab-git extension in jupyterlab, run the following command in a notebook cell:

!/opt/conda/bin/python -m pip install jupyterlab-git

After running the above command, jupyterlab must be restarted. This can be accomplished in InfinStor Cloud Hosted Jupyterlab as follows. Go to File->Hub Control Panel, and click on 'Stop My Server'. Once the server stops and the 'Start My Server' button appears, click on that button. This restarts jupyterlab.
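
Before restarting, it can be useful to confirm that pip actually registered the package. The following is a minimal sketch, not part of the InfinStor tooling: the helper name check_pkg and the PYTHON_BIN override are hypothetical, and it assumes the same /opt/conda/bin/python binary used in the install command above. It can be run from a terminal in jupyterlab.

```shell
# Hypothetical helper: report whether a pip package is visible to the
# interpreter used in the install command above.
# PYTHON_BIN is an assumed override; it defaults to the path from this page.
check_pkg() {
    python_bin="${PYTHON_BIN:-/opt/conda/bin/python}"
    # 'pip show' exits non-zero when the package is not installed
    if "$python_bin" -m pip show "$1" >/dev/null 2>&1; then
        echo "installed"
    else
        echo "missing"
    fi
}

check_pkg jupyterlab-git
```

If the helper prints "missing", the install most likely went to a different interpreter than the one being checked.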

conda user environment for ipython kernels

In addition to the python binary used for the notebook server, conda is available to data scientists at /opt/conda. The /opt/conda environment is ideal for data scientists to install their own custom kernels.

Here is an example of adding a python 3.7 kernel:

conda create -n 'MyPython3.7' python=3.7
. /opt/conda/bin/activate MyPython3.7
pip install mlflow
pip install infinstor_mlflow_plugin
pip install infinstor
conda install ipykernel
python -m ipykernel install --name m3.7 --display-name="My Python 3.7"

In the above example, we create a conda env called MyPython3.7 and add it as a kernel to the jupyterlab. The kernel is registered under the name m3.7 and is displayed as "My Python 3.7".

For this change to take effect, you must stop the Jupyterlab server and start it again using the hub control panel. Go to File->Hub Control Panel, click on 'Stop My Server', and then click on 'Start My Server'.
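
To confirm that the new kernel was registered, the kernelspec list can be inspected from a terminal. A small sketch follows; the function name kernel_registered is hypothetical, and it assumes the jupyter command on the PATH belongs to the environment in which the kernel was installed.

```shell
# Hypothetical helper: succeed if a kernel with the given name is registered.
kernel_registered() {
    # 'jupyter kernelspec list' prints a header line, then "name  path" rows;
    # keep only the name column and look for an exact match
    jupyter kernelspec list 2>/dev/null | awk 'NR > 1 { print $1 }' | grep -Fqx "$1"
}

# m3.7 is the value passed to --name in the example above
if kernel_registered m3.7; then
    echo "kernel m3.7 is registered"
else
    echo "kernel m3.7 not found"
fi
```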

Updating jupyterlab extension package, e.g. jupyterlab-infinstor

Data Scientists can update their jupyterlab extension packages such as the jupyterlab-infinstor package as follows:

Step 1: Uninstall the currently installed package by running the following in a notebook cell:

!jupyter labextension uninstall jupyterlab-infinstor

Step 2: Install the latest version of the package by running the following in a notebook cell:

!jupyter labextension install jupyterlab-infinstor

After running the above command, jupyterlab must be restarted. This can be accomplished in InfinStor Cloud Hosted Jupyterlab as follows. Go to File->Hub Control Panel, and click on 'Stop My Server'. Once the server stops and the 'Start My Server' button appears, click on that button. This restarts jupyterlab.
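
A quick way to check that the extension is present is to search the output of jupyter labextension list. This is only a sketch: the function name labextension_listed is hypothetical, and it assumes the jupyter command on the PATH is the one serving the lab. Some jupyterlab versions write the listing to stderr, so both streams are searched.

```shell
# Hypothetical helper: succeed if the named extension appears in the listing.
labextension_listed() {
    # merge stderr into stdout because the listing may be written to stderr
    jupyter labextension list 2>&1 | grep -Fq "$1"
}

if labextension_listed jupyterlab-infinstor; then
    echo "jupyterlab-infinstor is listed"
else
    echo "jupyterlab-infinstor is not listed"
fi
```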