Cloud Hosted JupyterLab for Data Scientists
InfinStor can host JupyterLab instances in the cloud for data scientists. Our service provides JupyterHub for free, and we create a JupyterLab VM in your AWS or Azure account for each user. Idle VMs are stopped so that you don't pay compute costs while they sit unused. The service has built-in authentication using Amazon Cognito.
One thing that is unique about our approach is that there is no Kubernetes layer in our management stack. As a result, when a user becomes idle, we can shut down that user's VM immediately, and a single active user can never prevent a large, shared Kubernetes host from being released.
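The difference can be illustrated with a toy model. This is a simplified sketch, not InfinStor's implementation: the `User` class and both functions are hypothetical, and the shared-host case assumes users are packed onto a host that can only be released once all of them are idle (it ignores the rescheduling that real Kubernetes autoscalers may perform).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class User:
    """Hypothetical user record: a name and whether the user is idle."""
    name: str
    idle: bool


def releasable_per_user_vms(users: List[User]) -> List[str]:
    """One VM per user: each idle user's VM can be stopped on its own."""
    return [u.name for u in users if u.idle]


def releasable_shared_hosts(hosts: Dict[str, List[User]]) -> List[str]:
    """Shared hosts: a host can be released only when every user on it is idle."""
    return [host for host, users in hosts.items() if all(u.idle for u in users)]


users = [User("alice", True), User("bob", True), User("carol", False)]
print(releasable_per_user_vms(users))              # ['alice', 'bob']
print(releasable_shared_hosts({"host-1": users}))  # []
```

With one VM per user, two of the three VMs can stop right away; on the shared host, the single active user keeps the whole host running.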
Here's a high-level architecture diagram.
Architecture
Note that we use completely stock JupyterLab version 3 in our service. Any extensions, add-ons, etc. that work with open-source JupyterLab will work with our cloud-hosted JupyterLab.
Enabling JupyterLab for Data Scientists
The administrator is responsible for enabling JupyterLab for each Data Scientist in the administrator's account. To do this, browse to Users -> Manage Users in the InfinStor dashboard (https://service.infinstor.com/). The table of users has a Jupyterlab column with a slider. Clicking the slider opens a flyout where the admin chooses the instance type for that Data Scientist.
Data Scientist Access to JupyterLab
Data Scientists access their JupyterLab by browsing to https://jupyterhub.infinstor.com/ and logging in with their InfinStor username and password.
Notes:
- The first time a Data Scientist accesses their JupyterLab, a new VM of the admin-specified type is created. This can take 30 minutes or more.
- After 15 minutes of idle time, the JupyterLab instance is shut down.
- Shutting down does not lose any content (notebooks, extensions, pip packages, etc.) in the JupyterLab instance.
- The Data Scientist can restart the JupyterLab instance by browsing to https://jupyterhub.infinstor.com/.
- Restarts should take no more than a few minutes.
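The 15-minute idle rule in the notes can be expressed as a simple timeout check. This is a hypothetical sketch of the decision only: `should_stop` and `IDLE_TIMEOUT` are illustrative names, and how InfinStor actually measures "last activity" is an assumption not described in this document.

```python
from datetime import datetime, timedelta, timezone

# 15 minutes, per the notes above; the names here are illustrative only.
IDLE_TIMEOUT = timedelta(minutes=15)


def should_stop(last_activity: datetime, now: datetime,
                timeout: timedelta = IDLE_TIMEOUT) -> bool:
    """Return True once the lab has been idle for at least `timeout`."""
    return now - last_activity >= timeout


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(should_stop(now - timedelta(minutes=10), now))  # False
print(should_stop(now - timedelta(minutes=15), now))  # True
```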
Installing Add-ons to JupyterLab, e.g. jupyterlab-git
The JupyterLab notebook server process is started using the Python binary /opt/jupyterhub/bin/python3.
To add pip packages to the Jupyter server, use the following command:
/opt/conda/bin/python -m pip install <package_to_be_installed>
For example, to install the jupyterlab-git extension into JupyterLab, run the following command in a notebook cell:
!/opt/conda/bin/python -m pip install jupyterlab-git
After running the above command, you must restart JupyterLab. In InfinStor Cloud Hosted JupyterLab, go to File -> Hub Control Panel and click 'Stop My Server'. Once the server stops and the 'Start My Server' button appears, click that button to restart JupyterLab.
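If you prefer to install from Python rather than with the `!` shell escape, the same command can be run through `subprocess`. This is a minimal sketch assuming the `/opt/conda/bin/python` path shown above; `pip_install_command` and `pip_install` are hypothetical helpers, not part of InfinStor.

```python
import subprocess
from typing import List

# Interpreter path taken from the command above; adjust it if your instance differs.
SERVER_PYTHON = "/opt/conda/bin/python"


def pip_install_command(package: str, python: str = SERVER_PYTHON) -> List[str]:
    """Build the argument list for `<python> -m pip install <package>`."""
    return [python, "-m", "pip", "install", package]


def pip_install(package: str, python: str = SERVER_PYTHON) -> None:
    """Run pip; raises subprocess.CalledProcessError if pip exits non-zero."""
    subprocess.run(pip_install_command(package, python), check=True)


# In a notebook cell (then restart the server as described above):
# pip_install("jupyterlab-git")
```

Passing the arguments as a list avoids shell quoting problems, for example with version specifiers containing `>` or `<`, which a shell would treat as redirection.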
Conda user environment for IPython kernels
In addition to the Python binary used for the notebook server, conda is available to data scientists at /opt/conda. The /opt/conda environment is ideal for installing your own custom kernels.
Here is an example of adding a Python 3.7 kernel:
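# Create a new conda environment with Python 3.7
conda create -n 'MyPython3.7' python=3.7
# Activate it so that pip and python refer to the new environment
. /opt/conda/bin/activate MyPython3.7
# Install the packages you want available in this kernel
pip install mlflow
pip install infinstor_mlflow_plugin
pip install infinstor
# Install ipykernel and register the environment as a Jupyter kernel
conda install ipykernel
python -m ipykernel install --name m3.7 --display-name="My Python 3.7"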
In the above example, we create a conda environment called MyPython3.7 and register it as a JupyterLab kernel named m3.7, displayed as "My Python 3.7".
For this change to take effect, you must stop the JupyterLab server and start it again from the Hub Control Panel: File -> Hub Control Panel -> Stop My Server, followed by Start My Server.
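To confirm that the new kernel was registered, `jupyter kernelspec list` is the authoritative check. The sketch below shows roughly what that lookup does: each Jupyter data directory may contain `kernels/<name>/kernel.json`, and the first directory that defines a given name wins. `find_kernelspecs` is a hypothetical standard-library helper, and the directories you pass in are assumptions (run `jupyter --paths` to see the data directories on your instance).

```python
import json
from pathlib import Path
from typing import Dict, Iterable


def find_kernelspecs(data_dirs: Iterable[str]) -> Dict[str, str]:
    """Map kernel name -> display name for each <data_dir>/kernels/<name>/kernel.json.

    Directories are searched in the order given; the first one that defines
    a kernel name wins, roughly mirroring Jupyter's own search order.
    """
    specs: Dict[str, str] = {}
    for data_dir in data_dirs:
        for spec_file in sorted(Path(data_dir, "kernels").glob("*/kernel.json")):
            name = spec_file.parent.name
            if name in specs:
                continue  # already found in an earlier (higher-priority) directory
            with spec_file.open() as f:
                specs[name] = json.load(f).get("display_name", name)
    return specs


# Example (directories are assumptions; see `jupyter --paths`):
# print(find_kernelspecs(["/usr/local/share/jupyter", "/opt/conda/share/jupyter"]))
```

After the restart, the m3.7 kernel from the example above should appear with the display name "My Python 3.7".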
Updating JupyterLab extension packages, e.g. jupyterlab-infinstor
Data Scientists can update JupyterLab extension packages such as jupyterlab-infinstor as follows:
Step 1: Uninstall the currently installed package by running the following in a notebook cell:
!jupyter labextension uninstall jupyterlab-infinstor
Step 2: Install the latest version of the package by running the following in a notebook cell:
!jupyter labextension install jupyterlab-infinstor
After running the above command, you must restart JupyterLab. In InfinStor Cloud Hosted JupyterLab, go to File -> Hub Control Panel and click 'Stop My Server'. Once the server stops and the 'Start My Server' button appears, click that button to restart JupyterLab.
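The two update steps can also be scripted from a single notebook cell. This sketch simply runs the documented `jupyter labextension` commands in order; `labextension_update_steps` and `run_steps` are hypothetical helpers, and the `run` parameter exists only so the sequence can be tested without invoking Jupyter.

```python
import subprocess
from typing import Callable, List, Sequence


def labextension_update_steps(package: str) -> List[List[str]]:
    """The commands from Step 1 and Step 2 above, as argument lists."""
    return [
        ["jupyter", "labextension", "uninstall", package],
        ["jupyter", "labextension", "install", package],
    ]


def run_steps(steps: Sequence[Sequence[str]],
              run: Callable[..., object] = subprocess.run) -> None:
    """Run each command in order; check=True stops at the first failing command."""
    for cmd in steps:
        run(list(cmd), check=True)


# In a notebook cell (then restart the server as described above):
# run_steps(labextension_update_steps("jupyterlab-infinstor"))
```

Because `check=True` raises on the first non-zero exit status, the install step is skipped if the uninstall command reports an error; in that case, inspect the error before running Step 2 by hand.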