InfinStor Jupyterlab Integration - Concepts and High Level Architecture

InfinStor Machine Learning Platform is a tool for accelerating the production deployment of Machine Learning models. InfinStor enables Data Scientists and Data Engineers to create accurate models, deploy them into production and monitor them after deployment. InfinStor platform features include:

  • Training Data Management using InfinSnap fine grained snapshots, InfinSlice fine grained slices of data, and Logical Snapshots (TensorFlow tf.data.Dataset, PyTorch Dataset or pandas DataFrame)
  • Data Transformation and Training code management using Transforms. Capture your own transforms to transform raw data into formats consumable by machine learning frameworks. Also, auto-generate useful code snippets
  • Scale Out Compute orchestration for data preparation and training activities against large data sets
  • Model Deployment
  • Data Drift protection
  • MLflow integration for Machine Learning Workflow management

InfinStor capabilities are accessible to Jupyterlab users directly from their Jupyterlab browser window by means of a sidebar.

Components

Jupyterlab Extension

Jupyterlab is a sophisticated application consisting of three distinct pieces of software. The InfinStor Jupyterlab extension requires code to be loaded into each of them:

  • A web server, which serves content to the browser based interface
  • The browser based interface
  • The IPython kernel, to which the web server sends commands and Python snippets for execution

Service Website

The InfinStor service website is available at https://service.infinstor.com/ and has functionality for browsing S3 buckets, enabling and disabling InfinSnap for S3 buckets, browsing InfinSnap and InfinSlice, managing labels, etc. It also provides a rich interface for managing the subscription to the InfinStor service.

Service Backend Components

The InfinStor service backend is implemented on Amazon Web Services (AWS) using a set of Lambda functions and other AWS services. Specifically, it includes the following components:

  • InfinStor S3 Events Lambda: InfinStor receives events from your S3 bucket whenever objects are created or deleted in the bucket. These events are used to construct InfinSnap metadata. The ingest side, which writes data to the S3 bucket, is unaltered by InfinStor.
  • Reading from the InfinSnap enabled bucket is accomplished by means of two interfaces:
    • InfinStor Webhdfs Emulation with Active Directory Authentication: Tools with a Hadoop base such as Spark, EMR, Qubole, etc. can use their built-in webhdfs client to access InfinSnap enabled S3 buckets. In addition, the Kerberos authentication technology used by Active Directory and other authentication servers is compatible with InfinStor webhdfs. This enables corporate Active Directory users to authenticate directly with InfinStor and be subject to its permissions.
    • InfinStor Auto Scaling Webhdfs Proxy: This is useful for applications written to the S3 REST API. Examples include the AWS CLI, boto3, etc.
  • InfinStor serverless mlflow server
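
To make the role of the S3 Events Lambda more concrete, the following is a minimal sketch of how object created and deleted events could be turned into point-in-time metadata. It is not InfinStor's actual implementation: the function names records_from_event and listing_at and the in-memory log are hypothetical and chosen only for illustration. The sketch assumes nothing beyond the standard S3 event notification format, in which each record carries an eventName, an eventTime, a bucket name and a URL-encoded object key.

```python
from datetime import datetime
from urllib.parse import unquote_plus


def records_from_event(event):
    """Flatten an S3 event notification into (time, bucket, key, action) tuples.

    Hypothetical helper: a real Lambda would persist these rows in a
    database instead of returning them.
    """
    rows = []
    for rec in event.get("Records", []):
        name = rec["eventName"]  # e.g. "ObjectCreated:Put"
        if name.startswith("ObjectCreated"):
            action = "create"
        elif name.startswith("ObjectRemoved"):
            action = "delete"
        else:
            continue  # ignore event types unrelated to create/delete
        when = datetime.strptime(rec["eventTime"], "%Y-%m-%dT%H:%M:%S.%fZ")
        bucket = rec["s3"]["bucket"]["name"]
        key = unquote_plus(rec["s3"]["object"]["key"])  # keys arrive URL-encoded
        rows.append((when, bucket, key, action))
    return rows


def listing_at(log, bucket, when):
    """Replay the event log to find which keys existed in `bucket` at `when`."""
    keys = set()
    for t, b, key, action in sorted(log):
        if b != bucket or t > when:
            continue
        if action == "create":
            keys.add(key)
        else:
            keys.discard(key)
    return keys
```

Replaying the log up to a chosen timestamp yields the bucket contents as of that moment, which is the idea behind a fine grained snapshot. The writers to the bucket need no changes because the log is built purely from notifications.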