Jupyterlab Integration - Concepts and High Level Architecture
InfinStor Data Management Platform for ML is a tool that Data Scientists and Data Engineers use to manage data. InfinStor's core capabilities include fine-grained snapshots of S3 buckets as of any point in the past, slices of S3 bucket data that was ingested during a specified period of time, and snapshots of tensors.
InfinStor capabilities are accessible to Jupyterlab users directly from their Jupyterlab browser window by means of a sidebar.
Storage Feature: InfinSnap
InfinStor maintains metadata about every operation performed on your Cloud Object Store (S3 or compatible). This enables InfinStor to present a view of your bucket as it existed at any time in the past. It preserves your ability to access a read-only version of the data you trained your model with, even if that data is deleted or modified after your training run. Further, you do not need to organize your data ingest in any specific manner. Finally, you do not need to change your code in order to access data from a different point in time.
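The idea of rebuilding a past view from recorded operations can be sketched in a few lines of Python. This is a toy model and not InfinStor's implementation: the `ObjectEvent` schema, the `snapshot_view` function, and the sample events are all hypothetical, and the sketch tracks only which keys existed, not which version of each object was current.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ObjectEvent:
    """One recorded bucket operation (hypothetical schema)."""
    time: datetime
    key: str
    kind: str  # "created" or "deleted"


def snapshot_view(events, as_of):
    """Return the set of keys that existed in the bucket at time `as_of`."""
    live = set()
    # Replay operations in time order, stopping at the requested instant.
    for ev in sorted(events, key=lambda e: e.time):
        if ev.time > as_of:
            break
        if ev.kind == "created":
            live.add(ev.key)
        elif ev.kind == "deleted":
            live.discard(ev.key)
    return live


def utc(day, hour=0):
    """Helper for building illustrative timestamps."""
    return datetime(2021, 1, day, hour, tzinfo=timezone.utc)


# Illustrative operation log: two objects created, one later deleted.
EVENTS = [
    ObjectEvent(utc(1), "train/a.csv", "created"),
    ObjectEvent(utc(2), "train/b.csv", "created"),
    ObjectEvent(utc(3), "train/a.csv", "deleted"),
]
```

Calling `snapshot_view(EVENTS, utc(2, 12))` in this model still lists `train/a.csv`, even though the object is deleted a day later, which mirrors the read-only historical view described above.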
Storage Feature: InfinSlice
InfinSlice is an enhancement to InfinSnap: you specify a start time and an end time, and InfinStor presents the data that was ingested between the two.
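A slice can be pictured as a time-window filter over ingest events. The sketch below is illustrative only: the `(time, key, kind)` tuple layout and the `slice_view` function are hypothetical, the window is assumed to be half-open (start included, end excluded), and deletions are ignored. How InfinSlice itself treats window boundaries and deleted objects is not specified here.

```python
from datetime import datetime, timezone


def slice_view(events, start, end):
    """Return the sorted keys created in the window [start, end).

    `events` is an iterable of (time, key, kind) tuples, where kind is
    "created" or "deleted" (a hypothetical layout for this sketch).
    """
    return sorted(
        {key for time, key, kind in events
         if kind == "created" and start <= time < end}
    )


def utc(day):
    """Helper for building illustrative timestamps."""
    return datetime(2021, 1, day, tzinfo=timezone.utc)


# Illustrative ingest log.
EVENTS = [
    (utc(1), "logs/day1.json", "created"),
    (utc(2), "logs/day2.json", "created"),
    (utc(3), "logs/day3.json", "created"),
    (utc(4), "logs/day1.json", "deleted"),
]

# Only objects ingested on Jan 2 and Jan 3 fall inside this window.
window = slice_view(EVENTS, utc(2), utc(4))
```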
Storage Feature: Labels
Labels add convenience to InfinSnap and InfinSlice by letting you name a chosen InfinSnap or InfinSlice. Once labelled, the label name can be used to refer to the data specified. A label consists of the following:
- InfinSnap time or InfinSlice start and end time
- path in bucket
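The two components of a label listed above can be modelled as a small record type. This is a sketch under stated assumptions, not InfinStor's actual label format: the `Label` class and its field names are hypothetical, and the validation simply enforces that a label carries either one InfinSnap time or an InfinSlice start and end time.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Label:
    """A named InfinSnap or InfinSlice (hypothetical field names)."""
    name: str
    path: str  # path in bucket
    snap_time: Optional[datetime] = None    # set for an InfinSnap label
    slice_start: Optional[datetime] = None  # set for an InfinSlice label
    slice_end: Optional[datetime] = None    # set for an InfinSlice label

    def __post_init__(self):
        is_snap = self.snap_time is not None
        is_slice = self.slice_start is not None and self.slice_end is not None
        partial_slice = (self.slice_start is None) != (self.slice_end is None)
        # Exactly one of the two forms must be present.
        if partial_slice or is_snap == is_slice:
            raise ValueError(
                "a label needs either snap_time or both slice_start and slice_end"
            )
        if is_slice and self.slice_start > self.slice_end:
            raise ValueError("slice_start must not be after slice_end")


# Illustrative labels; names, paths, and times are made up.
snap_label = Label(
    name="training-run-1",
    path="datasets/images",
    snap_time=datetime(2021, 1, 15, tzinfo=timezone.utc),
)
slice_label = Label(
    name="january-ingest",
    path="datasets/images",
    slice_start=datetime(2021, 1, 1, tzinfo=timezone.utc),
    slice_end=datetime(2021, 2, 1, tzinfo=timezone.utc),
)
```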
InfinStor Jupyterlab Extension Components
InfinStor Jupyterlab Extension
Jupyterlab is a sophisticated application consisting of three distinct pieces of software. The InfinStor Jupyterlab extension requires code to be loaded into each of them:
- A web server, which serves content to a browser-based interface
- The browser-based interface
- The IPython kernel, to which the web server sends commands and Python snippets for execution
InfinStor Service Website
The InfinStor service website is available at https://service.infinstor.com/ and has functionality for browsing S3 buckets, enabling and disabling InfinSnap for S3 buckets, browsing InfinSnap and InfinSlice, and managing labels. It also provides a rich interface for managing the subscription to the InfinStor service.
InfinStor Service Backend Components
The InfinStor service backend is implemented on AWS using a set of Lambda functions and other AWS services. Specifically, it includes the following components:
- InfinStor S3 Events Lambda: InfinStor receives events from your S3 bucket whenever objects are created or deleted in the bucket. These events are used to construct InfinSnap metadata. The ingest side, which writes data to the S3 bucket, is unaltered by InfinStor.
- Reading from the InfinSnap-enabled bucket is accomplished by means of two interfaces:
  - InfinStor Webhdfs Emulation with Active Directory Authentication: Tools with a Hadoop base, such as Spark, EMR, and Qubole, can use their built-in webhdfs client to access InfinSnap-enabled S3 buckets. In addition, the Kerberos authentication technology used by Active Directory and other authentication servers is compatible with InfinStor webhdfs. This enables corporate Active Directory users to authenticate directly with InfinStor and be subject to permissions.
  - InfinStor Auto Scaling Webhdfs Proxy: This is useful for applications written to the S3 REST API. Examples include the aws cli and boto3.
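As an illustration of the S3 Events Lambda component listed above, the following sketch shows how a Lambda function might turn a standard S3 event notification into operation records. It is not InfinStor's code: `extract_operations`, the record layout it returns, and the sample event are hypothetical. The event fields it reads (`Records`, `eventName`, `eventTime`, `s3.bucket.name`, `s3.object.key`) follow the S3 event notification message structure, in which object keys arrive URL-encoded.

```python
from datetime import datetime
from urllib.parse import unquote_plus


def extract_operations(event):
    """Convert an S3 event notification into simple operation records."""
    ops = []
    for record in event.get("Records", []):
        name = record["eventName"]
        if name.startswith("ObjectCreated:"):
            kind = "created"
        elif name.startswith("ObjectRemoved:"):
            kind = "deleted"
        else:
            continue  # ignore event types this sketch does not model
        ops.append({
            # eventTime is ISO 8601 with a trailing "Z" for UTC.
            "time": datetime.fromisoformat(
                record["eventTime"].replace("Z", "+00:00")
            ),
            "bucket": record["s3"]["bucket"]["name"],
            # Keys are URL-encoded in notifications ("+" stands for a space).
            "key": unquote_plus(record["s3"]["object"]["key"]),
            "kind": kind,
        })
    return ops


def lambda_handler(event, context):
    """Lambda entry point; a real service would persist the records."""
    ops = extract_operations(event)
    return {"recorded": len(ops)}


# Made-up sample notification with one create and one delete.
SAMPLE_EVENT = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "eventTime": "2021-01-01T12:00:00.000Z",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "train/my+file.csv"},
            },
        },
        {
            "eventName": "ObjectRemoved:Delete",
            "eventTime": "2021-01-02T12:00:00.000Z",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "train/old.csv"},
            },
        },
    ]
}
```

Because the function only consumes notifications, it fits the point made above that the ingest side writing to the bucket stays unaltered.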