InfinStor MLflow SaaS Service Data Versioning
InfinStor MLflow can version data stored in S3 buckets. When an MLflow run is started, the data version, also known as an InfinSnap, is automatically recorded as an MLflow parameter. Thereafter, you can browse the state of the bucket as it was at that point in time. This version snapshot is preserved forever.
This document describes the process for creating a bucket and configuring it for Data Versioning (InfinSnap). Note that Data Versioning requires the bucket to be freshly created in your AWS account using the CloudFormation template (CFT) provided by InfinStor. Note also that a DynamoDB table is created to store the snapshot metadata for this bucket.
Data Versioning can be enabled only after you have configured your AWS account for use with InfinStor. That procedure is described here.
Screen capture video of Data Versioning setup
Step 1: Create a Bucket for InfinSnap using the CFT we provide
In this step, you will create an S3 bucket for storing the Machine Learning data that will be versioned, and a DynamoDB table for storing InfinSnap metadata.
- Log in to the InfinStor Dashboard here using the subscriber account, i.e. the InfinStor account that you created when you subscribed to the service from AWS Marketplace
- Click on Configuration and then click on Configure AWS
- On this page, there is a section titled InfinStor Bucket Creation
- Enter a name for the bucket you wish to create and press Start
As shown below, an informational flyout will indicate that a bucket and a DynamoDB table will be created in your AWS account.
Click through the popup and you will be redirected to the AWS CloudFormation Console, as shown below.
Select the checkbox acknowledging the warning about the creation of custom-named IAM resources, then press the Create Stack button. Wait for the stack to be created.
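If you prefer to wait for the stack from a script rather than watching the console, the wait can be sketched as a polling loop. This is a minimal sketch, not part of the InfinStor tooling: `wait_for_stack` is a hypothetical helper, the stack name is whatever name appears in your CloudFormation Console (this document does not specify it), and `cfn` is assumed to be a boto3 CloudFormation client.

```python
import time

# Stack states that mean creation has failed or is being rolled back.
_FAILED = {
    "CREATE_FAILED",
    "ROLLBACK_IN_PROGRESS",
    "ROLLBACK_COMPLETE",
    "ROLLBACK_FAILED",
}


def wait_for_stack(cfn, stack_name, poll_seconds=15, max_polls=80, sleep=time.sleep):
    """Poll until the stack reaches CREATE_COMPLETE; raise on failure or timeout.

    `cfn` is assumed to be a boto3 CloudFormation client, for example
    boto3.client("cloudformation"). `stack_name` is a placeholder for the
    name shown in your console.
    """
    for _ in range(max_polls):
        stacks = cfn.describe_stacks(StackName=stack_name)["Stacks"]
        status = stacks[0]["StackStatus"]
        if status == "CREATE_COMPLETE":
            return status
        if status in _FAILED:
            raise RuntimeError(f"stack {stack_name} ended in {status}")
        sleep(poll_seconds)
    raise TimeoutError(f"stack {stack_name} not complete after {max_polls} polls")
```

boto3 also provides a built-in waiter, `cfn.get_waiter("stack_create_complete").wait(StackName=...)`, which performs the same kind of polling; the explicit loop above only makes the status checks visible.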
Step 2: Configure the bucket with InfinStor Dashboard
- Go back to the InfinStor Dashboard here using the subscriber account, i.e. the InfinStor account that you created when you subscribed to the service from AWS Marketplace
- Click on Data on the left navigation bar
- Click on Add Bucket
- Enter the name of the bucket that you just created. Set the Cloud to AWS; the other details can be left blank
Once the bucket has been added to InfinStor's buckets table, you will see a screen similar to the one below:
Step 3: Enable InfinSnap
Now, press the Enable button in the row for the bucket that was just created. When this succeeds, Data Versioning is enabled in your system. Any data that you store in or read from this bucket will be automatically versioned, and a version parameter will be added to the MLflow run.
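To check that a run recorded a data version, you can inspect the run's parameters. The sketch below is illustrative only: this document does not state the name of the version parameter, so the `"infinsnap"` marker, the helper names, and the sample values in the usage note are assumptions. `MlflowClient.get_run` and `run.data.params` are standard MLflow tracking calls.

```python
def find_version_params(params, marker="infinsnap"):
    """Return the run parameters whose name contains `marker` (case-insensitive).

    The default marker is a guess at how the version parameter is named;
    adjust it to match what you see in your own runs.
    """
    return {k: v for k, v in params.items() if marker.lower() in k.lower()}


def version_params_for_run(run_id):
    """Fetch a run's parameters from the tracking server and filter them."""
    # Imported lazily so the filtering helper works without mlflow installed.
    from mlflow.tracking import MlflowClient

    run = MlflowClient().get_run(run_id)
    return find_version_params(run.data.params)
```

For example, with hypothetical parameters `{"lr": "0.01", "InfinSnap-mybucket": "tm20210101"}`, `find_version_params` keeps only the second entry.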
Browse Data Versions
You can now browse data versions using the Dashboard by clicking on the Browse button in the row for the bucket. Choose a specific time and view the objects in the bucket at that point in time.