Hyperscale ML with Kubeflow, MinIO, TensorFlow and Diamanti

Coauthored by Arnav Rustagi, Stanislav Bondarenko

Organizations are moving quickly to adopt artificial intelligence and machine learning (AI/ML) to improve their business. Often the driver is a need to understand customer behavior and serve customers better. At the same time, many organizations apply AI/ML to their own operations to gain a competitive advantage, increase revenue, or reduce cost.

Even with all the recent advancements in AI/ML technology, data scientists are often limited by existing IT infrastructure and workflows. AI/ML pipelines need continuous data ingestion, fast and scalable model training, easy sharing of ML models, and scalable deployment to the field. Traditional virtual and physical infrastructure, whether on-premises or in the public cloud, struggles to keep up with these requirements. Containers and Kubernetes bring new hope to the AI/ML world by eliminating many of the limitations of traditional infrastructure. With AI/ML frameworks like Kubeflow, the Kubernetes ecosystem gives data scientists a complete solution with minimal complexity, making it easy to share models and put AI/ML to work for their business needs.

This post walks through a simple example of running a hyperscale AI/ML workflow to showcase how data scientists can use the power of containers and Kubernetes. The ML pipeline itself is inspired by a blog post from MinIO, but we will use Kubeflow and Kubernetes to train and serve the model. At the end, we will serve the trained model using the TensorFlow Serving API to show how easy it is to deploy the resulting models.

Let’s use a simple dataset, the Large Movie Review Dataset from Stanford, to illustrate the hyperscale AI/ML flow. A similar flow can be used when deploying real hyperscale models with larger datasets such as ImageNet. With larger datasets, however, it is important to consider a few facts:

  • Data may not fit in memory, and storage might become the bottleneck. This means a high performing storage system is important to provide low latency access.
  • Large datasets may not fit in a local storage system or be too costly to store in a fast storage system. In that case, external object storage could be useful to save data, models, and even checkpoints.
  • Both fast storage systems and object stores need to be used wisely in the full ML pipeline to achieve the best performance.

Setting up Kubernetes

We will run the whole workflow on the Diamanti platform, an enterprise Kubernetes platform providing many built-in features to improve the AI/ML workflow, including support for GPUs and storage and networking acceleration. Diamanti’s platform especially helps to speed up the AI/ML flow for large data sets with its ultrafast storage. The steps below will also work with any other Kubernetes environment, although the Diamanti platform is ideal for its I/O capabilities.

Deploying MinIO Cluster

You can either use an existing MinIO cluster or deploy your own. We can deploy MinIO on the Diamanti cluster itself using the MinIO operator or using a Helm chart. For simplicity, let’s install a distributed MinIO using the Helm chart.

  • We used minio:RELEASE.2020-08-08T04-50-06Z for this tutorial.

Please note that the Diamanti platform lets you choose your own cluster domain for the Kubernetes cluster. If your clusterDomain is anything other than the default cluster.local, it must be specified in the helm install command.
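
The cluster domain matters because Kubernetes service DNS names embed it. As a minimal sketch, the helper below builds the in-cluster name of a service for a given domain. The service name `minio`, the namespace `default`, and the custom domain `example.internal` are assumptions for illustration only.

```python
def service_fqdn(service: str, namespace: str,
                 cluster_domain: str = "cluster.local") -> str:
    """Return the in-cluster DNS name Kubernetes assigns to a Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


# Default domain versus a hypothetical custom cluster domain.
default_name = service_fqdn("minio", "default")
custom_name = service_fqdn("minio", "default", cluster_domain="example.internal")

print(default_name)
print(custom_name)
```

If the chart is installed with the wrong domain, names of the first form will not resolve on a cluster that uses the second.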

  • Check the status of the deployment.
  • Access UI
    • Ideally, MinIO needs to be deployed behind a load balancer to distribute the load, but in this example, we will use Diamanti Layer 2 networking to have direct access to one of the pods and its UI.
    • Let’s find the IP of any MinIO server pod and connect to it from your browser.
    • Enter <POD IP>:9000 into browser
    • Default username: minio, password: minio123
Deploying Kubeflow

The Kubeflow framework can be deployed in many ways. We will follow a straightforward deployment, but you can find various deployment scenarios on the Kubeflow site.

  • Download the Kubeflow CLI (kfctl) from the Kubeflow releases page.
  • Download the sample config YAML from the Kubeflow Git repository.
  • Set up and deploy Kubeflow using kfctl CLI and the above config file.
  • Kubeflow is a very complex deployment, and it will deploy many pods across many different namespaces.
  • Access Kubeflow UI
    • Kubeflow deploys an Istio service mesh and uses an Istio gateway to expose itself. You can find the endpoints by looking at the service:
    • By default, the istio-ingress gateway has 5 backends. You can expose the service to the outside world via a NodePort or an external load balancer. For now, we will use Diamanti’s Layer 2 networking to access one of the gateways directly and connect to the UI. Enter one of the addresses from the list of endpoints above, without a port, into your browser.
    • The first time you log into Kubeflow, you will need to follow the prompts to set it up. You will need to specify a namespace to be used.
Figure 1: Leverage Diamanti’s Layer 2 networking to directly access one of the gateways to connect to the Kubeflow UI


Manage Jupyter Notebook Server
  • Navigate into Notebook Servers and select the namespace specified during startup
  • Click on “Create a new notebook server” under quick shortcuts.
    • Specify a name for the server, the standard TensorFlow 2.1 CPU Docker image, 8 CPUs and 16G of memory, and a new standard workspace volume. Leave the other settings unchanged.
Figure 2: Create a new notebook server
    • In this example, we are running training inside this notebook server itself. So it is important to choose the pod resource configuration carefully.
      • Select the number of CPU/GPU sufficient for your model
      • Select Memory sufficient for your model
      • As these models are trained on an extensive dataset and the data is processed on the server itself, it’s important to attach high-speed storage and mount it as your workspace. Diamanti’s storage system can deliver more than 1M IOPS, and its SLA-guaranteed volumes provide consistent throughput and latency even on shared infrastructure. We attached a Diamanti volume to take advantage of Diamanti’s storage performance, but you can use a volume from your own CSI driver instead.
  • Once the notebook is deployed, click on Connect.
  • Under the Notebook server, create a new Python Notebook.
Setup TensorFlow Model with Jupyter Notebook

For the ML pipeline that prepares the data and trains the model, we will follow this blog posted by MinIO to quickly demonstrate a hyperscale model. This example uses the Large Movie Review Dataset from Stanford, which contains 50,000 entries, to build a sentiment analysis model that categorizes a movie review as either positive or negative. You can follow the original tutorial to understand the ML pipeline used, but there are a few things you need to be careful about, which we will explain later.

Figure 3: ML pipeline to prepare the data and train the model

Because data sets in hyperscale ML pipelines are large, we will use the MinIO object store to read and upload data on demand during the preprocessing, training, testing, and deployment stages. The preprocessed data is stored in the binary TFRecord format, which encodes the data in a TensorFlow-friendly way. The tf.data API efficiently loads data from MinIO during the training and validation stages. During training, the pipeline can also save checkpoints directly to MinIO, so that an interrupted run can be resumed and training can be done in segments as new data arrives. The trained model is saved directly to MinIO as well.
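
TensorFlow's S3 filesystem support reads its connection settings from environment variables, so pointing it at MinIO is mostly a matter of setting them before any `s3://` path is opened. The sketch below assumes the in-cluster service name `minio` on port 9000 and the default credentials listed earlier in this post; the region value is a placeholder. Newer TensorFlow releases may additionally need the tensorflow-io package for `s3://` paths.

```python
import os

# Assumed values: service name "minio", port 9000, default credentials.
s3_settings = {
    "AWS_ACCESS_KEY_ID": "minio",
    "AWS_SECRET_ACCESS_KEY": "minio123",
    "AWS_REGION": "us-east-1",   # placeholder region
    "S3_ENDPOINT": "minio:9000",
    "S3_USE_HTTPS": "0",         # plain HTTP inside the cluster
    "S3_VERIFY_SSL": "0",
}
os.environ.update(s3_settings)


def s3_uri(bucket: str, key: str) -> str:
    """Build the s3:// path that tf.data and Keras callbacks can consume."""
    return f"s3://{bucket}/{key}"


print(s3_uri("datasets", "aclImdb_v1.tar.gz"))
```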

  • Get Dataset
    • Download Large Movie Review Dataset from Stanford locally
    • Log on to the MinIO UI and click the + icon at the bottom right to create a new bucket called “datasets”.
Figure 4: MinIO UI
    • Click the + icon again to upload the tar file to the “datasets” bucket.
  • Configure Notebook
    • Switch to the Jupyter notebook deployed earlier. The default image from Kubeflow may not ship the latest version of TensorFlow, so let’s install all required dependencies by typing the following into a notebook cell and running it:

    • Restart the kernel by clicking Kernel -> Restart to apply the changes.
  • You can follow the original blog for the full pre-processing and training pipeline code, but as our setup is a little different, please make sure to change the following:
    • In our setup, MinIO runs on the same cluster as the example, so we will use the minio service name to access it. If your MinIO runs elsewhere, specify its address instead.
    • Let’s set a correct prefix for the list_objects command when getting the list of all preprocessed records from MinIO.
    • Following the rest of the original blog, you can create your pipeline for
      • Preprocessing
      • Training
      • Testing
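
As a rough sketch of the prefix change mentioned above, the helper below lists objects under a prefix and turns them into `s3://` paths. It accepts any client that offers a minio-py style `list_objects(bucket, prefix=..., recursive=...)` method. The `FakeClient` stand-in, the `preprocessed/` prefix, and the `part-NNNN.tfrecord` names are hypothetical and exist only so the example runs without a server.

```python
from types import SimpleNamespace


def list_record_uris(client, bucket, prefix):
    """Return sorted s3:// URIs for every object stored under `prefix`."""
    objects = client.list_objects(bucket, prefix=prefix, recursive=True)
    return sorted(f"s3://{bucket}/{obj.object_name}" for obj in objects)


class FakeClient:
    """Stand-in for a minio-py client, used only for this offline sketch."""

    def __init__(self, names):
        self._names = names

    def list_objects(self, bucket, prefix=None, recursive=False):
        return [SimpleNamespace(object_name=name)
                for name in self._names if name.startswith(prefix or "")]


fake = FakeClient([
    "preprocessed/train/part-0001.tfrecord",
    "preprocessed/train/part-0000.tfrecord",
    "preprocessed/test/part-0000.tfrecord",
    "aclImdb_v1.tar.gz",
])
train_uris = list_record_uris(fake, "datasets", "preprocessed/train")
test_uris = list_record_uris(fake, "datasets", "preprocessed/test")
print(train_uris)
```

Against a real cluster, the client would be created with minio-py, for example `Minio("minio:9000", access_key="minio", secret_key="minio123", secure=False)`, using whatever endpoint and credentials apply to your deployment.
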
Preprocess TensorFlow Model with Jupyter Notebook

Following the original blog, your pre-processing pipeline will flow like this:

  • Declare the pipeline configuration and import the necessary libraries.
  • Create/connect MinIO instance, download the dataset from MinIO using minio-py, and extract it to a temporary folder for preprocessing.
  • Read the data from four folders and split it into two lists, one for training and one for testing, with pos/neg labels for each entry.
  • Shuffle the data so that positive and negative examples are interleaved and the model sees both throughout training.
  • Convert the text to a vector representation that captures the meaning of each sentence, using an existing embedding model called USE (Universal Sentence Encoder), which encodes a sentence into a single 1×512 vector.
  • Slice the dataset into chunks of 500 and encode the features as tf.train.Feature, so that the data can be stored as TFRecord.
  • Store the label as a list of tf.int64 and the movie review as a list of floats, since after encoding the sentence with USE we end up with an embedding of 512 dimensions.
  • Upload the final preprocessed TFRecords to the MinIO dataset bucket under a pre-processed folder.
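
The shuffle and chunking steps above can be sketched without TensorFlow. This is not the original blog's code: the review texts are stand-ins, and the `preprocessed/<split>/part-NNNN.tfrecord` naming is a hypothetical convention. It only shows how 25,000 labeled training reviews become shuffled chunks of 500, each of which would then be embedded and written as one TFRecord object.

```python
import random

CHUNK_SIZE = 500


def shuffle_and_chunk(examples, chunk_size=CHUNK_SIZE, seed=0):
    """Shuffle (text, label) pairs, then slice them into fixed-size chunks."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + chunk_size]
            for i in range(0, len(shuffled), chunk_size)]


def record_name(split, index):
    """Hypothetical object name for one chunk inside the bucket."""
    return f"preprocessed/{split}/part-{index:04d}.tfrecord"


# Stand-in data: 12,500 negative (0) followed by 12,500 positive (1) reviews.
examples = [(f"review {i}", 0) for i in range(12500)]
examples += [(f"review {i}", 1) for i in range(12500, 25000)]

chunks = shuffle_and_chunk(examples)
names = [record_name("train", i) for i in range(len(chunks))]
print(len(chunks), names[0], names[-1])
```

Without the shuffle, the first 25 chunks would contain only negative reviews, which is the situation the shuffling step is meant to avoid.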

Following the original blog, your pre-processing code should look like this:

In order to run this preprocessing, copy the code into a code block on Jupyter Notebook. Press the run command on the top toolbar to run the code block.

Train TensorFlow Model with Jupyter Notebook

Following the original blog, your pipeline to build and train the model will flow like this.

  • Fetch a list of the preprocessed training and testing files from MinIO
  • Create a tf.data.Dataset that loads records from MinIO as needed by creating two new lists that reference the actual location of objects in the MinIO bucket.
  • Once the records are loaded, split the training data into 90% for training and 10% for validation.
  • Create the tf.data datasets and a function that decodes TFRecords and reshapes them for the model.
  • Build a classification model using the Keras deep learning library. We will use dense layers with a softmax activation at the end to output class probabilities.
  • Prepare the dataset for the training stage by repeating it and setting a batch size of 128.
  • Use a Keras callback to store a checkpoint of the model in MinIO after every epoch, so that training can resume if it is interrupted.
  • Finally, fit the model on the training data for 10 epochs.
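
One consequence of repeating the dataset is that it never ends on its own, so Keras needs to be told how many batches make up an epoch. The sketch below works through that arithmetic for a 90/10 split, assuming 50 training files of 500 records each and splitting at the file level. Both the file count and the file names are assumptions, not values taken from the original code.

```python
import math

RECORDS_PER_FILE = 500
BATCH_SIZE = 128


def split_train_validation(files, train_fraction=0.9):
    """Split a list of record files into training and validation parts."""
    cut = int(len(files) * train_fraction)
    return files[:cut], files[cut:]


def steps_per_epoch(num_files, records_per_file=RECORDS_PER_FILE,
                    batch_size=BATCH_SIZE):
    """Number of batches needed to see every record once."""
    return math.ceil(num_files * records_per_file / batch_size)


# Hypothetical file names for 50 preprocessed training chunks.
files = [f"s3://datasets/preprocessed/train/part-{i:04d}.tfrecord"
         for i in range(50)]
train_files, val_files = split_train_validation(files)
train_steps = steps_per_epoch(len(train_files))
val_steps = steps_per_epoch(len(val_files))
print(len(train_files), len(val_files), train_steps, val_steps)
```

These two values are what would be passed as `steps_per_epoch` and `validation_steps` when fitting on repeated datasets.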

Following the blog, your training code should look like this.

In order to run this training, copy the code into a code block on Jupyter Notebook. Press the run command on the top toolbar to run the code block.

Test/Run the Model

Once the model is trained, we can evaluate it on the testing dataset to see how well it predicts whether a movie review is positive or negative. We should see around 86% accuracy after running the command. Please note that this is a very simple example, and accuracy can be improved with better modeling.
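
To make the accuracy figure concrete, here is a small sketch of how softmax outputs turn into labels and an accuracy score. The raw scores and labels are made up, and the class order (index 0 for negative, index 1 for positive) is an assumption about how the model was trained.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def predicted_label(probabilities):
    """Index of the most probable class."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Made-up raw scores for four reviews: [negative, positive].
scores = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 1.0]]
labels = [0, 1, 1, 1]
preds = [predicted_label(softmax(row)) for row in scores]
print(preds, accuracy(preds, labels))
```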

In order to run this test, copy the code into a code block on Jupyter Notebook. Press the run command on the top toolbar to run the code block.


Serving the Model

Once the model is trained, it’s often painful for data scientists to serve it correctly for actual usage. There are many ways to automate this. In this tutorial, we will showcase a simple way of automating model serving with Kubernetes. It can be further automated and enhanced using Kubeflow or a CI/CD pipeline for a full-fledged, scalable experience.

  • Let’s save the model to a separate folder within the MinIO bucket.
  • We will deploy the model on Kubernetes as a separate deployment running TensorFlow Serving.
  • We will use the TensorFlow Serving container image v2.2 to serve the model and pass it a model base path that points to MinIO.
  • We will use a headless service, which takes advantage of Diamanti’s Layer 2 networking, and serve the model on port 8500.

Copy the above code into a code block on Jupyter Notebook. Press the run command on the top toolbar to run the code block. This will start a new deployment on the Kubernetes cluster using the k8s_util library. Once the serving pod is up and running, the model will be available to be served using the Kubernetes service at the following REST endpoint:


Please note, if using Diamanti L2 networking, the served model can be accessed by its Pod IP.
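
For reference, the deployment and headless service described in this section can be sketched as plain Kubernetes manifests built from Python dictionaries. This is not the k8s_util-based code used in the tutorial. The resource name `tf-serving`, the model name `sentiment`, the model path, the image tag `2.2.0`, and the gRPC port 9000 (chosen only so it does not collide with the REST port 8500) are all assumptions.

```python
import json

MODEL_NAME = "sentiment"                             # hypothetical
MODEL_BASE_PATH = "s3://datasets/models/sentiment"   # hypothetical
REST_PORT = 8500

# Same S3 settings TensorFlow uses to reach MinIO (assumed values).
S3_ENV = {
    "AWS_ACCESS_KEY_ID": "minio",
    "AWS_SECRET_ACCESS_KEY": "minio123",
    "AWS_REGION": "us-east-1",
    "S3_ENDPOINT": "minio:9000",
    "S3_USE_HTTPS": "0",
    "S3_VERIFY_SSL": "0",
}

container = {
    "name": "tf-serving",
    "image": "tensorflow/serving:2.2.0",
    "command": ["/usr/bin/tensorflow_model_server"],
    "args": [
        "--port=9000",
        f"--rest_api_port={REST_PORT}",
        f"--model_name={MODEL_NAME}",
        f"--model_base_path={MODEL_BASE_PATH}",
    ],
    "env": [{"name": key, "value": value} for key, value in S3_ENV.items()],
    "ports": [{"containerPort": REST_PORT}],
}

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "tf-serving"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "tf-serving"}},
        "template": {
            "metadata": {"labels": {"app": "tf-serving"}},
            "spec": {"containers": [container]},
        },
    },
}

# A headless service sets clusterIP to the literal string "None".
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "tf-serving"},
    "spec": {
        "clusterIP": "None",
        "selector": {"app": "tf-serving"},
        "ports": [{"port": REST_PORT, "targetPort": REST_PORT}],
    },
}

print(json.dumps(service, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output could be saved to a file and applied directly, after adjusting the names and credentials for your cluster.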

Now the model is available to any connected application. For a quick test, let’s use Python code to call the predict function of the TensorFlow Serving REST API. We need to convert the embedded sample into the correct format before sending it as a JSON request.

The model responds with JSON-formatted predictions. We can analyze the JSON output to identify which of the provided samples were positive or negative, similar to how we tested locally before.
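
A minimal sketch of such a client, using only the standard library, is shown below. The URL pattern and the `instances`/`predictions` keys follow the TensorFlow Serving REST API. The host `tf-serving`, the model name `sentiment`, the class order, and the sample response are assumptions, and the exact shape of each instance must match the model's serving signature.

```python
import json
import urllib.request


def predict_url(host, model_name, port=8500):
    """URL of the TensorFlow Serving REST predict endpoint."""
    return f"http://{host}:{port}/v1/models/{model_name}:predict"


def build_request_body(embeddings):
    """Wrap a list of embedding vectors in the JSON body TF Serving expects."""
    return json.dumps({"instances": embeddings}).encode("utf-8")


def parse_predictions(response_text, labels=("negative", "positive")):
    """Map each row of class probabilities to its most likely label."""
    predictions = json.loads(response_text)["predictions"]
    return [labels[max(range(len(row)), key=row.__getitem__)]
            for row in predictions]


def predict(host, model_name, embeddings, port=8500):
    """Send embeddings to the served model (needs a reachable server)."""
    request = urllib.request.Request(
        predict_url(host, model_name, port),
        data=build_request_body(embeddings),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_predictions(response.read().decode("utf-8"))


# Offline demonstration with a made-up response for two reviews.
sample_response = '{"predictions": [[0.9, 0.1], [0.2, 0.8]]}'
print(predict_url("tf-serving", "sentiment"))
print(parse_predictions(sample_response))
```

When Diamanti Layer 2 networking is used, the host argument can simply be the Pod IP of the serving pod.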


Data scientists need self-service workflows for AI/ML modeling tools, data, and compute resources that allow them to reproduce and share the ML modeling results across on-premises and public cloud environments. Containers and Kubernetes, in conjunction with AI/ML tools like Kubeflow, Kubernetes Operators, and CI/CD tools, work together to bring agility, flexibility, and portability to AI/ML workflows. Data scientists can ingest and prepare the data, train, test, and deploy ML models quickly for faster results and greater business impact.

Leveraging the right Kubernetes infrastructure can also improve outcomes and reduce costs. With its ready-to-use Kubernetes platform and underlying specialized hardware support (GPU, storage acceleration, network acceleration, etc.), Diamanti is the best platform to accelerate your AI/ML workflow without worrying about the complexities of Kubernetes and underlying infrastructure. It supports easy scaling of AI/ML workloads, allows for seamless application portability and reuse, and provides an intuitive portal for deploying and managing Kubernetes infrastructure. The Diamanti platform eliminates I/O bottlenecks with more than a million IOPS per node and very low disk latency of around 100 microseconds. This is an essential aspect of running I/O hungry applications like hyperscale AI/ML.