Scaling AI Workloads on OpenShift: Techniques and Best Practices


AI/ML OCP Tooling - This article is part of a series.

Part 1: Exploring AI/ML Tools on OpenShift

Part 2: This Article

Part 3: Leveraging ODF for AI Workloads on OpenShift

Part 4: Deploying and Running Argo Workflows for AI/ML on OpenShift

Part 5: Leveraging Tekton for AI Workloads on OpenShift

Part 6: Securing AI Workloads on OpenShift with Native Tooling

Synopsis

OpenShift, built on Kubernetes, offers robust capabilities for scaling AI workloads, helping you use cluster resources efficiently and keep performance consistent as demand changes. In this post, we'll explore OpenShift tools and techniques for scaling AI workloads, including Kubernetes features and AI/ML tools like Open Data Hub (ODH) and Kubeflow. We'll include example configurations to help you optimize your AI workloads on OpenShift.

Read our post on ODH and Kubeflow for more information.

Introduction

Scaling AI workloads on OpenShift requires a combination of Kubernetes-native capabilities and specialized AI/ML tools. Let's walk through several techniques and the configuration each one needs.

1. Horizontal Pod Autoscaling (HPA)

HPA automatically adjusts the number of pod replicas for a workload based on observed metrics such as CPU or memory utilization, which is especially useful for AI workloads whose load fluctuates, such as inference services with uneven request traffic.
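Utilization targets are percentages of each container's resource requests, so an HPA has nothing to measure against unless the target pods declare them. The HPA and VPA examples in this post target a Deployment named `ai-workload`; a minimal sketch of one follows. The container name `inference`, the image placeholder, and the request sizes are hypothetical values chosen for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-workload
spec:
  # spec.replicas is omitted on purpose: the HPA manages the replica count
  selector:
    matchLabels:
      app: ai-workload
  template:
    metadata:
      labels:
        app: ai-workload
    spec:
      containers:
        - name: inference                  # hypothetical container name
          image: <your-inference-image>    # placeholder image
          resources:
            # Requests are the baseline for HPA utilization percentages
            requests:
              cpu: "1"
              memory: 2Gi
            limits:
              memory: 4Gi
```

Leaving `spec.replicas` out of the manifest avoids resetting the replica count to a fixed value every time the Deployment is re-applied while the HPA is in control.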

To enable HPA, create a HorizontalPodAutoscaler object for your AI workload:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: ai-workload-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-workload
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 80

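This configuration targets an average CPU utilization of 80% across the Deployment's pods, measured relative to each pod's CPU request, and keeps between one and five replicas. When average utilization rises above the target, the HPA adds replicas; when it falls below, it removes them. For example, with two replicas averaging 120% of their requested CPU, the HPA aims for ceil(2 × 120 / 80) = 3 replicas.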
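The `autoscaling/v1` API used above supports only a CPU target. To scale on memory as well, use `autoscaling/v2`, which accepts a list of metrics; the HPA computes a desired replica count for each metric and applies the largest. The sketch below expresses the same HPA in v2 form and adds a memory target. The 75% memory figure is an illustrative assumption, not a recommendation, and like the CPU target it is a percentage of the pods' memory requests.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-workload-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-workload
  minReplicas: 1
  maxReplicas: 5
  metrics:
    # Scale out when average CPU use exceeds 80% of requested CPU
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
    # Illustrative memory target (assumed value), also relative to requests
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
```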

2. Vertical Pod Autoscaling (VPA)

VPA adjusts the CPU and memory requests of a workload's pods based on their observed usage, which is useful for AI workloads whose resource requirements change over time.

To enable VPA, first install the Vertical Pod Autoscaler Operator from OperatorHub. Then create a VerticalPodAutoscaler object for your AI workload:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: ai-workload-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-workload
  updatePolicy:
    updateMode: Auto

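This configuration lets VPA automatically adjust the CPU and memory requests of your AI workload's pods based on their usage. In Auto mode, VPA currently applies new values by evicting pods and recreating them with updated requests, so expect restarts; for long-running training pods, an eviction can interrupt work in progress.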
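A more cautious setup for AI workloads runs VPA in recommendation-only mode with bounds, so it never evicts a pod and never proposes requests outside a range you choose. The sketch below assumes illustrative minimum and maximum values; size them to your nodes. `"Off"` is quoted because YAML 1.1 parsers read a bare `Off` as the boolean false. The VPA project also advises against combining VPA with an HPA that scales the same pods on CPU or memory, so pick one of the two approaches for a given Deployment.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: ai-workload-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-workload
  updatePolicy:
    # Compute recommendations only; never evict pods to apply them
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"          # apply to every container in the pod
        controlledResources: ["cpu", "memory"]
        # Illustrative bounds (assumed values)
        minAllowed:
          cpu: 500m
          memory: 1Gi
        maxAllowed:
          cpu: "4"
          memory: 16Gi
```

With `updateMode: "Off"`, the recommender still writes target requests to the object's status, which you can inspect with `oc describe vpa ai-workload-vpa` and apply to the Deployment yourself during a maintenance window.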

3. Open Data Hub (ODH) and Kubeflow

ODH and Kubeflow offer various capabilities to help you scale AI workloads on OpenShift. Here are two examples:

a. Model Training with Distributed TensorFlow Jobs

Kubeflow lets you run distributed TensorFlow training jobs using its TFJob custom resource. A distributed training job can scale horizontally across multiple nodes or GPUs, which can substantially shorten training time.

apiVersion: "kubeflow.org/v1"
kind: "TFJob"
metadata:
  name: "distributed-tensorflow-job"
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 4
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: tensorflow
              image: <your-tensorflow-image>
              resources:
                limits:
                  nvidia.com/gpu: 1

This configuration creates a distributed TensorFlow training job with four workers, each with one GPU.
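On shared OpenShift clusters, GPU workers often need more than a GPU limit to land on suitable nodes and keep their data pipelines fed. The sketch below extends the same four-worker job with explicit CPU and memory requests and a toleration for GPU nodes. The `nvidia.com/gpu` taint key and the request sizes are assumptions about your cluster, not values the TFJob operator requires. For extended resources such as `nvidia.com/gpu`, Kubernetes uses the limit as the request when no request is given, and GPUs cannot be overcommitted, so each worker pod holds one whole GPU.

```yaml
apiVersion: "kubeflow.org/v1"
kind: "TFJob"
metadata:
  name: "distributed-tensorflow-job"
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 4                    # main scaling knob for the job
      restartPolicy: OnFailure
      template:
        spec:
          # Assumed taint on GPU nodes; match your cluster's actual taints
          tolerations:
            - key: nvidia.com/gpu
              operator: Exists
              effect: NoSchedule
          containers:
            - name: tensorflow         # TFJob expects the container to be named tensorflow
              image: <your-tensorflow-image>
              resources:
                # Illustrative sizes for input processing alongside the GPU
                requests:
                  cpu: "4"
                  memory: 16Gi
                limits:
                  nvidia.com/gpu: 1
                  memory: 16Gi
```

Raising `replicas` under `Worker` only helps if the training code uses a distribution strategy, such as TensorFlow's `MultiWorkerMirroredStrategy`, that reads the `TF_CONFIG` environment variable the operator sets on each worker pod.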

b. Serving Models with Seldon

Seldon Core serves trained models behind REST or gRPC endpoints and, when you add an hpaSpec to a component, creates a Horizontal Pod Autoscaler that scales the model-serving pods as load changes. Here's an example configuration:

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
spec:
  predictors:
    - componentSpecs:
        - spec:
            containers:
              - image: <your-model-image>
                name: model
                resources:
                  requests:
                    cpu: "0.5"
          # hpaSpec belongs to the component spec, not to the predictor
          hpaSpec:
            minReplicas: 1
            maxReplicas: 5
            metrics:
              - type: Resource
                resource:
                  name: cpu
                  # autoscaling/v2beta1-style metric; check your Seldon Core version's docs
                  targetAverageUtilization: 80
      graph:
        children: []
        endpoint:
          type: REST
        name: model
        type: MODEL
      name: default
      replicas: 1

This configuration serves your model with Seldon and uses an HPA to scale the serving pods between one and five replicas based on CPU utilization, targeting 80% of each pod's 0.5-CPU request.

Conclusion

OpenShift offers many options for scaling AI workloads efficiently. From Kubernetes-native capabilities like HPA and VPA to AI/ML-specific tools like ODH and Kubeflow, you can tune resource utilization and performance for your AI workloads on OpenShift.

References