
From Device Plugin to DRA: GPU Scheduling Paradigm Upgrade and HAMi-DRA Practice Review

Author: HAMi Community
Published: 3/23/2026

KCD Beijing 2026 was one of the largest Kubernetes community events in recent years.

Over 1,000 people registered, setting a new record for KCD Beijing.

The HAMi community not only gave a technical talk but also set up a booth, engaging deeply with developers and enterprise users from the cloud-native and AI infrastructure fields.

The topic of this talk was:

From Device Plugin to DRA: GPU Scheduling Paradigm Upgrade and HAMi-DRA Practice

This article combines the on-site presentation and slides for a more complete technical review. Slides download: GitHub - HAMi-DRA KCD Beijing 2026.

HAMi Community at the Event

The talk was delivered by two core contributors of the HAMi community:

  • Wang Jifei (Dynamia, HAMi Approver, main HAMi-DRA contributor)
  • James Deng (Fourth Paradigm, HAMi Reviewer)

They have long focused on:

  • GPU scheduling and virtualization
  • Kubernetes resource models
  • Heterogeneous compute management

At the booth, the HAMi community discussed with attendees questions such as:

  • Is Kubernetes really suitable for AI workloads?
  • Should GPUs be treated as "scheduling resources" rather than "devices"?
  • How to introduce DRA without breaking the ecosystem?

Event Recap

Main conference hall

Attendee registration

Attendees visiting the HAMi booth

Volunteers stamping for attendees

Wang Jifei presenting

James Deng presenting

GPU Scheduling Paradigm is Changing

The core of this talk is not just DRA itself, but a bigger shift:

GPUs are evolving from "devices" to "resource objects".

1. The Ceiling of the Device Plugin

The problem with the traditional model is its limited expressiveness:

  • Can only describe "quantity" (nvidia.com/gpu: 1)
  • Cannot express:
    • Multi-dimensional resources (memory / core / slice)
    • Multi-card combinations
    • Topology (NUMA / NVLink)

👉 This directly leads to:

  • Scheduling logic leakage (extender / sidecar)
  • Increased system complexity
  • Limited concurrency

2. DRA: A Leap in Resource Modeling

DRA's core advantages are:

  • Multi-dimensional resource modeling
  • Complete device lifecycle management
  • Fine-grained resource allocation

Key change:

Resource requests move from Pod fields → independent ResourceClaim objects
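Concretely, a Pod no longer requests the device inline; it references a separate claim object. A minimal sketch of the DRA wiring (names like `gpu-claim` and the image are illustrative):

```yaml
# A Pod referencing an external ResourceClaim (names are illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: app
    image: cuda-app:latest        # placeholder image
    resources:
      claims:
      - name: gpu                 # matches the entry under resourceClaims
  resourceClaims:
  - name: gpu
    resourceClaimName: gpu-claim  # the independent ResourceClaim object
```

Because the claim is a standalone object, it can carry multi-dimensional requests that a single counter like `nvidia.com/gpu: 1` cannot express.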

Key Reality: DRA is Too Complex

A key slide in the PPT, often overlooked:

👉 A DRA request looks like this

```yaml
spec:
  devices:
    requests:
    - exactly:
        allocationMode: ExactCount
        capacity:
          requests:
            memory: 4194304k
        count: 1
```

And you also need to write a CEL selector:

```
device.attributes["gpu.hami.io"].type == "hami-gpu"
```

Compared to the Device Plugin

```yaml
resources:
  limits:
    nvidia.com/gpu: 1
```

👉 The conclusion is clear:

DRA is an upgrade in capability, but UX is clearly degraded.

HAMi-DRA's Key Breakthrough: Automation

One of the most valuable parts of this talk:

👉 Webhook Automatically Generates ResourceClaim

HAMi's approach is not to have users "use DRA directly", but:

Let users keep writing Device Plugin-style requests, and have the system automatically convert them to DRA

How it works

Input (user):

```yaml
nvidia.com/gpu: 1
nvidia.com/gpumemory: 4000
```

↓

Webhook conversion:

  • Generate ResourceClaim
  • Build CEL selector
  • Inject device constraints (UUID / GPU type)

↓

Output (system internal):

  • Standard DRA objects
  • Schedulable resource expression
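Putting the pieces together, the generated object might look like the following. This is a sketch, not HAMi's exact output: the apiVersion, generated name, memory unit, and attribute keys are assumptions based on the examples shown in the talk:

```yaml
# Sketch of a webhook-generated ResourceClaim (field shapes follow the
# resource.k8s.io API; names and attribute keys are assumptions)
apiVersion: resource.k8s.io/v1beta2
kind: ResourceClaim
metadata:
  name: gpu-pod-claim                         # generated name, illustrative
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        allocationMode: ExactCount
        count: 1                              # from nvidia.com/gpu: 1
        capacity:
          requests:
            memory: 4000Mi                    # from nvidia.com/gpumemory: 4000 (assuming MiB)
        selectors:
        - cel:
            expression: device.attributes["gpu.hami.io"].type == "hami-gpu"
```

The user never sees this object; the webhook builds the request, the CEL selector, and any device constraints on their behalf.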

Core value

Turn DRA from an "expert interface" into an interface ordinary users can use.

DRA Driver: Real Implementation Complexity

A DRA driver is not just "registering resources"; it is full lifecycle management:

Three core interfaces

  • Publish Resources
  • Prepare Resources
  • Unprepare Resources
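The first of these, publishing resources, surfaces each node's devices to the scheduler as a ResourceSlice. A sketch of what a driver might publish (the driver name, attribute keys, and values are assumptions for illustration):

```yaml
# Sketch of a ResourceSlice a DRA driver might publish for one node
# (driver name and attribute keys are illustrative)
apiVersion: resource.k8s.io/v1beta1
kind: ResourceSlice
metadata:
  name: node-a-gpu.hami.io
spec:
  driver: gpu.hami.io            # assumed driver name
  nodeName: node-a
  pool:
    name: node-a
    generation: 1
    resourceSliceCount: 1
  devices:
  - name: gpu-0
    basic:
      attributes:
        type:
          string: hami-gpu
        uuid:
          string: GPU-xxxxxxxx   # illustrative UUID
      capacity:
        memory:
          value: 24Gi
```

Prepare/Unprepare then run on the node at Pod start and stop, which is where the runtime work below comes in.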

Real challenges

  • libvgpu.so injection
  • ld.so.preload
  • Environment variable management
  • Temporary directories (cache / lock)

👉 This means:

GPU scheduling has entered the runtime orchestration layer, not just simple resource allocation.

Performance Comparison: DRA is Not Just "More Elegant"

A key benchmark from the PPT:

Pod creation time comparison

  • HAMi (traditional): up to ~42,000
  • HAMi-DRA: significantly reduced (~30%+ improvement)

👉 This shows:

DRA's resource pre-binding mechanism can reduce scheduling conflicts and retries

Observability Paradigm Shift

An underestimated change:

Traditional model

  • Resource info: from Node
  • Usage: from Pod
  • → Requires aggregation and inference

DRA model

  • ResourceSlice: device inventory
  • ResourceClaim: resource allocation
  • → Resource perspective is first-class

👉 The change:

Observability shifts from "inference" to "direct modeling"
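For example, once a claim is allocated, its status records exactly which device was granted and where, with nothing left to infer. A sketch (values are illustrative; field shapes follow the resource.k8s.io API):

```yaml
# Sketch: status of an allocated ResourceClaim (values illustrative)
status:
  allocation:
    devices:
      results:
      - request: gpu
        driver: gpu.hami.io      # assumed driver name
        pool: node-a
        device: gpu-0            # the exact device granted
    nodeSelector:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values:
          - node-a
```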

Unified Modeling for Heterogeneous Devices

A key future direction from the PPT:

If device attributes are standardized, a vendor-agnostic scheduling model is possible

For example:

  • PCIe root
  • PCI bus ID
  • GPU attributes

👉 This is a bigger narrative:

DRA is the starting point for heterogeneous compute abstraction

Bigger Trend: Kubernetes is Becoming the AI Control Plane

Connecting these points reveals a bigger trend:

1. Node → Resource

  • From "scheduling machines"
  • To "scheduling resource objects"

2. Device → Virtual Resource

  • GPU is no longer just a card
  • But a divisible, composable resource

3. Imperative → Declarative

  • Scheduling logic → resource declaration

👉 Essentially:

Kubernetes is evolving into the AI Infra Control Plane

HAMi's Position

HAMi's positioning is becoming clearer:

GPU Resource Layer on Kubernetes

  • Downward: adapts to heterogeneous GPUs
  • Upward: supports AI workloads (training / inference / Agent)
  • Middle: scheduling + virtualization + abstraction

HAMi-DRA:

is the key step aligning this resource layer with Kubernetes native models

Community Significance

Another important point from this talk:

  • Contributors from different companies collaborated
  • Validated in real production environments
  • Shared experience through the community

This is the approach HAMi has always insisted on:

Promoting AI infrastructure through community, not closed systems

Summary

The real value of this talk is not just introducing DRA, but answering a key question:

How to turn a "correct but hard to use" model into a system you can use today?

HAMi-DRA's answer:

  • Don't change user habits
  • Absorb DRA capabilities
  • Handle complexity internally
HAMi is a CNCF Sandbox project.