
openFuyao: Open Source Kubernetes Platform

openFuyao is an open source Kubernetes cluster ecosystem with NUMA-aware scheduling, NPU operators, and AI inference acceleration, aimed at clusters of 10,000+ nodes.

Luca Berton · 6 min read

What is openFuyao

openFuyao is an open source community and software ecosystem focused on diversified computing cluster infrastructure. Built on Kubernetes, it provides optimized scheduling, resource management, and acceleration capabilities for clusters running heterogeneous hardware: CPUs, GPUs, NPUs, and other accelerators at scale.

The project targets a specific problem: running Kubernetes at 10,000+ nodes with mixed compute hardware while maintaining performance, resource efficiency, and operational sanity. Standard Kubernetes was not designed for this. openFuyao extends it with the scheduling intelligence, hardware awareness, and AI-specific optimizations that ultra-large clusters demand.

Think of it as a Kubernetes distribution purpose-built for high-performance computing and AI workloads, similar in ambition to what OpenShift does for enterprise application platforms, but focused specifically on cluster computing at massive scale.

The architecture

openFuyao organizes its stack into clear layers, each solving a distinct infrastructure challenge:

Container platform release

The foundation is an optimized Kubernetes deployment with:

  • Container orchestration engine: Kubernetes with performance optimizations (a claimed 30% lower memory usage for control plane components and support for 1,000+ pods per node)
  • Container runtime, network runtime, storage runtime: pluggable infrastructure
  • OS support: openEuler, SUSE, Ubuntu, and other Linux distributions
  • Multi-architecture: x86 and ARM hosts, with CPU, NPU, and GPU compute

Scheduling components

This is where openFuyao differentiates itself from vanilla Kubernetes:

  • NUMA-aware scheduling: intelligently places pods based on Non-Uniform Memory Access topology, improving application performance by up to 20%
  • Ultra-large cluster scheduling: high-performance scheduling for clusters exceeding 10,000 nodes with task-based batch pod creation
  • Colocation scheduling: mixes online (latency-sensitive) and offline (batch) workloads on the same nodes, improving cluster CPU usage by 30% and memory usage by 10%
  • Distributed job scheduling: AI-optimized job scheduling that improves TPS by 15-20% in typical inference scenarios

Compute enablement

The hardware abstraction layer that makes heterogeneous compute accessible to Kubernetes:

  • NPU Operator: manages Huawei Ascend Neural Processing Units as Kubernetes resources
  • KAE Operator: integrates Kunpeng Acceleration Engine hardware
  • Node feature discovery: automatic detection and labeling of hardware capabilities
  • NPU software-based allocation: fractional NPU sharing (similar to GPU MIG/MPS/time-slicing but for NPUs)
  • Resource pool management: xPU resources become available within seconds

Application suites

Pre-built stacks for common workloads:

  • AI inference suite: with KVCache support, intelligent routing, and cache hit optimization to reduce time-to-first-token
  • Big data suite: optimized for data processing workloads
  • Scenario acceleration suite: workload-specific performance enhancements

Why platform engineers should pay attention

NUMA-aware scheduling is a game changer

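The default Kubernetes scheduler treats a node as a flat pool of CPU and memory. The kubelet's Topology Manager can align resources within a node, but only after the scheduler has already chosen that node. In reality, modern multi-socket servers have Non-Uniform Memory Access topology: accessing memory on a remote NUMA node is significantly slower than local access.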

For latency-sensitive workloads (payment systems, real-time inference, databases), NUMA-unaware scheduling silently kills performance. openFuyao's NUMA-aware scheduler understands the physical topology and places pods to minimize cross-NUMA access.

The claimed 20% performance improvement for payment applications is consistent with what I have seen in production. NUMA misalignment is one of the most common hidden cost drivers in Kubernetes clusters: you pay for the hardware but lose performance to poor placement.
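
To make the difference between flat and NUMA-aware placement concrete, here is a minimal Python sketch. It is not openFuyao's scheduler: the `NumaZone` and `Node` types, the `pick_node` function, and the example numbers are all hypothetical, and a real scheduler would also weigh devices, hugepages, and pod affinity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NumaZone:
    free_cpus: int
    free_mem_gib: int


@dataclass
class Node:
    name: str
    zones: list[NumaZone]


def fits_single_zone(node: Node, cpus: int, mem_gib: int) -> bool:
    """True if one NUMA zone alone can hold the whole request."""
    return any(z.free_cpus >= cpus and z.free_mem_gib >= mem_gib for z in node.zones)


def fits_flat(node: Node, cpus: int, mem_gib: int) -> bool:
    """Flat view: only node-wide totals matter."""
    return (sum(z.free_cpus for z in node.zones) >= cpus
            and sum(z.free_mem_gib for z in node.zones) >= mem_gib)


def pick_node(nodes: list[Node], cpus: int, mem_gib: int) -> Optional[str]:
    """Prefer a node where the pod fits in one zone; otherwise fall back to the flat view."""
    aligned = [n for n in nodes if fits_single_zone(n, cpus, mem_gib)]
    if aligned:
        return aligned[0].name
    flat = [n for n in nodes if fits_flat(n, cpus, mem_gib)]
    return flat[0].name if flat else None


# Hypothetical cluster state: free resources per NUMA zone.
nodes = [
    Node("node-a", [NumaZone(6, 32), NumaZone(6, 32)]),  # 12 CPUs free, split 6 + 6
    Node("node-b", [NumaZone(10, 64), NumaZone(1, 8)]),  # 11 CPUs free, 10 in one zone
]
print(pick_node(nodes, cpus=8, mem_gib=16))  # node-b
```

A flat view sees both nodes as valid, and node-a even has more free CPU in total, but an 8-CPU pod there would have to span both zones. The sketch picks node-b, where one zone can hold the whole request.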

Ultra-large cluster support

Vanilla Kubernetes officially supports clusters up to 5,000 nodes. Beyond that, the scheduler becomes a bottleneck, etcd struggles with the object count, and API server latency degrades.

openFuyao claims high-performance scheduling for 10,000+ node clusters with task-based batch pod creation. This is relevant for:

  • Telecom operators running edge and core infrastructure on a single platform
  • Large-scale AI training farms with thousands of GPU/NPU nodes
  • Cloud providers managing multi-tenant infrastructure

If you are hitting Kubernetes scaling limits, openFuyao's scheduler optimizations are worth studying, even if you do not adopt the full platform.

Colocation scheduling saves real money

Running separate clusters for online services (latency-sensitive) and offline jobs (batch processing) wastes resources. Online clusters are over-provisioned for peak load, sitting at 20-30% utilization most of the time. Offline clusters are sized for batch windows and sit largely idle between them.

openFuyao's colocation scheduler mixes both workload types on the same nodes with quality-of-service guarantees. The claimed result is 30% better CPU utilization and 10% better memory utilization, without degrading online service SLOs.

This is the same pattern that Google's Borg pioneered and that the Kubernetes community has been working toward with priority classes and preemption. openFuyao packages it as a production-ready feature.
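
The arithmetic behind colocation can be sketched in a few lines. This is an illustration under stated assumptions, not openFuyao's algorithm: the function name `reclaimable_millicores`, the 10% reserve, and the 64-core node are made up for the example. The idea is that batch jobs may borrow whatever the online services are not using right now, minus a reserve held back for bursts.

```python
def reclaimable_millicores(capacity_m: int, online_usage_m: int, reserve_pct: int = 10) -> int:
    """CPU (in millicores) that batch jobs could borrow on one node right now."""
    reserve_m = capacity_m * reserve_pct // 100  # headroom kept free for online bursts
    return max(0, capacity_m - online_usage_m - reserve_m)


capacity_m = 64_000  # hypothetical 64-core node

# Off-peak: online services use 25% of the node.
print(reclaimable_millicores(capacity_m, online_usage_m=16_000))  # 41600

# Peak: online services use almost everything, so batch gets nothing.
print(reclaimable_millicores(capacity_m, online_usage_m=60_000))  # 0
```

Because the lendable amount shrinks as online load grows, batch pods placed this way have to tolerate throttling or eviction. That is the trade the quality-of-service guarantees exist to manage.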

AI inference acceleration

The AI inference suite includes:

  • KVCache optimization: reusing cached attention key/value tensors across inference requests to reduce redundant computation
  • Intelligent routing: directing requests to the most appropriate backend based on model, cache state, and load
  • Cache hit strategy optimization: improving TTFT (time-to-first-token) by maximizing cache reuse

These are the exact optimizations that production inference platforms need but that most teams build from scratch. Having them as part of the cluster platform reduces the PoC-to-production gap.
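
Cache-aware routing is easier to reason about with a toy example. The sketch below is an assumption about how such a router could work, not the openFuyao implementation: `Backend`, `route`, and the token IDs are hypothetical. It sends each request to the replica whose cached prefixes overlap most with the prompt and uses current load as the tie-breaker.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    cached_prefixes: list[list[int]]  # token sequences whose KV cache is still resident
    in_flight: int                    # requests currently being served


def shared_prefix_len(a: list[int], b: list[int]) -> int:
    """Number of leading tokens that two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route(prompt: list[int], backends: list[Backend]) -> str:
    """Pick the backend with the longest cached prefix; break ties by lowest load."""
    def score(b: Backend) -> tuple[int, int]:
        best = max((shared_prefix_len(prompt, p) for p in b.cached_prefixes), default=0)
        return (best, -b.in_flight)
    return max(backends, key=score).name


backends = [
    Backend("replica-a", [[1, 2, 3, 9]], in_flight=5),
    Backend("replica-b", [[1, 2]], in_flight=0),
    Backend("replica-c", [], in_flight=0),
]

# replica-a is busier, but it can reuse three cached tokens of this prompt.
print(route([1, 2, 3, 4, 5], backends))  # replica-a
```

Every token served from cache is one less token to prefill, which is why this kind of routing targets time-to-first-token. A production router would cap how much load it accepts in exchange for a cache hit; this sketch always favors the hit.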

How openFuyao compares

| Capability | openFuyao | Vanilla K8s | OpenShift | Rancher |
|---|---|---|---|---|
| NUMA-aware scheduling | Built-in | Manual (topology manager) | Limited | No |
| 10,000+ node clusters | Optimized | 5,000 node limit | 2,000 recommended | Varies |
| Colocation scheduling | Built-in | DIY with priorities | No | No |
| NPU operator | Built-in | N/A | N/A | N/A |
| AI inference acceleration | Built-in suite | N/A | RHOAI separate | N/A |
| Multi-arch (ARM, RISC-V) | First-class | Supported | Supported | Supported |

The strongest differentiators are the scheduling optimizations and the NPU/heterogeneous hardware support. These are capabilities that other platforms either do not offer or require significant custom engineering to achieve.

The ecosystem connection

openFuyao's recommended base is openEuler, the Huawei-originated, OpenAtom Foundation-governed enterprise Linux distribution. Together they form a full-stack open source infrastructure:

  • openEuler provides the OS layer with multi-architecture support and multi-kernel capabilities
  • openFuyao provides the cluster orchestration layer with scheduling intelligence and hardware enablement

This is analogous to the Red Hat stack (RHEL + OpenShift) or the SUSE stack (SLE + Rancher), but optimized for heterogeneous, large-scale computing with AI as a first-class workload.

Adoption is concentrated in China, with China Unicom Cloud as a notable production user for colocation workloads.

What to watch

openFuyao is still early in its international visibility, but the technical capabilities are substantial:

  1. NUMA-aware scheduling becoming standard in Kubernetes (KEPs are in progress, but openFuyao ships it today)
  2. NPU operator patterns that may influence how the broader Kubernetes community handles non-GPU accelerators
  3. Colocation scheduling as a reference implementation for mixed-workload clusters
  4. AI inference optimizations (KVCache, routing) that are increasingly relevant as inference costs dominate AI budgets

Whether or not you deploy openFuyao, the architectural patterns it implements (NUMA-aware placement, heterogeneous accelerator management, colocation scheduling) are the direction Kubernetes is heading for AI-scale infrastructure.

Getting started

# Visit the official site for downloads and documentation
# https://www.openfuyao.cn/en/

# The platform supports multiple OS bases
# openEuler (recommended), SUSE, Ubuntu

# Key components to explore:
# - Container platform release (optimized Kubernetes)
# - NUMA-aware scheduler
# - NPU Operator / KAE Operator
# - AI inference suite

The documentation is available in Chinese and English. For teams evaluating the platform, start with the container platform release and the scheduling components, since those deliver the most immediate value.

