openFuyao: Open Source Kubernetes Platform

What is openFuyao

openFuyao is an open source community and software ecosystem focused on diversified computing cluster infrastructure. Built on Kubernetes, it provides optimized scheduling, resource management, and acceleration capabilities for clusters that run heterogeneous hardware at scale: CPUs alongside GPUs, NPUs, and other accelerators.

The project targets a specific problem: running Kubernetes at 10,000+ nodes with mixed compute hardware while maintaining performance, resource efficiency, and operational sanity. Standard Kubernetes was not designed for this. openFuyao extends it with the scheduling intelligence, hardware awareness, and AI-specific optimizations that ultra-large clusters demand.

Think of it as a Kubernetes distribution purpose-built for high-performance computing and AI workloads — similar in ambition to what OpenShift does for enterprise application platforms, but focused specifically on cluster computing at massive scale.

The architecture

openFuyao organizes its stack into clear layers, each solving a distinct infrastructure challenge:

Container platform release

The foundation is an optimized Kubernetes deployment with:

Container orchestration engine — Kubernetes with performance optimizations (30% lower memory usage for control plane components, 1,000+ pods per node support)
Container runtime, network runtime, storage runtime — pluggable infrastructure
OS support — openEuler, SUSE, Ubuntu, and other Linux distributions
Multi-architecture — x86, ARM, CPU, NPU, GPU

Scheduling components

This is where openFuyao differentiates itself from vanilla Kubernetes:

NUMA-aware scheduling — intelligently places pods based on Non-Uniform Memory Access topology, improving application performance by up to 20%
Ultra-large cluster scheduling — high-performance scheduling for clusters exceeding 10,000 nodes with task-based batch pod creation
Colocation scheduling — mixes online (latency-sensitive) and offline (batch) workloads on the same nodes, improving cluster CPU usage by 30% and memory usage by 10%
Distributed job scheduling — AI-optimized job scheduling that improves TPS by 15-20% in typical inference scenarios

Compute enablement

The hardware abstraction layer that makes heterogeneous compute accessible to Kubernetes:

NPU Operator — manages Huawei Ascend Neural Processing Units as Kubernetes resources
KAE Operator — integrates Kunpeng Acceleration Engine hardware
Node feature discovery — automatic detection and labeling of hardware capabilities
NPU software-based allocation — fractional NPU sharing (similar to GPU MIG/MPS/time-slicing but for NPUs)
Resource pool management — xPU resources become available within seconds

Application suites

Pre-built stacks for common workloads:

AI inference suite — with KVCache support, intelligent routing, and cache hit optimization to reduce time-to-first-token
Big data suite — optimized for data processing workloads
Scenario acceleration suite — workload-specific performance enhancements

Why platform engineers should pay attention

NUMA-aware scheduling is a game changer

Standard Kubernetes scheduling treats a node as a flat pool of CPU and memory. In reality, modern multi-socket servers have Non-Uniform Memory Access topology — accessing memory on a remote NUMA node is significantly slower than local access.

For latency-sensitive workloads (payment systems, real-time inference, databases), NUMA-unaware scheduling silently kills performance. openFuyao’s NUMA-aware scheduler understands the physical topology and places pods to minimize cross-NUMA access.

The claimed improvement of up to 20% for payment applications is consistent with what I have seen in production. NUMA misalignment is one of the most common hidden cost drivers in Kubernetes clusters: you pay for the hardware but lose performance to poor placement.
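To make the placement idea concrete, here is a minimal sketch of a NUMA-aware fit check. It is an illustration under stated assumptions, not openFuyao's scheduler: the `NumaZone` type, the `pick_numa_zone` function, and the best-fit rule are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NumaZone:
    """Free resources on one NUMA zone of a node (hypothetical model)."""
    zone_id: int
    free_cpus: int
    free_mem_gib: int


def pick_numa_zone(zones: List[NumaZone], cpus: int, mem_gib: int) -> Optional[int]:
    """Return the zone that fits the whole request with the least leftover CPU.

    Returns None when no single zone can hold the request, even if the
    node as a whole has enough free CPU and memory.
    """
    fitting = [z for z in zones if z.free_cpus >= cpus and z.free_mem_gib >= mem_gib]
    if not fitting:
        return None
    # Best fit: minimize leftover CPUs, break ties by zone id.
    best = min(fitting, key=lambda z: (z.free_cpus - cpus, z.zone_id))
    return best.zone_id


# A two-socket node with 18 free CPUs and 96 GiB free in total.
zones = [NumaZone(0, 6, 64), NumaZone(1, 12, 32)]

print(pick_numa_zone(zones, cpus=8, mem_gib=16))  # 1
print(pick_numa_zone(zones, cpus=8, mem_gib=48))  # None
```

The second request is the interesting one. A scheduler that treats the node as a flat pool would accept it, because 18 CPUs and 96 GiB are free in total, yet no single zone can serve it without cross-NUMA memory access.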

Ultra-large cluster support

Upstream Kubernetes is designed and tested for clusters of up to 5,000 nodes. Beyond that, the scheduler tends to become a bottleneck, etcd struggles with the object count, and API server latency degrades.
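The upstream large-cluster guidance pairs the 5,000-node figure with two more limits: at most 110 pods per node and at most 150,000 pods in total. The sketch below checks a planned cluster against those three numbers. The function name `exceeded_limits` is hypothetical, and the check is a planning aid rather than anything Kubernetes enforces.

```python
from typing import List

# Documented upstream Kubernetes large-cluster limits.
MAX_NODES = 5_000
MAX_PODS_PER_NODE = 110
MAX_TOTAL_PODS = 150_000


def exceeded_limits(nodes: int, pods_per_node: int) -> List[str]:
    """Return the names of the documented limits that a planned cluster exceeds."""
    exceeded = []
    if nodes > MAX_NODES:
        exceeded.append("nodes")
    if pods_per_node > MAX_PODS_PER_NODE:
        exceeded.append("pods_per_node")
    if nodes * pods_per_node > MAX_TOTAL_PODS:
        exceeded.append("total_pods")
    return exceeded


print(exceeded_limits(4_000, 30))   # []
print(exceeded_limits(10_000, 30))  # ['nodes', 'total_pods']
```

A 10,000-node cluster at a modest 30 pods per node already doubles the total pod limit, which is why node count alone understates the scaling problem.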

openFuyao claims high-performance scheduling for 10,000+ node clusters with task-based batch pod creation. This is relevant for:

Telecom operators running edge and core infrastructure on a single platform
Large-scale AI training farms with thousands of GPU/NPU nodes
Cloud providers managing multi-tenant infrastructure

If you are hitting Kubernetes scaling limits, openFuyao’s scheduler optimizations are worth studying — even if you do not adopt the full platform.

Colocation scheduling saves real money

Running separate clusters for online services (latency-sensitive) and offline jobs (batch processing) wastes resources. Online clusters are over-provisioned for peak load and sit at 20-30% utilization most of the time. Offline clusters are sized for batch windows and sit largely idle between them.

openFuyao’s colocation scheduler mixes both workload types on the same nodes with quality-of-service guarantees. The project reports 30% better CPU utilization and 10% better memory utilization, without degrading online service SLOs.
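The core of colocation can be sketched in a few lines: measure what the online services actually use, hold back a safety buffer, and let batch jobs borrow the rest. The code below is a simplified illustration, not openFuyao's algorithm. The function names, the 10% default margin, and the greedy admission order are assumptions made for this example.

```python
from typing import List, Tuple


def reclaimable_cpus(capacity: float, online_usage: float, safety_margin: float = 0.10) -> float:
    """CPUs that batch pods may borrow: capacity minus observed online usage minus a buffer."""
    buffer = capacity * safety_margin
    return max(0.0, capacity - online_usage - buffer)


def admit_batch(capacity: float, online_usage: float,
                batch_requests: List[Tuple[str, float]],
                safety_margin: float = 0.10) -> List[str]:
    """Greedily admit batch jobs, in the given order, while they fit in the reclaimable budget."""
    budget = reclaimable_cpus(capacity, online_usage, safety_margin)
    admitted = []
    for name, cpus in batch_requests:
        if cpus <= budget:
            admitted.append(name)
            budget -= cpus
    return admitted


# A 64-CPU node whose online services use 16 CPUs (25% utilization).
jobs = [("etl", 24.0), ("train", 16.0), ("report", 8.0)]
print(admit_batch(64.0, 16.0, jobs))  # ['etl', 'train']
```

A production colocation scheduler also has to evict or throttle batch pods when online load spikes. This sketch covers only the admission side.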

This is the same pattern that Google’s Borg pioneered and that the Kubernetes community has been working toward with priority classes and preemption. openFuyao packages it as a production-ready feature.

AI inference acceleration

The AI inference suite includes:

KVCache optimization — caching key-value pairs across inference requests to reduce redundant computation
Intelligent routing — directing requests to the most appropriate backend based on model, cache state, and load
Cache hit strategy optimization — improving TTFT (time-to-first-token) by maximizing cache reuse

These are the exact optimizations that production inference platforms need but that most teams build from scratch. Having them as part of the cluster platform reduces the PoC-to-production gap.
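As a rough illustration of cache-aware routing, the sketch below sends each request to the backend whose cached token prefixes overlap most with the incoming prompt, and falls back to the least-loaded backend on ties. It is a simplified model, not the routing logic of openFuyao's inference suite. The `Backend` type, the `route` function, and the strict "cache first, load second" ordering are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Backend:
    """An inference replica with its current load and cached prompt prefixes (hypothetical model)."""
    name: str
    active_requests: int
    cached_prefixes: List[Tuple[int, ...]] = field(default_factory=list)


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading tokens that two token sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route(prompt_tokens: Sequence[int], backends: List[Backend]) -> str:
    """Pick the backend with the longest cached prefix match; break ties by load, then name."""
    def score(b: Backend):
        best = max((shared_prefix_len(prompt_tokens, p) for p in b.cached_prefixes), default=0)
        return (-best, b.active_requests, b.name)
    return min(backends, key=score).name


backends = [
    Backend("a", active_requests=5, cached_prefixes=[(1, 2, 3, 4)]),
    Backend("b", active_requests=1, cached_prefixes=[(1, 2)]),
    Backend("c", active_requests=0),
]

print(route((1, 2, 3, 9), backends))  # a
print(route((7, 8), backends))        # c
```

Preferring cache hits unconditionally can overload a popular replica, so a real router would weigh the expected TTFT saving against queue depth rather than sort on one and then the other.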

How openFuyao compares

Capability	openFuyao	Vanilla K8s	OpenShift	Rancher
NUMA-aware scheduling	Built-in	Manual (topology manager)	Limited	No
10,000+ node clusters	Optimized	5,000 nodes (tested limit)	2,000 recommended	Varies
Colocation scheduling	Built-in	DIY with priorities	No	No
NPU operator	Built-in	N/A	N/A	N/A
AI inference acceleration	Built-in suite	N/A	RHOAI separate	N/A
Multi-arch (ARM, RISC-V)	First-class	Supported	Supported	Supported

The strongest differentiators are the scheduling optimizations and the NPU/heterogeneous hardware support. These are capabilities that other platforms either do not offer or require significant custom engineering to achieve.

The ecosystem connection

openFuyao supports several Linux distributions, but its recommended base is openEuler, the Huawei-originated, OpenAtom Foundation-governed enterprise Linux distribution. Together they form a full-stack open source infrastructure:

openEuler provides the OS layer with multi-architecture support and multi-kernel capabilities
openFuyao provides the cluster orchestration layer with scheduling intelligence and hardware enablement

This is analogous to the Red Hat stack (RHEL + OpenShift) or the SUSE stack (SLE + Rancher) — but optimized for heterogeneous, large-scale computing with AI as a first-class workload.

Adoption is concentrated in China, with China Unicom Cloud as a notable production user for colocation workloads.

What to watch

openFuyao has little international visibility so far, but the technical capabilities are substantial. Areas worth tracking:

NUMA-aware scheduling becoming standard in Kubernetes (KEPs are in progress, but openFuyao ships it today)
NPU operator patterns that may influence how the broader Kubernetes community handles non-GPU accelerators
Colocation scheduling as a reference implementation for mixed-workload clusters
AI inference optimizations (KVCache, routing) that are increasingly relevant as inference costs dominate AI budgets

Whether or not you deploy openFuyao, the architectural patterns it implements — NUMA-aware placement, heterogeneous accelerator management, colocation scheduling — are the direction Kubernetes is heading for AI-scale infrastructure.

Getting started

# Visit the official site for downloads and documentation
# https://www.openfuyao.cn/en/

# The platform supports multiple OS bases
# openEuler (recommended), SUSE, Ubuntu

# Key components to explore:
# - Container platform release (optimized Kubernetes)
# - NUMA-aware scheduler
# - NPU Operator / KAE Operator
# - AI inference suite

The documentation is available in Chinese and English. For teams evaluating the platform, start with the container platform release and the scheduling components — those deliver the most immediate value.