Basic concepts

Cluster

A cluster runs AI workloads. Each cluster corresponds to one Ray cluster and consists of one or more nodes; workloads run as containers on those nodes.

Model cache

The model cache stores model files from the model registry inside a cluster, so the registry is accessed less often. When multiple inference service instances deployed in the same cluster use the same model, the model cache avoids redundant downloads and improves resource efficiency.
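
The download-once behavior described above can be illustrated with a minimal sketch. This is not the product's implementation: the `ModelCache` class, the `fake_registry_fetch` function, the `model.bin` file name, and the directory layout are all hypothetical names chosen for illustration. The sketch only assumes that a model is identified by a name and a version and that a cached copy lives on storage shared within the cluster.

```python
import tempfile
from pathlib import Path
from typing import Callable


class ModelCache:
    """Hypothetical cluster-local cache keyed by model name and version."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str, str], bytes]) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch            # stands in for a model registry download
        self.registry_downloads = 0   # how many times the registry was contacted

    def get(self, name: str, version: str) -> Path:
        """Return the local path of a model file, downloading it only on a miss."""
        path = self.cache_dir / name / version / "model.bin"
        if not path.exists():
            # Cache miss: fetch once and keep the file for later callers.
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(self.fetch(name, version))
            self.registry_downloads += 1
        return path


def fake_registry_fetch(name: str, version: str) -> bytes:
    """Placeholder for a real registry client (assumption, not a real API)."""
    return f"{name}:{version}".encode()


def demo() -> int:
    """Two instances request the same model; return the number of downloads."""
    with tempfile.TemporaryDirectory() as tmp:
        cache = ModelCache(Path(tmp), fake_registry_fetch)
        first = cache.get("example-model", "v1")   # miss: downloads the file
        second = cache.get("example-model", "v1")  # hit: reuses the cached file
        assert first == second
        return cache.registry_downloads
```

Calling `demo()` simulates two inference service instances asking for the same model. Only the first request reaches the placeholder registry; the second is served from the cached file.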

Container registry

The container registry provides the container images required for cluster deployment.