Basic concepts
Cluster
A cluster is used to run AI workloads, and each cluster corresponds to a Ray cluster. Each cluster consists of one or more nodes, and workloads run as containers on the nodes.
Model cache
A model cache stores model files from the model registry within a cluster, which reduces the number of requests made to the registry. When multiple inference service instances deployed in the same cluster use the same model, the model cache avoids redundant downloads and improves resource efficiency.
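The caching behavior described above can be illustrated with a minimal sketch. This is not the platform's implementation: the `ModelCache` class, the `fake_registry_fetch` function, and the model name `example-model` are all hypothetical, and a local temporary directory stands in for cluster-level storage. It only shows the idea that the first request downloads from the registry and later requests for the same model are served from the cache.

```python
import tempfile
from pathlib import Path
from typing import Callable


class ModelCache:
    """Hypothetical cache: stores one file per model in a local directory."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str], bytes]) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch          # stand-in for a model registry download
        self.registry_calls = 0     # counts how often the registry is contacted
        cache_dir.mkdir(parents=True, exist_ok=True)

    def get(self, model_name: str) -> Path:
        """Return the cached file path, downloading only on a cache miss."""
        path = self.cache_dir / model_name
        if not path.exists():
            self.registry_calls += 1
            path.write_bytes(self.fetch(model_name))
        return path


def fake_registry_fetch(model_name: str) -> bytes:
    # Assumption: a real registry would return model weights here.
    return f"weights for {model_name}".encode()


cache = ModelCache(Path(tempfile.mkdtemp()), fake_registry_fetch)

# Two inference instances in the same cluster request the same model.
first = cache.get("example-model")
second = cache.get("example-model")
```

In this sketch both calls return the same path, and the registry stand-in is contacted only once, which mirrors how a shared cache spares repeated downloads when several instances use one model.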
Container registry
The container registry provides the container images required for cluster deployment.