
Basic Concepts

Clusters run AI workloads, and each cluster corresponds to one Ray cluster. A cluster consists of one or more nodes, which execute workloads as containers.

A model cache stores model files from model registries at the cluster level, reducing the number of requests made to those registries. When multiple inference endpoints deployed on the same cluster use the same model, the model cache avoids redundant downloads and uses cluster resources more efficiently.
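The download-once behavior described above can be illustrated with a minimal Python sketch. It is not the platform's actual implementation: the `ClusterModelCache` class, the `fake_registry_download` function, and the model name `example-model` are all hypothetical names introduced only for this example, and the cache is held in memory rather than on shared cluster storage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ClusterModelCache:
    """Hypothetical cluster-level cache keyed by model name."""

    # Callable that fetches model files from a registry (assumed interface).
    fetch_from_registry: Callable[[str], bytes]
    _store: Dict[str, bytes] = field(default_factory=dict)
    registry_requests: int = 0

    def get(self, model_name: str) -> bytes:
        # Contact the registry only on a cache miss.
        if model_name not in self._store:
            self.registry_requests += 1
            self._store[model_name] = self.fetch_from_registry(model_name)
        return self._store[model_name]


def fake_registry_download(model_name: str) -> bytes:
    # Stand-in for a real registry download; returns placeholder bytes.
    return f"weights-for-{model_name}".encode()


cache = ClusterModelCache(fetch_from_registry=fake_registry_download)

# Two inference endpoints on the same cluster request the same model.
endpoint_a_model = cache.get("example-model")
endpoint_b_model = cache.get("example-model")
```

In this sketch the second call is served from the cache, so the registry is contacted only once even though two endpoints load the model. A real cluster-level cache would typically persist files to storage shared by the cluster's nodes instead of a Python dictionary.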

Container image registries provide the container images required for cluster deployment.