Basic Concepts
Cluster
Clusters are used to run AI workloads, and each cluster corresponds to a Ray cluster. A cluster consists of one or more nodes that execute workloads as containers.
Model Cache
A model cache stores model files from model registries at the cluster level, which reduces the number of requests made to those registries. When multiple inference endpoints deployed on the same cluster use the same model, the model cache avoids redundant downloads, so the model files are fetched once and shared.
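The caching behavior described above can be illustrated with a minimal sketch. It is not the platform's implementation: the `ModelCache` class, the `fake_download` function, and the model name `example-model` are all hypothetical, and the in-memory dictionary stands in for whatever storage a real cluster-level cache would use.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ModelCache:
    """Hypothetical cluster-level cache keyed by model name (illustration only)."""

    download: Callable[[str], bytes]  # assumed callback that fetches from a registry
    _files: Dict[str, bytes] = field(default_factory=dict)
    registry_hits: int = 0  # counts how often the registry was actually contacted

    def get(self, model: str) -> bytes:
        # Contact the registry only when the model is not cached yet.
        if model not in self._files:
            self._files[model] = self.download(model)
            self.registry_hits += 1
        return self._files[model]


def fake_download(model: str) -> bytes:
    # Stand-in for a real registry download; returns placeholder bytes.
    return f"weights-for-{model}".encode()


cache = ModelCache(download=fake_download)

# Two inference endpoints on the same cluster request the same model.
endpoint_a = cache.get("example-model")
endpoint_b = cache.get("example-model")

print(cache.registry_hits)
```

In this sketch the second request is served from the cache, so the registry is contacted only once even though two endpoints use the model.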
Container Image Registry
Container image registries provide the container images required for cluster deployment.