- Supports configuring external endpoints.
- Supports proxying external APIs (such as OpenAI) through the AI gateway with unified authentication and usage statistics.
- Compatible with the Anthropic protocol.
- Adds the following features to endpoints:
  - Adds a replica selector to the monitoring page for viewing metrics by replica.
  - Adds auto-refresh for endpoint logs.
  - Compatible with the Anthropic protocol.
- Static node clusters support running different engine versions simultaneously.
- Supports online cluster version upgrades for both static node clusters and Kubernetes clusters without recreating clusters.
- Supports login with username in addition to email.
- Supports NFS model cache integrity verification to ensure model data consistency.
- Supports search and batch deletion functions on the resource list page.
- Supports importing engine version metadata from a standalone `manifest.yaml` file without downloading the full engine image.
- Adds a quick-start wizard that guides new users through core features on first login.
- Adds the following parameter and commands to the CLI tool:
  - Adds the `--registry-project` parameter to specify the image registry project name.
  - Adds the `apply`, `get`, `wait`, `delete`, and `cleanup` commands for declarative resource management.
  - Adds the `engine remove-version` command to delete custom engine versions.
- Adds vLLM engine versions v0.11.2 and v0.17.1 as built-in options.
- Upgrades Ray to v2.53.0 for static node clusters.
- Cluster status determination based on spec hash comparison now accounts for the `Updating`, `Upgrading`, and `Deleting` states, so the reported cluster status is more accurate.
- Enhances Grafana dashboard theme styling with custom CSS injection.
- Displays token usage in compact notation (K/M/B).
- The endpoint list is sorted by running status by default.
- Supports dynamic browser tab icon updates when customizing the platform appearance.
- Supports automatic selection of dependent permissions when assigning permissions to roles.
- Restricts PostgREST anonymous role permissions and optimizes container security configurations to enhance overall system security.
- Reduces the Ray object store memory allocation from 30% to 10%.
- Switches GPU detection from `nvidia-smi` to `lspci` to avoid driver loading race conditions.
- When using the vLLM engine with multiple accelerators configured, the system automatically sets the engine variable `tensor_parallel_size` to the number of accelerators, eliminating the need for manual configuration.
- Optimizes the file descriptor limit (`ulimit nofile`) configuration for control plane containers to improve system stability.
- JSON values in `engine_args` were handled inconsistently between the Kubernetes and SSH/Ray paths. The issue has been resolved in this release.
- The race condition in Ray Serve concurrent deployment has been resolved in this release.
- Pod-level labels caused duplication of DCGM metrics. The issue has been resolved in this release.
- GGUF model files in subdirectories were not discovered. The issue has been resolved in this release.
- Downloaders incorrectly applied the file filter to non-GGUF models. The issue has been resolved in this release.
- Model name and version were not correctly updated during push. The issue has been resolved in this release.
- Unhealthy endpoints were not detected accurately. The issue has been resolved in this release.
- An exception occurred when the image registry URL included a scheme prefix. The issue has been resolved in this release.
- Accelerator data format validation was missing during endpoint import. The issue has been resolved in this release.
- Repeated image extraction occurred when uploading images with the CLI tool. The issue has been resolved in this release.
- PostgreSQL pods were recreated during minor version upgrades. The issue has been resolved in this release.
- The SSH cluster node recovery status was not written correctly. The issue has been resolved in this release.
- A nil map panic occurred when `DeploymentOptions` was unset. The issue has been resolved in this release.
- The form retained the previous template configuration after switching the model catalog template for an endpoint. The issue has been resolved in this release.
- Worker nodes could not be added when editing a static node cluster. The issue has been resolved in this release.
- The accelerator count was displayed as a negative number. The issue has been resolved in this release.
- The workspace filter criteria were unexpectedly changed when editing resources. The issue has been resolved in this release.
- Available resources were displayed as exceeding the total amount. The issue has been resolved in this release.
- Automatic recovery was not triggered when the Raylet process on the head node of a static node cluster exited while the dashboard remained accessible. The issue has been resolved in this release.
- Temporary files overwrote each other when the CLI tool `import` command was run concurrently. The issue has been resolved in this release.
- Endpoints failed to start because the engine could not recognize engine variables. The issue has been resolved in this release.
- After an endpoint was deleted in a Kubernetes cluster, a newly created endpoint with the same name was skipped. The issue has been resolved in this release.
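The compact token-usage notation (K/M/B) can be illustrated with a small formatting helper. This is a minimal sketch, not the product's implementation: the function name `format_compact`, the 1,000 / 1,000,000 / 1,000,000,000 thresholds, and the one-decimal rounding are all assumptions.

```python
def format_compact(count: int) -> str:
    """Format a token count with K/M/B suffixes (hypothetical helper)."""
    # Check the largest unit first so 2,000,000 becomes "2M", not "2000K".
    for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if abs(count) >= threshold:
            # Keep one decimal place, then drop a trailing ".0" (e.g. "2.0" -> "2").
            value = f"{count / threshold:.1f}".rstrip("0").rstrip(".")
            return f"{value}{suffix}"
    return str(count)


print(format_compact(1_500))          # 1.5K
print(format_compact(2_000_000))      # 2M
print(format_compact(1_234_567_890))  # 1.2B
```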
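Automatically selecting dependent permissions when assigning permissions to a role amounts to computing the transitive closure of each selected permission's dependencies. The sketch below does this with a breadth-first walk. The permission names and the `PERMISSION_DEPENDENCIES` map are hypothetical and do not reflect the platform's actual permission model.

```python
from collections import deque

# Hypothetical dependency map: each permission lists the permissions it requires.
PERMISSION_DEPENDENCIES: dict[str, list[str]] = {
    "endpoint:update": ["endpoint:read"],
    "endpoint:delete": ["endpoint:read"],
    "endpoint:read": ["cluster:read"],
    "cluster:read": [],
}


def with_dependencies(selected: set[str], deps: dict[str, list[str]]) -> set[str]:
    """Return the selected permissions plus everything they transitively depend on."""
    result = set(selected)
    queue = deque(selected)
    while queue:
        perm = queue.popleft()
        for dep in deps.get(perm, []):
            # The membership check also prevents infinite loops on cyclic dependencies.
            if dep not in result:
                result.add(dep)
                queue.append(dep)
    return result


print(sorted(with_dependencies({"endpoint:delete"}, PERMISSION_DEPENDENCIES)))
```

Using a set for `result` keeps the walk linear in the number of dependency edges and makes the helper safe if the map ever contains a cycle.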
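`lspci` enumerates PCI devices without relying on the NVIDIA driver, so it can report GPUs before the driver has finished loading. The following sketch counts NVIDIA GPUs in `lspci -nn` output by matching the display-controller class code (`03xx`) and the NVIDIA PCI vendor ID `10de`. The helper name and the sample output are illustrative assumptions, not the platform's detection code.

```python
import re

NVIDIA_VENDOR_ID = "10de"

# Matches a display-controller class code ([03xx]) and the last [vendor:device] pair on the line.
LSPCI_LINE = re.compile(r"\[(03[0-9a-f]{2})\]:.*\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)


def count_nvidia_gpus(lspci_output: str) -> int:
    """Count NVIDIA display/3D controllers in `lspci -nn` output (hypothetical helper)."""
    count = 0
    for line in lspci_output.splitlines():
        match = LSPCI_LINE.search(line)
        if match and match.group(2).lower() == NVIDIA_VENDOR_ID:
            count += 1
    return count


# Illustrative sample output; real device names and IDs vary by hardware.
SAMPLE = """\
00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:3e92]
3b:00.0 3D controller [0302]: NVIDIA Corporation GA100 [A100 PCIe 40GB] [10de:20f1] (rev a1)
af:00.0 3D controller [0302]: NVIDIA Corporation GA100 [A100 PCIe 40GB] [10de:20f1] (rev a1)
"""

print(count_nvidia_gpus(SAMPLE))  # 2
```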
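The automatic `tensor_parallel_size` setting for the vLLM engine can be sketched as a defaulting step applied before the engine starts. The helper names are hypothetical, and the sketch assumes (the release note does not say so) that an explicitly configured value takes precedence over the accelerator count. The key corresponds to vLLM's `--tensor-parallel-size` command-line option.

```python
def apply_tensor_parallel_default(engine_args: dict[str, object], accelerator_count: int) -> dict[str, object]:
    """Default tensor_parallel_size to the accelerator count (hypothetical helper)."""
    args = dict(engine_args)  # copy so the caller's dict is not mutated
    # Assumption: only fill the value when several accelerators are configured
    # and the user has not set it explicitly.
    if accelerator_count > 1 and "tensor_parallel_size" not in args:
        args["tensor_parallel_size"] = accelerator_count
    return args


def to_cli_flags(args: dict[str, object]) -> list[str]:
    """Render engine variables as vLLM-style flags, e.g. tensor_parallel_size -> --tensor-parallel-size."""
    flags: list[str] = []
    for key, value in args.items():
        flags += [f"--{key.replace('_', '-')}", str(value)]
    return flags


print(to_cli_flags(apply_tensor_parallel_default({}, 4)))  # ['--tensor-parallel-size', '4']
```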
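Discovering GGUF model files in subdirectories requires a recursive search rather than a top-level listing. A minimal sketch with `pathlib`, assuming model files use the lowercase `.gguf` extension; the function name `find_gguf_files` is hypothetical.

```python
from pathlib import Path


def find_gguf_files(model_dir: Path) -> list[Path]:
    """Return all .gguf files under model_dir, including nested subdirectories."""
    root = Path(model_dir)
    # rglob recurses into subdirectories; glob("*.gguf") would only check the top level.
    return sorted(p for p in root.rglob("*.gguf") if p.is_file())
```

Sorting the result keeps the output deterministic, which matters if the first match is treated as the primary model file.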
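Container image references such as `registry.example.com/project/image:tag` do not include a URL scheme, so a registry URL entered as `https://...` must be normalized before it is used in one. The sketch below strips an optional `http://` or `https://` prefix and a trailing slash, then builds a reference under a registry project. Both helper names and the example registry are hypothetical.

```python
def normalize_registry(url: str) -> str:
    """Strip an optional http(s):// scheme and trailing slashes from a registry URL."""
    for prefix in ("https://", "http://"):
        if url.lower().startswith(prefix):
            url = url[len(prefix):]
            break
    return url.rstrip("/")


def image_reference(registry_url: str, project: str, image: str) -> str:
    """Build an image reference like registry.example.com/<project>/<image> (hypothetical helper)."""
    return f"{normalize_registry(registry_url)}/{project}/{image}"


print(image_reference("https://registry.example.com/", "ai", "vllm:v0.11.2"))
# registry.example.com/ai/vllm:v0.11.2
```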