What's in this release

What’s new

Supports configuring external endpoints:
- Supports proxying external APIs (such as OpenAI) through the AI gateway with unified authentication and usage statistics.
- Compatible with the Anthropic protocol.
Adds the following features to endpoints:
- Adds a replica selector to the monitoring page for viewing metrics by replica.
- Adds auto-refresh for endpoint logs.
- Compatible with the Anthropic protocol.
Static node clusters support running different engine versions simultaneously.
Supports online cluster version upgrades for both static node clusters and Kubernetes clusters, without recreating the clusters.
Supports login with username in addition to email.
Supports NFS model cache integrity verification to ensure model data consistency.
Supports search and batch deletion on the resource list page.
Supports importing engine version metadata from a standalone manifest.yaml file without downloading the full engine image.
Adds a quick-start wizard that guides new users through core features on first login.
Adds the following parameter and commands to the CLI tool:
- Adds the --registry-project parameter to specify the image registry project name.
- Adds apply, get, wait, delete, and cleanup commands for declarative resource management.
- Adds the engine remove-version command to delete custom engine versions.

Improvements

Adds vLLM engine versions v0.11.2 and v0.17.1 as built-in options.
Upgrades Ray to v2.53.0 for static node clusters.
Cluster status determination now uses spec hash comparison and includes the Updating, Upgrading, and Deleting states, so the reported cluster status is more accurate.
Enhances Grafana dashboard theme styling with custom CSS injection.
Displays token usage in compact notation (K/M/B).
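As an illustration of compact notation, the sketch below formats a token count with K/M/B suffixes. The function name `compact` and the one-decimal rounding rule are assumptions made for this example; the formatting used by the product may differ.

```python
def compact(n: int) -> str:
    """Format a non-negative count with a K, M, or B suffix.

    Hypothetical helper: keeps at most one decimal place and drops a
    trailing ".0", which is an assumed rounding rule.
    """
    for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if n >= threshold:
            text = f"{n / threshold:.1f}"
            # Trim a trailing ".0" so 2.0M is shown as 2M.
            if text.endswith(".0"):
                text = text[:-2]
            return f"{text}{suffix}"
    # Values below 1,000 are shown unchanged.
    return str(n)


if __name__ == "__main__":
    for count in (999, 1_500, 2_000_000, 3_200_000_000):
        print(count, "->", compact(count))
```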
The endpoint list is sorted by running status by default.
Supports dynamic browser tab icon updates when customizing the platform appearance.
Supports automatic selection of dependent permissions when assigning permissions to roles.
Restricts PostgREST anonymous role permissions and optimizes container security configurations to enhance overall system security.
Reduces Ray Object Store memory from 30% to 10%.
Switches GPU detection from nvidia-smi to lspci to avoid driver loading race conditions.
When using the vLLM engine with multiple accelerators configured, the system automatically sets the engine variable tensor_parallel_size to the number of accelerators, eliminating the need for manual configuration.
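The defaulting rule described above can be sketched as follows. The function `resolve_engine_args` and its parameters are hypothetical, and the sketch assumes that a value set explicitly by the user is preserved; the release notes do not state how an explicit value is treated.

```python
def resolve_engine_args(engine: str, accelerator_count: int, engine_args: dict) -> dict:
    """Return a copy of engine_args with tensor_parallel_size defaulted.

    Hypothetical helper for illustration only. Assumption: an explicit
    tensor_parallel_size supplied by the user is left untouched.
    """
    resolved = dict(engine_args)  # do not mutate the caller's dict
    if engine == "vllm" and accelerator_count > 1:
        # Default to one tensor-parallel rank per configured accelerator.
        resolved.setdefault("tensor_parallel_size", accelerator_count)
    return resolved


if __name__ == "__main__":
    print(resolve_engine_args("vllm", 4, {"max_model_len": 8192}))
```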
Optimizes the file descriptor limit (ulimit nofile) configuration for control plane containers to improve system stability.

Resolved issues

JSON values in engine_args were handled inconsistently between the Kubernetes and SSH/Ray paths. The issue has been resolved in this release.
A race condition occurred during concurrent Ray Serve deployments. The issue has been resolved in this release.
Pod-level labels caused duplication of DCGM metrics. The issue has been resolved in this release.
GGUF model files in subdirectories were not discovered. The issue has been resolved in this release.
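For context, discovering GGUF files in nested directories requires a recursive search rather than a top-level listing. The sketch below shows one way to do this; the function name `find_gguf_files` is hypothetical and is not taken from the product.

```python
from pathlib import Path


def find_gguf_files(model_dir) -> list:
    """Return all .gguf files under model_dir, including subdirectories.

    Hypothetical helper for illustration. Path.rglob walks the tree
    recursively, whereas Path.glob("*.gguf") only sees the top level.
    """
    return sorted(p for p in Path(model_dir).rglob("*.gguf") if p.is_file())
```

A non-recursive listing of the same directory would miss a file such as `quantized/model.gguf`, which matches the symptom described in this item.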
The file filter was applied to non-GGUF models in downloaders. The issue has been resolved in this release.
Model name and version were not correctly updated during push. The issue has been resolved in this release.
Detection of unhealthy endpoints was inaccurate. The issue has been resolved in this release.
An exception occurred when the image registry URL included a scheme prefix. The issue has been resolved in this release.
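A registry address may be entered with or without a scheme prefix such as `https://`. The sketch below shows one way to normalize both forms to a host and optional port; the function name `registry_host` and the example hostnames are hypothetical, and this is not necessarily how the fix was implemented.

```python
from urllib.parse import urlsplit


def registry_host(url: str) -> str:
    """Return host[:port] from a registry address with or without a scheme.

    Hypothetical helper for illustration. urlsplit only fills in netloc
    when the input has a scheme or starts with "//", so a bare host is
    prefixed with "//" before parsing.
    """
    candidate = url if "://" in url else f"//{url}"
    return urlsplit(candidate).netloc


if __name__ == "__main__":
    print(registry_host("https://registry.example.com/project"))
    print(registry_host("registry.example.com:5000/project"))
```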
Accelerator data format validation was missing during endpoint import. The issue has been resolved in this release.
Repeated image extraction occurred when uploading images with the CLI tool. The issue has been resolved in this release.
PostgreSQL pods were recreated during minor version upgrades. The issue has been resolved in this release.
The SSH cluster node recovery status was not written correctly. The issue has been resolved in this release.
A nil map panic occurred when DeploymentOptions was unset. The issue has been resolved in this release.
The form retained the previous template configuration after switching the model catalog template for an endpoint. The issue has been resolved in this release.
Worker nodes could not be added when editing a static node cluster. The issue has been resolved in this release.
The accelerator count was displayed as a negative number. The issue has been resolved in this release.
The workspace filter criteria were unexpectedly changed when editing resources. The issue has been resolved in this release.
Available resources were displayed as exceeding the total amount. The issue has been resolved in this release.
Automatic recovery was not triggered when the Raylet process on the head node of a static node cluster exited while the dashboard remained accessible. The issue has been resolved in this release.
Temporary files overwrote each other when the CLI tool import command was run concurrently. The issue has been resolved in this release.
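Collisions of this kind are typically avoided by giving each invocation its own working directory. The sketch below uses Python's `tempfile.mkdtemp`, which creates a uniquely named directory on every call; the function name `make_workdir` and the `import-` prefix are hypothetical, and the CLI tool may use a different mechanism.

```python
import tempfile
from pathlib import Path


def make_workdir(prefix: str = "import-") -> Path:
    """Create and return a unique temporary directory for one run.

    Hypothetical helper for illustration. mkdtemp picks a fresh name on
    each call, so concurrent runs do not share a path. The caller is
    responsible for removing the directory afterwards.
    """
    return Path(tempfile.mkdtemp(prefix=prefix))
```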
Endpoints failed to start because the engine could not recognize engine variables. The issue has been resolved in this release.
After deleting an endpoint in a Kubernetes cluster, a re-created endpoint with the same name was skipped. The issue has been resolved in this release.