Managing Inference Engines
Neutree provides built-in inference engines. Users cannot create custom inference engines from scratch or delete existing engines. However, for Kubernetes clusters, you can add new versions of the built-in inference engines.
View Inference Engines
Log in to the Neutree management interface and click Inference Engines in the left sidebar. The list on the right shows all inference engines built into the platform. Click an engine name to view its details, including supported task types and parameters.
The following inference engines are supported by default:
| Name | Version | Description |
|---|---|---|
| vllm | v0.8.5 | vLLM community v0.8.5 release. Static node clusters use this version by default. |
| vllm | v0.11.2 | vLLM community v0.11.2 release. Kubernetes clusters use this version by default. |
| llama-cpp | v0.3.7 | Llama-cpp python high-level implementation (Llama-cpp commit: 794fe23f29fb40104975c91fe19f23798f7c726e). |
Add Inference Engine Version
Only Kubernetes-type clusters support adding new versions of the existing inference engines.
Steps

1. Create an API key and save it securely.

2. Download the Neutree CLI from GitHub Releases according to your server's CPU architecture:

   ```sh
   # For amd64
   curl -LO https://github.com/neutree-ai/neutree/releases/download/v1.0.0/neutree-cli-amd64

   # For aarch64
   curl -LO https://github.com/neutree-ai/neutree/releases/download/v1.0.0/neutree-cli-aarch64
   ```

3. Rename the CLI and grant it executable permissions:

   ```sh
   mv neutree-cli-<arch> neutree-cli
   chmod +x neutree-cli
   ```

   Replace `<arch>` with your server's CPU architecture: `amd64` or `aarch64`.

4. Download the desired inference engine version package from GitHub Releases.

5. Import the engine version package using the CLI tool.

   To import without pushing the engine images to a registry (`--skip-image-push`):

   ```sh
   ./neutree-cli import engine --skip-image-push \
     --package <engine_version_package> \
     --api-key <api_key> \
     --server-url <server_url>
   ```

   | Parameter | Description |
   |---|---|
   | `<engine_version_package>` | The inference engine version package name, e.g., `vllm-v0.8.5.tar.gz`. |
   | `<api_key>` | The API key created in step 1. |
   | `<server_url>` | The control plane access URL, e.g., `http://localhost:3000`. |

   To push the engine images to a mirror registry during import:

   ```sh
   ./neutree-cli import engine \
     --package <engine_version_package> \
     --mirror-registry <mirror_registry> \
     --registry-username <registry_username> \
     --registry-password <registry_password> \
     --api-key <api_key> \
     --server-url <server_url>
   ```

   | Parameter | Description |
   |---|---|
   | `<engine_version_package>` | The inference engine version package name, e.g., `vllm-v0.8.5.tar.gz`. |
   | `<mirror_registry>` | The image registry address. |
   | `<registry_username>` | The image registry username; the user must have permission to upload images. |
   | `<registry_password>` | The login password or access token of the image registry user. |
   | `<api_key>` | The API key created in step 1. |
   | `<server_url>` | The control plane access URL, e.g., `http://localhost:3000`. |

6. After the import completes, log in to the Neutree management interface, click Inference Engines in the left sidebar, and confirm that the new engine version appears in the inference engine list.
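The CLI download-and-install steps above can be sketched as one script that detects the CPU architecture instead of hard-coding it. This is a minimal sketch, not part of the official tooling: the `arch_suffix` helper is hypothetical, and the `v1.0.0` release tag is taken from the example URLs above; adjust it to match your Neutree release.

```shell
#!/bin/sh
# Sketch: pick the right neutree-cli release asset for this machine.

# Translate `uname -m` output into the suffix used by the release
# asset names (neutree-cli-amd64 / neutree-cli-aarch64).
arch_suffix() {
  case "$1" in
    x86_64)          echo amd64 ;;
    aarch64 | arm64) echo aarch64 ;;
    *)               return 1 ;;
  esac
}

CLI_VERSION="v1.0.0"   # assumption: match this to your deployment

if ARCH="$(arch_suffix "$(uname -m)")"; then
  URL="https://github.com/neutree-ai/neutree/releases/download/${CLI_VERSION}/neutree-cli-${ARCH}"
  echo "Release asset for this machine: ${URL}"
  # Uncomment to actually fetch and install:
  # curl -LO "${URL}"
  # mv "neutree-cli-${ARCH}" neutree-cli && chmod +x neutree-cli
else
  echo "unsupported architecture: $(uname -m)" >&2
fi
```

The download itself is left commented out so the script can be reviewed safely first; once the printed URL looks right, uncomment the `curl` and `mv` lines and continue with the import steps above.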