
Managing Inference Engines

Neutree provides built-in inference engines. Users cannot create custom inference engines from scratch or delete existing engines. However, for Kubernetes clusters, you can add new versions of the built-in inference engines.

Log in to the Neutree management interface, click Inference Engines in the left sidebar, and the inference engine list on the right will display all inference engines built into the platform. Click on an inference engine name to view details, including supported task types and parameters.

Neutree currently ships the following inference engines by default:

| Name | Version | Description |
| --- | --- | --- |
| vllm | v0.8.5 | vLLM community v0.8.5 release. Static node clusters use this version by default. |
| vllm | v0.11.2 | vLLM community v0.11.2 release. Kubernetes clusters use this version by default. |
| llama-cpp | v0.3.7 | llama-cpp-python high-level implementation (llama.cpp commit: 794fe23f29fb40104975c91fe19f23798f7c726e). |

Only Kubernetes-type clusters support adding new versions of existing inference engines.

Steps

  1. Create an API key and save it securely.

  2. Download the Neutree CLI from GitHub Releases according to your server’s CPU architecture:

    # For amd64
    curl -LO https://github.com/neutree-ai/neutree/releases/download/v1.0.0/neutree-cli-amd64
    # For aarch64
    curl -LO https://github.com/neutree-ai/neutree/releases/download/v1.0.0/neutree-cli-aarch64
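    If you are unsure which architecture your server uses, `uname -m` reports it. The mapping below from the kernel's architecture string to the release artifact suffix is a sketch (the `arch_suffix` helper is illustrative, not part of the Neutree tooling):

```shell
# Map the kernel's architecture string to the release artifact suffix.
# x86_64 -> amd64, aarch64/arm64 -> aarch64 (suffixes assumed from the
# release file names above).
arch_suffix() {
  case "$1" in
    x86_64)        echo amd64 ;;
    aarch64|arm64) echo aarch64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

arch_suffix "$(uname -m)"
```

    You can then plug the result into the download URL, e.g. `curl -LO https://github.com/neutree-ai/neutree/releases/download/v1.0.0/neutree-cli-$(arch_suffix "$(uname -m)")`.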
  3. Rename and grant executable permissions to the CLI:

    mv neutree-cli-<arch> neutree-cli
    chmod +x neutree-cli

    Replace <arch> with your server’s CPU architecture: amd64 or aarch64.
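    The rename-and-chmod step can be wrapped in a small helper so it works for either architecture; `install_cli` below is an illustrative name, not part of the Neutree tooling:

```shell
# Illustrative helper: rename the downloaded binary and mark it executable.
# Pass the downloaded file name, e.g. neutree-cli-amd64 or neutree-cli-aarch64.
install_cli() {
  mv "$1" neutree-cli
  chmod +x neutree-cli
}
```

    For example, `install_cli neutree-cli-amd64` leaves an executable `neutree-cli` in the current directory.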

  4. Download the specified inference engine version package from GitHub Releases.

  5. Import the engine version package using the CLI tool:

    ./neutree-cli import engine --skip-image-push \
      --package <engine_version_package> \
      --api-key <api_key> \
      --server-url <server_url>
    | Parameter | Description |
    | --- | --- |
    | `<engine_version_package>` | The inference engine version package file name, e.g., `vllm-v0.8.5.tar.gz`. |
    | `<api_key>` | The API key created in step 1. |
    | `<server_url>` | The control plane access URL, e.g., `http://localhost:3000`. |
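    A filled-in invocation might look like the following. All values here are illustrative: the package name is taken from the example above, and `NEUTREE_API_KEY` is assumed to be an environment variable holding the key from step 1.

```shell
# Example invocation with illustrative values; adjust to your environment.
./neutree-cli import engine --skip-image-push \
  --package vllm-v0.8.5.tar.gz \
  --api-key "$NEUTREE_API_KEY" \
  --server-url http://localhost:3000
```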
  6. After the import is complete, log in to the Neutree management interface, click Inference Engines in the left sidebar, and confirm that the new engine version appears in the inference engine list.