
Managing Inference Engines

Neutree provides built-in inference engines. Users cannot create custom inference engines from scratch or delete existing engines. However, for Kubernetes clusters, you can add new versions of the built-in inference engines.

Log in to the Neutree management interface, click Inference Engines in the left sidebar, and the inference engine list on the right will display all inference engines built into the platform. Click on an inference engine name to view details, including supported task types and parameters.

Neutree currently ships the following inference engines by default:

| Name | Version | Description |
| --- | --- | --- |
| vllm | v0.8.5 | vLLM community v0.8.5 release. Static node clusters use this version by default. |
| vllm | v0.11.2 | vLLM community v0.11.2 release. Kubernetes clusters use this version by default. |
| llama-cpp | v0.3.7 | llama-cpp-python high-level implementation (llama.cpp commit: 794fe23f29fb40104975c91fe19f23798f7c726e). |

Only Kubernetes-type clusters support adding new versions of existing inference engines.

Steps

  1. Create an API key and save it securely.

  2. Download the Neutree CLI from GitHub Releases according to your server’s CPU architecture:

    # For amd64
    curl -LO https://github.com/neutree-ai/neutree/releases/download/v1.0.0/neutree-cli-amd64
    # For aarch64
    curl -LO https://github.com/neutree-ai/neutree/releases/download/v1.0.0/neutree-cli-aarch64
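    If you are unsure which architecture your server uses, `uname -m` reports it. The mapping below from the kernel's architecture string to the release artifact suffix is a sketch (the `arch_suffix` helper is illustrative, not part of the Neutree tooling):

```shell
# Map the kernel's architecture string to the release artifact suffix.
# x86_64 -> amd64, aarch64/arm64 -> aarch64 (suffixes assumed from the
# release file names above).
arch_suffix() {
  case "$1" in
    x86_64)        echo amd64 ;;
    aarch64|arm64) echo aarch64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

arch_suffix "$(uname -m)"
```

    You can then plug the result into the download URL, e.g. `curl -LO https://github.com/neutree-ai/neutree/releases/download/v1.0.0/neutree-cli-$(arch_suffix "$(uname -m)")`.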
  3. Rename and grant executable permissions to the CLI:

    mv neutree-cli-<arch> neutree-cli
    chmod +x neutree-cli

    Replace <arch> with your server’s CPU architecture: amd64 or aarch64.
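    The rename-and-chmod step can be wrapped in a small helper so it works for either architecture; `install_cli` below is an illustrative name, not part of the Neutree tooling:

```shell
# Illustrative helper: rename the downloaded binary and mark it executable.
# Pass the downloaded file name, e.g. neutree-cli-amd64 or neutree-cli-aarch64.
install_cli() {
  mv "$1" neutree-cli
  chmod +x neutree-cli
}
```

    For example, `install_cli neutree-cli-amd64` leaves an executable `neutree-cli` in the current directory.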

  4. Download the specified inference engine version package from GitHub Releases.

  5. Import the engine version package using the CLI tool:

    ./neutree-cli import engine --skip-image-push \
      --package <engine_version_package> \
      --api-key <api_key> \
      --server-url <server_url>
    | Parameter | Description |
    | --- | --- |
    | `<engine_version_package>` | The inference engine version package file name, e.g., `vllm-v0.8.5.tar.gz`. |
    | `<api_key>` | The API key created in step 1. |
    | `<server_url>` | The control plane access URL, e.g., `http://localhost:3000`. |
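    A filled-in invocation might look like the following. All values here are illustrative: the package name is taken from the example above, and `NEUTREE_API_KEY` is assumed to be an environment variable holding the key from step 1.

```shell
# Example invocation with illustrative values; adjust to your environment.
./neutree-cli import engine --skip-image-push \
  --package vllm-v0.8.5.tar.gz \
  --api-key "$NEUTREE_API_KEY" \
  --server-url http://localhost:3000
```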
  6. After the import is complete, log in to the Neutree management interface, click Inference Engines in the left sidebar, and confirm that the new engine version appears in the inference engine list.