Skip to content
Neutree Documentation

Concepts

Models are the core resources of AI workloads, and inference is the core AI workload type on Neutree.

A model registry stores models. Neutree supports public model registries of the Hugging Face type and private model registries based on a file system.

The model catalog compiles best-practice parameters for commonly used models, with preset inference engines, resource specifications, and runtime parameters. It aims to standardize model deployment and reduce operational costs.

An engine is a built-in code framework within Neutree for running models. It includes multiple configurable parameters and provides functions such as model loading, accelerator adaptation, inference optimization, and OpenAI-compatible APIs.

An endpoint is the specific deployment entity for a model inference service. Each endpoint corresponds to an independent inference service that provides OpenAI-compatible API interfaces to external clients. An endpoint contains a complete inference service runtime environment, including model selection, inference engine configuration, and inference parameter settings.