Basic Concepts

Models are the core resources for AI workloads, and inference is the primary AI workload type in Neutree.

A model registry stores models. Neutree supports public registries of the Hugging Face type as well as private registries backed by a file system.

The model catalog provides best-practice configurations for commonly used models, with preset inference engines, resource specifications, and runtime parameters. It standardizes model deployment and reduces operational cost.
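As a rough sketch of the idea, a catalog entry bundles the three preset sections described above. The field names and values below are hypothetical illustrations, not Neutree's actual schema:

```python
# Hypothetical model catalog entry: keys and values are illustrative
# assumptions, not Neutree's real schema.
catalog_entry = {
    "model": "llama-3-8b-instruct",    # assumed model name
    "engine": {
        "name": "vllm",                # assumed engine identifier
        "version": "latest",
    },
    "resources": {
        "accelerator": "gpu",
        "accelerator_count": 1,
        "memory": "24Gi",
    },
    "runtime": {
        "max_model_len": 8192,
        "tensor_parallel_size": 1,
    },
}

def validate_entry(entry: dict) -> bool:
    """Check that an entry carries the three preset sections a
    catalog is expected to standardize."""
    return all(k in entry for k in ("engine", "resources", "runtime"))

print(validate_entry(catalog_entry))  # -> True
```

The point of such an entry is reuse: the same vetted configuration is applied every time the model is deployed, instead of re-deriving engine and resource choices per deployment.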

An inference engine is a built-in framework in Neutree for running models. It exposes configurable parameters and provides features such as model loading, accelerator adaptation, inference optimization, and OpenAI-compatible APIs.

An endpoint is a deployed instance of a model inference service. Each endpoint corresponds to an independent inference service that exposes OpenAI-compatible API interfaces. An endpoint encapsulates the complete runtime environment for that service: model selection, inference engine configuration, and inference parameter settings.
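Because each endpoint serves an OpenAI-compatible API, any standard OpenAI client can talk to it. A minimal sketch using only the Python standard library, where the endpoint URL and model name are assumptions to substitute with your deployment's values:

```python
import json
import urllib.request

# Assumed values -- replace with your endpoint's URL and deployed model name.
ENDPOINT_URL = "http://localhost:8000/v1/chat/completions"
MODEL_NAME = "llama-3-8b-instruct"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat-completions request for the endpoint."""
    body = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Build (but do not send) a request; sending requires a running endpoint.
req = build_chat_request("Hello!")
payload = json.loads(req.data)
print(payload["messages"][0]["content"])  # -> Hello!
```

In practice you would send `req` with `urllib.request.urlopen(req)` (or use the `openai` client library pointed at the endpoint's base URL) against a running endpoint.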