## Create Your First Endpoint
You can import the example YAML file below into Neutree to quickly create an inference endpoint. After importing it, you can test the endpoint through chat conversations in the platform.

The example file configures an inference endpoint named quick-start-inference that uses the llama.cpp inference engine to run the small model Tinystories-gpt-0.1-3m-GGUF. Because this model is extremely small, the quality of the generated text is limited; the example is meant for walking through the basic workflow rather than producing useful output.
When using the example, replace the following parameters with actual values:
| Parameter | Description |
|---|---|
| `<control_plane_ip>` | IP address of the server where the control plane is deployed. |
| `<ssh_user>` | SSH username for the server where the control plane is deployed. Must be the root user or another user with root privileges. |
| `<ssh_private_key>` | Base64-encoded string of the SSH private key for the server where the control plane is deployed. It can be generated as shown in the sketch after this table. |
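One way to produce the Base64 string is with a short Python snippet. The key path below is an assumption; substitute the path of the private key that grants access to the control-plane server:

```python
import base64

# Assumed key location; adjust to your actual private key file.
KEY_PATH = "/root/.ssh/id_rsa"

with open(KEY_PATH, "rb") as f:
    # Encode the raw key bytes and print a single-line Base64 string
    # suitable for the <ssh_private_key> field.
    print(base64.b64encode(f.read()).decode("ascii"))
```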
```yaml
apiVersion: v1
kind: Cluster
metadata:
  name: quick-start-cluster
  workspace: default
spec:
  type: ssh
  config:
    ssh_config:
      provider:
        head_ip: <control_plane_ip>
      auth:
        ssh_user: <ssh_user>
        ssh_private_key: <ssh_private_key>
  image_registry: public-docker
  version: v1.0.0
---
apiVersion: v1
kind: Endpoint
metadata:
  name: quick-start-inference
  workspace: default
spec:
  cluster: quick-start-cluster
  model:
    registry: public-hugging-face
    name: afrideva/Tinystories-gpt-0.1-3m-GGUF
    file: "*8_0.gguf"
    version: main
  task: text-generation
  engine:
    engine: llama-cpp
    version: v0.3.7
  resources:
    cpu: '1'
    memory: '1'
  replicas:
    num: 1
  deployment_options:
    scheduler:
      type: consistent_hash
  variables:
    engine_args: {}
---
apiVersion: v1
kind: ImageRegistry
metadata:
  name: public-docker
  workspace: default
spec:
  url: https://docker.io
  repository:
  authconfig:
    username: ""
    password: ""
    auth: ""
    ca: ""
---
apiVersion: v1
kind: ModelRegistry
metadata:
  name: public-hugging-face
  workspace: default
spec:
  type: hugging-face
  url: https://huggingface.co
  credentials: ""
```
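Once the endpoint is ready, you can chat with it in the platform as described above. If you prefer to script a quick check, the sketch below assumes the endpoint exposes an OpenAI-compatible chat completions API; the base URL, port, API key, and model name used here are assumptions, so copy the actual values shown for quick-start-inference in the Neutree console:

```python
import requests

# Assumed placeholders: replace with the endpoint URL and key from the console.
BASE_URL = "http://<control_plane_ip>:8000"  # port is a placeholder, not documented
API_KEY = "<api_key>"

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        # Assumed to match the endpoint name; verify in the console.
        "model": "quick-start-inference",
        "messages": [{"role": "user", "content": "Tell me a tiny story."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Given the model's size, expect the reply to be simple or incoherent; a successful HTTP 200 response is the signal that the endpoint is serving correctly.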