Create Your First Endpoint

Import the example YAML file below into Neutree to quickly create an inference endpoint. After the import, you can test the endpoint by chatting with it in the platform.

The example file configures an inference endpoint named quick-start-inference that uses the llama.cpp inference engine to run the small model Tinystories-gpt-0.1-3m-GGUF. It also defines the cluster, image registry, and model registry that the endpoint refers to. Because the model is extremely small, do not expect high-quality generated text; the example is meant mainly to let you walk through the basic workflow.

When using the example, replace the following parameters with actual values:

Parameter            Description
<control_plane_ip>   IP address of the server where the control plane is deployed.
<ssh_user>           SSH username for the server where the control plane is deployed. Must be the root user or another user with root privileges.
<ssh_private_key>    Base64-encoded string of the SSH private key for the server where the control plane is deployed. To generate it, run: cat <ssh_private_key_path> | base64 -w 0
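The substitution described above can also be scripted. The following is a minimal sketch, not part of Neutree: the function name fill_placeholders and its arguments are hypothetical, and the sample IP address and key bytes are dummy values. It encodes the key with Python's standard base64 module, which yields the same single-line string as base64 -w 0, and then replaces the three placeholders by plain string substitution.

```python
import base64


def fill_placeholders(template: str, control_plane_ip: str, ssh_user: str,
                      private_key: bytes) -> str:
    """Return the template with the three documented placeholders replaced.

    The placeholder names come from the example file; everything else here
    (function name, argument names) is illustrative only.
    """
    # Standard Base64 on a single line, matching `base64 -w 0`.
    encoded_key = base64.b64encode(private_key).decode("ascii")
    values = {
        "<control_plane_ip>": control_plane_ip,
        "<ssh_user>": ssh_user,
        "<ssh_private_key>": encoded_key,
    }
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template


if __name__ == "__main__":
    # Dummy values for illustration. A real key would be read from disk,
    # for example with pathlib.Path(key_path).read_bytes().
    sample = (
        "head_ip: <control_plane_ip>\n"
        "ssh_user: <ssh_user>\n"
        "ssh_private_key: <ssh_private_key>\n"
    )
    print(fill_placeholders(sample, "192.0.2.10", "root", b"dummy-key\n"))
```

Reading the key as bytes and encoding it in one step avoids accidental line wrapping or stray whitespace in the encoded string.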

apiVersion: v1
kind: Cluster
metadata:
  name: quick-start-cluster
  workspace: default
spec:
  type: ssh
  config:
    ssh_config:
      provider:
        head_ip: <control_plane_ip>
      auth:
        ssh_user: <ssh_user>
        ssh_private_key: <ssh_private_key>
  image_registry: public-docker
  version: v1.0.0
---
apiVersion: v1
kind: Endpoint
metadata:
  name: quick-start-inference
  workspace: default
spec:
  cluster: quick-start-cluster
  model:
    registry: public-hugging-face
    name: afrideva/Tinystories-gpt-0.1-3m-GGUF
    file: "*8_0.gguf"
    version: main
    task: text-generation
  engine:
    engine: llama-cpp
    version: v0.3.7
  resources:
    cpu: '1'
    memory: '1'
  replicas:
    num: 1
  deployment_options:
    scheduler:
      type: consistent_hash
  variables:
    engine_args: {}
---
apiVersion: v1
kind: ImageRegistry
metadata:
  name: public-docker
  workspace: default
spec:
  url: https://docker.io
  repository:
  authconfig:
    username: ""
    password: ""
    auth: ""
  ca: ""
---
apiVersion: v1
kind: ModelRegistry
metadata:
  name: public-hugging-face
  workspace: default
spec:
  type: hugging-face
  url: https://huggingface.co
  credentials: ""
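The four resources in the example refer to one another by name: the Cluster names an image registry, and the Endpoint names a cluster and a model registry. If you rename one of them, the references must be updated to match. The sketch below illustrates that relationship with plain Python dictionaries. It is an illustration only: the helper find_dangling_references is hypothetical, the field nesting (spec.image_registry, spec.cluster, spec.model.registry) is assumed from the example file, workspaces are ignored, and whether Neutree performs a similar check on import is not stated here.

```python
# Minimal stand-ins for the four resources in the example file. Only the
# fields used for cross-references are kept; the nesting is an assumption.
resources = [
    {"kind": "Cluster", "metadata": {"name": "quick-start-cluster"},
     "spec": {"image_registry": "public-docker"}},
    {"kind": "Endpoint", "metadata": {"name": "quick-start-inference"},
     "spec": {"cluster": "quick-start-cluster",
              "model": {"registry": "public-hugging-face"}}},
    {"kind": "ImageRegistry", "metadata": {"name": "public-docker"},
     "spec": {}},
    {"kind": "ModelRegistry", "metadata": {"name": "public-hugging-face"},
     "spec": {}},
]


def find_dangling_references(resources):
    """Return (kind, name, field, target) tuples for undefined targets."""
    defined = {(r["kind"], r["metadata"]["name"]) for r in resources}
    dangling = []
    for r in resources:
        kind, name, spec = r["kind"], r["metadata"]["name"], r["spec"]
        refs = []
        if kind == "Cluster":
            refs.append(("spec.image_registry", "ImageRegistry",
                         spec.get("image_registry")))
        elif kind == "Endpoint":
            refs.append(("spec.cluster", "Cluster", spec.get("cluster")))
            refs.append(("spec.model.registry", "ModelRegistry",
                         spec.get("model", {}).get("registry")))
        for field, target_kind, target_name in refs:
            if (target_kind, target_name) not in defined:
                dangling.append((kind, name, field, target_name))
    return dangling


if __name__ == "__main__":
    # An empty list means every reference points at a defined resource.
    print(find_dangling_references(resources))
```

For the names used in the example file, the function returns an empty list, since every referenced name is defined in the same file.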