## Create Your First Endpoint
You can import the example YAML file below into Neutree to quickly create an inference endpoint. After importing it, you can test the endpoint through chat conversations in the platform.

The example file configures an inference endpoint named quick-start-inference that uses the llama.cpp inference engine to run the small model Tinystories-gpt-0.1-3m-GGUF. Because this model is extremely small, the quality of the generated text is limited; the example is meant for walking through the basic workflow rather than producing useful output.
When using the example, replace the following parameters with actual values:
| Parameter | Description |
|---|---|
| `<control_plane_ip>` | IP address of the server where the control plane is deployed. |
| `<ssh_user>` | SSH username for the server where the control plane is deployed. Must be the root user or another user with root privileges. |
| `<ssh_private_key>` | Base64-encoded string of the SSH private key for the server where the control plane is deployed. It can be generated as shown in the sketch after this table. |
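One way to produce the Base64 string is with a short Python snippet. The key path below is an assumption; substitute the path of the private key that grants access to the control-plane server:

```python
import base64

# Assumed key location; adjust to your actual private key file.
KEY_PATH = "/root/.ssh/id_rsa"

with open(KEY_PATH, "rb") as f:
    # Encode the raw key bytes and print a single-line Base64 string
    # suitable for the <ssh_private_key> field.
    print(base64.b64encode(f.read()).decode("ascii"))
```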
```yaml
apiVersion: v1
kind: Cluster
metadata:
  name: quick-start-cluster
  workspace: default
spec:
  type: ssh
  config:
    ssh_config:
      provider:
        head_ip: <control_plane_ip>
      auth:
        ssh_user: <ssh_user>
        ssh_private_key: <ssh_private_key>
  image_registry: public-docker
  version: v1.0.0
---
apiVersion: v1
kind: Endpoint
metadata:
  name: quick-start-inference
  workspace: default
spec:
  cluster: quick-start-cluster
  model:
    registry: public-hugging-face
    name: afrideva/Tinystories-gpt-0.1-3m-GGUF
    file: "*8_0.gguf"
    version: main
  task: text-generation
  engine:
    engine: llama-cpp
    version: v0.3.7
  resources:
    cpu: '1'
    memory: '1'
  replicas:
    num: 1
  deployment_options:
    scheduler:
      type: consistent_hash
  variables:
    engine_args: {}
---
apiVersion: v1
kind: ImageRegistry
metadata:
  name: public-docker
  workspace: default
spec:
  url: https://docker.io
  repository:
  authconfig:
    username: ""
    password: ""
    auth: ""
    ca: ""
---
apiVersion: v1
kind: ModelRegistry
metadata:
  name: public-hugging-face
  workspace: default
spec:
  type: hugging-face
  url: https://huggingface.co
  credentials: ""
```
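Once the endpoint is ready, you can chat with it in the platform as described above. If you prefer to script a quick check, the sketch below assumes the endpoint exposes an OpenAI-compatible chat completions API; the base URL, port, API key, and model name used here are assumptions, so copy the actual values shown for quick-start-inference in the Neutree console:

```python
import requests

# Assumed placeholders: replace with the endpoint URL and key from the console.
BASE_URL = "http://<control_plane_ip>:8000"  # port is a placeholder, not documented
API_KEY = "<api_key>"

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        # Assumed to match the endpoint name; verify in the console.
        "model": "quick-start-inference",
        "messages": [{"role": "user", "content": "Tell me a tiny story."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Given the model's size, expect the reply to be simple or incoherent; a successful HTTP 200 response is the signal that the endpoint is serving correctly.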