Managing Kubernetes clusters
Port requirements
If there is a firewall between your standard Kubernetes cluster and the Neutree management plane, open the corresponding ports on the destination side according to the following list to ensure Neutree can manage your Kubernetes cluster. Unless otherwise specified, all ports listed below are TCP ports.
| Source | Destination | Port | Purpose |
|---|---|---|---|
| Control plane | Kubernetes cluster nodes | 6443 | Manage and deploy Neutree cluster components and inference endpoints. |
| Control plane | LoadBalancer IP associated with Kubernetes cluster nodes | 8000 | Forward requests from the AI gateway to the specific cluster. |
| Control plane | Control Plane virtual IP of Kubernetes cluster nodes | Assigned NodePort | Forward requests from the AI gateway to the specific cluster. |
| Kubernetes cluster nodes | Node where monitoring components are deployed | 8480 | Required when monitoring components are deployed on a server or VM to upload monitoring metrics. |
| Kubernetes cluster nodes | LoadBalancer IP allocated to monitoring components deployed on Kubernetes | 8480 | Required when monitoring components are deployed on a Kubernetes cluster to upload monitoring metrics. |
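Before adding a cluster, it can help to confirm that the firewall actually permits these connections. The following is a minimal sketch and not part of Neutree: `check_port` and `report_port` are hypothetical helper names, the addresses in the usage comments are placeholders, and the sketch assumes `bash` and the coreutils `timeout` command are installed. Run each check from the source side listed in the table.

```shell
# Return 0 if a TCP connection to host $1 on port $2 opens within 3 seconds.
# Uses bash's built-in /dev/tcp pseudo-device, so bash is invoked explicitly.
check_port() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Print one status line for host $1 and port $2.
report_port() {
  if check_port "$1" "$2"; then
    echo "OK $1:$2"
  else
    echo "UNREACHABLE $1:$2"
  fi
}

# Example usage (placeholder addresses; replace with your own):
# report_port 192.0.2.10 6443   # Kubernetes API server, checked from the control plane
# report_port 192.0.2.20 8000   # router LoadBalancer IP, checked from the control plane
# report_port 192.0.2.30 8480   # monitoring components, checked from a cluster node
```

A result of `UNREACHABLE` only shows that the connection could not be opened within the timeout. It does not distinguish a firewall rule from a service that is not listening yet.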
Configuring accelerators
If the Kubernetes nodes include accelerators, complete the corresponding configuration based on the accelerator type.
For NVIDIA GPUs on AKE, enable the NVIDIA GPU Operator plugin for the AKE workload cluster in Arcfra Operation Center. For details, see the Configuring cluster addons section in the Arcfra Kubernetes Engine Administration Guide.
For NVIDIA GPUs on other standard Kubernetes clusters, see the Installing the NVIDIA GPU Operator section of the NVIDIA GPU Operator guide in NVIDIA’s official documentation.
For AMD GPUs, the cluster image currently supports ROCm software version 6.3.3. Refer to AMD’s official documentation to complete the following:
- Install the corresponding version of the AMDGPU driver. See the AMDGPU driver installation section in the AMD ROCm documentation.
- Install the corresponding version of the AMD GPU Device Plugin. See the AMD GPU Device Plugin for Kubernetes section in the Device Plugin Documentation.
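Once the GPU Operator or the AMD device plugin is installed, one way to confirm that the nodes advertise their accelerators is to list the allocatable GPU resources per node. This is a sketch under assumptions: `list_gpu_capacity` is a hypothetical helper name, `kubectl` must already be configured for the target cluster, and the resource names `nvidia.com/gpu` and `amd.com/gpu` are the ones published by the respective upstream device plugins.

```shell
# List every node with the GPU counts it advertises as allocatable.
# nvidia.com/gpu comes from the NVIDIA device plugin (installed by the GPU Operator);
# amd.com/gpu comes from the AMD GPU device plugin.
# A value of <none> means the node does not advertise that resource.
list_gpu_capacity() {
  kubectl get nodes -o 'custom-columns=NODE:.metadata.name,NVIDIA:.status.allocatable.nvidia\.com/gpu,AMD:.status.allocatable.amd\.com/gpu'
}

# Example usage:
# list_gpu_capacity
```

If a GPU node shows `<none>` in both columns, the device plugin pods on that node are a reasonable first place to look.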
Creating a cluster
Follow the steps below to create a cluster. If the cluster nodes cannot access Docker Hub or the connection is slow, you can manually import cluster images.
- Log in to the Neutree management interface, click Clusters in the left navigation pane, then click Create on the right.
- Fill in the configuration.
  - Basic Information

    | Parameter | Description | Editable after creation |
    |---|---|---|
    | Name | The name of the cluster. | No |
    | Workspace | The workspace to which the cluster belongs. | No |

  - Image Registry

    Select a container registry for the cluster to store cluster-related container images. If no registry has been added to Neutree, see Creating a container registry; if your environment has no registry at all, see Setting up a temporary container registry. This field is not editable after creation.
  - Cluster Type

    The cluster type. Select Kubernetes. Not editable after creation.
  - Version

    The cluster version. The system automatically retrieves available versions from the selected registry. Can be updated after creation via Upgrading the cluster version.
  - Provider

    Enter the Kubeconfig string that Neutree uses to access the Kubernetes cluster. Not editable after creation.
  - Router

    | Parameter | Description | Editable after creation |
    |---|---|---|
    | Access Mode | Routing component access mode: LoadBalancer or NodePort. When selecting LoadBalancer, ensure the Kubernetes cluster supports LoadBalancer services. | Yes |
    | Replicas | Number of replicas for the routing component. Recommended: at least 2 for high availability. | Yes |
    | CPU | Number of CPUs for the routing component. | Yes |
    | Memory | Memory capacity for the routing component. | Yes |

  - Model Caches

    | Parameter | Description | Editable after creation |
    |---|---|---|
    | Name | The name of the model cache. | No |
    | Cache Type | Supported cache types: Host Path (local cache); NFS (NFS cache); PVC (persistent storage, ReadWriteMany only). | Yes |
    | Cache Path | The path for model caching. When cache type is Host Path, specify the host path; when NFS, specify the NFS server path; when PVC, this field is not required. | Yes |
    | NFS Server Address | The IP address or domain name of the NFS server. Required only when cache type is NFS. | Yes |
    | Storage | The storage capacity for model caching. Required only when cache type is PVC. | Yes |
    | Storage Class Name | The storage class name for model caching. Required only when cache type is PVC. | No |

    If model cache is not configured during creation, it can be added after the cluster is created.
- After confirming the configuration is correct, click Save to complete creation.
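The Provider field above expects a Kubeconfig string. One way to produce and sanity-check it, assuming Neutree accepts a standard self-contained kubeconfig, is sketched below. `export_kubeconfig` and `verify_kubeconfig` are hypothetical helper names and `neutree-kubeconfig.yaml` is a placeholder file name; the `kubectl` flags themselves are standard.

```shell
# Print a self-contained kubeconfig for the current context:
#   --minify   keeps only the entries used by the current context
#   --flatten  inlines certificate and key files that are referenced by path
#   --raw      keeps credentials instead of redacting them
export_kubeconfig() {
  kubectl config view --minify --flatten --raw
}

# Confirm that the kubeconfig file $1 can reach the API server, then list
# nodes and storage classes (storage classes matter for PVC model caches).
verify_kubeconfig() {
  kubectl --kubeconfig "$1" cluster-info &&
    kubectl --kubeconfig "$1" get nodes &&
    kubectl --kubeconfig "$1" get storageclass
}

# Example usage:
# export_kubeconfig > neutree-kubeconfig.yaml
# verify_kubeconfig neutree-kubeconfig.yaml
```

The exported file contains cluster credentials, so treat it as a secret and remove it once the cluster has been created.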
Manually importing a cluster image
When upgrading the cluster version or when the network environment is restricted, you can manually import the required cluster images into the Neutree container registry.
Procedure
- Download version 1.0.1 of the Neutree CLI tool and the cluster offline image for the specified accelerator type, based on the server CPU architecture.
- Upload the cluster offline image to the specified registry using the CLI tool:

  ```shell
  ./neutree-cli-<arch> import cluster \
    --package <cluster_package> \
    --mirror-registry <mirror_registry> \
    [--registry-project <registry_project>] \
    --registry-username <registry_username> \
    --registry-password <registry_password>
  ```

  | Parameter | Description |
  |---|---|
  | `<arch>` | CPU architecture of the server: `amd64` or `aarch64`. |
  | `<cluster_package>` | Cluster offline image name, in the format `neutree-cluster-k8s-v1.0.1-<arch>.tar.gz`. |
  | `<mirror_registry>` | Registry address. Must match the address used when uploading images with the CLI tool during Neutree management plane deployment. Enter an OCI-compatible registry address without the `https://` prefix. |
  | `--registry-project <registry_project>` | Optional. Registry project name. Ensure the corresponding project has been pre-created in the registry. |
  | `<registry_username>` | Registry username. Must have upload permissions. |
  | `<registry_password>` | Registry password or access key (such as a token). |
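The import command can be wrapped so that the architecture suffix, package name, and registry address are derived consistently. This is a sketch only: the function names and the `REGISTRY_USERNAME` and `REGISTRY_PASSWORD` environment variables are hypothetical, `registry.example.com` is a placeholder, and the package name follows the format given in the parameter table. Only the CLI flags documented above are used.

```shell
# Map `uname -m` output ($1) to the architecture suffix used by the downloads.
neutree_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo aarch64 ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build the offline package file name for architecture suffix $1.
cluster_package() {
  echo "neutree-cluster-k8s-v1.0.1-$1.tar.gz"
}

# Drop a leading https:// because the CLI expects a bare registry address.
strip_scheme() {
  echo "${1#https://}"
}

# Import the package into registry $1. Credentials are read from the
# environment so that they are not hard-coded in the script.
import_cluster_image() {
  arch=$(neutree_arch "$(uname -m)") || return 1
  "./neutree-cli-${arch}" import cluster \
    --package "$(cluster_package "$arch")" \
    --mirror-registry "$(strip_scheme "$1")" \
    --registry-username "$REGISTRY_USERNAME" \
    --registry-password "$REGISTRY_PASSWORD"
}

# Example usage:
# import_cluster_image https://registry.example.com
```

The sketch omits the optional `--registry-project` flag; add it to the command if your registry organizes images by project.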
Viewing clusters
Log in to the Neutree management interface and click Clusters in the left navigation pane. The cluster list on the right shows all current clusters. Click a cluster name to view its details. On the details page, you can view Basic and Monitor as needed.
The possible statuses during cluster operation and their descriptions are as follows:
| Status | Description |
|---|---|
| Initializing | The cluster is being initialized. |
| Running | The cluster is operating normally. |
| Updating | The cluster configuration has changed and the new configuration is being applied. |
| Upgrading | The cluster is undergoing a version upgrade. |
| Failed | The cluster is experiencing an error. Check node status and logs. |
| Deleting | The cluster is being deleted; resources are being cleaned up. |
If cluster monitoring shows No data, see Kubernetes cluster monitoring shows No data to install the required components.
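For a cluster in the Failed status, the "check node status and logs" step can start from the Kubernetes side. The sketch below is a generic first pass, not a Neutree procedure: `diagnose_cluster` is a hypothetical helper name, and because the namespace used by Neutree components is not stated here, it looks across all namespaces.

```shell
# Collect a first-pass diagnosis using the kubeconfig file given as $1.
diagnose_cluster() {
  # Nodes that are NotReady or cordoned show up in the STATUS column.
  kubectl --kubeconfig "$1" get nodes -o wide
  # Pods in any namespace that are not Running (completed Jobs appear here too).
  kubectl --kubeconfig "$1" get pods --all-namespaces '--field-selector=status.phase!=Running'
  # Events sorted by timestamp, useful for scheduling and image pull failures.
  kubectl --kubeconfig "$1" get events --all-namespaces --sort-by=.lastTimestamp
}

# Example usage (placeholder file name):
# diagnose_cluster neutree-kubeconfig.yaml
```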
Editing a cluster
After creation, you can modify the routing and model cache configuration of the cluster as needed.
- Log in to the Neutree management interface, click the menu icon (…) in the cluster list or details page, and select Edit.
- On the configuration page, modify as needed. For parameter descriptions, see Creating a cluster.
- After confirming the configuration is correct, click Save to complete editing.
Deleting clusters
You can delete one or more clusters at a time.
- Log in to the Neutree management interface, click the menu icon (…) in the cluster list or details page, and select Delete; or select multiple clusters in the list and click Delete above the list.
- In the dialog that appears, confirm again and click Delete. The selected clusters will be permanently deleted.