Managing Static Node Clusters
Static node clusters consist of two node types: head nodes and worker nodes. In multi-node clusters, it’s recommended to use nodes without accelerators as head nodes for scheduling and management, while worker nodes handle AI workload execution. The minimum cluster size is a single node (one head node with no worker nodes), where the head node handles both scheduling and workload execution. Plan the number of worker nodes according to your requirements.
Static Node Requirements
Ensure that your servers or virtual machines meet the following requirements:
- Resource Configuration
  - System disk: 200 GiB
  - CPU: at least 8 vCPU cores
  - Memory: at least 16 GiB
- Operating System Images

  | Accelerator Type | Operating System Image |
  | --- | --- |
  | CPU or NVIDIA GPU | Rocky-8.10-x86_64-minimal.iso |
  | AMD GPU | Ubuntu-22.04.5-live-server-amd64.iso |
- Port Requirements

  When a source accesses a target, the target must open the corresponding ports, as listed in the following table. Unless otherwise specified, all ports are TCP.

  | Source | Target | Port | Purpose |
  | --- | --- | --- | --- |
  | Control Plane | All Nodes | 22 | Remote login, static node initialization and maintenance. |
  | Control Plane | All Nodes | 54311 | Collect node runtime status data. |
  | Control Plane | All Nodes | 44217 | Retrieve auto-scaling related monitoring data. |
  | Control Plane | All Nodes | 44227 | Export dashboard monitoring data. |
  | All Nodes | All Nodes | 10002-20000 | Core channel for inter-node data exchange and distributed computing. |
  | All Nodes | All Nodes | 8077 | Node management. |
  | All Nodes | All Nodes | 8076 | Shared memory object access and distribution. |
  | All Nodes | All Nodes | 56999 | Node runtime environment (dependencies, etc.) management. |
  | Head Node | All Nodes | 52365, 8078 | Proxy for dashboard-related command dispatch. |
  | Control Plane | Head Node | 8265, 8079 | Access the graphical management interface. |
  | Control Plane | Head Node | 8000 | vLLM model inference service entry point. |
  | Worker Nodes | Head Node | 6379 | Ray cluster metadata center. |
  | Developers | Head Node | 10001 | Allow remote scripts to connect to the cluster and run jobs. |
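As a quick connectivity check, port reachability can be probed from the source host. The following is a sketch using bash's built-in `/dev/tcp`; the node IP and the port list are placeholders to adapt to your cluster:

```shell
#!/usr/bin/env bash
# check_ports <node_ip> <port>...: report whether each TCP port on a node
# accepts connections. Uses bash's /dev/tcp, so no extra tools are required.
check_ports() {
  local node_ip="$1"; shift
  local port
  for port in "$@"; do
    if timeout 2 bash -c "echo > /dev/tcp/${node_ip}/${port}" 2>/dev/null; then
      echo "port ${port}: open"
    else
      echo "port ${port}: closed or filtered"
    fi
  done
}

# Example: probe a few of the head-node ports listed above (placeholder IP).
check_ports 192.168.1.10 22 6379 8265 8000
```

A "closed or filtered" result means either no service is listening yet or a firewall is still blocking the port.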
Configure Operating System
Please refer to the following sections to configure the system and install Docker as the container runtime, based on your server or virtual machine operating system type.
System Configuration

For Rocky Linux:

- Configure a static IP address:

  ```shell
  sudo vi /etc/sysconfig/network-scripts/ifcfg-<interface>
  ```

  Replace `<interface>` with your network interface name, such as `eth0`.

- Configure the DNS server:

  ```shell
  sudo vi /etc/resolv.conf
  ```

- Disable the firewall:

  ```shell
  sudo systemctl stop firewalld && sudo systemctl disable firewalld
  ```

- Disable SELinux:

  ```shell
  echo -e "SELINUX=disabled\nSELINUXTYPE=targeted" | sudo tee /etc/selinux/config
  sudo setenforce 0
  ```
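For reference, a minimal `ifcfg` file for a static address might look like the following. All values are illustrative placeholders; substitute your own interface name, address, prefix, gateway, and DNS server:

```
TYPE=Ethernet
BOOTPROTO=none
DEVICE=eth0
NAME=eth0
ONBOOT=yes
IPADDR=192.168.1.10
PREFIX=24
GATEWAY=192.168.1.1
DNS1=192.168.1.1
```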
For Ubuntu:

- Configure the static IP address and DNS server:

  ```shell
  sudo vi /etc/netplan/50-cloud-init.yaml
  ```

- Apply the network configuration:

  ```shell
  sudo netplan apply
  ```

- Disable the firewall:

  ```shell
  sudo ufw disable
  ```

- Optional: Disable AppArmor as needed:

  ```shell
  sudo systemctl disable apparmor && sudo systemctl stop apparmor
  ```

- Reboot the operating system to apply the configuration:

  ```shell
  sudo reboot
  ```
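For reference, a minimal netplan configuration for a static address and DNS server might look like this. The interface name, address, gateway, and nameserver are placeholders to replace with your own values:

```yaml
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: false
      addresses:
        - 192.168.1.10/24
      routes:
        - to: default
          via: 192.168.1.1
      nameservers:
        addresses:
          - 192.168.1.1
```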
Install Docker

For Rocky Linux:

- Install Docker CE:

  ```shell
  sudo dnf -y install dnf-plugins-core
  sudo dnf config-manager --add-repo https://download.docker.com/linux/rhel/docker-ce.repo
  sudo dnf -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
  ```

- Start the Docker service:

  ```shell
  sudo systemctl enable --now docker
  ```

- Verify the Docker installation:

  ```shell
  docker --version
  ```

- Reboot the operating system to apply the configuration:

  ```shell
  sudo reboot
  ```
For Ubuntu:

- Update the package index and install prerequisites:

  ```shell
  sudo apt-get update
  sudo apt-get -y install ca-certificates curl
  ```

- Add the Docker GPG key:

  ```shell
  sudo install -m 0755 -d /etc/apt/keyrings
  sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
  sudo chmod a+r /etc/apt/keyrings/docker.asc
  ```

- Add the Docker repository:

  ```shell
  echo \
    "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
    $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | \
    sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
  ```

- Install Docker CE:

  ```shell
  sudo apt-get update
  sudo apt-get -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
  ```

- Start the Docker service:

  ```shell
  sudo systemctl enable --now docker
  ```

- Verify the Docker installation:

  ```shell
  docker --version
  ```
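Whichever OS you used, the installation can be sanity-checked with a small script. This sketch only verifies that the `docker` CLI is on the PATH; it does not assume the daemon is reachable:

```shell
# verify_docker: report whether the docker CLI is installed.
verify_docker() {
  if command -v docker >/dev/null 2>&1; then
    echo "docker found: $(docker --version)"
  else
    echo "docker not found"
  fi
}

verify_docker
```

If the CLI is present but commands such as `docker info` fail, check that the service was enabled with `sudo systemctl enable --now docker`.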
Configure Accelerators
If your nodes include accelerators, please complete the configuration according to the accelerator type.

For NVIDIA GPUs, refer to the NVIDIA official documentation for the following configuration:

- Disable the NVIDIA GPU Nouveau driver. Refer to the Disabling the Nouveau Driver for NVIDIA Graphics Cards section in the Virtual GPU Software User Guide.
- Install an NVIDIA graphics driver with a version between 530.x.x and 590.x.x (inclusive). Refer to the Installing the NVIDIA vGPU Software Graphics Driver section in the Virtual GPU Software User Guide.
- Install the NVIDIA Container Toolkit. Refer to the Installing the NVIDIA Container Toolkit section in the NVIDIA Container Toolkit documentation.

For AMD GPUs, the current cluster image supports ROCm software version 6.3.3. Please install the corresponding version of the AMDGPU driver and the AMD Container Toolkit. Refer to the Quick Start Guide section in the AMD Container Toolkit documentation from AMD official documentation.
Prepare SSH Private Key
Before creating a static node cluster, you need to prepare an SSH private key for node authentication. The control plane will use the SSH private key to securely connect to and manage nodes in the cluster via the SSH protocol.
Create SSH Key Pair
An SSH key pair consists of a public key and a private key, used for node authentication and secure communication. For better security, it’s recommended to use different SSH key pairs for different clusters.
If you don’t have an SSH key pair yet, follow these steps to create one:
- Run the following command on the control plane or your local machine to generate an SSH key pair:

  ```shell
  ssh-keygen -t rsa -b 4096 -C "your_email@example.com" -f ~/.ssh/neutree_cluster_key
  ```

  Parameter descriptions:

  | Parameter | Description |
  | --- | --- |
  | -t rsa | Specifies RSA as the key algorithm. |
  | -b 4096 | Specifies a key length of 4096 bits. |
  | -C "your_email@example.com" | Adds a comment to the key, typically an email address. |
  | -f ~/.ssh/neutree_cluster_key | Specifies the save path and filename for the key file. |

- When prompted Enter passphrase (empty for no passphrase), press Enter to leave it empty (no passphrase).

- After the command completes, the key pair is generated at the specified location:

  - Private key file: ~/.ssh/neutree_cluster_key. The private key is sensitive information; keep it secure and do not share it with others.
  - Public key file: ~/.ssh/neutree_cluster_key.pub. This must be configured on all nodes in the cluster.
Configure Public Key on Target Nodes
The ~/.ssh/authorized_keys file on each node stores the public keys allowed for access. After configuring the public key on all nodes following this section, SSH will automatically use key-based login without prompting for a password.
- Copy the public key content to the ~/.ssh/authorized_keys file on the target node using one of the following methods:

  ```shell
  ssh-copy-id -i ~/.ssh/neutree_cluster_key.pub <username>@<node_ip>
  ```

  ```shell
  cat ~/.ssh/neutree_cluster_key.pub | ssh <username>@<node_ip> "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
  ```

  | Parameter | Description |
  | --- | --- |
  | <username> | SSH username. Must be root or a user with root privileges. |
  | <node_ip> | IP address of the cluster node. |

- Set the permissions for the ~/.ssh/authorized_keys file and its directory on the target node:

  ```shell
  ssh <username>@<node_ip> "chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys"
  ```
Verify SSH Connection
Run the following commands against each node in the cluster to test the SSH connection and ensure every node can be reached.
- Test the SSH connection using the private key:

  ```shell
  ssh -i ~/.ssh/neutree_cluster_key <username>@<node_ip>
  ```

  You should be able to log in without entering a password, indicating the SSH key is configured correctly.

- For non-root users, test root privileges:

  ```shell
  ssh -i ~/.ssh/neutree_cluster_key <username>@<node_ip> "sudo whoami"
  ```

  The expected output is root, indicating the user has sudo privileges.
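To repeat these checks across many nodes, the two commands above can be wrapped in a small loop. This is a sketch: the node IPs are placeholders, and it assumes the key path used earlier in this guide:

```shell
# check_node <node_ip> [user]: verify that key-based SSH login works and that
# the user can run sudo, printing OK or FAILED per node.
check_node() {
  local ip="$1" user="${2:-root}"
  if ssh -i ~/.ssh/neutree_cluster_key -o BatchMode=yes -o ConnectTimeout=5 \
      "${user}@${ip}" "sudo whoami" 2>/dev/null | grep -qx root; then
    echo "${ip}: OK"
  else
    echo "${ip}: FAILED"
  fi
}

# Placeholder node IPs; replace with your head and worker node addresses.
for ip in 192.168.1.10 192.168.1.11; do
  check_node "$ip"
done
```

BatchMode=yes makes ssh fail immediately instead of prompting for a password, so a misconfigured node reliably reports FAILED.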
Get Private Key Content
Before creating a cluster, get the private key content using the following command:

```shell
cat ~/.ssh/neutree_cluster_key
```

The private key content format looks like:

```
-----BEGIN OPENSSH PRIVATE KEY-----
xxxxx...xxxxx
-----END OPENSSH PRIVATE KEY-----
```

Create Cluster
- Log in to the Neutree management interface, click Clusters in the left sidebar, and click Create on the right page.

- Fill in the configuration information.

  - Basic Information

    | Parameter | Description | Editable After Creation |
    | --- | --- | --- |
    | Name | The name of the cluster. | No |
    | Workspace | The workspace to which the cluster belongs. | No |

  - Image Registry

    Select an image registry for the cluster to store cluster-related container images. If no image registry is available, refer to the Create Container Image Registry section to create one. Not editable after cluster creation.

  - Cluster Type

    The type of cluster. Select Static Node. Not editable after cluster creation.

  - Provider

    | Parameter | Description | Editable After Creation |
    | --- | --- | --- |
    | Head Node IP | The IP address of the head node. | No |
    | Worker Node IPs | The IP addresses of worker nodes. Not required for single-node clusters. For multi-node clusters, enter an IP and click + Add to add more. | Yes |

  - Node Authentication

    | Parameter | Description | Editable After Creation |
    | --- | --- | --- |
    | SSH User | SSH username. Must be the root user or another user with root privileges. | No |
    | SSH Private Key | SSH private key string. Refer to Prepare SSH Private Key for details. | No |

  - Model Cache

    | Parameter | Description | Editable After Creation |
    | --- | --- | --- |
    | Name | Model cache name. | No |
    | Cache Type | Static node clusters only support Host Path. | No |
    | Cache Path | Host path for the model cache. | Yes |

    If the model cache is not configured during creation, it cannot be added after cluster creation.

- After confirming the configuration is correct, click Save to complete the creation.
View Cluster
Log in to the Neutree management interface and click Clusters in the left sidebar; the cluster list on the right displays all current clusters. Click a cluster name to view its details.
On the details page, you can view Basic Information, Monitoring, and Ray Dashboard as needed.
Edit Cluster
After cluster creation, you can modify worker node IPs and model cache paths as needed.
- Log in to the Neutree management interface, click the menu icon on the cluster list or details page, and select Edit.
- Modify the configuration as needed. For parameter descriptions, refer to Create Cluster.
- After confirming the configuration is correct, click Save to complete the edit.
Delete Cluster
- Log in to the Neutree management interface, click the menu icon on the cluster list or details page, and select Delete.
- In the pop-up dialog, confirm and click Delete. The cluster will be permanently deleted.