Managing Static Node Clusters

Static node clusters consist of two node types: head nodes and worker nodes. In multi-node clusters, it’s recommended to use nodes without accelerators as head nodes for scheduling and management, while worker nodes handle AI workload execution. The minimum cluster size is a single node (one head node with no worker nodes), where the head node handles both scheduling and workload execution. Plan the number of worker nodes according to your requirements.

Ensure that your servers or virtual machines meet the following requirements:

  • Resource Configuration

    • System disk: 200 GiB
    • CPU: At least 8 vCPU cores
    • Memory: At least 16 GiB
  • Operating System Images

    | Accelerator Type  | Operating System Image               |
    | ----------------- | ------------------------------------ |
    | CPU or NVIDIA GPU | Rocky-8.10-x86_64-minimal.iso        |
    | AMD GPU           | Ubuntu-22.04.5-live-server-amd64.iso |
  • Port Requirements

    When a source accesses a target, the target must open the corresponding ports to that source. Open the ports according to the following table. Unless otherwise specified, all ports listed below are TCP ports. (A quick reachability check is sketched after this list.)

    | Source        | Target    | Port        | Purpose                                                               |
    | ------------- | --------- | ----------- | --------------------------------------------------------------------- |
    | Control Plane | All Nodes | 22          | Remote login, static node initialization and maintenance.             |
    | Control Plane | All Nodes | 54311       | Collect node runtime status data.                                     |
    | Control Plane | All Nodes | 44217       | Retrieve auto-scaling related monitoring data.                        |
    | Control Plane | All Nodes | 44227       | Export dashboard monitoring data.                                     |
    | All Nodes     | All Nodes | 10002-20000 | Core channel for inter-node data exchange and distributed computing.  |
    | All Nodes     | All Nodes | 8077        | Node management.                                                      |
    | All Nodes     | All Nodes | 8076        | Shared memory object access and distribution.                         |
    | All Nodes     | All Nodes | 56999       | Node runtime environment (dependencies, etc.) management.             |
    | Head Node     | All Nodes | 52365, 8078 | Proxy for dashboard-related command dispatch.                         |
    | Control Plane | Head Node | 8265, 8079  | Access the graphical management interface.                            |
    | Control Plane | Head Node | 8000        | vLLM model inference service entry point.                             |
    | Worker Nodes  | Head Node | 6379        | Ray cluster metadata center.                                          |
    | Developers    | Head Node | 10001       | Allow remote scripts to connect to the cluster and run jobs.          |
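
Once the cluster services are running, you can spot-check that a port is reachable from the machine listed in the Source column. The commands below are only a sketch; they use bash's built-in /dev/tcp redirection so no extra tooling is needed, and 192.168.1.10 is a placeholder for your head node IP.

Terminal window
# Run from the control plane: SSH and the dashboard port on the head node
timeout 3 bash -c 'echo > /dev/tcp/192.168.1.10/22' && echo "port 22 reachable"
timeout 3 bash -c 'echo > /dev/tcp/192.168.1.10/8265' && echo "port 8265 reachable"
# Run from a worker node: the Ray metadata port on the head node
timeout 3 bash -c 'echo > /dev/tcp/192.168.1.10/6379' && echo "port 6379 reachable"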

Please refer to the following sections to configure the system and install Docker as the container runtime, according to the operating system of your server or virtual machine.

  1. Configure static IP address:

    Terminal window
    sudo vi /etc/sysconfig/network-scripts/ifcfg-<interface>

    Replace <interface> with your network interface name, such as eth0. Sample ifcfg and resolv.conf contents for steps 1 and 2 are sketched after these steps.

  2. Configure DNS server:

    Terminal window
    sudo vi /etc/resolv.conf
  3. Disable firewall:

    Terminal window
    sudo systemctl stop firewalld && sudo systemctl disable firewalld
  4. Disable SELinux:

    Terminal window
    echo -e "SELINUX=disabled\nSELINUXTYPE=targeted" | sudo tee /etc/selinux/config
    sudo setenforce 0
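
As an illustration of steps 1 and 2 above, the following writes a minimal static IP configuration and DNS configuration. This is only a sketch: the interface name, addresses, and DNS servers are placeholders to adjust for your network.

Terminal window
# Example static IP configuration for interface eth0 (all values are placeholders)
sudo tee /etc/sysconfig/network-scripts/ifcfg-eth0 <<'EOF'
TYPE=Ethernet
BOOTPROTO=static
NAME=eth0
DEVICE=eth0
ONBOOT=yes
IPADDR=192.168.1.10
PREFIX=24
GATEWAY=192.168.1.1
EOF

# Example DNS configuration (values are placeholders)
sudo tee /etc/resolv.conf <<'EOF'
nameserver 8.8.8.8
nameserver 1.1.1.1
EOF
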
  1. Install Docker CE:

    Terminal window
    sudo dnf -y install dnf-plugins-core
    sudo dnf config-manager --add-repo https://download.docker.com/linux/rhel/docker-ce.repo
    sudo dnf -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
  2. Start Docker service:

    Terminal window
    sudo systemctl enable --now docker
  3. Verify Docker installation:

    Terminal window
    docker --version
  4. Reboot the operating system to apply the configuration (an optional post-reboot smoke test is sketched after these steps):

    Terminal window
    sudo reboot
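
After the node comes back up, a quick smoke test confirms that the Docker daemon starts and can run containers. This is only a sketch and assumes the node can pull the hello-world image from Docker Hub:

Terminal window
sudo docker info --format '{{.ServerVersion}}'
sudo docker run --rm hello-world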

If your nodes include accelerators, please complete the configuration according to the accelerator type:

Please refer to the official NVIDIA documentation for the following configuration:

  1. Disable the Nouveau driver for the NVIDIA GPU. Refer to the Disabling the Nouveau Driver for NVIDIA Graphics Cards section in the Virtual GPU Software User Guide.

  2. Install an NVIDIA graphics driver with a version between 530.x.x and 590.x.x (inclusive). Refer to the Installing the NVIDIA vGPU Software Graphics Driver section in the Virtual GPU Software User Guide.

  3. Install the NVIDIA Container Toolkit. Refer to the Installing the NVIDIA Container Toolkit section in the NVIDIA Container Toolkit documentation. A verification sketch follows these steps.
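
After completing these steps, the following sketch (not taken from the NVIDIA documentation) verifies the setup: nvidia-smi confirms the driver is loaded, and nvidia-ctk registers the NVIDIA runtime with Docker so containers can see the GPUs. The CUDA image tag is only an example; use any CUDA base image available to your nodes.

Terminal window
# Driver check: should list the GPUs and a driver version in the supported range
nvidia-smi
# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Container check: the same GPU list should appear from inside a container
sudo docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi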

Before creating a static node cluster, you need to prepare an SSH private key for node authentication. The control plane uses this private key to securely connect to and manage the nodes in the cluster over SSH.

An SSH key pair consists of a public key and a private key, used for node authentication and secure communication. For better security, it’s recommended to use different SSH key pairs for different clusters.

If you don’t have an SSH key pair yet, follow these steps to create one:

  1. Run the following command on the control plane or your local machine to generate an SSH key pair:

    Terminal window
    ssh-keygen -t rsa -b 4096 -C "your_email@example.com" -f ~/.ssh/neutree_cluster_key

    Parameter descriptions:

    | Parameter                     | Description                                             |
    | ----------------------------- | ------------------------------------------------------- |
    | -t rsa                        | Specifies the key encryption algorithm type as RSA.     |
    | -b 4096                       | Specifies the key length as 4096 bits.                  |
    | -C "your_email@example.com"   | Adds a comment to the key, typically an email address.  |
    | -f ~/.ssh/neutree_cluster_key | Specifies the save path and filename for the key file.  |
  2. When prompted Enter passphrase (empty for no passphrase), press Enter to leave it empty (no passphrase).

  3. After the command completes, the key pair will be generated at the specified location (a quick check is sketched after these steps):

    • Private key file: ~/.ssh/neutree_cluster_key. The private key is sensitive information. Keep it secure and do not share it with others.
    • Public key file: ~/.ssh/neutree_cluster_key.pub. This needs to be configured on all nodes in the cluster.
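
Optionally, confirm the pair was generated as expected. ssh-keygen -l prints the key's fingerprint, and the .pub file contains the text you will distribute to the nodes in the next step:

Terminal window
ssh-keygen -lf ~/.ssh/neutree_cluster_key.pub
cat ~/.ssh/neutree_cluster_key.pub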

The ~/.ssh/authorized_keys file on nodes stores the public keys allowed for access. After configuring the public key on all nodes following this section, SSH will automatically use key-based login without prompting for a password.

  1. Copy the public key content to the ~/.ssh/authorized_keys file on the target node using ssh-copy-id (a manual alternative is sketched after these steps):

    Terminal window
    ssh-copy-id -i ~/.ssh/neutree_cluster_key.pub <username>@<node_ip>

    | Parameter  | Description                                                 |
    | ---------- | ----------------------------------------------------------- |
    | <username> | SSH username. Must be root or a user with root privileges.  |
    | <node_ip>  | IP address of the cluster node.                             |
  2. Set the permissions for the ~/.ssh/authorized_keys file and its directory on the target node:

    Terminal window
    ssh <username>@<node_ip> "chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys"
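
If ssh-copy-id is not available on the machine holding the key pair, a manual equivalent is sketched below; it appends the public key and sets the permissions in one pass, so step 2 above is not needed afterwards:

Terminal window
cat ~/.ssh/neutree_cluster_key.pub | ssh <username>@<node_ip> \
  "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys"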

Run the following commands from the control plane or your local machine against each node in the cluster to confirm that every node can be reached over SSH (a loop for checking all nodes at once is sketched after these steps).

  1. Test the SSH connection using the private key:

    Terminal window
    ssh -i ~/.ssh/neutree_cluster_key <username>@<node_ip>

    You should be able to log in successfully without entering a password, indicating the SSH key is configured correctly.

  2. For non-root users, test root privileges:

    Terminal window
    ssh -i ~/.ssh/neutree_cluster_key <username>@<node_ip> "sudo whoami"

    The expected output is root, indicating the user has sudo privileges.
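
To run the same checks against every node in one pass, a small loop like the one below can help. It is only a sketch: the IP addresses are placeholders for your node IPs, BatchMode makes ssh fail immediately instead of prompting if key-based login is not working, and sudo -n fails instead of hanging if the user cannot run sudo without a password:

Terminal window
for ip in 192.168.1.10 192.168.1.11 192.168.1.12; do
  ssh -i ~/.ssh/neutree_cluster_key -o BatchMode=yes -o ConnectTimeout=5 \
    <username>@"$ip" "sudo -n whoami" \
    && echo "$ip: OK" || echo "$ip: FAILED"
done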

Before creating a cluster, get the private key content using the following command:

Terminal window
cat ~/.ssh/neutree_cluster_key

The private key content format looks like:

-----BEGIN OPENSSH PRIVATE KEY-----
xxxxx...xxxxx
-----END OPENSSH PRIVATE KEY-----
  1. Log in to the Neutree management interface, click Clusters in the left sidebar, and click Create on the right page.

  2. Fill in the configuration information.

    • Basic Information

      | Parameter | Description                                 | Editable After Creation |
      | --------- | ------------------------------------------- | ----------------------- |
      | Name      | The name of the cluster.                    | No                      |
      | Workspace | The workspace to which the cluster belongs. | No                      |
    • Image Registry

      Select an image registry for the cluster to store cluster-related container images. If no image registry is available, refer to the Create Container Image Registry section to create one. Not editable after cluster creation.

    • Cluster Type

      The type of cluster; select Static Node. Not editable after cluster creation.

    • Provider

      | Parameter       | Description                                                                                                                               | Editable After Creation |
      | --------------- | ----------------------------------------------------------------------------------------------------------------------------------------- | ----------------------- |
      | Head Node IP    | The IP address of the head node.                                                                                                          | No                      |
      | Worker Node IPs | The IP addresses of worker nodes. Not required for single-node clusters. For multi-node clusters, enter an IP and click + Add to add more. | Yes                     |
    • Node Authentication

      | Parameter       | Description                                                            | Editable After Creation |
      | --------------- | ---------------------------------------------------------------------- | ----------------------- |
      | SSH User        | SSH username, must be root user or another user with root privileges.  | No                      |
      | SSH Private Key | SSH private key string. Refer to Prepare SSH Private Key for details.  | No                      |
    • Model Cache

      | Parameter  | Description                                  | Editable After Creation |
      | ---------- | -------------------------------------------- | ----------------------- |
      | Name       | Model cache name.                            | No                      |
      | Cache Type | Static node clusters only support Host Path. | No                      |
      | Cache Path | Host path for model cache.                   | Yes                     |

      If model cache is not configured during creation, it cannot be added after cluster creation.

  3. After confirming the configuration is correct, click Save to complete the creation.

Log in to the Neutree management interface, click Clusters in the left sidebar, and the cluster list on the right will display all current clusters. Click on a cluster name to view details.

On the details page, you can view Basic Information, Monitoring, and Ray Dashboard as needed.

After cluster creation, you can modify worker node IPs and model cache paths as needed.

  1. Log in to the Neutree management interface, click the menu icon on the cluster list or details page, and select Edit.

  2. Modify as needed on the configuration page. For parameter descriptions, refer to Create Cluster.

  3. After confirming the configuration is correct, click Save to complete the edit.

  1. Log in to the Neutree management interface, click the menu icon on the cluster list or details page, and select Delete.

  2. In the pop-up dialog, confirm and click Delete. The cluster will be permanently deleted.