Deployment
The following sections describe how to deploy your HPCBOX cluster using the HPCBOX Terraform templates with Google Infrastructure Manager.
Make sure you have followed the Preparing section of the documentation to get your Google Cloud Project and required accounts ready for deployment.
You must have up-to-date Terraform templates, and access to the HPCBOX images (including the compute, CUDA and OpenGL worker images) must already be activated for the Google Project where HPCBOX will be deployed.
Templates structure
The HPCBOX Terraform (TF) templates are structured as follows:
- main.tf - The main TF template which deploys the Head node and configuration settings for your HPCBOX cluster.
- variables.tf - The file with all TF variables used by main.tf.
- setup_gcp.sh - The bootstrap setup script for the HPCBOX head node.
Terraform Variables
The following Terraform variables are customizable:
- project_with_hpcbox_images - Set this to drizti-public unless HPCBOX Support advises you to change it. The Google Project where the HPCBOX cluster is to be deployed must have access enabled to the images in drizti-public.
- project_id - The ID of the Google Cloud project where HPCBOX is to be deployed.
- head-node-service-account - The service account you created in the preparation step.
- region - The Google Cloud region where you want to deploy the HPCBOX cluster.
- zone - The zone within the Google Cloud region where you want to deploy the HPCBOX cluster.
- clusterName - The prefix used when naming all your HPCBOX cluster artifacts. For example, setting this value to hpcbox-cls-010 names the head node hpcbox-cls-010-m-01.
- clusterTag - A friendly tag/label for your HPCBOX cluster artifacts.
- appsDiskSize - The size in GB of the disk created to host all the applications installed on the HPCBOX cluster. This disk is shared over NFS with all the compute, OpenGL, CUDA and login workers on your cluster. Make sure it is large enough to host all versions of all applications you plan to install on the HPCBOX cluster.
- appsDiskType - The disk type used for the Apps Disk. The default is SSD. Available options are described in the Google documentation.
- dataDiskSize - The size in GB of the disk that holds the home directories of users on the HPCBOX cluster. This disk is exported over NFS to all worker nodes.
- dataDiskType - The disk type used for the Data Disk. The default is SSD. Available options are described in the Google documentation.
- headNodeImageSku - The SKU to be used for the head/management node. Make sure the SKU selected here supports the disk types set for appsDiskType and dataDiskType.
- headNodeNetworkEgressTier - The egress network tier to use. TIER_1 will incur additional Google Cloud charges.
- vnetName - The name of the VPC Network to use for deploying your HPCBOX cluster. This is the VPC Network you created during preparation.
- subnetName - The name of the subnet to use for deploying the HPCBOX cluster. This is the subnet you created during preparation.
- subnetRegion - The region where your selected subnet is hosted.
- adminUserName - The name of the initial administrator user for the HPCBOX cluster. This is the account you set up during preparation.
- enableCloudLogin - Set this to TRUE if you are using Google OS Login.
- workerTypes - The worker types and SKUs used on the HPCBOX cluster. This is a comma-separated list, and each entry has the format worker_type//worker_type-SKU//worker_type-Suffix. For example, compute-worker//c3d-standard-30-lssd//cn means that workers of type compute-worker use the SKU c3d-standard-30-lssd and that their hostnames are suffixed with cn.
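The workerTypes entry format can be illustrated with a small shell sketch that splits a value into its three parts. This is only an illustration of the format described above, not part of the HPCBOX templates; the function name parse_worker_types is hypothetical.

```shell
#!/bin/sh
# Sketch: split a workerTypes value into worker type, SKU and hostname suffix.
# The "type//SKU//suffix" entry format comes from the workerTypes description;
# the function name parse_worker_types is hypothetical.

parse_worker_types() {
  worker_types=$1
  saved_ifs=$IFS
  IFS=','          # split the value on commas
  set -f           # disable globbing while the value is expanded unquoted
  for entry in $worker_types; do
    # Strip leading and trailing whitespace around the entry.
    entry="${entry#"${entry%%[![:space:]]*}"}"
    entry="${entry%"${entry##*[![:space:]]}"}"
    worker_type="${entry%%//*}"   # text before the first "//"
    rest="${entry#*//}"           # text after the first "//"
    sku="${rest%%//*}"            # text between the two "//" separators
    suffix="${rest#*//}"          # text after the second "//"
    printf '%s %s %s\n' "$worker_type" "$sku" "$suffix"
  done
  set +f
  IFS=$saved_ifs
}

parse_worker_types 'compute-worker//c3d-standard-30-lssd//cn'
# prints: compute-worker c3d-standard-30-lssd cn
```

Each output line holds the worker type, its SKU and its hostname suffix, which makes it easy to see at a glance what a longer workerTypes value configures.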
The following variables are available for each worker type. By default, your HPCBOX cluster template includes one compute-worker type.
- compute-worker-imageSku - The SKU to be used for workers of type compute-worker. This must match the value configured for worker_type-SKU in the variable workerTypes.
- compute-worker-numLocalSSDs - The number of local SSDs to use for workers of type compute-worker, if supported by the SKU.
- compute-worker-placementMaxDistance - The maximum distance to be considered when placing workers of type compute-worker for low latency between them. See Google Placement Policy for more information.
- login-worker-imageSku - The SKU to be used for workers of type login-worker. This must match the value configured for worker_type-SKU in the variable workerTypes.
- gpu-worker-imageSku - The SKU to be used for workers of type gpu-worker. This must match the value configured for worker_type-SKU in the variable workerTypes.
- gpu-worker-acceleratorType - The accelerator type to use for workers of type gpu-worker.
- cuda-worker-imageSku - The SKU to be used for workers of type cuda-worker. This must match the value configured for worker_type-SKU in the variable workerTypes.
- cuda-worker-acceleratorType - The accelerator type to use for workers of type cuda-worker.
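Because each imageSku variable must agree with the SKU given in workerTypes, a quick check before deploying can catch mismatches. The following is a minimal sketch, assuming you pass the two values in by hand; the function names sku_in_worker_types and check_image_sku are hypothetical and are not part of the HPCBOX templates.

```shell
#!/bin/sh
# Sketch: check that a <worker_type>-imageSku value agrees with workerTypes.
# Function names are hypothetical; the values are passed in by hand here
# rather than read from variables.tf.

# Print the SKU that workerTypes assigns to the given worker type.
sku_in_worker_types() {
  worker_types=$1
  wanted_type=$2
  found_sku=''
  saved_ifs=$IFS
  IFS=','
  set -f
  for entry in $worker_types; do
    entry="${entry#"${entry%%[![:space:]]*}"}"   # strip leading whitespace
    if [ "${entry%%//*}" = "$wanted_type" ]; then
      rest="${entry#*//}"
      found_sku="${rest%%//*}"
    fi
  done
  set +f
  IFS=$saved_ifs
  [ -n "$found_sku" ] && printf '%s\n' "$found_sku"
}

# $1 = workerTypes value, $2 = worker type, $3 = <worker_type>-imageSku value
check_image_sku() {
  if ! expected_sku="$(sku_in_worker_types "$1" "$2")"; then
    echo "ERROR: $2 is not listed in workerTypes" >&2
    return 1
  fi
  if [ "$expected_sku" != "$3" ]; then
    echo "MISMATCH: workerTypes says $expected_sku but $2-imageSku is $3" >&2
    return 1
  fi
  echo "OK: $2 uses $3"
}

check_image_sku 'compute-worker//c3d-standard-30-lssd//cn' compute-worker c3d-standard-30-lssd
# prints: OK: compute-worker uses c3d-standard-30-lssd
```

The check returns a non-zero status on a mismatch or a missing worker type, so it can be placed in front of the deployment command in a script.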
Make sure the Google Project you are using for HPCBOX has sufficient vCPU quota for the instance SKUs specified in workerTypes and for each worker-type SKU.
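One way to review regional quotas from the command line is sketched below. It assumes the Google Cloud CLI is installed and authenticated; the function name show_region_quotas is hypothetical, and the project and region in the usage comment are the example values used later on this page. Some machine families may be tracked under their own CPU metric rather than the general CPUS metric, so check the rows that correspond to your selected SKUs.

```shell
#!/bin/sh
# Sketch: list quota limits and current usage for one region of a project.
# Assumes the gcloud CLI is installed and authenticated.
# The function name show_region_quotas is hypothetical.

# $1 = project ID, $2 = region
show_region_quotas() {
  gcloud compute regions describe "$2" \
    --project "$1" \
    --flatten="quotas[]" \
    --format="table(quotas.metric,quotas.limit,quotas.usage)"
}

# Usage (not run here because it needs access to a real project):
# show_region_quotas hpcbox-003 us-central1
```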
Please don't change the values of the variables below because they are handled by the HPCBOX ClusterControl Service.
- compute-worker-set - Set of Compute Workers to deploy.
- gpu-worker-set - Set of GPU Workers to deploy.
- cuda-worker-set - Set of CUDA Workers to deploy.
- login-worker-set - Set of Login Workers to deploy.
Initial Deployment
If you have a Premium support agreement for HPCBOX, these operations are performed by HPCBOX Support for you.
- Unpack the Terraform templates package in a suitable directory.
- Edit any necessary variables in the variables.tf file.
- You must have the Google Cloud CLI installed on your local machine.
- Open a command window (Bash on Linux or Cmd.exe on Windows) and follow the example run shown in the command block below.
In the example run shown below:
- the Google Project where you are deploying HPCBOX, i.e., the value of project_id in the template variables file, is hpcbox-003.
- the name of the Infra Manager service account, i.e., the one you set up within this project in the Preparing section, is projects/hpcbox-003/serviceAccounts/hpcbox-001-infra-manager@hpcbox-003.iam.gserviceaccount.com.
- the region/location where you are deploying, i.e., the value of region in the template variables file, is us-central1.
- the name of the HPCBOX cluster deployment, i.e., the value of goog_cm_deployment_name in the template variables file, is hpcbox-cls-007.
Replace these with your values before executing the commands.
gcloud config set project hpcbox-003
gcloud auth login
cd tf-marketplace
gcloud infra-manager deployments apply projects/hpcbox-003/locations/us-central1/deployments/hpcbox-cls-007 --service-account projects/hpcbox-003/serviceAccounts/hpcbox-001-infra-manager@hpcbox-003.iam.gserviceaccount.com --local-source="." --inputs-file=variables.tf --location us-central1
Once the Deployment is applied, you can monitor the deployment in Google Cloud Console's Infrastructure Manager section.
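The deployment can also be inspected from the command line. The sketch below uses the example values from this page and assumes the Google Cloud CLI is installed and authenticated; the helper names deployment_path and describe_deployment are hypothetical.

```shell
#!/bin/sh
# Sketch: build the full deployment resource name and inspect the deployment.
# Values are the example values from this page; replace them with your own.
# The helper names deployment_path and describe_deployment are hypothetical.

PROJECT_ID="hpcbox-003"
REGION="us-central1"
DEPLOYMENT_NAME="hpcbox-cls-007"

# $1 = project ID, $2 = region, $3 = deployment name
deployment_path() {
  printf 'projects/%s/locations/%s/deployments/%s\n' "$1" "$2" "$3"
}

# Show the current state of the deployment (requires gcloud and an applied deployment).
describe_deployment() {
  gcloud infra-manager deployments describe "$3" --project "$1" --location "$2"
}

deployment_path "$PROJECT_ID" "$REGION" "$DEPLOYMENT_NAME"
# prints: projects/hpcbox-003/locations/us-central1/deployments/hpcbox-cls-007

# describe_deployment "$PROJECT_ID" "$REGION" "$DEPLOYMENT_NAME"
```

Keeping the project, region and deployment name in variables means they only need to be changed in one place when you adapt the commands to your own environment.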