How to use diagnostic tools to run performance tests on a cluster in ThinkAgile HX
Description
This article describes the procedure for running performance tests on a cluster with the diagnostics utility. The tool is useful for pre-sales demonstrations of a cluster and for identifying the source of performance issues in a production cluster. Diagnostics should also be run as part of the setup process to verify that a cluster is running properly before the customer takes ownership of it.
The diagnostic utility deploys a VM on each node in the cluster. Controller VMs (CVMs) control the diagnostic VM on their hosts and report back to a single system.
The diagnostics test covers the following data:
- Sequential write bandwidth
- Sequential read bandwidth
- Random read IOPS
- Random write IOPS
Applicable Systems
ThinkAgile HX
Procedure
1. Use SSH to log in to any CVM in the cluster.
2. Prepare for the diagnostics test by cleaning up any artifacts from previous runs.
nutanix@CVM:~$ ~/diagnostics/diagnostics.py cleanup
Cleaning up node 10.10.3.13 ... done.
Cleaning up node 10.10.3.14 ... done.
Cleaning up node 10.10.3.15 ... done.
Cleaning up the container and the storage pool ... done.
3. Run the diagnostics test.
nutanix@CVM:~$ ~/diagnostics/diagnostics.py run
The script performs the following tasks:
- Installs a diagnostic VM on each node in the cluster
- Creates cluster entities to support the test, if necessary
- Uses the Linux fio utility to run four performance tests
- Reports the results back to a single system
If the command fails with the message ERROR:root:Zookeeper host port list is not set, refresh the environment by running either source /etc/profile or bash -l, and then run the command again.
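The script generates its own fio job definitions inside the diagnostic VMs, so the exact parameters are internal to the utility. As a rough illustration only, a sequential-write job of the general kind it runs might look like the following fio job file; every parameter value here is an assumption for illustration, not the script's actual configuration.

```ini
; Illustrative fio job only -- the diagnostics script builds its own jobs
; inside the UVMs. All values below are assumptions for illustration.
[seq-write]
rw=write          ; sequential writes
bs=1m             ; large block size, typical for bandwidth tests
size=4g           ; total data written per job
direct=1          ; bypass the page cache
ioengine=libaio   ; asynchronous I/O
iodepth=32        ; outstanding I/Os per job
```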
4. The test might take up to 15 minutes to complete on a four-node cluster. Allow more time for larger clusters.
5. When the test completes, review the results. You can also review the archived results in the /home/nutanix/diagnostics/results/<timestamp> directory.
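Each run is archived under its own timestamped directory, so the most recent run is the newest directory. A minimal sketch of locating it is shown below; a mock path and made-up timestamp names are used here so the commands run anywhere, whereas on a CVM you would point at /home/nutanix/diagnostics/results instead.

```shell
# Sketch: pick the newest timestamped results directory. The path and the
# directory names below are mock values for illustration.
results=/tmp/mock_results
mkdir -p "$results/2024-01-10_10-00-00"
sleep 1   # ensure the second directory gets a newer mtime
mkdir -p "$results/2024-01-10_11-30-00"

# ls -t sorts by modification time, newest first
latest=$(ls -t "$results" | head -n 1)
echo "Most recent run: $latest"
```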
6. Because the test creates new cluster entities, it is necessary to run a cleanup script when you are finished.
nutanix@CVM:~$ ~/diagnostics/diagnostics.py cleanup
Diagnostics output
System output similar to the following indicates a successful test.
Checking if an existing storage pool can be used ...
Using storage pool sp1 for the tests.
Checking if the diagnostics container exists ... does not exist.
Creating a new container NTNX-diagnostics-ctr for the runs ... done.
Mounting NFS datastore 'NTNX-diagnostics-ctr' on each host ... done.
Deploying the diagnostics UVM on host 172.16.8.170 ... done.
Preparing the UVM on host 172.16.8.170 ... done.
Deploying the diagnostics UVM on host 172.16.8.171 ... done.
Preparing the UVM on host 172.16.8.171 ... done.
Deploying the diagnostics UVM on host 172.16.8.172 ... done.
Preparing the UVM on host 172.16.8.172 ... done.
Deploying the diagnostics UVM on host 172.16.8.173 ... done.
Preparing the UVM on host 172.16.8.173 ... done.
VM on host 172.16.8.170 has booted. 3 remaining.
VM on host 172.16.8.171 has booted. 2 remaining.
VM on host 172.16.8.172 has booted. 1 remaining.
VM on host 172.16.8.173 has booted. 0 remaining.
Waiting for the hot cache to flush ... done.
Running test 'Prepare disks' ... done.
Waiting for the hot cache to flush ... done.
Running test 'Sequential write bandwidth (using fio)' ... bandwidth MBps
Waiting for the hot cache to flush ... done.
Running test 'Sequential read bandwidth (using fio)' ... bandwidth MBps
Waiting for the hot cache to flush ... done.
Running test 'Random read IOPS (using fio)' ... operations IOPS
Waiting for the hot cache to flush ... done.
Running test 'Random write IOPS (using fio)' ... operations IOPS
Tests done.
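When reviewing an archived run, the four headline result lines can be pulled out of the log with grep. A minimal sketch follows, using a stand-in file built from the sample output above; on a CVM you would grep the archived log under the results directory from step 5 instead.

```shell
# Stand-in log built from the sample output in this article.
log=/tmp/diag_sample.log
cat > "$log" <<'EOF'
Running test 'Sequential write bandwidth (using fio)' ... bandwidth MBps
Running test 'Sequential read bandwidth (using fio)' ... bandwidth MBps
Running test 'Random read IOPS (using fio)' ... operations IOPS
Running test 'Random write IOPS (using fio)' ... operations IOPS
EOF

# Print just the per-test result lines
grep "using fio" "$log"
```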
Note:
- Expected results vary based on the specific AOS version and hardware model used.
- The IOPS values reported by the diagnostics script are higher than the values reported by the Nutanix management interfaces. This is because the diagnostics script reports physical disk I/O, and the management interfaces show IOPS reported by the hypervisor.
Additional Information
Related Articles
- Lenovo ThinkAgile HX Series knowledge base article landing page
- How to run the NCC health check and collect the output using Nutanix Prism
- How to run the NCC health check and collect the output using the Nutanix CVM CLI
- How to collect hypervisor logs using SSH to connect to a Controller VM in ThinkAgile HX systems