Nutanix NCC Health Check: check_ntp
Description
The Nutanix NCC health check plugin check_ntp verifies the NTP configuration of the CVMs (Controller VMs) and hypervisor hosts. It also checks if there are any time drifts on the cluster.
The plugin check_ntp contains multiple individual checks that focus on specific NTP-related scenarios:
- CVM/PCVM NTP time synchronization - determines whether the CVM/PCVM is able to synchronize time with any of the configured NTP servers
- Hypervisor NTP time synchronization (AHV + ESXi only) - determines whether the host is able to synchronize time with any of the configured NTP servers
Note: The NTP configuration check (check ID 103076) is retired in NCC version 4.0.0.
This plugin also runs on Prism Central (PC), except for the hypervisor check.
This health check plugin was introduced in NCC version 3.1 and converges all NTP checks from previous NCC versions. On Prism Central, this check was introduced in NCC version 3.5.3. The alert function of these checks was introduced in NCC 3.6.2.
Possible causes
If this health check returns a non-PASS result, the following are possible causes:
- There are no NTP servers configured on the cluster.
- There are no NTP servers configured on the hypervisor.
- All or some NTP servers configured on the hypervisor are not the same as those configured on the CVMs or PC VMs.
- A configured NTP server is not reachable or not responding to NTP queries.
- A configured NTP server is not reliable or stable.
- The NTP server is configured with a hostname but cannot be resolved due to DNS/name resolution issues.
- NTP Port (UDP/123) is not open.
- The time on the cluster is out of sync and found to be in the future by at least 5 seconds when compared to the actual time on the NTP servers.
- The NTP server is passing a parameter that the NTP client of CVM or PC VM considers unsuitable for NTP synchronization, such as a high dispersion value, offset, jitter, reach, or stratum.
- A Windows-based NTP server (AD PDC) that uses its local clock as its time source will, by default, advertise itself as a less suitable NTP source by reporting a dispersion value of 10 seconds in its NTP responses. W32Time is not designed to provide the precision required by NTP and does not guarantee better than +/- 5-minute accuracy.
- The genesis service has recently restarted and NTP synchronization is still pending, or the NTP configuration has recently been changed and the change has not yet taken effect. Per the NTP protocol, it takes about 5 minutes (five good samples) before an NTP server is accepted as a synchronization source. Waiting 10-15 minutes and rerunning the check may produce a different result once the change has had time to take effect and time has synchronized.
For example, after restarting genesis, the ntpq command shows that the time is still synchronizing with .LOCL.
remote refid st t when poll reach delay offset jitter
==============================================================================
x.x.x.x x.x.x.x 2 u 2 64 1 58.698 93.111 0.000
*127.127.1.0 .LOCL. 10 l 1 64 1 0.000 0.000 0.000
Then, after waiting 10-15 minutes, ntpq command now shows:
remote refid st t when poll reach delay offset jitter
==============================================================================
*x.x.x.x x.x.x.x 2 u 7 64 177 58.523 93.156 0.646
127.127.1.0 .LOCL. 10 l 20 64 177 0.000 0.000 0.000
Hence, rerunning the check immediately will fail, but rerunning it after some time, say 10-15 minutes, should PASS.
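To watch the synchronization converge after a genesis restart or an NTP configuration change, you can poll ntpq a few times over several minutes; this is a minimal sketch (the interval and iteration count are arbitrary):
nutanix@cvm$ for i in 1 2 3; do ntpq -pn; sleep 300; done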
Symptoms and impact
If this health check returns a non-PASS result, the cluster operation may be at risk of various symptoms/impacts such as:
- Users not being able to log on to the Prism web console using LDAP or other directory integrated services.
- Cluster not being able to start or function correctly due to major time-skew after outage or maintenance.
- Inaccurate logging and log collection.
- Inaccurate results from health checks that rely on accurate time frames and event correlation.
- Incorrect and skewed graphs in Prism.
- User VMs starting on hypervisor hosts with inaccurate RTC (real-time clocks) causing guest OS time skew.
- Third-party backup software products like Veeam or Commvault having trouble interacting with the cluster.
- Snapshots expiring too soon or too late when the time between a cluster and a remote site is out of sync.
Running the NCC Check
Run this check as part of the complete NCC Health Checks:
nutanix@cvm$ ncc health_checks run_all
Or run this check individually:
nutanix@cvm$ ncc health_checks network_checks check_ntp
You can also run the checks from the Prism Web console Health page: select Actions > Run Checks. Select All checks and click Run.
Sample output
For Status: INFO
INFO: The NTP servers configured on the hypervisor (['x.x.x.x', 'x.x.x.x']) differ from those configured in zeus config ([u'x.x.x.x', u'x.x.x.x']).
Refer to KB 4519 (http://portal.nutanix.com/kb/4519) for details on check_ntp or Recheck with: ncc health_checks network_checks check_ntp --cvm_list=x.x.x.x
For Status: FAIL
FAIL: This CVM is the NTP leader but is not syncing time with any external NTP server.
Refer to KB 4519 (http://portal.nutanix.com/kb/4519) for details on check_ntp or Recheck with: ncc health_checks network_checks check_ntp --cvm_list=x.x.x.x
FAIL: NTP config on CVM is not yet updated with the NTP servers configured in the zeus config. The NTP config on the CVM will not be updated if the cluster time is in the future relative to the NTP servers.
Refer to KB 4519 (http://portal.nutanix.com/kb/4519) for details on check_ntp or Recheck with: ncc health_checks network_checks check_ntp --cvm_list=x.x.x.x
FAIL: CVM is not configured to sync time with NTP Leader CVM (x.x.x.x).
Refer to KB 4519 (http://portal.nutanix.com/kb/4519) for details on check_ntp or Recheck with: ncc health_checks network_checks check_ntp --cvm_list=x.x.x.x
FAIL: NTP is not configured on CVM.
Refer to KB 4519 (http://portal.nutanix.com/kb/4519) for details on check_ntp or Recheck with: ncc health_checks network_checks check_ntp --cvm_list=x.x.x.x
FAIL: NTP is not configured on Hypervisor.
Refer to KB 4519 (http://portal.nutanix.com/kb/4519) for details on check_ntp or Recheck with: ncc health_checks network_checks check_ntp --cvm_list=x.x.x.x
FAIL: NTP leader is not synchronizing to an external NTP server
Refer to KB 4519 (http://portal.nutanix.com/kb/4519) for details on check_ntp or Recheck with: ncc health_checks network_checks check_ntp --cvm_list=x.x.x.x
FAIL: No NTP servers are configured in the cluster configuration
Refer to KB 4519 (http://portal.nutanix.com/kb/4519) for details on check_ntp or Recheck with: ncc health_checks network_checks check_ntp --cvm_list=x.x.x.x
FAIL: The NTP leader is not synchronizing to any external NTP server because the cluster's time is in the future time relative to the external NTP servers: x.x.x.x
Refer to KB 4519 (http://portal.nutanix.com/kb/4519) for details on check_ntp or Recheck with: ncc health_checks network_checks check_ntp --cvm_list=x.x.x.x
FAIL: The hypervisor is not synchronizing with any NTP server
Refer to KB 4519 (http://portal.nutanix.com/kb/4519) for details on check_ntp or Recheck with: ncc health_checks network_checks check_ntp --cvm_list=x.x.x.x
For Status: ERR
ERROR: Failed to get NTP servers on hypervisor: x.x.x.x with stdout: message stderr: message
ERROR: Failed to run ntpq on the host
Refer to KB 4519 (http://portal.nutanix.com/kb/4519) for details on check_ntp or Recheck with: ncc health_checks network_checks check_ntp --cvm_list=x.x.x.x
ERROR: Error occurred when tried to sync to the external NTP servers x.x.x.x
Refer to KB 4519 (http://portal.nutanix.com/kb/4519) for details on check_ntp or Recheck with: ncc health_checks network_checks check_ntp --cvm_list=x.x.x.x
From NCC 4.0.0 onwards, for Status: WARN
Node x.x.x.x:
WARN: NTP is not configured on host (x.x.x.x). The NTP servers configured on the host ([]) differ from those configured on the cluster ([u'x.x.x.x'])
Node x.x.x.x:
WARN: NTP is not configured on host (x.x.x.x). The NTP servers configured on the host ([]) differ from those configured on the cluster ([u'x.x.x.x'])
Node x.x.x.x:
WARN: NTP is not configured on host (x.x.x.x). The NTP servers configured on the host ([]) differ from those configured on the cluster ([u'x.x.x.x'])
Refer to KB 4519 (http://portal.nutanix.com/kb/4519) for details on check_ntp or Recheck with: ncc health_checks network_checks check_ntp --cvm_list=x.x.x.x
This might occur if none of the configured NTP servers are available, or if you are currently experiencing network instability (indicated by high offset/high jitter).
Refer to KB 4519 (http://portal.nutanix.com/kb/4519) for details on check_ntp or Recheck with: ncc health_checks network_checks check_ntp --cvm_list=x.x.x.x
WARN: The NTP leader is not synchronizing to any external NTP server because the cluster's time is in the future relative to the external NTP servers: x.x.x.x
Refer to KB 4519 (http://portal.nutanix.com/kb/4519) for details on check_ntp or Recheck with: ncc health_checks network_checks check_ntp --cvm_list=x.x.x.x
Output messaging
Check ID | 103076 |
Description | Check that NTP is configured properly on the CVM and hypervisor |
Causes of failure | Detected problems with NTP configuration. |
Resolutions | Follow the instructions in KB 4519. |
Impact | Metadata operations or alerts might not work properly. |
Alert ID | A103076 |
Alert Title | Incorrect vm_type NTP Configuration |
Alert Message | vm_type NTP is not properly configured. |
Schedule | This check is scheduled to run every hour, by default. |
Number of failures to alert | This check will generate an alert after 2 failures. |
Note: Check ID 103076 is retired in NCC version 4.0.0.
Check ID | 3026 |
Description | Checks to ensure that the Controller VM is synchronizing time with an NTP server. |
Causes of failure | External NTP servers are not configured or are not reachable |
Resolutions | Verify that the external NTP servers are configured and reachable. |
Impact | Workflows involving Kerberos may fail if the time difference between the Controller VM and the NTP server is greater than 5 minutes. |
Alert ID | A3026 |
Alert Title | The vm_type is not synchronizing time with any external servers. |
Alert Message | The vm_type is not synchronizing time with any external servers. |
Schedule | This check is scheduled to run every hour, by default. |
Number of failures to alert | This check will generate an alert after 2 failures. |
Check ID | 103090 |
Description | Checks to ensure that the hypervisor is synchronizing time with an NTP server. |
Causes of failure | External NTP servers are not configured or are not reachable. |
Resolutions | Verify if the NTP servers are configured and reachable from the hypervisor. |
Impact | Logs may have different timestamps in the hypervisor and the CVMs. The hypervisor may not work as expected. |
Alert ID | A103090 |
Alert Title | The hypervisor is not synchronizing time with any external servers. |
Alert Message | The hypervisor is not synchronizing time with any external servers. |
Schedule | This check is scheduled to run every hour, by default. |
Number of failures to alert | This check will generate an alert after 2 failures. |
Solution
For clusters running ESXi 7.0.3 build 19193900, the check may give a false positive even when the NTP servers configured on the host and in the Prism UI are the same.
WARN: NTP is not configured on host (aa.bb.cc.51). Cluster ntp_servers: [u'dd.ee.ff.110', u'xx.yy.zz.110'].
Node 192.168.3.63:
WARN: NTP is not configured on host (aa.bb.cc.53). Cluster ntp_servers: [u'dd.ee.ff.110', u'xx.yy.zz.110'].
Node 192.168.3.62:
WARN: NTP is not configured on host (aa.bb.cc.52). Cluster ntp_servers: [u'dd.ee.ff.110', u'xx.yy.zz.110'].
Upgrade to NCC 4.5.0.1 to resolve the false positive.
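To confirm whether the warning is a false positive, you can compare the cluster-side and host-side NTP settings directly; a minimal sketch using commands referenced elsewhere in this article (ncli cluster get-ntp-servers is assumed to be available on your AOS version):
nutanix@cvm$ ncli cluster get-ntp-servers
nutanix@cvm$ hostssh "cat /etc/ntp.conf"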
General Troubleshooting Steps
If this check returns a non-PASS result, check the following:
- At least one, but preferably three or more reliable off-cluster NTP servers are configured on the cluster (CVMs/PCVMs) AND on the hosts (hypervisors - AHV, ESXi, Hyper-V, XenServer).
- To configure NTP servers on the CVMs and AHV, see Configuring NTP Servers in the Prism Web Console Guide. (Configuring NTP servers via Prism will update both the CVMs and the AHV hosts).
- To configure NTP servers on ESXi hosts, see Configuring Network Time Protocol (NTP) on ESX/ESXi hosts using the vSphere Client (2012069).
- To configure NTP servers on Hyper-V hosts, see Configuring NTP on Hyper-V below.
- For recommendations on which NTP servers to use, see Recommendations for Time Synchronization.
- The list of NTP servers configured on the hypervisors should preferably be the same as those configured on the CVMs.
- If the NTP server is set by using the FQDN or the hostname, ensure the cluster can resolve the IP address for the NTP FQDN against all the configured DNS Name Servers. An invalid Name Server configuration in Prism may prevent the NTP servers from being used and lead to time sync issues.
- NTP protocol destination port (UDP 123) is open to the target NTP servers through any ACLs/firewalls in the network path between all CVMs/hosts and the NTP servers.
- Try pinging the NTP servers using both FQDN and IP addresses to establish basic network connectivity. Be aware that some ACLs/firewalls intentionally block ping (ICMP echo) traffic while still allowing UDP/123, so an unreachable result is not necessarily a root cause, only possible insight into network connectivity issues. Use the next step to validate further (see also the verification sketch after this list).
- Regardless of whether the NTP server is reachable by ping on the network, ensure it is healthy and responding at the application layer to NTP queries with valid, usable, and accurate time information. You can verify whether NTP queries return time information by running the following command:
nutanix@cvm$ /usr/sbin/ntpdate -t 10 -q <NTP server IP or FQDN>
- Check the status of NTP synchronization on all CVMs and hosts using the procedure Reviewing the output of the "ntpq -pn" command below.
- Check the NTP configuration on all hosts using the procedure Reviewing the contents of the ntp.conf file below.
- This check may produce a non-PASS result after NTP is configured if the time has not yet been synchronized with the new/updated NTP configuration. If the NTP server has only recently been added, and the CVM time is not in the future relative to the NTP server (which would show as a negative offset), this check may fire until the NTP protocol has found a stable and suitable NTP source and the CVM has successfully synced (~10 minutes).
- If the configured NTP server(s) do not have their own reference clock (GPS/atomic clock), they must be configured with an external upstream time source of suitably low stratum (stratum 1-3 is good) and should not be syncing to their own local clock or an internal-only time source.
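As a quick way to exercise the name-resolution and reachability points above, the following sketch queries an NTP server from every CVM; the FQDN is a placeholder, and nslookup is assumed to be available on the CVMs:
nutanix@cvm$ allssh "nslookup ntp.example.com"
nutanix@cvm$ allssh "/usr/sbin/ntpdate -t 10 -q ntp.example.com"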
Notes:
- Synchronizing a Nutanix AOS/PC cluster with a Windows-based time source is known to cause issues over a period of time. Refer to KB 3851 Troubleshooting NTP Sync to Windows Time Servers.
Nutanix recommends that you do not synchronize a cluster's time with Windows time sources. Use reliable non-Windows time sources instead. See Recommendations for Time Synchronization in the Prism Web Console Guide.
- Do not use an NTP server as a time source for a Nutanix cluster and/or hypervisor if that NTP server is a user VM running as a guest on the very same cluster. This is unreliable, behaves unpredictably during user VM and cluster outages and restarts, and is not recommended.
- You do not need to configure the NTP servers on AHV hosts manually. Configuring NTP servers via Prism/ncli will update both the CVMs and the AHV hosts.
- When using the Prism web console or ncli to add the NTP servers on an ESXi-based AOS cluster, the NTP servers are not automatically added to the /etc/ntp.conf file of the host. After you add the NTP servers in Prism, you must also manually configure those NTP servers on the ESXi hosts. For more information about configuring NTP servers on the ESXi hosts, see Configuring Network Time Protocol (NTP) on ESX/ESXi hosts using the vSphere Client (2012069).
- In a mixed-hypervisor cluster (AHV + ESXi), as mentioned above, AHV hosts will be configured via Prism but you must manually configure the NTP servers on the ESXi hosts of the mixed-hypervisor cluster.
- On a Hyper-V cluster, the check_ntp plugin validates only the CVM NTP configuration. It does not check the NTP or time configuration of the Windows Hyper-V hosts, so the check does not return a FAIL status if the hypervisor is misconfigured or out of sync with the NTP sources and/or AD PDC. Manually confirm that the Hyper-V hosts and Domain Controllers have a healthy Windows time hierarchy (see the w32tm sketch after these notes). The AD PDC(s) should be using reliable upstream NTP time sources in parallel with the CVMs, potentially the same NTP servers (see the next point).
- Ideally, to simplify log comparison and avoid complex time-sync triage, the hypervisors and the Controller VMs should all use the same NTP servers. If they use different NTP servers, this health check may produce an INFO output to raise awareness, so that the difference is a conscious, reasonable configuration rather than an accidental misconfiguration, and so that it is quickly visible during any unrelated troubleshooting of the cluster in production.
For more information and best practices around Nutanix cluster time synchronization, refer to Cluster Time Synchronization in the Prism Web Console Guide on the Nutanix Support Portal.
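To manually review the Windows time hierarchy mentioned in the Hyper-V note above, the following w32tm queries (run on a Hyper-V host or Domain Controller) show the current time source and synchronization status; this is a generic Windows time-service sketch, not a Nutanix-specific procedure:
C:\> w32tm /query /source
C:\> w32tm /query /status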
Specific troubleshooting steps
- If the check reports "INFO: The NTP servers configured on the hypervisor x.x.x.x differ from those configured in Zeus config x.x.x.x", configure the same NTP servers on the cluster as well as the hypervisors.
- If the check reports "FAIL: The NTP leader is not synchronizing to any external NTP server because the cluster's time is in the future relative to the external NTP servers: xxxx", the cluster may have been started without a valid NTP sync status and pulling the CVM time backward may affect in-flight storage metadata operations. To resolve this particular issue of future CVM time, log a case with Nutanix Support for further assistance, and do not manually change any CVM date/time.
- If the check reports "FAIL: NTP leader is not synchronizing to any external NTP server", follow the General Troubleshooting Steps above. In case the above-mentioned steps do not resolve the issue, log a case with Nutanix Support, providing the results and any output from general troubleshooting and current cluster NTP configuration.
- If the check reports "FAIL: The hypervisor is not synchronizing with any NTP server", follow the General Troubleshooting Steps above. In case the above-mentioned steps do not resolve the issue, follow the steps below:
- On the host, restart the ntpd service using the procedure Restarting the ntpd service outlined below.
- Check if the host is now syncing time with NTP using the procedure Reviewing the output of the "ntpq -pn" command below. Be sure to wait ~10 minutes for sync.
- If not all hosts are synchronizing correctly, follow the procedure Reviewing the contents of the ntp.conf file below.
- If the issue is still not resolved, consider engaging Nutanix Support, providing the results, and any output from general troubleshooting and current cluster NTP configuration.
- If the check reports "FAIL: This CVM is the NTP leader but is not syncing time with any external NTP server" and you have verified that the NTP server has been set:
- The configured NTP server(s) may be overwhelmed, or may be purposefully rate-limiting NTP client requests to protect themselves from (accidental or deliberate) DDoS, and are therefore not responding to valid NTP requests from the CVM NTP leader. You can check whether your NTP server is rate-limiting requests by inspecting the CVM genesis service log file for an error entry containing "rate limit response from server":
nutanix@cvm$ allssh "grep -A 1 -i 'rate limit' ~/data/logs/genesis.out | tail"
...
2018-12-12 11:03:14 ERROR node_manager.py:3941 Systime update with ntpdate failed with error: 1: 12 Dec 11:03:14 ntpdate[26695]: n.n.n.101 rate limit response from server.
2018-12-12 11:03:14 ntpdate[26695]: no server suitable for synchronization found
- If you do not control the affected NTP server, then remove it from Prism's NTP configuration and add a different, more reliable NTP server.
- If you control the source NTP server configuration, consider adding restriction exceptions for the CVM/host IPs. See your NTP server's own documentation for details. For example, on a Linux-based ntpd service, the following line would need to be added to the NTP server's /etc/ntp.conf file and then reloaded:
restrict <CVM/host network> mask <subnet mask>
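For example, on a hypothetical ntpd server where the CVMs and hosts live in the 10.0.0.0/24 network (the subnet and mask below are illustrative only), the exception might look similar to the restrict line shown later in this article:
restrict 10.0.0.0 mask 255.255.255.0 nomodify notrap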
- CVM time may be ahead of NTP server time, and the CVM's genesis service will intentionally prevent NTP syncing. This may be further evidenced in the affected CVM's Genesis logs by running the following command and looking for a negative offset between CVM and NTP source:
nutanix@cvm$ allssh "grep -i ntp /home/nutanix/data/logs/genesis.out | tail"Example output:2019-02-03 22:42:11 INFO node_manager.py:2314 Querying upstream NTP servers: 10.x.x.11
2019-02-03 22:42:12 INFO node_manager.py:2334 NTP offset: -89.328 seconds
2019-02-03 22:42:12 INFO node_manager.py:2354 Time is ahead of external NTP server by 89.328 seconds, not syncing time while cluster services are running
2019-02-03 22:42:12 INFO node_manager.py:2230 Restarting the NTP server.
2019-02-03 23:02:13 ERROR node_manager.py:2450 External NTP still unusable (0)
2019-02-03 23:02:13 WARNING node_manager.py:2456 Disabling upstream NTP servers
2019-02-03 23:02:13 INFO node_manager.py:2202 Stopping the NTP server.
2019-02-03 23:02:13 INFO node_manager.py:2230 Restarting the NTP server.
2019-02-03 23:12:13 INFO node_manager.py:2314 Querying upstream NTP servers: 10.x.x.11
2019-02-03 23:12:13 INFO node_manager.py:2334 NTP offset: -89.297 seconds
In the example output above, the cluster is not synchronizing with a newly added NTP server. In this situation, the NTP server is running 89 seconds behind the CVM and is therefore considered unusable as an NTP source.
Important: If the CVM time is in the future, DO NOT manually set the clock backwards! Contact Nutanix Support for assistance and provide the above output.
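For a quick view of how far apart the clocks currently are, you can also compare timestamps across all CVMs and hosts; a minimal sketch (output ordering depends on SSH timing, so treat second-level differences as approximate):
nutanix@cvm$ allssh date
nutanix@cvm$ hostssh date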
- If the check reports "ERR: Failed to run ntpq on the host": Run the following command on each CVM and ensure the command runs successfully.
nutanix@cvm$ ntpq -pn
If the command fails to run or the NCC check reports an ERR status again, investigate the CVMs for free memory. Log a case with Nutanix Support for further assistance.
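A minimal sketch for the free-memory check mentioned above (no specific threshold is defined here; unusually low free/available memory on a CVM is what to look for):
nutanix@cvm$ allssh "free -m"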
Reviewing the output of the "ntpq -pn" command
The command 'ntpq -pn' is the main command used by this check to identify the NTP synchronization status of the CVM or the host.
Each line of the results will be of the following format (example output only; actual IPs, rows of NTP servers, and related values will differ based on individual configurations):
remote refid st t when poll reach delay offset jitter
==============================================================================
*144.xx.xx.166 202.xx.xx.118 2 u 817 1024 377 6.607 2.162 1.274
+203.xx.xx.191 216.xx.xx.202 2 u 729 1024 377 1.963 5.527 4.090
+203.xx.xx.2 216.xx.xx.202 2 u 1063 1024 377 1.662 -9.615 2.289
127.127.1.0 .LOCL. 10 l 28h 64 0 0.000 0.000 0.000
Where remote is the remote peer or server being synced to. "127.127.1.0 .LOCL." is the local clock of this host (included as a fallback in case no remote peers or servers are available).
The first character displayed on each line is a state flag. A synchronized state, represented by '*' as the first character of one remote NTP server entry, is expected. ('+' marks an acceptable candidate, 'x' marks a server rejected as a falseticker, and a blank flag means that server is not currently being used for synchronization.)
Note: It can take 10-15 minutes for this synchronized status to appear if the CVM holding the NTP leader role has recently changed (for example, after a genesis restart) or the NTP server configuration has been modified.
- To check the NTP status on all CVMs, run the following command from one CVM:
nutanix@cvm$ allssh ntpq -pn
The following example is a good result - showing the CVM NTP leader is synchronized with an external NTP server and the other CVMs are synchronized with the CVM NTP leader.
================== 10.xx.xx.61 =================
remote refid st t when poll reach delay offset jitter
==============================================================================
+10.xxx.xxx.21 10.xx.xx.15 4 u 654 1024 377 0.812 -1.026 0.429
+10.xxx.xxx.22 10.xx.xx.15 4 u 997 1024 377 0.830 -0.998 0.533
+10.xxx.xxx.10 10.xx.xx.15 4 u 409 1024 377 1.365 -1.159 5.158
*10.xxx.xxx.11 10.xx.xx.15 4 u 579 1024 377 1.626 -1.055 0.326 <--- Synchronized with a configured NTP server 10.xx.xx.11
127.127.1.0 .LOCL. 10 l 27h 64 0 0.000 0.000 0.000
================== 10.xx.xx.62 =================
remote refid st t when poll reach delay offset jitter
==============================================================================
*10.xx.xx.61 10.xx.xx.11 5 u 1065 1024 377 0.353 2.584 1.355 <--- Synchronized with the CVM NTP leader 10.xx.xx.61
================== 10.xx.xx.63 =================
remote refid st t when poll reach delay offset jitter
==============================================================================
*10.xx.xx.61 10.xx.xx.11 5 u 722 1024 377 0.192 1.775 1.682 <--- Synchronized with the CVM NTP leader 10.xx.xx.61
Below is an example of a problematic result. The CVM NTP leader is synchronized only with its local clock:
================== 10.xx.xx.61 =================
remote refid st t when poll reach delay offset jitter
==============================================================================
127.127.1.0 .LOCL. 10 l 27h 64 0 0.000 0.000 0.000 <--- CVM NTP leader synchronized only with its local clock
================== 10.xx.xx.62 =================
remote refid st t when poll reach delay offset jitter
==============================================================================
*10.xx.xx.61 10.xx.xx.11 5 u 1065 1024 377 0.353 2.584 1.355 <--- Synchronized with the CVM NTP leader 10.xx.xx.61
================== 10.xx.xx.63 =================
remote refid st t when poll reach delay offset jitter
==============================================================================
*10.xx.xx.61 10.xx.xx.11 5 u 722 1024 377 0.192 1.775 1.682 <--- Synchronized with the CVM NTP leader 10.xx.xx.61
If the IP '127.127.1.0' is being used, it signifies that the CVMs are synchronizing with the NTP leader only ('127.127.1.0' is the local clock pseudo-address) and that the leader is NOT syncing to any external NTP server at the time the check was performed.
- To check the NTP status on all hosts/hypervisors, run the following command from one CVM:
nutanix@cvm$ hostssh ntpq -pn
The following example is a good result. All hosts are syncing with the same NTP servers. If the NTP IP addresses are not consistently the same across all hosts, check /etc/ntp.conf to see whether the hosts use a hostname/FQDN that represents a pool of NTP servers. NTP pools are made up of many round-robin DNS entries, so at initialization time the DNS response given to each host as it starts the NTP service may return a different IP address to use as an NTP server.
============= 192.xx.xx.1 ============
remote refid st t when poll reach delay offset jitter
==============================================================================
*10.xx.xx.15 218.1xx.xx.70 2 u 822 1024 377 96.679 12.968 3.105
10.xx.xx.16 .INIT. 16 u - 1024 0 0.000 0.000 0.000
+10.xx.xx.21 203.xx.xx.251 3 u 27 1024 377 0.609 -23.479 4.167
============= 192.xx.xx.2 ============
remote refid st t when poll reach delay offset jitter
==============================================================================
*10.xx.xx.15 218.xx.xx.70 2 u 8 1024 157 2.513 3.510 2.980
10.xx.xx.16 .INIT. 16 u - 1024 0 0.000 0.000 0.000
+10.xx.xx.21 203.xx.xx.251 3 u 253 1024 377 0.665 -8.794 5.203
============= 192.xx.xx.3 ============
remote refid st t when poll reach delay offset jitter
==============================================================================
*10.xx.xx.15 218.xx.xx.70 2 u 184 1024 377 96.566 17.003 4.010
10.xx.xx.16 .INIT. 16 u - 1024 0 0.000 0.000 0.000
+10.xx.xx.21 203.xx.xx.251 3 u 394 1024 377 0.659 -18.181 5.601
- If you see the following message on an AHV host when running ntpq:
No association ID's returned
Confirm if you are running an AHV el6 kernel by running the following commands:
nutanix@cvm$ ssh root@192.168.5.1
[root@ahv]# cat /etc/nutanix-release
If you are running on an el6 kernel, you will see an output similar to below:
el6.nutanix.20170830.151
To fix this issue temporarily (workaround), restart the ntpd service on the host using the procedure Restarting the ntpd service below, then rerun this NCC check to confirm.
To fix this issue permanently, upgrade AOS to 5.5.8, 5.9.2, 5.10 or later.
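To check the kernel family on all hosts at once rather than one at a time, the same commands can be wrapped in hostssh; a minimal sketch:
nutanix@cvm$ hostssh "uname -r"
nutanix@cvm$ hostssh "cat /etc/nutanix-release"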
- If you see the following message on an ESXi host when running ntpq, it means the ESXi/ESX host cannot reach the configured NTP server:
No association ID's returned
Confirm the time on all hosts is correct and consistent using the hostssh date command.
Confirm the NTP server IPs are configured on the host by checking /etc/ntp.conf.
Confirm if the DNS server configuration on the hosts is correct with the below command:
nutanix@cvm$ ssh root@192.168.5.1 esxcli network ip dns server list >>> To check on a single host
nutanix@cvm$ hostssh "esxcli network ip dns server list" >>> To check on all hosts
To resolve this, correct the DNS server configuration with the following command. Alternatively, add the correct DNS configuration in vCenter:
[root@Esxi:~] esxcli network ip dns server add --server=<DNS server IP>
- If you see the following message on an AHV host when running ntpq:
Name or service not known
This issue may be caused by the ntpq command being unable to resolve "localhost" into 127.0.0.1.
To fix this issue, log a case with Nutanix Support providing the results and any output from general troubleshooting and current host NTP configuration.
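As a quick sanity check before logging the case, you can verify that each host's /etc/hosts still maps "localhost" to 127.0.0.1; a read-only sketch (do not modify the file without guidance from Nutanix Support):
nutanix@cvm$ hostssh "grep -i localhost /etc/hosts"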
- You might see the following kind of output when you run ntpq -pn on a PCVM (for more information on the ntpq command, see the ntpq man page):
nutanix@PCVM:~$ ntpq -pn
remote refid st t when poll reach delay offset jitter
==============================================================================
x10.48.147.26 .GNSS. 1 u 30 64 377 0.910 -4549.1 22.565
x10.65.140.26 .GNSS. 1 u 58 64 377 0.251 -4527.7 15.504
*127.127.1.0 .LOCL. 10 l 29 64 277 0.000 0.000 0.000
In this example, both external servers are flagged 'x' (rejected as falsetickers because of their large offsets), so the PCVM has fallen back to synchronizing with its local clock.
Reviewing the contents of the ntp.conf file
- Review the output of the ntpq -pn command using the procedure above.
- If not all AHV or ESXi hosts are syncing time with NTP, check the /etc/ntp.conf files of all hosts.
Below is a sample output where only 2 out of 3 hosts are successfully syncing with NTP.
nutanix@cvm$ hostssh cat /etc/ntp.conf
In the sample output below, hosts 10.xx.xx.1 and 10.xx.xx.2 are successfully syncing with NTP, while 10.xx.xx.3 is failing because its configuration restricts NTP sync:
============= 10.xx.xx.1 ============
restrict default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
server 10.xx.xx.8
driftfile /etc/ntp.drift
============= 10.xx.xx.2 ============
restrict default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
server 10.xx.xx.8
driftfile /etc/ntp.drift
============= 10.xx.xx.3 ============
tinker panic 0
server 10.xx.xx.8
driftfile /var/lib/ntp/drift
logfile /var/log/ntp.log
restrict 10.8.x.x mask 255.255.255.0 nomodify notrap
interface ignore wildcard
interface listen br0
restrict 127.0.0.1
restrict -6 ::1
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
disable monitor
- To resolve this, follow the General Troubleshooting Steps above. Note that AHV hosts are also configured along with CVMs via Prism.
- In case of a transient upstream NTP or connectivity issue, restart the ntpd service using the procedure below.
- Wait for 5-10 minutes and run the following command from one of the CVMs to check if all hypervisors are now synchronizing with the NTP server:
nutanix@cvm$ hostssh ntpq -pn
- Run NCC check again.
- If the above-mentioned steps do not resolve the issue, log a case with Nutanix Support, providing the results and any output from general troubleshooting and current cluster NTP configuration.
Note: On ESXi, "interface listen br0" being listed in /etc/ntp.conf has been known to cause the above issue. The line should be removed and ntpd service restarted.
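To identify which hosts have the problematic directive before editing anything, a minimal sketch:
nutanix@cvm$ hostssh "grep -n 'interface listen' /etc/ntp.conf"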
Restarting the ntpd/w32time service
On AHV el6 or ESXi, run:
[root@host]# /etc/init.d/ntpd restart
On AHV el7, run:
[root@AHV]# systemctl restart ntpd
To check whether the installed AHV version belongs to the el6 or el7 family, use the following command (the el6/el7 string in the kernel release indicates the family):
[root@AHV]# uname -r
4.19.84-2.el7.nutanix.20190916.410.x86_64
On Hyper-V, run:
C:\> net stop w32time
C:\> net start w32time
Configuring NTP on Hyper-V
Hyper-V 2016 hosts use the Domain Controller as their NTP source. To configure external NTP sources on the Active Directory Domain Controller(s):
- Open a command prompt on the DC with administrative permissions.
- Stop the time service:
C:\> net stop w32time
- Set the manual peer list of external servers:
C:\> w32tm /config /syncfromflags:manual /manualpeerlist:"<NTP server list>"
- Set the connection as reliable:
C:\> w32tm /config /reliable:yes
- Start the time service back up:
C:\> net start w32time
- Test the configuration:
C:\> w32tm /query /configuration
C:\> w32tm /query /status
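Putting the steps above together, a complete example might look like the following; the server names are placeholders and should be replaced with your own NTP sources:
C:\> net stop w32time
C:\> w32tm /config /syncfromflags:manual /manualpeerlist:"ntp1.example.com ntp2.example.com"
C:\> w32tm /config /reliable:yes
C:\> net start w32time
C:\> w32tm /query /status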
Additional Information
- Nutanix KB 4519 - Original document in Nutanix Portal