NCC Health Check: fs_inconsistency_check


Description

The NCC health check fs_inconsistency_check verifies whether any CVM (Controller VM) in the cluster is experiencing filesystem inconsistencies by checking for EXT4-fs error/warning messages in dmesg and scanning tune2fs output for all mounted disks. From NCC 4.4.0 onwards, if the failed disk causing inconsistencies is unmounted from the cluster, the check will skip the execution on the removed disk and pass.
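
The dmesg part of the description above can be approximated by hand. The following is a minimal sketch, not the NCC implementation: the function names are hypothetical, and /dev/sdX1 is a placeholder for a real device. The functions read dmesg-style text on standard input so that they also work on saved logs.

```shell
# Sketch only: approximates the dmesg scan described above; this is not NCC's code.

# Print kernel log lines that report EXT4-fs errors or warnings.
# Reads dmesg-style text on stdin.
ext4_messages() {
    grep -E 'EXT4-fs (error|warning)' || true   # no match is not a failure here
}

# Count the matching lines.
count_ext4_messages() {
    ext4_messages | wc -l | tr -d ' '
}

# Example usage on a CVM (hypothetical device name /dev/sdX1):
#   sudo dmesg -T | count_ext4_messages
#   sudo tune2fs -l /dev/sdX1 | grep -i 'Filesystem state'
```

A non-zero count, or a filesystem state other than clean, would suggest the same condition that the check reports.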

This plugin was introduced in NCC version 3.9.3.

The check runs on CVMs on all platforms and hypervisors. It is scheduled to run once every 24 hours and examines the prior 24 hours of data in the CVM dmesg ring buffer.
Starting with NCC 4.1.0, this check generates the alert A3038 after 1 concurrent failure across scheduled intervals.

In NCC 4.5.0, the dependency on dmesg logs is removed. The check instead reads a system counter that reports the number of errors in real time.
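
This article does not name the counter that NCC reads. As an illustration only, the Linux kernel exposes a per-filesystem ext4 error counter in sysfs at /sys/fs/ext4/<device>/errors_count; assuming a counter of that kind, a manual inspection could look like the sketch below. The function name and the optional directory argument are hypothetical.

```shell
# Sketch only: assumes the ext4 sysfs attribute errors_count; NCC may read a different counter.

# Print "<device> <errors_count>" for every ext4 filesystem listed in sysfs.
# The root directory can be overridden (first argument) to inspect a copy.
ext4_error_counts() (
    root="${1:-/sys/fs/ext4}"
    for f in "$root"/*/errors_count; do
        [ -r "$f" ] || continue                 # glob did not match or file unreadable
        dev=$(basename "$(dirname "$f")")
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
)

# Example usage on a CVM:
#   ext4_error_counts
```

Any device with a non-zero count has logged ext4 errors since it was mounted.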

Running the NCC check

The check can be run as part of the complete NCC check by running:

nutanix@CVM$ ncc health_checks run_all

It can also be run individually as follows:

nutanix@CVM$ ncc health_checks system_checks fs_inconsistency_check

You can also run the check from the Prism web console Health page. Select Actions > Run Checks > All Checks > Run.

Sample Output

For Status: PASS

Running : health_checks system_checks fs_inconsistency_check
[==================================================] 100%
/health_checks/system_checks/fs_inconsistency_check [ PASS ]
------------------------------------------------------------------------------+
+-----------------------+
| State         | Count |
+-----------------------+
| Pass          | 1     |
| Total Plugins | 1     |
+-----------------------+

If the check results in a PASS, there are no filesystem inconsistencies detected. No action needs to be taken.


For Status: WARN

Running : health_checks system_checks fs_inconsistency_check
[==================================================] 100%
/health_checks/system_checks/fs_inconsistency_check [ WARN ]
------------------------------------------------------------------------------+
Detailed information for fs_inconsistency_check:
Node x.y.z.10:
WARN: 2 EXT4-fs error messages are detected in dmesg. Errors occurred are:
[Tue Nov 19 06:08:46 2019] EXT4-fs error (device sdaX): ext4_lookup:1441: inode #xxxxxx: comm postdrop: deleted inode referenced: 532994
[Tue Nov 19 06:09:14 2019] EXT4-fs error (device sdaX): ext4_lookup:1441: inode #xxxxxx: comm postdrop: deleted inode referenced: 532194
Refer to KB 8514 (http://portal.nutanix.com/kb/8514) for details on fs_inconsistency_check or Recheck with: ncc health_checks system_checks fs_inconsistency_check --cvm_list=x.y.z.10

If one or more CVMs are logging filesystem inconsistencies, the check results in a WARN.

Note: From NCC 4.5.0 onwards, the severity is changed to FAIL. When the check fails, the end user sees a "Critical" alert in the UI and a "FAIL" status on the CLI.

Output messaging

Check ID             3038
Description          Captures EXT4-fs error messages
Causes of failure    File system inconsistencies are present on the node.
Resolutions          Look for any problems in the file system. Review KB 8514.
Impact               The inability of the CVM to boot or for the upgrade pre-checks to run.
Alert ID             A3038
Alert Title          File system inconsistencies are detected.
Alert Smart Title    File system inconsistencies are detected on CVM: cvm_IP
Alert Message        EXT4 file system errors are detected on CVM: cvm_ip: alert_msg

Solution

Investigating a WARN

Should the check report EXT4 filesystem errors on one or more CVMs, consider engaging Nutanix Support.

To speed up the resolution time and minimize possible impact, avoid performing any activity on the cluster that would involve a reboot (including upgrades). Also, collect and attach the following information to the support case:

  • A complete NCC report:
nutanix@CVM$ ncc health_checks run_all
  • A log bundle generated from the cluster. This can be collected through the Prism web console Health page: select Actions > Collect Logs. Logs can also be collected from the command line using logbay (KB 6691 - NCC - Logbay Quickstart Guide):
nutanix@CVM$ logbay collect
  • The output of the following command, collected from each CVM that reports EXT4-fs errors:
nutanix@CVM$ sudo dmesg -T
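
When reviewing the collected dmesg output, it can help to list which devices the errors refer to. The sketch below is an illustration only; the function name is hypothetical, and the pattern assumes messages in the "EXT4-fs error (device ...)" form shown in the sample WARN output.

```shell
# Sketch only: extracts device names from EXT4-fs error lines in dmesg-style text.

# Reads dmesg output on stdin and prints each affected device once, sorted.
ext4_error_devices() {
    sed -n 's/.*EXT4-fs error (device \([^)]*\)).*/\1/p' | sort -u
}

# Example usage on a CVM:
#   sudo dmesg -T | ext4_error_devices
```

The device list can be included in the support case alongside the full dmesg output.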

Additional Information

Document ID: HT514183
Original publish date: 09/09/2022
Last modified date: 09/21/2022