Chkdsk.exe is the command-line interface for the CHKDSK program, which verifies the logical integrity of a file system. If CHKDSK encounters logical inconsistencies in file system data, it performs actions that repair that data (provided that CHKDSK is not running in read-only mode).
The /C and /I switches are valid only for a drive that is formatted in the NTFS file system. Each of the new switches directs the CHKDSK routine to bypass certain actions that CHKDSK would otherwise take to validate the integrity of NTFS data structures.
If you run CHKDSK online, the code that actually performs the verification resides in utility DLLs, for example Untfs.dll and Ufat.dll. The verification routines that CHKDSK invokes are the same routines that run when a volume is verified through the Windows Explorer or Disk Management graphical user interface.
However, if CHKDSK is scheduled to run when the computer restarts, the binary module that contains the verification code is Autochk.exe, a native Windows program. Because Autochk.exe runs early in the computer’s startup sequence, Autochk.exe does not have the benefit of virtual memory or of other Win32 services.
Autochk.exe generates the same kind of text output that the Chkdsk.exe utility DLLs generate. Autochk.exe displays this text output during the startup process and also logs an event in the application event log. The logged event information includes as much of the text output as can fit into the event log’s data buffer.
Because both Autochk.exe and the verification code in the Chkdsk.exe utility DLLs are based on the same source code, the rest of this article uses the term “CHKDSK” to refer generically to either Autochk.exe or Chkdsk.exe. Likewise, because this article concerns only those CHKDSK changes that involve NTFS volumes, any statement that “CHKDSK does such-and-such” means that “CHKDSK does such-and-such when CHKDSK runs on an NTFS volume.”
Note that if you use the /C and /I switches, it is possible for a volume to still be corrupted even after CHKDSK runs. Therefore, it is recommended that you use these switches only if downtime must be kept to a minimum. These switches are intended for situations when you must run CHKDSK on exceptionally large volumes and you require flexibility in managing the downtime that occurs.
To understand when it might be appropriate to use the /C and /I switches, you need a basic understanding of some of the internal NTFS data structures, the kinds of corruption that can take place, what actions CHKDSK takes when it verifies a volume, and what the potential consequences are if you circumvent CHKDSK’s usual verification steps.
Understanding what CHKDSK does
CHKDSK’s activity is divided into three major passes, during which CHKDSK examines all the metadata on the volume, and an optional fourth pass.
Metadata is “data about data.” Metadata is the file system “overhead,” so to speak, that keeps track of information about all of the files that are stored on the volume. Metadata includes information about what allocation units make up the data for a given file, what allocation units are free, what allocation units contain bad sectors, and so on. The data that the file contains, on the other hand, is termed “user data.” NTFS protects its metadata through the use of a transaction log. User data is not protected in this way.
Phase 1: Checking files
During its first pass, CHKDSK displays a message that tells you that CHKDSK is verifying files and also displays the percent of verification that is completed, counting from 0 to 100 percent. During this phase, CHKDSK examines each file record segment in the volume’s master file table (MFT).
A specific file record segment in the MFT uniquely identifies every file and directory on an NTFS volume. The “percent completed” that CHKDSK displays during this phase is the percentage of the MFT that CHKDSK has verified. During this pass, CHKDSK examines each file record segment for internal consistency and builds two bitmaps, one representing the file record segments that are in use and the other representing the clusters on the volume that are in use.
At the end of this phase, CHKDSK has identified the space that is in use and the space that is available, both within the MFT and on the volume as a whole. NTFS keeps track of this information in bitmaps of its own, which are stored on the disk. CHKDSK compares its results with the bitmaps that NTFS keeps. If there are discrepancies, the discrepancies are noted in the CHKDSK output. For example, if a file record segment that was in use is found to be corrupted, the disk clusters that were associated with that file record segment are marked as “available” in the CHKDSK bitmap but are marked as “in use” in the NTFS bitmap.
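The bitmap comparison described above can be pictured with a small Python sketch. The record layout (in_use, corrupt, clusters) and the function names are hypothetical simplifications chosen for illustration; they are not the actual NTFS on-disk structures or CHKDSK's own code.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecordSegment:
    """Hypothetical, simplified stand-in for an MFT file record segment."""
    in_use: bool
    corrupt: bool = False
    clusters: list = field(default_factory=list)


def build_bitmaps(mft, total_clusters):
    """Walk the MFT once and derive the two in-memory bitmaps."""
    frs_bitmap = [False] * len(mft)
    cluster_bitmap = [False] * total_clusters
    for index, frs in enumerate(mft):
        # A corrupted record cannot vouch for its clusters, so in this
        # sketch it contributes nothing to the freshly built bitmaps.
        if frs.in_use and not frs.corrupt:
            frs_bitmap[index] = True
            for cluster in frs.clusters:
                cluster_bitmap[cluster] = True
    return frs_bitmap, cluster_bitmap


def find_discrepancies(computed, stored):
    """Return the positions where the computed and stored bitmaps disagree."""
    return [i for i, (a, b) in enumerate(zip(computed, stored)) if a != b]


mft = [
    FileRecordSegment(in_use=True, clusters=[0, 1]),
    FileRecordSegment(in_use=True, corrupt=True, clusters=[2, 3]),
    FileRecordSegment(in_use=False),
]
# Assumed on-disk cluster bitmap: clusters 0 through 3 are marked "in use".
stored_cluster_bitmap = [True, True, True, True, False, False]

_, computed_cluster_bitmap = build_bitmaps(mft, total_clusters=6)
mismatches = find_discrepancies(computed_cluster_bitmap, stored_cluster_bitmap)
print(mismatches)
```

In this example, the clusters that belonged to the corrupted record are the ones reported as discrepancies: the computed bitmap treats them as available, while the stored bitmap still marks them as in use.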
Phase 2: Checking indexes
During its second pass, CHKDSK displays a message that tells you that CHKDSK is verifying indexes and again displays the percent completed, counting from 0 to 100 percent. During this phase, CHKDSK examines each of the indexes on the volume.
Indexes are essentially NTFS directories. The “percent completed” that CHKDSK displays during this phase is the percentage of the total number of the volume’s directories that have been checked. During this pass, CHKDSK examines each directory that is on the volume, checking for internal consistency and verifying that every file and directory that is represented by a file record segment in the MFT is referenced by at least one directory. CHKDSK confirms that every file or subdirectory that is referenced in a directory actually exists as a valid file record segment in the MFT and also checks for circular directory references. Finally, CHKDSK confirms that the time stamps and file size information for the files are up-to-date in the directory listings for those files.
At the end of this phase, CHKDSK has made sure that there are no “orphaned” files and that all directory listings are for legitimate files. An orphaned file is a file for which there is a legitimate file record segment but for which there is no listing in any directory. An orphaned file often can be restored to its proper directory if that directory still exists. If the proper directory no longer exists, CHKDSK creates a directory in the root directory and places the file there. If CHKDSK finds directory listings for file record segments that are no longer in use, or for file record segments that are in use but that do not correspond to the file that is listed in the directory, CHKDSK simply removes the directory entry for the file record segment.
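As a rough illustration of these two checks, the following Python sketch cross-references a set of in-use file record segment numbers against the entries listed in each directory. The data model (a set of numbers and a dictionary of lists) and the name check_indexes are assumptions made for this example. For simplicity, the sketch ignores the file record segments of the directories themselves and does not model the case in which an entry points to the wrong file.

```python
def check_indexes(frs_in_use, directories):
    """Cross-check directory entries against in-use file record segments.

    frs_in_use:  set of file record segment numbers that are in use.
    directories: dict mapping a directory name to the list of file record
                 segment numbers that its entries reference.
    Returns (orphans, dangling_entries).
    """
    referenced = set()
    dangling_entries = []
    for directory, entries in directories.items():
        for frs_number in entries:
            if frs_number in frs_in_use:
                referenced.add(frs_number)
            else:
                # The entry points at a segment that is not in use.
                dangling_entries.append((directory, frs_number))
    # In-use segments that no directory lists are orphaned files.
    orphans = sorted(frs_in_use - referenced)
    return orphans, dangling_entries


frs_in_use = {10, 11, 12}
directories = {"root": [10, 11], "docs": [99]}

orphans, dangling_entries = check_indexes(frs_in_use, directories)
print(orphans)
print(dangling_entries)
```

Here segment 12 is an orphan because no directory lists it, and the entry in "docs" that refers to segment 99 is a candidate for removal because that segment is not in use.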
Phase 3: Checking security descriptors
During its third pass, CHKDSK displays a message that tells you that CHKDSK is verifying security descriptors and, for the third time, displays “percent completed,” counting from 0 to 100 percent. During this phase, CHKDSK examines each security descriptor that is associated with files or directories that are on the volume.
Security descriptors contain information about ownership of a file or directory, about NTFS permissions for the file or directory, and about auditing for the file or directory. The “percent completed” that CHKDSK displays during this phase is the percentage of the volume’s files and directories that have been checked. CHKDSK verifies that each security descriptor structure is well formed and is internally consistent. CHKDSK does not verify the actual existence of the users or groups that are listed or the appropriateness of the permissions that are granted.
Phase 4: Checking sectors
If the /R switch is in effect, CHKDSK runs a fourth pass to look for bad sectors in the volume’s free space. CHKDSK attempts to read every sector on the volume to confirm that the sector is usable. Even without the /R switch, CHKDSK always reads sectors that are associated with metadata. Sectors that are associated with user data are read during earlier phases of CHKDSK if the /R switch is specified.
When CHKDSK finds an unreadable sector, NTFS adds the cluster that contains that sector to its list of bad clusters. If the bad cluster is in use, CHKDSK allocates a new cluster to do the job of the bad cluster. If you are using a fault-tolerant disk, NTFS recovers the bad cluster’s data and writes the data to the newly allocated cluster. Otherwise, the new cluster is filled with a pattern of 0xFF bytes.
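The remapping logic in the preceding paragraph can be sketched in Python as follows. The volume dictionary, the tiny cluster size, and the function name remap_bad_cluster are hypothetical and exist only to make the sequence of steps concrete; the 0xFF fill pattern is the one that this article describes.

```python
FILL_BYTE = 0xFF
CLUSTER_SIZE = 8  # Unrealistically small, to keep the example readable.


def remap_bad_cluster(volume, bad_cluster, recovered_data=None):
    """Record a bad cluster and, if it was in use, allocate a replacement.

    recovered_data models the fault-tolerant case in which the original
    contents can be rebuilt; pass None when no redundancy is available.
    Returns the replacement cluster number, or None if none was needed.
    """
    volume["bad_clusters"].add(bad_cluster)
    if bad_cluster not in volume["in_use"]:
        # A free cluster only has to be withdrawn from future allocation.
        if bad_cluster in volume["free"]:
            volume["free"].remove(bad_cluster)
        return None
    replacement = volume["free"].pop(0)
    volume["in_use"].discard(bad_cluster)
    volume["in_use"].add(replacement)
    if recovered_data is not None:
        volume["data"][replacement] = recovered_data
    else:
        volume["data"][replacement] = bytes([FILL_BYTE]) * CLUSTER_SIZE
    return replacement


volume = {"bad_clusters": set(), "free": [7, 8], "in_use": {3}, "data": {}}
new_cluster = remap_bad_cluster(volume, bad_cluster=3)
print(new_cluster, volume["data"][new_cluster])
```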
If NTFS encounters unreadable sectors during the course of normal operation, NTFS remaps the sectors in the same way that it does when CHKDSK runs. Therefore, using the /R switch is usually not essential. However, using the /R switch is a convenient way to scan the entire volume if you suspect that a disk might have bad sectors.
Understanding CHKDSK time requirements
The preceding description of the phases of running CHKDSK gives you only a broad outline of the most important tasks that CHKDSK performs to verify the integrity of an NTFS volume. CHKDSK also makes many additional specific checks during each pass and several quick checks between passes. However, even such a broad outline provides some basis for the following discussion of the variables that affect the amount of time that CHKDSK takes to run and of the impact of the new /C and /I switches that are available in Windows XP.
Variable 1: The “Indexes” phase
During the first and third phases of running CHKDSK (checking files and checking security descriptors), the progress of the “percent completed” indicator is relatively smooth. Unused file record segments do require less time to process, and large security descriptors do take more time to process, but overall the “percent completed” is a fairly accurate reflection of the actual time that the phase requires.
However, this percentage/time relationship is not necessarily applicable to the second phase, when CHKDSK examines indexes (NTFS directories). The time that it takes to process a directory is closely tied to the number of files and subdirectories that are in that directory, but the “percent completed” during this phase is based only on the number of directories that CHKDSK must examine. There is no adjustment for how long it might take, for example, to process a directory that has an extremely large number of files and subdirectories. Unless the directories on a volume all contain about the same number of files, the “percent completed” that is displayed during this phase does not reliably reflect the actual time that the second phase requires.
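A short worked example in Python shows how far the displayed figure can drift from the real progress. It assumes, as a simplification, that the work for each directory is exactly proportional to the number of entries in it; the function name progress_report and the sample counts are invented for this illustration.

```python
def progress_report(entries_per_directory):
    """Compare displayed progress with work-based progress.

    Returns one (displayed_percent, actual_percent) pair for each
    directory, in the order in which the directories are processed.
    """
    total_directories = len(entries_per_directory)
    total_entries = sum(entries_per_directory)
    entries_done = 0
    rows = []
    for directories_done, entries in enumerate(entries_per_directory, start=1):
        entries_done += entries
        displayed = 100 * directories_done / total_directories
        actual = 100 * entries_done / total_entries
        rows.append((displayed, actual))
    return rows


# Three small directories followed by one very large directory.
rows = progress_report([10, 10, 10, 970])
for displayed, actual in rows:
    print(f"displayed {displayed:5.1f}%   actual work {actual:5.1f}%")
```

Under these assumptions, the indicator reads 75 percent after the three small directories even though only 3 percent of the entries have been processed.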
To make matters worse for anyone caught in the middle of an unexpected CHKDSK run, the second phase is typically the one that takes the longest.
Variable 2: The condition of the volume
Many factors that concern the state of a volume play a role in how long CHKDSK takes to run. A formula for predicting the time that is required to run CHKDSK on a given volume would have to include such variables as the number of files and directories, the degree of fragmentation of the volume in general and of the MFT in particular, the format of file names (long names, 8.3-formatted names, or a mixture), and the amount of actual damage that CHKDSK must repair.
Variable 3: Hardware issues
Hardware issues also affect how long it takes for CHKDSK to run. The variables include the amount of available memory, CPU speed, disk speed, and so on.
Variable 4: The CHKDSK settings
If you do not use the /R switch, the biggest time concern on a given hardware platform is the number of files and directories that are on the volume, rather than the absolute size of the volume.
For example, without the /R switch, a 50-gigabyte (GB) volume that has only one or two large database files might take only seconds for CHKDSK to run. If you use the /R switch, CHKDSK has to read and verify every sector on the volume, which adds significantly to the time that is required for large volumes. On the other hand, running CHKDSK on even a relatively small volume might require hours if the volume has hundreds of thousands or even millions of small files, regardless of whether you specify the /R switch.
Predicting CHKDSK time requirements
As you can see, running CHKDSK can take anywhere from a few seconds to several days, depending on your specific situation. The best way to predict how long CHKDSK will take to run on a given volume is to actually do a trial run in read-only mode during a period of low system usage.
However, you must use this technique with great care, for the following reasons:
- In read-only mode, CHKDSK quits before it completes all three phases if it encounters errors in earlier phases, and CHKDSK is prone to falsely reporting errors. For example, CHKDSK may report disk corruption if NTFS happens to modify areas of a disk while CHKDSK is examining the disk. For correct verification, a volume must be static, and the only way to guarantee a static state is to lock the volume. CHKDSK locks the volume only if you specify the /F switch (or the /R switch, which implies /F). You may need to run CHKDSK more than once to get CHKDSK to complete all its passes in read-only mode.
- CHKDSK is both CPU- and disk-intensive. The time that it takes to run CHKDSK is affected by how much load is on the system and whether CHKDSK runs online or during the Windows XP startup sequence. Which factor becomes the bottleneck depends on the hardware configuration, but high CPU usage or heavy disk I/O while CHKDSK is running in read-only mode will inflate the CHKDSK running time. Also, Autochk.exe runs in a different environment from that of Chkdsk.exe. Running CHKDSK through Autochk.exe gives CHKDSK exclusive use of CPU and I/O resources, but it also prevents CHKDSK from using virtual memory. Although you might expect Autochk.exe to run faster than Chkdsk.exe, Autochk.exe may actually take longer if the computer has relatively little available RAM.
- Fixing corruption adds to the time required. In read-only mode, CHKDSK runs to completion only if CHKDSK does not find any significant corruption. If a disk shows only minor corruption, you can predict that fixing the problems will not add much to the time that is required just to run CHKDSK. But if CHKDSK finds major damage, for example from a serious hardware failure, you can predict that the time that is required to run CHKDSK will increase in proportion to the number of damaged files that CHKDSK must repair. In extreme cases, this can more than double the time that it takes for CHKDSK to run.
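One way to time such a trial run is to wrap the command in a small script. The following Python sketch is only an illustration: the helper name time_command is invented here, and the drive letter in the commented-out line is a placeholder. Running chkdsk with only a volume argument (that is, without the /F or /R switch) corresponds to the read-only mode that this section describes. The runnable demonstration at the end times a harmless command instead of CHKDSK.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run a command and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    completed = subprocess.run(argv, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return elapsed, completed.returncode


# On a Windows computer, a read-only trial run might be timed like this
# ("D:" is a placeholder volume, not taken from this article):
#     elapsed, exit_code = time_command(["chkdsk", "D:"])

# Harmless demonstration that works on any platform.
elapsed, exit_code = time_command([sys.executable, "-c", "pass"])
print(f"finished in {elapsed:.2f} seconds with exit code {exit_code}")
```

Because of the caveats in the preceding list, treat a figure that is measured this way as a rough lower bound rather than as a prediction.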
Introducing the /C and /I switches
The /C switch
The /C switch directs CHKDSK to skip the checks that detect cycles in the directory structure. Cycles are a very rare form of corruption in which a subdirectory has itself for an “ancestor.”
Using the /C switch can speed up CHKDSK by about 1 to 2 percent, but using this switch can also leave directory “loops” on an NTFS volume. Such loops might be inaccessible from the rest of the directory tree, and some files might be orphaned in the sense that Win32 programs, including backup programs, cannot see the files.
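To make the idea of a cycle concrete, the following Python sketch walks up a parent map and reports every directory that turns out to be its own ancestor. The parent map is a hypothetical simplification (NTFS does not store the directory tree in this form), and find_cycles is a name chosen for this example.

```python
def find_cycles(parent_of):
    """Return the directories that have themselves as an ancestor.

    parent_of maps each directory name to the name of its parent
    directory; the root directory maps to None.
    """
    in_cycle = set()
    for start in parent_of:
        seen = set()
        current = parent_of.get(start)
        # Walk upward until the root is reached or a directory repeats.
        while current is not None and current not in seen:
            if current == start:
                in_cycle.add(start)
                break
            seen.add(current)
            current = parent_of.get(current)
    return sorted(in_cycle)


parent_of = {"root": None, "a": "root", "b": "c", "c": "b"}
cycles = find_cycles(parent_of)
print(cycles)
```

In this example, "b" and "c" are each other's parent, so neither directory can be reached by starting from "root". This is the kind of loop that can remain on a volume when the cycle checks are skipped.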
The /I switch
The /I switch directs CHKDSK to skip the checks that compare directory entries to their corresponding file record segments. With this switch in effect, directory entries are still checked for internal consistency, but the directory entries are not necessarily consistent with the data that is stored in the corresponding file record segments.
How much time you will save by using the /I switch is difficult to predict. Typically, the /I switch lowers CHKDSK times by 50 to 70 percent, depending on factors such as the ratio of files to directories and the speed of disk I/O relative to CPU speed.
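As a piece of back-of-the-envelope arithmetic, the following Python sketch applies the typical 50 to 70 percent range quoted above to a measured full run. The four-hour baseline is an invented figure, the function name estimate_range is hypothetical, and the result is a rough bracket rather than a prediction.

```python
def estimate_range(full_minutes, low_saving_pct, high_saving_pct):
    """Return (best_case, worst_case) run times in minutes."""
    best_case = full_minutes * (100 - high_saving_pct) / 100
    worst_case = full_minutes * (100 - low_saving_pct) / 100
    return best_case, worst_case


# Assumed baseline: a full CHKDSK run that took 240 minutes.
best_case, worst_case = estimate_range(240, 50, 70)
print(f"With /I: roughly {best_case:.0f} to {worst_case:.0f} minutes")
```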
Using the /I switch has these limitations:
- You may have directory entries that refer to incorrect file record segments. In this case, any program that tries to use such an entry will encounter errors.
- You may have file record segments that no directory entry references (another way that orphaned files occur). A file that is actually intact, as represented by the file record segment, may be invisible to all Win32 programs, including backup programs.
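The switches that this article discusses can be combined on a single command line. The following Python sketch assembles such a command line as an argument list without running it. The helper name build_chkdsk_command, its parameter names, and the drive letter are assumptions made for this example; only the switch letters /F, /R, /I, and /C come from this article.

```python
def build_chkdsk_command(volume, fix=False, scan_sectors=False,
                         skip_index_checks=False, skip_cycle_checks=False):
    """Assemble a chkdsk argument list; nothing is executed here."""
    argv = ["chkdsk", volume]
    if fix:
        argv.append("/F")  # Repair errors; the volume is locked.
    if scan_sectors:
        argv.append("/R")  # Also look for bad sectors; implies /F.
    if skip_index_checks:
        argv.append("/I")  # NTFS only: abbreviated check of index entries.
    if skip_cycle_checks:
        argv.append("/C")  # NTFS only: skip the directory cycle checks.
    return argv


# "D:" is a placeholder volume.
abbreviated = build_chkdsk_command(
    "D:", fix=True, skip_index_checks=True, skip_cycle_checks=True
)
print(" ".join(abbreviated))
```

Keeping the command as a list makes it straightforward to log or review the exact switches before the command is handed to a scheduler or run manually.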
The value of the /C and /I switches
When disk corruption is detected on a volume, there are three basic options for response.
The first option is to take no action. On a mission-critical server that is expected to be online 24 hours a day, this is often the choice of necessity. The drawback is that relatively minor corruption can snowball into major corruption. Therefore, consider this option only if keeping the server online is more important than guarding the integrity of the data that is stored on the corrupted volume. All data on the corrupted volume should be considered “at risk” until you run CHKDSK.
The second option is to run a full CHKDSK operation to repair all file system data and restore all of the user data that can be recovered by means of an automated process. However, running a full CHKDSK operation can cost you several hours of downtime for a mission-critical server at an inopportune time.
The third option is to run an abbreviated CHKDSK operation by using one or both of the /C and /I switches. An abbreviated run repairs the kinds of corruption that can snowball into bigger problems, and it does so in much less time than a full CHKDSK requires.
Note, however, that running an abbreviated CHKDSK does not repair all of the corruption that might exist. You still need to run a full CHKDSK at some future time to guarantee that all recoverable data has in fact been recovered.
Note also that NTFS does not guarantee the integrity of user data after an instance of disk corruption, even if you immediately run a full CHKDSK operation. There might be files that CHKDSK cannot recover, and files that CHKDSK does recover might still be internally corrupted. It remains vitally important that you protect mission-critical data by performing periodic backups or by using some other robust method of data recovery.