Some thoughts about NTFS Filesystem

Some notes gathered while preparing for the GCFA exam.

The New Technology File System (NTFS) is a file system developed by Microsoft and introduced in 1993 with Windows NT 3.1 as a replacement for the FAT file system.

Versions

Microsoft has released five versions of NTFS:

  • v1.0: Released with Windows NT 3.1 in 1993.
    v1.0 is incompatible with v1.1 and newer: Volumes written by Windows NT 3.5x cannot be read by Windows NT 3.1 until an update (available on the NT 3.5x installation media) is installed.
  • v1.1: Released with Windows NT 3.51 in 1995.
    Supports compressed files, named streams and access control lists.
  • v1.2: Released with Windows NT 4.0 in 1996.
    Supports security descriptors.
    Commonly called NTFS 4.0 after the OS release.
  • v3.0: Released with Windows 2000.
    Supports disk quotas, Encrypting File System, sparse files, reparse points, update sequence number (USN) journaling, the $Extend folder and its files.
    Reorganized security descriptors so that multiple files using the same security setting can share the same descriptor.
    Commonly called NTFS 5.0 after the OS release.
  • v3.1: Released with Windows XP in October 2001.
    Expanded the Master File Table (MFT) entries with redundant MFT record number (useful for recovering damaged MFT files).
    Commonly called NTFS 5.1 after the OS release.

Structure

NTFS is optimized for 4 KB clusters, but supports a maximum cluster size of 64 KB.

The maximum NTFS volume size that the specification can support is 2^64 − 1 clusters, but not all implementations achieve this theoretical maximum, as discussed below.

The maximum NTFS volume size implemented in Windows XP Professional is 2^32 − 1 clusters, partly due to partition table limitations.

Using the default cluster size of 4 KB, the maximum NTFS volume size is 16 TB minus 4 KB.
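
The arithmetic behind this figure can be checked in a few lines of Python. This is only a sketch: the function and constant names are illustrative, and KB and TB are taken as binary units (1024-based), which is what makes the numbers come out exactly.

```python
KIB = 1024
TIB = 1024 ** 4

def max_volume_bytes(max_clusters: int, cluster_size: int) -> int:
    """Largest volume size in bytes for a given cluster count and cluster size."""
    return max_clusters * cluster_size

# Windows XP Professional implementation limit: 2^32 - 1 clusters of 4 KB each.
xp_limit = max_volume_bytes(2**32 - 1, 4 * KIB)

print(xp_limit)                        # size in bytes
print(xp_limit == 16 * TIB - 4 * KIB)  # True: 16 TB minus one 4 KB cluster
```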

Both of these are vastly higher than the 128 GB limit in Windows XP SP1.

Because partition tables on master boot record (MBR) disks support only partition sizes up to 2 TB, a single NTFS volume larger than 2 TB requires either a GUID Partition Table (GPT) disk or several dynamic volumes combined into one.

Booting from a GPT volume to a Windows environment in a Microsoft supported way requires a system with Unified Extensible Firmware Interface (UEFI) and 64-bit support.

The NTFS maximum theoretical limit on the size of individual files is 16 EiB (16 × 1024^6 or 2^64 bytes) minus 1 KB, which totals to 18,446,744,073,709,550,592 bytes.

With Windows 8 and Windows Server 2012, the maximum implemented file size is 256 TB minus 64 KB or 281,474,976,645,120 bytes.
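
Both file size figures follow directly from powers of two. A quick check, assuming binary units throughout (the variable names are illustrative only):

```python
KIB = 1024
TIB = 1024 ** 4
EIB = 1024 ** 6

theoretical_max = 16 * EIB - 1 * KIB    # 2^64 bytes minus 1 KB
implemented_max = 256 * TIB - 64 * KIB  # Windows 8 / Windows Server 2012

print(f"{theoretical_max:,}")  # 18,446,744,073,709,550,592
print(f"{implemented_max:,}")  # 281,474,976,645,120
```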

Master File Table

In NTFS, all file, directory and metafile data — file name, creation date, access permissions (by the use of access control lists), and size — are stored as metadata in the Master File Table (MFT).

http://amanda.secured.org/ntfs-mft-record-parsing-parser/

This abstract approach allowed easy addition of file system features during Windows NT’s development and also enables fast file search software such as Everything to locate named local files and folders included in the MFT very quickly, without requiring any other index.

The MFT structure supports algorithms which minimize disk fragmentation.
A directory entry consists of a filename and a “file ID”, which is the record number representing the file in the Master File Table.
The file ID also contains a reuse count to detect stale references.
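
As a sketch of how such a file ID can be taken apart: the layout assumed here (a 64-bit value whose low 48 bits are the MFT record number and whose high 16 bits are the reuse count, usually called the sequence number) is the commonly documented one and is not spelled out in the text above. The function names are hypothetical.

```python
def split_file_reference(ref: int) -> tuple[int, int]:
    """Split a 64-bit file reference into (record_number, sequence_number)."""
    record_number = ref & 0xFFFF_FFFF_FFFF  # low 48 bits: index into the MFT
    sequence_number = (ref >> 48) & 0xFFFF  # high 16 bits: reuse count
    return record_number, sequence_number

def is_stale(ref: int, current_sequence: int) -> bool:
    """A reference is stale if the MFT record was reused after it was taken."""
    return split_file_reference(ref)[1] != current_sequence

ref = (3 << 48) | 5               # hypothetical reference: record 5, sequence 3
print(split_file_reference(ref))  # (5, 3)
print(is_stale(ref, 4))           # True: the record now carries sequence 4
```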

A copy of the most critical MFT records is kept in a separate MFT mirror file in case of corruption.
If the first record of the MFT is corrupted, NTFS reads the second record to find the MFT mirror file. The locations of both files are stored in the boot sector.

Metafiles

NTFS contains several files that define and organize the file system. In all respects, most of these files are structured like any other user file ($Volume being the most peculiar), but are not of direct interest to file system clients.

These metafiles define files, back up critical file system data, buffer file system changes, manage free space allocation, satisfy BIOS expectations, track bad allocation units, and store security and disk space usage information.
All content is in an unnamed data stream, unless otherwise indicated.


These metafiles are treated specially by Windows, handled directly by the NTFS.SYS driver and are difficult to directly view: special purpose-built tools are needed.

Attributes

Each file (or directory) described in the MFT has a linear repository of stream descriptors (also called attributes) that fully describes the streams associated with it. The descriptors are packed together in one or more MFT records (containing the so-called attribute list), with extra padding to fill the fixed 1 KB size of every MFT record.

http://amanda.secured.org/ntfs-mft-record-parsing-parser/

Each attribute has an attribute type (a fixed-size integer mapping to an attribute definition in file $AttrDef), an optional attribute name (for example, used as the name for an alternate data stream), and a value, represented in a sequence of bytes.

For NTFS, the standard data of files, the alternate data streams, or the index data for directories are stored as attributes.
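
A minimal sketch of walking such a sequence of attributes. It assumes the commonly documented 16-byte common header (type, total length, non-resident flag, name length in UTF-16 characters, name offset) and the 0xFFFFFFFF end marker; these details are not given in the text above. The `make_attr` helper is hypothetical and builds simplified test data, not complete on-disk headers.

```python
import struct

ATTRIBUTE_TYPES = {0x10: "$STANDARD_INFORMATION", 0x30: "$FILE_NAME",
                   0x80: "$DATA", 0x90: "$INDEX_ROOT", 0xA0: "$INDEX_ALLOCATION"}
END_MARKER = 0xFFFFFFFF

def walk_attributes(buf: bytes, offset: int = 0):
    """Yield (type_name, attribute_name, non_resident) for each attribute in buf."""
    while offset + 4 <= len(buf):
        (attr_type,) = struct.unpack_from("<I", buf, offset)
        if attr_type == END_MARKER or offset + 16 > len(buf):
            break
        length, non_resident, name_len, name_off = struct.unpack_from(
            "<IBBH", buf, offset + 4)
        if length < 16:  # malformed header: stop rather than loop forever
            break
        start = offset + name_off
        name = buf[start:start + 2 * name_len].decode("utf-16-le")
        yield ATTRIBUTE_TYPES.get(attr_type, hex(attr_type)), name, bool(non_resident)
        offset += length

def make_attr(attr_type: int, name: str = "", non_resident: bool = False) -> bytes:
    """Build a simplified attribute for demonstration (real headers carry more fields)."""
    name_bytes = name.encode("utf-16-le")
    length = 16 + len(name_bytes)
    length += -length % 8  # keep the total length 8-byte aligned
    header = struct.pack("<IIBBHHH", attr_type, length,
                         int(non_resident), len(name), 16, 0, 0)
    return (header + name_bytes).ljust(length, b"\x00")

sample = (make_attr(0x10) + make_attr(0x80) + make_attr(0x80, "bar", True)
          + struct.pack("<I", END_MARKER))
for entry in walk_attributes(sample):
    print(entry)
```

The unnamed $DATA entry plays the role of the default data stream, while the one named "bar" corresponds to an alternate data stream.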

Resident and non-resident attributes

According to $AttrDef, some attributes can be either resident or non-resident. The $DATA attribute, which contains file data, is such an example. When the attribute is resident (which is represented by a flag), its value is stored directly in the MFT record.
Otherwise, clusters are allocated for the data, and the cluster location information is stored as data runs in the attribute.
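
The data runs of a non-resident attribute can be decoded with a short routine. The encoding assumed here is the commonly documented one, not something stated above: each run starts with a header byte whose low nibble gives the size of the run-length field and whose high nibble gives the size of the offset field, the offset is a signed value relative to the previous run, and a zero header byte ends the list. Treat it as a sketch rather than a full parser.

```python
from typing import List, Optional, Tuple

def decode_data_runs(raw: bytes) -> List[Tuple[Optional[int], int]]:
    """Decode a run list into (starting_cluster, cluster_count) pairs.

    A starting cluster of None marks a sparse run with no clusters allocated.
    """
    runs = []
    pos = 0
    current_lcn = 0
    while pos < len(raw) and raw[pos] != 0x00:  # a zero header byte ends the list
        header = raw[pos]
        len_size = header & 0x0F  # low nibble: bytes used by the run length
        off_size = header >> 4    # high nibble: bytes used by the offset
        pos += 1
        length = int.from_bytes(raw[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            runs.append((None, length))
        else:
            delta = int.from_bytes(raw[pos:pos + off_size], "little", signed=True)
            pos += off_size
            current_lcn += delta  # offsets are relative to the previous run
            runs.append((current_lcn, length))
    return runs

# Hypothetical run list: 24 clusters at 22068, 8 sparse clusters, 16 clusters at 22052.
print(decode_data_runs(bytes.fromhex("21 18 34 56 01 08 11 10 F0 00")))
```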

Anonymous attributes

Some attribute types cannot have a name and must remain anonymous.

This is the case for the standard attributes, or for the preferred NTFS “filename” attribute type, or the “short filename” attribute type, when it is also present (for compatibility with DOS-like applications, see below).
It is also possible for a file to contain only a short filename, in which case it will be the preferred one, as listed in the Windows Explorer.

The filename attributes stored in the attribute list do not make the file immediately accessible through the hierarchical file system. In fact, all the filenames must be indexed separately in at least one separate directory on the same volume, with its own MFT record and its own security descriptors and attributes, that will reference the MFT record number for that file. This allows the same file or directory to be “hardlinked” several times from several containers on the same volume, possibly with distinct filenames.

Last Access Time

Each file and folder on an NTFS volume contains an attribute called Last Access Time.

This attribute shows when the file or folder was last accessed, such as when a user performs a folder listing, adds files to a folder, reads a file, or makes changes to a file.

The most up-to-date Last Access Time is always stored in memory and is eventually written to disk within two places:

  • The file’s attribute, which is part of its MFT record
  • A directory entry for the file. The directory entry is stored in the folder that contains the file. Files with multiple hard links have multiple directory entries.

Within the file’s attribute

NTFS typically updates a file’s attribute on disk if the current Last Access Time in memory differs by more than an hour from the Last Access Time stored on disk, or when all in-memory references to that file are gone, whichever is more recent.

For example, if a file’s current Last Access Time is 1:00 P.M., and you read the file at 1:30 P.M., NTFS does not update the Last Access Time. If you read the file again at 2:00 P.M., NTFS updates the Last Access Time in the file’s attribute to reflect 2:00 P.M. because the file’s attribute shows 1:00 P.M. and the in-memory Last Access Time shows 2:00 P.M.
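
The rule and the example can be modeled in a few lines. This is a simplified sketch, not the logic of the NTFS driver: the function name is hypothetical, the dates are arbitrary, and the threshold is taken as "one hour or more" so that it matches the 1:00 P.M. / 2:00 P.M. example.

```python
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)

def should_write_to_disk(on_disk: datetime, in_memory: datetime,
                         references_remain: bool = True) -> bool:
    """Return True when this simplified model would flush Last Access Time."""
    if not references_remain:
        # All in-memory references to the file are gone: flush any pending change.
        return in_memory != on_disk
    # Threshold chosen to match the worked example: a full hour triggers a write.
    return in_memory - on_disk >= ONE_HOUR

on_disk = datetime(2024, 1, 1, 13, 0)  # Last Access Time stored on disk: 1:00 P.M.
print(should_write_to_disk(on_disk, datetime(2024, 1, 1, 13, 30)))  # False
print(should_write_to_disk(on_disk, datetime(2024, 1, 1, 14, 0)))   # True
```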

Within a directory entry for a file

NTFS updates the directory entry for a file during the following events:

  • When NTFS updates the file’s Last Access Time and detects that the Last Access Time for the file differs by more than an hour from the Last Access Time stored in the file’s directory entry. This update typically occurs after a program closes the handle used to access a file within the directory. If the program holds the handle open for an extended time, a lag occurs before the change appears in the directory entry.
  • When NTFS updates other file attributes such as Last Modify Time, and a Last Access Time update is pending. In this case, NTFS updates the Last Access Time along with the other updates without additional performance impact.

The Last Access Time on disk is not always current because NTFS looks for a one-hour interval before forcing the Last Access Time updates to disk.

Windows NT and its descendants keep internal timestamps as UTC and make the appropriate conversions for display purposes; all NTFS timestamps are in UTC.
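
When parsing raw records, these UTC timestamps have to be converted by hand. The sketch below assumes the standard Windows FILETIME encoding (a 64-bit count of 100-nanosecond intervals since 1601-01-01 UTC), which is not stated in the text above; the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a count of 100 ns intervals since 1601-01-01 UTC to a datetime."""
    # datetime has microsecond resolution, so the last decimal digit is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)

print(filetime_to_datetime(116444736000000000))  # 1970-01-01 00:00:00+00:00
```

The result is timezone-aware and in UTC, so any conversion to local time stays an explicit display step.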

For historical reasons, the versions of Windows that do not support NTFS all keep time internally as local zone time, and therefore so do all file systems, other than NTFS, that are supported by current versions of Windows. This means that when files are copied or moved between NTFS and non-NTFS partitions, the OS needs to convert timestamps on the fly.

NTFS also delays writing the Last Access Time to disk when users or programs perform read-only operations on a file or folder, such as listing the folder’s contents or reading (but not changing) a file in the folder.

Alternate Data streams

ADS were introduced in Windows NT 3.1 in order to add “extra” information to files without altering the original file format or content.

This extra information is the metadata about the file. This metadata is arranged in the form of streams that attach to the main data stream (the stream which is visible to a normal user).

For example, one file stream could hold the security information for the file such as access permissions while another one could hold data that describes the purpose of the file, its author and the MAC times.

Alternate streams are not listed in Windows Explorer, and their size is not included in the file’s size.
When the file is copied or moved to another file system without ADS support the user is warned that alternate data streams cannot be preserved.
No such warning is typically provided if the file is attached to an e-mail, or uploaded to a website.

Many applications use ADS to store attributes of a file: for example, if you create a Word document, right-click it and open its properties, you can see a summary page containing metadata about the file.
The metadata includes the author of the document, the word count, the number of pages and so on: this summary information is attached to the file via ADS.

All files on an NTFS volume consist of at least one stream, the main stream: this is the normal, viewable file in which data is stored.

The default data stream of a regular file is a stream of type $DATA but with an anonymous name, and the ADSs are similar but must be named.

In contrast, the default stream of a directory has a distinct type and is not anonymous: it has an attribute name (“$I30” in NTFS 3+) that reflects its indexing format.

The full name of a stream is of the form below.

<filename>:<stream name>:<stream type>

The default data stream has no name.

https://blogs.technet.microsoft.com/askcore/2013/03/24/alternate-data-streams-in-ntfs/

That is, the fully qualified name for the default stream for a file called “sample.txt” is “sample.txt::$DATA” since “sample.txt” is the name of the file and “$DATA” is the stream type.

A user can create a named stream in a file, and “$DATA” is a legal stream name. For such a stream, the full name is sample.txt:$DATA:$DATA.
If the user had created a named stream of name “bar”, its full name would be sample.txt:bar:$DATA.
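
The naming scheme can be illustrated with a small parser. This is a sketch that only handles fully qualified names of the form shown above; `parse_stream_name` is a hypothetical helper, not a Windows API.

```python
def parse_stream_name(full_name: str) -> tuple[str, str, str]:
    """Split '<filename>:<stream name>:<stream type>' into its three parts."""
    # Split from the right so a drive letter such as 'C:' stays in the filename.
    parts = full_name.rsplit(":", 2)
    if len(parts) != 3:
        raise ValueError(f"not a fully qualified stream name: {full_name!r}")
    filename, stream_name, stream_type = parts
    return filename, stream_name, stream_type

print(parse_stream_name("sample.txt::$DATA"))       # ('sample.txt', '', '$DATA')
print(parse_stream_name("sample.txt:$DATA:$DATA"))  # ('sample.txt', '$DATA', '$DATA')
print(parse_stream_name("sample.txt:bar:$DATA"))    # ('sample.txt', 'bar', '$DATA')
```

An empty stream name in the result identifies the default stream.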

https://blogs.technet.microsoft.com/askcore/2013/03/24/alternate-data-streams-in-ntfs/

In the case of directories, there is no default data stream, but there is a default directory stream: its stream type is $INDEX_ALLOCATION.

The default stream name for the type $INDEX_ALLOCATION (a directory stream) is $I30.

The following are equivalent:

Dir C:\Users
Dir C:\Users:$I30:$INDEX_ALLOCATION
Dir C:\Users::$INDEX_ALLOCATION

Although directories do not have a default data stream, they can have named data streams.

These alternate data streams are not normally visible, but can be observed from a command line using the /R option of the DIR command.

Known Alternate Stream Names

  • Zone.Identifier: Windows Internet Explorer uses this stream for storage of URL security zones (1=Intranet, 2=Trusted, 3=Internet, 4=Untrusted).
  • OECustomProperty: Used by Outlook Express for storage of custom properties related to email files.
  • encryptable: Windows Shell uses the stream to store attributes relating to thumbnails in the thumbnails database.
  • favicon: Used by Windows Internet Explorer for storing favorite ICONs for web pages.
  • AFP_AfpInfo and AFP_Resource: Used for compatibility with Macintosh operating system property lists.
  • {59828bbb-3f72-4c1b-a420-b51ad66eb5d3}.XPRESS: Used during remote differential compression.
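
As an illustration of how the Zone.Identifier stream is typically examined, here is a sketch that parses its content. It assumes the stream holds INI-style text with a [ZoneTransfer] section and a ZoneId key, which is the usual format but is not described above; zone 0 (local machine) is likewise added from general knowledge, and the helper name is hypothetical.

```python
import configparser

ZONES = {0: "Local machine", 1: "Intranet", 2: "Trusted",
         3: "Internet", 4: "Untrusted"}

def parse_zone_identifier(text: str) -> str:
    """Return the zone name recorded in the text of a Zone.Identifier stream."""
    # Interpolation is disabled because other keys may hold URLs containing '%'.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    zone_id = parser.getint("ZoneTransfer", "ZoneId")
    return ZONES.get(zone_id, f"Unknown ({zone_id})")

sample = "[ZoneTransfer]\nZoneId=3\n"  # typical content for a downloaded file
print(parse_zone_identifier(sample))   # Internet
```

On Windows, the text of the stream can be obtained by opening the file path with `:Zone.Identifier` appended, for example `open(r"setup.exe:Zone.Identifier")` for a hypothetical downloaded file.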

Journaling

NTFS is a journaling file system and uses the NTFS Log ($LogFile) to record metadata changes to the volume.

It is a feature critical for NTFS to ensure that its complex internal data structures will remain consistent in case of system crashes or data moves performed by the defragmentation API, and allow easy rollback of uncommitted changes to these critical data structures when the volume is remounted.

Notably affected structures are the volume allocation bitmap, modifications to MFT records such as moves of some variable-length attributes stored in MFT records and attribute lists, and indices for directories and security descriptors.

The USN Journal (Update Sequence Number Journal) is a system management feature that records (in $Extend\$UsnJrnl) changes to files, streams and directories on the volume, as well as their various attributes and security settings.

https://www.slideshare.net/null0x00/ntfs-forensics
This is a system management feature used for recovering quickly from a computer or volume failure.

Directory junctions

A junction point is a link to a directory that acts as an alias of that directory.
This feature offers benefits over a shortcut (.lnk) file, such as allowing access to files within the directory both via Windows Explorer and Command Prompt.

https://blogs.msdn.microsoft.com/aaron_margosis/2012/12/09/using-ntfs-junctions-to-fix-application-compatibility-issues-on-64-bit-editions-of-windows/

Unlike NTFS symbolic links, junction points can only link to a local volume, and junction points from a local volume to a remote share are unsupported.

Hard links

The hard link feature allows different file names to directly refer to the same file contents.
Hard links are similar to directory junctions, but refer to files instead.

https://blogs.technet.microsoft.com/joscon/2011/01/06/how-hard-links-work/

Hard links may link only to files in the same volume, because each volume has its own MFT.

The NTFS file system has a limit of 1024 hard links on a file.
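
A minimal sketch of the behavior, using Python's portable `os.link` (which creates a hard link on NTFS as well as on POSIX file systems). The file names are arbitrary, and both names live in the same temporary directory so that they are on the same volume.

```python
import os
import tempfile

def demo_hard_link() -> tuple[bool, int, str]:
    """Create a file and a hard link to it; report what the two names share."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.txt")
        alias = os.path.join(tmp, "alias.txt")
        with open(original, "w") as f:
            f.write("same contents")
        os.link(original, alias)  # second name for the same file record
        same_file = os.path.samefile(original, alias)
        link_count = os.stat(original).st_nlink
        with open(alias) as f:
            content = f.read()
        return same_file, link_count, content

print(demo_hard_link())  # (True, 2, 'same contents')
```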

Sparse Files

A sparse file has an attribute that causes the I/O subsystem to allocate only meaningful (nonzero) data.
Nonzero data is allocated on disk, and non-meaningful data (large strings of data composed of zeros) is not.
When a sparse file is read, allocated data is returned as it was stored; non-allocated data is returned, by default, as zeros.

NTFS deallocates sparse data streams and only maintains other data as allocated. When a program accesses a sparse file, the file system yields allocated data as actual data and deallocated data as zeros.

NTFS includes full sparse file support for both compressed and uncompressed files.

NTFS handles read operations on sparse files by returning allocated data and sparse data. It is possible to read a sparse file as allocated data and a range of data without retrieving the entire data set, although NTFS returns the entire data set by default.

With the sparse file attribute set, the file system can deallocate data from anywhere in the file and, when an application calls, yield the zero data by range instead of storing and returning the actual data.

File system application programming interfaces (APIs) allow for the file to be copied or backed up as actual bits and sparse stream ranges.

The net result is efficient file system storage and access: for example, the properties of a file might show that the file is a 1-GB sparse file, but although the file is 1 GB, it occupies only 64 KB of disk space.
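
The read behavior described above can be mimicked with a toy in-memory model. This is purely illustrative: real NTFS tracks allocation per cluster through data runs, and the class below (a hypothetical name) does not handle overlapping writes.

```python
class SparseFileModel:
    """Toy model: only written ranges are stored; everything else reads as zeros."""

    def __init__(self, logical_size: int):
        self.logical_size = logical_size
        self.extents = {}  # starting offset -> bytes actually stored

    def write(self, offset: int, data: bytes) -> None:
        self.extents[offset] = bytes(data)

    def allocated_bytes(self) -> int:
        return sum(len(data) for data in self.extents.values())

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray(length)  # unallocated ranges come back as zeros
        for start, data in self.extents.items():
            lo = max(start, offset)
            hi = min(start + len(data), offset + length)
            if lo < hi:
                out[lo - offset:hi - offset] = data[lo - start:hi - start]
        return bytes(out)

f = SparseFileModel(logical_size=1 << 30)  # reported size: 1 GB
f.write(4096, b"A" * 65536)                # only 64 KB of meaningful data
print(f.logical_size, f.allocated_bytes())
print(f.read(4090, 10))                    # six zero bytes, then b'AAAA'
```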

File compression

NTFS can compress files using the LZNT1 algorithm.
Files are compressed in 16 cluster chunks (with 4 KB clusters, files are compressed in 64 KB chunks).

The compression algorithms in NTFS are designed to support cluster sizes of up to 4 KB: when the cluster size is greater than 4 KB on an NTFS volume, NTFS compression is not available.

If the compression reduces 64 KB of data to 60 KB or less, NTFS treats the unneeded 4 KB pages like empty sparse file clusters and they are not written.
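
The rule can be expressed as a small calculation. This is a sketch of the arithmetic only, with hypothetical names; it assumes that a chunk which does not shrink by at least one cluster is simply stored uncompressed.

```python
import math

def clusters_stored(compressed_size: int, cluster_size: int = 4096,
                    unit_clusters: int = 16) -> int:
    """Clusters written for one 16-cluster chunk after compression."""
    needed = math.ceil(compressed_size / cluster_size)
    if needed >= unit_clusters:
        # No whole cluster is saved, so the chunk is stored uncompressed.
        return unit_clusters
    return needed

print(clusters_stored(60 * 1024))  # 15: one 4 KB cluster is left unwritten
print(clusters_stored(61 * 1024))  # 16: no saving, stored as is
print(clusters_stored(20 * 1024))  # 5
```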

This allows for reasonable random-access times as the OS just has to follow the chain of fragments. However, large compressible files become highly fragmented since every chunk smaller than 64 KB becomes a fragment.

https://blogs.msdn.microsoft.com/ntdebugging/2008/05/20/understanding-ntfs-compression/
So, compression works best with files that have repetitive content, are seldom written, are usually accessed sequentially, and are not themselves compressed (Log files are an ideal example).

References

Some thoughts about NTFS Filesystem was originally published in So Long, and Thanks for All the Fish on Medium.

Article Link: https://andreafortuna.org/some-thoughts-about-ntfs-filesystem-16c6c7dd4211?source=rss----bf18ac17f001---4