Thicket data repository for the EEG
{
  "id": "https://www.tunbury.org/2025/06/18/windows-reflinks",
  "title": "Hardlinks and Reflinks on Windows",
  "link": "https://www.tunbury.org/2025/06/18/windows-reflinks/",
  "updated": "2025-06-18T00:00:00",
  "published": "2025-06-18T00:00:00",
  "summary": "Who knew there was a limit on creating hard links? I didn’t even consider this until my hard links started to fail. On NTFS, the limit is 1024 links to any given file. Subsequent research shows that the limit varies between file systems, with NTFS at the lower end of the scale.",
  "content": "<p>Who knew there was a limit on creating hard links? I didn’t even consider this until my hard links started to fail. On NTFS, the limit is 1024 links to any given file. Subsequent research shows that the limit varies between file systems, with NTFS at the lower end of the scale.</p>\n\n<p>Here’s an excerpt from <a href=\"https://en.wikipedia.org/wiki/Hard_link\">Wikipedia</a> on the subject.</p>\n\n<blockquote>\n <p>In AT&T Unix System 6, released in 1975, the number of hard links allowed was 127. On Unix-like systems, the in-memory counter is 4,294,967,295 (on 32-bit machines) or 18,446,744,073,709,551,615 (on 64-bit machines). In some file systems, the number of hard links is limited more strictly by their on-disk format. For example, as of Linux 3.11, the ext4 file system limits the number of hard links on a file to 65,000. Windows enforces a limit of 1024 hard links to a file on NTFS volumes.</p>\n</blockquote>\n\n<p>This limit is unlikely to matter for most everyday use, but it is not purely theoretical: on a standard Cygwin installation, <code>git.exe</code> has 142 hard links, which you can list with:</p>\n\n<div><div><pre><code>fsutil hardlink list %LOCALAPPDATA%\\opam\\.cygwin\\root\\bin\\git.exe\n</code></pre></div></div>\n\n<p>Back in 2012, Microsoft released ReFS as an alternative to NTFS. The feature gap has narrowed over the years, with hard links introduced in the preview of Windows Server 2022. ReFS supports up to 1 million hard links per file but, more interestingly, it also supports <a href=\"https://learn.microsoft.com/en-us/windows/win32/fileio/block-cloning\">block cloning</a>, also known as <a href=\"https://blogs.oracle.com/linux/post/xfs-data-block-sharing-reflink\">reflinks</a>, whereby files share common data blocks. When a shared block is later modified, the new data is written to a copy of the block and only the modifying file’s reference is updated, so the other files are unaffected.</p>\n\n<p>The implementation is interesting because it doesn’t work quite the way one would think. 
It can only be used to clone complete clusters, so we must first call <a href=\"https://learn.microsoft.com/en-us/windows/win32/api/winioctl/ni-winioctl-fsctl_get_integrity_information\">FSCTL_GET_INTEGRITY_INFORMATION</a>, which returns an <a href=\"https://learn.microsoft.com/en-us/windows/win32/api/winioctl/ns-winioctl-fsctl_get_integrity_information_buffer\">FSCTL_GET_INTEGRITY_INFORMATION_BUFFER</a> containing the cluster size in bytes.</p>\n\n<p>Although <a href=\"https://learn.microsoft.com/en-us/windows/win32/api/winioctl/ni-winioctl-fsctl_duplicate_extents_to_file\">FSCTL_DUPLICATE_EXTENTS_TO_FILE</a> takes an exact byte count, that count must cover whole clusters, so we must round the file size up to the next cluster boundary.</p>\n\n<p>Additionally, the target file must exist before the clone and be large enough to receive the cloned data. In practice, this means calling <a href=\"https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew\">CreateFileW</a> to create the file and then <a href=\"https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-setfileinformationbyhandle\">SetFileInformationByHandle</a> to set its size to that of the source file (not the rounded-up size).</p>\n\n<p>For example, with 4096-byte clusters, a file of 23075 bytes is rounded up to 24576 bytes (6 clusters). 
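</p>\n\n<p>The rounding itself is plain integer arithmetic. The following is a minimal sketch in C, not code taken from ReFS-Clone: the helper name <code>round_up_to_cluster</code> is hypothetical, and it assumes the cluster size has already been read from the <code>ClusterSizeInBytes</code> field of <code>FSCTL_GET_INTEGRITY_INFORMATION_BUFFER</code> and is non-zero.</p>\n\n<div><div><pre><code>/* Hypothetical helper (not from ReFS-Clone): round size up to the next\n   multiple of cluster_size. Assumes cluster_size is non-zero. */\nstatic unsigned long long round_up_to_cluster(unsigned long long size,\n                                              unsigned long long cluster_size)\n{\n    /* Adding cluster_size - 1 before the integer division turns any\n       partial cluster into a whole one; multiplying back gives bytes. */\n    return (size + cluster_size - 1) / cluster_size * cluster_size;\n}\n</code></pre></div></div>\n\n<p>With 4096-byte clusters, this maps 23075 bytes to 24576 bytes (6 clusters) and an 8-byte file to 4096 bytes (1 cluster). The rounded value is only the byte count passed to the clone; the target file itself is still sized to the exact length of the source, as described above.</p>\n\n<p>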
We can use <code>fsutil file queryextents</code> to list the clusters used by the source file:</p>\n\n<div><div><pre><code>D:\\> fsutil file queryextents source.txt\nVCN: 0x0 Clusters: 0x6 LCN: 0x2d3d801\n</code></pre></div></div>\n\n<p>Now we clone the file with <code>ReFS-clone d:\\source.txt d:\\target.txt</code> and query the extents used by the target.</p>\n\n<div><div><pre><code>D:\\> fsutil file queryextents target.txt\nVCN: 0x0 Clusters: 0x5 LCN: 0x2d3d801\nVCN: 0x5 Clusters: 0x1 LCN: 0x2d3c801\n</code></pre></div></div>\n\n<p>The first five whole clusters are shared between the two files (they start at the same LCN), while the final partial cluster has been copied to a new location. When implementing this, I initially tested with a text file of just a few bytes and couldn’t get it to clone. After I rounded the size up to 4096 bytes, the API call succeeded, but no clusters were shared (compare the LCNs below). It wasn’t until I tried a larger file, with the size rounded up, that I saw clusters actually being shared. This fits the result above: a file smaller than one cluster consists only of a final partial cluster, which is copied rather than shared.</p>\n\n<div><div><pre><code>D:\\>echo hello > foo.txt\n\nD:\\>fsutil file queryextents foo.txt\nVCN: 0x0 Clusters: 0x1 LCN: 0x2d3dc04\n\nD:\\>ReFS-clone.exe foo.txt bar.txt\nReFS File Clone Utility\nReFS Clone: foo.txt -> bar.txt\nCluster size: 4096 bytes\nFile size: 8 bytes -> 4096 bytes (1 clusters)\nCloning 4096 bytes...\nSuccess!\nReFS cloning completed successfully.\n\nD:\\>fsutil file queryextents bar.txt\nVCN: 0x0 Clusters: 0x1 LCN: 0x2d3d807\n</code></pre></div></div>\n\n<p>The code is on GitHub in <a href=\"https://github.com/mtelvers/ReFS-Clone\">ReFS-Clone</a>.</p>",
  "content_type": "html",
  "author": {
    "name": "Mark Elvers",
    "email": "mark.elvers@tunbury.org",
    "uri": null
  },
  "categories": [
    "OCaml,Windows",
    "tunbury.org"
  ],
  "source": "https://www.tunbury.org/atom.xml"
}