Why files may take up more disk space on Windows
If you check a folder's properties on Windows, two values are displayed: the size of the data and the size of the data on disk.
These values may differ slightly or a lot, depending on the file system used and the type of files stored on the drive.
The difference between the two values can be a Gigabyte or more.
You can test this by right-clicking any folder or drive letter in Windows Explorer and selecting Properties from the context menu.
It may take a while before the final values are displayed, depending largely on the number of files stored in the folder structure.
As you can see on the screenshot below, the values differ by 0.2 Gigabyte, which is not much relative to the total size. There are however situations where the difference may be much larger.
So why do the sizes differ?
The answer comes in the form of file systems and cluster sizes. Without going into too many details, each file system that Windows supports, e.g. NTFS or FAT32, uses so-called clusters.
A cluster is the smallest amount of disk space that can be used by a file. Microsoft notes on the topic:
Cluster size represents the smallest amount of disk space that can be used to hold a file. When file sizes do not come out to an even multiple of the cluster size, additional space must be used to hold the file (up to the next multiple of the cluster size).
Typical default cluster sizes:
- NTFS, 2 GB to 16 TB drives: 4 KB
- FAT16, 1 GB to 2 GB drives: 32 KB
- FAT16, 2 GB to 4 GB drives: 64 KB
- FAT32, 256 MB to 8 GB drives: 4 KB
Imagine this scenario: You have a 1 Gigabyte FAT16 drive connected to your PC. Stored on it is a folder that contains 1000 files with a size of 1 Kilobyte each.
The size value in the properties dialog would display as 1,000 Kilobytes, while the size on disk value would display as 32,000 Kilobytes (1000 x 32 KB), a difference of 31,000 Kilobytes.
Each file has a size of 1 Kilobyte but occupies a full 32 Kilobyte cluster, so each one wastes 31 Kilobytes.
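The arithmetic of this scenario can be sketched in a few lines of Python. This is only an illustration of the rounding rule quoted above; the function name size_on_disk is made up for this example, and real file systems may deviate from it (NTFS, for instance, can store very small files without allocating a cluster, and compression changes the picture as well).

```python
KB = 1024  # bytes per Kilobyte


def size_on_disk(file_size, cluster_size):
    """Round file_size (bytes) up to the next multiple of cluster_size (bytes)."""
    if file_size == 0:
        return 0  # assumption: an empty file occupies no data cluster
    clusters = (file_size + cluster_size - 1) // cluster_size  # ceiling division
    return clusters * cluster_size


# Scenario from the text: 1000 files of 1 KB each on a drive with 32 KB clusters.
file_sizes = [1 * KB] * 1000
cluster_size = 32 * KB

logical = sum(file_sizes)
allocated = sum(size_on_disk(size, cluster_size) for size in file_sizes)

print(f"Size:         {logical // KB:,} KB")                # Size:         1,000 KB
print(f"Size on disk: {allocated // KB:,} KB")              # Size on disk: 32,000 KB
print(f"Wasted:       {(allocated - logical) // KB:,} KB")  # Wasted:       31,000 KB
```

Changing cluster_size to 4 * KB in the same sketch brings the size on disk down to 4,000 KB, which is why smaller clusters matter for folders full of small files.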
This has become less of an issue under the FAT32 and NTFS file systems, as the default cluster size is 4 Kilobytes on typical drive sizes. Some USB Flash Drives or old computer systems may however still use FAT16 as the file system.
With increasing storage sizes, it is becoming an issue again. The cluster size of a 64 TB NTFS volume for instance is 32 Kilobytes.
Determine the cluster size of a hard disk
- Tap on the Windows-key on the keyboard and type cmd.
- Right-click cmd.exe and select run as administrator from the context menu.
- Run the command chkdsk followed by the drive letter (e.g. chkdsk d:) and wait for it to finish.
- Check the "bytes in each allocation unit" line of the output. The value is in bytes; to get Kilobytes, divide it by 1024. For example, 4096 bytes equal 4 Kilobytes (4096 / 1024 = 4).
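If you save the chkdsk output to a text file, the last step can be automated. The following Python sketch looks for the "bytes in each allocation unit" line and converts the value to Kilobytes. The function name and the sample text are assumptions made for this illustration, and the wording of the line may differ on non-English Windows versions.

```python
import re


def parse_allocation_unit(chkdsk_output):
    """Return the cluster size in bytes found in chkdsk's text output."""
    match = re.search(r"(\d+)\s+bytes in each allocation unit", chkdsk_output)
    if match is None:
        raise ValueError("no 'bytes in each allocation unit' line found")
    return int(match.group(1))


# Illustrative excerpt only, not captured from a real drive.
sample_output = """
      4096 bytes in each allocation unit.
"""

cluster_bytes = parse_allocation_unit(sample_output)
print(cluster_bytes, "bytes =", cluster_bytes // 1024, "KB")  # 4096 bytes = 4 KB
```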
What can you do about it?
Depending on how the storage is used, you may be able to reduce the cluster size:
- Use FAT32 or NTFS instead of FAT16. This may not always be possible, for instance if the file system needs to be FAT16. If there is no such requirement, you may be able to free up lots of space on disk by changing the file system. You can use the command line tool CONVERT to switch a drive to NTFS. To change the file system of drive d: to NTFS, you would run the command CONVERT d: /fs:ntfs on an elevated command prompt. The conversion keeps the existing files on the drive, so there should not be any data loss.
- Partition the drive. If you reduce the size of each partition, you may be able to reduce the cluster size as well. A 512 MB FAT16 partition has a default cluster size of 16 KB for example, while a 1 GB partition has one of 32 KB.
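Before converting or repartitioning, it can help to estimate how much space a different cluster size would actually save for your own files. The Python sketch below walks a folder and adds up the file sizes rounded up to a given cluster size. The function name estimate_allocated is hypothetical, and the result is only an approximation: it ignores file system metadata, compression, and special cases such as very small files on NTFS.

```python
import os


def estimate_allocated(root, cluster_size):
    """Return (logical_bytes, allocated_bytes) for all files below root,
    assuming each file is rounded up to a multiple of cluster_size."""
    logical = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that cannot be read
            logical += size
            # Ceiling division: number of clusters needed for this file.
            allocated += -(-size // cluster_size) * cluster_size
    return logical, allocated


def compare_cluster_sizes(root, cluster_sizes_kb=(4, 16, 32)):
    """Print the estimated waste for several cluster sizes (in Kilobytes)."""
    for kb in cluster_sizes_kb:
        logical, allocated = estimate_allocated(root, kb * 1024)
        wasted = allocated - logical
        print(f"{kb:>2} KB clusters: about {wasted // 1024:,} KB wasted")
```

Calling compare_cluster_sizes with the path of the folder you are interested in prints one line per cluster size, which makes it easy to see whether a smaller cluster size is worth the effort.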
Martin, nothing about alternate data streams on NTFS? It’s possible for malware to hide data in them, or other programs to store data in these (for example, photo manipulation programs may hide metadata there). I don’t think this data usage would show up in a normal file listing, but it would eat up drive space like mad.
Another thing that takes up lots of drive space is THUMBS.DB files and Indexing Services files; I normally turn both of these off and occasionally hunt them down and delete them when I’m running low on disk space.
Another irritation is System Restore files; Vista and Windows 7 store system restore points in huge, monolithic files that can’t be defragmented, and unless you rein in the amount of space that can be used, System Restore points can use a huge amount of disk space (around 15% of each drive it is enabled on, IIRC).
Thank you. I have often wondered about this.
lol, I’ll put my 1080p movies on 512 MB FAT16 partitions :-P
serious additional:
Data cluster
http://en.wikipedia.org/wiki/Data_cluster
Fork (file system)
http://en.wikipedia.org/wiki/Fork_(file_system)
It depends on our needs, FAT32 file size support tops out at 4GB and volume size tops out at 2TB. This means that you’re limited to 2TB FAT32 partitions if you want to use a 4TB drive. It also means that you are limited to 4GB files. Use FAT32 only if you need to exchange files with a non-Windows system like a Mac or Linux, and as long as your file sizes are smaller than 4GB.
I’ll give you the ultimate trick, even if TrueCrypt security may be an issue: the advantage of using a TrueCrypt volume is that if you have many small files, guess what, you’re not wasting space.
Only way to stop the waste is with proper container files.
Windows has not used fat 16/32 in ages as primary storage. “Why files may take up more disk space on Windows” is also sort of a bad title since all files systems allocate there storage in some kind of cluster/inode/whatever so this happens across all file systems to some degree. Unless you have crazy amounts of storage NTFS defaults to 4k or less at which point the inefficiencies are acceptable. Also this has been going on for decades are your writing about it now?
Windows 2000 would run just fine on FAT16 or FAT32; IIRC, early editions of Windows XP would as well.
https://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/choosing_between_ntfs_fat_and_fat32.mspx?mfr=true
All file systems allocate *their* (not “there”) storage in clusters; cluster sizes are usually chosen in a way that trades off between slack space (the parts of a cluster that aren’t used in the last cluster of a file) and the FAT/MFT size.