Find duplicate files

From Rosetta Code
Revision as of 09:03, 10 April 2013 by rosettacode>TobyK (Find duplicate files under a directory)
Find duplicate files is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

In a large directory structure it is easy to inadvertently leave unnecessary copies of files around, which can use considerable disk space and create confusion. Create a program which, given a minimum size and a folder/directory, will find all files of at least that size (in bytes) with duplicate contents under the directory, and output or show the sets of duplicate files in order of decreasing size.
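
As an illustration of the task, here is a minimal Python sketch, not a reference solution. It assumes duplicates are identified by SHA-256 hash rather than byte-by-byte comparison, and it groups files by size first so that only files sharing a size are hashed. The names `file_digest` and `find_duplicates` are hypothetical, chosen for this example only.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, min_size):
    """Return [(size, [paths, ...]), ...] for duplicate sets, largest first."""
    # Pass 1: group candidate files by size (cheap, no file reads).
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Assumption: symlinks and non-regular files are skipped.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            if size >= min_size:
                by_size[size].append(path)

    # Pass 2: within each size group, group by content hash.
    result = []
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        for group in by_hash.values():
            if len(group) > 1:
                result.append((size, sorted(group)))

    # Report sets in order of decreasing size.
    result.sort(key=lambda item: item[0], reverse=True)
    return result
```

Calling `find_duplicates(".", 1024)` would return the duplicate sets of at least 1024 bytes under the current directory. Hard-linked names show up in this sketch as ordinary duplicates, since it does not distinguish them from separate copies.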

The program may be command-line or graphical. Duplicate content may be determined by direct comparison or by calculating a hash of the data. Specify which filesystems or operating systems your program works with if it has any filesystem- or OS-specific requirements. Detect and show hard links (filenames referencing the same content) if applicable for the filesystem. For extra points detect when whole directory sub-trees are identical, or optionally remove or link identical files.
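
Hard-link detection could be sketched as follows in Python. This is one possible approach, assuming a POSIX-style filesystem where two names refer to the same content exactly when they share a device number and inode number; behaviour on other filesystems may differ. The function name `find_hard_links` is hypothetical.

```python
import os
import stat
from collections import defaultdict


def find_hard_links(root):
    """Return lists of paths under root that share one (device, inode) pair."""
    by_inode = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat does not follow symlinks
            # Only regular files with more than one link can be hard-linked.
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                by_inode[(st.st_dev, st.st_ino)].append(path)
    # Keep only inodes that have at least two names inside root.
    return [sorted(paths) for paths in by_inode.values() if len(paths) > 1]
```

A file whose other links all lie outside `root` has a link count above one but only a single name in the scan, so it is not reported.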