.\" $NetBSD: same.1,v 1.3 2013/07/20 21:50:54 wiz Exp $ .Dd July 14, 2004 .Dt SAME 1 .Sh NAME .Nm same .Nd link identical files to save disk space .Sh SYNOPSIS .Nm .Fl HVcdhnstuvz .Sh DESCRIPTION .Nm takes a list of files (e.g. the output of find . -type f) on stdin. Each of the files is compared against each of the others. Whenever two files are found that match exactly, the two files are linked (soft or hard) together. .Ss Goal The goal of this program is to conserve disk space when you have several different trees of large projects on your disk. By creating hardlinks or softlinks between the files that are the same, you can save lots of disk space. For example, two different versions of the Linux kernel only differ in a small number of files. By running this program you only need to store the contents of those files once. This is especially useful if you have different versions of complete trees lying around. .Ss Implementation The filesize of every file is used as an indication of wether two files can be the same. Whenever the filesizes match, the hashes of these two files are compared. Whenever these match, the file contents are compared. For every matching pair one of the two files is replaced by a hard link to the other file. With the -s option a softlink is used. .Pp To allow you to do this incrementally, the "rm" is done on the file with the least links. This allows you to "merge" a new tree with several trees that have already been processed. The new tree has link count 1, while the old tree has a higher link count for those files that are likely candidates for linkage. .Pp The current implementation keeps the "first" incantation of a file, and replaces further occurrances of the same file. This is significant when using softlinks. .Ss Options .Bl -tag -width xxxxxx .It Fl H Ar n , Fl -hashstart Ar n Start at hash value .Ar n instead of 0. .It Fl V , Fl -version Print the version of the program and exit. .It Fl c Ar file , Fl -cache Ar file Keep a cache between runs in file .Ar file . .It Fl d , Fl -debug Output some debug messages. .It Fl h , Fl -help Output this page and exit successfully. .It Fl n , Fl -dryrun Only simulate. .It Fl s , Fl -softlinks Create soft links instead of hard links. .It Fl u , Fl -user Don't relink files owned by another user. .It Fl v , Fl -verbose Output verbose messages. .It Fl z , Fl -nullfiles Link empty files, too. By default, only non-empty files are linked. .El .Sh EXIT STATUS Zero on success, non-zero on failure. .Sh EXAMPLES .Bd -literal find . -type f -print | same .Ed .Sh SEE ALSO .Xr find 1 .Sh AUTHORS .An "Roger E. Wolff" Aq Mt R.E.Wolff@BitWizard.nl , .An "Geert Uytterhoeven" Aq Mt geert@linux-m68k.org , .An "Roland Illig" Aq Mt roland.illig@gmx.de . .Sh CAVEATS .Bl -bullet .It If your editor does not move the original aside before writing a new copy, you will change the file in ALL incarnations when editing a file. Patch works just fine: it moves the original aside before creating a new copy. I'm confident that I could learn Emacs to do it this way too. I'm too lazy to figure it out, so if you happen to know an easy way how to do this, please Email me at .Aq R.E.Wolff@BitWizard.nl . .It There is a 1024 (BUFSIZE) character limit to pathnames when using symlinks. .El