File size distribution

Beginning from the current directory, or optionally from a directory specified as a command-line argument, determine how many files there are of various sizes in a directory hierarchy. My suggestion is to sort by logarithm of file size, since a few bytes here or there, or even a factor of two or three, may not be that significant. Don't forget that empty files may exist, to serve as a marker. Is your file system predominantly devoted to a large number of smaller files, or a smaller number of huge files?
 
=={{header|Python}}==
The distribution is stored in a '''collections.Counter''' object (like a dictionary with an automatic 0 value when a key is not found, which is useful when incrementing). Anything could be done with this object; here the number of files is printed for increasing sizes. No error handling is done during the directory walk: in practice, safeguards would be needed, or the program will fail on the first unreadable file or directory (because of permissions or overly deep paths, for instance).
 
<lang python>import sys, os
from collections import Counter

def dodir(path):
    global h

    for name in os.listdir(path):
        p = os.path.join(path, name)

        if os.path.islink(p):
            pass                            # ignore symbolic links
        elif os.path.isfile(p):
            h[os.stat(p).st_size] += 1      # count one more file of this size
        elif os.path.isdir(p):
            dodir(p)                        # recurse into subdirectory
        else:
            pass                            # sockets, devices, etc.

def main(arg):
    global h
    h = Counter()
    for d in arg:
        dodir(d)

    s = n = 0
    for k, v in sorted(h.items()):
        print("Size %d -> %d file(s)" % (k, v))
        n += v
        s += k * v
    print("Total %d bytes for %d files" % (s, n))

main(sys.argv[1:] or ["."])   # default to the current directory</lang>
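
The program above reports a count per exact byte size. The task suggests grouping by the logarithm of the size instead; the following is a minimal sketch of such a variant (the helpers '''log_bucket''' and '''size_histogram''' are illustrative names, not part of the entry above), using '''os.walk''' and skipping unreadable files instead of failing on them:

<lang python>import sys, os, math
from collections import Counter

def log_bucket(size):
    # Empty files get their own bucket; otherwise bucket by power of two.
    return -1 if size == 0 else int(math.log2(size))

def size_histogram(roots):
    h = Counter()
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            for name in filenames:
                p = os.path.join(dirpath, name)
                if os.path.islink(p):
                    continue                 # ignore symbolic links
                try:
                    h[log_bucket(os.stat(p).st_size)] += 1
                except OSError:
                    pass                     # unreadable file: skip it
    return h

if __name__ == "__main__":
    hist = size_histogram(sys.argv[1:] or ["."])
    for b in sorted(hist):
        label = "0 bytes" if b < 0 else "%d-%d bytes" % (2**b, 2**(b + 1) - 1)
        print("%-25s %d file(s)" % (label, hist[b]))</lang>

Bucketing by powers of two keeps the report short even when sizes range from a few bytes to gigabytes, and empty files stay visible in their own bucket.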
 
=={{header|zkl}}==