File size distribution: Difference between revisions

Simplify and accelerate UNIX shell solution massively
(Tcl simplifications)
Line 2:
 
;Task:
Beginning from the current directory, or optionally from a directory specified as a command-line argument, determine how many files there are of various sizes in a directory hierarchy.
 
 
My suggestion is to sort by the logarithm of file size, since a few bytes here or there, or even a factor of two or three, may not be that significant.
 
Don't forget that empty files may exist, to serve as a marker.
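The log-scale bucketing suggested above can be sketched in Python. This is a minimal illustration under stated assumptions, not one of the page's solutions. It walks the tree with `os.walk` and skips symbolic links and other non-regular files. It uses the number of decimal digits minus one as floor(log10(size)) and counts empty files in a separate bucket, labelled "0". The names `size_bucket` and `size_histogram` are hypothetical.

```python
import os
import stat
import sys
from collections import Counter


def size_bucket(size: int) -> int:
    """Return floor(log10(size)) for size > 0, or -1 for an empty file.

    Counting decimal digits avoids floating-point rounding at exact powers of ten.
    """
    return len(str(size)) - 1 if size > 0 else -1


def size_histogram(root: str) -> Counter:
    """Count regular files under root by the order of magnitude of their size."""
    hist: Counter = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symbolic links
            except OSError:
                continue  # entry vanished or cannot be read
            if stat.S_ISREG(st.st_mode):
                hist[size_bucket(st.st_size)] += 1
    return hist


def main() -> None:
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    hist = size_histogram(root)
    for bucket in sorted(hist):
        # "0" for empty files, otherwise "1e<k>" meaning 10**k <= size < 10**(k+1)
        label = "0" if bucket < 0 else f"1e{bucket}"
        print(f"{label}\t{hist[bucket]}")
    print(f"Total: {sum(hist.values())} files")


if __name__ == "__main__":
    main()
```

Digit counting is used instead of `math.log10` so that sizes such as 1000 can never land in the wrong bucket because of rounding.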
 
 
Line 30:
double scale;
FILE* fp;
 
if(argC==1)
printf("Usage : %s <followed by directory to start search from(. for current dir), followed by \n optional parameters (T or G) to show text or graph output>",argV[0]);
Line 43:
sprintf(commandString,"forfiles /p %s /s /c \"cmd /c echo @fsize\" 2>&1",startPath);
}
 
else if(strlen(argV[1])==1 && argV[1][0]=='.')
strcpy(commandString,"forfiles /s /c \"cmd /c echo @fsize\" 2>&1");
 
else
sprintf(commandString,"forfiles /p %s /s /c \"cmd /c echo @fsize\" 2>&1",argV[1]);
Line 58:
fileSizeLog[strlen(str)]++;
}
 
if(argC==2 || (argC==3 && (argV[2][0]=='t'||argV[2][0]=='T'))){
for(i=0;i<MAXORDER;i++){
Line 64:
}
}
 
else if(argC==3 && (argV[2][0]=='g'||argV[2][0]=='G')){
CONSOLE_SCREEN_BUFFER_INFO csbi;
Line 72:
 
max = fileSizeLog[0];
 
for(i=1;i<MAXORDER;i++)
if(fileSizeLog[i]>max) max=fileSizeLog[i];

scale = (max < csbi.dwSize.X) ? 1 : (1.0*(csbi.dwSize.X-50))/max;
 
for(i=0;i<MAXORDER;i++){
printf("\nSize Order < 10^%2d bytes |",i);
Line 85:
}
}
 
}
return 0;
Line 721:
groupThreshold = round . (*0.25) . realToFrac</lang>
{{out}}
<pre style="height: 50rem;">$ filedist ~/Music
Using 4 worker threads
Total files: 688
Total folders: 663
Total size: 985.85MB
 
Distribution:
Line 731:
From <-> To Count
----------------------------------------------
0B <-> 80B = 7 1.017%: █
81B <-> 161B = 74 10.756%: ███████████
162B <-> 242B = 112 16.279%: ████████████████
243B <-> 323B = 99 14.390%: ██████████████
323B <-> 645B = 23 3.343%: ███
646B <-> 968B = 2 0.291%: ▍
969B <-> 1.26KB = 1 0.145%: ▍
3.19KB <-> 6.38KB = 12 1.744%: ██
6.38KB <-> 9.58KB = 22 3.198%: ███
9.58KB <-> 12.77KB = 12 1.744%: ██
13.52KB <-> 27.04KB = 15 2.180%: ██
27.04KB <-> 40.57KB = 6 0.872%: █
40.57KB <-> 54.09KB = 22 3.198%: ███
54.20KB <-> 108.41KB = 99 14.390%: ██████████████
108.41KB <-> 162.61KB = 23 3.343%: ███
162.61KB <-> 216.81KB = 8 1.163%: █
236.46KB <-> 472.93KB = 3 0.436%: ▍
709.39KB <-> 945.85KB = 44 6.395%: ██████
3.30MB <-> 4.96MB = 4 0.581%: █
4.96MB <-> 6.61MB = 21 3.052%: ███
6.67MB <-> 13.33MB = 72 10.465%: ██████████
13.33MB <-> 20.00MB = 6 0.872%: █
20.00MB <-> 26.66MB = 1 0.145%: ▍
 
$ filedist ~/Music 10
Using 4 worker threads
Total files: 688
Total folders: 663
Total size: 985.85MB
 
Distribution:
Line 765:
From <-> To Count
----------------------------------------------
0B <-> 88B = 7 1.017%: █
89B <-> 177B = 75 10.901%: ███████████
178B <-> 266B = 156 22.674%: ███████████████████████
267B <-> 355B = 57 8.285%: ████████
356B <-> 444B = 20 2.907%: ███
801B <-> 889B = 2 0.291%: ▍
959B <-> 1.87KB = 1 0.145%: ▍
3.75KB <-> 4.68KB = 1 0.145%: ▍
4.68KB <-> 5.62KB = 1 0.145%: ▍
5.62KB <-> 6.55KB = 11 1.599%: ██
6.56KB <-> 7.49KB = 10 1.453%: █
7.49KB <-> 8.43KB = 4 0.581%: █
8.43KB <-> 9.36KB = 7 1.017%: █
9.43KB <-> 18.85KB = 21 3.052%: ███
18.85KB <-> 28.28KB = 6 0.872%: █
28.28KB <-> 37.71KB = 4 0.581%: █
37.71KB <-> 47.13KB = 12 1.744%: ██
47.13KB <-> 56.56KB = 16 2.326%: ██
56.56KB <-> 65.99KB = 23 3.343%: ███
65.99KB <-> 75.41KB = 26 3.779%: ████
75.41KB <-> 84.84KB = 15 2.180%: ██
84.84KB <-> 94.27KB = 17 2.471%: ██
94.59KB <-> 189.17KB = 42 6.105%: ██████
189.17KB <-> 283.76KB = 4 0.581%: █
283.76KB <-> 378.35KB = 2 0.291%: ▍
851.28KB <-> 945.87KB = 44 6.395%: ██████
2.67MB <-> 5.33MB = 5 0.727%: █
5.33MB <-> 8.00MB = 41 5.959%: ██████
8.00MB <-> 10.67MB = 35 5.087%: █████
10.67MB <-> 13.33MB = 16 2.326%: ██
13.33MB <-> 16.00MB = 3 0.436%: ▍
16.00MB <-> 18.67MB = 3 0.436%: ▍
24.00MB <-> 26.66MB = 1 0.145%: ▍
</pre>
 
Line 832:
 
{{out}}
<pre>filesizes:
- between 0.0 B and 1.0 B bytes: 0
- between 1.0 B and 10.0 B bytes: 1
Line 1,039:
integer sdx = 1
while size>sizes[sdx] do
if sdx=length(sizes) then
sizes &= sizes[$]*iff(mod(length(sizes),3)?10:10.24)
res &= 0
Line 1,106:
for dir in arg:
dodir(dir)
 
s = n = 0
for k, v in sorted(h.items()):
Line 1,216:
 
=={{header|REXX}}==
This REXX version works for Microsoft Windows using the &nbsp; '''dir''' &nbsp; subcommand; &nbsp; extra code was added for
<br>older versions of Windows that used suffixes to express large numbers &nbsp; (file sizes), &nbsp; and also for versions
<br>that used mixed case in the output text.
 
Also, some Windows versions of the &nbsp; '''dir''' &nbsp; command insert commas into numbers, so code was added to elide them.
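The comma and mixed-case handling described above amounts to normalising each size field before converting it. Below is a minimal Python sketch, not a translation of the REXX code. It assumes a hypothetical '''dir''' line layout: a date, a time, an optional AM/PM marker, then either a size or a <DIR> tag, then the name. It does not attempt the suffix handling for older Windows versions, whose exact format is not given here. It also handles only commas as thousands separators. The function name `dir_line_size` is hypothetical.

```python
from typing import Optional


def dir_line_size(line: str) -> Optional[int]:
    """Return the size from one `dir` listing line, or None if it is not a file entry.

    Assumed layout (hypothetical): DATE TIME [AM|PM] SIZE-or-<DIR> NAME...
    """
    fields = line.split()
    # A file entry starts with a date (contains / - or .) followed by a time (contains :)
    if len(fields) < 4 or not any(c in fields[0] for c in "/-.") or ":" not in fields[1]:
        return None  # header, summary or blank line
    idx = 2
    if fields[idx].upper() in ("AM", "PM"):
        idx += 1  # 12-hour clocks print a separate AM/PM field, in either case
    if idx + 1 >= len(fields):
        return None  # no name after the size column
    size_field = fields[idx]
    if size_field.upper() == "<DIR>":
        return None  # directory entry, matched case-insensitively
    digits = size_field.replace(",", "")  # elide the thousands separators
    return int(digits) if digits.isdecimal() else None
```

Summary lines such as "3 File(s) 1,234 bytes" are rejected by the date and time check. Without that check, the comma-elided "1,234" would be miscounted as a file.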
Line 1,267:
/*──────────────────────────────────────────────────────────────────────────────────────*/
commas: parse arg _; do j#=length(_)-3 to 1 by -3; _=insert(',', _, j#); end; return _</lang>
This REXX program makes use of the &nbsp; '''LINESIZE''' &nbsp; REXX program (or BIF), which determines the screen width (line size) of the terminal (console) so that the histogram can use the full width.
 
The &nbsp; '''LINESIZE.REX''' &nbsp; REXX program is included here &nbsp; ──► &nbsp; [[LINESIZE.REX]].<br>
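The job LINESIZE does here, fitting the histogram to the console width, can be sketched in Python with `shutil.get_terminal_size`, which falls back to a default size when the width cannot be determined. This is an illustrative sketch, not part of the REXX solution. The function names, the 80-column fallback and the rule that every non-zero count gets at least one column are assumptions.

```python
import shutil
from typing import List, Optional, Sequence


def bar_lengths(counts: Sequence[int], label_width: int = 12,
                width: Optional[int] = None) -> List[int]:
    """Scale histogram counts so the longest bar fits beside its label.

    width defaults to the current terminal width (80 columns if unknown).
    """
    if width is None:
        width = shutil.get_terminal_size(fallback=(80, 24)).columns
    room = max(width - label_width - 1, 1)  # columns left for the bar itself
    peak = max(counts, default=0)
    if peak <= room:
        return list(counts)  # fits unscaled: one column per file
    # Scale down, but keep every non-zero count visible as at least one column
    return [max(1, round(c * room / peak)) if c else 0 for c in counts]


def print_histogram(labels: Sequence[str], counts: Sequence[int]) -> None:
    """Print one right-aligned label and a bar of '*' characters per bucket."""
    label_width = max((len(s) for s in labels), default=0)
    for label, n in zip(labels, bar_lengths(counts, label_width)):
        print(f"{label:>{label_width}} {'*' * n}")
```

Each printed line is at most `label_width + 1 + room` columns, which equals the detected width. The histogram therefore uses the whole line without wrapping.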
Line 1,604:
=={{header|UNIX Shell}}==
{{works with|Bourne Shell}}
By default the script uses only POSIX-conformant tools. If the environment variable GNU is set to a non-empty value, it uses the faster GNU-specific '''du -b''' instead.
<lang sh>#!/bin/sh
set -eu

if [ -n "${GNU:-}" ]
then
	# GNU du -b prints the apparent size of each file in bytes, with no total line
	find -- "${1:-.}" -type f -exec du -b -- {} +
else
	# wc -c adds a "total" line when given more than one file, so each
	# ARG_MAX batch keeps only its first $# lines (one per file)
	find -- "${1:-.}" -type f -exec sh -c 'wc -c -- "$@" | awk -v n="$#" '\''NR <= n { print $1 }'\''' argv0 {} +
fi | awk '
{
	# Digits minus one is floor(log10(size)); -1 marks empty files
	++hist[$1 ? length($1) - 1 : -1]
	total += $1
}
END {
	for (i in hist)
		print (i == -1 ? 0 : "1e" i) "\t" hist[i]
	print "Total: " total " bytes in " NR " files"
}' | sort</lang>
{{out}}
<pre>$ time ~/fsd.sh
0 4
1e0 66
1e1 66
1e2 1418
1e3 1026
1e4 1564
1e5 60083
1e6 16282
1e7 3881
1e8 1444
1e9 16
Total: 612404756079 bytes in 85850 files
~/fsd.sh 0.60s user 0.98s system 134% cpu 1.182 total
$ time GNU=1 ~/fsd.sh
0 4
1e0 66
1e1 66
1e2 1418
1e3 1026
1e4 1564
1e5 60083
1e6 16282
1e7 3881
1e8 1444
1e9 16
Total: 612404756079 bytes in 85850 files
GNU=1 ~/fsd.sh 0.35s user 0.48s system 135% cpu 0.613 total</pre>
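The POSIX branch depends on one property of `wc -c`: it writes a trailing "total" line only when it is given more than one file. A batch produced by `find ... -exec ... {} +` may therefore end with such a line or not. Dropping the last line unconditionally would lose a file whenever a batch holds a single file. Below is a minimal Python sketch of the safe filtering, which keeps only the first n lines for n files. The function name `sizes_from_wc` is hypothetical.

```python
from typing import List


def sizes_from_wc(output: str, nfiles: int) -> List[int]:
    """Extract per-file byte counts from the output of `wc -c FILE...`.

    wc writes a trailing "total" line only when it is given more than one
    file, so keeping the first nfiles lines is correct either way.
    """
    sizes = []
    for line in output.splitlines()[:nfiles]:
        # Each line is "<count> <name>", possibly padded with leading blanks
        sizes.append(int(line.split()[0]))
    return sizes
```

Like the shell version, this assumes that no file name contains a newline, since that would split one entry across two output lines.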
 
=={{header|zkl}}==
Line 1,667 ⟶ 1,690:
Found 4320 files, 67,627,849,052 bytes, 15,654,594 mean.
File size Number of files (* = 69.84)
n :
nn :
nnn :
n,nnn : *
nn,nnn :
nnn,nnn :
n,nnn,nnn : *
nn,nnn,nnn : **************************************************