File size distribution
;Task:
Beginning from the current directory, or optionally from a directory specified as a command-line argument, determine how many files there are of various sizes in a directory hierarchy.
My suggestion is to sort by the logarithm of the file size, since a few bytes here or there, or even a factor of two or three, may not be that significant.
Don't forget that empty files may exist, for example to serve as markers.
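As a rough illustration of the suggested approach, here is a minimal Python sketch. It is not one of the submitted solutions, and the names `size_order` and `file_size_histogram` are hypothetical. It walks the tree with `os.walk` and buckets each regular file by its number of decimal digits minus one, which equals floor(log10(size)) for non-empty files. Empty files get their own bucket because log 0 is undefined.

```python
import os
import sys
from collections import Counter


def size_order(size: int) -> int:
    """Decimal order of magnitude of a file size; -1 marks an empty file."""
    # For size >= 1, digit count - 1 == floor(log10(size)), computed exactly;
    # floating-point log10 can misjudge sizes just below a power of ten.
    return len(str(size)) - 1 if size > 0 else -1


def file_size_histogram(root: str) -> Counter:
    """Count the regular files below root, grouped by size_order."""
    hist = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                hist[size_order(os.path.getsize(path))] += 1
            except OSError:
                # The file vanished or became unreadable during the walk.
                continue
    return hist


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for order, count in sorted(file_size_histogram(root).items()):
        label = "empty" if order < 0 else f"< 10^{order + 1} bytes"
        print(f"{label:>16}: {count}")
```

Counting digits instead of calling `math.log10` keeps the bucket boundaries exact at 10, 100, 1000, and so on.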
double scale;
FILE* fp;
if(argC==1)
printf("Usage : %s <directory to start search from (. for current dir)> [T or G to show text or graph output]\n",argV[0]);
sprintf(commandString,"forfiles /p %s /s /c \"cmd /c echo @fsize\" 2>&1",startPath);
}
else if(strlen(argV[1])==1 && argV[1][0]=='.')
strcpy(commandString,"forfiles /s /c \"cmd /c echo @fsize\" 2>&1");
else
sprintf(commandString,"forfiles /p %s /s /c \"cmd /c echo @fsize\" 2>&1",argV[1]);
fileSizeLog[strlen(str)]++;
}
if(argC==2 || (argC==3 && (argV[2][0]=='t'||argV[2][0]=='T'))){
for(i=0;i<MAXORDER;i++){
}
}
else if(argC==3 && (argV[2][0]=='g'||argV[2][0]=='G')){
CONSOLE_SCREEN_BUFFER_INFO csbi;
max = fileSizeLog[0];
for(i=1;i<MAXORDER;i++)
if(fileSizeLog[i]>max) max=fileSizeLog[i];
/* Reserve 50 columns for the row label so the longest bar never wraps */
scale = (max < csbi.dwSize.X - 50) ? 1 : (csbi.dwSize.X - 50.0)/max;
for(i=0;i<MAXORDER;i++){
printf("\nSize Order < 10^%2d bytes |",i);
}
}
}
return 0;
groupThreshold = round . (*0.25) . realToFrac</lang>
{{out}}
<pre style="height: 50rem;">$ filedist ~/Music
Using 4 worker threads
Total files: 688
Total folders: 663
Total size: 985.85MB
Distribution:
From <-> To Count
----------------------------------------------
0B <-> 80B = 7 1.017%: █
81B <-> 161B = 74 10.756%: ███████████
162B <-> 242B = 112 16.279%: ████████████████
243B <-> 323B = 99 14.390%: ██████████████
323B <-> 645B = 23 3.343%: ███
646B <-> 968B = 2 0.291%: ▍
969B <-> 1.26KB = 1 0.145%: ▍
3.19KB <-> 6.38KB = 12 1.744%: ██
6.38KB <-> 9.58KB = 22 3.198%: ███
9.58KB <-> 12.77KB = 12 1.744%: ██
13.52KB <-> 27.04KB = 15 2.180%: ██
27.04KB <-> 40.57KB = 6 0.872%: █
40.57KB <-> 54.09KB = 22 3.198%: ███
54.20KB <-> 108.41KB = 99 14.390%: ██████████████
108.41KB <-> 162.61KB = 23 3.343%: ███
162.61KB <-> 216.81KB = 8 1.163%: █
236.46KB <-> 472.93KB = 3 0.436%: ▍
709.39KB <-> 945.85KB = 44 6.395%: ██████
3.30MB <-> 4.96MB = 4 0.581%: █
4.96MB <-> 6.61MB = 21 3.052%: ███
6.67MB <-> 13.33MB = 72 10.465%: ██████████
13.33MB <-> 20.00MB = 6 0.872%: █
20.00MB <-> 26.66MB = 1 0.145%: ▍
$ filedist ~/Music 10
Using 4 worker threads
Total files: 688
Total folders: 663
Total size: 985.85MB
Distribution:
From <-> To Count
----------------------------------------------
0B <-> 88B = 7 1.017%: █
89B <-> 177B = 75 10.901%: ███████████
178B <-> 266B = 156 22.674%: ███████████████████████
267B <-> 355B = 57 8.285%: ████████
356B <-> 444B = 20 2.907%: ███
801B <-> 889B = 2 0.291%: ▍
959B <-> 1.87KB = 1 0.145%: ▍
3.75KB <-> 4.68KB = 1 0.145%: ▍
4.68KB <-> 5.62KB = 1 0.145%: ▍
5.62KB <-> 6.55KB = 11 1.599%: ██
6.56KB <-> 7.49KB = 10 1.453%: █
7.49KB <-> 8.43KB = 4 0.581%: █
8.43KB <-> 9.36KB = 7 1.017%: █
9.43KB <-> 18.85KB = 21 3.052%: ███
18.85KB <-> 28.28KB = 6 0.872%: █
28.28KB <-> 37.71KB = 4 0.581%: █
37.71KB <-> 47.13KB = 12 1.744%: ██
47.13KB <-> 56.56KB = 16 2.326%: ██
56.56KB <-> 65.99KB = 23 3.343%: ███
65.99KB <-> 75.41KB = 26 3.779%: ████
75.41KB <-> 84.84KB = 15 2.180%: ██
84.84KB <-> 94.27KB = 17 2.471%: ██
94.59KB <-> 189.17KB = 42 6.105%: ██████
189.17KB <-> 283.76KB = 4 0.581%: █
283.76KB <-> 378.35KB = 2 0.291%: ▍
851.28KB <-> 945.87KB = 44 6.395%: ██████
2.67MB <-> 5.33MB = 5 0.727%: █
5.33MB <-> 8.00MB = 41 5.959%: ██████
8.00MB <-> 10.67MB = 35 5.087%: █████
10.67MB <-> 13.33MB = 16 2.326%: ██
13.33MB <-> 16.00MB = 3 0.436%: ▍
16.00MB <-> 18.67MB = 3 0.436%: ▍
24.00MB <-> 26.66MB = 1 0.145%: ▍
</pre>
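The size labels in the output above pair a number with a B, KB or MB suffix. Below is a minimal Python sketch of that kind of formatting. The function name `human_size` is hypothetical. It assumes binary units (1 KB = 1024 bytes) and two decimal places, and the Haskell solution may make different choices.

```python
def human_size(n: float) -> str:
    """Format a byte count as B, KB, MB, GB or TB, e.g. 80B or 1.26KB.

    Assumption: binary units (1 KB = 1024 bytes); the excerpt above does
    not show which base the Haskell solution actually uses.
    """
    if n < 1024:
        return f"{int(n)}B"  # plain bytes, no decimals
    value = n / 1024
    for unit in ("KB", "MB", "GB"):
        # Use the first unit that keeps the number below 1024.
        if value < 1024:
            return f"{value:.2f}{unit}"
        value /= 1024
    return f"{value:.2f}TB"  # largest unit handled by this sketch
```

For example, `human_size(1290)` gives `1.26KB` under these assumptions.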
{{out}}
<pre>filesizes:
- between 0.0 B and 1.0 B bytes: 0
- between 1.0 B and 10.0 B bytes: 1
integer sdx = 1
while size>sizes[sdx] do
if sdx=length(sizes) then
sizes &= sizes[$]*iff(mod(length(sizes),3)?10:10.24)
res &= 0
for dir in arg:
dodir(dir)
s = n = 0
for k, v in sorted(h.items()):
=={{header|REXX}}==
This REXX version works for Microsoft Windows using the '''dir''' subcommand; extra code was added for
<br>older versions of Windows that used suffixes to express big numbers (such as file sizes), and also for versions
<br>that show their output text in mixed case.
Also, some Windows versions of the '''dir''' command insert commas into numbers, so code was added to elide them.
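To make the parsing problem concrete, here is a hedged Python sketch that extracts the size from a single file line of a `dir` listing and elides the commas. This is not how the REXX program does it. The line layout encoded in the hypothetical `DIR_ENTRY` pattern (date, time with an optional AM/PM or a/p marker in any case, size, name) is an assumption. Real `dir` output varies with the Windows version and locale, and the older suffix notation is not handled.

```python
import re

# Assumed shape of a file line in a Windows `dir` listing: a date, a time
# with an optional AM/PM marker (or a single a/p, in any case), then the
# size, which may contain thousands separators, and the file name.
# Directory lines (<DIR>), headers and summary lines do not match.
DIR_ENTRY = re.compile(
    r"^\s*\d[\d/.-]*\s+\d{1,2}:\d{2}\s*(?:[AaPp][Mm]?)?\s+(\d[\d,]*)\s+\S"
)


def dir_entry_size(line: str):
    """Return the size in bytes from one `dir` file line, or None."""
    match = DIR_ENTRY.match(line)
    if match is None:
        return None
    # Elide the commas that some Windows versions insert into numbers.
    return int(match.group(1).replace(",", ""))
```

Because `None` is returned for anything that does not look like a file entry, the caller can feed every line of the listing through the function and keep only the sizes.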
/*──────────────────────────────────────────────────────────────────────────────────────*/
commas: parse arg _; do j#=length(_)-3 to 1 by -3; _=insert(',', _, j#); end; return _</lang>
This REXX program makes use of the '''LINESIZE''' REXX program (or BIF), which is used to determine the screen width (or linesize) of the terminal (console) so as to maximize the width of the histogram.
The '''LINESIZE.REX''' REXX program is included here ──► [[LINESIZE.REX]].<br>
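For comparison, a Python program can ask for the console width with `shutil.get_terminal_size`. That function checks the `COLUMNS` environment variable first, then queries the terminal, and falls back to a given default. The sketch below is not a port of LINESIZE.REX. The helper names `terminal_width` and `bar_lengths` are hypothetical. It scales the bars so that the longest one fits beside its label.

```python
import shutil


def terminal_width(fallback: int = 80) -> int:
    """Terminal width in columns: $COLUMNS if set, else the real terminal,
    else the fallback (e.g. when standard output is not a terminal)."""
    return shutil.get_terminal_size((fallback, 24)).columns


def bar_lengths(counts, width, label_width):
    """Scale counts so the longest bar fits in width - label_width - 1 columns."""
    room = max(width - label_width - 1, 1)
    biggest = max(counts, default=0)
    if biggest <= room:
        return list(counts)  # everything fits: one character per file
    # Shrink proportionally, but keep every non-zero count visible.
    return [max(1, round(c * room / biggest)) if c else 0 for c in counts]


if __name__ == "__main__":
    # Counts per size order, copied from the UNIX Shell output on this page.
    orders = ["0", "1e0", "1e1", "1e2", "1e3", "1e4",
              "1e5", "1e6", "1e7", "1e8", "1e9"]
    counts = [4, 66, 66, 1418, 1026, 1564, 60083, 16282, 3881, 1444, 16]
    label_width = 8
    for order, bar in zip(orders, bar_lengths(counts, terminal_width(), label_width)):
        print(f"{order:>{label_width - 2}} |" + "*" * bar)
```

Taking the width as a parameter keeps `bar_lengths` independent of the terminal, so it can be tested with fixed widths.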
=={{header|UNIX Shell}}==
{{works with|Bourne Shell}}
Uses POSIX-conformant code unless the environment variable GNU is set to a non-empty value, in which case GNU-specific tools are used instead.
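<lang sh>#!/bin/sh
set -eu

if [ -n "${GNU:-}" ]
then
	# GNU stat prints only the size in bytes of each file, one per line
	find -- "${1:-.}" -type f -exec stat -c %s -- {} +
else
	# wc -c appends a "total" line only when given more than one file,
	# so strip it per ARG_MAX-sized batch only in that case
	find -- "${1:-.}" -type f -exec sh -c '
		if [ $# -gt 1 ]; then wc -c -- "$@" | sed \$d; else wc -c -- "$@"; fi
	' argv0 {} +
fi | awk '
{
	# bucket by digit count minus one; empty files go into bucket -1
	++hist[$1 ? length($1) - 1 : -1]
	total += $1
}
END {
	for (i in hist)
		printf "%s %d\n", (i + 0 < 0 ? "0" : "1e" i), hist[i]
	print "Total: " total " bytes in " NR " files"
}' | sort</lang>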
{{out}}
<pre>$ time ~/fsd.sh
0 4
1e0 66
1e1 66
1e2 1418
1e3 1026
1e4 1564
1e5 60083
1e6 16282
1e7 3881
1e8 1444
1e9 16
Total: 612404756079 bytes in 85850 files
~/fsd.sh 0.60s user 0.98s system 134% cpu 1.182 total
$ time GNU=1 ~/fsd.sh
0 4
1e0 66
1e1 66
1e2 1418
1e3 1026
1e4 1564
1e5 60083
1e6 16282
1e7 3881
1e8 1444
1e9 16
Total: 612404756079 bytes in 85850 files
GNU=1 ~/fsd.sh 0.35s user 0.48s system 135% cpu 0.613 total</pre>
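The awk stage of the script buckets each size by its number of digits minus one, so the label 1e2 covers 100 to 999 bytes, and empty files get the separate label 0. For readers unfamiliar with awk, here is the same logic sketched in Python. The function name `summarize` is hypothetical. It reads `size name` lines like those printed by `wc -c`.

```python
from collections import Counter


def summarize(lines):
    """Mirror the awk stage: count sizes per decimal order, then add a total.

    Each non-blank input line is expected to start with a byte count,
    as in the "size name" lines printed by `wc -c`.
    """
    hist = Counter()
    total = files = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        size = int(fields[0])
        # Same bucket as awk's `$1 ? length($1) - 1 : -1`.
        hist[len(str(size)) - 1 if size else -1] += 1
        total += size
        files += 1
    rows = [("0" if order < 0 else f"1e{order}") + f" {count}"
            for order, count in hist.items()]
    rows.append(f"Total: {total} bytes in {files} files")
    # A plain string sort, like `sort` in the C locale (so 1e10 would
    # come before 1e2, just as in the shell version).
    return sorted(rows)
```

For example, `summarize(["0 a", "5 b", "120 c"])` returns `["0 1", "1e0 1", "1e2 1", "Total: 125 bytes in 3 files"]`. Passing `sys.stdin` instead processes real `wc -c` output.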
=={{header|zkl}}==
Found 4320 files, 67,627,849,052 bytes, 15,654,594 mean.
File size Number of files (* = 69.84)
n :
nn :
nnn :
n,nnn : *
nn,nnn :
nnn,nnn :
n,nnn,nnn : *
nn,nnn,nnn : **************************************************