File size distribution: Difference between revisions

Revision as of 04:58, 23 April 2024

22,611 bytes added , 1 month ago

Revision as of 16:56, 21 April 2021 (view source) rosettacode>Tclfan (Simplify and accelerate UNIX shell solution massively) ← Older edit		Latest revision as of 04:58, 23 April 2024 (view source) Peak (talk \| contribs) (→‎{{header\|jq}})
(17 intermediate revisions by 10 users not shown)
Line 12: Is your file system predominantly devoted to a large number of smaller files, or a smaller number of huge files? <br><br> =={{header\|Action!}}== DOS 2.5 returns file size in number of sectors. {{libheader\|Action! Tool Kit}} <syntaxhighlight lang="action!">INCLUDE "D2:PRINTF.ACT" ;from the Action! Tool Kit PROC SizeDistribution(CHAR ARRAY filter INT ARRAY limits,counts BYTE count) CHAR ARRAY line(255),tmp(4) INT size BYTE i,dev=[1] FOR i=0 TO count-1 DO counts(i)=0 OD Close(dev) Open(dev,filter,6) DO InputSD(dev,line) IF line(0)=0 THEN EXIT FI SCopyS(tmp,line,line(0)-3,line(0)) size=ValI(tmp) FOR i=0 TO count-1 DO IF size<limits(i) THEN counts(i)==+1 EXIT FI OD OD Close(dev) RETURN PROC GenerateLimits(INT ARRAY limits BYTE count) BYTE i INT l l=1 FOR i=0 TO count-1 DO limits(i)=l l==LSH 1 IF l>1000 THEN l=1000 FI OD RETURN PROC PrintBar(INT len,max,size) INT i,count count=4lensize/max IF count=0 AND len>0 THEN count=1 FI FOR i=0 TO count/4-1 DO Put(160) OD i=count MOD 4 IF i=1 THEN Put(22) ELSEIF i=2 THEN Put(25) ELSEIF i=3 THEN Put(130) FI RETURN PROC PrintResult(CHAR ARRAY filter INT ARRAY limits,counts BYTE count) BYTE i CHAR ARRAY tmp(5) INT min,max,total total=0 max=0 FOR i=0 TO count-1 DO total==+counts(i) IF counts(i)>max THEN max=counts(i) FI OD PrintF("File size distribution of ""%S"" in sectors:%E",filter) PutE() PrintE("From To Count Perc") min=0 FOR i=0 TO count-1 DO StrI(min,tmp) PrintF("%4S ",tmp) StrI(limits(i)-1,tmp) PrintF("%3S ",tmp) StrI(counts(i),tmp) PrintF("%3S ",tmp) StrI(counts(i)100/total,tmp) PrintF("%3S%% ",tmp) PrintBar(counts(i),max,17) PutE() min=limits(i) OD RETURN PROC Main() DEFINE LIMITCOUNT="11" CHAR ARRAY filter="H1:." 
INT ARRAY limits(LIMITCOUNT),counts(LIMITCOUNT) Put(125) PutE() ;clear the screen GenerateLimits(limits,LIMITCOUNT) SizeDistribution(filter,limits,counts,LIMITCOUNT) PrintResult(filter,limits,counts,LIMITCOUNT) RETURN</syntaxhighlight> {{out}} [https://gitlab.com/amarok8bit/action-rosetta-code/-/raw/master/images/File_size_distribution.png Screenshot from Atari 8-bit computer] <pre> File size distribution of "H1:." in sectors: From To Count Perc 0 0 2 0% ▌ 1 1 20 3% █▌ 2 3 44 8% ███▌ 4 7 195 37% █████████████████ 8 15 183 35% ███████████████▌ 16 31 67 12% █████▌ 32 63 6 1% ▌ 64 127 0 0% 128 255 0 0% 256 511 0 0% 512 999 1 0% ▌ </pre> =={{header\|Ada}}== {{libheader\|Dir_Iterators}} <syntaxhighlight lang="ada">with Ada.Numerics.Elementary_Functions; with Ada.Directories; use Ada.Directories; with Ada.Strings.Fixed; use Ada.Strings; with Ada.Command_Line; use Ada.Command_Line; with Ada.Text_IO; use Ada.Text_IO; with Dir_Iterators.Recursive; procedure File_Size_Distribution is type Exponent_Type is range 0 .. 18; type File_Count is range 0 .. Long_Integer'Last; Counts : array (Exponent_Type) of File_Count := (others => 0); Non_Zero_Index : Exponent_Type := 0; Directory_Name : constant String := (if Argument_Count = 0 then "." 
else Argument (1)); Directory_Walker : Dir_Iterators.Recursive.Recursive_Dir_Walk := Dir_Iterators.Recursive.Walk (Directory_Name); begin if not Exists (Directory_Name) or else Kind (Directory_Name) /= Directory then Put_Line ("Directory does not exist"); return; end if; for Directory_Entry of Directory_Walker loop declare use Ada.Numerics.Elementary_Functions; Size_Of_File : File_Size; Exponent : Exponent_Type; begin if Kind (Directory_Entry) = Ordinary_File then Size_Of_File := Size (Directory_Entry); if Size_Of_File = 0 then Counts (0) := Counts (0) + 1; else Exponent := Exponent_Type (Float'Ceiling (Log (Float (Size_Of_File), Base => 10.0))); Counts (Exponent) := Counts (Exponent) + 1; end if; end if; end; end loop; for I in reverse Counts'Range loop if Counts (I) /= 0 then Non_Zero_Index := I; exit; end if; end loop; for I in Counts'First .. Non_Zero_Index loop Put ("Less than 10**"); Put (Fixed.Trim (Exponent_Type'Image (I), Side => Left)); Put (": "); Put (File_Count'Image (Counts (I))); New_Line; end loop; end File_Size_Distribution;</syntaxhighlight> {{out}} <pre>Less than 10**0: 8 Less than 10**1: 0 Less than 10**2: 18 Less than 10**3: 88 Less than 10**4: 39 Less than 10**5: 8 Less than 10**6: 2 Less than 10**7: 1</pre> =={{header\|C}}== The platform-independent way to get a file's size in C involves opening each file and reading its size. The implementation below works on Windows and uses command scripts to gather size information quickly, even for a large number of files spread across many recursively traversed directories. Both textual and graphical (ASCII) outputs are shown. The same can be done on Linux with a combination of the find, ls and stat commands, and my plan was to make it work on both OS types, but I don't have access to a Linux system right now. Supporting both would also mean either abandoning the scaling of the graphical output to fit the console buffer or porting that as well, and thus including windows.h selectively. 
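For comparison with the two approaches described above (opening every file versus asking the operating system for its metadata), here is a minimal Python sketch of the metadata route. It is not part of the C entry. The function names <code>bucket</code> and <code>size_histogram</code> are hypothetical, and the bucketing rule (number of decimal digits, with 0 reserved for empty files) is an assumption chosen for illustration.

```python
import os
import stat
from collections import Counter


def bucket(size):
    """Return 0 for an empty file, otherwise the number of decimal digits in size."""
    return len(str(size)) if size > 0 else 0


def size_histogram(root):
    """Count regular files under root, grouped by decimal order of magnitude."""
    hist = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # metadata only; the file is never opened
            except OSError:
                continue  # unreadable or vanished entry: skip it
            if stat.S_ISREG(st.st_mode):  # symbolic links and devices are ignored
                hist[bucket(st.st_size)] += 1
    return hist


if __name__ == "__main__":
    for digits, count in sorted(size_histogram(".").items()):
        print(f"{digits:2d} digits: {count}")
```

Because <code>os.lstat</code> reads only directory metadata, this kind of walk avoids the cost of opening each file, which is the same motivation given above for using command scripts on Windows.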
===Windows=== <syntaxhighlight lang="c"> ~~<lang C>~~ #include<windows.h> #include<string.h> Line 90 ⟶ 284: } } </syntaxhighlight> ~~</lang>~~ Invocation and textual output : <pre> Line 156 ⟶ 350: {{libheader\|POSIX}} This works on macOS 10.15. It should be OK for Linux as well. <~~lang~~syntaxhighlight lang="c">#include <ftw.h> #include <locale.h> #include <stdint.h> Line 203 ⟶ 397: printf("Total file size: %'lu\n", total_size); return EXIT_SUCCESS; }</~~lang~~syntaxhighlight> {{out}} Line 223 ⟶ 417: =={{header\|C++}}== <~~lang~~syntaxhighlight lang="cpp">#include <algorithm> #include <array> #include <filesystem> Line 274 ⟶ 468: } return EXIT_SUCCESS; }</~~lang~~syntaxhighlight> {{out}} Line 297 ⟶ 491: {{libheader\| Winapi.Windows}} {{Trans\|Go}} <syntaxhighlight lang="delphi"> ~~<lang Delphi>~~ program File_size_distribution; Line 404 ⟶ 598: fileSizeDistribution('.'); readln; end.</~~lang~~syntaxhighlight> =={{header\|Factor}}== {{works with\|Factor\|0.99 2020-03-02}} <~~lang~~syntaxhighlight lang="factor">USING: accessors assocs formatting io io.directories.search io.files.types io.pathnames kernel math math.functions math.statistics namespaces sequences ; Line 421 ⟶ 615: current-directory get file-size-histogram dup [ "Count of files < 10^%d bytes: %4d\n" printf ] assoc-each nl values sum "Total files: %d\n" printf</~~lang~~syntaxhighlight> {{out}} <pre> Line 440 ⟶ 634: =={{header\|Go}}== {{trans\|Kotlin}} <~~lang~~syntaxhighlight lang="go">package main import ( Line 511 ⟶ 705: func main() { fileSizeDistribution("./") }</~~lang~~syntaxhighlight> {{out}} Line 539 ⟶ 733: Uses a grouped frequency distribution. Program arguments are optional. Arguments include starting directory and initial frequency distribution group size. After the first frequency distribution is computed it further breaks it down for any group that exceeds 25% of the total file count, when possible. 
</p> <~~lang~~syntaxhighlight lang="haskell">{-# LANGUAGE LambdaCase #-} import Control.Concurrent (forkIO, setNumCapabilities) Line 719 ⟶ 913: mapM_ (displayFrequency fileCount) $ Map.assocs results where groupThreshold = round . (0.25) . realToFrac</~~lang~~syntaxhighlight> {{out}} <pre style="height: 50rem;">$ filedist ~/Music Line 798 ⟶ 992: 16.00MB <-> 18.67MB = 3 0.436%: ▍ 24.00MB <-> 26.66MB = 1 0.145%: ▍ </pre> =={{header\|J}}== We can get file sizes of all files under a specific path by inspecting the last column from dirtree. For example, the sizes of the files under the user's home directory would be <tt>;{:\|:dirtree '~'</tt> From there, we can bucket them by factors of ten, then display the limiting size of each bucket along with the number of files contained (we'll sort them, for legibility): <syntaxhighlight lang="j"> ((10x^~.),.#/.~) <.10 ^.1>. /:~;{:\|:dirtree '~' 1 2 10 8 100 37 1000 49 10000 20 100000 9 1000000 4 10000000 4</syntaxhighlight> =={{header\|Java}}== <syntaxhighlight lang="java"> import java.io.IOException; import java.nio.file.Files; import java.nio.file.Path; import java.util.HashMap; import java.util.List; import java.util.Map; public final class FileSizeDistribution { public static void main(String[] aArgs) throws IOException { List<Path> fileNames = Files.list(Path.of(".")) .filter( file -> ! 
Files.isDirectory(file) ) .map(Path::getFileName) .toList(); Map<Integer, Integer> fileSizes = new HashMap<Integer, Integer>(); for ( Path path : fileNames ) { fileSizes.merge(String.valueOf(Files.size(path)).length(), 1, Integer::sum); } final int fileCount = fileSizes.values().stream().mapToInt(Integer::valueOf).sum(); System.out.println("File size distribution for directory \".\":" + System.lineSeparator()); System.out.println("File size in bytes \| Number of files \| Percentage"); System.out.println("-------------------------------------------------"); for ( int key : fileSizes.keySet() ) { final int value = fileSizes.get(key); System.out.println(String.format("%s%d%s%d%15d%15.1f%%", " 10^", ( key - 1 ), " to 10^", key, value, ( 100.0 * value ) / fileCount)); } } } </syntaxhighlight> {{ out }} <pre> File size distribution for directory ".": File size in bytes \| Number of files \| Percentage ------------------------------------------------- 10^0 to 10^1 1 0.2% 10^1 to 10^2 1 0.2% 10^2 to 10^3 5 1.1% 10^3 to 10^4 3 0.6% 10^4 to 10^5 161 34.0% 10^5 to 10^6 196 41.4% 10^6 to 10^7 98 20.7% 10^7 to 10^8 9 1.9% </pre> =={{header\|jq}}== '''Works with jq, the C implementation of jq''' '''Works with gojq, the Go implementation of jq''' '''Works with jaq, the Rust implementation of jq''' This entry illustrates how jq plays nicely with other command-line tools; in this case jc (https://kellyjonbrazil.github.io/jc) is used to JSONify the output of `ls -Rl`. (jq could also be used to parse the raw output of `ls`, but it would no doubt be tricky to achieve portability.) The invocation of jc and jq would be along the following lines: <pre> jc --ls -lR \| jq -c -f file-size-distribution.jq </pre> In the present case, the output from the call to `histogram` is a stream of [category, count] pairs beginning with [0, _] showing the number of files of size 0; thereafter, the boundaries of the categories are defined logarithmically, i.e. 
a file of size of $n is assigned to the category `1 + ($n \| log10 \| trunc)`. The output shown below for an actual directory tree suggests a unimodal distribution of file sizes. <syntaxhighlight lang="jq"> # bag of words def bow(stream): reduce stream as $word ({}; .[($word\|tostring)] += 1); # `stream` is expected to be a stream of non-negative numbers or numeric strings. # The output is a stream of [bucket, count] pairs, sorted by the value of `bucket`. # No sorting except for the sorting of these bucket boundaries takes place. def histogram(stream): bow(stream) \| to_entries \| map( [(.key \| tonumber), .value] ) \| sort_by(.[0]) \| .[]; histogram(.[] \| .size \| if . == 0 then 0 else 1 + (log10 \| trunc) end) </syntaxhighlight> {{output}} <pre> [0,9] [1,67] [2,616] [3,6239] [4,3679] [5,213] [6,56] [7,40] [8,20] [9,4] [10,1] </pre> Line 803 ⟶ 1,124: {{works with\|Julia\|0.6}} <~~lang~~syntaxhighlight lang="julia">using Humanize function sizelist(path::AbstractString) Line 829 ⟶ 1,150: end main(".")</~~lang~~syntaxhighlight> {{out}} Line 847 ⟶ 1,168: =={{header\|Kotlin}}== <~~lang~~syntaxhighlight lang="scala">// version 1.2.10 import java.io.File Line 894 ⟶ 1,215: fun main(args: Array<String>) { fileSizeDistribution("./") // current directory }</~~lang~~syntaxhighlight> {{out}} Line 919 ⟶ 1,240: Number of inaccessible files : 0 </pre> =={{header\|Lang}}== {{libheader\|lang-io-module}} <syntaxhighlight lang="lang"> # Load the IO module # Replace "<pathToIO.lm>" with the location where the io.lm Lang module was installed to without "<" and ">" ln.loadModule(<pathToIO.lm>) fp.fileSizeDistribution = (&sizes, $[totalSize], $file) -> { if([[io]]::fp.isDirectory($file)) { &fileNames = [[io]]::fp.listFilesAndDirectories($file) $path = [[io]]::fp.getCanonicalPath($file) if($path == /) { $path = \e } $fileName foreach($[fileName], &fileNames) { $innerFile = [[io]]::fp.openFile($path/$fileName) $innerTotalSize = 0L fp.fileSizeDistribution(&sizes, $innerTotalSize, 
$innerFile) $totalSize += $innerTotalSize [[io]]::fp.closeFile($innerFile) } }else { $len = [[io]]::fp.getSize($file) if($len == null) { return } $totalSize += $len if($len == 0) { &sizes[0] += 1 }else { $index = fn.int(fn.log10($len)) &sizes[$index] += 1 } } } $path $= @&LANG_ARGS == 1?&LANG_ARGS[0]:{{{./}}} &sizes = fn.arrayMake(12) fn.arraySetAll(&sizes, 0) $file = [[io]]::fp.openFile($path) $totalSize = 0L fp.fileSizeDistribution(&sizes, $totalSize, $file) [[io]]::fp.closeFile($file) fn.println(File size distribution for "$path":) $i repeat($[i], @&sizes) { fn.printf(10 ^% 3d bytes: %d%n, $i, parser.op(&sizes[$i])) } fn.println(Number of files: fn.arrayReduce(&sizes, 0, fn.add)) fn.println(Total file size: $totalSize) </syntaxhighlight> =={{header\|Mathematica}} / {{header\|Wolfram Language}}== <syntaxhighlight lang="mathematica">SetDirectory[NotebookDirectory[]]; Histogram[FileByteCount /@ Select[FileNames[__], DirectoryQ /* Not], {"Log", 15}, {"Log", "Count"}]</syntaxhighlight> =={{header\|Nim}}== <~~lang~~syntaxhighlight ~~Nim~~lang="nim">import math, os, strformat const Line 967 ⟶ 1,357: echo fmt"Size in {rangeString: 14} {count:>7} {100 * count / total:5.2f}%" echo "" echo "Total number of files: ", sum(counts)</~~lang~~syntaxhighlight> {{out}} Line 989 ⟶ 1,379: =={{header\|Perl}}== {{trans\|Raku}} <~~lang~~syntaxhighlight lang="perl">use File::Find; use List::Util qw(max); Line 1,016 ⟶ 1,406: sub fsize { $fsize{ log10( (lstat($_))[7] ) }++ } sub log10 { my($s) = @_; $s ? int log($s)/log(10) : 0 }</~~lang~~syntaxhighlight> {{out}} <pre>File size distribution in bytes for directory: . Line 1,030 ⟶ 1,420: =={{header\|Phix}}== Works on Windows and Linux. Uses "proper" sizes, ie 1MB==1024KB. Can be quite slow at first, but is pretty fast on the second and subsequent runs, that is once the OS has cached its (low-level) directory reads. 
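The "proper" size boundaries described above (1KB = 1024 bytes, 1MB = 1024KB) can be sketched in Python as follows. This is an illustration only, not the Phix entry itself: the names <code>binary_decade_limits</code> and <code>bucket_index</code> are hypothetical, the number of limits is fixed up front rather than grown on demand, and integer arithmetic (multiply by 1024, divide by 100) stands in for a factor of 10.24 so that the limits stay exact.

```python
from bisect import bisect_left


def binary_decade_limits(count):
    """Return `count` limits: 1, 10, 100, 1024, 10240, 102400, 1048576, ...

    Every third step multiplies by 10.24 instead of 10, so that each unit
    boundary (KB, MB, ...) lands exactly on a power of 1024.
    """
    limits = [1]
    while len(limits) < count:
        last = limits[-1]
        if len(limits) % 3:
            limits.append(last * 10)
        else:
            limits.append(last * 1024 // 100)  # exact integer form of last * 10.24
    return limits


def bucket_index(size, limits):
    """Index of the first limit that is >= size (len(limits) if none is)."""
    return bisect_left(limits, size)


if __name__ == "__main__":
    limits = binary_decade_limits(10)
    print(limits)
    print(bucket_index(1500, limits))
```

With ten limits the last boundary is 1024<sup>3</sup> bytes, so a larger tree would need a longer list or on-demand growth.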
<!--<syntaxhighlight lang="phix">(notonline)--> ~~<lang Phix>sequence sizes = {1},~~ <span style="color: #008080;">without</span> <span style="color: #008080;">js</span> <span style="color: #000080;font-style:italic;">-- file i/o</span> ~~res = {0}~~ <span style="color: #004080;">sequence</span> <span style="color: #000000;">sizes</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">1</span><span style="color: #0000FF;">},</span> ~~atom t1 = time()+1~~ <span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">0</span><span style="color: #0000FF;">}</span> <span style="color: #004080;">atom</span> <span style="color: #000000;">t1</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">time</span><span style="color: #0000FF;">()+</span><span style="color: #000000;">1</span> ~~function store_res(string filepath, sequence dir_entry)~~ ~~if not find('d', dir_entry[D_ATTRIBUTES]) then~~ <span style="color: #008080;">function</span> <span style="color: #000000;">store_res</span><span style="color: #0000FF;">(</span><span style="color: #004080;">string</span> <span style="color: #000000;">filepath</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">sequence</span> <span style="color: #000000;">dir_entry</span><span style="color: #0000FF;">)</span> ~~atom size = dir_entry[D_SIZE]~~ <span style="color: #008080;">if</span> <span style="color: #008080;">not</span> <span style="color: #7060A8;">find</span><span style="color: #0000FF;">(</span><span style="color: #008000;">'d'</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">dir_entry</span><span style="color: #0000FF;">[</span><span style="color: #004600;">D_ATTRIBUTES</span><span style="color: #0000FF;">])</span> <span style="color: #008080;">then</span> ~~integer sdx = 1~~ <span style="color: 
#004080;">atom</span> <span style="color: #000000;">size</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">dir_entry</span><span style="color: #0000FF;">[</span><span style="color: #004600;">D_SIZE</span><span style="color: #0000FF;">]</span> ~~while size>sizes[sdx] do~~ <span style="color: #004080;">integer</span> <span style="color: #000000;">sdx</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">1</span> ~~if sdx=length(sizes) then~~ <span style="color: #008080;">while</span> <span style="color: #000000;">size</span><span style="color: #0000FF;">></span><span style="color: #000000;">sizes</span><span style="color: #0000FF;">[</span><span style="color: #000000;">sdx</span><span style="color: #0000FF;">]</span> <span style="color: #008080;">do</span> ~~sizes &= sizes[$]iff(mod(length(sizes),3)?10:10.24)~~ <span style="color: #008080;">if</span> <span style="color: #000000;">sdx</span><span style="color: #0000FF;">=</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">sizes</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">then</span> ~~res &= 0~~ <span style="color: #000000;">sizes</span> <span style="color: #0000FF;">&=</span> <span style="color: #000000;">sizes</span><span style="color: #0000FF;">[$]</span><span style="color: #008080;">iff</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">mod</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">sizes</span><span style="color: #0000FF;">),</span><span style="color: #000000;">3</span><span style="color: #0000FF;">)?</span><span style="color: #000000;">10</span><span style="color: #0000FF;">:</span><span style="color: #000000;">10.24</span><span style="color: #0000FF;">)</span> ~~end if~~ <span style="color: #000000;">res</span> <span style="color: 
#0000FF;">&=</span> <span style="color: #000000;">0</span> ~~sdx += 1~~ <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~end while~~ <span style="color: #000000;">sdx</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span> ~~res[sdx] += 1~~ <span style="color: #008080;">end</span> <span style="color: #008080;">while</span> ~~if time()>t1 then~~ <span style="color: #000000;">res</span><span style="color: #0000FF;">[</span><span style="color: #000000;">sdx</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span> ~~printf(1,"%,d files found\r",sum(res))~~ <span style="color: #008080;">if</span> <span style="color: #7060A8;">time</span><span style="color: #0000FF;">()></span><span style="color: #000000;">t1</span> <span style="color: #008080;">then</span> ~~t1 = time()+1~~ <span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"%,d files found\r"</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">sum</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">))</span> ~~end if~~ <span style="color: #000000;">t1</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">time</span><span style="color: #0000FF;">()+</span><span style="color: #000000;">1</span> ~~end if~~ <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~return 0 -- keep going~~ <span style="color: #008080;">end</span> <span style="color: #008080;">if</span> ~~end function~~ <span style="color: #008080;">return</span> <span style="color: #000000;">0</span> <span style="color: #000080;font-style:italic;">-- keep going</span> ~~integer exit_code = walk_dir(".", routine_id("store_res"), true)~~ <span style="color: 
#008080;">end</span> <span style="color: #008080;">function</span> <span style="color: #004080;">integer</span> <span style="color: #000000;">exit_code</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">walk_dir</span><span style="color: #0000FF;">(</span><span style="color: #008000;">"."</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">store_res</span><span style="color: #0000FF;">,</span> <span style="color: #004600;">true</span><span style="color: #0000FF;">)</span> ~~printf(1,"%,d files found\n",sum(res))~~ ~~integer w = max(res)~~ <span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"%,d files found\n"</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">sum</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">))</span> ~~include builtins/pfile.e~~ <span style="color: #004080;">integer</span> <span style="color: #000000;">w</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">max</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)</span> ~~for i=1 to length(res) do~~ <span style="color: #000080;font-style:italic;">--include builtins/pfile.e</span> ~~integer ri = res[i]~~ <span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span> ~~string s = file_size_k(sizes[i], 5),~~ <span style="color: #004080;">integer</span> <span style="color: #000000;">ri</span> <span style="color: 
#0000FF;">=</span> <span style="color: #000000;">res</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span> ~~p = repeat('',floor(60ri/w))~~ <span style="color: #004080;">string</span> <span style="color: #000000;">s</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">file_size_k</span><span style="color: #0000FF;">(</span><span style="color: #000000;">sizes</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">],</span> <span style="color: #000000;">5</span><span style="color: #0000FF;">),</span> ~~printf(1,"files < %s: %s%,d\n",{s,p,ri})~~ <span style="color: #000000;">p</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">repeat</span><span style="color: #0000FF;">(</span><span style="color: #008000;">''</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">floor</span><span style="color: #0000FF;">(</span><span style="color: #000000;">60</span><span style="color: #0000FF;"></span><span style="color: #000000;">ri</span><span style="color: #0000FF;">/</span><span style="color: #000000;">w</span><span style="color: #0000FF;">))</span> ~~end for</lang>~~ <span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"files < %s: %s%,d\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">s</span><span style="color: #0000FF;">,</span><span style="color: #000000;">p</span><span style="color: #0000FF;">,</span><span style="color: #000000;">ri</span><span style="color: #0000FF;">})</span> <span style="color: #008080;">end</span> <span style="color: #008080;">for</span> <!--</syntaxhighlight>--> {{out}} <pre> Line 1,083 ⟶ 1,476: The distribution is stored in a '''collections.Counter''' object (like a dictionary with automatic 0 value 
when a key is not found, useful when incrementing). Anything could be done with this object, here the number of files is printed for increasing sizes. No check is made during the directory walk: usually, safeguards would be needed or the program will fail on any unreadable file or directory (depending on rights, or too deep paths, for instance). Here links are skipped, so it should avoid cycles. <~~lang~~syntaxhighlight lang="python">import sys, os from collections import Counter Line 1,114 ⟶ 1,507: print("Total %d bytes for %d files" % (s, n)) main(sys.argv[1:])</~~lang~~syntaxhighlight> =={{header\|Racket}}== <~~lang~~syntaxhighlight lang="racket">#lang racket (define (file-size-distribution (d (current-directory)) #:size-group-function (sgf values)) Line 1,145 ⟶ 1,538: (module+ test (call-with-values (λ () (file-size-distribution #:size-group-function log10-or-so)) (report-fsd log10-or-so)))</~~lang~~syntaxhighlight> {{out}} Line 1,163 ⟶ 1,556: By default, process the current and all readable sub-directories, or, pass in a directory path at the command line. <syntaxhighlight lang="raku" ~~perl6~~line>sub MAIN($dir = '.') { sub log10 (Int $s) { $s ?? $s.log(10).Int !! 0 } my %fsize; Line 1,188 ⟶ 1,581: my ($end, $bar) = $scaled.polymod(8); (@blocks[8] x $bar * 8) ~ (@blocks[$end] if $end) ~ "\n" }</~~lang~~syntaxhighlight> {{out}} Line 1,221 ⟶ 1,614: Also, some Windows versions of the   '''dir'''   command insert commas into numbers, so code was added to elide them. <~~lang~~syntaxhighlight lang="rexx">/REXX program displays a histogram of filesize distribution of a directory structure(s)/ numeric digits 30 /ensure enough decimal digits for a #./ parse arg ds . /obtain optional argument from the CL./ Line 1,266 ⟶ 1,659: exit /stick a fork in it, we're all done. 
/ /──────────────────────────────────────────────────────────────────────────────────────/ commas: parse arg _; do j#=length(_)-3 to 1 by -3; _=insert(',', _, j#); end; return _</~~lang~~syntaxhighlight> This REXX program makes use of   '''LINESIZE'''   REXX program (or BIF) which is used to determine the screen width (or linesize) of the terminal (console) so as to maximize the width of the histogram. Line 1,331 ⟶ 1,724: {{libheader\|walkdir}} {{works with\|Rust\|2018}} <~~lang~~syntaxhighlight lang="rust"> use std::error::Error; use std::marker::PhantomData; Line 1,508 ⟶ 1,901: } } </syntaxhighlight> ~~</lang>~~ {{out}} <pre> Line 1,528 ⟶ 1,921: =={{header\|Sidef}}== <~~lang~~syntaxhighlight lang="ruby">func traverse(Block callback, Dir dir) { dir.open(\var dir_h) \|\| return nil Line 1,556 ⟶ 1,949: } say "Total: #{total_size} bytes in #{files_num} files"</~~lang~~syntaxhighlight> {{out}} <pre> Line 1,573 ⟶ 1,966: =={{header\|Tcl}}== This is with the '''fileutil::traverse''' package from Tcllib to do the tree walking, a '''glob''' based alternative ignoring links but not hidden files is possible but would add a dozen of lines. <~~lang~~syntaxhighlight lang="tcl">package require fileutil::traverse namespace path {::tcl::mathfunc ::tcl::mathop} Line 1,591 ⟶ 1,984: foreach key [lsort -int [dict keys $hist]] { puts "[? {$key == -1} 0 {1e$key}]\t[dict get $hist $key]" }</~~lang~~syntaxhighlight> {{out}} <pre>0 1 Line 1,605 ⟶ 1,998: {{works with\|Bourne Shell}} Use POSIX conformant code unless the environment variable GNU is set to anything not empty. <~~lang~~syntaxhighlight lang="sh">#!/bin/sh set -eu tabs -8 if [ ${GNU:-} ] then Line 1,614 ⟶ 2,008: # Use a subshell to remove the last "total" line per each ARG_MAX find -- "${1:-.}" -type f -exec sh -c 'wc -c -- "$@" \| sed \$d' argv0 {} + fi \| awk -vOFS='\t' ' BEGIN {split("KB MB GB TB PB", u); u[0] = "B"} { ++hist[$1 ? 
length($1) - 1 : -1] Line 1,620 ⟶ 2,015: } END { max = -2 for (i in hist) ~~print~~max = (i ==> -1max ? 0i : ~~"1e" i~~max) ~~"\t" hist[i]~~ ~~print "Total: " total " bytes in " NR " files"~~ print "From", "To", "Count\n" ~~}' \| sort</lang>~~ for (i = -1; i <= max; ++i) { if (i in hist) { if (i == -1) print "0B", "0B", hist[i] else print 10 (i % 3) u[int(i / 3)], 10 ((i + 1) % 3) u[int((i + 1) / 3)], hist[i] } } l = length(total) - 1 printf "\nTotal: %.1f %s in %d files\n", total / (10 ** l), u[int(l / 3)], NR }'</syntaxhighlight> {{out}} <pre>$ time ~/fsd.sh 0From To 4 Count ~~1e0 66~~ ~~1e1~~0B 66 0B 13 ~~1e2~~1B ~~1418~~ 10B 74 ~~1e3~~10B ~~1026~~100B 269 ~~1e4~~100B 1KB ~~1564~~ 5894 ~~1e5~~1KB ~~60083~~10KB 12727 ~~1e6~~10KB 100KB ~~16282~~ 12755 ~~1e7~~100KB 1MB ~~3881~~ 110922 ~~1e8~~1MB ~~1444~~10MB 50019 ~~1e9~~10MB 100MB 16 17706 100MB 1GB 5056 ~~Total: 612404756079 bytes in 85850 files~~ 1GB 10GB 1139 ~~~/fsd.sh 0.60s user 0.98s system 134% cpu 1.182 total~~ 10GB 100GB 141 100GB 1TB 1 Total: 8.9 TB in 216716 files ~/fsd.sh 1.28s user 2.55s system 134% cpu 2.842 total $ time GNU=1 ~/fsd.sh 0From To 4 Count ~~1e0 66~~ ~~1e1~~0B 66 0B 13 ~~1e2~~1B ~~1418~~ 10B 74 ~~1e3~~10B ~~1026~~100B 269 ~~1e4~~100B 1KB ~~1564~~ 5894 ~~1e5~~1KB ~~60083~~10KB 12727 ~~1e6~~10KB 100KB ~~16282~~ 12755 ~~1e7~~100KB 1MB ~~3881~~ 110922 ~~1e8~~1MB ~~1444~~10MB 50019 ~~1e9~~10MB 100MB 16 17706 100MB 1GB 5056 ~~Total: 612404756079 bytes in 85850 files~~ 1GB 10GB 1139 ~~GNU=1 ~/fsd.sh 0.35s user 0.48s system 135% cpu 0.613 total</pre>~~ 10GB 100GB 141 100GB 1TB 1 Total: 8.9 TB in 216716 files GNU=1 ~/fsd.sh 0.81s user 1.33s system 135% cpu 1.586 total</pre> =={{header\|Wren}}== {{libheader\|Wren-math}} {{libheader\|Wren-fmt}} <syntaxhighlight lang="wren">import "io" for Directory, File, Stat import "os" for Process import "./math" for Math import "./fmt" for Fmt var sizes = List.filled(12, 0) var totalSize = 0 var numFiles = 0 var numDirs = 0 var fileSizeDist // recursive 
function fileSizeDist = Fn.new { \|path\| var files = Directory.list(path) for (file in files) { var path2 = "%(path)/%(file)" var stat = Stat.path(path2) if (stat.isFile) { numFiles = numFiles + 1 var size = stat.size if (size == 0) { sizes[0] = sizes[0] + 1 } else { totalSize = totalSize + size var logSize = Math.log10(size) var index = logSize.floor + 1 sizes[index] = sizes[index] + 1 } } else if (stat.isDirectory) { numDirs = numDirs + 1 fileSizeDist.call(path2) } } } var args = Process.arguments var path = (args.count == 0) ? "./" : args[0] if (!Directory.exists(path)) Fiber.abort("Path does not exist or is not a directory.") fileSizeDist.call(path) System.print("File size distribution for '%(path)' :-\n") for (i in 0...sizes.count) { System.write((i == 0) ? " " : "+ ") Fmt.print("Files less than 10 ^ $-2d bytes : $,5d", i, sizes[i]) } System.print(" -----") Fmt.print("= Number of files : $,5d", numFiles) Fmt.print(" Total size in bytes : $,d", totalSize) Fmt.print(" Number of sub-directories : $,5d", numDirs)</syntaxhighlight> {{out}} <pre> File size distribution for './' :- Files less than 10 ^ 0 bytes : 4 + Files less than 10 ^ 1 bytes : 2 + Files less than 10 ^ 2 bytes : 135 + Files less than 10 ^ 3 bytes : 946 + Files less than 10 ^ 4 bytes : 746 + Files less than 10 ^ 5 bytes : 79 + Files less than 10 ^ 6 bytes : 11 + Files less than 10 ^ 7 bytes : 3 + Files less than 10 ^ 8 bytes : 0 + Files less than 10 ^ 9 bytes : 0 + Files less than 10 ^ 10 bytes : 0 + Files less than 10 ^ 11 bytes : 0 ----- = Number of files : 1,926 Total size in bytes : 12,683,455 Number of sub-directories : 3 </pre> =={{header\|zkl}}== <~~lang~~syntaxhighlight lang="zkl">pipe:=Thread.Pipe(); // hoover all files in tree, don't return directories fcn(pipe,dir){ File.globular(dir,"",True,8,pipe); } Line 1,673 ⟶ 2,169: println("%15s : %s".fmt(szchrs[idx,], ""(scale*cnt).round().toInt())); idx-=1 + comma(); }</~~lang~~syntaxhighlight> {{out}} <pre>