Walk a directory/Recursively: Difference between revisions

From Rosetta Code
(Replace older noisy example with cleaner implementation. Improve variable names.)
Line 164: Line 164:
==[[Python]]==
==[[Python]]==
[[Category:Python]]
[[Category:Python]]

'''Interpreter:''' [[Python]] *(''Note:'' see below for an improved method that ships with Python 2.5 and later. This method doesn't require support for generators.)

#!/usr/bin/env python
from fnmatch import fnmatch
import os, os.path

def visitor(args, dir, filelist):
    # args is the (pattern, results) tuple passed through os.path.walk()
    pattern, results = args
    for eachFile in filelist:
        if fnmatch(eachFile, pattern):
            results.append(os.path.join(dir, eachFile))

if __name__ == "__main__":
    """ Find files under specified directories and matching the
        specified patterns. The calling convention is to
        provide the patterns of the form: [dir/][glob pattern]
        Example:
            / '*.txt' '/tmp/foo*'
        Would walk over three directories, / (with an implicit pattern
        matching all files), then (implicitly) the current directory
        matching all files ending in .txt, and lastly over the /tmp
        directory tree matching all the files starting with the letters "foo"
        NOTE: Glob patterns should be quoted to avoid shell expansion,
        in order to pass them to this script properly.
    """
    import sys
    root = '.'  ## Current directory
    results = []
    pattern = '*'
    if len(sys.argv[1:]):
        for eachArg in sys.argv[1:]:
            root, pattern = os.path.split(eachArg)
            # Compare by value; "is" tests identity and is unreliable for strings
            if root == '':
                root = '.'
            if pattern == '':
                pattern = '*'
            os.path.walk(root, visitor, (pattern, results))
    else:
        os.path.walk(root, visitor, (pattern, results))
    for eachResult in results:
        print eachResult

Only the ''visitor()'' function and the invocation of ''os.path.walk()'' are strictly necessary to this example; the rest of the code wraps them in a potentially useful command-line utility. Python novices, and quite a few old hands, find two things confusing about the interaction between ''os.path.walk()'' and the "visitor" function they must implement and pass to it: the use of a mutable container (a list or dictionary, for example) to collect and return the results, and the need to pass arguments to the visitor (a pattern for ''fnmatch.fnmatch()'' in this case) through the ''os.path.walk()'' invocation. However, the model is very powerful, as it allows the visitor to implement arbitrarily complex behavior in selecting matching files (for example, using ''os.stat()'' and various other functions and modules). The visitor can even prune subdirectory trees from the traversal by removing entries from its mutable "filelist" argument. Unfortunately, these more advanced visitor functions are difficult to write, since the entire ''os.path.walk()'' traversal is done in one invocation. (Contrast this with the generator-based approach below.)


'''Interpreter:''' [[Python]] 2.5
'''Interpreter:''' [[Python]] 2.5
Line 215: Line 170:
import os
import os
rootPath = '/' # Change to a suitable path for your OS
rootPath = '/'
pattern = '*.mp3' # Any string; Can include any UNIX shell-style wildcards
pattern = '*.mp3' # Can include any UNIX shell-style wildcards

# Includes: *, ?, [seq], [!seq]
for root, directories, files in os.walk(rootPath):
for root, dirs, files in os.walk(rootPath):
for aFile in files:
for filename in files:
if fnmatch.fnmatch(aFile, pattern):
if fnmatch.fnmatch(filename, pattern):
print os.path.join(root, aFile)
print os.path.join(root, filename)


This uses the ''os.walk()'' "[[generator]]" which is considered easier and more "Pythonic" than the use of the "visitor"-based ''os.path.walk()'' in the previous example.
This uses the ''os.walk()'' "[[generator]]" which is considered easier and more "Pythonic" than the use of the "visitor"-based ''os.path.walk()'' in the previous example.

'''Interpreter:''' [[Python]] older than 2.2

A more strictly comparable port of this 2.5 code to earlier versions of Python would be:
A more strictly comparable port of this 2.5 code to earlier versions of Python would be:


from fnmatch import fnmatch
from fnmatch import fnmatch
import os, os.path
import os, os.path
def print_fnmatches(pattern, dir, filelist):
def print_fnmatches(pattern, dir, files):
for eachFile in filelist:
for filename in files:
if fnmatch(eachFile, pattern):
if fnmatch(filename, pattern):
print os.path.join(dir, eachFile)
print os.path.join(dir, filename)
os.path.walk('/', print_fnmatches, '*.mp3')
os.path.walk('/', print_fnmatches, '*.mp3')

However, we favor the previous example in that it shows how to make the results available to our subsequent code.



'''Interpreter:''' [[Python]] 2.5
'''Interpreter:''' [[Python]] 2.5


'''Libraries:''' [[Path]] *(''Note:'' This uses a non-standard replacement to the '''path''' module)
'''Libraries:''' [[Path]] *(''Note:'' This uses a non-standard replacement to the '''os.path''' module)
[[Category:Path]]
[[Category:Path]]



Revision as of 03:35, 20 October 2007

Task
Walk a directory/Recursively
You are encouraged to solve this task according to the task description, using any language you may know.

Walk a given directory tree and print files matching a given pattern.

Note: Please be careful when running any code examples found here.


E

def walkTree(directory, pattern) {
  for name => file in directory {
    if (name =~ rx`.*$pattern.*`) {
      println(file.getPath())
    }
    if (file.isDirectory()) {
      walkTree(file, pattern)
    }
  }
}

Example:

? walkTree(<file:/usr/share/man>, "rmdir")
/usr/share/man/man1/rmdir.1
/usr/share/man/man2/rmdir.2

Forth

Interpreter: gforth 0.6.2

Todo: track the full path and print it on matching files.

defer ls-filter

: dots? ( name len -- ? )
  dup 1 = if drop c@ [char] . =
  else 2 = if dup c@ [char] . = swap 1+ c@ [char] . = and
  else drop false then then ;

: ls-r ( dir len -- )
  open-dir if drop exit then  ( dirid)
  begin
    dup pad 256 rot read-dir throw
  while
    pad over dots? 0= if   \ ignore current and parent dirs
      pad over recurse
      pad over ls-filter if
        cr pad swap type
      else drop then
    else drop then 
  repeat
  drop close-dir throw ;

: c-file? ( str len -- ? )
  dup 3 < if 2drop false exit then
  + 1- dup c@ 32 or
   dup [char] c <> swap [char] h <> and if drop false exit then
  1- dup c@ [char] . <> if drop false exit then
  drop true ;
' c-file? is ls-filter

s" ." ls-r

Groovy

Print all text files in the current directory tree

new File('.').eachFileRecurse {
  if (it.name =~ /.*\.txt/) println it;
}

IDL

 result = file_search( directory, '*.txt', count=cc )

This descends into the directory or directories named in the variable "directory" (which can be an array), returns an array of strings with the names of the files matching "*.txt", and places the total number of matches in the variable "cc".

Java

Compiler: javac, JDK 1.4 and up

This does not use a pattern. Instead it compares the end of each file name against a fixed suffix, which gave better results.

import java.io.File;
public class MainEntry {
    public static void main(String[] args) {
        walkin(new File("/home/user")); //Replace this with a suitable directory
    }
    
    /**
     * Recursive function to descend into the directory tree and find all the files
     * that end with ".mp3"
     * @param dir A file object defining the top directory
     **/
    public static void walkin(File dir) {
        String pattern = ".mp3";
        
        File listFile[] = dir.listFiles();
        if(listFile != null) {
            for(int i=0; i<listFile.length; i++) {
                if(listFile[i].isDirectory()) {
                    walkin(listFile[i]);
                } else {
                    if(listFile[i].getName().endsWith(pattern)) {
                        System.out.println(listFile[i].getPath());
                    }
                }
            }
        }
    }
}

MAXScript

fn walkDir dir pattern =
(
    dirArr = GetDirectories (dir + "\\*")

    for d in dirArr do
    (
        join dirArr (getDirectories (d + "\\*"))
    )

    append dirArr (dir + "\\") -- Need to include the original top level directory

    for f in dirArr do
    (
        print (getFiles (f + pattern))
    )
)

walkDir "C:" "*.txt"

Perl

Interpreter: Perl 5.x

use File::Find qw(find);
my $dir     = '.';
my $pattern = 'foo';
find sub {print "$File::Find::name\n" if /$pattern/}, $dir;

Pop11

The built-in procedure sys_file_match searches directories or directory trees using shell-like patterns (three dots indicate a search of the subdirectory tree).

lvars repp, fil;
;;; create path repeater
sys_file_match('.../*.p', '', false, 0) -> repp;
;;; iterate over paths
while (repp() ->> fil) /= termin do
     ;;; print the path
     printf(fil, '%s\n');
endwhile;

Python

Interpreter: Python 2.5

 import fnmatch
 import os
 
 rootPath = '/'
 pattern = '*.mp3' # Can include any UNIX shell-style wildcards
 for root, dirs, files in os.walk(rootPath):
     for filename in files:
         if fnmatch.fnmatch(filename, pattern):
             print os.path.join(root, filename)

This uses the os.walk() "generator", which is considered easier and more "Pythonic" than the "visitor"-based os.path.walk() used in the next example.
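Because os.walk() is a generator, the same loop can be wrapped in a generator function so that the matches are available to later code instead of only being printed. The following is a minimal sketch under stated assumptions, not part of the original entry: it is written for Python 3 (print is a function), and the names find_files, root_path and the sorting of entries are illustrative choices.

```python
import fnmatch
import os


def find_files(root_path, pattern):
    """Yield the path of every file under root_path whose name matches pattern.

    find_files is a hypothetical helper name used only for this sketch.
    """
    for root, dirs, files in os.walk(root_path):
        # Sorting dirs in place fixes the order in which os.walk() descends.
        dirs.sort()
        for filename in sorted(files):
            if fnmatch.fnmatch(filename, pattern):
                yield os.path.join(root, filename)


if __name__ == "__main__":
    # Print every matching file below the current directory.
    for path in find_files(".", "*.mp3"):
        print(path)
```

The caller decides what to do with each path, for example collecting them with list(find_files(root_path, pattern)).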

Interpreter: Python older than 2.2

A more strictly comparable port of this 2.5 code to earlier versions of Python would be:

from fnmatch import fnmatch
import os, os.path
def print_fnmatches(pattern, dir, files):
    for filename in files:
        if fnmatch(filename, pattern):
            print os.path.join(dir, filename)
os.path.walk('/', print_fnmatches, '*.mp3')

Interpreter: Python 2.5

Libraries: Path *(Note: This uses a non-standard replacement for the os.path module)

 from path import path
 
 rootPath = '/'
 pattern = '*.mp3'
 
 d = path(rootPath)
 for f in d.walkfiles(pattern):
   print f

Ruby

Pattern matching using regular expressions

 #define a recursive function that will traverse the directory tree
 def printAndDescend(pattern)
   #we keep track of the directories, to be used in the second, recursive part of this function
   directories=[]
   Dir['*'].sort.each do |name|
     if File.file?(name) and name[pattern]
       puts(File.expand_path(name))
     elsif File.directory?(name)
       directories << name
     end
   end
   directories.each do |name|
     #don't descend into . or .. on linux
     Dir.chdir(name){printAndDescend(pattern)} if !Dir.pwd[File.expand_path(name)]
   end
 end
 #print all ruby files
 printAndDescend(/.+\.rb$/)

Or use the Find module from the standard library

 require 'find'
 
 def find_and_print(path, pattern)
   Find.find(path) do |entry|
     if File.file?(entry) and entry[pattern]
       puts entry
     end
   end
 end
 
 # print all the ruby files
 find_and_print(".", /.+\.rb$/)

Or, to find and print all files under '/foo/bar' the easy way:

 Dir.glob( File.join('/foo/bar', '**', '*') ) { |file| puts file }

Scala

This is not implemented in the Scala library. Here is a possible solution, building on class java.io.File and on the iteration facilities of the Scala language and library.

package io.utils

import java.io.File
 
/** A wrapper around file, allowing iteration either on direct children 
     or on directory tree */
class RichFile(file: File) {
  
  def children = new Iterable[File] {
    def elements = 
      if (file.isDirectory) file.listFiles.elements else Iterator.empty;
  }

  def andTree : Iterable[File] = (
    Seq.single(file) 
    ++ children.flatMap(child => new RichFile(child).andTree))
}
 
/** implicitly enrich java.io.File with methods of RichFile */
object RichFile {
  implicit def toRichFile(file: File) = new RichFile(file)
}

Class RichFile takes a java.io.File in its constructor. Its two methods return Iterables over items of type File. children allows iteration over the direct children (empty if file is not a directory). andTree contains file and all files below it, as a concatenation (++) of a sequence which contains only file (Seq.single) and the actual descendants. Method flatMap in Iterable takes a function argument which maps each item (child) to another Iterable (andTree called recursively on that child) and returns the concatenation of those iterables.

The purpose of object RichFile is to publish the implicit method toRichFile. When this method is in scope (after import RichFile.toRichFile or import RichFile._), it is called behind the scenes whenever a method of class RichFile is called on an instance of type File: with f of type File, the code f.children (resp. f.andTree) becomes toRichFile(f).children (resp. toRichFile(f).andTree). It is as if the methods of class RichFile had been added to class File.

Usage:

package test.io.utils

import io.utils.RichFile._ // this makes implicit toRichFile active
import java.io.File

object Test extends Application {
  val root = new File("/home/user")
  for(f <- root.andTree) Console.println(f)

  // filtering comes for free
  for(f <- root.andTree; if f.getName.endsWith(".mp3")) Console.println(f)
}

Tcl

Interpreter: Tcl 8.4

proc walkin { fromDir } {
    foreach fname [glob -nocomplain -directory $fromDir *] {
        if { [file isdirectory $fname] } {
            walkin $fname
        } else {
            if { [string match *.mp3 $fname] } {
                puts [file normalize $fname]
            }
        }
    }
}
# replace directory with something appropriate
walkin /home/user