Extract file extension: Difference between revisions

From Rosetta Code


Revision as of 06:47, 6 June 2015

Extract file extension is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

Write a program that takes one string argument representing the path to a file and returns the file's extension, or the null string if the file path has no extension. An extension appears after the last period in the file name and consists of one or more letters or numbers.

Show here the action of your routine on the following examples:

  1. picture.jpg returns .jpg
  2. http://mywebsite.com/picture/image.png returns .png
  3. myuniquefile.longextension returns .longextension
  4. IAmAFileWithoutExtension returns an empty string ""
  5. /path/to.my/file returns an empty string as the period is in the directory name rather than the file
  6. file.odd_one returns an empty string, as an extension (by this definition) cannot contain an underscore.
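
The definition above can be read as a single pattern: a period followed by one or more ASCII letters or digits, anchored at the end of the path. The following Python sketch is one possible reading of that definition, not a reference solution; the name `extract_extension` is an illustrative assumption and does not come from the task.

```python
import re

# Hypothetical helper, not part of the task description.
# Pattern: a period, then one or more ASCII letters or digits, then end of string.
# A "/" or "_" after the last period prevents a match, which covers examples 5 and 6.
_EXTENSION_RE = re.compile(r"\.[A-Za-z0-9]+\Z")


def extract_extension(path):
    """Return the extension including its leading period, or "" if there is none."""
    match = _EXTENSION_RE.search(path)
    return match.group(0) if match else ""


examples = [
    "picture.jpg",
    "http://mywebsite.com/picture/image.png",
    "myuniquefile.longextension",
    "IAmAFileWithoutExtension",
    "/path/to.my/file",
    "file.odd_one",
]
for example in examples:
    print("%-40s %r" % (example, extract_extension(example)))
```

Anchoring at the end of the string means no separate step is needed to locate the file name within the path, because a directory separator is not a letter or digit.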


AWK

<lang AWK>
# syntax: GAWK -f EXTRACT_FILE_EXTENSION.AWK
BEGIN {
    arr[++i] = "picture.jpg"
    arr[++i] = "http://mywebsite.com/picture/image.png"
    arr[++i] = "myuniquefile.longextension"
    arr[++i] = "IAmAFileWithoutExtension"
    arr[++i] = "/path/to.my/file"
    arr[++i] = "file.odd_one"
    for (j=1; j<=i; j++) {
      printf("%-40s '%s'\n",arr[j],extract_ext(arr[j]))
    }
    exit(0)
}
function extract_ext(fn,  sep1,sep2,tmp) {
    while (fn ~ (sep1 = ":|\\\\|\\/")) { # ":" or "\" or "/"
      fn = substr(fn,match(fn,sep1)+1)
    }
    while (fn ~ (sep2 = "\\.")) { # "."
      fn = substr(fn,match(fn,sep2)+1)
      tmp = 1
    }
    if (fn ~ /_/ || tmp == 0) {
      return("")
    }
    return(fn)
}
</lang>

Output:

picture.jpg                              'jpg'
http://mywebsite.com/picture/image.png   'png'
myuniquefile.longextension               'longextension'
IAmAFileWithoutExtension                 ''
/path/to.my/file                         ''
file.odd_one                             ''

C#

<lang C#>public static string ExtractExtension(string str)
{
    string s = str;
    string temp = "";
    string result = "";
    bool isDotFound = false;
    for (int i = s.Length - 1; i >= 0; i--)
    {
        if (s[i].Equals('.'))
        {
            temp += s[i];
            isDotFound = true;
            break;
        }
        else
        {
            temp += s[i];
        }
    }
    if (!isDotFound)
    {
        result = "";
    }
    else
    {
        for (int j = temp.Length - 1; j >= 0; j--)
        {
            result += temp[j];
        }
    }
    return result;
}</lang>

Emacs Lisp

<lang Lisp>(file-name-extension "foo.txt") => "txt"</lang>

A missing extension (nil) is distinguished from an empty extension (""), but wrapping the call in (or ... "") gives "" for both if desired.

<lang Lisp>(file-name-extension "foo.") => ""
(file-name-extension "foo") => nil</lang>

An Emacs backup suffix, ~ or .~NUM~, is not part of the extension, but otherwise any characters are allowed.

<lang Lisp>(file-name-extension "foo.txt~") => "txt"
(file-name-extension "foo.txt.~1.234~") => "txt"</lang>

Go

<lang go>package main

import (
	"fmt"
	"path"
)

// An exact copy of `path.Ext` from Go 1.4.2 for reference:
func Ext(path string) string {
	for i := len(path) - 1; i >= 0 && path[i] != '/'; i-- {
		if path[i] == '.' {
			return path[i:]
		}
	}
	return ""
}

// A variation that handles the extra non-standard requirement
// that extensions shall only "consists of one or more letters or numbers".
//
// Note, instead of direct comparison with '0-9a-zA-Z' we could instead use:
//	case !unicode.IsLetter(rune(b)) && !unicode.IsNumber(rune(b)):
//		return ""
// even though this operates on bytes instead of Unicode code points (runes),
// it is still correct given the details of UTF-8 encoding.
func ext(path string) string {
	for i := len(path) - 1; i >= 0; i-- {
		switch b := path[i]; {
		case b == '.':
			return path[i:]
		case '0' <= b && b <= '9':
		case 'a' <= b && b <= 'z':
		case 'A' <= b && b <= 'Z':
		default:
			return ""
		}
	}
	return ""
}

func main() {
	tests := []string{
		"picture.jpg",
		"http://mywebsite.com/picture/image.png",
		"myuniquefile.longextension",
		"IAmAFileWithoutExtension",
		"/path/to.my/file",
		"file.odd_one",
		// Extra, with unicode
		"café.png",
		"file.resumé",
		// with unicode combining characters
		"cafe\u0301.png",
		"file.resume\u0301",
	}
	for _, str := range tests {
		std := path.Ext(str)
		custom := ext(str)
		fmt.Printf("%38s\t→ %-8q", str, custom)
		if custom != std {
			fmt.Printf("(Standard: %q)", std)
		}
		fmt.Println()
	}
}</lang>

Output:
                           picture.jpg	→ ".jpg"  
http://mywebsite.com/picture/image.png	→ ".png"  
            myuniquefile.longextension	→ ".longextension"
              IAmAFileWithoutExtension	→ ""      
                      /path/to.my/file	→ ""      
                          file.odd_one	→ ""      (Standard: ".odd_one")
                              café.png	→ ".png"  
                           file.resumé	→ ""      (Standard: ".resumé")
                             café.png	→ ".png"  
                          file.resumé	→ ""      (Standard: ".resumé")
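
The output above shows the ASCII-only rule rejecting `.resumé`. As a hedged illustration of a Unicode-aware reading of "letters or numbers", the Python sketch below accepts any character whose Unicode general category is a letter (L) or number (N). Accepting marks (M) as well, so that combining accents such as U+0301 are allowed, is an assumption of this sketch rather than something the task states, and the name `unicode_extension` is hypothetical.

```python
import unicodedata


def unicode_extension(path):
    """Return the extension (with its period) if every character after the
    last period is a Unicode letter, number, or mark; otherwise return ""."""
    dot = path.rfind(".")
    if dot == -1:
        return ""  # no period anywhere in the path
    suffix = path[dot + 1:]
    # unicodedata.category returns codes such as "Ll", "Nd", "Mn";
    # the first character is the major class (L, N, M, P, ...).
    if suffix and all(unicodedata.category(ch)[0] in "LNM" for ch in suffix):
        return path[dot:]
    return ""


for example in ["café.png", "file.resumé", "cafe\u0301.png",
                "file.resume\u0301", "/path/to.my/file", "file.odd_one"]:
    print("%r -> %r" % (example, unicode_extension(example)))
```

Because "/" and "_" fall in punctuation categories, the directory and underscore examples still yield an empty string.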

J

Implementation:

<lang J>require'regex'
ext=: '[.][a-zA-Z0-9]+$'&rxmatch ;@rxfrom ]</lang>

Obviously most of the work here is done by the regex implementation (pcre, if that matters - and this particular kind of expression tends to be a bit more concise expressed in perl than in J...).

Perhaps of interest is that this is an example of a J fork: here we have three verbs separated by spaces. Unlike a Unix system fork (which spins up a child process that is an almost exact clone of the currently running process), a J fork is three independently defined verbs. The two verbs on the edges each get the fork's argument, and the verb in the middle combines those two results.

The left verb uses rxmatch to find the beginning position of the match and its length. The right verb is the identity function. The middle verb extracts the desired characters from the original argument. (For a non-match, the length of the "match" is zero so the empty string is extracted.)
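
For readers who do not know J, the fork described above can be approximated in Python. This is only a sketch of the control flow: `fork`, `rx_match`, `rx_from` and `identity` are hypothetical Python stand-ins written for this illustration, and returning `(0, 0)` for a non-match is an assumption made here so that the slice is empty.

```python
import re


def fork(f, g, h):
    """Build a function computing g(f(y), h(y)), like a one-argument J fork (f g h)."""
    return lambda y: g(f(y), h(y))


def rx_match(text):
    """Left verb: return (start, length) of the extension match.
    Assumption of this sketch: a non-match is reported as (0, 0)."""
    match = re.search(r"[.][a-zA-Z0-9]+$", text)
    if match is None:
        return (0, 0)
    return (match.start(), match.end() - match.start())


def rx_from(position, text):
    """Middle verb: extract the characters described by (start, length)."""
    start, length = position
    return text[start:start + length]


def identity(text):
    """Right verb: pass the argument through unchanged."""
    return text


ext = fork(rx_match, rx_from, identity)

for name in ["picture.jpg", "/path/to.my/file", "file.odd_one"]:
    print("%r -> %r" % (name, ext(name)))
```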


Alternative non-regex implementation:

<lang J>ext=: #~ [: +./\ e.&'.' *. [: -. [: +./\. -.@e.&('.',AlphaNum_j_)</lang>

Task examples:

<lang J>   ext 'picture.jpg'
.jpg
   ext 'http://mywebsite.com/picture/image.png'
.png
   Examples=: 'picture.jpg';'http://mywebsite.com/picture/image.png';'myuniquefile.longextension';'IAmAFileWithoutExtension';'/path/to.my/file';'file.odd_one'
   ext each Examples
┌────┬────┬──────────────┬┬┬┐
│.jpg│.png│.longextension││││
└────┴────┴──────────────┴┴┴┘</lang>

Oforth

If the extension is not valid, this returns null rather than "". That is easy to change if "" is required.

<lang Oforth>: fileExt(s) { | i |

  s lastIndexOf('.') dup ->i ifNull: [ null return ]
  s extract(i 1 +, s size) conform(#isAlpha) ifFalse: [ null return ]
  s extract(i, s size)

} </lang>

Output:
fileExt("picture.jpg") println
fileExt("http://mywebsite.com/picture/image.png") println
fileExt("myuniquefile.longextension") println
fileExt("IAmAFileWithoutExtension") println
fileExt("/path/to.my/file") println
fileExt("file.odd_one") println

Python

Uses os.path.splitext and the extended tests from the Go example above.

<lang python>Python 3.5.0a1 (v3.5.0a1:5d4b6a57d5fd, Feb 7 2015, 17:58:38) [MSC v.1900 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import os
>>> tests = ["picture.jpg",
         "http://mywebsite.com/picture/image.png",
         "myuniquefile.longextension",
         "IAmAFileWithoutExtension",
         "/path/to.my/file",
         "file.odd_one",
         # Extra, with unicode
         "café.png",
         "file.resumé",
         # with unicode combining characters
         "cafe\u0301.png",
         "file.resume\u0301"]
>>> for path in tests:
        print("Path: %r -> Extension: %r" % (path, os.path.splitext(path)[-1]))

Path: 'picture.jpg' -> Extension: '.jpg'
Path: 'http://mywebsite.com/picture/image.png' -> Extension: '.png'
Path: 'myuniquefile.longextension' -> Extension: '.longextension'
Path: 'IAmAFileWithoutExtension' -> Extension: ''
Path: '/path/to.my/file' -> Extension: ''
Path: 'file.odd_one' -> Extension: '.odd_one'
Path: 'café.png' -> Extension: '.png'
Path: 'file.resumé' -> Extension: '.resumé'
Path: 'café.png' -> Extension: '.png'
Path: 'file.resumé' -> Extension: '.resumé'
>>> </lang>
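
The session above returns `.odd_one` for `file.odd_one`, because `os.path.splitext` places no restriction on the characters after the last period. If the task's stricter rule is wanted, one possible approach is to post-filter the result. The sketch below is a hedged illustration; the name `strict_extension` is hypothetical, and restricting to ASCII letters and digits is this sketch's reading of the task.

```python
import os.path
import string

# ASCII letters and digits only; this sketch's reading of "letters or numbers".
_ALLOWED = set(string.ascii_letters + string.digits)


def strict_extension(path):
    """Return os.path.splitext's extension only if it satisfies the task's rule."""
    ext = os.path.splitext(path)[1]  # "" or a string starting with "."
    suffix = ext[1:]
    if suffix and all(ch in _ALLOWED for ch in suffix):
        return ext
    return ""


for path in ["picture.jpg", "/path/to.my/file", "file.odd_one", "file.resum\u00e9"]:
    print("Path: %r -> Extension: %r" % (path, strict_extension(path)))
```

With this filter, `file.odd_one` and `file.resumé` both give an empty string, matching the ASCII-only results of the Go example.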

Racket

<lang Racket>#lang racket

;; Note that for a real implementation, Racket has a
;; `filename-extension` in its standard library, but don't use it here
;; since it requires a proper name (fails on ""), returns a byte-string,
;; and handles path values so might run into problems with unicode
;; string inputs.

(define (string-extension x)
  (cadr (regexp-match #px"(\\.[[:alnum:]]+|)$" x)))

(define (string-extension/unicode x)
  (cadr (regexp-match #px"(\\.(?:\\p{L}|\\p{N}|\\p{M})+|)$" x)))

(define examples '("picture.jpg"
                   "http://mywebsite.com/picture/image.png"
                   "myuniquefile.longextension"
                   "IAmAFileWithoutExtension"
                   "/path/to.my/file"
                   "file.odd_one"
                   ""
                   ;; Extra, with unicode
                   "café.png"
                   "file.resumé"
                   ;; with unicode combining characters
                   "cafe\u0301.png"
                   "file.resume\u0301"))

(printf "Official task:\n")
(for ([x (in-list examples)])
  (printf "~s ==> ~s\n" x (string-extension x)))

(printf "\nWith unicode support:\n")
(for ([x (in-list examples)])
  (printf "~s ==> ~s\n" x (string-extension/unicode x)))
</lang>

Output:
Official task:
  "picture.jpg" ==> ".jpg"
  "http://mywebsite.com/picture/image.png" ==> ".png"
  "myuniquefile.longextension" ==> ".longextension"
  "IAmAFileWithoutExtension" ==> ""
  "/path/to.my/file" ==> ""
  "file.odd_one" ==> ""
  "" ==> ""
  "café.png" ==> ".png"
  "file.resumé" ==> ""
  "café.png" ==> ".png"
  "file.resumé" ==> ""

With unicode support:
  "picture.jpg" ==> ".jpg"
  "http://mywebsite.com/picture/image.png" ==> ".png"
  "myuniquefile.longextension" ==> ".longextension"
  "IAmAFileWithoutExtension" ==> ""
  "/path/to.my/file" ==> ""
  "file.odd_one" ==> ""
  "" ==> ""
  "café.png" ==> ".png"
  "file.resumé" ==> ".resumé"
  "café.png" ==> ".png"
  "file.resumé" ==> ".resumé"

REXX

(This uses a paraphrase of the Rosetta Code task's definition: a legal file extension consists only of mixed-case Latin letters and/or decimal digits.)

<lang rexx>/*REXX program extracts the (legal) file extension from a file name. */
                     @.  =             /*define default value for array.*/
if arg(1)\==''  then @.1 = arg(1)      /*use the filename from the C.L. */
                else do                /*No filename given? Use defaults*/
                     @.1 = 'picture.jpg'
                     @.2 = 'http://mywebsite.com/pictures/image.png'
                     @.3 = 'myuniquefile.longextension'
                     @.4 = 'IAmAFileWithoutExtension'
                     @.5 = '/path/to.my/file'
                     @.6 = 'file.odd_one'
                     end
  do j=1  while @.j\=='';  $=@.j;  x=  /*process all of the file names. */
  p=lastpos(.,$)                       /*find last position of a period.*/
  if p\==0  then x=substr($,p+1)       /*Found?  Get the stuff after it.*/
  if \datatype(x,'A')  then x=         /*upper & lower case letters+digs*/
  if x==''  then x=' [null]'           /*use a better name for a "null".*/
            else x=. || x              /*prefix extension with a period.*/
  say 'file ext='left(x,20)       'for file name='$
  end       /*j*/
                                       /*stick a fork in it, we're done.*/</lang>

output using the default inputs:

file ext=.jpg                 for file name=picture.jpg
file ext=.png                 for file name=http://mywebsite.com/pictures/image.png
file ext=.longextension       for file name=myuniquefile.longextension
file ext= [null]              for file name=IAmAFileWithoutExtension
file ext= [null]              for file name=/path/to.my/file
file ext= [null]              for file name=file.odd_one

sed

<lang sed>s:.*\.:.:
s:\(^[^.]\|.*[/_]\).*::</lang>

or

<lang bash>sed -re 's:.*\.:.:' -e 's:(^[^.]|.*[/_]).*::'</lang>

Output:
.jpg
.png
.longextension
IAmAFileWithoutExtension


Tcl

Tcl's built-in file extension command almost does this already, except that it accepts any character after the dot. Just for fun, we'll enhance the builtin with a new subcommand that applies the limitation specified for this problem.

<lang Tcl>proc assert {expr} {  ;# for "static" assertions that throw nice errors
    if {![uplevel 1 [list expr $expr]]} {
        set msg "{$expr}"
        catch {append msg " {[uplevel 1 [list subst -noc $expr]]}"}
        tailcall throw {ASSERT ERROR} $msg
    }
}

proc file_ext {file} {
    set res ""
    regexp {(\.[a-z0-9]+)$} $file -> res
    return $res
}

set map [namespace ensemble configure file -map]
dict set map ext ::file_ext
namespace ensemble configure file -map $map

# and a test:
foreach {file ext} {
    picture.jpg	.jpg
    http://mywebsite.com/picture/image.png	.png
    myuniquefile.longextension	.longextension
    IAmAFileWithoutExtension	""
    /path/to.my/file	""
    file.odd_one	""
} {
    set res ""
    assert {[file ext $file] eq $ext}
}</lang>