Extract file extension: Difference between revisions
→{{header|zkl}}: update; add "needs updating" template to 20 of the language entries
Revision as of 09:29, 4 September 2016
Filename extensions are a rudimentary but commonly used way of identifying file types.
Write a function or program that
- takes one string argument representing the path/URL to a file
- returns the filename extension according to the below specification, or an empty string if the filename has no extension.
If your programming language (or standard library) has built-in functionality for extracting a filename extension, show how it would be used and how exactly its behavior differs from this specification.
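As an illustration of that comparison, here is a short Python sketch (Python is used only for illustration; it assumes Python 3 and calls posixpath.splitext directly so the results do not depend on the host operating system). It shows where the standard library's idea of an extension departs from this task's rule: leading periods are ignored, any characters are accepted after the period, and a lone trailing period still counts.

```python
import posixpath

# posixpath.splitext is the POSIX flavour of os.path.splitext; calling it
# directly keeps the results identical on every platform.
cases = [
    "http://example.com/download.tar.gz",  # '.gz'   agrees with the task
    "CharacterModel.3DS",                  # '.3DS'  agrees
    ".desktop",                            # ''      task wants '.desktop' (leading periods ignored)
    "document",                            # ''      agrees
    "document.txt_backup",                 # '.txt_backup'  task wants '' (any characters accepted)
    "/etc/pam.d/login",                    # ''      agrees (only the last path component is examined)
    "file.",                               # '.'     task wants '' (empty suffix still returned)
]

for path in cases:
    root, ext = posixpath.splitext(path)
    print(f"{path!r:40} -> {ext!r}")
```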
For the purposes of this task, a filename extension
- occurs at the very end of the filename
- consists of a period, followed solely by one or more ASCII letters or digits (A-Z, a-z, 0-9)
Input | Output | Comment |
---|---|---|
http://example.com/download.tar.gz | .gz | |
CharacterModel.3DS | .3DS | |
.desktop | .desktop | |
document | | empty string |
document.txt_backup | | empty string, because _ is not a letter or number |
/etc/pam.d/login | | empty string, as the period is in the parent directory name rather than the filename |
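The table can be turned into a quick self-check. The following Python sketch is one possible reading of the specification (a regular expression anchored at the very end of the string), not a translation of any entry below; the names file_extension and TABLE are illustrative only.

```python
import re

# A period followed by one or more ASCII letters or digits at the very end.
# \Z is used instead of $ because $ also matches just before a trailing newline.
EXTENSION = re.compile(r"\.[A-Za-z0-9]+\Z")

def file_extension(path: str) -> str:
    """Return the extension including its period, or "" if there is none."""
    match = EXTENSION.search(path)
    return match.group(0) if match else ""

# The examples from the table: (input, expected output).
TABLE = [
    ("http://example.com/download.tar.gz", ".gz"),
    ("CharacterModel.3DS", ".3DS"),
    (".desktop", ".desktop"),
    ("document", ""),
    ("document.txt_backup", ""),
    ("/etc/pam.d/login", ""),
]

for path, expected in TABLE:
    got = file_extension(path)
    print(f"{path:40} {got!r:12} {'ok' if got == expected else 'MISMATCH'}")
```

Because the character class excludes both "/" and ".", a match can never reach back into a directory name, so no separate path splitting is needed.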
ALGOL 68
<lang algol68># extracts a file-extension from the end of a pathname. The file extension is #
# defined as a dot followed by one or more letters or digits #
OP EXTENSION = ( STRING pathname )STRING:
    IF LWB pathname >= UPB pathname THEN
        # the pathname has 0 or 1 characters and so has no extension #
        ""
    ELIF NOT isalnum( pathname[ UPB pathname ] ) THEN
        # the final character is not a letter or digit - no extension #
        ""
    ELSE
        # could have an extension #
        INT pos := UPB pathname;
        WHILE pos > LWB pathname AND isalnum( pathname[ pos ] ) DO pos -:= 1 OD;
        IF pathname[ pos ] = "." THEN
            # the character before the letters and digits was a "." #
            pathname[ pos : ]
        ELSE
            # no "." before the letters and digits - no extension #
            ""
        FI
    FI ; # EXTENSION #

# test the EXTENSION operator #
PROC test extension = ( STRING pathname, STRING expected extension )VOID:
    BEGIN
        STRING extension = EXTENSION pathname;
        write( ( ( pathname + " got extension: (" + extension + ") "
                 + IF extension = expected extension THEN "" ELSE "NOT" FI
                 + " as expected"
                 )
               , newline
               )
             )
    END ; # test extension #

main: ( test extension( "http://example.com/download.tar.gz", ".gz"      )
      ; test extension( "CharacterModel.3DS",                 ".3DS"     )
      ; test extension( ".desktop",                           ".desktop" )
      ; test extension( "document",                           ""         )
      ; test extension( "document.txt_backup",                ""         )
      ; test extension( "/etc/pam.d/login",                   ""         )
      )</lang>
- Output:
http://example.com/download.tar.gz got extension: (.gz) as expected
CharacterModel.3DS got extension: (.3DS) as expected
.desktop got extension: (.desktop) as expected
document got extension: () as expected
document.txt_backup got extension: () as expected
/etc/pam.d/login got extension: () as expected
ALGOL W
<lang algolw>begin
    % extracts a file-extension from the end of a pathname.                  %
    % The file extension is defined as a dot followed by one or more letters %
    % or digits. As Algol W only has fixed length strings we limit the       %
    % extension to 32 characters and the pathname to 256 (the longest string %
    % allowed by Algol W)                                                    %
    string(32) procedure extension( string(256) value pathname ) ;
    begin

        integer pathPos;

        % position to the previous character in the pathname %
        procedure prev ; pathPos := pathPos - 1;
        % get the character at pathPos from pathname %
        string(1) procedure ch ; pathname( pathPos // 1 );
        % checks for a letter or digit - assumes the letters are contiguous %
        % in the character set - not true for EBCDIC %
        logical procedure isLetterOrDigit( string(1) value c ) ;
               ( c <= "z" and c >= "a" )
            or ( c <= "Z" and c >= "A" )
            or ( c <= "9" and c >= "0" ) ;

        % find the length of the pathname with trailing blanks removed %
        pathPos := 255;
        while pathPos >= 0 and ch = " " do prev;

        % extract the extension if possible %
        if pathPos <= 0 then ""  % no extension: 0 or 1 character pathname %
        else if not isLetterOrDigit( ch ) then "" % no extension: last character not a letter/digit %
        else begin
            while pathPos > 0 and isLetterOrDigit( ch ) do prev;
            if ch not = "." then "" % no extension: letters/digits not preceded by "." %
            else begin
                % have an extension %
                string(32) ext;
                ext := " ";
                % algol W substring lengths must be compile-time constants %
                % hence the loop to copy the extension characters          %
                for charPos := 0 until 31 do begin
                    if pathPos <= 255 then begin
                        ext( charPos // 1 ) := pathname( pathPos // 1 );
                        pathPos := pathPos + 1
                    end
                end for_charPos ;
                ext
            end
        end

    end extension ;

    % test the extension procedure %
    procedure testExtension( string(256) value pathname
                           ; string(32)  value expectedExtension
                           ) ;
    begin
        string(32) ext;
        ext := extension( pathname );
        write( pathname( 0 // 40 )
             , " -> ("
             , ext( 0 // 16 )
             , ") "
             , if ext = expectedExtension then "" else "NOT"
             , " as expected"
             )
    end ; % test extension %

    testExtension( "http://example.com/download.tar.gz", ".gz"      );
    testExtension( "CharacterModel.3DS",                 ".3DS"     );
    testExtension( ".desktop",                           ".desktop" );
    testExtension( "document",                           ""         );
    testExtension( "document.txt_backup",                ""         );
    testExtension( "/etc/pam.d/login",                   ""         );

end.</lang>
- Output:
http://example.com/download.tar.gz -> (.gz ) as expected
CharacterModel.3DS -> (.3DS ) as expected
.desktop -> (.desktop ) as expected
document -> ( ) as expected
document.txt_backup -> ( ) as expected
/etc/pam.d/login -> ( ) as expected
AWK
<lang AWK>
# syntax: GAWK -f EXTRACT_FILE_EXTENSION.AWK
BEGIN {
    arr[++i] = "picture.jpg"
    arr[++i] = "http://mywebsite.com/picture/image.png"
    arr[++i] = "myuniquefile.longextension"
    arr[++i] = "IAmAFileWithoutExtension"
    arr[++i] = "/path/to.my/file"
    arr[++i] = "file.odd_one"
    for (j=1; j<=i; j++) {
      printf("%-40s '%s'\n",arr[j],extract_ext(arr[j]))
    }
    exit(0)
}
function extract_ext(fn,  sep1,sep2,tmp) {
    while (fn ~ (sep1 = ":|\\\\|\\/")) { # ":" or "\" or "/"
      fn = substr(fn,match(fn,sep1)+1)
    }
    while (fn ~ (sep2 = "\\.")) { # "."
      fn = substr(fn,match(fn,sep2)+1)
      tmp = 1
    }
    if (fn ~ /_/ || tmp == 0) {
      return("")
    }
    return(fn)
}
</lang>
Output:
picture.jpg 'jpg'
http://mywebsite.com/picture/image.png 'png'
myuniquefile.longextension 'longextension'
IAmAFileWithoutExtension ''
/path/to.my/file ''
file.odd_one ''
C
<lang C>
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <stdio.h>

/* Returns a pointer to the extension of 'string'. If no extension is found,
 * then returns a pointer to the null-terminator of 'string'. */
char* file_ext(const char *string)
{
    assert(string != NULL);
    char *ext = strrchr(string, '.');

    if (ext == NULL)
        return (char*) string + strlen(string);

    for (char *iter = ext + 1; *iter != '\0'; iter++) {
        /* cast to unsigned char: passing a negative char to isalnum is undefined */
        if (!isalnum((unsigned char) *iter))
            return (char*) string + strlen(string);
    }

    return ext + 1;
}

int main(void)
{
    const char *strings[] = {
        "picture.jpg",
        "http://mywebsite.con/picture/image.png",
        "myuniquefile.longextension",
        "IAmAFileWithoutExtension",
        "/path/to.my/file",
        "file.odd_one"
    };

    for (size_t i = 0; i < sizeof(strings) / sizeof(strings[0]); ++i) {
        printf("'%s' - '%s'\n", strings[i], file_ext(strings[i]));
    }
}
</lang>
- Output:
'picture.jpg' - 'jpg'
'http://mywebsite.con/picture/image.png' - 'png'
'myuniquefile.longextension' - 'longextension'
'IAmAFileWithoutExtension' - ''
'/path/to.my/file' - ''
'file.odd_one' - ''
C++
<lang cpp>#include <string>
#include <algorithm>
#include <iostream>
#include <vector>

std::string findExtension ( const std::string & filename ) {
   auto position = filename.find_last_of ( '.' ) ;
   if ( position == std::string::npos )
      return "" ;
   else {
      std::string extension ( filename.substr( position ) ) ;
      position = extension.find( '_' ) ;
      auto pos2 = extension.find( '/' ) ;
      if (( position != std::string::npos ) || ( pos2 != std::string::npos ))
         return "" ;
      else
         return extension ;
   }
}

int main( ) {
   std::vector<std::string> filenames {"picture.jpg" , "http://mywebsite.com/picture/image.png" ,
      "myuniquefile.longextension" , "IAmAFileWithoutExtension" , "/path/to.my/file" ,
      "file.odd_one" } ;
   std::vector<std::string> extensions( filenames.size( ) ) ;
   std::transform( filenames.begin( ) , filenames.end( ) , extensions.begin( ) , findExtension ) ;
   for ( int i = 0 ; i < filenames.size( ) ; i++ )
      std::cout << filenames[i] << " has extension : " << extensions[i] << " !\n" ;
   return 0 ;
}</lang>
- Output:
picture.jpg has extension : .jpg !
http://mywebsite.com/picture/image.png has extension : .png !
myuniquefile.longextension has extension : .longextension !
IAmAFileWithoutExtension has extension :  !
/path/to.my/file has extension :  !
file.odd_one has extension :  !
C#
<lang C#>public static string FindExtension(string filename) {
    int indexOfDot = filename.Length;
    for (int i = filename.Length - 1; i >= 0; i--) {
        char c = filename[i];
        if (c == '.') {
            indexOfDot = i;
            break;
        }
        if (c >= '0' && c <= '9') continue;
        if (c >= 'A' && c <= 'Z') continue;
        if (c >= 'a' && c <= 'z') continue;
        break;
    }
    //The dot must be followed by at least one other character,
    //so if the last character is a dot, return the empty string
    return indexOfDot + 1 == filename.Length ? "" : filename.Substring(indexOfDot);
}</lang>
Using regular expressions (C# 6, with System.Text.RegularExpressions imported):
<lang C#>public static string FindExtension(string filename) => Regex.Match(filename, @"\.[A-Za-z0-9]+$").Value;</lang>
Emacs Lisp
<lang Lisp>(file-name-extension "foo.txt")
=> "txt"</lang>
No extension is distinguished from an empty extension, but wrapping the call in (or ... "") gives "" for both if desired.
<lang Lisp>(file-name-extension "foo.")
=> ""
(file-name-extension "foo")
=> nil</lang>
Emacs backup suffixes ~ and .~NUM~ are not treated as part of the extension, but otherwise any characters are allowed.
<lang Lisp>(file-name-extension "foo.txt~")
=> "txt"
(file-name-extension "foo.txt.~1.234~")
=> "txt"</lang>
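Because Emacs strips backup suffixes before looking for the extension, a rough Python approximation of that behaviour, combined with this task's rule, might look like the sketch below. The suffix pattern is a simplification and an assumption, not Emacs's exact regular expression, and unlike file-name-extension the hypothetical helper keeps the leading period, as the task requires.

```python
import re

# Simplified backup-suffix pattern (an assumption): either ".~" digits/dots "~"
# as in "foo.txt.~1.234~", or a single trailing "~" as in "foo.txt~".
BACKUP_SUFFIX = re.compile(r"(?:\.~[0-9.]+~|~)\Z")
EXTENSION = re.compile(r"\.[A-Za-z0-9]+\Z")

def extension_ignoring_backup(path: str) -> str:
    """Hypothetical helper: drop one backup suffix, then apply the task's rule."""
    stripped = BACKUP_SUFFIX.sub("", path, count=1)
    match = EXTENSION.search(stripped)
    return match.group(0) if match else ""

for name in ["foo.txt", "foo.txt~", "foo.txt.~1.234~", "foo"]:
    print(f"{name!r} -> {extension_ignoring_backup(name)!r}")
```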
Forth
<lang forth>: invalid? ( c -- f )
   toupper dup [char] A [char] Z 1+ within
   swap [char] 0 [char] 9 1+ within or 0= ;

: extension ( addr1 u1 -- addr2 u2 )
   dup 0= if exit then
   2dup over +
   begin 1- 2dup <= while dup c@ invalid? until then
   \ no '.' found
   2dup - 0> if 2drop dup /string exit then
   \ invalid char
   dup c@ [char] . <> if 2drop dup /string exit then
   swap -
   \ '.' is last char
   2dup 1+ = if drop dup then
   /string ;

: type.quoted ( addr u -- )
   [char] ' emit type [char] ' emit ;

: test ( addr u -- )
   2dup type.quoted
   ."  => "
   extension type.quoted cr ;

: tests
   s" picture.jpg" test
   s" http://mywebsite.com/picture/image.png" test
   s" myuniquefile.longextension" test
   s" IAmAFileWithoutExtension" test
   s" /path/to.my/file" test
   s" file.odd_one" test
   s" IDontHaveAnExtension." test ;</lang>
- Output:
cr tests
'picture.jpg' => '.jpg'
'http://mywebsite.com/picture/image.png' => '.png'
'myuniquefile.longextension' => '.longextension'
'IAmAFileWithoutExtension' => ''
'/path/to.my/file' => ''
'file.odd_one' => ''
'IDontHaveAnExtension.' => ''
 ok
Fortran
The plan is to scan backwards from the end of the text until a character that cannot be part of an extension is encountered. If it is a period, then a valid file extension has been spanned; otherwise, there is no extension. Because the Fortran standard does not specify whether compound logical expressions are evaluated with short-circuiting, a structured DO WHILE(L1 > 0 & etc) loop cannot be used: both parts of the expression may be evaluated, so the second part may attempt to access character zero of the text. So, the compound expression has to be broken into two separate tests.
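The same backward scan can be sketched in Python. Python's and does short-circuit, so the bounds test and the character test could safely share one while condition; the sketch keeps them separate only to mirror the two-step structure described above. The name fext is borrowed from the Fortran function, and the check that at least one character follows the period is an addition made here to honour the task's "one or more" rule.

```python
import string

# Characters that may follow the period, listed explicitly.
GOOD_EXT = frozenset(string.ascii_letters + string.digits)

def fext(fname: str) -> str:
    """Scan backwards over extension characters, then check for a period."""
    i = len(fname)                 # fname[i:] is the run scanned so far
    while i > 0:                   # first the bounds test...
        if fname[i - 1] not in GOOD_EXT:
            break                  # ...then, separately, the character test
        i -= 1
    # Valid only if a period precedes a non-empty run of good characters.
    if i > 0 and fname[i - 1] == "." and i < len(fname):
        return fname[i - 1:]
    return ""
```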
The source incorporates a collection of character classes, each defined as a span of a single sequence of characters. Unfortunately, constants defined by a PARAMETER statement may not appear in EQUIVALENCE statements, so the text is initialised by DATA statements and thus loses the read-only protection given to PARAMETER constants. These statements are taken from a more elaborate text-scanning scheme; only the symbols of GOODEXT are needed here.
The text scan could instead check for a valid character via something like ("a" <= C & C <= "z") | ("A" <= C & C <= "Z") | ("0" <= C & C <= "9"), but this is not only messy but unreliable: in EBCDIC, for example, there are gaps in the sequence of letters that are occupied by other symbols. So instead, the test uses INDEX into a sequence of all the valid symbols. If one were in a hurry, for eight-bit character codes, an array GOODEXT of 256 logical values could be indexed by the numerical value of the character.
<lang Fortran>      MODULE TEXTGNASH          !Some text inspection.
       CHARACTER*10 DIGITS              !Integer only.
       CHARACTER*11 DDIGITS             !With a full stop masquerading as a decimal point.
       CHARACTER*13 SDDIGITS            !Signed decimal digits.
       CHARACTER*4 EXPONENTISH          !With exponent parts.
       CHARACTER*17 NUMBERISH           !The complete mix.
       CHARACTER*16 HEXLETTERS          !Extended for base sixteen.
       CHARACTER*62 DIGILETTERS         !File nameish but no .
       CHARACTER*26 LITTLELETTERS,BIGLETTERS    !These are well-known.
       CHARACTER*52 LETTERS             !The union thereof.
       CHARACTER*66 NAMEISH             !Allowing digits and . and _ as well.
       CHARACTER*3 ODDITIES             !And allow these in names also.
       CHARACTER*1 CHARACTER(72)        !Prepare a work area.
       EQUIVALENCE                      !Whose components can be fingered.
     1  (CHARACTER( 1),EXPONENTISH,NUMBERISH),    !Start with numberish symbols that are not nameish.
     2  (CHARACTER( 5),SDDIGITS),                 !Since the sign symbols are not nameish.
     3  (CHARACTER( 7),DDIGITS,NAMEISH),          !Computerish names might incorporate digits and a .
     4  (CHARACTER( 8),DIGITS,HEXLETTERS,DIGILETTERS),    !A proper name doesn't start with a digit.
     5  (CHARACTER(18),BIGLETTERS,LETTERS),       !Just with a letter.
     6  (CHARACTER(44),LITTLELETTERS),            !The second set.
     7  (CHARACTER(70),ODDITIES)                  !Tack this on the end.
       DATA EXPONENTISH    /"eEdD"/               !These on the front.
       DATA SDDIGITS       /"+-.0123456789"/      !Any of these can appear in a floating point number.
       DATA BIGLETTERS     /"ABCDEFGHIJKLMNOPQRSTUVWXYZ"/ !Simple.
       DATA LITTLELETTERS  /"abcdefghijklmnopqrstuvwxyz"/ !Subtly different.
       DATA ODDITIES       /"_:#"/                !Allow these in names also. This strains := usage!

       CHARACTER*62 GOODEXT     !These are all the characters allowed
       EQUIVALENCE (CHARACTER(8),GOODEXT)
c       PARAMETER (GOODEXT = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  !for an approved
c     1  //"abcdefghijklmnopqrstuvwxyz"                    !file "extension" part
c     2  //"0123456789")                                   !Of a file name.

       INTEGER MEXT             !A fixed bound.
       PARAMETER (MEXT = 28)    !This should do.
       CONTAINS
        CHARACTER*(MEXT) FUNCTION FEXT(FNAME)   !Return the file extension part.
         CHARACTER*(*) FNAME    !May start with the file's path name blather.
         INTEGER L1,L2          !Fingers to the text.
          L2 = LEN(FNAME)       !The last character of the file name.
          L1 = L2               !Starting at the end...
   10     IF (L1.GT.0) THEN     !Damnit, can't rely on DO WHILE(safe & test)
            IF (INDEX(GOODEXT,FNAME(L1:L1)).GT.0) THEN   !So do the two parts explicitly.
              L1 = L1 - 1       !Well, that was a valid character for an extension.
              GO TO 10          !So, move back one and try again.
            END IF              !Until the end of valid stuff.
            IF (FNAME(L1:L1).EQ.".") THEN   !Stopped here. A proper introduction?
              L1 = L1 - 1       !Yes. Include the period.
              GO TO 20          !And escape.
            END IF              !Otherwise, not valid stuff.
          END IF                !Keep on moving back.
          L1 = L2               !If we're here, no period was found.
   20     FEXT = FNAME(L1 + 1:L2)   !The text of the extension.
        END FUNCTION FEXT       !Possibly, blank.
      END MODULE TEXTGNASH      !Enough for this.

      PROGRAM POKE
      USE TEXTGNASH
      WRITE (6,*) FEXT("Picture.jpg")
      WRITE (6,*) FEXT("http://mywebsite.com/picture/image.png")
      WRITE (6,*) FEXT("myuniquefile.longextension")
      WRITE (6,*) FEXT("IAmAFileWithoutExtension")
      WRITE (6,*) FEXT("/path/to.my/file")
      WRITE (6,*) FEXT("file.odd_one")
      WRITE (6,*) "Approved ",GOODEXT
      END</lang>
The output cheats a little, in that trailing spaces look just as blank as no spaces. The result of FEXT could be passed to TRIM (if that function is available), or the last non-blank character could be found. Fortran 2003 allows character variables with a deferred length that is set on assignment, so trailing spaces would no longer appear. This facility would also settle the endlessly annoying question of "how long is long enough", which here is answered only by the fixed parameter MEXT. Once, three was the maximum extension length (not counting the period), then perhaps six, but now, what?
.jpg
.png
.longextension
Approved 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
Note that if FEXT were presented with a file name containing trailing spaces, it would declare that no extension is present.
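For comparison, the 256-entry lookup table suggested above for eight-bit character codes might be sketched in Python as follows. It works on bytes rather than text (an assumption about the input), and the table is filled from an explicit list of symbols, so each test in the scan is a single indexed lookup; the name fext_bytes is illustrative only.

```python
# One flag per possible byte value, set only for the approved symbols.
GOOD = [False] * 256
for code in b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789":
    GOOD[code] = True  # iterating over a bytes object yields ints

def fext_bytes(fname: bytes) -> bytes:
    """Backward scan where each character test is one table lookup."""
    i = len(fname)
    while i > 0 and GOOD[fname[i - 1]]:
        i -= 1
    # Accept only a period followed by at least one approved byte.
    if 0 < i < len(fname) and fname[i - 1] == ord("."):
        return fname[i - 1:]
    return b""
```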
Go
<lang go>package main

import (
	"fmt"
	"path"
)

// An exact copy of `path.Ext` from Go 1.4.2 for reference:
func Ext(path string) string {
	for i := len(path) - 1; i >= 0 && path[i] != '/'; i-- {
		if path[i] == '.' {
			return path[i:]
		}
	}
	return ""
}

// A variation that handles the extra non-standard requirement
// that extensions shall only "consists of one or more letters or numbers".
//
// Note, instead of direct comparison with '0-9a-zA-Z' we could instead use:
//     case !unicode.IsLetter(rune(b)) && !unicode.IsNumber(rune(b)):
//             return ""
// even though this operates on bytes instead of Unicode code points (runes),
// it is still correct given the details of UTF-8 encoding.
func ext(path string) string {
	for i := len(path) - 1; i >= 0; i-- {
		switch b := path[i]; {
		case b == '.':
			if i == len(path)-1 {
				return "" // a lone trailing period has no letters or numbers after it
			}
			return path[i:]
		case '0' <= b && b <= '9':
		case 'a' <= b && b <= 'z':
		case 'A' <= b && b <= 'Z':
		default:
			return ""
		}
	}
	return ""
}

func main() {
	tests := []string{
		"picture.jpg",
		"http://mywebsite.com/picture/image.png",
		"myuniquefile.longextension",
		"IAmAFileWithoutExtension",
		"/path/to.my/file",
		"file.odd_one",
		// Extra, with unicode
		"café.png",
		"file.resumé",
		// with unicode combining characters
		"cafe\u0301.png",
		"file.resume\u0301",
	}
	for _, str := range tests {
		std := path.Ext(str)
		custom := ext(str)
		fmt.Printf("%38s\t→ %-8q", str, custom)
		if custom != std {
			fmt.Printf("(Standard: %q)", std)
		}
		fmt.Println()
	}
}</lang>
- Output:
picture.jpg → ".jpg"
http://mywebsite.com/picture/image.png → ".png"
myuniquefile.longextension → ".longextension"
IAmAFileWithoutExtension → ""
/path/to.my/file → ""
file.odd_one → "" (Standard: ".odd_one")
café.png → ".png"
file.resumé → "" (Standard: ".resumé")
café.png → ".png"
file.resumé → "" (Standard: ".resumé")
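The Unicode test cases above point to a pitfall worth checking in other languages too. In Python, for example, str.isalnum() is Unicode-aware and accepts "é", so the ASCII restriction has to be explicit. The sketch below mirrors the shape of the Go ext function (a backward scan that stops at the first period), assumes Python 3.7 or later for str.isascii(), and also rejects a lone trailing period.

```python
def ext(path: str) -> str:
    """Backward scan accepting only ASCII letters and digits after the period."""
    for i in range(len(path) - 1, -1, -1):
        c = path[i]
        if c == ".":
            # Require at least one character after the period.
            return path[i:] if i < len(path) - 1 else ""
        if not (c.isascii() and c.isalnum()):
            return ""
    return ""

print("é".isalnum())  # True: isalnum alone would accept a non-ASCII letter
for name in ["café.png", "file.resumé", "cafe\u0301.png", "file.resume\u0301"]:
    print(f"{name!r} -> {ext(name)!r}")
```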
Haskell
<lang Haskell>module FileExtension
   where

myextension :: String -> String
myextension s
   |not $ elem '.' s = ""
   |elem '/' extension || elem '_' extension = ""
   |otherwise = '.' : extension
      where
         extension = reverse ( takeWhile ( /= '.' ) $ reverse s )</lang>
- Output:
map myextension ["picture.jpg" , "http://mywebsite.com/picture/image.png" , "myuniquefile.longextension" , "IAmAFileWithoutExtension" , "/path/to.my/file" , "file.odd_one"]
[".jpg",".png",".longextension","","",""]
J
Implementation:
<lang J>require'regex'
ext=: '[.][a-zA-Z0-9]+$'&rxmatch ;@rxfrom ]</lang>
Obviously most of the work here is done by the regex implementation (PCRE, if that matters; this particular kind of expression tends to be a bit more concise in Perl than in J).
Perhaps of interest is that this is an example of a J fork: here we have three verbs separated by spaces. Unlike a Unix system fork (which spins up a child process that is an almost exact clone of the currently running process), a J fork is three independently defined verbs. The two outer verbs each receive the fork's argument, and the verb in the middle combines their two results.
The left verb uses rxmatch to find the beginning position of the match and its length. The right verb is the identity function. The middle verb extracts the desired characters from the original argument. (For a non-match, the length of the "match" is zero so the empty string is extracted.)
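The fork idea can be mimicked with ordinary higher-order functions. The Python sketch below is only an analogy: fork(f, g, h) applied to y computes g(f(y), h(y)), just as the J fork (f g h) y computes (f y) g (h y). The helpers match_span and take are hypothetical stand-ins for rxmatch and rxfrom; match_span returns a (start, length) pair, with (0, 0) for no match (a convention assumed here so that the empty string falls out naturally).

```python
import re

def fork(f, g, h):
    """J-style fork: apply f and h to the same argument, combine with g."""
    return lambda y: g(f(y), h(y))

def match_span(s):
    """Stand-in for rxmatch: (start, length) of the match, (0, 0) if none."""
    m = re.search(r"\.[A-Za-z0-9]+\Z", s)
    return (m.start(), m.end() - m.start()) if m else (0, 0)

def take(span, s):
    """Stand-in for rxfrom: extract the characters described by span."""
    start, length = span
    return s[start:start + length]

def identity(y):
    return y

ext = fork(match_span, take, identity)
```

For example, ext("picture.jpg") evaluates take(match_span("picture.jpg"), "picture.jpg"), which is ".jpg".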
Alternative non-regex Implementation
<lang J>ext=: #~ [: +./\ e.&'.' *. [: -. [: +./\. -.@e.&('.',AlphaNum_j_)</lang>
Task examples:
<lang J>   ext 'picture.jpg'
.jpg
   ext 'http://mywebsite.com/picture/image.png'
.png
   Examples=: 'picture.jpg';'http://mywebsite.com/picture/image.png';'myuniquefile.longextension';'IAmAFileWithoutExtension';'/path/to.my/file';'file.odd_one'
   ext each Examples
┌────┬────┬──────────────┬┬┬┐
│.jpg│.png│.longextension││││
└────┴────┴──────────────┴┴┴┘</lang>
Java
<lang java>public class Test {

    public static void main(String[] args) {
        String[] filenames = {"picture.jpg", "http://mywebsite.con/picture/image.png",
                              "myuniquefile.longextension", "IAmAFileWithoutExtension",
                              "/path/to.my/file", "file.odd_one"};

        for (String filename : filenames) {
            String ext = "null";
            int idx = filename.lastIndexOf('.');
            if (idx != -1) {
                String tmp = filename.substring(idx);
                if (tmp.matches("\\.[a-zA-Z0-9]+")) {
                    ext = tmp;
                }
            }
            System.out.println(filename + " -> " + ext);
        }
    }
}</lang>
picture.jpg -> .jpg
http://mywebsite.con/picture/image.png -> .png
myuniquefile.longextension -> .longextension
IAmAFileWithoutExtension -> null
/path/to.my/file -> null
file.odd_one -> null
jq
Pending resolution of the inconsistency in the task description as of this writing, the following definitions exclude the delimiting period.
In the first section, a version intended for jq version 1.4 is presented. A simpler definition using "match", a regex feature of subsequent versions of jq, is then given.
<lang jq>def file_extension:
  def alphanumeric: explode | unique
  | reduce .[] as $i
      (true;
       if . then $i | (97 <= . and . <= 122) or (65 <= . and . <= 90) or (48 <= . and . <= 57)
       else false
       end );
  rindex(".") as $ix
  | if $ix then .[1+$ix:] as $ext
    | if $ext|alphanumeric then $ext  # or ".\($ext)" if the period is wanted
      else ""
      end
    else ""
    end;</lang>
<lang jq>def file_extension:
match( "\\.([a-zA-Z0-9]*$)" ) // false | if . then .captures[0].string else "" end ;</lang>
Examples:
Using either version above gives the same results.
<lang jq>"picture.jpg",
"myuniquefile.longextension",
"http://mywebsite.com/picture/image.png",
"myuniquefile.longextension",
"IAmAFileWithoutExtension",
"/path/to.my/file",
"file.odd_one"
| "\(.) has extension: \"\(file_extension)\""</lang>
- Output:
<lang sh>$ jq -r -n -f Extract_file_extension.jq
picture.jpg has extension: "jpg"
myuniquefile.longextension has extension: "longextension"
http://mywebsite.com/picture/image.png has extension: "png"
myuniquefile.longextension has extension: "longextension"
IAmAFileWithoutExtension has extension: ""
/path/to.my/file has extension: ""
file.odd_one has extension: ""</lang>
Lua
<lang Lua>-- Lua pattern docs at http://www.lua.org/manual/5.1/manual.html#5.4.1
function fileExt (filename) return filename:match("(%.%w+)$") or "" end

local testCases = {
    "picture.jpg",
    "http://mywebsite.com/picture/image.png",
    "myuniquefile.longextension",
    "IAmAFileWithoutExtension",
    "/path/to.my/file",
    "file.odd_one"
}
-- ipairs walks the array part in order; pairs does not guarantee any order
for _, example in ipairs(testCases) do
    print(example .. ' -> "' .. fileExt(example) .. '"')
end</lang>
- Output:
picture.jpg -> ".jpg"
http://mywebsite.com/picture/image.png -> ".png"
myuniquefile.longextension -> ".longextension"
IAmAFileWithoutExtension -> ""
/path/to.my/file -> ""
file.odd_one -> ""
Oforth
If the extension is not valid, this returns null rather than "". It is easy to change if "" is required.
<lang Oforth>: fileExt(s) { | i |
   s lastIndexOf('.') dup ->i ifNull: [ null return ]
   s extract(i 1+, s size) conform(#isAlpha) ifFalse: [ null return ]
   s extract(i, s size)
}</lang>
- Output:
fileExt("picture.jpg") println
fileExt("http://mywebsite.com/picture/image.png") println
fileExt("myuniquefile.longextension") println
fileExt("IAmAFileWithoutExtension") println
fileExt("/path/to.my/file") println
fileExt("file.odd_one") println
Perl
<lang perl>sub extension {
    my $path = shift;
    # return the match on success, otherwise the empty string
    return $path =~ / \. [a-z0-9]+ $ /xi ? $& : '';
}</lang>
Testing:
<lang perl>printf "%-35s %-11s\n", $_, "'".extension($_)."'" for qw[
    http://example.com/download.tar.gz
    CharacterModel.3DS
    .desktop
    document
    document.txt_backup
    /etc/pam.d/login
];</lang>
- Output:
http://example.com/download.tar.gz  '.gz'
CharacterModel.3DS                  '.3DS'
.desktop                            '.desktop'
document                            ''
document.txt_backup                 ''
/etc/pam.d/login                    ''
Perl 6
The built-in IO::Path class has an .extension method:
<lang perl6>say $path.IO.extension;</lang>
Contrary to this task's specification, it
- doesn't include the dot in the output
- doesn't restrict the extension to letters and numbers.
Here's a custom implementation which does satisfy the task requirements:
<lang perl6>sub extension (Str $path --> Str) {
$path.match(/:i ['.' <[a..z0..9]>+]? $ /).Str
}</lang>
Testing:
<lang perl6>printf "%-35s %-11s %-12s\n", $_, extension($_).perl, $_.IO.extension.perl for <
http://example.com/download.tar.gz CharacterModel.3DS .desktop document document.txt_backup /etc/pam.d/login
>;</lang>
- Output:
http://example.com/download.tar.gz  ".gz"       "gz"
CharacterModel.3DS                  ".3DS"      "3DS"
.desktop                            ".desktop"  "desktop"
document                            ""          ""
document.txt_backup                 ""          "txt_backup"
/etc/pam.d/login                    ""          ""
Phix
<lang Phix>function getExtension(string filename)
    for i=length(filename) to 1 by -1 do
        integer ch = filename[i]
        if ch='.' then return filename[i..$] end if
        if find(ch,"\\/_") then exit end if
    end for
    return ""
end function

constant tests = {"mywebsite.com/picture/image.png",
                  "http://mywebsite.com/picture/image.png",
                  "myuniquefile.longextension",
                  "IAmAFileWithoutExtension",
                  "/path/to.my/file",
                  "file.odd_one"}
for i=1 to length(tests) do
printf(1,"%s ==> %s\n",{tests[i],getExtension(tests[i])})
end for</lang>
- Output:
mywebsite.com/picture/image.png ==> .png
http://mywebsite.com/picture/image.png ==> .png
myuniquefile.longextension ==> .longextension
IAmAFileWithoutExtension ==>
/path/to.my/file ==>
file.odd_one ==>
PowerShell
<lang PowerShell>
function extension($file){
    $ext = [System.IO.Path]::GetExtension($file)
    if (-not [String]::IsNullOrEmpty($ext)) {
        if($ext.IndexOf("_") -ne -1) {$ext = ""}
    }
    $ext
}
extension "picture.jpg"
extension "http://mywebsite.com/picture/image.png"
extension "myuniquefile.longextension"
extension "IAmAFileWithoutExtension"
extension "/path/to.my/file"
extension "file.odd_one"
</lang>
Output:
.jpg
.png
.longextension
Python
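Uses os.path.splitext and the extended tests from the Go example above. Note that splitext accepts any characters after the final dot, so 'file.odd_one' yields '.odd_one' rather than the empty string the task asks for.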
<lang python>Python 3.5.0a1 (v3.5.0a1:5d4b6a57d5fd, Feb 7 2015, 17:58:38) [MSC v.1900 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import os
>>> tests = ["picture.jpg",
             "http://mywebsite.com/picture/image.png",
             "myuniquefile.longextension",
             "IAmAFileWithoutExtension",
             "/path/to.my/file",
             "file.odd_one",
             # Extra, with unicode
             "café.png",
             "file.resumé",
             # with unicode combining characters
             "cafe\u0301.png",
             "file.resume\u0301"]
>>> for path in tests:
    print("Path: %r -> Extension: %r" % (path, os.path.splitext(path)[-1]))

Path: 'picture.jpg' -> Extension: '.jpg'
Path: 'http://mywebsite.com/picture/image.png' -> Extension: '.png'
Path: 'myuniquefile.longextension' -> Extension: '.longextension'
Path: 'IAmAFileWithoutExtension' -> Extension: ''
Path: '/path/to.my/file' -> Extension: ''
Path: 'file.odd_one' -> Extension: '.odd_one'
Path: 'café.png' -> Extension: '.png'
Path: 'file.resumé' -> Extension: '.resumé'
Path: 'café.png' -> Extension: '.png'
Path: 'file.resumé' -> Extension: '.resumé'
>>> </lang>
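One way to make the splitext approach follow the task's rule is to validate the suffix it returns. This is a minimal sketch, not a standard-library feature. It assumes Python 3.7 or later (for `str.isascii`) and accepts only ASCII letters and digits after the dot. `strict_extension` is a hypothetical name.

```python
import os.path

def strict_extension(path):
    """os.path.splitext, then reject suffixes that are not ASCII letters/digits."""
    ext = os.path.splitext(path)[1]   # includes the leading dot, or "" if none
    body = ext[1:]                    # the characters after the dot
    if body and body.isascii() and body.isalnum():
        return ext
    return ""
```

splitext ignores leading dots in the final path component, so this returns "" for '.desktop'. The Perl entry, by contrast, reports '.desktop'.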
Racket
<lang Racket>#lang racket

;; Note that for a real implementation, Racket has a
;; `filename-extension` in its standard library, but don't use it here
;; since it requires a proper name (fails on ""), returns a byte-string,
;; and handles path values so might run into problems with unicode
;; string inputs.

(define (string-extension x)
  (cadr (regexp-match #px"(\\.[[:alnum:]]+|)$" x)))

(define (string-extension/unicode x)
  (cadr (regexp-match #px"(\\.(?:\\p{L}|\\p{N}|\\p{M})+|)$" x)))

(define examples '("picture.jpg"
                   "http://mywebsite.com/picture/image.png"
                   "myuniquefile.longextension"
                   "IAmAFileWithoutExtension"
                   "/path/to.my/file"
                   "file.odd_one"
                   ""
                   ;; Extra, with unicode
                   "café.png"
                   "file.resumé"
                   ;; with unicode combining characters
                   "cafe\u0301.png"
                   "file.resume\u0301"))

(printf "Official task:\n")
(for ([x (in-list examples)])
  (printf "~s ==> ~s\n" x (string-extension x)))

(printf "\nWith unicode support:\n")
(for ([x (in-list examples)])
  (printf "~s ==> ~s\n" x (string-extension/unicode x)))
</lang>
- Output:
Official task:
"picture.jpg" ==> ".jpg"
"http://mywebsite.com/picture/image.png" ==> ".png"
"myuniquefile.longextension" ==> ".longextension"
"IAmAFileWithoutExtension" ==> ""
"/path/to.my/file" ==> ""
"file.odd_one" ==> ""
"" ==> ""
"café.png" ==> ".png"
"file.resumé" ==> ""
"café.png" ==> ".png"
"file.resumé" ==> ""

With unicode support:
"picture.jpg" ==> ".jpg"
"http://mywebsite.com/picture/image.png" ==> ".png"
"myuniquefile.longextension" ==> ".longextension"
"IAmAFileWithoutExtension" ==> ""
"/path/to.my/file" ==> ""
"file.odd_one" ==> ""
"" ==> ""
"café.png" ==> ".png"
"file.resumé" ==> ".resumé"
"café.png" ==> ".png"
"file.resumé" ==> ".resumé"
REXX
Using this paraphrase of the Rosetta Code task's definition: a legal file extension consists only of mixed-case Latin letters and/or decimal digits.
<lang rexx>/*REXX pgm extracts the file extension (defined above from the RC task) from a file name*/
@.=                                              /*define default value for the  @ array.*/
parse arg fID                                    /*obtain any optional arguments from CL*/
if fID\=='' then @.1 = fID                       /*use the filename from the C.L.       */
            else do                              /*No filename given? Then use defaults.*/
                 @.1 = 'http://example.com/download.tar.gz'
                 @.2 = 'CharacterModel.3DS'
                 @.3 = '.desktop'
                 @.4 = 'document'
                 @.5 = 'document.txt_backup'
                 @.6 = '/etc/pam.d/login'
                 end

  do j=1  while @.j\=='';   x=                   /*process  (all of)  the file name(s). */
  p=lastpos(., @.j)                              /*find the last position of a period.  */
  if p\==0  then x=substr(@.j, p+1)              /*Found a dot?  Then get stuff after it*/
  if \datatype(x, 'A')  then x=                  /*Not upper/lowercase letters | digits?*/
  if x==''  then x= " [null]"                    /*use a better name for a "null" ext.  */
            else x= . || x                       /*prefix the extension with a period.  */
  say 'file extension='  left(x, 20)  "for file name="  @.j
  end   /*j*/                                    /*stick a fork in it,  we're all done. */</lang>
output when using the default (internal) inputs:
file extension= .jpg                 for file name= picture.jpg
file extension= .png                 for file name= http://mywebsite.com/pictures/image.png
file extension= .longextension       for file name= myuniquefile.longextension
file extension=  [null]              for file name= IAmAFileWithoutExtension
file extension=  [null]              for file name= /path/to.my/file
file extension=  [null]              for file name= file.odd_one
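The REXX program avoids regular expressions. It takes everything after the last period and keeps it only if it is purely alphanumeric. A Python sketch of the same idea follows. The function name `last_dot_extension` is illustrative. `str.rfind` stands in for REXX's lastpos, but note that it returns -1 rather than 0 when there is no period.

```python
import string

# Only ASCII letters and decimal digits may follow the final period.
ALLOWED = set(string.ascii_letters + string.digits)

def last_dot_extension(path):
    """Non-regex version: text after the last '.', kept only if alphanumeric."""
    p = path.rfind(".")              # -1 when there is no period at all
    if p == -1:
        return ""
    tail = path[p + 1:]
    if tail and all(c in ALLOWED for c in tail):
        return "." + tail            # prefix the extension with a period
    return ""
```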
sed
<lang sed>s:.*\.:.:
s:\(^[^.]\|.*[/_]\).*::</lang> or <lang bash>sed -re 's:.*\.:.:' -e 's:(^[^.]|.*[/_]).*::'</lang>
- Output:
.jpg
.png
.longextension
IAmAFileWithoutExtension
Sidef
<lang ruby>func extension (filename) {
    given(filename.split('.').last) {
        when(filename) { "" }
        when(/[\/_]/)  { "" }
        default        { "." + _ }
    }
}

['mywebsite.com/picture/image.png',
 'http://mywebsite.com/picture/image.png',
 'myuniquefile.longextension',
 'IAmAFileWithoutExtension',
 '/path/to.my/file',
 'file.odd_one',
].each {|f| say "#{f} -> #{extension(f).dump}" }</lang>
- Output:
mywebsite.com/picture/image.png -> ".png"
http://mywebsite.com/picture/image.png -> ".png"
myuniquefile.longextension -> ".longextension"
IAmAFileWithoutExtension -> ""
/path/to.my/file -> ""
file.odd_one -> ""
Tcl
Tcl's built-in file extension command almost does this already, except that it accepts any character after the dot. Just for fun, we'll extend the built-in file command with a new subcommand that applies the restriction specified for this problem.
<lang Tcl>proc assert {expr} {    ;# for "static" assertions that throw nice errors
    if {![uplevel 1 [list expr $expr]]} {
        set msg "{$expr}"
        catch {append msg " {[uplevel 1 [list subst -noc $expr]]}"}
        tailcall throw {ASSERT ERROR} $msg
    }
}

proc file_ext {file} {
    set res ""
    # -nocase so that upper-case extensions such as .JPG are accepted too
    regexp -nocase {(\.[a-z0-9]+)$} $file -> res
    return $res
}

set map [namespace ensemble configure file -map]
dict set map ext ::file_ext
namespace ensemble configure file -map $map

# and a test:
foreach {file ext} {
    picture.jpg                             .jpg
    http://mywebsite.com/picture/image.png  .png
    myuniquefile.longextension              .longextension
    IAmAFileWithoutExtension                ""
    /path/to.my/file                        ""
    file.odd_one                            ""
} {
    set res ""
    assert {[file ext $file] eq $ext}
}</lang>
VBScript
<lang vb>
Function GetExtension(s)
	If InStr(s,"/") Then
		arr_s = Split(s,"/")
		fname = arr_s(UBound(arr_s))
	ElseIf InStr(s,"\") Then
		arr_s = Split(s,"\")
		fname = arr_s(UBound(arr_s))
	Else
		fname = s
	End If
	GetExtension = ""
	If InStr(fname,".") Then
		arr_x = Split(fname,".")
		If InStr(1,arr_x(UBound(arr_x)),"_") = 0 Then
			GetExtension = "." & arr_x(UBound(arr_x))
		End If
	End If
End Function

'Testing the function
arr_t = Array("picture.jpg","http://mywebsite.com/picture/image.png", _
		"myuniquefile.longextension","IAmAFileWithoutExtension", _
		"/path/to.my/file","file.odd_one")

For i = 0 To UBound(arr_t)
	WScript.StdOut.WriteLine arr_t(i) & " -> " & GetExtension(arr_t(i))
Next
</lang>
- Output:
picture.jpg -> .jpg
http://mywebsite.com/picture/image.png -> .png
myuniquefile.longextension -> .longextension
IAmAFileWithoutExtension -> ''
/path/to.my/file -> ''
file.odd_one -> ''
zkl
The File object has a method splitFileName that does just that, returning a list of the parts. The method knows about the OS it was compiled on (Unix, Windows).
<lang zkl>valid:=Walker.chain(".",["a".."z"],["A".."Z"],["0".."9"]).sink(String).walk();
foreach nm in (T("http://example.com/download.tar.gz","CharacterModel.3DS",
                 ".desktop","document",
                 "document.txt_backup","/etc/pam.d/login")){
   ext:=File.splitFileName(nm)[-1];
   if(ext-valid) ext="";
   println("%35s : %s".fmt(nm,ext));
}</lang>
- Output:
Note: on Unix, .desktop is a hidden file, not an extension.
 http://example.com/download.tar.gz : .gz
                 CharacterModel.3DS : .3DS
                           .desktop :
                           document :
                document.txt_backup :
                   /etc/pam.d/login :
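As the zkl note points out, whether a Unix dotfile such as .desktop has an extension is a matter of convention. The sketch below makes that choice explicit. It assumes POSIX path rules, and `dotfiles_have_ext` is a hypothetical flag. The default keeps the task's behaviour, which reports '.desktop' as an extension.

```python
import posixpath
import re

# Task rule: a dot plus one or more ASCII letters/digits at the very end.
EXT_RE = re.compile(r"\.[A-Za-z0-9]+\Z")

def extension(path, dotfiles_have_ext=True):
    """Return the task-style extension of path, or "".

    dotfiles_have_ext is a hypothetical switch: when False, a final path
    component like ".desktop" (leading dot, no other dot) is treated as a
    hidden file name with no extension, as on Unix.
    """
    if not dotfiles_have_ext:
        base = posixpath.basename(path)
        if base.startswith(".") and "." not in base[1:]:
            return ""
    m = EXT_RE.search(path)
    return m.group(0) if m else ""
```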