Extract file extension
Filename extensions are a rudimentary but commonly used way of identifying file types.
- Task
Write a function or program that
- takes one string argument representing the path/URL to a file
- returns the filename extension according to the below specification, or an empty string if the filename has no extension.
If your programming language (or standard library) has built-in functionality for extracting a filename extension,
show how it would be used and how exactly its behavior differs from this specification.
- Specification
For the purposes of this task, a filename extension:
- occurs at the very end of the filename
- consists of a period, followed solely by one or more ASCII letters or digits (A-Z, a-z, 0-9)
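As a point of reference, the two rules above can be sketched in Python with a single anchored regular expression. The name `extract_ext` is only illustrative and is not part of the task. `\Z` is used rather than `$` so that a trailing newline is not silently ignored.

```python
import re

# A period followed solely by ASCII letters/digits, anchored at the very end.
# The explicit character class is used instead of \w, which would also
# accept "_" and non-ASCII letters.
_EXT_RE = re.compile(r"\.[A-Za-z0-9]+\Z")


def extract_ext(path: str) -> str:
    """Return the extension (including the leading period), or "" if none."""
    match = _EXT_RE.search(path)
    return match.group(0) if match else ""


if __name__ == "__main__":
    for p in ["http://example.com/download.tar.gz", "document.txt_backup"]:
        print(repr(p), "->", repr(extract_ext(p)))
```

Because the pattern is anchored at the end of the string, a period that belongs to a parent directory can never match: the separator that follows it is not in the allowed character class.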
- Test cases
Input                                 Output        Comment
http://example.com/download.tar.gz    .gz
CharacterModel.3DS                    .3DS
.desktop                              .desktop
document                              empty string
document.txt_backup                   empty string  because _ is not a letter or number
/etc/pam.d/login                      empty string  as the period is in the parent directory name rather than the filename
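To illustrate the "built-in functionality" clause of the task, here is a hedged sketch that compares Python's standard `posixpath.splitext` with the expected results of the test cases above. `posixpath` is used rather than `os.path` so that the result does not depend on the host platform. The names `CASES`, `builtin_ext` and `differences` are illustrative only.

```python
import posixpath

# (input, extension expected by the task specification)
CASES = [
    ("http://example.com/download.tar.gz", ".gz"),
    ("CharacterModel.3DS", ".3DS"),
    (".desktop", ".desktop"),
    ("document", ""),
    ("document.txt_backup", ""),
    ("/etc/pam.d/login", ""),
]


def builtin_ext(path: str) -> str:
    """Extension as reported by the standard library, leading period included."""
    return posixpath.splitext(path)[1]


def differences():
    """Return (input, expected, built-in result) where the two disagree."""
    return [(p, want, builtin_ext(p)) for p, want in CASES if builtin_ext(p) != want]


if __name__ == "__main__":
    for path, want, got in differences():
        print(f"{path!r}: spec wants {want!r}, splitext gives {got!r}")
```

For these inputs the built-in differs in two of the six cases. It treats a leading period as part of the base name, so `.desktop` has no extension, and it accepts any characters after the last period, so `document.txt_backup` yields `.txt_backup`.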
11l
F extract_ext(path)
V m = re:‘\.[A-Za-z0-9]+$’.search(path)
R I m {m.group(0)} E ‘’
V paths = [‘http://example.com/download.tar.gz’,
‘CharacterModel.3DS’,
‘.desktop’,
‘document’,
‘document.txt_backup’,
‘/etc/pam.d/login’]
L(path) paths
print(path.rjust(max(paths.map(p -> p.len)))‘ -> ’extract_ext(path))
- Output:
http://example.com/download.tar.gz -> .gz
                CharacterModel.3DS -> .3DS
                          .desktop -> .desktop
                          document ->
               document.txt_backup ->
                  /etc/pam.d/login ->
Action!
INCLUDE "D2:CHARTEST.ACT" ;from the Action! Tool Kit
PROC FileExt(CHAR ARRAY path,ext)
BYTE pos,c
pos=path(0)
ext(0)=0
WHILE pos>0
DO
c=path(pos)
IF c='. THEN
EXIT
ELSEIF IsDigit(c)=0 AND IsAlpha(c)=0 THEN
RETURN
FI
pos==-1
OD
IF pos=0 THEN
RETURN
FI
SCopyS(ext,path,pos,path(0))
RETURN
PROC Test(CHAR ARRAY path)
CHAR ARRAY ext(10)
FileExt(path,ext)
PrintF("""%S"":%E""%S""%E%E",path,ext)
RETURN
PROC Main()
Put(125) PutE() ;clear the screen
Test("http://example.com/download.tar.gz")
Test("CharacterModel.3DS")
Test(".desktop")
Test("document")
Test("document.txt_backup")
Test("/etc/pam.d/login")
RETURN
- Output:
Screenshot from Atari 8-bit computer
"http://example.com/download.tar.gz":
".gz"

"CharacterModel.3DS":
".3DS"

".desktop":
".desktop"

"document":
""

"document.txt_backup":
""

"/etc/pam.d/login":
""
Ada
As originally specified
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;
use Ada.Strings;
with Ada.Characters.Handling; use Ada.Characters.Handling;
procedure Main is
function extension (S : in String) return String is
P_Index : Natural;
begin
P_Index :=
Index (Source => S, Pattern => ".", From => S'Last, Going => Backward);
if P_Index = 0 then
return "";
else
for C of S (P_Index + 1 .. S'Last) loop
if not Is_Alphanumeric (C) then
return "";
end if;
end loop;
return S (P_Index .. S'Last);
end if;
end extension;
F1 : String := "http://example.com/download.tar.gz";
F2 : String := "CharacterModel.3DS";
F3 : String := ".desktop";
F4 : String := "document";
F5 : String := "document.txt_backup";
F6 : String := "/etc/pam.d/login:";
begin
Put_Line (F1 & " -> " & extension (F1));
Put_Line (F2 & " -> " & extension (F2));
Put_Line (F3 & " -> " & extension (F3));
Put_Line (F4 & " -> " & extension (F4));
Put_Line (F5 & " -> " & extension (F5));
Put_Line (F6 & " -> " & extension (F6));
end Main;
- Output:
http://example.com/download.tar.gz -> .gz
CharacterModel.3DS -> .3DS
.desktop -> .desktop
document ->
document.txt_backup ->
/etc/pam.d/login: ->
In response to problem discussions
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;
use Ada.Strings;
with Ada.Characters.Handling; use Ada.Characters.Handling;
procedure Main is
function extension (S : in String) return String is
P_Index : Natural;
begin
P_Index :=
Index (Source => S, Pattern => ".", From => S'Last, Going => Backward);
if P_Index < 2 or else P_Index = S'Last then
return "";
else
for C of S (P_Index + 1 .. S'Last) loop
if not Is_Alphanumeric (C) then
return "";
end if;
end loop;
return S (P_Index .. S'Last);
end if;
end extension;
F1 : String := "http://example.com/download.tar.gz";
F2 : String := "CharacterModel.3DS";
F3 : String := ".desktop";
F4 : String := "document";
F5 : String := "document.txt_backup";
F6 : String := "/etc/pam.d/login:";
F7 : String := "filename.";
F8 : String := ".";
begin
Put_Line (F1 & " -> " & extension (F1));
Put_Line (F2 & " -> " & extension (F2));
Put_Line (F3 & " -> " & extension (F3));
Put_Line (F4 & " -> " & extension (F4));
Put_Line (F5 & " -> " & extension (F5));
Put_Line (F6 & " -> " & extension (F6));
Put_Line (F7 & " -> " & extension (F7));
Put_Line (F8 & " -> " & extension (F8));
end Main;
- Output:
http://example.com/download.tar.gz -> .gz
CharacterModel.3DS -> .3DS
.desktop ->
document ->
document.txt_backup ->
/etc/pam.d/login: ->
filename. ->
. ->
ALGOL 68
# extracts a file-extension from the end of a pathname. The file extension is #
# defined as a dot followed by one or more letters or digits #
OP EXTENSION = ( STRING pathname )STRING:
IF LWB pathname >= UPB pathname THEN
# the pathname has 0 or 1 characters and so has no extension #
""
ELIF NOT isalnum( pathname[ UPB pathname ] ) THEN
# the final character is not a letter or digit - no extension #
""
ELSE
# could have an extension #
INT pos := UPB pathname;
WHILE pos > LWB pathname AND isalnum( pathname[ pos ] ) DO
pos -:= 1
OD;
IF pathname[ pos ] = "." THEN
# the character before the letters and digits was a "." #
pathname[ pos : ]
ELSE
# no "." before the letters and digits - no extension #
""
FI
FI ; # EXTENSION #
# test the EXTENSION operator #
PROC test extension = ( STRING pathname, STRING expected extension )VOID:
BEGIN
STRING extension = EXTENSION pathname;
write( ( ( pathname
+ " got extension: ("
+ extension
+ ") "
+ IF extension = expected extension THEN "" ELSE "NOT" FI
+ " as expected"
)
, newline
)
)
END ; # test extension #
main:
( test extension( "http://example.com/download.tar.gz", ".gz" )
; test extension( "CharacterModel.3DS", ".3DS" )
; test extension( ".desktop", ".desktop" )
; test extension( "document", "" )
; test extension( "document.txt_backup", "" )
; test extension( "/etc/pam.d/login", "" )
)
- Output:
http://example.com/download.tar.gz got extension: (.gz) as expected
CharacterModel.3DS got extension: (.3DS) as expected
.desktop got extension: (.desktop) as expected
document got extension: () as expected
document.txt_backup got extension: () as expected
/etc/pam.d/login got extension: () as expected
ALGOL W
begin
% extracts a file-extension from the end of a pathname. %
% The file extension is defined as a dot followed by one or more letters %
% or digits. As Algol W only has fixed length strings we limit the %
% extension to 32 characters and the pathname to 256 (the longest string %
% allowed by Algol W) %
string(32) procedure extension( string(256) value pathname ) ;
begin
integer pathPos;
% position to the previous character in the pathname %
procedure prev ; pathPos := pathPos - 1;
% get the character at pathPos from pathname %
string(1) procedure ch ; pathname( pathPos // 1 );
% checks for a letter or digit - assumes the letters are contiguous %
% in the character set - not true for EBCDIC %
logical procedure isLetterOrDigit( string(1) value c ) ;
( c <= "z" and c >= "a" ) or ( c <= "Z" and c >= "A" )
or ( c <= "9" and c >= "0" ) ;
% find the length of the pathname with trailing blanks removed %
pathPos := 255;
while pathPos >= 0 and ch = " " do prev;
% extract the extension if possible %
if pathPos <= 0
then "" % no extension: 0 or 1 character pathname %
else if not isLetterOrDigit( ch )
then "" % no extension: last character not a letter/digit %
else begin
while pathPos > 0 and isLetterOrDigit( ch ) do prev;
if ch not = "."
then "" % no extension: letters/digits not preceded by "." %
else begin
% have an extension %
string(32) ext;
ext := " ";
% algol W substring lengths must be compile-time constants %
% hence the loop to copy the extension characters %
for charPos := 0 until 31 do begin
if pathPos <= 255 then begin
ext( charPos // 1 ) := pathname( pathPos // 1 );
pathPos := pathPos + 1
end
end for_charPos ;
ext
end
end
end extension ;
% test the extension procedure %
procedure testExtension( string(256) value pathname
; string(32) value expectedExtension
) ;
begin
string(32) ext;
ext := extension( pathname );
write( pathname( 0 // 40 )
, " -> ("
, ext( 0 // 16 )
, ") "
, if ext = expectedExtension then "" else "NOT"
, " as expected"
)
end ; % test extension %
testExtension( "http://example.com/download.tar.gz", ".gz" );
testExtension( "CharacterModel.3DS", ".3DS" );
testExtension( ".desktop", ".desktop" );
testExtension( "document", "" );
testExtension( "document.txt_backup", "" );
testExtension( "/etc/pam.d/login", "" );
end.
- Output:
http://example.com/download.tar.gz -> (.gz ) as expected
CharacterModel.3DS -> (.3DS ) as expected
.desktop -> (.desktop ) as expected
document -> ( ) as expected
document.txt_backup -> ( ) as expected
/etc/pam.d/login -> ( ) as expected
AppleScript
AppleScript paths can have either of two formats, depending on the system used to access the items on a disk or network. The current task specification implies that the slash-separated "POSIX" format is intended in all cases. Some macOS "files" are actually directories called "bundles" or "packages". Their paths may or may not end with separators. Underscores are valid extension characters in macOS and extensions are returned without leading dots. When extracting extensions in AppleScript, one would normally follow the rules for macOS, but variations are possible. The task specification is taken at its word that the input strings do represent paths to files.
Vanilla
on getFileNameExtension from txt given underscores:keepingUnderscores : true, dot:includingDot : false
set astid to AppleScript's text item delimiters
-- Extract the file or bundle name from the path.
set AppleScript's text item delimiters to "/"
if (txt ends with "/") then
set itemName to text item -2 of txt
else
set itemName to text item -1 of txt
end if
-- Extract the extension.
if (itemName contains ".") then
set AppleScript's text item delimiters to "."
set extn to text item -1 of itemName
if ((not keepingUnderscores) and (extn contains "_")) then set extn to ""
if ((includingDot) and (extn > "")) then set extn to "." & extn
else
set extn to ""
end if
set AppleScript's text item delimiters to astid
return extn
end getFileNameExtension
set output to {}
repeat with thisString in {"http://example.com/download.tar.gz", "CharacterModel.3DS", ".desktop", "document", "document.txt_backup", "/etc/pam.d/login"}
set end of output to {thisString's contents, getFileNameExtension from thisString with dot without underscores}
end repeat
return output
- Output:
{{"http://example.com/download.tar.gz", ".gz"}, {"CharacterModel.3DS", ".3DS"}, {".desktop", ".desktop"}, {"document", ""}, {"document.txt_backup", ""}, {"/etc/pam.d/login", ""}}
ASObjC
AppleScriptObjectiveC makes the task a little easier, but not necessarily more efficient.
use AppleScript version "2.4" -- Mac OS X 10.10 (Yosemite) or later.
use framework "Foundation"
on getFileNameExtension from txt given underscores:keepingUnderscores : true, dot:includingDot : false
-- Get an NSString version of the text and extract the 'pathExtension' from that as AppleScript text.
set txt to current application's class "NSString"'s stringWithString:(txt)
set extn to txt's pathExtension() as text
if ((not keepingUnderscores) and (extn contains "_")) then set extn to ""
if ((includingDot) and (extn > "")) then set extn to "." & extn
return extn
end getFileNameExtension
set output to {}
repeat with thisString in {"http://example.com/download.tar.gz", "CharacterModel.3DS", ".desktop", "document", "document.txt_backup", "/etc/pam.d/login"}
set end of output to {thisString's contents, getFileNameExtension from thisString with dot without underscores}
end repeat
return output
- Output:
{{"http://example.com/download.tar.gz", ".gz"}, {"CharacterModel.3DS", ".3DS"}, {".desktop", ".desktop"}, {"document", ""}, {"document.txt_backup", ""}, {"/etc/pam.d/login", ""}}
AutoHotkey
data := ["http://example.com/download.tar.gz"
,"CharacterModel.3DS"
,".desktop"
,"document"
,"document.txt_backup"
,"/etc/pam.d/login"]
for i, file in data{
RegExMatch(file, "`am)\.\K[a-zA-Z0-9]+$", ext)
result .= file " --> " ext "`n"
}
MsgBox % result
- Output:
http://example.com/download.tar.gz --> gz
CharacterModel.3DS --> 3DS
.desktop --> desktop
document -->
document.txt_backup -->
/etc/pam.d/login -->
AWK
The following code shows two methods.
The first one was provided by an earlier contributor and shows a little more awk syntax and builtins (albeit with a bug fixed: it was testing for underscores in the extension but not other characters such as hyphens). It can be adjusted to allow any character in the extension other than /, \, : or . by replacing [^a-zA-Z0-9] with [\\/\\\\:\\.].
BEGIN {
arr[++i] = "picture.jpg"
arr[++i] = "http://mywebsite.com/picture/image.png"
arr[++i] = "myuniquefile.longextension"
arr[++i] = "IAmAFileWithoutExtension"
arr[++i] = "/path/to.my/file"
arr[++i] = "file.odd_one"
for (j=1; j<=i; j++) {
printf("%-40s '%s'\n",arr[j],extract_ext(arr[j]))
}
exit(0)
}
function extract_ext(fn, sep1,sep2,tmp) {
while (fn ~ (sep1 = ":|\\\\|\\/")) { # ":" or "\" or "/"
fn = substr(fn,match(fn,sep1)+1)
}
while (fn ~ (sep2 = "\\.")) { # "."
fn = substr(fn,match(fn,sep2)+1)
tmp = 1
}
if (fn ~ /[^a-zA-Z0-9]/ || tmp == 0) {
return("")
}
return(fn)
}
The second method is shorter and dispenses with the need to search for and remove the path components first. It too can be modified to allow all valid extensions (not just those described in the specification), by replacing \\.[A-Za-z0-9]+$ with \\.[^\\/\\\\:\\.]+$.
BEGIN {
arr[++i] = "picture.jpg"
arr[++i] = "http://mywebsite.com/picture/image.png"
arr[++i] = "myuniquefile.longextension"
arr[++i] = "IAmAFileWithoutExtension"
arr[++i] = "/path/to.my/file"
arr[++i] = "file.odd_one"
for (j=1; j<=i; j++) {
printf("%-40s '%s'\n",arr[j],extract_ext(arr[j]))
}
exit(0)
}
function extract_ext(fn, pos) {
pos = match(fn, "\\.[A-Za-z0-9]+$")
if (pos == 0) {
return ("")
} else {
return (substr(fn,pos+1))
}
}
Both examples give the output:
picture.jpg                              'jpg'
http://mywebsite.com/picture/image.png   'png'
myuniquefile.longextension               'longextension'
IAmAFileWithoutExtension                 ''
/path/to.my/file                         ''
file.odd_one                             ''
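The relaxed variation mentioned above (any character other than /, \, : or . after the period) can be set beside the strict pattern in a short sketch. This is a Python illustration rather than AWK, the names `STRICT`, `RELAXED` and `ext` are hypothetical, and unlike the AWK functions it keeps the leading period in the result.

```python
import re

STRICT = re.compile(r"\.[A-Za-z0-9]+\Z")  # the task specification
RELAXED = re.compile(r"\.[^/\\:.]+\Z")    # anything but / \ : . after the period


def ext(path, pattern=STRICT):
    """Return the trailing extension matched by `pattern`, or "" if none."""
    m = pattern.search(path)
    return m.group(0) if m else ""


if __name__ == "__main__":
    for name in ["picture.jpg", "file.odd_one", "/path/to.my/file"]:
        print(name, repr(ext(name)), repr(ext(name, RELAXED)))
```

Only inputs such as `file.odd_one` separate the two patterns. A period in a parent directory is rejected by both, because the path separator is excluded from either character class.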
Batch File
@echo off
:loop
if "%~1"=="" exit /b
echo File Path: "%~1" ^| File Extension "%~x1"
shift
goto loop
- Output:
File Path: "http://example.com/download.tar.gz" | File Extension ".gz"
File Path: "CharacterModel.3DS" | File Extension ".3DS"
File Path: ".desktop" | File Extension ".desktop"
File Path: "document" | File Extension ""
File Path: "document.txt_backup" | File Extension ".txt_backup"
File Path: "/etc/pam.d/login" | File Extension ""
BCPL
get "libhdr"
// Find filename extension, store at `v'
let extension(s, v) = valof
$( let loc = valof
$( for i=s%0 to 1 by -1
$( let c = s%i
if c = '.'
resultis i
unless 'A'<=c<='Z' | 'a'<=c<='z' | '0'<=c<='9'
resultis 0
$)
resultis 0
$)
test loc=0 do
v%0 := 0
or
$( v%0 := s%0-loc+1
for i=1 to v%0 do
v%i := s%(i+loc-1)
$)
resultis v
$)
let show(s) be
$( let v = vec 32
writef("*"%S*": *"%S*"*N", s, extension(s, v))
$)
let start() be
$( show("http://example.com/download.tar.gz")
show("CharacterModel.3DS")
show(".desktop")
show("document")
show("document.txt_backup")
show("/etc/pam.d/login")
$)
- Output:
"http://example.com/download.tar.gz": ".gz"
"CharacterModel.3DS": ".3DS"
".desktop": ".desktop"
"document": ""
"document.txt_backup": ""
"/etc/pam.d/login": ""
C
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <stdio.h>
/* Returns a pointer to the extension of 'string'.
* If no extension is found, returns a pointer to the end of 'string'. */
char* file_ext(const char *string)
{
assert(string != NULL);
char *ext = strrchr(string, '.');
if (ext == NULL)
return (char*) string + strlen(string);
for (char *iter = ext + 1; *iter != '\0'; iter++) {
if (!isalnum((unsigned char)*iter))
return (char*) string + strlen(string);
}
return ext;
}
int main(void)
{
const char *testcases[][2] = {
{"http://example.com/download.tar.gz", ".gz"},
{"CharacterModel.3DS", ".3DS"},
{".desktop", ".desktop"},
{"document", ""},
{"document.txt_backup", ""},
{"/etc/pam.d/login", ""}
};
int exitcode = 0;
for (size_t i = 0; i < sizeof(testcases) / sizeof(testcases[0]); i++) {
const char *ext = file_ext(testcases[i][0]);
if (strcmp(ext, testcases[i][1]) != 0) {
fprintf(stderr, "expected '%s' for '%s', got '%s'\n",
testcases[i][1], testcases[i][0], ext);
exitcode = 1;
}
}
return exitcode;
}
C#
public static string FindExtension(string filename) {
int indexOfDot = filename.Length;
for (int i = filename.Length - 1; i >= 0; i--) {
char c = filename[i];
if (c == '.') {
indexOfDot = i;
break;
}
if (c >= '0' && c <= '9') continue;
if (c >= 'A' && c <= 'Z') continue;
if (c >= 'a' && c <= 'z') continue;
break;
}
//The dot must be followed by at least one other character,
//so if the last character is a dot, return the empty string
return indexOfDot + 1 == filename.Length ? "" : filename.Substring(indexOfDot);
}
Using regular expressions (C# 6)
// Requires: using System.Text.RegularExpressions;
public static string FindExtension(string filename) => Regex.Match(filename, @"\.[A-Za-z0-9]+$").Value;
C++
#include <iostream>
#include <filesystem>
int main() {
for (std::filesystem::path file : { "picture.jpg",
"http://mywebsite.com/picture/image.png",
"myuniquefile.longextension",
"IAmAFileWithoutExtension",
"/path/to.my/file",
"file.odd_one",
"thisismine." }) {
std::cout << file << " has extension : " << file.extension() << '\n' ;
}
}
- Output:
"picture.jpg" has extension : ".jpg"
"http://mywebsite.com/picture/image.png" has extension : ".png"
"myuniquefile.longextension" has extension : ".longextension"
"IAmAFileWithoutExtension" has extension : ""
"/path/to.my/file" has extension : ""
"file.odd_one" has extension : ".odd_one"
"thisismine." has extension : "."
Clojure
(defn file-extension [s]
(second (re-find #"(\.[a-zA-Z0-9]+)$" s)))
- Output:
(map file-extension ["http://example.com/download.tar.gz" "CharacterModel.3DS" ".desktop" "document" "document.txt_backup" "/etc/pam.d/login"])
(".gz" ".3DS" ".desktop" nil nil nil)
CLU
CLU contains a built-in filename parser, which behaves slightly differently than the task specification. It returns the first, rather than last dotted part, and also accepts non-alphanumeric characters in the extension. Furthermore, it does not include the dot itself in its output.
% Find the extension of a filename, according to the task specification
extension = proc (s: string) returns (string)
for i: int in int$from_to_by(string$size(s), 1, -1) do
c: char := s[i]
if c>='A' & c<='Z'
| c>='a' & c<='z'
| c>='0' & c<='9' then continue end
if c='.' then return(string$rest(s,i)) end
break
end
return("")
end extension
% For each test case, show both the extension according to the task,
% and the extension that the built-in function returns.
start_up = proc ()
po: stream := stream$primary_output()
tests: sequence[string] := sequence[string]$[
"http://example.com/download.tar.gz",
"CharacterModel.3DS",
".desktop",
"document",
"document.txt_backup",
"/etc/pam.d/login"
]
stream$putleft(po, "Input", 36)
stream$putleft(po, "Output", 10)
stream$putl(po, "Built-in")
stream$putl(po, "---------------------------------------------------------")
for test: string in sequence[string]$elements(tests) do
stream$putleft(po, test, 36)
stream$putleft(po, extension(test), 10)
% Using the built-in filename parser
stream$putl(po, file_name$parse(test).suffix)
except when bad_format:
stream$putl(po, "[bad_format signaled]")
end
end
end start_up
- Output:
Input                               Output    Built-in
---------------------------------------------------------
http://example.com/download.tar.gz  .gz       tar
CharacterModel.3DS                  .3DS      3DS
.desktop                            .desktop  desktop
document
document.txt_backup                           txt_backup
/etc/pam.d/login
Common Lisp
(pathname-type "foo.txt") => "txt"
D
Variant 1
import std.stdio;
import std.path;
void main()
{
auto filenames = ["http://example.com/download.tar.gz",
"CharacterModel.3DS",
".desktop",
"document",
"document.txt_backup",
"/etc/pam.d/login"];
foreach(filename; filenames)
writeln(filename, " -> ", filename.extension);
}
- Output:
http://example.com/download.tar.gz -> .gz
CharacterModel.3DS -> .3DS
.desktop ->
document ->
document.txt_backup -> .txt_backup
/etc/pam.d/login ->
Variant 2
import std.stdio;
import std.string;
import std.range;
import std.algorithm;
void main()
{
auto filenames = ["http://example.com/download.tar.gz",
"CharacterModel.3DS",
".desktop",
"document",
"document.txt_backup",
"/etc/pam.d/login"];
foreach(filename; filenames)
{
string ext;
auto idx = filename.lastIndexOf(".");
if(idx >= 0)
{
auto tmp = filename.drop(idx);
if(!tmp.canFind("/", "\\", "_", "*"))
ext = tmp;
}
writeln(filename, " -> ", ext);
}
}
- Output:
http://example.com/download.tar.gz -> .gz
CharacterModel.3DS -> .3DS
.desktop -> .desktop
document ->
document.txt_backup ->
/etc/pam.d/login ->
Delphi
program Extract_file_extension;
{$APPTYPE CONSOLE}
uses
System.SysUtils,
System.Character;
const
TEST_CASES: array[0..5] of string = ('http://example.com/download.tar.gz',
'CharacterModel.3DS', '.desktop', 'document', 'document.txt_backup',
'/etc/pam.d/login');
function GetExt(path: string): string;
var
c: char;
begin
// Built-in functionality, just extract substring after dot char
Result := ExtractFileExt(path);
// Fix ext for dot in subdir
while (Result.IndexOf('/') > -1) do
begin
Result := Result.Substring(Result.IndexOf('/'), MaxInt);
Result := ExtractFileExt(Result);
end;
// Ignore empty or "." ext
if length(result) < 2 then
exit('');
// Ignore ext with not alphanumeric char (except the first dot)
for var i := 2 to length(result) do
begin
c := result[i];
if not c.IsLetterOrDigit then
exit('');
end;
end;
begin
for var path in TEST_CASES do
Writeln(path.PadRight(40), GetExt(path));
{$IFNDEF UNIX} readln; {$ENDIF}
end.
- Output:
http://example.com/download.tar.gz      .gz
CharacterModel.3DS                      .3DS
.desktop                                .desktop
document
document.txt_backup
/etc/pam.d/login
EasyLang
func isalphanum c$ .
c = strcode c$
return if c >= 65 and c <= 90 or c >= 97 and c <= 122 or c >= 48 and c <= 57
.
func$ exext path$ .
for i = len path$ downto 1
c$ = substr path$ i 1
if isalphanum c$ = 1
ex$ = c$ & ex$
elif c$ = "."
return ex$
else
break 1
.
.
.
for s$ in [ "http://example.com/download.tar.gz" "CharacterModel.3DS" ".desktop" "document" "document.txt_backup" "/etc/pam.d/login" ]
print s$ & " -> " & exext s$
.
Ed
Prints an empty line for names that have no extension or an invalid one.
H
,p
g/.*/s/^.*(\.[[:alnum:]]+)$/\1/g
v/^\.[[:alnum:]]+$/s/.*//
,p
Q
- Output:
$ cat extension.ed | ed -E extension.input
Newline appended
109
http://example.com/download.tar.gz
CharacterModel.3DS
.desktop
document
document.txt_backup
/etc/pam.d/login
.gz
.3DS
.desktop



?
Warning: buffer modified
Emacs Lisp
(file-name-extension "foo.txt") => "txt"
No extension is distinguished from an empty extension, but an (or ... "") can give "" for both if desired.
(file-name-extension "foo.") => ""
(file-name-extension "foo") => nil
An Emacs backup ~ or .~NUM~ is not part of the extension, but otherwise any characters are allowed.
(file-name-extension "foo.txt~") => "txt"
(file-name-extension "foo.txt.~1.234~") => "txt"
Factor
Factor's file-extension word allows symbols to be in the extension and omits the dot from its output.
USING: assocs formatting kernel io io.pathnames math qw
sequences ;
IN: rosetta-code.file-extension
qw{
http://example.com/download.tar.gz
CharacterModel.3DS
.desktop
document
document.txt_backup
/etc/pam.d/login
}
dup [ file-extension ] map zip
"Path" "| Extension" "%-35s%s\n" printf
47 [ "-" write ] times nl
[ "%-35s| %s\n" vprintf ] each
- Output:
Path                               | Extension
-----------------------------------------------
http://example.com/download.tar.gz | gz
CharacterModel.3DS                 | 3DS
.desktop                           | desktop
document                           |
document.txt_backup                | txt_backup
/etc/pam.d/login                   |
Forth
: invalid? ( c -- f )
toupper dup [char] A [char] Z 1+ within
swap [char] 0 [char] 9 1+ within or 0= ;
: extension ( addr1 u1 -- addr2 u2 )
dup 0= if exit then
2dup over +
begin 1- 2dup <= while dup c@ invalid? until then
\ no '.' found
2dup - 0> if 2drop dup /string exit then
\ invalid char
dup c@ [char] . <> if 2drop dup /string exit then
swap -
\ '.' is last char
2dup 1+ = if drop dup then
/string ;
: type.quoted ( addr u -- )
[char] ' emit type [char] ' emit ;
: test ( addr u -- )
2dup type.quoted ." => " extension type.quoted cr ;
: tests
s" http://example.com/download.tar.gz" test
s" CharacterModel.3DS" test
s" .desktop" test
s" document" test
s" document.txt_backup" test
s" /etc/pam.d/login" test ;
- Output:
cr tests
'http://example.com/download.tar.gz' => '.gz'
'CharacterModel.3DS' => '.3DS'
'.desktop' => '.desktop'
'document' => ''
'document.txt_backup' => ''
'/etc/pam.d/login' => ''
ok
Fortran
The plan is to scan backwards from the end of the text until a non-extensionish character is encountered. If it is a period, then a valid file extension has been spanned; otherwise, there is no extension. Fortran does not specify whether compound logical expressions are evaluated with shortcuts, which prevents the structured use of a DO WHILE(L1 > 0 & TEXT(L1:L1)etc) loop: both parts of the expression may be evaluated, so the second part may attempt to access character zero of the text. The compound expression therefore has to be broken into two separate parts.
The source incorporates a collection of character characterisations via suitable spans of a single sequence of characters. Unfortunately, the PARAMETER statement does not allow its constants to appear in EQUIVALENCE statements, so the text is initialised by DATA statements, and thus loses the protection of read-only given to constants defined via PARAMETER statements. The statements are from a rather more complex text scanning scheme, as all that are needed here are the symbols of GOODEXT.
The text scan could instead check for a valid character via something like ("a" <= C & C <= "z") | ("A" <= C & C <= "Z") | ("0" <= C & C <= "9") but this is not just messy but unreliable - in EBCDIC for example there are gaps in the sequence of letters that are occupied by other symbols. So instead, the test is made via INDEX into a sequence of all the valid symbols. If one were in a hurry, for eight-bit character codes, an array GOODEXT of 256 logical values could be indexed by the numerical value of the character.
MODULE TEXTGNASH !Some text inspection.
CHARACTER*10 DIGITS !Integer only.
CHARACTER*11 DDIGITS !With a full stop masquerading as a decimal point.
CHARACTER*13 SDDIGITS !Signed decimal digits.
CHARACTER*4 EXPONENTISH !With exponent parts.
CHARACTER*17 NUMBERISH !The complete mix.
CHARACTER*16 HEXLETTERS !Extended for base sixteen.
CHARACTER*62 DIGILETTERS !File nameish but no .
CHARACTER*26 LITTLELETTERS,BIGLETTERS !These are well-known.
CHARACTER*52 LETTERS !The union thereof.
CHARACTER*66 NAMEISH !Allowing digits and . and _ as well.
CHARACTER*3 ODDITIES !And allow these in names also.
CHARACTER*1 CHARACTER(72) !Prepare a work area.
EQUIVALENCE !Whose components can be fingered.
1 (CHARACTER( 1),EXPONENTISH,NUMBERISH), !Start with numberish symbols that are not nameish.
2 (CHARACTER( 5),SDDIGITS), !Since the sign symbols are not nameish.
3 (CHARACTER( 7),DDIGITS,NAMEISH), !Computerish names might incorporate digits and a .
4 (CHARACTER( 8),DIGITS,HEXLETTERS,DIGILETTERS), !A proper name doesn't start with a digit.
5 (CHARACTER(18),BIGLETTERS,LETTERS), !Just with a letter.
6 (CHARACTER(44),LITTLELETTERS), !The second set.
7 (CHARACTER(70),ODDITIES) !Tack this on the end.
DATA EXPONENTISH /"eEdD"/ !These on the front.
DATA SDDIGITS /"+-.0123456789"/ !Any of these can appear in a floating point number.
DATA BIGLETTERS /"ABCDEFGHIJKLMNOPQRSTUVWXYZ"/ !Simple.
DATA LITTLELETTERS /"abcdefghijklmnopqrstuvwxyz"/ !Subtly different.
DATA ODDITIES /"_:#"/ !Allow these in names also. This strains := usage!
CHARACTER*62 GOODEXT !These are all the characters allowed
EQUIVALENCE (CHARACTER(8),GOODEXT)
c PARAMETER (GOODEXT = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" !for an approved
c 1 //"abcdefghijklmnopqrstuvwxyz" !file "extension" part
c 2 //"0123456789") !Of a file name.
INTEGER MEXT !A fixed bound.
PARAMETER (MEXT = 28) !This should do.
CONTAINS
CHARACTER*(MEXT) FUNCTION FEXT(FNAME) !Return the file extension part.
CHARACTER*(*) FNAME !May start with the file's path name blather.
INTEGER L1,L2 !Fingers to the text.
L2 = LEN(FNAME) !The last character of the file name.
L1 = L2 !Starting at the end...
10 IF (L1.GT.0) THEN !Damnit, can't rely on DO WHILE(safe & test)
IF (INDEX(GOODEXT,FNAME(L1:L1)).GT.0) THEN !So do the two parts explicitly.
L1 = L1 - 1 !Well, that was a valid character for an extension.
GO TO 10 !So, move back one and try again.
END IF !Until the end of valid stuff.
IF (FNAME(L1:L1).EQ.".") THEN !Stopped here. A proper introduction?
L1 = L1 - 1 !Yes. Include the period.
GO TO 20 !And escape.
END IF !Otherwise, not valid stuff.
END IF !Keep on moving back.
L1 = L2 !If we're here, no period was found.
20 FEXT = FNAME(L1 + 1:L2) !The text of the extension.
END FUNCTION FEXT !Possibly, blank.
END MODULE TEXTGNASH !Enough for this.
PROGRAM POKE
USE TEXTGNASH
WRITE (6,*) FEXT("Picture.jpg")
WRITE (6,*) FEXT("http://mywebsite.com/picture/image.png")
WRITE (6,*) FEXT("myuniquefile.longextension")
WRITE (6,*) FEXT("IAmAFileWithoutExtension")
WRITE (6,*) FEXT("/path/to.my/file")
WRITE (6,*) FEXT("file.odd_one")
WRITE (6,*)
WRITE (6,*) "Now for the new test collection..."
WRITE (6,*) FEXT("http://example.com/download.tar.gz")
WRITE (6,*) FEXT("CharacterModel.3DS")
WRITE (6,*) FEXT(".desktop")
WRITE (6,*) FEXT("document")
WRITE (6,*) FEXT("document.txt_backup")
WRITE (6,*) FEXT("/etc/pam.d/login")
WRITE (6,*) "Approved characters: ",GOODEXT
END
The output cheats a little, in that trailing spaces appear just as blankly as no spaces. The result of FEXT could be presented to TRIM (if that function is available), or the last non-blank could be found. F2003 offers character variables whose length can be redefined on assignment, so trailing spaces need no longer appear. This facility would also solve the endlessly annoying question of "how long is long enough", manifested here in parameter MEXT, whose fixed value may or may not prove adequate. Once, three was the maximum extension length (not counting the period), then perhaps six, but now, what?
.jpg .png .longextension Now for the new test collection... .gz .3DS .desktop Approved characters: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
Note that if FEXT were presented with a file name containing trailing spaces, it would declare no extension to be present.
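The backward scan and the trailing-space caveat can be illustrated with a minimal Python sketch. The names GOOD_EXT and fext are chosen here only to echo the Fortran; this is an illustration of the approach, not a transcription of the module above.

```python
import string

# Characters the task allows in an extension (plays the role of GOODEXT).
GOOD_EXT = set(string.ascii_letters + string.digits)

def fext(fname: str) -> str:
    """Scan backwards over valid characters; succeed only if a period stops the scan."""
    i = len(fname) - 1
    while i >= 0 and fname[i] in GOOD_EXT:
        i -= 1
    # The i < len(fname) - 1 check enforces the task's "one or more characters" rule.
    if i >= 0 and fname[i] == "." and i < len(fname) - 1:
        return fname[i:]
    return ""

if __name__ == "__main__":
    for name in ["Picture.jpg", "Picture.jpg   ", "/etc/pam.d/login"]:
        print(repr(name), "->", repr(fext(name)))
```

A trailing space is not an approved character, so the scan stops at once and no extension is reported; stripping the name first (for example with str.rstrip) avoids this.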
FreeBASIC
' FB 1.05.0 Win64
Function isAlphaNum(s As String) As Boolean
Return ("a" <= s AndAlso s <= "z") OrElse ("A" <= s AndAlso s <= "Z") OrElse("0" <= s AndAlso s <= "9")
End Function
Function extractFileExt(filePath As String) As String
If filePath = "" Then Return ""
Dim index As Integer = InstrRev(filePath, ".")
If index = 0 Then Return ""
Dim ext As String = Mid(filePath, index + 1)
If ext = "" Then Return ""
For i As Integer = 1 To Len(ext)
If Not isAlphaNum(Mid(ext, i, 1)) Then Return ""
Next
Return ext
End Function
Dim filePaths(1 To 6) As String = _
{ _
"http://example.com/download.tar.gz", _
"CharacterModel.3DS", _
".desktop", _
"document", _
"document.txt_backup", _
"/etc/pam.d/login" _
}
Print "File path"; Tab(40); "Extension"
Print "========="; Tab(40); "========="
Print
For i As Integer = 1 To 6
Print filePaths(i); Tab(40);
Dim ext As String = extractFileExt(filePaths(i))
If ext = "" Then
Print "(empty string)"
Else
Print ext
End If
Next
Print
Print "Press any key to quit"
Sleep
- Output:
File path                              Extension
=========                              =========

http://example.com/download.tar.gz     gz
CharacterModel.3DS                     3DS
.desktop                               desktop
document                               (empty string)
document.txt_backup                    (empty string)
/etc/pam.d/login                       (empty string)
Frink
fileExtension[str] :=
{
if [ext] = str =~ %r/(\.[A-Za-z0-9]+)$/
return ext
else
return ""
}
files = ["http://example.com/download.tar.gz",
"CharacterModel.3DS",
".desktop",
"document",
"document.txt_backup",
"/etc/pam.d/login"]
r = new array
for f = files
r.push[[f, "->", fileExtension[f]]]
println[formatTable[r, "right"]]
- Output:
http://example.com/download.tar.gz ->     .gz
                CharacterModel.3DS ->    .3DS
                          .desktop -> .desktop
                          document ->
               document.txt_backup ->
                  /etc/pam.d/login ->
FutureBasic
The underscore is a valid extension character in macOS, and extensions are returned without leading dots.
include "NSLog.incl"
void local fn DoIt
CFArrayRef paths = @[@"http://example.com/download.tar.gz",@"CharacterModel.3DS",@".desktop",@"document",@"document.txt_backup",@"/etc/pam.d/login"]
CFStringRef path
for path in paths
NSLog(@"%@",fn StringPathExtension( path ))
next
end fn
fn DoIt
HandleEvents
- Output:
gz
3DS
desktop
null
txt_backup
null
Gambas
As Gambas has its own tools for file extension extraction I have used those rather than complicate the code to match the requested criteria.
Public Sub Main()
Dim sDir As String = "/sbin"
Dim sFileList As String[] = Dir(sDir)
Dim sTemp As String
Dim sFile As String
For Each sTemp In sFileList
sFile = sDir &/ sTemp
Print File.Name(sFile) & Space(25 - Len(File.Name(sFile)));
Print File.Ext(sFile)
Next
End
Output:
.... mount.ntfs ntfs iptables-save mkfs.minix minix exfatlabel modprobe vgrename mkfs.ext2 ext2 lsmod umount.ecryptfs_private ecryptfs_private fstab-decode mount.ecryptfs ecryptfs ....
Go
package main
import "fmt"
func Ext(path string) string {
for i := len(path) - 1; i >= 0; i-- {
c := path[i]
switch {
case c == '.':
return path[i:]
case '0' <= c && c <= '9':
case 'A' <= c && c <= 'Z':
case 'a' <= c && c <= 'z':
default:
return ""
}
}
return ""
}
func main() {
type testcase struct {
input string
output string
}
tests := []testcase{
{"http://example.com/download.tar.gz", ".gz"},
{"CharacterModel.3DS", ".3DS"},
{".desktop", ".desktop"},
{"document", ""},
{"document.txt_backup", ""},
{"/etc/pam.d/login", ""},
}
for _, testcase := range tests {
ext := Ext(testcase.input)
if ext != testcase.output {
panic(fmt.Sprintf("expected %q for %q, got %q",
testcase.output, testcase.input, ext))
}
}
}
Haskell
module FileExtension
where
myextension :: String -> String
myextension s
   |not $ elem '.' s = ""
   |elem '/' extension || elem '_' extension = ""
   |otherwise = '.' : extension
      where
         extension = reverse ( takeWhile ( /= '.' ) $ reverse s )
- Output:
map myextension ["http://example.com/download.tar.gz", "CharacterModel.3DS", ".desktop", "document", "document.txt_backup", "/etc/pam.d/login"]
[".gz",".3DS",".desktop","","",""]
Posix compliant
On Unix systems, the extension in the penultimate test case (document.txt_backup) would be recognised, as the Haskell library function takeExtension shows:
import System.FilePath.Posix (FilePath, takeExtension)
fps :: [FilePath]
fps =
[ "http://example.com/download.tar.gz",
"CharacterModel.3DS",
".desktop",
"document",
"document.txt_backup",
"/etc/pam.d/login"
]
main :: IO ()
main = mapM_ (print . takeExtension) fps
- Output:
".gz"
".3DS"
".desktop"
""
".txt_backup"
""
J
Implementation:
require'regex'
ext=: '[.][a-zA-Z0-9]+$'&rxmatch ;@rxfrom ]
Obviously most of the work here is done by the regex implementation (pcre, if that matters - and this particular kind of expression tends to be a bit more concise expressed in perl than in J...).
Perhaps of interest is that this is an example of a J fork - here we have three verbs separated by spaces. Unlike a unix system fork (which spins up a child process that is an almost exact clone of the currently running process), a J fork is three independently defined verbs. The two verbs on the edge get the fork's argument and the verb in the middle combines those two results.
The left verb uses rxmatch to find the beginning position of the match and its length. The right verb is the identity function. The middle verb extracts the desired characters from the original argument. (For a non-match, the length of the "match" is zero so the empty string is extracted.)
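The fork described above can be sketched in Python for readers unfamiliar with J. The helper names fork, rx_match and rx_from are hypothetical stand-ins invented for this illustration; in particular rx_match reports a zero-length match on failure, as the text describes, and is not J's actual rxmatch.

```python
import re

def fork(f, g, h):
    """Return y -> g(f(y), h(y)): the shape of a monadic J fork (f g h)."""
    return lambda y: g(f(y), h(y))

def rx_match(y):
    """Stand-in for the left verb: (start, length) of the match, zero length if none."""
    m = re.search(r"[.][a-zA-Z0-9]+$", y)
    return (m.start(), m.end() - m.start()) if m else (0, 0)

def rx_from(span, y):
    """Stand-in for the middle verb: cut the matched span out of the original text."""
    start, length = span
    return y[start:start + length]

def identity(y):
    """Stand-in for the right verb ], which returns its argument unchanged."""
    return y

ext = fork(rx_match, rx_from, identity)

if __name__ == "__main__":
    for name in ["http://example.com/download.tar.gz", "document"]:
        print(name, "->", repr(ext(name)))
```

The outer two functions each receive the original argument, and the middle one combines their results, which is all a fork does.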
Alternative non-regex Implementation
ext=: (}.~ i:&'.')@(#~ [: -. [: +./\. -.@e.&('.',AlphaNum_j_))
Task examples:
ext 'http://example.com/download.tar.gz'
.gz
ext 'CharacterModel.3DS'
.3DS
Examples=: 'http://example.com/download.tar.gz';'CharacterModel.3DS';'.desktop';'document';'document.txt_backup';'/etc/pam.d/login'
ext each Examples
┌───┬────┬────────┬┬┬┐
│.gz│.3DS│.desktop││││
└───┴────┴────────┴┴┴┘
Java
import java.io.File;
public static void main(String[] args) {
String[] strings = {
"http://example.com/download.tar.gz",
"CharacterModel.3DS",
".desktop",
"document",
"document.txt_backup",
"/etc/pam.d/login",
};
for (String string : strings)
System.out.println(extractExtension(string));
}
static String extractExtension(String string) {
/* we can use the 'File' class to extract the file-name */
File file = new File(string);
String filename = file.getName();
int indexOf = filename.lastIndexOf('.');
if (indexOf != -1) {
String extension = filename.substring(indexOf);
/* and use a regex to match only valid extensions */
if (extension.matches("\\.[A-Za-z\\d]+"))
return extension;
}
return "";
}
- Output:
.gz .3DS .desktop
JavaScript
let filenames = ["http://example.com/download.tar.gz", "CharacterModel.3DS", ".desktop", "document", "document.txt_backup", "/etc/pam.d/login"];
let r = /\.[a-zA-Z0-9]+$/;
filenames.forEach((e) => console.log(e + " -> " + (r.test(e) ? r.exec(e)[0] : "")));
- Output:
http://example.com/download.tar.gz -> .gz
CharacterModel.3DS -> .3DS
.desktop -> .desktop
document ->
document.txt_backup ->
/etc/pam.d/login ->
With JS embedded in browsers and other applications across most or all operating systems, we need some flexibility in any reusable takeExtension function.
One approach is to define a more general curried function, from which we can obtain various simpler and OS-specific functions by specialisation:
(() => {
'use strict';
// OS-INDEPENDENT CURRIED FUNCTION --------------------
// takeExtension :: Regex String -> FilePath -> String
const takeExtension = charSet => fp => {
const
rgx = new RegExp('^[' + charSet + ']+$'),
xs = fp.split('/').slice(-1)[0].split('.'),
ext = 1 < xs.length ? (
xs.slice(-1)[0]
) : '';
return rgx.test(ext) ? (
'.' + ext
) : '';
};
// OS-SPECIFIC SPECIALIZED FUNCTIONS ------------------
// takePosixExtension :: FilePath -> String
const takePosixExtension = takeExtension('A-Za-z0-9\_\-');
// takeWindowsExtension :: FilePath -> String
const takeWindowsExtension = takeExtension('A-Za-z0-9');
// TEST -------------------------------------------
// main :: IO()
const main = () => {
[
['Posix', takePosixExtension],
['Windows', takeWindowsExtension]
].forEach(
([osName, f]) => console.log(
tabulated(
'\n\ntake' + osName +
'Extension :: FilePath -> String:\n',
x => x.toString(),
x => "'" + x.toString() + "'",
f,
[
"http://example.com/download.tar.gz",
"CharacterModel.3DS",
".desktop",
"document",
"document.txt_backup",
"/etc/pam.d/login"
]
),
'\n'
)
)
};
// GENERIC FUNCTIONS FOR TESTING AND DISPLAY OF RESULTS
// comparing :: (a -> b) -> (a -> a -> Ordering)
const comparing = f =>
(x, y) => {
const
a = f(x),
b = f(y);
return a < b ? -1 : (a > b ? 1 : 0);
};
// compose (<<<) :: (b -> c) -> (a -> b) -> a -> c
const compose = (f, g) => x => f(g(x));
// justifyRight :: Int -> Char -> String -> String
const justifyRight = (n, cFiller, s) =>
n > s.length ? (
s.padStart(n, cFiller)
) : s;
// Returns Infinity over objects without finite length.
// This enables zip and zipWith to choose the shorter
// argument when one is non-finite, like cycle, repeat etc
// length :: [a] -> Int
const length = xs =>
(Array.isArray(xs) || 'string' === typeof xs) ? (
xs.length
) : Infinity;
// Map over lists or strings
// map :: (a -> b) -> [a] -> [b]
const map = (f, xs) =>
(Array.isArray(xs) ? (
xs
) : xs.split('')).map(f);
// maximumBy :: (a -> a -> Ordering) -> [a] -> a
const maximumBy = (f, xs) =>
0 < xs.length ? (
xs.slice(1)
.reduce((a, x) => 0 < f(x, a) ? x : a, xs[0])
) : undefined;
// tabulated :: String -> (a -> String) ->
// (b -> String) ->
// (a -> b) -> [a] -> String
const tabulated = (s, xShow, fxShow, f, xs) => {
// Heading -> x display function ->
// fx display function ->
// f -> values -> tabular string
const
ys = map(xShow, xs),
w = maximumBy(comparing(x => x.length), ys).length,
rows = zipWith(
(a, b) => justifyRight(w, ' ', a) + ' -> ' + b,
ys,
map(compose(fxShow, f), xs)
);
return s + '\n' + unlines(rows);
};
// take :: Int -> [a] -> [a]
// take :: Int -> String -> String
const take = (n, xs) =>
'GeneratorFunction' !== xs.constructor.constructor.name ? (
xs.slice(0, n)
) : [].concat.apply([], Array.from({
length: n
}, () => {
const x = xs.next();
return x.done ? [] : [x.value];
}));
// unlines :: [String] -> String
const unlines = xs => xs.join('\n');
// Use of `take` and `length` here allows zipping with non-finite lists
// i.e. generators like cycle, repeat, iterate.
// zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
const zipWith = (f, xs, ys) => {
const
lng = Math.min(length(xs), length(ys)),
as = take(lng, xs),
bs = take(lng, ys);
return Array.from({
length: lng
}, (_, i) => f(as[i], bs[i], i));
};
// MAIN ---
return main();
})();
- Output:
takePosixExtension :: FilePath -> String:

http://example.com/download.tar.gz -> '.gz'
                CharacterModel.3DS -> '.3DS'
                          .desktop -> '.desktop'
                          document -> ''
               document.txt_backup -> '.txt_backup'
                  /etc/pam.d/login -> ''

takeWindowsExtension :: FilePath -> String:

http://example.com/download.tar.gz -> '.gz'
                CharacterModel.3DS -> '.3DS'
                          .desktop -> '.desktop'
                          document -> ''
               document.txt_backup -> ''
                  /etc/pam.d/login -> ''
jq
The following definitions include the delimiting period.
In the first section, a version intended for jq version 1.4 is presented. A simpler definition using "match", a regex feature of subsequent versions of jq, is then given.
def file_extension:
def alphanumeric: explode | unique
| reduce .[] as $i
(true;
if . then $i | (97 <= . and . <= 122) or (65 <= . and . <= 90) or (48 <= . and . <= 57)
else false
end );
rindex(".") as $ix
| if $ix then .[1+$ix:] as $ext
| if $ext|alphanumeric then ".\($ext)" # include the period
else ""
end
else ""
end;
def file_extension:
(match( "(\\.[a-zA-Z0-9]+$)" ) | .captures[0].string)
// "" ;
Examples:
Using either version above gives the same results.
"http://example.com/download.tar.gz",
"CharacterModel.3DS",
".desktop",
"document",
"document.txt_backup",
"/etc/pam.d/login"
| "\(.) has extension: \(file_extension)"
$ jq -r -n -f Extract_file_extension.jq
- Output:
http://example.com/download.tar.gz has extension: .gz
CharacterModel.3DS has extension: .3DS
.desktop has extension: .desktop
document has extension:
document.txt_backup has extension:
/etc/pam.d/login has extension:
Jsish
#!/usr/bin/env jsish
/* Extract filename extension (for a limited subset of possible extensions) in Jsish */
function extractExtension(filename) {
var extPat = /\.[a-z0-9]+$/i;
var ext = filename.match(extPat);
return ext ? ext[0] : '';
}
if (Interp.conf('unitTest')) {
var files = ["http://example.com/download.tar.gz", "CharacterModel.3DS",
".desktop", "document", "document.txt_backup", "/etc/pam.d/login"];
for (var fn of files) puts(fn, quote(extractExtension(fn)));
}
/*
=!EXPECTSTART!=
http://example.com/download.tar.gz ".gz"
CharacterModel.3DS ".3DS"
.desktop ".desktop"
document ""
document.txt_backup ""
/etc/pam.d/login ""
=!EXPECTEND!=
*/
- Output:
prompt$ jsish --U extractExtension.jsi
http://example.com/download.tar.gz ".gz"
CharacterModel.3DS ".3DS"
.desktop ".desktop"
document ""
document.txt_backup ""
/etc/pam.d/login ""

prompt$ jsish -u extractExtension.jsi
[PASS] extractExtension.jsi
Julia
extension(url::String) = try match(r"\.[A-Za-z0-9]+$", url).match catch "" end
@show extension("http://example.com/download.tar.gz")
@show extension("CharacterModel.3DS")
@show extension(".desktop")
@show extension("document")
@show extension("document.txt_backup")
@show extension("/etc/pam.d/login")
- Output:
extension("http://example.com/download.tar.gz") = ".gz"
extension("CharacterModel.3DS") = ".3DS"
extension(".desktop") = ".desktop"
extension("document") = ""
extension("document.txt_backup") = ""
extension("/etc/pam.d/login") = ""
Kotlin
// version 1.0.6
val r = Regex("[^a-zA-Z0-9]") // matches any non-alphanumeric character
fun extractFileExtension(path: String): String {
if (path.isEmpty()) return ""
var fileName = path.substringAfterLast('/')
if (path == fileName) fileName = path.substringAfterLast('\\')
val splits = fileName.split('.')
if (splits.size == 1) return ""
val ext = splits.last()
return if (r.containsMatchIn(ext)) "" else "." + ext
}
fun main(args: Array<String>) {
val paths = arrayOf(
"http://example.com/download.tar.gz",
"CharacterModel.3DS",
".desktop",
"document",
"document.txt_backup",
"/etc/pam.d/login",
"c:\\programs\\myprogs\\myprog.exe", // using back-slash as delimiter
"c:\\programs\\myprogs\\myprog.exe_backup" // ditto
)
for (path in paths) {
val ext = extractFileExtension(path)
println("${path.padEnd(37)} -> ${if (ext.isEmpty()) "(empty string)" else ext}")
}
}
- Output:
http://example.com/download.tar.gz    -> .gz
CharacterModel.3DS                    -> .3DS
.desktop                              -> .desktop
document                              -> (empty string)
document.txt_backup                   -> (empty string)
/etc/pam.d/login                      -> (empty string)
c:\programs\myprogs\myprog.exe        -> .exe
c:\programs\myprogs\myprog.exe_backup -> (empty string)
Lua
-- Lua pattern docs at http://www.lua.org/manual/5.1/manual.html#5.4.1
function fileExt (filename) return filename:match("(%.%w+)$") or "" end
local testCases = {
"http://example.com/download.tar.gz",
"CharacterModel.3DS",
".desktop",
"document",
"document.txt_backup",
"/etc/pam.d/login"
}
for _, example in pairs(testCases) do
print(example .. ' -> "' .. fileExt(example) .. '"')
end
- Output:
http://example.com/download.tar.gz -> ".gz"
CharacterModel.3DS -> ".3DS"
.desktop -> ".desktop"
document -> ""
document.txt_backup -> ""
/etc/pam.d/login -> ""
Mathematica /Wolfram Language
FileExtension is a built-in function:
FileExtension /@ {"http://example.com/download.tar.gz", "CharacterModel.3DS", ".desktop", "document","document.txt_backup","/etc/pam.d/login"}
- Output:
{"gz", "3DS", "", "", "txt_backup", ""}
Nanoquery
The File object type in Nanoquery has a built-in method to extract the file extension from a filename, but it treats all characters as potentially valid in an extension and URLs as not being. As a result, the .txt_backup extension is included in the output.
import Nanoquery.IO
filenames = {"http://example.com/download.tar.gz", "CharacterModel.3DS"}
filenames += {".desktop", "document", "document.txt_backup", "/etc/pam.d/login"}
for fname in filenames
println new(File, fname).getExtension()
end
- Output:
.gz .3DS .desktop .txt_backup
Nim
As can be seen in the examples, the Nim standard library function splitFile detects that a file such as .desktop is a special file. On the other hand, it considers an underscore to be a valid character in an extension.
import os, strutils
func extractFileExt(path: string): string =
  var s: seq[char]
  for i in countdown(path.high, 0):
    case path[i]
    of Letters, Digits:
      s.add path[i]
    of '.':
      s.add '.'
      while s.len > 0: result.add s.pop()
      return
    else:
      break
  result = ""
for input in ["http://example.com/download.tar.gz", "CharacterModel.3DS",
              ".desktop", "document", "document.txt_backup", "/etc/pam.d/login"]:
  echo "Input: ", input
  echo "Extracted extension: ", input.extractFileExt()
  echo "Using standard library: ", input.splitFile()[2]
  echo()
- Output:
Input: http://example.com/download.tar.gz
Extracted extension: .gz
Using standard library: .gz

Input: CharacterModel.3DS
Extracted extension: .3DS
Using standard library: .3DS

Input: .desktop
Extracted extension: .desktop
Using standard library:

Input: document
Extracted extension:
Using standard library:

Input: document.txt_backup
Extracted extension:
Using standard library: .txt_backup

Input: /etc/pam.d/login
Extracted extension:
Using standard library:
Objeck
use Query.RegEx;
class FindExtension {
function : Main(args : String[]) ~ Nil {
file_names := [
"http://example.com/download.tar.gz", "CharacterModel.3DS",
".desktop", "document", "document.txt_backup", "/etc/pam.d/login"];
each(i : file_names) {
file_name := file_names[i];
System.IO.Console->Print(file_name)->Print(" has extension: ")->PrintLine(GetExtension(file_name));
};
}
function : GetExtension(file_name : String) ~ String {
index := file_name->FindLast('.');
if(index < 0) {
return "";
};
ext := file_name->SubString(index, file_name->Size() - index);
if(ext->Size() < 1) {
return "";
};
if(<>RegEx->New("\\.([a-z]|[A-Z]|[0-9])+")->MatchExact(ext)) {
return "";
};
return ext;
}
}
- Output:
http://example.com/download.tar.gz has extension: .gz
CharacterModel.3DS has extension: .3DS
.desktop has extension: .desktop
document has extension:
document.txt_backup has extension:
/etc/pam.d/login has extension:
OCaml
Since OCaml 4.04 there is a function Filename.extension:
let () =
let filenames = [
"http://example.com/download.tar.gz";
"CharacterModel.3DS";
".desktop";
"document";
"document.txt_backup";
"/etc/pam.d/login"]
in
List.iter (fun filename ->
Printf.printf " '%s' => '%s'\n" filename (Filename.extension filename)
) filenames
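Its behaviour differs a little from the specification of this task: as the output shows, it returns an empty string for .desktop and accepts the underscore in .txt_backup.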
- Output:
 'http://example.com/download.tar.gz' => '.gz'
 'CharacterModel.3DS' => '.3DS'
 '.desktop' => ''
 'document' => ''
 'document.txt_backup' => '.txt_backup'
 '/etc/pam.d/login' => ''
Oforth
If extension is not valid, returns null, not "". Easy to change if "" is required.
: fileExt( s -- t )
| i |
s lastIndexOf('.') dup ->i ifNull: [ null return ]
s extract(i 1+, s size) conform(#isAlpha) ifFalse: [ null return ]
s extract(i, s size)
;
- Output:
>"http://example.com/download.tar.gz" fileExt . .gz ok > ok >"CharacterModel.3DS" fileExt . .3DS ok > ok >".desktop" fileExt . .desktop ok >"document" fileExt . null ok >"document.txt_backup" fileExt . null ok >"/etc/pam.d/login" fileExt . null ok >
Pascal
Free Pascal
Program Extract_file_extension;
{FreePascal has the built-in function ExtractFileExt which returns the file extension.
* it does need characters before the period to return the proper extension and it returns
* the extension including the period}
Uses character,sysutils;
Const arr : array of string = ('http://example.com/download.tar.gz','CharacterModel.3DS','.desktop',
'document','document.txt_backup','/etc/pam.d/login');
Function extractextension(fn: String): string;
Var
i: integer;
Begin
fn := 'prefix' + fn; {add characters before the period}
fn := ExtractFileExt(fn);
For i := 2 to length(fn) Do {skip the period}
If Not IsLetterOrDigit(fn[i]) Then exit('');
extractextension := fn;
End;
Var i : string;
Begin
For i In arr Do
writeln(i:35,' -> ',extractextension(i))
End.
- Output:
http://example.com/download.tar.gz -> gz CharacterModel.3DS -> 3DS .desktop -> desktop document -> document.txt_backup -> /etc/pam.d/login ->
Perl
sub extension {
my $path = shift;
$path =~ / \. [a-z0-9]+ $ /xi;
$& // '';
}
Testing:
printf "%-35s %-11s\n", $_, "'".extension($_)."'"
for qw[
http://example.com/download.tar.gz
CharacterModel.3DS
.desktop
document
document.txt_backup
/etc/pam.d/login
];
- Output:
http://example.com/download.tar.gz  '.gz'
CharacterModel.3DS                  '.3DS'
.desktop                            '.desktop'
document                            ''
document.txt_backup                 ''
/etc/pam.d/login                    ''
Phix
with javascript_semantics
function getExtension(string filename)
    for i=length(filename) to 1 by -1 do
        integer ch = filename[i]
        if ch='.' then return filename[i..$] end if
        if find(ch,"\\/_") then exit end if
    end for
    return ""
end function

constant tests = {"mywebsite.com/picture/image.png",
                  "http://mywebsite.com/picture/image.png",
                  "myuniquefile.longextension",
                  "IAmAFileWithoutExtension",
                  "/path/to.my/file",
                  "file.odd_one",
                  "http://example.com/download.tar.gz",
                  "CharacterModel.3DS",
                  ".desktop",
                  "document",
                  "document.txt_backup",
                  "/etc/pam.d/login"}
for i=1 to length(tests) do
    printf(1,"%s ==> %s\n",{tests[i],getExtension(tests[i])})
end for
- Output:
mywebsite.com/picture/image.png ==> .png
http://mywebsite.com/picture/image.png ==> .png
myuniquefile.longextension ==> .longextension
IAmAFileWithoutExtension ==>
/path/to.my/file ==>
file.odd_one ==>
http://example.com/download.tar.gz ==> .gz
CharacterModel.3DS ==> .3DS
.desktop ==> .desktop
document ==>
document.txt_backup ==>
/etc/pam.d/login ==>
The builtin get_file_extension() could also be used, however that routine differs from the task description in that "libglfw.so.3.1" => "so", and all results are lowercase even if the input is not.
PHP
$tests = [
['input'=>'http://example.com/download.tar.gz', 'expect'=>'.gz'],
['input'=>'CharacterModel.3DS', 'expect'=>'.3DS'],
['input'=>'.desktop', 'expect'=>'.desktop'],
['input'=>'document', 'expect'=>''],
['input'=>'document.txt_backup', 'expect'=>''],
['input'=>'/etc/pam.d/login', 'expect'=>'']
];
foreach ($tests as $key=>$test) {
$ext = pathinfo($test['input'], PATHINFO_EXTENSION);
// in php, pathinfo allows for an underscore in the file extension
// the following if statement only allows for A-Za-z0-9 in the extension
if (ctype_alnum($ext)) {
// pathinfo returns the extension without the preceding '.' so adding it back on
$tests[$key]['actual'] = '.'.$ext;
} else {
$tests[$key]['actual'] = '';
}
}
foreach ($tests as $test) {
printf("%35s -> %s \n", $test['input'],$test['actual']);
}
- Output:
 http://example.com/download.tar.gz -> .gz
                 CharacterModel.3DS -> .3DS
                           .desktop -> .desktop
                           document ->
                document.txt_backup ->
                   /etc/pam.d/login ->
PicoLisp
(de extension (F)
(and
(fully
'((C)
(or
(>= "Z" C "A")
(>= "z" C "a")
(>= "9" C "0") ) )
(setq F (stem (member "." (chop F)) ".")) )
(pack F) ) )
(println (extension "http://example.com/download.tar.gz"))
(println (extension "CharacterModel.3DS"))
(println (extension ".desktop"))
(println (extension "document"))
(println (extension "document.txt_backup"))
(println (extension "/etc/pam.d/login"))
- Output:
"gz"
"3DS"
"desktop"
NIL
NIL
NIL
Plain English
The 'Extract' imperative extracts parts of a path. When extracting an extension, it starts from the last period (.) in the path string and goes until the end of the string.
To run:
Start up.
Show the file extension of "http://example.com/download.tar.gz".
Show the file extension of "CharacterModel.3DS".
Show the file extension of ".desktop".
Show the file extension of "document".
Show the file extension of "document.txt_backup".
Show the file extension of "/etc/pam.d/login".
Wait for the escape key.
Shut down.
To show the file extension of a path:
Extract an extension from the path.
Write the extension to the console.
- Output:
.gz .3DS .desktop .txt_backup .d/login
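The "from the last period to the end of the string" rule described above is easy to reproduce, which makes its divergence from the task specification visible. A minimal Python sketch follows; the function name naive_extension is invented for this illustration and is not part of Plain English.

```python
def naive_extension(path: str) -> str:
    """Everything from the last period to the end of the string, or '' if no period."""
    i = path.rfind(".")
    return path[i:] if i != -1 else ""

if __name__ == "__main__":
    for p in ["http://example.com/download.tar.gz", "CharacterModel.3DS",
              ".desktop", "document", "document.txt_backup", "/etc/pam.d/login"]:
        print(p, "->", repr(naive_extension(p)))
```

Because nothing checks the characters after the period, this rule accepts the underscore in .txt_backup and reaches across the directory separator to give .d/login, the same two departures from the specification seen in the output above.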
PowerShell
function extension($file){
$ext = [System.IO.Path]::GetExtension($file)
if (-not [String]::IsNullOrEmpty($ext)) {
if($ext.IndexOf("_") -ne -1) {$ext = ""}
}
$ext
}
extension "http://example.com/download.tar.gz"
extension "CharacterModel.3DS"
extension ".desktop"
extension "document"
extension "document.txt_backup"
extension "/etc/pam.d/login"
Output:
.gz .3DS .desktop
Python
Uses re.search.
import re
def extractExt(url):
    m = re.search(r'\.[A-Za-z0-9]+$', url)
    return m.group(0) if m else ""
One way of allowing for OS-specific variations in the character sets permitted in file extensions is to write a general and reusable curried function, from which we can obtain simpler OS-specific functions by specialisation:
'''Obtaining OS-specific file extensions'''
import os
import re
# OS-INDEPENDENT CURRIED FUNCTION -------------------------
# takeExtension :: Regex String -> FilePath -> String
def takeExtension(charSet):
    '''The extension part (if any) of a file name.
       (Given a regex string representation of the
       character set accepted in extensions by the OS).'''
    def go(fp):
        m = re.search(
            r'\.[' + charSet + ']+$',
            (fp).split(os.sep)[-1]
        )
        return m[0] if m else ''
    return lambda fp: go(fp)
# DERIVED (OS-SPECIFIC) FUNCTIONS -------------------------
# takePosixExtension :: FilePath -> String
def takePosixExtension(fp):
    '''The file extension, if any,
       of a Posix file path.'''
    return takeExtension(r'A-Za-z0-9\-\_')(fp)
# takeWindowsExtension :: FilePath -> String
def takeWindowsExtension(fp):
    '''The file extension, if any,
       of a Windows file path.'''
    return takeExtension(r'A-Za-z0-9')(fp)
# TEST ----------------------------------------------------
def main():
    '''Tests'''
    for f in [takePosixExtension, takeWindowsExtension]:
        print(
            tabulated(f.__name__ + ' :: FilePath -> String:')(
                str
            )(str)(f)([
                "http://example.com/download.tar.gz",
                "CharacterModel.3DS",
                ".desktop",
                "document",
                "document.txt_backup",
                "/etc/pam.d/login"
            ])
        )
        print()
# GENERIC -------------------------------------------------
# tabulated :: String -> (a -> String) ->
#                        (b -> String) ->
#                        (a -> b) -> [a] -> String
def tabulated(s):
    '''Heading -> x display function -> fx display function ->
       f -> value list -> tabular string.'''
    def go(xShow, fxShow, f, xs):
        w = max(map(lambda x: len(xShow(x)), xs))
        return s + '\n' + '\n'.join([
            xShow(x).rjust(w, ' ') + ' -> ' + fxShow(f(x)) for x in xs
        ])
    return lambda xShow: lambda fxShow: (
        lambda f: lambda xs: go(
            xShow, fxShow, f, xs
        )
    )
# MAIN ---
if __name__ == '__main__':
    main()
- Output:
takePosixExtension :: FilePath -> String:
http://example.com/download.tar.gz -> .gz
                CharacterModel.3DS -> .3DS
                          .desktop -> .desktop
                          document ->
               document.txt_backup -> .txt_backup
                  /etc/pam.d/login ->

takeWindowsExtension :: FilePath -> String:
http://example.com/download.tar.gz -> .gz
                CharacterModel.3DS -> .3DS
                          .desktop -> .desktop
                          document ->
               document.txt_backup ->
                  /etc/pam.d/login ->
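The task also asks how any built-in facility differs from the specification. For Python, a minimal sketch comparing the standard library's splitext with the regex approach might look like this; the helper names builtin_ext and spec_ext are invented for the comparison, and posixpath is used rather than os.path so that the result does not depend on the host platform.

```python
import posixpath
import re

def builtin_ext(path: str) -> str:
    """Extension as reported by the standard library (POSIX path rules)."""
    return posixpath.splitext(path)[1]

def spec_ext(path: str) -> str:
    """Extension as defined by the task: a period plus ASCII letters/digits at the end."""
    m = re.search(r'\.[A-Za-z0-9]+$', path)
    return m.group(0) if m else ''

if __name__ == '__main__':
    for p in ["http://example.com/download.tar.gz", "CharacterModel.3DS",
              ".desktop", "document", "document.txt_backup", "/etc/pam.d/login"]:
        print(p, '->', repr(spec_ext(p)), 'vs built-in', repr(builtin_ext(p)))
```

On the six task inputs the two differ in two places: splitext ignores a leading period, so .desktop has no extension, and it places no restriction on the characters after the period, so .txt_backup is accepted.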
Quackery
[ bit
[ 0
$ "abcdefghijklmnopqrstuvwxyz"
$ "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
$ "1234567890." join join
witheach [ bit | ] ] constant
& 0 > ] is validchar ( c --> b )
[ dup $ "" = if done
dup -1 peek char . = iff
[ drop $ "" ] done
$ "" swap
reverse witheach
[ dup dip join
dup validchar iff
[ char . = if
[ reverse conclude ] ]
else
[ 2drop $ "" conclude ] ]
dup $ "" = if done
dup 0 peek char . != if
[ drop $ "" ] ] is extension ( $ --> $ )
[ cr dup echo$ say " --> "
extension
dup $ "" = iff
[ drop say "no extension" ]
else echo$
cr ] is task ( $ --> )
$ "http://example.com/download.tar.gz" task
$ "CharacterModel.3DS" task
$ ".desktop" task
$ "document" task
$ "document.txt_backup" task
$ "/etc/pam.d/login" task
- Output:
http://example.com/download.tar.gz --> .gz

CharacterModel.3DS --> .3DS

.desktop --> .desktop

document --> no extension

document.txt_backup --> no extension

/etc/pam.d/login --> no extension
Racket
#lang racket
;; Note that for a real implementation, Racket has a
;; `filename-extension` in its standard library, but don't use it here
;; since it requires a proper name (fails on ""), returns a byte-string,
;; and handles path values so might run into problems with unicode
;; string inputs.
(define (string-extension x)
(cadr (regexp-match #px"(\\.[[:alnum:]]+|)$" x)))
(define examples '("http://example.com/download.tar.gz"
"CharacterModel.3DS"
".desktop"
"document"
"document.txt_backup"
"/etc/pam.d/login"))
(for ([x (in-list examples)])
(printf "~a | ~a\n" (~a x #:width 34) (string-extension x)))
- Output:
http://example.com/download.tar.gz | .gz
CharacterModel.3DS                 | .3DS
.desktop                           | .desktop
document                           |
document.txt_backup                |
/etc/pam.d/login                   |
Raku
(formerly Perl 6)
The built-in IO::Path class has an .extension method:
say $path.IO.extension;
Contrary to this task's specification, it
- doesn't include the dot in the output
- doesn't restrict the extension to letters and numbers.
Here's a custom implementation which does satisfy the task requirements:
sub extension (Str $path --> Str) {
$path.match(/:i ['.' <[a..z0..9]>+]? $ /).Str
}
# Testing:
printf "%-35s %-11s %-12s\n", $_, extension($_).perl, $_.IO.extension.perl
for <
http://example.com/download.tar.gz
CharacterModel.3DS
.desktop
document
document.txt_backup
/etc/pam.d/login
>;
- Output:
http://example.com/download.tar.gz  ".gz"       "gz"
CharacterModel.3DS                  ".3DS"      "3DS"
.desktop                            ".desktop"  "desktop"
document                            ""          ""
document.txt_backup                 ""          "txt_backup"
/etc/pam.d/login                    ""          ""
REXX
Using this paraphrased Rosetta Code task's definition that:
a legal file extension only consists of mixed-case Latin letters and/or decimal digits.
/*REXX pgm extracts the file extension (defined above from the RC task) from a file name*/
@.= /*define default value for the @ array.*/
parse arg fID /*obtain any optional arguments from CL*/
if fID\=='' then @.1 = fID /*use the filename from the C.L. */
else do /*No filename given? Then use defaults.*/
@.1 = 'http://example.com/download.tar.gz'
@.2 = 'CharacterModel.3DS'
@.3 = '.desktop'
@.4 = 'document'
@.5 = 'document.txt_backup'
@.6 = '/etc/pam.d/login'
end
do j=1 while @.j\==''; x= /*process (all of) the file name(s). */
p=lastpos(., @.j) /*find the last position of a period. */
if p\==0 then x=substr(@.j, p+1) /*Found a dot? Then get stuff after it*/
if \datatype(x, 'A') then x= /*Not upper/lowercase letters | digits?*/
if x=='' then x= " [null]" /*use a better name for a "null" ext.*/
else x= . || x /*prefix the extension with a period. */
say 'file extension=' left(x, 20) "for file name=" @.j
end /*j*/ /*stick a fork in it, we're all done. */
output when using the default (internal) inputs:
file extension= .gz                  for file name= http://example.com/download.tar.gz
file extension= .3DS                 for file name= CharacterModel.3DS
file extension= .desktop             for file name= .desktop
file extension=  [null]              for file name= document
file extension=  [null]              for file name= document.txt_backup
file extension=  [null]              for file name= /etc/pam.d/login
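The approach used above (take everything after the last period, then accept it only if it is purely alphanumeric) can be sketched in Python as follows. This is a minimal illustration rather than a translation of the REXX program; the name `extension_after_last_dot` is hypothetical and introduced only for this example.

```python
import string

# ASCII letters and digits only, per the task specification.
ALLOWED = set(string.ascii_letters + string.digits)

def extension_after_last_dot(path: str) -> str:
    """Return the extension (including its period), or '' if there is none."""
    # rpartition splits on the LAST period; sep is '' when no period exists.
    _, sep, tail = path.rpartition(".")
    if sep and tail and all(ch in ALLOWED for ch in tail):
        return "." + tail
    return ""

for name in ("http://example.com/download.tar.gz", "CharacterModel.3DS",
             ".desktop", "document", "document.txt_backup", "/etc/pam.d/login"):
    print(f"{name!r} -> {extension_after_last_dot(name)!r}")
```

No separate directory handling is needed: if the last period belongs to a parent directory, the text after it contains a `/`, which fails the alphanumeric check.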
Ring
# Project : Extract file extension
test = ["http://example.com/download.tar.gz",
"CharacterModel.3DS",
".desktop",
"document",
"document.txt_backup",
"/etc/pam.d/login"]
for n = 1 to len(test)
flag = 1
revtest = revstr(test[n])
nr = substr(revtest, ".")
if nr > 0
revtest2 = left(revtest, nr)
for m = 1 to len(revtest2)
if (ascii(revtest2[m]) > 64 and ascii(revtest2[m]) < 91) or
(ascii(revtest2[m]) > 96 and ascii(revtest2[m]) < 123) or
isdigit(revtest2[m]) or revtest2[m] = "."
else
flag = 0
ok
next
else
flag = 0
ok
if flag = 1
revtest3 = revstr(revtest2)
see test[n] + " -> " + revtest3 + nl
else
see test[n] + " -> (none)" + nl
ok
next
func revstr(cStr)
cStr2 = ""
for x = len(cStr) to 1 step -1
cStr2 += cStr[x]
next
return cStr2
Output:
http://example.com/download.tar.gz -> .gz
CharacterModel.3DS -> .3DS
.desktop -> .desktop
document -> (none)
document.txt_backup -> (none)
/etc/pam.d/login -> (none)
Ruby
names =
%w(http://example.com/download.tar.gz
CharacterModel.3DS
.desktop
document
/etc/pam.d/login)
names.each{|name| p File.extname(name)}
- Output:
".gz"
".3DS"
""
""
""
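The built-in File.extname treats a name that starts with a period as a dotfile with no extension, so it returns an empty string for ".desktop" where the task expects ".desktop". It also places no alphanumeric restriction on the characters after the period.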
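Python's `os.path.splitext` handles leading periods in the same way, which makes the gap between a typical built-in and the task specification easy to see. The sketch below compares its results with the expected values from the task's test cases; `builtin_extension` and `differences` are hypothetical names used only for this comparison.

```python
import os.path

# Expected results copied from the task's test cases.
EXPECTED = {
    "http://example.com/download.tar.gz": ".gz",
    "CharacterModel.3DS": ".3DS",
    ".desktop": ".desktop",
    "document": "",
    "document.txt_backup": "",
    "/etc/pam.d/login": "",
}

def builtin_extension(path: str) -> str:
    """Extension as reported by os.path.splitext (may be '')."""
    return os.path.splitext(path)[1]

# Names on which the built-in disagrees with the task specification.
differences = {name: builtin_extension(name)
               for name, want in EXPECTED.items()
               if builtin_extension(name) != want}

for name, got in differences.items():
    print(f"{name}: built-in gives {got!r}, task expects {EXPECTED[name]!r}")
```

Only two of the six cases differ: the leading-period name and the name whose suffix contains an underscore.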
Rust
use std::path::Path;
fn main() {
let filenames = &[
"http://example.com/download.tar.gz",
"CharacterModel.3DS",
".desktop",
"document",
"document.txt_backup",
"/etc/pam.d/login",
];
for filename in filenames {
println!(
"{:34} | {:8} | {:?}",
filename,
extension(filename),
Path::new(filename).extension()
);
}
}
fn extension(filename: &str) -> &str {
filename
.rfind('.')
.map(|idx| &filename[idx..])
// Require at least one character after the period, so "name." yields "".
.filter(|ext| ext.len() > 1 && ext.chars().skip(1).all(|c| c.is_ascii_alphanumeric()))
.unwrap_or("")
}
The built-in method requires a filename before the extension, allows any non-period character to appear in the extension, omits the leading period, and returns None if no extension is found.
- Output:
http://example.com/download.tar.gz | .gz      | Some("gz")
CharacterModel.3DS                 | .3DS     | Some("3DS")
.desktop                           | .desktop | None
document                           |          | None
document.txt_backup                |          | Some("txt_backup")
/etc/pam.d/login                   |          | None
Scala
package rosetta
object FileExt {
private val ext = """\.[A-Za-z0-9]+$""".r
def isExt(fileName: String, extensions: List[String]) =
extensions.map { _.toLowerCase }.exists { fileName.toLowerCase endsWith "." + _ }
def extractExt(url: String) = ext findFirstIn url getOrElse("")
}
object FileExtTest extends App {
val testExtensions: List[String] = List("zip", "rar", "7z", "gz", "archive", "A##", "tar.bz2")
val isExtTestFiles: Map[String, Boolean] = Map(
"MyData.a##" -> true,
"MyData.tar.Gz" -> true,
"MyData.gzip" -> false,
"MyData.7z.backup" -> false,
"MyData..." -> false,
"MyData" -> false,
"MyData_v1.0.tar.bz2" -> true,
"MyData_v1.0.bz2" -> false
)
val extractExtTestFiles: Map[String, String] = Map(
"http://example.com/download.tar.gz" -> ".gz",
"CharacterModel.3DS" -> ".3DS",
".desktop" -> ".desktop",
"document" -> "",
"document.txt_backup" -> "",
"/etc/pam.d/login" -> "",
"/etc/pam.d/login.a" -> ".a",
"/etc/pam.d/login." -> "",
"picture.jpg" -> ".jpg",
"http://mywebsite.com/picture/image.png"-> ".png",
"myuniquefile.longextension" -> ".longextension",
"IAmAFileWithoutExtension" -> "",
"/path/to.my/file" -> "",
"file.odd_one" -> "",
// Extra, with unicode
"café.png" -> ".png",
"file.resumé" -> "",
// with unicode combining characters
"cafe\u0301.png" -> ".png",
"file.resume\u0301" -> ""
)
println("isExt() tests:")
for ((file, isext) <- isExtTestFiles) {
assert(FileExt.isExt(file, testExtensions) == isext, "Assertion failed for: " + file)
println("File: " + file + " -> Extension: " + FileExt.extractExt(file))
}
println("\nextractExt() tests:")
for ((url, ext) <- extractExtTestFiles) {
assert(FileExt.extractExt(url) == ext, "Assertion failed for: " + url)
println("Url: " + url + " -> Extension: " + FileExt.extractExt(url))
}
}
- Output:
Url: picture.jpg -> Extension: .jpg
Url: document.txt_backup -> Extension:
Url: .desktop -> Extension: .desktop
Url: CharacterModel.3DS -> Extension: .3DS
Url: file.resumé -> Extension:
Url: document -> Extension:
Url: café.png -> Extension: .png
Url: /etc/pam.d/login. -> Extension:
Url: http://mywebsite.com/picture/image.png -> Extension: .png
Url: IAmAFileWithoutExtension -> Extension:
Url: /etc/pam.d/login -> Extension:
Url: /etc/pam.d/login.a -> Extension: .a
Url: file.odd_one -> Extension:
Url: /path/to.my/file -> Extension:
Url: myuniquefile.longextension -> Extension: .longextension
Url: café.png -> Extension: .png
Url: file.resumé -> Extension:
Url: http://example.com/download.tar.gz -> Extension: .gz
sed
-Ene 's:.*(\.[A-Za-z0-9]+)$:\1:p'
Example of use:
for F in "http://example.com/download.tar.gz" "CharacterModel.3DS" ".desktop" "document" "document.txt_backup" "/etc/pam.d/login"
do
EXT=$(echo "$F" | sed -Ene 's:.*(\.[A-Za-z0-9]+)$:\1:p')
echo "$F: $EXT"
done
- Output:
http://example.com/download.tar.gz: .gz
CharacterModel.3DS: .3DS
.desktop: .desktop
document:
document.txt_backup:
/etc/pam.d/login:
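The same anchored pattern carries over to other regular-expression engines. Here is a minimal Python sketch using the standard `re` module; the function name `extension` is an assumption for this example. It anchors with `\Z` rather than `$`, because Python's `$` also matches just before a trailing newline.

```python
import re

# Same idea as the sed expression: a period plus one or more ASCII
# letters or digits, anchored at the very end of the string.
EXT_RE = re.compile(r"\.[A-Za-z0-9]+\Z")

def extension(path: str) -> str:
    """Return the matched extension, or '' when the pattern does not match."""
    match = EXT_RE.search(path)
    return match.group(0) if match else ""

for name in ("http://example.com/download.tar.gz", "CharacterModel.3DS",
             ".desktop", "document", "document.txt_backup", "/etc/pam.d/login"):
    print(f"{name}: {extension(name)}")
```

Unlike the sed substitution, no leading `.*` is needed, since `search` scans for the pattern anywhere in the string and `\Z` pins the match to the end.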
Sidef
func extension(filename) {
filename.match(/(\.[a-z0-9]+)\z/i).to_s
}
var files = [
'http://example.com/download.tar.gz',
'CharacterModel.3DS',
'.desktop',
'document',
'document.txt_backup',
'/etc/pam.d/login',
]
files.each {|f|
printf("%-36s -> %-11s\n", f.dump, extension(f).dump)
}
- Output:
"http://example.com/download.tar.gz" -> ".gz"
"CharacterModel.3DS"                 -> ".3DS"
".desktop"                           -> ".desktop"
"document"                           -> ""
"document.txt_backup"                -> ""
"/etc/pam.d/login"                   -> ""
Smalltalk
The Filename class has a convenient suffix method for that; so we convert the string to a filename and ask it:
names := #(
'http://example.com/download.tar.gz'
'CharacterModel.3DS'
'.desktop'
'a.desktop'
'document'
'document.txt_backup'
'/etc/pam.d/login'
).
names do:[:f |
'%-35s -> %s\n' printf:{ f . f asFilename suffix } on:Stdout
]
- Output:
http://example.com/download.tar.gz  -> gz
CharacterModel.3DS                  -> 3DS
.desktop                            ->
a.desktop                           -> desktop
document                            ->
document.txt_backup                 -> txt_backup
/etc/pam.d/login                    ->
Note: the task's description seems wrong to me; on a Unix machine, files beginning with "." are treated as hidden files (e.g. by ls), so the suffix of ".desktop" can be considered empty, unlike that of "a.desktop".
SNOBOL4
* Program: extract_extension.sbl
* To run: sbl extract_extension.sbl
* Description: Extract file extension
* Comment: Tested using the Spitbol for Linux version of SNOBOL4
filenames =
+ "http://example.com/download.tar.gz,"
+ "CharacterModel.3DS,"
+ ".desktop,"
+ "document,"
+ "document.txt_backup,"
+ "/etc/pam.d/login"
epat = ((span(&lcase &ucase '0123456789') ".") | "") . ext
p0
filenames ? (break(',') . s ',') | (len(1) rem) . s = "" :f(end)
reverse(s) ? epat
ext = reverse(ext)
output = ""
output = "Extension from file '" s "' is '" ext "'"
:(p0)
END
- Output:
Extension from file 'http://example.com/download.tar.gz' is '.gz'
Extension from file 'CharacterModel.3DS' is '.3DS'
Extension from file '.desktop' is '.desktop'
Extension from file 'document' is ''
Extension from file 'document.txt_backup' is ''
Extension from file '/etc/pam.d/login' is ''
Standard ML
This demonstrates how to compose the built-in OS.Path.ext function with a filter that enforces the alphanumeric restriction. The result omits the leading period. Since file names starting with '.' are supposed to be "hidden" files in Unix, the built-in does not treat such a name as having an extension.
fun fileExt path : string =
getOpt (Option.composePartial (Option.filter (CharVector.all Char.isAlphaNum), OS.Path.ext) path, "")
val tests = [
"http://example.com/download.tar.gz",
"CharacterModel.3DS",
".desktop",
"document",
"document.txt_backup",
"/etc/pam.d/login"
]
val () = app (fn s => print (s ^ " -> \"" ^ fileExt s ^ "\"\n")) tests
- Output:
http://example.com/download.tar.gz -> "gz"
CharacterModel.3DS -> "3DS"
.desktop -> ""
document -> ""
document.txt_backup -> ""
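The same composition, a built-in extension lookup followed by an alphanumeric filter, can be sketched in Python. Here `os.path.splitext` stands in for `OS.Path.ext`, which is a substitution for illustration and not an exact equivalent. The helper names `builtin_ext`, `is_ascii_alnum` and `file_ext` are hypothetical.

```python
import os.path
from typing import Optional

def builtin_ext(path: str) -> Optional[str]:
    """Dotless extension from os.path.splitext, or None if there is none."""
    ext = os.path.splitext(path)[1]
    # splitext returns '' or a string starting with '.'; a bare '.' is no extension.
    return ext[1:] if len(ext) > 1 else None

def is_ascii_alnum(text: str) -> bool:
    """True if text is non-empty and made only of ASCII letters and digits."""
    return text.isascii() and text.isalnum()

def file_ext(path: str) -> str:
    """Keep the built-in result only when it passes the filter; else ''."""
    ext = builtin_ext(path)
    return ext if ext is not None and is_ascii_alnum(ext) else ""

for name in ("http://example.com/download.tar.gz", "CharacterModel.3DS",
             ".desktop", "document", "document.txt_backup", "/etc/pam.d/login"):
    print(f'{name} -> "{file_ext(name)}"')
```

As in the Standard ML version, the period is not part of the result and a leading-period name such as `.desktop` yields an empty string, so this sketch inherits the built-in's deviations from the task specification.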
Tcl
Tcl's built-in file extension command almost does this already, except that it accepts any character after the dot. Just for fun, we'll extend the built-in file command with a new subcommand, file ext, that applies the restriction specified for this task.
proc assert {expr} { ;# for "static" assertions that throw nice errors
if {![uplevel 1 [list expr $expr]]} {
set msg "{$expr}"
catch {append msg " {[uplevel 1 [list subst -noc $expr]]}"}
tailcall throw {ASSERT ERROR} $msg
}
}
proc file_ext {file} {
set res ""
regexp -nocase {\.[a-z0-9]+$} $file res
return $res
}
set map [namespace ensemble configure file -map]
dict set map ext ::file_ext
namespace ensemble configure file -map $map
# and a test:
foreach {file ext} {
http://example.com/download.tar.gz .gz
CharacterModel.3DS .3DS
.desktop .desktop
document ""
document.txt_backup ""
/etc/pam.d/login ""
} {
set res ""
assert {[file ext $file] eq $ext}
}
TUSCRIPT
$$ MODE DATA
$$ testcases=*
http://example.com/download.tar.gz
CharacterModel.3DS
.desktop
document
document.txt_backup
/etc/pam.d/login
picture.jpg
http://mywebsite.com/picture/image.png
myuniquefile.longextension
IamAFileWithoutExtension
path/to.my/file
file.odd_one
thisismine
$$ MODE TUSCRIPT,{}
BUILD C_GROUP A0 = *
DATA {&a}
DATA {\0}
BUILD S_TABLE legaltokens=*
DATA :.{1-00}{C:A0}{]}:
LOOP testcase=testcases
extension=STRINGS (testcase,legaltokens,0,0)
IF (extension=="") CYCLE
PRINT testcase, " has extension ", extension
ENDLOOP
- Output:
http://example.com/download.tar.gz has extension .gz
CharacterModel.3DS has extension .3DS
.desktop has extension .desktop
picture.jpg has extension .jpg
http://mywebsite.com/picture/image.png has extension .png
myuniquefile.longextension has extension .longextension
VBScript
Function fileExt(fname)
Set fso = CreateObject("Scripting.FileSystemObject")
Set regex = new regExp
Dim ret
regex.pattern = "^[A-Za-z0-9]+$" 'Only alphanumeric characters are allowed
If regex.test(fso.GetExtensionName(fname)) = False Then
ret = ""
Else
ret = "." & fso.GetExtensionName(fname)
End If
fileExt = ret
End Function
'Real Start of Program
arr_t = Array("http://example.com/download.tar.gz", _
"CharacterModel.3DS", _
".desktop", _
"document", _
"document.txt_backup", _
"/etc/pam.d/login")
For Each name In arr_t
Wscript.Echo "NAME:",name
Wscript.Echo " EXT:","<" & fileExt(name) & ">"
Next
- Output:
NAME: http://example.com/download.tar.gz
 EXT: <.gz>
NAME: CharacterModel.3DS
 EXT: <.3DS>
NAME: .desktop
 EXT: <.desktop>
NAME: document
 EXT: <>
NAME: document.txt_backup
 EXT: <>
NAME: /etc/pam.d/login
 EXT: <>
Visual Basic
Option Explicit
'-----------------------------------------------------------------
Function ExtractFileExtension(ByVal Filename As String) As String
Dim i As Long
Dim s As String
i = InStrRev(Filename, ".")
If i Then
If i < Len(Filename) Then
s = Mid$(Filename, i)
For i = 2 To Len(s)
Select Case Mid$(s, i, 1)
Case "A" To "Z", "a" To "z", "0" To "9"
'these characters are OK in an extension; continue
Case Else
'this one is not OK in an extension
Exit Function
End Select
Next i
ExtractFileExtension = s
End If
End If
End Function
'-----------------------------------------------------------------
Sub Main()
Dim s As String
s = "http://example.com/download.tar.gz"
Debug.Assert ExtractFileExtension(s) = ".gz"
s = "CharacterModel.3DS"
Debug.Assert ExtractFileExtension(s) = ".3DS"
s = ".desktop"
Debug.Assert ExtractFileExtension(s) = ".desktop"
s = "document"
Debug.Assert ExtractFileExtension(s) = ""
s = "document.txt_backup"
Debug.Assert ExtractFileExtension(s) = ""
s = "/etc/pam.d/login"
Debug.Assert ExtractFileExtension(s) = ""
s = "desktop."
Debug.Assert ExtractFileExtension(s) = ""
s = "a.~.c"
Debug.Assert ExtractFileExtension(s) = ".c"
s = "a.b.~"
Debug.Assert ExtractFileExtension(s) = ""
s = "a.b.1~2"
Debug.Assert ExtractFileExtension(s) = ""
End Sub
Wren
import "./pattern" for Pattern
import "./fmt" for Fmt
var p = Pattern.new("/W") // matches any non-alphanumeric character
var extractFileExtension = Fn.new { |path|
if (path.isEmpty) return ""
var fileName = path.split("/")[-1]
if (path == fileName) fileName = path.split("\\")[-1]
var splits = fileName.split(".")
if (splits.count == 1) return ""
var ext = splits[-1]
return p.isMatch(ext) ? "" : "." + ext
}
var paths = [
"http://example.com/download.tar.gz",
"CharacterModel.3DS",
".desktop",
"document",
"document.txt_backup",
"/etc/pam.d/login",
"c:\\programs\\myprogs\\myprog.exe", // using back-slash as delimiter
"c:\\programs\\myprogs\\myprog.exe_backup" // ditto
]
for (path in paths) {
var ext = extractFileExtension.call(path)
Fmt.print("$-37s -> $s", path, ext.isEmpty ? "(empty string)" : ext)
}
- Output:
http://example.com/download.tar.gz    -> .gz
CharacterModel.3DS                    -> .3DS
.desktop                              -> .desktop
document                              -> (empty string)
document.txt_backup                   -> (empty string)
/etc/pam.d/login                      -> (empty string)
c:\programs\myprogs\myprog.exe        -> .exe
c:\programs\myprogs\myprog.exe_backup -> (empty string)
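Isolating the last path component before looking for a period, with either `/` or `\` accepted as a separator, can be sketched in Python as below. This is a simplified variant rather than a port of the Wren code: it always treats both characters as separators, whereas the Wren version falls back to `\` only when the path contains no `/`. The function name `extract_file_extension` is an assumption for this example.

```python
def extract_file_extension(path: str) -> str:
    """Return the extension of the last path component, or ''."""
    # Normalise back-slashes, then keep only the text after the last separator.
    file_name = path.replace("\\", "/").rsplit("/", 1)[-1]
    # Split the file name on its last period; dot is '' when none is present.
    _, dot, ext = file_name.rpartition(".")
    if dot and ext and ext.isascii() and ext.isalnum():
        return "." + ext
    return ""

for p in ("http://example.com/download.tar.gz", ".desktop", "/etc/pam.d/login",
          "c:\\programs\\myprogs\\myprog.exe",
          "c:\\programs\\myprogs\\myprog.exe_backup"):
    print(f"{p} -> {extract_file_extension(p) or '(empty string)'}")
```

Splitting off the file name first means a period in a parent directory can never be mistaken for the start of an extension.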
XPL0
func Ext(Str); \Return address of extension
char Str; int I, C, End;
string 0;
[I:= 0;
while Str(I) do I:= I+1;
End:= I;
loop [I:= I-1;
if Str(I) = ^. then return @Str(I);
if I = 0 then return @Str(End); \no dot found, return null
C:= Str(I);
if C>=^A & C<=^Z ! C>=^a & C<=^z ! C>=^0 & C<=^9 then \OK
else return @Str(End); \illegal char, return null
];
];
[Text(0, Ext("http://example.com/download.tar.gz")); CrLf(0);
Text(0, Ext("CharacterModel.3DS")); CrLf(0);
Text(0, Ext(".desktop")); CrLf(0);
Text(0, Ext("document")); CrLf(0);
Text(0, Ext("document.txt_backup")); CrLf(0);
Text(0, Ext("/etc/pam.d/login")); CrLf(0);
]
- Output:
.gz
.3DS
.desktop
zkl
The File object has a method splitFileName that does just that, returning a list of the parts. The method knows about the OS it was compiled on (Unix, Windows).
fcn extractFileExtension(name){
var [const] valid=Walker.chain(".",["a".."z"],["A".."Z"],["0".."9"]).pump(String);
ext:=File.splitFileName(name)[-1];
if(ext - valid) ext="";
ext
}
foreach nm in (T("http://example.com/download.tar.gz","CharacterModel.3DS",
".desktop","document",
"document.txt_backup","/etc/pam.d/login")){
println("%35s : %s".fmt(nm,extractFileExtension(nm)));
}
- Output:
Note: on Unix, .desktop is a hidden file, not an extension.
 http://example.com/download.tar.gz : .gz
                 CharacterModel.3DS : .3DS
                           .desktop :
                           document :
                document.txt_backup :
                   /etc/pam.d/login :