Rosetta Code/Find bare lang tags: Difference between revisions

(Added Erlang)
(Added Haskell solution)
1 in "Perl"
</pre>

=={{header|Haskell}}==
Several regex packages are available for Haskell. For this example, I chose TDFA (<code>Text.Regex.TDFA</code>), a fast POSIX ERE engine. To switch engines, change the import statement; with a Perl-style RE engine, the expressions also need slight modification.

This solution can be compiled into a program that takes either a space-delimited list of files as its arguments or, if no arguments are provided, input from STDIN. The MediaWiki API bonus is not attempted.

<lang Haskell>import System.Environment
import Text.Printf
import Text.Regex.TDFA
import Data.List
import Data.Array
import qualified Data.Map as Map

splitByMatches :: String -> [MatchText String] -> [String]
splitByMatches str matches = foldr (\match acc ->
    let before = take (matchOffset).head $ acc
        after = drop (matchOffset + matchLen).head $ acc
        matchOffset = fst.snd.(!0) $ match
        matchLen = snd.snd.(!0) $ match
    in before:after:(tail acc)
  ) [str] matches

{-| Takes a string and splits it into the different languages used. All text
before the language headers is put into the key "" -}
splitByLanguage :: String -> Map.Map String String
splitByLanguage str = Map.fromList.zip langs $ splitByMatches str allMatches
    where langs = "":(map (fst.(!1)) allMatches)
          allMatches = matchAllText (makeRegex headerRegex :: Regex) str
          headerRegex = "==[[:space:]]*{{[[:space:]]*header[[:space:]]*\\|[[:space:]]*([^ }]*)[[:space:]]*}}[^=]*=="

{-| Takes a string and counts the number of times a valid, but bare, lang tag
appears. It does not attempt to ignore valid tags inside lang blocks. -}
countBareLangTags :: String -> Int
countBareLangTags = matchCount (makeRegex "<lang[[:space:]]*>" :: Regex)

main = do
    args <- getArgs
    (contents, files) <- if length args == 0 then do
            -- If there aren't arguments, read from stdin
            content <- getContents
            return ([content],[""])
        else if length args == 1 then do
            -- If there's only one argument, read the file, but don't display
            -- the filename in the results.
            content <- readFile (head args)
            return ([content],[""])
        else do
            -- Otherwise, read all the files and display their file names.
            contents <- mapM readFile args
            return (contents, args)
    let bareTagMaps = map (Map.map countBareLangTags.splitByLanguage) $ contents
    let tagsWithFiles = zipWith (\tags file -> Map.map (addFile file) tags) bareTagMaps files
    let allBareTags = foldl combineMaps Map.empty tagsWithFiles
    printBareTags allBareTags
  where addFile file count = (count, if count>0 && file/="" then [file] else [])
        combineMaps = Map.foldrWithKey insertItem
        insertItem = Map.insertWith (\(newC,newF) (oldC,oldF) -> (oldC+newC,oldF++newF))
printBareTags :: Map.Map String (Int,[String]) -> IO ()
printBareTags tags = do
    let numBare = Map.foldr ((+).fst) 0 tags
    printf "%d bare language tags:\n\n" numBare
    flip mapM_ (Map.toAscList tags) (\(lang,(count,files)) ->
        if count <= 0 then return () else printf "%d in %s%s\n" count (
            if lang == "" then "no language" else lang) (filesString files))

filesString :: [String] -> String
filesString [] = ""
filesString files = " ("++listString files++")"
    where listString [file] = "[["++file++"]]"
          listString (file:files) = "[["++file++"]], "++listString files</lang>
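
For readers without a regex package installed, the bare-tag pattern <code><lang[[:space:]]*></code> can be approximated with functions from <code>base</code> alone. This is only a sketch, not part of the solution above: the names <code>startsWithBareTag</code> and <code>countBareLangTagsPlain</code> are made up for illustration, and <code>isSpace</code> accepts a few more whitespace characters than the POSIX <code>[[:space:]]</code> class, so results could differ on unusual input.

<lang Haskell>import Data.Char (isSpace)
import Data.List (stripPrefix, tails)

-- Hypothetical helper: True when the string begins with "<lang",
-- followed by optional whitespace and then ">".
startsWithBareTag :: String -> Bool
startsWithBareTag s = case stripPrefix "<lang" s of
    Just rest -> take 1 (dropWhile isSpace rest) == ">"
    Nothing   -> False

-- Hypothetical regex-free counterpart of countBareLangTags:
-- test every suffix of the input and count those that start with a bare tag.
countBareLangTagsPlain :: String -> Int
countBareLangTagsPlain = length . filter startsWithBareTag . tails

main :: IO ()
main = print (countBareLangTagsPlain "<lang>a</lang> <lang C>b</lang> <lang  >c</lang>")</lang>

With this input the program prints 2: the first and third opening tags are bare, while <code><lang C></code> names a language and the closing tags begin with <code></</code>.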

Here are the input files I used to test:

<pre><nowiki>
example1.wiki
-------------------------------------------------------------
Description

<lang>Pseudocode</lang>

=={{header|C}}==
<lang C>printf("Hello world!\n");</lang>

=={{header|Perl}}==
<lang>print "Hello world!\n"</lang>
</nowiki></pre>
<pre><nowiki>
example2.wiki
-------------------------------------------------------------
Description

<lang>Pseudocode</lang>

=={{header|C}}==
<lang>printf("Hello world!\n");</lang>

=={{header|Perl}}==
<lang>print "Hello world!\n"</lang>
<lang Perl>print "Goodbye world!\n"</lang>

=={{header|Haskell}}==
<lang>hubris lang = "I'm so much better than a "++lang++" programmer because I program in Haskell."</lang>
</nowiki></pre>

And the output:

<pre><nowiki>
6 bare language tags:

2 in no language ([[example1.wiki]], [[example2.wiki]])
1 in C ([[example2.wiki]])
1 in Haskell ([[example2.wiki]])
2 in Perl ([[example1.wiki]], [[example2.wiki]])
</nowiki></pre>
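
The totals above come from merging one count map per file. The sketch below reproduces that merge for the two test files, with the per-section counts entered by hand from the inputs shown earlier. It uses <code>Map.unionsWith</code> rather than the <code>foldrWithKey</code>/<code>insertWith</code> combination in the solution; the names <code>Tally</code>, <code>merge</code> and <code>combined</code> are illustrative only.

<lang Haskell>import qualified Data.Map as Map

-- Hypothetical alias: language name -> (bare tag count, files containing them)
type Tally = Map.Map String (Int, [String])

-- Counts read off by hand from example1.wiki and example2.wiki.
example1, example2 :: Tally
example1 = Map.fromList
    [ ("",     (1, ["example1.wiki"]))
    , ("C",    (0, []))
    , ("Perl", (1, ["example1.wiki"])) ]
example2 = Map.fromList
    [ ("",        (1, ["example2.wiki"]))
    , ("C",       (1, ["example2.wiki"]))
    , ("Haskell", (1, ["example2.wiki"]))
    , ("Perl",    (1, ["example2.wiki"])) ]

-- Add the counts and concatenate the file lists for a shared key.
merge :: (Int, [String]) -> (Int, [String]) -> (Int, [String])
merge (c1, f1) (c2, f2) = (c1 + c2, f1 ++ f2)

combined :: Tally
combined = Map.unionsWith merge [example1, example2]

main :: IO ()
main = do
    print (sum (map fst (Map.elems combined)))  -- total number of bare tags
    mapM_ print (Map.toAscList combined)</lang>

The first line printed is 6, matching the total in the output above, and the per-language entries follow in ascending key order with the empty "no language" key first.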


=={{header|Perl}}==