FASTA format: Difference between revisions
Thundergnat (talk | contribs) m syntax highlighting fixup automation
Line 27:
{{trans|Python}}
<
|‘>Rosetta_Example_1
THERECANBENOSPACE
Line 51:
R r
print(fasta_parse(FASTA).map((key, val) -> ‘#.: #.’.format(key, val)).join("\n"))</
{{out}}
Line 61:
=={{header|Action!}}==
In the following solution the input file [https://gitlab.com/amarok8bit/action-rosetta-code/-/blob/master/source/fasta.txt fasta.txt] is loaded from the H6 drive. The Altirra emulator automatically converts the ASCII CR/LF line ending into character 155 of the ATASCII character set used by Atari 8-bit computers when one of the H6-H10 hard drives is used under DOS 2.5.
<
CHAR ARRAY line(256)
CHAR ARRAY tmp(256)
Line 90:
ReadFastaFile(fname)
RETURN</
{{out}}
[https://gitlab.com/amarok8bit/action-rosetta-code/-/raw/master/images/FASTA_format.png Screenshot from Atari 8-bit computer]
Line 102:
The simple solution just reads the file (from standard input) line by line and directly writes it to the standard output.
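As a language-neutral illustration of this streaming idea, here is a minimal Python sketch. It is not a translation of the Ada code; the function name <code>stream_fasta</code> and the sample data are assumptions made for the example. Each line is written out as soon as it is read, so no sequence is ever held in memory as a whole.

```python
import io

def stream_fasta(lines, out):
    """Copy FASTA lines to `out`, turning each '>' line into 'name: '."""
    first = True  # True until the first header has been written
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if not first:
                out.write("\n")          # finish the previous record
            out.write(line[1:] + ": ")   # header without the '>' marker
            first = False
        else:
            out.write(line)              # sequence data is passed through unchanged
    if not first:
        out.write("\n")                  # finish the last record

# Usage with an in-memory buffer; sys.stdin and sys.stdout work the same way.
buf = io.StringIO()
stream_fasta([">Rosetta_Example_1\n", "THERECANBENOSPACE\n"], buf)
print(buf.getvalue(), end="")
```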
<
procedure Simple_FASTA is
Line 129:
end loop;
end Simple_FASTA;</
{{out}}
Line 142:
<
procedure FASTA is
Line 187:
Map.Iterate(Process => Print_Pair'Access); -- print Map
end FASTA;</
=={{header|Aime}}==
<
text n, s;
Line 205:
}
o_(n);</
{{Out}}
<pre>>Rosetta_Example_1: THERECANBENOSPACE
Line 211:
=={{header|ALGOL W}}==
<
% reads FASTA format data from standard input and write the results to standard output %
% only handles the ">" line start %
Line 236:
readcard( line );
end while_not_eof
end.</
{{out}}
<pre>
Line 245:
=={{header|Arturo}}==
<
result: #[]
current: ø
Line 268:
}
inspect.muted parseFasta text</
{{out}}
Line 278:
=={{header|AutoHotkey}}==
<
(
>Rosetta_Example_1
Line 291:
Gui, add, Edit, w700, % Data
Gui, show
return</
{{out}}
<pre>>Rosetta_Example_1: THERECANBENOSPACE
Line 297:
=={{header|AWK}}==
<syntaxhighlight lang="awk">
# syntax: GAWK -f FASTA_FORMAT.AWK filename
# stop processing each file when an error is encountered
Line 349:
return
}
</syntaxhighlight>
{{out}}
<pre>
Line 360:
{{works with|QBasic|1.1}}
{{works with|QuickBasic|4.5}}
<
FOR i = 1 TO LEN(s$) - 1
IF MID$(s$, i, 1) = CHR$(32) OR MID$(s$, i, 1) = CHR$(9) THEN checkNoSpaces = 0
Line 389:
END IF
LOOP
CLOSE #1</
==={{header|True BASIC}}===
{{trans|QBasic}}
<
IF END #f THEN LET EOF = -1 ELSE LET EOF = 0
END DEF
Line 426:
LOOP
CLOSE #1
END</
=={{header|BASIC256}}==
<
first = True
Line 461:
next i
return True
end function</
=={{header|C}}==
<
#include <stdlib.h>
#include <string.h>
Line 501:
free(line);
exit(EXIT_SUCCESS);
}</
{{out}}
<pre>Rosetta_Example_1: THERECANBENOSPACE
Line 508:
=={{header|C sharp|C#}}==
<
using System.Collections.Generic;
using System.IO;
Line 559:
Console.ReadLine();
}
}</
=={{header|C++}}==
<
#include <fstream>
Line 602:
return 0;
}</
{{out}}
Line 610:
=={{header|Clojure}}==
<
(with-open [r (clojure.java.io/reader pathname)]
(doseq [line (line-seq r)]
(if (= (first line) \>)
(print (format "%n%s: " (subs line 1)))
(print line)))))</
=={{header|Common Lisp}}==
<
(defparameter *input* #p"fasta.txt"
"The input file name.")
Line 631:
:do (format t "~&~a: " (subseq line 1))
:else
:do (format t "~a" line)))</
{{out}}
<pre>Rosetta_Example_1: THERECANBENOSPACE
Line 637:
=={{header|Crystal}}==
To run the code below online, paste it into the [https://play.crystal-lang.org/#/cr <b>playground</b>].
<
# create tmp fasta file in /tmp/
tmpfile = "/tmp/tmp"+Random.rand.to_s+".fasta"
Line 664:
# show fasta component
fasta.each { |k,v| puts "#{k}: #{v}"}
</syntaxhighlight>
{{out}}
<pre>
Line 674:
=={{header|Factor}}==
<
IN: rosetta-code.fasta
Line 683:
readln rest "%s: " printf [ process-fasta-line ] each-line ;
MAIN: main</
{{out}}
<pre>
Line 692:
=={{header|Forth}}==
Developed with gforth 0.7.9
<
char > constant marker
Line 708:
cr ;
Test
</syntaxhighlight>
{{out}}
<pre>
Line 718:
This program sticks to the task as described in the heading and doesn't allow for any of the (apparently) obsolete
practices described in the Wikipedia article:
<
Function checkNoSpaces(s As String) As Boolean
Line 755:
Print : Print
Print "Press any key to quit"
Sleep</
{{out}}
Line 764:
=={{header|Gambas}}==
<
Dim sList As String = File.Load("../FASTA")
Dim sTemp, sOutput As String
Line 779:
Print sOutput
End</
Output:
<pre>Rosetta_Example_1: THERECANBENOSPACE
Line 786:
=={{header|Go}}==
<
import (
Line 828:
fmt.Println(err)
}
}</
{{out}}
<pre>
Line 842:
We parse FASTA by hand (generally not a recommended approach). We use the fact that groupBy walks the list from the head and groups the items by a predicate; here we first concatenate all the fasta strings and then pair those with each respective name.
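For readers who do not read Haskell, the grouping idea can be sketched in Python with <code>itertools.groupby</code>. This is a loose analogue under stated assumptions, not a translation of the Haskell code: the function name <code>parse_fasta_grouped</code> is hypothetical, and a header that is not followed by any sequence line is simply dropped.

```python
from itertools import groupby

def parse_fasta_grouped(lines):
    """Pair each '>' header with the concatenation of the lines that follow it."""
    pairs = []
    name = None
    # groupby walks the list from the head and splits it into alternating
    # runs of header lines and sequence lines.
    for is_header, run in groupby(lines, key=lambda s: s.startswith(">")):
        run = list(run)
        if is_header:
            name = run[-1][1:]                 # keep the last header of the run
        elif name is not None:
            pairs.append((name, "".join(run)))  # concatenate the sequence lines
            name = None
    return pairs

sample = [
    ">Rosetta_Example_1",
    "THERECANBENOSPACE",
    ">Rosetta_Example_2",
    "THERECANBESEVERAL",
    "LINESBUTTHEYALLMUST",
    "BECONCATENATED",
]
for name, seq in parse_fasta_grouped(sample):
    print(f"{name}: {seq}")
```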
<
parseFasta :: FilePath -> IO ()
Line 858:
pair :: [String] -> [(String, String)]
pair [] = []
pair (x : y : xs) = (drop 1 x, y) : pair xs</
{{out}}
Line 868:
We parse FASTA using parser combinators. Normally you'd use something like Trifecta or Parsec, but here we use ReadP, because it is simple and also included in ghc by default. With other parsing libraries the code would be almost the same.
<
import Control.Applicative ( (<|>) )
import Data.Char ( isAlpha, isAlphaNum )
Line 885:
name = char '>' *> many (satisfy isAlphaNum <|> char '_') <* newline
code = concat <$> many (many (satisfy isAlpha) <* newline)
newline = char '\n'</
{{out}}
Line 893:
=={{header|J}}==
Needs chunking to handle huge files.
<
parseFasta=: ((': ' ,~ LF&taketo) , (LF -.~ LF&takeafter));._1</
'''Example Usage'''
<
>Rosetta_Example_1
THERECANBENOSPACE
Line 906:
parseFasta Fafile
Rosetta_Example_1: THERECANBENOSPACE
Rosetta_Example_2: THERECANBESEVERALLINESBUTTHEYALLMUSTBECONCATENATED</
Nowadays, most machines have gigabytes of memory. However, if it's necessary to process FASTA content on a system with inadequate memory we can use files to hold intermediate results. For example:
<
chunkFasta=: {{
r=. EMPTY
Line 937:
end.
r
}}</
Here, we're using a block size of 2 bytes, to illustrate correctness. If speed matters, we should use something significantly larger.
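The point about block size can be illustrated with a short Python sketch. It does not reproduce the J approach of keeping intermediate results in files; it only shows, under that assumption, why a tiny block size is a good correctness test: with 2-character blocks almost every header and sequence straddles a block boundary. The name <code>fasta_chunks</code> is hypothetical.

```python
import io

def fasta_chunks(stream, block_size=2):
    """Yield formatted output pieces, reading at most block_size characters at a time."""
    at_line_start = True   # the next character begins a new line
    in_header = False      # currently inside a '>' name line
    first = True           # no newline is needed before the first record
    while True:
        block = stream.read(block_size)
        if not block:
            break          # end of input
        out = []
        for ch in block:
            if at_line_start and ch == ">":
                in_header = True
                if not first:
                    out.append("\n")   # terminate the previous record
                first = False
                at_line_start = False
            elif ch == "\n":
                if in_header:
                    out.append(": ")   # the header line has ended
                    in_header = False
                at_line_start = True
            else:
                out.append(ch)
                at_line_start = False
        yield "".join(out)

text = ">Rosetta_Example_1\nTHERECANBENOSPACE\n>Rosetta_Example_2\nTHERECANBESEVERAL\nLINESBUTTHEYALLMUST\nBECONCATENATED\n"
print("".join(fasta_chunks(io.StringIO(text), block_size=2)))
```

The state carried between blocks is just three flags, so the result does not depend on where the block boundaries fall.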
Line 945:
Thus, if '~/fasta.txt' contains the example file for this task and we want to store intermediate results in the '~temp' directory, we could use:
<
And, to complete the task:
<
Rosetta_Example_1: THERECANBENOSPACE
Rosetta_Example_2: THERECANBESEVERALLINESBUTTHEYALLMUSTBECONCATENATED</
=={{header|Java}}==
{{trans|D}}
{{works with|Java|7}}
<
import java.util.Scanner;
Line 981:
System.out.println();
}
}</
<pre>Rosetta_Example_1: THERECANBENOSPACE
Line 989:
=={{header|JavaScript}}==
The code below uses Node.js to read the file.
<syntaxhighlight lang="javascript">
const fs = require("fs");
const readline = require("readline");
Line 1,018:
readInterface.on("close", () => process.stdout.write("\n"));
</syntaxhighlight>
<pre>Rosetta_Example_1: THERECANBENOSPACE
Line 1,029:
in each cycle, only as many lines are read as are required to compose an output line. <br>
Notice that an additional ">" must be provided to "foreach" to ensure the final block of lines of the input file is properly assembled.
<syntaxhighlight lang="jq">
def fasta:
foreach (inputs, ">") as $line
Line 1,040:
;
fasta</
{{out}}
<
Rosetta_Example_1: THERECANBENOSPACE
Rosetta_Example_2: THERECANBESEVERALLINESBUTTHEYALLMUSTBECONCATENATED</
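The sentinel technique described above for jq, appending an extra ">" so that the last record is flushed, can be sketched in Python as well. This is an illustration under assumptions, not a port of the jq program; the name <code>fasta_records</code> is hypothetical.

```python
from itertools import chain

def fasta_records(lines):
    """Yield 'name: sequence' strings, one per FASTA record."""
    name, parts = None, []
    # The extra ">" acts as a sentinel: it looks like the start of a new
    # record, which forces the final real record to be emitted.
    for line in chain(lines, [">"]):
        if line.startswith(">"):
            if name is not None:
                yield f"{name}: {''.join(parts)}"
            name, parts = line[1:], []
        else:
            parts.append(line)

sample = [
    ">Rosetta_Example_1",
    "THERECANBENOSPACE",
    ">Rosetta_Example_2",
    "THERECANBESEVERAL",
    "LINESBUTTHEYALLMUST",
    "BECONCATENATED",
]
for record in fasta_records(sample):
    print(record)
```

Without the sentinel, the loop would end while the second record is still being accumulated, and a separate flush step after the loop would be needed.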
=={{header|Julia}}==
{{works with|Julia|0.6}}
<
if startswith(line, '>')
print(STDOUT, "\n$(line[2:end]): ")
Line 1,055:
print(STDOUT, "$line")
end
end</
=={{header|Kotlin}}==
{{trans|FreeBASIC}}
<
import java.util.Scanner
Line 1,088:
}
sc.close()
}</
{{out}}
Line 1,097:
=={{header|Lua}}==
<
local data = file:read("*a")
file:close()
Line 1,120:
for k,v in pairs(output) do
print(k..": "..v)
end</
{{out}}
Line 1,134:
<syntaxhighlight lang="m2000 interpreter">
Module CheckIt {
Class FASTA_MACHINE {
Line 1,238:
}
checkit
</syntaxhighlight>
=={{header|Mathematica}}/{{header|Wolfram Language}}==
Mathematica has built-in support for FASTA files and strings.
<
THERECANBENOSPACE
>Rosetta_Example_2
Line 1,248:
LINESBUTTHEYALLMUST
BECONCATENATED
", "FASTA"]</
{{out}}
<pre>{"THERECANBENOSPACE", "THERECANBESEVERALLINESBUTTHEYALLMUSTBECONCATENATED"}</pre>
=={{header|Nim}}==
<syntaxhighlight lang="nim">
import strutils
Line 1,274:
fasta(input)
</syntaxhighlight>
{{out}}
<pre>
Line 1,282:
=={{header|Objeck}}==
<
function : Main(args : String[]) ~ Nil {
if(args->Size() = 1) {
Line 1,307:
}
}
</syntaxhighlight>
{{out}}
Line 1,320:
The program reads and processes the input one line at a time, and directly prints out the chunk of data available. The long strings are not concatenated in memory but just examined and processed as necessary: either printed out as is in the case of part of a sequence, or formatted in the case of the name (what I call the label), and managing the new lines where needed.
{{works with|OCaml|4.03+}}
<
(* This program reads from the standard input and writes to standard output.
* Examples of use:
Line 1,361:
let () =
print_fasta stdin
</syntaxhighlight>
{{out}}
Rosetta_Example_1: THERECANBENOSPACE
Line 1,367:
=={{header|Pascal}}==
<syntaxhighlight lang="pascal">
program FASTA_Format;
// FPC 3.0.2
Line 1,407:
Close(InF);
end.
</syntaxhighlight>
FASTA_Format < test.fst
Line 1,416:
=={{header|Perl}}==
<
>Rosetta_Example_1
THERECANBENOSPACE
Line 1,434:
print;
}
}</
{{out}}
<pre>
Line 1,442:
=={{header|Phix}}==
<!--<
<span style="color: #004080;">bool</span> <span style="color: #000000;">first</span> <span style="color: #0000FF;">=</span> <span style="color: #004600;">true</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">fn</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">open</span><span style="color: #0000FF;">(</span><span style="color: #008000;">"fasta.txt"</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"r"</span><span style="color: #0000FF;">)</span>
Line 1,466:
<span style="color: #008080;">end</span> <span style="color: #008080;">while</span>
<span style="color: #7060A8;">close</span><span style="color: #0000FF;">(</span><span style="color: #000000;">fn</span><span style="color: #0000FF;">)</span>
<!--</
{{out}}
<pre>
Line 1,474:
=={{header|PicoLisp}}==
<
(in F
(while (from ">")
Line 1,481:
(prin (line T)) )
(prinl) ) ) )
(fasta "fasta.dat")</
{{out}}
<pre>
Line 1,491:
When working with a real file, the content of the <code>$file</code> variable would be: <code>Get-Content -Path .\FASTA_file.txt -ReadCount 1000</code>. The best <code>-ReadCount</code> value for a large file is not known in advance, but it is sure to lie between 1,000 and 10,000, depending on the length of the file and of its records. Experimentation is the only way to find the optimum value.
{{works with|PowerShell|4.0+}}
<syntaxhighlight lang="powershell">
$file = @'
>Rosetta_Example_1
Line 1,513:
$output | Format-List
</syntaxhighlight>
{{Out}}
<pre>
Line 1,521:
===Version 3.0 Or Less===
<syntaxhighlight lang="powershell">
$file = @'
>Rosetta_Example_1
Line 1,543:
$output | Format-List
</syntaxhighlight>
{{Out}}
<pre>
Line 1,551:
=={{header|PureBasic}}==
<
Define Hdl_File.i,
Frm_File.i,
Line 1,579:
CloseFile(Hdl_File)
Input()
EndIf</
{{out}}
<pre>Rosetta_Example_1: THERECANBENOSPACE
Line 1,589:
and I use a generator expression yielding key, value pairs
as soon as they are read, keeping the minimum in memory.
<
FASTA='''\
Line 1,613:
yield key, val
print('\n'.join('%s: %s' % keyval for keyval in fasta_parse(infile)))</
{{out}}
Line 1,620:
=={{header|R}}==
<
library("seqinr")
Line 1,633:
cat(attr(aline, 'Annot'), ":", aline, "\n")
}
</syntaxhighlight>
{{out}}
<pre>
Line 1,641:
=={{header|Racket}}==
<
#lang racket
(let loop ([m #t])
Line 1,651:
(current-output-port)))))
(newline)
</syntaxhighlight>
=={{header|Raku}}==
(formerly Perl 6)
<syntaxhighlight lang="raku"
rule TOP { <entry>+ }
Line 1,675:
for $/<entry>[] {
say ~.<title>, " : ", .<sequence>.made;
}</
{{out}}
<pre>Rosetta_Example_1 : THERECANBENOSPACE
Line 1,684:
===version 1===
This REXX version correctly processes the examples shown.
<
parse arg iFID . /*iFID: the input file to be read. */
if iFID=='' then iFID='FASTA.IN' /*Not specified? Then use the default.*/
Line 1,699:
else $=$ || x
end /*j*/ /* [↓] show output of last file used. */
if $\=='' then say name':' $ /*stick a fork in it, we're all done. */</
{{out|output|text= when using the default input filename:}}
<pre>
Line 1,712:
::* sequences that contain blanks, tabs, and other whitespace
::* sequence names that are identified with a semicolon [''';''']
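The two extensions visible in this list can be sketched in Python. This is only an illustration, not a port of the REXX program: the name <code>parse_fasta_lenient</code> is hypothetical, blank lines are assumed to be ignorable, and the removal of "*" is an assumption based on the <code>translate(x, , '*')</code> call in the REXX source. Any other items of the list that are not shown here are not covered.

```python
def parse_fasta_lenient(lines):
    """Parse FASTA lines, accepting ';' name lines and whitespace inside sequences."""
    records = []                  # [name, sequence] pairs in input order
    for raw in lines:
        line = raw.strip()
        if not line:
            continue              # assumption: blank lines carry no data
        if line[0] in ">;":
            # a name line may start with '>' or with ';'
            records.append([line[1:].strip(), ""])
        elif records:
            # drop '*' characters, then remove blanks, tabs and other whitespace
            records[-1][1] += "".join(line.replace("*", "").split())
    return [(name, seq) for name, seq in records]

sample = [">A", "AB CD", "E\tF*", ";B", "  GH  "]
for name, seq in parse_fasta_lenient(sample):
    print(f"{name}: {seq}")
```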
<
parse arg iFID . /*iFID: the input file to be read. */
if iFID=='' then iFID='FASTA2.IN' /*Not specified? Then use the default.*/
Line 1,733:
else $=space($ || translate(x, , '*'), 0)
end /*j*/ /* [↓] show output of last file used. */
if $\=='' then say name':' $ /*stick a fork in it, we're all done. */</
<pre>
'''input:''' The '''FASTA2.IN''' file is shown below:
Line 1,766:
=={{header|Ring}}==
<
# Project : FASTA format
Line 1,789:
i = i + 1
end
</syntaxhighlight>
Output:
<pre>
Line 1,797:
=={{header|Ruby}}==
<
out, text = [], ""
strings.split("\n").each do |line|
Line 1,819:
EOS
puts fasta_format(data)</
{{out}}
Line 1,828:
=={{header|Run BASIC}}==
<
THERECANBENOSPACE
>Rosetta_Example_2
Line 1,845:
end if
i = i + 1
wend</
{{out}}
<pre>>Rosetta_Example_1: THERECANBENOSPACE
Line 1,854:
This example is implemented using an [https://doc.rust-lang.org/book/iterators.html iterator] to reduce memory requirements and encourage code reuse.
<
use std::env;
use std::io::{BufReader, Lines};
Line 1,918:
}
}
</syntaxhighlight>
{{out}}
<pre>Rosetta_Example_1: THERECANBENOSPACE
Line 1,924:
=={{header|Scala}}==
<
import java.util.Scanner
Line 1,942:
println("~~~+~~~")
}</
=={{header|Scheme}}==
<
(scheme file)
(scheme write))
Line 1,960:
(display (string-copy line 1)) (display ": "))
(else ; display the string directly
(display line))))))</
{{out}}
<pre>Rosetta_Example_1: THERECANBENOSPACE
Line 1,966:
=={{header|Seed7}}==
<
const proc: main is func
Line 1,992:
end if;
writeln;
end func;</
{{out}}
Line 2,002:
=={{header|Sidef}}==
{{trans|Ruby}}
<
var out = []
var text = ''
Line 2,026:
THERECANBESEVERAL
LINESBUTTHEYALLMUST
BECONCATENATED</
{{out}}
<pre>
Line 2,034:
=={{header|Tcl}}==
<
set f [open $filename]
set sep ""
Line 2,049:
}
fastaReader ./rosettacode.fas</
{{out}}
<pre>
Line 2,058:
=={{header|TMG}}==
Unix TMG: <!-- C port of TMG processes 1.04 GB FASTA file in 38 seconds on a generic laptop -->
<
loop: parse(line)\loop parse(( = {*} ));
line: ( name | * = {} | seqns );
Line 2,071:
spaces: << >>;
f: 1;</
=={{header|uBasic/4tH}}==
<syntaxhighlight lang="text">If Cmd (0) < 2 Then Print "Usage: fasta <fasta file>" : End
If Set(a, Open (Cmd(2), "r")) < 0 Then Print "Cannot open \q";Cmd(2);"\q" : End
Line 2,105:
Loop ' if not add the line to current string
Return (b@) ' return the string</
{{out}}
<pre>Rosetta_Example_1: THERECANBENOSPACE
Line 2,114:
{{trans|Kotlin}}
More or less.
<
var checkNoSpaces = Fn.new { |s| !s.contains(" ") && !s.contains("\t") }
Line 2,153:
}
}
}</
{{out}}
Line 2,162:
=={{header|XPL0}}==
<
int Ch;
def LF=$0A, EOF=$1A;
Line 2,184:
Echo;
];
]</
{{out}}
Line 2,194:
=={{header|zkl}}==
<
fcn(w){ // one string at a time, -->False garbage at front of file
line:=w.next().strip();
Line 2,201:
})
}.fp(data.walker()) : Utils.Helpers.wap(_);
}</
*This assumes that white space at the front or end of a string is extraneous (excepting ">" lines).
*Lazy, works for objects that support iterating over lines (i.e. most).
*The fasta function returns an iterator that wraps a function taking an iterator. Uh, yeah. An initial iterator (Walker) is used to get lines, hold state and push back a line when it turns out to be the start of the next string. The function sucks up one string (using the iterator). The wrapping iterator (wap) traps the exception when the function waltzes off the end of the data and provides the API for foreach (etc).
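The push-back idea from the notes above can be sketched in Python with a generator. This is an analogue under assumptions, not the zkl API: Python's generator protocol stands in for the wrapping iterator, and the variable <code>pending</code> plays the role of the pushed-back line. Lines before the first ">" are treated as garbage and skipped.

```python
def fasta(lines):
    """Lazily yield 'name: sequence' strings from an iterable of lines."""
    it = iter(lines)
    pending = None                   # pushed-back header of the next record
    while True:
        if pending is not None:
            header, pending = pending, None
        else:
            header = next(it, None)
            if header is None:
                return               # input exhausted
            header = header.strip()
        if not header.startswith(">"):
            continue                 # skip garbage in front of a header
        parts = []
        for line in it:
            line = line.strip()
            if line.startswith(">"):
                pending = line       # push back: this line starts the next record
                break
            parts.append(line)
        yield f"{header[1:]}: {''.join(parts)}"

data = (">Rosetta_Example_1\nTHERECANBENOSPACE\n"
        ">Rosetta_Example_2\nTHERECANBESEVERAL\nLINESBUTTHEYALLMUST\n"
        "BECONCATENATED")
for record in fasta(data.split("\n")):
    print(record)
```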
FASTA file:
<
FASTA data blob:
<
">Rosetta_Example_1\nTHERECANBENOSPACE\n"
">Rosetta_Example_2\nTHERECANBESEVERAL\nLINESBUTTHEYALLMUST\n"
"BECONCATENATED");
foreach l in (fasta(data)) { println(l) }</
{{out}}
<pre>