FASTA format

From Rosetta Code
Revision as of 14:17, 3 April 2013 by Grondilu (talk | contribs) (perl6 entry)
FASTA format is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

In bioinformatics, long character strings are often encoded in a format called FASTA. A FASTA file can contain several strings, each identified by a name marked by a '>' character at the beginning of the line.

Write a program that reads a FASTA file such as:

>Rosetta_Example_1
THERECANBENOSPACE
>Rosetta_Example_2
THERECANBESEVERAL
LINESBUTTHEYALLMUST
BECONCATENATED

And prints the following output:

Rosetta_Example_1: THERECANBENOSPACE
Rosetta_Example_2: THERECANBESEVERALLINESBUTTHEYALLMUSTBECONCATENATED

Perl 6

Certainly not the most elegant way to do it, but that's a start: <lang Perl 6>say "{.[0]}: {.[1]>>.comb(/\N+/).join}" for ">Rosetta_Example_1 THERECANBENOSPACE >Rosetta_Example_2 THERECANBESEVERAL LINESBUTTHEYALLMUST BECONCATENATED".comb: / '>' (\N+)\n (<!before '>'>\N+\n?)+ /, :match</lang>