FASTA format

parseFasta Fafile
Rosetta_Example_1: THERECANBENOSPACE
Rosetta_Example_2: THERECANBESEVERALLINESBUTTHEYALLMUSTBECONCATENATED</lang>
 
Nowadays, most machines have gigabytes of memory. However, if we need to process FASTA content on a system with too little memory to hold it, we can use files to hold intermediate results. For example:
 
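<lang J>bs=: 2  NB. block size in bytes (tiny, to exercise block boundaries)
chunkFasta=: {{
  r=. EMPTY
  NB. bad: every character that is not a digit or an ASCII letter
  bad=. a.-.a.{~;48 65 97(+i.)each 10 26 26
  dir=. x,'/'
  off=. 0
  siz=. fsize y
  block=. dest=. ''
  while. off < siz do.
    NB. append the next block: at most bs bytes, never past the end of the file
    block=. block,fread y;off([, [ -~ siz<.+)bs
    off=. off+bs
    NB. process every complete line currently buffered
    while. LF e. block do.
      line=. LF taketo block
      select. {.line
        case. ';' do.  NB. comment line: ignore
        case. '>' do.  NB. header line: start a new record
          start=. }.line-.CR
          r=.r,(head=. name,'.head');<name=. dir,start -. bad
          start fwrite head
          '' fwrite name
        case. do.      NB. sequence line: append to the current record
          (line-.bad) fappend name
      end.
      block=. LF takeafter block
    end.
  end.
  r
}}</lang>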
 
Here, we use a block size of 2 bytes to show that the code handles lines split across block boundaries. If speed matters, we should use a significantly larger block size.
 
The left argument to <code>chunkFasta</code> names the directory used to hold content extracted from the FASTA file. The right argument names that FASTA file. The result identifies the files holding the extracted headers and contents.
 
Thus, if '~/fasta.txt' contains the example file for this task and we want to store intermediate results in the '~temp' directory, we could use:
 
<lang J> fasta=: '~temp' chunkFasta '~/fasta.txt'</lang>
 
And, to complete the task:
 
<lang J> ;(,': ',,&LF)each/"1 fread each fasta
Rosetta_Example_1: THERECANBENOSPACE
Rosetta_Example_2: THERECANBESEVERALLINESBUTTHEYALLMUSTBECONCATENATED</lang>
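
For readers less familiar with J, the same block-at-a-time strategy can be sketched in Python. This is only an illustrative sketch, not a translation of the J code: the names <code>chunk_fasta</code>, <code>keep_alnum</code> and <code>block_size</code> are hypothetical, sequence lines that appear before any header are silently skipped, and a final line with no trailing newline is flushed at end of file. Both choices are assumptions made for this sketch.

```python
import os
import string

ALNUM = set(string.ascii_letters + string.digits)


def keep_alnum(text):
    """Drop every character that is not an ASCII letter or digit."""
    return "".join(ch for ch in text if ch in ALNUM)


def chunk_fasta(out_dir, fasta_path, block_size=2):
    """Split a FASTA file into per-record files, reading block_size characters at a time.

    Returns a list of (head_path, body_path) pairs, one per record.
    """
    results = []
    body_path = None  # file receiving the current record's sequence
    pending = ""      # text read so far that does not yet form a complete line
    with open(fasta_path, "r", newline="") as src:
        while True:
            block = src.read(block_size)
            if block:
                pending += block
            elif pending:
                pending += "\n"  # assumption: flush a final unterminated line
            else:
                break
            # Process every complete line currently buffered.
            while "\n" in pending:
                line, pending = pending.split("\n", 1)
                if line.startswith(";"):
                    continue  # comment line: ignore
                if line.startswith(">"):
                    # Header line: start a new record.
                    header = line[1:].replace("\r", "")
                    body_path = os.path.join(out_dir, keep_alnum(header))
                    head_path = body_path + ".head"
                    with open(head_path, "w") as f:
                        f.write(header)
                    open(body_path, "w").close()  # create an empty sequence file
                    results.append((head_path, body_path))
                elif body_path is not None:
                    # Sequence line: append to the current record.
                    with open(body_path, "a") as f:
                        f.write(keep_alnum(line))
    return results
```

Only one block plus at most one partial line is held in memory at any time; everything else lives in the files under <code>out_dir</code>. Reading each returned pair back and joining header and sequence with <code>': '</code> reproduces the task output.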
 