Stream merge: Difference between revisions

Line 14:
=={{header|360 Assembly}}==
No use of tricks such as forbidden records in the streams.
<syntaxhighlight lang="360asm">* Stream Merge 07/02/2017
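(A "forbidden record" is a sentinel value that cannot occur in real data, appended so the merge loop never has to test for end-of-stream. As an illustrative sketch only, not derived from the assembly below, the same sentinel-free idea in Python: keep a one-record lookahead per stream and drop a stream once it is exhausted.)

```python
def merge2(a, b):
    """Merge two sorted iterables without sentinel records.

    Each stream keeps a one-item lookahead; a stream is dropped when
    exhausted. Assumes the streams never yield None, which is used
    here only as the internal exhaustion marker."""
    it1, it2 = iter(a), iter(b)
    out = []
    h1 = next(it1, None)
    h2 = next(it2, None)
    while h1 is not None and h2 is not None:
        if h1 <= h2:
            out.append(h1)
            h1 = next(it1, None)
        else:
            out.append(h2)
            h2 = next(it2, None)
    # drain whichever stream still has records
    while h1 is not None:
        out.append(h1)
        h1 = next(it1, None)
    while h2 is not None:
        out.append(h2)
        h2 = next(it2, None)
    return out

print(merge2([1, 3, 5], [2, 4, 6]))  # [1, 2, 3, 4, 5, 6]
```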
STRMERGE CSECT
USING STRMERGE,R13 base register
Line 130:
PG DS CL64
YREGS
END STRMERGE</syntaxhighlight>
{{in}}
<pre style="height:20ex">
Line 167:
 
=={{header|Ada}}==
<syntaxhighlight lang="ada">with Ada.Text_Io;
with Ada.Command_Line;
with Ada.Containers.Indefinite_Holders;
Line 238:
end loop;
 
end Stream_Merge;</syntaxhighlight>
 
=={{header|ALGOL 68}}==
NB, all the files (including the output files) must exist before running this. The output files are overwritten with the merged records.
<syntaxhighlight lang="algol68"># merge a number of input files to an output file #
PROC mergenf = ( []REF FILE inf, REF FILE out )VOID:
BEGIN
Line 344:
# test the file merge #
merge2( "in1.txt", "in2.txt", "out2.txt" );
mergen( ( "in1.txt", "in2.txt", "in3.txt", "in4.txt" ), "outn.txt" )</syntaxhighlight>
{{out}}
<pre>
Line 350:
 
=={{header|ATS}}==
<syntaxhighlight lang="ats">
(* ****** ****** *)
//
Line 539:
//
} (* end of [main0] *)
</syntaxhighlight>
 
=={{header|AWK}}==
<syntaxhighlight lang="awk">
# syntax: GAWK -f STREAM_MERGE.AWK filename(s) >output
# handles 1 .. N files
Line 608:
errors++
}
</syntaxhighlight>
 
=={{header|C}}==
<syntaxhighlight lang="c">/*
* Rosetta Code - stream merge in C.
*
Line 654:
return EXIT_SUCCESS;
}
</syntaxhighlight>
 
=={{header|C sharp|C#}}==
<syntaxhighlight lang="csharp">
using System;
using System.Collections.Generic;
Line 711:
}
}
}</syntaxhighlight>
{{out}}
<pre>1 2 4 5 7 8 10 11
Line 718:
=={{header|C++}}==
{{trans|C#}}
<syntaxhighlight lang="cpp">//#include <functional>
#include <iostream>
#include <vector>
Line 813:
mergeN(display, { v3, v2, v1 });
std::cout << '\n';
}</syntaxhighlight>
{{out}}
<pre>0 1 3 4 6 7
Line 820:
 
=={{header|D}}==
<syntaxhighlight lang="d">import std.range.primitives;
import std.stdio;
 
Line 892:
}
} while (!done);
}</syntaxhighlight>
 
{{out}}
Line 902:
 
=={{header|Elixir}}==
<syntaxhighlight lang="elixir">defmodule StreamMerge do
def merge2(file1, file2), do: mergeN([file1, file2])
Line 930:
StreamMerge.merge2("temp1.dat", "temp2.dat")
IO.puts "\nN-stream merge:"
StreamMerge.mergeN(filenames)</syntaxhighlight>
 
{{out}}
Line 980:
 
=={{header|Fortran}}==
This is a classic problem, but even so, Fortran does not supply a library routine for this. So...<syntaxhighlight lang="fortran"> SUBROUTINE FILEMERGE(N,INF,OUTF) !Merge multiple inputs into one output.
INTEGER N !The number of input files.
INTEGER INF(*) !Their unit numbers.
Line 1,047:
CALL FILEMERGE(MANY,FI,F) !E pluribus unum.
 
END !That was easy.</syntaxhighlight>
Obviously, there would be variations according to the nature of the data streams being merged, and whatever sort key was involved. For this example, input from disc files will do and the sort key is the entire record's text. This means there is no need to worry over the case where, having written a record from stream S and obtained the next record from stream S, it proves to have equal precedence with the waiting record for some other stream. Which now should take precedence? With entirely-equal records it obviously doesn't matter but if the sort key is only partial then different record content could be deemed equal and then a choice has an effect.
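The tie-breaking concern above can be made concrete with a short sketch (Python, illustrative only, not part of the Fortran entry): tagging each waiting record with its stream index makes the heap ordering total, so on equal keys the record from the lower-numbered stream is always emitted first, giving a deterministic, stable merge.

```python
import heapq

def merge_stable(streams):
    """N-way merge that breaks ties by stream index, so on equal keys
    the record from the earlier stream wins (a stable merge).
    Returns (key, stream_index) pairs to make the choice visible."""
    iters = [iter(s) for s in streams]
    heads = []
    for i, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heapq.heappush(heads, (first, i))  # (key, stream index)
    out = []
    while heads:
        key, i = heapq.heappop(heads)
        out.append((key, i))
        nxt = next(iters[i], None)  # replenish from the same stream
        if nxt is not None:
            heapq.heappush(heads, (nxt, i))
    return out

# equal key "a" in both streams: stream 0's copy is emitted first
print(merge_stable([["a", "b"], ["a", "c"]]))
# [('a', 0), ('a', 1), ('b', 0), ('c', 1)]
```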
 
Line 1,057:
 
The source for subroutine GRAB is within subroutine FILEMERGE for the convenience in sharing and messing with variables important to both, but not to outsiders. This facility is standard in Algol-following languages but often omitted and was not added to Fortran until F90. In its absence, either more parameters are required for the separate routines, or there will be messing with COMMON storage areas.
 
=={{header|FreeBASIC}}==
{{trans|C++}}
<syntaxhighlight lang="vbnet">Sub Merge2(c1() As Integer, c2() As Integer)
Dim As Integer i1 = Lbound(c1)
Dim As Integer i2 = Lbound(c2)
While i1 <= Ubound(c1) And i2 <= Ubound(c2)
If c1(i1) <= c2(i2) Then
Print c1(i1);
i1 += 1
Else
Print c2(i2);
i2 += 1
End If
Wend
While i1 <= Ubound(c1)
Print c1(i1);
i1 += 1
Wend
While i2 <= Ubound(c2)
Print c2(i2);
i2 += 1
Wend
Print
End Sub
 
Sub MergeN(all() As Integer)
Dim As Integer i = Lbound(all)
While i <= Ubound(all)
Print all(i);
i += 1
Wend
Print
End Sub
 
Dim As Integer v1(2) = {0, 3, 6}
Dim As Integer v2(2) = {1, 4, 7}
Dim As Integer v3(2) = {2, 5, 8}
Merge2(v2(), v1())
MergeN(v1())
 
Dim As Integer all(8) = {v1(0), v2(0), v3(0), v1(1), v2(1), v3(1), v1(2), v2(2), v3(2)}
MergeN(all())
 
Sleep</syntaxhighlight>
{{out}}
<pre> 0 1 3 4 6 7
0 3 6
0 1 2 3 4 5 6 7 8</pre>
 
=={{header|Go}}==
'''Using standard library binary heap for mergeN:'''
<syntaxhighlight lang="go">package main
 
import (
Line 1,154 ⟶ 1,206:
}
}
}</syntaxhighlight>
{{out}}
<pre>
Line 1,161 ⟶ 1,213:
</pre>
'''MergeN using package from [[Fibonacci heap]] task:'''
<syntaxhighlight lang="go">package main
 
import (
Line 1,220 ⟶ 1,272:
}
}
}</syntaxhighlight>
{{out}}
<pre>
Line 1,232 ⟶ 1,284:
=== conduit ===
 
<syntaxhighlight lang="haskell">-- stack runhaskell --package=conduit-extra --package=conduit-merge
 
import Control.Monad.Trans.Resource (runResourceT)
Line 1,250 ⟶ 1,302:
runResourceT $ mergeSources inputs $$ sinkStdoutLn
where
sinkStdoutLn = Conduit.map (`BS.snoc` '\n') =$= sinkHandle stdout</syntaxhighlight>
 
See implementation in https://github.com/cblp/conduit-merge/blob/master/src/Data/Conduit/Merge.hs
Line 1,256 ⟶ 1,308:
=== pipes ===
 
<syntaxhighlight lang="haskell">-- stack runhaskell --package=pipes-safe --package=pipes-interleave
 
import Pipes (runEffect, (>->))
Line 1,270 ⟶ 1,322:
sourceFileNames <- getArgs
let sources = map readFile sourceFileNames
runSafeT . runEffect $ interleave compare sources >-> stdoutLn</syntaxhighlight>
 
See implementation in https://github.com/bgamari/pipes-interleave/blob/master/Pipes/Interleave.hs
 
=={{header|Java}}==
<syntaxhighlight lang="java">import java.util.Iterator;
import java.util.List;
import java.util.Objects;
Line 1,374 ⟶ 1,426:
System.out.flush();
}
}</syntaxhighlight>
{{out}}
<pre>1245781011
Line 1,382 ⟶ 1,434:
{{trans|C}}
The IOStream type in Julia encompasses any data stream, including file I/O and TCP/IP. The IOBuffer used here maps a stream to a buffer in memory, and so allows an easy simulation of two streams without opening files.
<syntaxhighlight lang="julia">
function merge(stream1, stream2, T=Char)
if !eof(stream1) && !eof(stream2)
Line 1,421 ⟶ 1,473:
println("\nDone.")
 
</syntaxhighlight>{{output}}<pre>
abcdefghijklmnopqrstuvwyxz
Done.
Line 1,428 ⟶ 1,480:
=={{header|Kotlin}}==
Uses the same data as the REXX entry. As Kotlin lacks a Heap class, when merging N files, we use a nullable MutableList instead. All comparisons are text based even when the files contain nothing but numbers.
<syntaxhighlight lang="scala">// version 1.2.21
 
import java.io.File
Line 1,487 ⟶ 1,539:
println(File("merged2.txt").readText())
println(File("mergedN.txt").readText())
}</syntaxhighlight>
 
{{out}}
Line 1,514 ⟶ 1,566:
Optimized for clarity and simplicity, not performance.
Assumes two files containing sorted integers separated by newlines.
<syntaxhighlight lang="nim">import streams, strutils
let
stream1 = newFileStream("file1")
Line 1,524 ⟶ 1,576:
echo line
for line in stream2.lines:
echo line</syntaxhighlight>
 
===Merge N streams===
Line 1,530 ⟶ 1,582:
Of course, as Phix and Nim are very different languages, the code is quite different, but as in Phix, we use a priority queue (provided by the standard module <code>heapqueue</code>). We work with files built from the “Data” constant, but we delete them after use. We have also put the whole merging code in a procedure.
 
<syntaxhighlight lang="nim">import heapqueue, os, sequtils, streams
 
type
Line 1,586 ⟶ 1,638:
# Clean-up: delete the files.
for name in Filenames:
removeFile(name)</syntaxhighlight>
 
{{out}}
Line 1,604 ⟶ 1,656:
=={{header|Perl}}==
We make use of an iterator interface which String::Tokenizer provides. Credit: we obtained all the sample text from http://www.lipsum.com/.
<syntaxhighlight lang="perl">use strict;
use warnings;
use English;
Line 1,729 ⟶ 1,781:
# At this point every iterator has been exhausted.
return;
}</syntaxhighlight>
{{out}}
<pre>Merge of 2 streams:
Line 1,739 ⟶ 1,791:
=={{header|Phix}}==
Using a priority queue
<!--<syntaxhighlight lang="phix">(notonline)-->
<span style="color: #008080;">without</span> <span style="color: #008080;">js</span> <span style="color: #000080;font-style:italic;">-- file i/o</span>
<span style="color: #008080;">include</span> <span style="color: #000000;">builtins</span><span style="color: #0000FF;">/</span><span style="color: #000000;">pqueue</span><span style="color: #0000FF;">.</span><span style="color: #000000;">e</span>
Line 1,787 ⟶ 1,839:
<span style="color: #0000FF;">{}</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">delete_file</span><span style="color: #0000FF;">(</span><span style="color: #000000;">filenames</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">])</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<!--</syntaxhighlight>-->
{{out}}
<pre>
Line 1,805 ⟶ 1,857:
 
=={{header|PicoLisp}}==
<syntaxhighlight lang="picolisp">(de streamMerge @
(let Heap
(make
Line 1,818 ⟶ 1,870:
(if (in (cdar Heap) (read))
(set (car Heap) @)
(close (cdr (pop 'Heap))) ) ) ) ) )</syntaxhighlight>
<pre>$ cat a
3 14 15
Line 1,830 ⟶ 1,882:
2 3 5 7</pre>
Test:
<syntaxhighlight lang="picolisp">(test (2 3 14 15 17 18)
(streamMerge
(open "a")
Line 1,840 ⟶ 1,892:
(open "b")
(open "c")
(open "d") ) )</syntaxhighlight>
'streamMerge' works with non-numeric data as well, and also - instead of calling
'open' on a file or named pipe - with the results of 'connect' or 'listen' (i.e.
Line 1,851 ⟶ 1,903:
There exists a standard library function <code>heapq.merge</code> that takes any number of sorted stream iterators and merges them into one sorted iterator, using a [[heap]].
 
<syntaxhighlight lang="python">import heapq
import sys
 
sources = sys.argv[1:]
for item in heapq.merge(open(source) for source in sources):
print(item)</syntaxhighlight>
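Since <code>heapq.merge</code> accepts any sorted iterables, not just file objects, the same approach can be demonstrated without touching the filesystem (a small illustrative addition using in-memory lists):

```python
import heapq

# heapq.merge takes any number of sorted iterables and lazily yields
# a single sorted stream, keeping one head per input on a heap.
merged = list(heapq.merge([0, 3, 6], [1, 4, 7], [2, 5, 8]))
print(merged)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```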
 
=={{header|Racket}}==
 
<syntaxhighlight lang="racket">;; This module produces a sequence that merges streams in order (by <)
#lang racket/base
(require racket/stream)
Line 1,932 ⟶ 1,984:
'(1 2 3 4 5 6 7 8 9 10))
(check-equal? (for/list ((i (merge-sequences/< '(2 4 6 7 8 9 10) '(1 3 5)))) i)
'(1 2 3 4 5 6 7 8 9 10)))</syntaxhighlight>
 
{{out}}
Line 1,948 ⟶ 2,000:
=={{header|REXX}}==
===version 1===
<syntaxhighlight lang="rexx">/* REXX ***************************************************************
* Merge 1.txt ... n.txt into all.txt
* 1.txt 2.txt 3.txt 4.txt
Line 2,027 ⟶ 2,079:
Return
 
o: Return lineout(oid,arg(1))</syntaxhighlight>
{{out}}
<pre>1
Line 2,050 ⟶ 2,102:
 
No &nbsp; ''heap'' &nbsp; is needed to keep track of which record was written, nor needs replenishing from its input file.
<syntaxhighlight lang="rexx">/*REXX pgm reads sorted files (1.TXT, 2.TXT, ···), and writes sorted data ───► ALL.TXT */
@.=copies('ff'x, 1e4); call lineout 'ALL.TXT',,1 /*no value should be larger than this. */
do n=1 until @.n==@.; call rdr n; end /*read any number of appropriate files.*/
Line 2,063 ⟶ 2,115:
end /*forever*/ /*keep reading/merging until exhausted.*/
/*──────────────────────────────────────────────────────────────────────────────────────*/
rdr: arg z; @.z= @.; f= z'.TXT'; if lines(f)\==0 then @.z= linein(f); return</syntaxhighlight>
{{out|output|text=&nbsp; is the same as the 1<sup>st</sup> REXX version when using identical input files, &nbsp; except the output file is named &nbsp; '''ALL.TXT'''}} <br><br>
 
Line 2,070 ⟶ 2,122:
{{works with|Rakudo|2018.02}}
 
<syntaxhighlight lang="raku" line>sub merge_streams ( @streams ) {
my @s = @streams.map({ hash( STREAM => $_, HEAD => .get ) })\
.grep({ .<HEAD>.defined });
Line 2,082 ⟶ 2,134:
}
 
say merge_streams([ @*ARGS».&open ]);</syntaxhighlight>
 
=={{header|Ruby}}==
<syntaxhighlight lang="ruby">def stream_merge(*files)
fio = files.map{|fname| open(fname)}
merge(fio.map{|io| [io, io.gets]})
Line 2,109 ⟶ 2,161:
puts "#{fname}: #{data}"
end
stream_merge(*files)</syntaxhighlight>
 
{{out}}
Line 2,139 ⟶ 2,191:
 
=={{header|Scala}}==
<syntaxhighlight lang="scala">def mergeN[A : Ordering](is: Iterator[A]*): Iterator[A] = is.reduce((a, b) => merge2(a, b))
 
def merge2[A : Ordering](i1: Iterator[A], i2: Iterator[A]): Iterator[A] = {
Line 2,158 ⟶ 2,210:
nextHead ++ merge2Buffered(i1, i2)
}
}</syntaxhighlight>
 
Example usage, demonstrating laziness:
 
<syntaxhighlight lang="scala">val i1 = Iterator.tabulate(5) { i =>
val x = i * 3
println(s"generating $x")
Line 2,185 ⟶ 2,237:
val x = merged.next
println(s"output: $x")
}</syntaxhighlight>
 
{{out}}
Line 2,221 ⟶ 2,273:
=={{header|Sidef}}==
{{trans|Raku}}
<syntaxhighlight lang="ruby">func merge_streams(streams) {
var s = streams.map { |stream|
Pair(stream, stream.readline)
Line 2,235 ⟶ 2,287:
}
 
say merge_streams(ARGV.map {|f| File(f).open_r }).join("\n")</syntaxhighlight>
 
=={{header|Tcl}}==
Line 2,242 ⟶ 2,294:
A careful reader will notice that '''$peeks''' is treated alternately as a dictionary ('''dict set''', '''dict get''') and as a list ('''lsort''', '''lassign'''), exploiting the fact that dictionaries are simply lists of even length. For large dictionaries this would not be recommended, as it causes [https://wiki.tcl.tk/3033 "shimmering"], but in this example the impact is too small to matter.
 
<syntaxhighlight lang="tcl">#!/usr/bin/env tclsh
proc merge {args} {
set peeks {}
Line 2,262 ⟶ 2,314:
 
merge {*}[lmap f $::argv {open $f r}]
</syntaxhighlight>
 
=={{header|UNIX Shell}}==
Line 2,274 ⟶ 2,326:
{{libheader|Wren-seq}}
No Heap class, so we use a List. Comparisons are text based even for numbers.
<syntaxhighlight lang="wren">import "io" for File
import "./ioutil" for FileUtil
import "./str" for Str
import "./seq" for Lst
 
var merge2 = Fn.new { |inputFile1, inputFile2, outputFile|
Line 2,325 ⟶ 2,377:
// check it worked
System.print(File.read("merged2.txt"))
System.print(File.read("mergedN.txt"))</syntaxhighlight>
 
{{out}}
Line 2,351 ⟶ 2,403:
=={{header|zkl}}==
This solution uses iterators, doesn't care where the streams originate, and only keeps the head of each stream on hand.
<syntaxhighlight lang="zkl">fcn mergeStreams(s1,s2,etc){ //-->Walker
streams:=vm.arglist.pump(List(),fcn(s){ // prime and prune
if( (w:=s.walker())._next() ) return(w);
Line 2,364 ⟶ 2,416:
v
}.fp(streams));
}</syntaxhighlight>
Using infinite streams:
<syntaxhighlight lang="zkl">w:=mergeStreams([0..],[2..*,2],[3..*,3],T(5));
w.walk(20).println();</syntaxhighlight>
{{out}}
<pre>
Line 2,373 ⟶ 2,425:
</pre>
Using files:
<syntaxhighlight lang="zkl">w:=mergeStreams(File("unixdict.txt"),File("2hkprimes.txt"),File("/dev/null"));
do(10){ w.read().print() }</syntaxhighlight>
{{out}}
<pre>
Line 2,389 ⟶ 2,441:
</pre>
Using the above example to squirt the merged stream to a file:
<syntaxhighlight lang="zkl">mergeStreams(File("unixdict.txt"),File("2hkprimes.txt"),File("/dev/null"))
.pump(File("foo.txt","w"));</syntaxhighlight>
{{out}}
<pre>