Retrieve and search chat history

From Rosetta Code
You are encouraged to solve this task according to the task description, using any language you may know.
Task

Summary: Find and print the mentions of a given string in the recent chat logs from a chatroom. Only use your programming language's standard library.

Details:

The Tcl Chatroom is an online chatroom. Its conversations are logged. It's useful to know if someone has mentioned you or your project in the chatroom recently. You can find this out by searching the chat logs. The logs are publicly available at http://tclers.tk/conferences/tcl/. One log file corresponds to the messages from one day in Germany's current time zone. Each chat log file has the name YYYY-MM-DD.tcl where YYYY is the year, MM is the month and DD the day. The logs store one message per line. The messages themselves are human-readable and their internal structure doesn't matter.

Retrieve the chat logs from the last 10 days via HTTP. Find the lines that include a particular substring and print them in the following format:

<log file URL>
------
<matching line 1>
<matching line 2>
...
<matching line N>
------

The substring will be given to your program as a command line argument.
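
The search and the output format above can be sketched in a few lines. This is only an illustration, not one of the solutions listed on this page: the function names `find_mentions` and `format_report` are hypothetical, and the sample log lines are copied from output shown elsewhere on this page.

```python
def find_mentions(log_text, needle):
    """Return the lines of log_text that contain needle as a plain substring."""
    return [line for line in log_text.split("\n") if needle in line]


def format_report(url, lines):
    """Lay out the URL and the matching lines in the format the task asks for."""
    return "\n".join([url, "------", *lines, "------"])


# Sample data only: three log lines taken from output quoted on this page.
sample_log = (
    "m 2017-04-24T05:33:53Z {} {suchenwi has become available}\n"
    "m 2017-04-24T06:38:31Z suchenwi {Hi Arjen - and bye. off to donuts}\n"
    "m 2017-04-24T06:55:57Z {} {suchenwi has left}\n"
)
mentions = find_mentions(sample_log, "has")
if mentions:
    print(format_report("http://tclers.tk/conferences/tcl/2017-04-24.tcl", mentions))
```

Printing nothing when a log has no matching lines mirrors what most solutions on this page do.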

You need to account for the possible time zone difference between the client running your program and the chat log writer on the server so that you do not miss any mentions. (For example, if you generated the log file URLs naively based on the local date, you could miss mentions if it was already April 5th for the logger but only April 4th for the client.) In practice, you should either generate the URLs in the time zone Europe/Berlin or, if your language cannot do that, add an extra day (today + 1) to the range of dates you check. In the latter case, make sure not to print parts of a "not found" page by accident if a log file doesn't exist yet.
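
Both options can be sketched with one helper. The following is a minimal illustration under stated assumptions, not a reference solution: `log_urls` is a hypothetical name, the range of 10 days back through today follows the solutions on this page, and `zoneinfo` is only available in Python 3.9+ on systems that provide time zone data, so the sketch falls back to the extra-day approach when it is missing.

```python
from datetime import datetime, timedelta, timezone

BASE_URL = "http://tclers.tk/conferences/tcl/"


def log_urls(now_utc, days_back=10, server_tz=None):
    """Return the log URLs to check, oldest first.

    now_utc must be a timezone-aware datetime. If server_tz is given, the
    dates are computed in that zone. Otherwise they are computed in UTC and
    one extra day (today + 1) is added to the end of the range.
    """
    if server_tz is not None:
        today = now_utc.astimezone(server_tz).date()
        last_offset = 0
    else:
        today = now_utc.astimezone(timezone.utc).date()
        last_offset = 1
    return [
        BASE_URL + (today + timedelta(days=i)).isoformat() + ".tcl"
        for i in range(-days_back, last_offset + 1)
    ]


try:
    from zoneinfo import ZoneInfo  # Python 3.9+, needs tz data on the system
    berlin = ZoneInfo("Europe/Berlin")
except (ImportError, KeyError):
    berlin = None  # fall back to checking one extra day

for url in log_urls(datetime.now(timezone.utc), server_tz=berlin):
    print(url)
```

When the fallback is used, the last URL may point to a log file that does not exist yet, so the response still has to be checked before it is searched.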

The code should be contained in a single-file script, with no "project" or "dependency" file (e.g., no requirements.txt for Python). It should only use a given programming language's standard library to accomplish this task and not rely on the user having installed any third-party packages.

If your language does not have an HTTP client in the standard library, you can speak raw HTTP 1.0 to the server. If it can't parse command line arguments in a standalone script, read the string to look for from the standard input.
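
As an illustration of what "speaking raw HTTP 1.0" involves, here is a minimal sketch in Python using only a socket. The helper names `build_request`, `parse_response` and `fetch` are hypothetical, and the `Host` header is an addition that HTTP 1.0 does not require but that many servers expect. The sketch assumes a missing log is reported with a non-200 status; a server that answers 200 with a "not found" page would need an extra check on the body.

```python
import socket


def build_request(host, path):
    """Build a minimal HTTP/1.0 GET request as bytes."""
    return "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(path, host).encode("ascii")


def parse_response(raw):
    """Split a raw response into (status code, body text)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("latin-1")
    status = int(status_line.split()[1])  # e.g. "HTTP/1.0 200 OK" -> 200
    return status, body.decode("utf-8", errors="replace")


def fetch(host, path, port=80, timeout=10):
    """Send the request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_response(b"".join(chunks))
```

A caller would use something like `status, body = fetch("tclers.tk", "/conferences/tcl/2017-04-24.tcl")` and search `body` only when `status` is 200.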

C

Starts from the current date and prints the lines that contain the search string. It also reports when the string is not found at all in a given day's log and when a day's log cannot be read for any reason. Requires libcurl.

 
/*Abhishek Ghosh, 18th October 2017*/

#include<curl/curl.h>
#include<string.h>
#include<stdio.h>
#include<time.h>

#define MAX_LEN 1000

void searchChatLogs(char* searchString){
    char* baseURL = "http://tclers.tk/conferences/tcl/";
    time_t t;
    struct tm* currentDate;
    char dateString[30],lineData[MAX_LEN],targetURL[100];
    int i,flag;
    FILE *fp;

    CURL *curl;
    CURLcode res;

    time(&t);
    currentDate = localtime(&t);

    strftime(dateString, 30, "%Y-%m-%d", currentDate);
    printf("Today is : %s",dateString);

    if((curl = curl_easy_init())!=NULL){
        for(i=0;i<=10;i++){

            flag = 0;
            sprintf(targetURL,"%s%s.tcl",baseURL,dateString);

            printf("\nRetrieving chat logs from %s\n",targetURL);

            /* Buffer the download in a temporary file that can be read back */
            if((fp = tmpfile())==NULL){
                printf("Can't create a temporary file for %s",targetURL);
            }
            else{
                curl_easy_setopt(curl, CURLOPT_URL, targetURL);
                curl_easy_setopt(curl, CURLOPT_WRITEDATA, fp);

                res = curl_easy_perform(curl);

                if(res == CURLE_OK){
                    /* Go back to the start of the downloaded log before searching it */
                    rewind(fp);
                    while(fgets(lineData,MAX_LEN,fp)!=NULL){
                        if(strstr(lineData,searchString)!=NULL){
                            flag = 1;
                            fputs(lineData,stdout);
                        }
                    }

                    if(flag==0)
                        printf("\nNo matching lines found.");
                }
                else
                    printf("Can't read from %s",targetURL);
                fclose(fp);
            }

            /* Step back one day; mktime normalizes the resulting date */
            currentDate->tm_mday--;
            mktime(currentDate);
            strftime(dateString, 30, "%Y-%m-%d", currentDate);

        }
        curl_easy_cleanup(curl);

    }
}

int main(int argC,char* argV[])
{
    if(argC!=2)
        printf("Usage : %s <followed by search string, enclosed by \" if it contains spaces>",argV[0]);
    else
        searchChatLogs(argV[1]);
    return 0;
}
 

Invocation and sample output (the actual output can be huge):

C:\rosettaCode>searchChatLogs.exe available
Today is : 2017-10-18
Retrieving chat logs from http://tclers.tk/conferences/tcl/2017-10-18.tcl
m 2017-10-18T08:21:17Z {} {AvL_42 has become available}
m 2017-10-18T08:50:17Z {} {de has become available}
m 2017-10-18T09:42:20Z {} {rmax_ has become available}
m 2017-10-18T10:23:54Z {} {dburns has become available}
m 2017-10-18T10:35:37Z {} {lyro has become available}
m 2017-10-18T10:38:48Z {} {rmax has become available}

Elixir

#! /usr/bin/env elixir
defmodule Mentions do
def get(url) do
{:ok, {{_, 200, _}, _, body}} =
url
|> String.to_charlist()
|> :httpc.request()
data = List.to_string(body)
if Regex.match?(~r|<!Doctype HTML.*<Title>URL Not Found</Title>|s, data) do
{:error, "log file not found"}
else
{:ok, data}
end
end
 
def perg(haystack, needle) do
haystack
|> String.split("\n")
|> Enum.filter(fn x -> String.contains?(x, needle) end)
end
 
def generate_url(n) do
date_str =
DateTime.utc_now()
|> DateTime.to_unix()
|> (fn x -> x + 60*60*24*n end).()
|> DateTime.from_unix!()
|> (fn %{year: y, month: m, day: d} ->
 :io_lib.format("~B-~2..0B-~2..0B", [y, m, d])
end).()
"http://tclers.tk/conferences/tcl/#{date_str}.tcl"
end
end
 
[needle] = System.argv()
:application.start(:inets)
back = 10
# Elixir does not come standard with time zone definitions, so we add an extra
# day to account for the possible difference between the local and the server
# time.
for i <- -back..1 do
url = Mentions.generate_url(i)
with {:ok, haystack} <- Mentions.get(url),
# If the result is a non-empty list...
[h | t] <- Mentions.perg(haystack, needle) do
IO.puts("#{url}\n------\n#{Enum.join([h | t], "\n")}\n------\n")
end
end

F#

#!/usr/bin/env fsharpi
let server_tz =
    try
        // CLR on Windows
        System.TimeZoneInfo.FindSystemTimeZoneById("W. Europe Standard Time")
    with
        // Mono
        :? System.TimeZoneNotFoundException ->
            System.TimeZoneInfo.FindSystemTimeZoneById("Europe/Berlin")

let get url =
    let req = System.Net.WebRequest.Create(System.Uri(url))
    use resp = req.GetResponse()
    use stream = resp.GetResponseStream()
    use reader = new System.IO.StreamReader(stream)
    reader.ReadToEnd()

let grep needle (haystack : string) =
    haystack.Split('\n')
    |> Array.toList
    |> List.filter (fun x -> x.Contains(needle))

let genUrl n =
    let day = System.DateTime.UtcNow.AddDays(float n)
    let server_dt = System.TimeZoneInfo.ConvertTimeFromUtc(day, server_tz)
    let timestamp = server_dt.ToString("yyyy-MM-dd")
    sprintf "http://tclers.tk/conferences/tcl/%s.tcl" timestamp

let _ =
    match fsi.CommandLineArgs with
    | [|_; needle|] ->
        let back = 10
        for i in -back .. 0 do
            let url = genUrl i
            let found = url |> get |> grep needle |> String.concat "\n"
            if found <> "" then printfn "%s\n------\n%s\n------\n" url found
            else ()
    | x ->
        printfn "Usage: %s literal" (Array.get x 0)
        System.Environment.Exit(1)

Go

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"strings"
	"time"
)

func get(url string) (res string, err error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	// Close the body so the underlying connection is released.
	defer resp.Body.Close()
	buf, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(buf), nil
}

func grep(needle string, haystack string) (res []string) {
	for _, line := range strings.Split(haystack, "\n") {
		if strings.Contains(line, needle) {
			res = append(res, line)
		}
	}
	return res
}

func genUrl(i int, loc *time.Location) string {
	date := time.Now().In(loc).AddDate(0, 0, i)
	return date.Format("http://tclers.tk/conferences/tcl/2006-01-02.tcl")
}

func main() {
	if len(os.Args) != 2 {
		log.Fatalf("usage: %s literal", os.Args[0])
	}
	needle := os.Args[1]
	back := -10
	serverLoc, err := time.LoadLocation("Europe/Berlin")
	if err != nil {
		log.Fatal(err)
	}
	for i := back; i <= 0; i++ {
		url := genUrl(i, serverLoc)
		contents, err := get(url)
		if err != nil {
			log.Fatal(err)
		}
		found := grep(needle, contents)
		if len(found) > 0 {
			fmt.Printf("%v\n------\n", url)
			for _, line := range found {
				fmt.Printf("%v\n", line)
			}
			fmt.Printf("------\n\n")
		}
	}
}

Mathematica

matchFrom[url_String, str_String] := Select[StringSplit[Import[url, "String"], "\n"], StringMatchQ[str]]
 
getLogLinks[n_] :=
Select[Import["http://tclers.tk/conferences/tcl/", "Hyperlinks"],
First[
StringCases[#1, "tcl/" ~~ date__ ~~ ".tcl" :> DateDifference[DateObject[URLDecode[date], TimeZone -> "Europe/Berlin"], Now]]] <=
Quantity[n, "Days"] & ]
 
searchLogs[str_String] := Block[{data},
Map[
(data = matchFrom[#, str];
If[Length[data] > 0,
Print /@ Join[{#, "-----"}, data, {"----\n"}]]) &,
getLogLinks[10]]]
 
searchLogs["*lazy*"];
Output:
http://tclers.tk/conferences/tcl/2017%2d09%2d25.tcl
-----
m 2017-09-25T14:38:02Z hypnotoad {I'm lazy and the old implementation was called "zvfs", and there a lot of kit builders who are looking for that command}

----

Perl 6

Works with: Rakudo version 2017.04

No dependencies or third party libraries huh? How about no modules, no requires, no libraries, no imports at all. Implemented using a bare compiler, strictly with built-ins. (Which makes it kind-of verbose, but thems the trade-offs.)

my $needle = @*ARGS.shift // '';
my @haystack;
 
# 10 days before today, Zulu time
my $begin = DateTime.new(time).utc.earlier(:10days);
say " Executed at: ", DateTime.new(time).utc;
say "Begin searching from: $begin";
 
# Today - 10 days through today
for $begin.Date .. DateTime.now.utc.Date -> $date {
 
# connect to server, use a raw socket
my $http = IO::Socket::INET.new(:host('tclers.tk'), :port(80));
 
# request file
$http.print: "GET /conferences/tcl/{$date}.tcl HTTP/1.0\n\n";
 
# retrieve file
my @page = $http.lines;
 
# remove header
@page.splice(0, 8);
 
# concatenate multi-line entries to a single line
while @page {
if @page[0].substr(0, 13) ~~ m/^'m '\d\d\d\d'-'\d\d'-'\d\d'T'/ {
@haystack.push: @page.shift;
}
else {
@haystack.tail ~= '␤' ~ @page.shift;
}
}
 
# close socket
$http.close;
}
 
# ignore times before 10 days ago
@haystack.shift while @haystack[0].substr(2, 22) lt $begin.Str;
 
# print the first and last line of the haystack
say "First and last lines of the haystack:";
.say for |@haystack[0, *-1];
say "Needle: ", $needle;
say '-' x 79;
 
# find and print needle lines
.say if .contains( $needle ) for @haystack;
Output:

Sample output using a needle of 'github.com'

         Executed at: 2017-05-05T02:13:55Z
Begin searching from: 2017-04-25T02:13:55Z
First and last lines of the haystack:
m 2017-04-25T02:25:17Z ijchain {*** TakinOver leaves}
m 2017-05-05T02:14:52Z {} {rmax has become available}
Needle: github.com
-------------------------------------------------------------------------------
m 2017-04-28T07:33:59Z ijchain {<Napier> https://github.com/Dash-OS/tcl-modules/blob/master/react-0.5.tm}
m 2017-04-28T07:35:40Z ijchain {<Napier> https://github.com/Dash-OS/tcl-modules/blob/master/react/reducer-0.5.tm}
m 2017-04-28T08:25:39Z ijchain {<Napier> https://github.com/Dash-OS/tcl-modules/blob/master/examples/react.md}
m 2017-04-28T08:45:01Z ijchain {<Napier> https://github.com/Dash-OS/tcl-modules/blob/master/examples/react.md}
m 2017-04-28T09:21:22Z ijchain {<Napier> decorators damnit! :-P https://github.com/Dash-OS/tcl-modules/blob/master/decorator-1.0.tm}
m 2017-04-28T15:42:28Z jima https://gist.github.com/antirez/6ca04dd191bdb82aad9fb241013e88a8
m 2017-04-28T16:20:22Z ijchain {<dbohdan> Look at what I've just made: https://github.com/dbohdan/ptjd}
m 2017-04-29T05:48:13Z ijchain {<dbohdan> The brackets, rather than braces, are there because the [catch] is used for metaprogramming: https://github.com/dbohdan/ptjd/blob/c0a77ecfb34c619e30ec7c5e9f448879d41282e2/tests.tcl#L48}
m 2017-04-29T14:49:08Z ijchain {<dbohdan> If you want CI for Windows builds, it should possible to set it up relatively easily with AppVeyor and the Git mirror at https://github.com/tcltk/tcl}
m 2017-04-29T14:50:58Z ijchain {<dbohdan> Here's an example: https://github.com/dbohdan/picol/blob/trunk/appveyor.yml}
m 2017-04-29T20:33:25Z ijchain {<Napier> https://github.com/pfultz2/Cloak/wiki/C-Preprocessor-tricks,-tips,-and-idioms}
m 2017-04-30T08:20:05Z ijchain {<bairui> AvL_42: fwiw, I like and use Apprentice: https://github.com/romainl/Apprentice}
m 2017-04-30T21:35:22Z ijchain {<Napier> https://github.com/Dash-OS/tcl-modules#cswitch-flags-----expr-script-}
m 2017-05-01T04:16:00Z ijchain {<bairui> avl42: https://github.com/dahu/DiffLine  --  see if that helps with your next long-line vimdiff. DISCLAIMER: It *shouldn't* eat your hard-drive, but buyer beware, check the code and test it on unimportant stuff first.}
m 2017-05-01T07:08:26Z ijchain {<dbohdan> Do take a look at this one, though: https://github.com/dbohdan/sqawk/blob/master/lib/tabulate.tcl#L74}
m 2017-05-01T07:09:28Z ijchain {<dbohdan> And at https://github.com/dbohdan/jimhttp/blob/master/arguments.tcl}
m 2017-05-01T08:14:05Z ijchain {<dbohdan> I've reimplemented part of tcltest recently (with pretty colors!), and I'm pretty with this solution: https://github.com/dbohdan/ptjd/blob/master/tests.tcl#L49}
m 2017-05-03T14:57:24Z ijchain {<dbohdan> I think Tcl is okay to pretty good for compilery things if you already know how to do them. For instance, I found writing this tokenizer in Tcl a breeze: https://github.com/dbohdan/jimhttp/blob/master/json.tcl#L380}
m 2017-05-04T16:19:55Z stu {rkeene, files Makefile.in, itzev and spoto.conf are spotoconf: https://github.com/aryler/Tclarc4random/}
m 2017-05-04T20:26:28Z dgp https://github.com/flightaware/Tcl-bounties/issues/25

Python

#! /usr/bin/env python3
import datetime
import re
import urllib.request
import sys

def get(url):
    with urllib.request.urlopen(url) as response:
        html = response.read().decode('utf-8')
        if re.match(r'<!Doctype HTML[\s\S]*<Title>URL Not Found</Title>', html):
            return None
        return html

def main():
    template = 'http://tclers.tk/conferences/tcl/%Y-%m-%d.tcl'
    today = datetime.datetime.utcnow()
    back = 10
    needle = sys.argv[1]
    # Since Python does not come standard with time zone definitions, add an
    # extra day to account for the possible difference between the local and the
    # server time.
    for i in range(-back, 2):
        day = today + datetime.timedelta(days=i)
        url = day.strftime(template)
        haystack = get(url)
        if haystack:
            mentions = [x for x in haystack.split('\n') if needle in x]
            if mentions:
                print('{}\n------\n{}\n------\n'
                      .format(url, '\n'.join(mentions)))

main()

Ruby

#! /usr/bin/env ruby
require 'net/http'
require 'time'
 
def gen_url(i)
day = Time.now + i*60*60*24
# Set the time zone in which to format the time, per
# https://coderwall.com/p/c7l82a/create-a-time-in-a-specific-timezone-in-ruby
old_tz = ENV['TZ']
ENV['TZ'] = 'Europe/Berlin'
url = day.strftime('http://tclers.tk/conferences/tcl/%Y-%m-%d.tcl')
ENV['TZ'] = old_tz
url
end
 
def main
back = 10
needle = ARGV[0]
(-back..0).each do |i|
url = gen_url(i)
haystack = Net::HTTP.get(URI(url)).split("\n")
mentions = haystack.select { |x| x.include? needle }
if !mentions.empty?
puts "#{url}\n------\n#{mentions.join("\n")}\n------\n"
end
end
end
 
main

Scala

import java.net.Socket
import java.net.URL
import java.time
import java.time.format
import java.time.ZoneId
import java.util.Scanner
import scala.collection.JavaConverters._
 
def get(rawUrl: String): List[String] = {
val url = new URL(rawUrl)
val port = if (url.getPort > -1) url.getPort else 80
val sock = new Socket(url.getHost, port)
sock.getOutputStream.write(
s"GET ${url.getPath()} HTTP/1.0\n\n".getBytes("UTF-8") // getPath already starts with "/"
)
new Scanner(sock.getInputStream).useDelimiter("\n").asScala.toList
}
 
def genUrl(n: Long) = {
val date = java.time.ZonedDateTime
.now(ZoneId.of("Europe/Berlin"))
.plusDays(n)
.format(java.time.format.DateTimeFormatter.ISO_LOCAL_DATE)
s"http://tclers.tk/conferences/tcl/$date.tcl"
}
 
val back = 10
val literal = args(0)
for (i <- -back to 0) {
val url = genUrl(i)
print(get(url).filter(_.contains(literal)) match {
case List() => ""
case x => s"$url\n------\n${x.mkString("\n")}\n------\n\n"
})
}

Tcl

Tcl 8.5+

#! /usr/bin/env tclsh
package require http
 
proc get url {
set r [::http::geturl $url]
set content [::http::data $r]
 ::http::cleanup $r
return $content
}
 
proc grep {needle haystack} {
lsearch -all \
-inline \
-glob \
[split $haystack \n] \
*[string map {* \\* ? \\? \\ \\\\ [ \\[ ] \\]} $needle]*
}
 
proc main argv {
lassign $argv needle
set urlTemplate http://tclers.tk/conferences/tcl/%Y-%m-%d.tcl
set back 10
set now [clock seconds]
for {set i -$back} {$i <= 0} {incr i} {
set date [clock add $now $i days]
set url [clock format $date \
-format $urlTemplate \
-timezone :Europe/Berlin]
set found [grep $needle [get $url]]
if {$found ne {}} {
puts $url\n------\n[join $found \n]\n------\n
}
}
}
 
main $argv

Jim Tcl

#! /usr/bin/env jimsh
proc get url {
if {![regexp {http://([a-z.]+)(:[0-9]+)?(/.*)} $url _ host port path]} {
error "can't parse URL \"$url\""
}
if {$port eq {}} { set port 80 }
set ch [socket stream $host:$port]
puts -nonewline $ch "GET $path HTTP/1.0\n\n"
set content [read $ch]
if {[regexp {^HTTP[^<]+<!Doctype HTML.*<Title>URL Not Found</Title>} \
$content]} {
error {log file not found}
}
close $ch
return $content
}
 
proc grep {needle haystack} {
lsearch -all \
-inline \
-glob \
[split $haystack \n] \
*[string map {* \\* ? \\? \\ \\\\ [ \\[ ] \\]} $needle]*
}
 
proc main argv {
lassign $argv needle
set urlTemplate http://tclers.tk/conferences/tcl/%Y-%m-%d.tcl
set back 10
set now [clock seconds]
# Jim Tcl doesn't support time zones, so we add an extra day to account for
# the possible difference between the local and the server time.
for {set i -$back} {$i <= 1} {incr i} {
set date [expr {$now + $i*60*60*24}]
set url [clock format $date -format $urlTemplate]
catch {
set found [grep $needle [get $url]]
if {$found ne {}} {
puts $url\n------\n[join $found \n]\n------\n
}
}
}
}
 
main $argv

zkl

#<<<#
http://tclers.tk/conferences/tcl/:
2017-04-03.tcl 30610 bytes Apr 03, 2017 21:55:37
2017-04-04.tcl 67996 bytes Apr 04, 2017 21:57:01
...
 
Contents (eg 2017-01-19.tcl):
m 2017-01-19T23:01:02Z ijchain {*** Johannes13__ leaves}
m 2017-01-19T23:15:37Z ijchain {*** fahadash leaves}
m 2017-01-19T23:27:00Z ijchain {*** Buster leaves}
...
#<<<#
 
var [const] CURL=Import.lib("zklCurl")(); // libCurl instance
 
template:="http://tclers.tk/conferences/tcl/%4d-%02d-%02d.tcl";
ymd  :=Time.Clock.UTC[0,3]; // now, (y,m,d)
back  :=10; // days in the past
needle  :=vm.nthArg(0); // search string
foreach d in ([-back+1..0]){ // we want day -9,-8,-7..0 (today)
date :=Time.Date.subYMD(ymd, 0,0,-d); // date minus days
url  :=template.fmt(date.xplode());
haystack:=CURL.get(url); // (request bytes, header length)
haystack=haystack[0].del(0,haystack[1]); // remove HTML header
mentions:=haystack.filter("find",needle); // search lines
if(mentions) println("%s\n------\n%s------\n".fmt(url,mentions.text));
}

While zkl supports TCP natively and talking simple HTTP is easy, Curl is way easier and fully supports the protocol.

Output:
$ zkl bbb suchenwi
http://tclers.tk/conferences/tcl/2017-04-24.tcl
------
m 2017-04-24T05:33:53Z {} {suchenwi has become available}
m 2017-04-24T06:38:31Z suchenwi {Hi Arjen - and bye. off to donuts}
m 2017-04-24T06:55:57Z {} {suchenwi has left}
...
------

http://tclers.tk/conferences/tcl/2017-04-30.tcl
...
------
http://tclers.tk/conferences/tcl/2017-05-01.tcl

------
...
http://tclers.tk/conferences/tcl/2017-05-03.tcl
------
m 2017-05-03T16:19:54Z {} {suchenwi has become available}
m 2017-05-03T16:20:40Z suchenwi {/me waves}
m 2017-05-03T16:21:57Z suchenwi {I'm on countdown at work: 17 work days to go...
...