Yahoo! search interface: Difference between revisions
(Added Go)
Line 250:
End</lang>

'''[http://www.cogier.com/gambas/Yahoo!%20search%20interface.png Click here to see output (I have typed 'rosettacode' in the search box)]'''

=={{header|Go}}==
Yahoo! has evidently changed its search output format over the years and, if the current format is documented anywhere, I couldn't find it.

The regular expression used below was worked out by studying the raw HTML and works as of 18 November 2019.
<lang go>package main

import (
	"fmt"
	"html"
	"io"
	"log"
	"net/http"
	"net/url"
	"regexp"
	"strings"
)

var (
	expr = `<h3 class="title"><a class=.*?href="(.*?)".*?>(.*?)</a></h3>` +
		`.*?<div class="compText aAbs" ><p class=.*?>(.*?)</p></div>`
	rx = regexp.MustCompile(expr)
)

type YahooResult struct {
	title, url, content string
}

func (yr YahooResult) String() string {
	return fmt.Sprintf("Title : %s\nUrl : %s\nContent: %s\n", yr.title, yr.url, yr.content)
}

type YahooSearch struct {
	query string
	page  int
}

// clean strips the <b> highlighting tags and decodes HTML entities.
func clean(s string) string {
	s = strings.ReplaceAll(strings.ReplaceAll(s, "<b>", ""), "</b>", "")
	return html.UnescapeString(s)
}

// results fetches one page of search results and extracts them with rx.
func (ys YahooSearch) results() []YahooResult {
	// Escape the query so that spaces and punctuation survive in the URL.
	search := fmt.Sprintf("http://search.yahoo.com/search?p=%s&b=%d",
		url.QueryEscape(ys.query), ys.page*10+1)
	resp, err := http.Get(search)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	s := string(body)
	var results []YahooResult
	for _, f := range rx.FindAllStringSubmatch(s, -1) {
		yr := YahooResult{}
		yr.title = clean(f[2])
		yr.url = f[1]
		yr.content = clean(f[3])
		results = append(results, yr)
	}
	return results
}

func (ys YahooSearch) nextPage() YahooSearch {
	return YahooSearch{ys.query, ys.page + 1}
}

// firstN returns at most n leading results, so a short page cannot
// cause a slice-bounds panic.
func firstN(rs []YahooResult, n int) []YahooResult {
	if len(rs) < n {
		return rs
	}
	return rs[:n]
}

func main() {
	ys := YahooSearch{"rosettacode", 0}
	// Limit output to first 5 entries, say, from pages 1 and 2.
	fmt.Print("PAGE 1 =>\n\n")
	for _, res := range firstN(ys.results(), 5) {
		fmt.Println(res)
	}
	fmt.Print("PAGE 2 =>\n\n")
	for _, res := range firstN(ys.nextPage().results(), 5) {
		fmt.Println(res)
	}
}</lang>

{{out}}
Note there is some repetition between the pages.
<pre>
PAGE 1 =>

Title : Rosetta Code
Url : https://rosettacode.org/wiki/Rosetta_Code
Content: Rosetta Code Rosetta Code is a programming chrestomathy site. Rosetta Code currently has 976 tasks, 231 draft tasks, and is aware of 756 languages, though we do not (and cannot) have solutions to every task in every language. 1 Places to start

Title : Rosetta Code - Wikipedia
Url : https://en.wikipedia.org/wiki/Rosetta_Code
Content: Rosetta Code is a wiki -based programming chrestomathy website with implementations of common algorithms and solutions to various programming problems in many different programming languages. 1 Website 1.1 Data and structure 1.2 Languages

Title : Rosetta Code (@rosettacode) | Twitter
Url : https://twitter.com/rosettacode
Content: The latest Tweets from Rosetta Code (@rosettacode). Twitter account for http://t.co/DuRZFWDfRn. The general idea here is for short announcements and the like. The ...

Title : Best of Rosettacode
Url : https://examples.p6c.dev/categories/best-of-rosettacode.html
Content: 99 Problems Rosettacode Cookbook Euler Games Interpreters Modules Other Grammars Perlmonks Rosalind Shootout ...

Title : Rosetta Code Blog
Url : https://blog.rosettacode.org/
Content: (If you point 'rosettacode.com' to RosettaCode.org's IP address, you should still be able to see it) Second, I don't care if you want to use the name 'rosettacode' or 'rosetta code' in similar pursuits. I love that people have been calling task pages that have cropped up on various forums around the web as "rosetta code problems." That speaks ...

PAGE 2 =>

Title : Rosetta Code | R-bloggers
Url : https://www.r-bloggers.com/rosetta-code/
Content: Rosetta Code is a programming chrestomathy site. The idea is to present solutions to the same task in as many different languages as possible, to demonstrate how languages are similar and different, and to aid a person with a grounding in one approach to a problem in learning another.

Title : Best of Rosettacode
Url : https://examples.p6c.dev/categories/best-of-rosettacode.html
Content: 99 Problems Rosettacode Cookbook Euler Games Interpreters Modules Other Grammars Perlmonks Rosalind Shootout ...

Title : Rosetta Code Blog
Url : https://blog.rosettacode.org/
Content: (If you point 'rosettacode.com' to RosettaCode.org's IP address, you should still be able to see it) Second, I don't care if you want to use the name 'rosettacode' or 'rosetta code' in similar pursuits. I love that people have been calling task pages that have cropped up on various forums around the web as "rosetta code problems." That speaks ...

Title : What exactly is the purpose of Rosetta Code? - Quora
Url : https://www.quora.com/What-exactly-is-the-purpose-of-Rosetta-Code
Content: The name is a play on the Rosetta Stone. The Rosetta Stone featured a decree by King Ptolomy written in three scripts - Egyption hieroglyphs, Demotic, and Ancient Greek.

Title : One R Tip A Day: Rosetta Code
Url : https://onertipaday.blogspot.com/2009/07/rosetta-code.html
Content: Today I'd like to suggest the interesting Rosetta Code site: Rosetta Code is a programming chrestomathy site. The idea is to present solutions to the same task in as many different languages as possible, to demonstrate how languages are similar and different, and to aid a person with a grounding in one approach to a problem in learning another.

</pre>
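
The repetition between pages could be filtered out by remembering which URLs have already been printed. The following is a minimal sketch and not part of the solution above: the <code>result</code> type and the <code>dedupe</code> helper are hypothetical names introduced for illustration, and the sample data is a hand-picked subset of the URLs in the output rather than a live query.

```go
package main

import "fmt"

// result is a hypothetical stand-in for the YahooResult type used above.
type result struct {
	title, url string
}

// dedupe returns the entries of page whose URL is not yet in seen,
// and records each returned URL in seen so that later pages skip it.
func dedupe(page []result, seen map[string]bool) []result {
	var fresh []result
	for _, r := range page {
		if seen[r.url] {
			continue
		}
		seen[r.url] = true
		fresh = append(fresh, r)
	}
	return fresh
}

func main() {
	// Hand-picked sample data; the blog entry appears on both pages.
	page1 := []result{
		{"Rosetta Code", "https://rosettacode.org/wiki/Rosetta_Code"},
		{"Rosetta Code Blog", "https://blog.rosettacode.org/"},
	}
	page2 := []result{
		{"Rosetta Code | R-bloggers", "https://www.r-bloggers.com/rosetta-code/"},
		{"Rosetta Code Blog", "https://blog.rosettacode.org/"},
	}
	seen := make(map[string]bool)
	for i, page := range [][]result{page1, page2} {
		fmt.Printf("PAGE %d =>\n", i+1)
		for _, r := range dedupe(page, seen) {
			fmt.Println(r.title, "-", r.url)
		}
	}
}
```

Sharing one <code>seen</code> map across calls is what lets a later page drop entries already shown on an earlier one; keying on the URL rather than the title is a design choice, since two different pages can share a title.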
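
To see what the regular expression captures without making a network request, the sketch below applies the same expression to a hand-written fragment. The fragment is an assumption modelled on the pattern itself, not real Yahoo! markup (the class names <code>ac-algo</code> and <code>lh-16</code> are placeholders), and <code>extract</code> is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// Same expression as in the solution above.
var rx = regexp.MustCompile(`<h3 class="title"><a class=.*?href="(.*?)".*?>(.*?)</a></h3>` +
	`.*?<div class="compText aAbs" ><p class=.*?>(.*?)</p></div>`)

// sample is hand-written to fit the pattern; it is not real Yahoo! output.
const sample = `<h3 class="title"><a class="ac-algo" href="https://rosettacode.org/wiki/Rosetta_Code" target="_blank"><b>Rosetta</b> Code</a></h3>` +
	`<div class="compText aAbs" ><p class="lh-16">A programming chrestomathy site &amp; wiki.</p></div>`

// extract (hypothetical helper) returns the URL, title and content of the
// first match, with <b> tags removed and HTML entities decoded.
func extract(page string) (url, title, content string, ok bool) {
	m := rx.FindStringSubmatch(page)
	if m == nil {
		return "", "", "", false
	}
	clean := func(s string) string {
		return html.UnescapeString(strings.NewReplacer("<b>", "", "</b>", "").Replace(s))
	}
	// m[1] is the href, m[2] the link text, m[3] the summary paragraph.
	return m[1], clean(m[2]), clean(m[3]), true
}

func main() {
	url, title, content, ok := extract(sample)
	if !ok {
		fmt.Println("no match: the markup no longer fits the pattern")
		return
	}
	fmt.Println("Title :", title)
	fmt.Println("Url :", url)
	fmt.Println("Content:", content)
}
```

Because the pattern depends on exact class names and even on the space before <code>></code> in the <code>div</code> tag, a check like this against a saved copy of the page is a quick way to tell whether the markup has changed again.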
=={{header|GUISS}}==