Talk:Yahoo! search interface: Difference between revisions


Revision as of 16:29, 20 September 2011

3,995 bytes added, 12 years ago

→‎Enchanced results

Anonymous user

24.85.131.247

Revision as of 19:44, 3 May 2009 by MikeMol (A disclaimer is insufficient.)
Latest revision as of 16:29, 20 September 2011 by 24.85.131.247 (→ Enchanced results)
(19 intermediate revisions by 8 users not shown)
==TOS violation==
:Note -- the following applies to an earlier version of the task but not the current task.
This task violates [http://www.google.com/accounts/TOS Google's TOS].

A Google API token isn't easy to get. Some languages (like .NET) send a real browser User-Agent, such as "Mozilla/5.0 (Firefox 3.0; X11)"; in other languages, like Python, you need a "hack" because the default User-Agent is "Python/urllib2.0". Searching Google can sometimes be very useful for some people. I think that putting a disclaimer at the top of the page is a nice solution. --[[User:Guga360|Guga360]]
: Normally, I'd be fine with hosting code that, when used, violates someone's TOS. The existence of the code isn't illegal in the country where RC is hosted (with the exception of copyright protection, DRM and the DMCA). The worst most entities can do is send me a DMCA notice (my contact details are easy to find via WHOIS), I take down the code, and that's that. If ''Google'' gets upset with Rosetta Code, they can strike RC from their search index, and we lose 68% of our traffic and exposure. And, AFAIK, there's no recourse short of getting something like the Slashdot community up in arms. A DMCA notice is one thing; I can fight or fold. Getting dropped from Google's search index is rather like being cut off at the knees. --[[User:Short Circuit|Short Circuit]] 19:44, 3 May 2009 (UTC)
::The task title can be changed, the page can be disallowed in robots.txt, and a disclaimer can be added. But I think that this is insufficient too. Feel free to remove the task. --[[User:Guga360|Guga360]]
: I've done a few experiments: using wget directly results in an HTTP 403 response. Changing the User-Agent (even to an empty string, or to a non-existent one like MikeyMouse/1.0!) worked... '''But'''... I've read the Yahoo! TOS and tried wget on Yahoo!, and it works even without changing the User-Agent. So maybe this same task can be changed to use Yahoo! instead?
(Anyone with better English could check the TOS, to see whether it really does not disallow what Google disallows.) --[[User:ShinTakezou|ShinTakezou]] 21:50, 3 May 2009 (UTC)
::I would think that most search providers would not like programmatic searches, as they are so easily abused. What would we lose by just dropping this task, compared with what could be lost if we include it? --[[User:Paddy3118|Paddy3118]] 22:29, 3 May 2009 (UTC)
::: I've not read the task specs (just the title). If the task was interesting, we lose an interesting task; if it was not, ... But if the task's creator would like to keep the idea, then s/he could take a look at Yahoo. Of course, even though Yahoo's TOS allows for automated use of their ''service'', this does not mean we (or anybody else) can flood it with search requests! And what kind of world would it be if I took perfectly legal and TOS-compliant code from RC, then ''abused it'', and RC got into trouble for that reason?! (I am thinking about the TOS of Yahoo... it seems to me that an automated script using the Yahoo search engine wouldn't violate their TOS, so they have no reason to consider RC responsible for any ''abuse'' of the code... if such a relationship were possible between an ''abuser'' and the source of the code which made the abuse possible, sites like [http://www.w3.org/Library/ this] or [http://pavuk.sourceforge.net/ this] shouldn't be indexed at all!) --[[User:ShinTakezou|ShinTakezou]] 00:10, 4 May 2009 (UTC)
::::OK, I'm converting to Yahoo. And I'll add Python after this. --[[User:Guga360|Guga360]] 02:10, 4 May 2009 (UTC)
:::::Wow, faster than any other answer to my points :D ! Have you checked Yahoo's TOS? I would like to have confirmation of my interpretation of the text... I've not seen a section similar to Google's "section five", but ... those legal texts always give me headaches. --[[User:ShinTakezou|ShinTakezou]] 10:00, 4 May 2009 (UTC)

== Enchanced results ==
What do you mean by "enchanced results" not working?
(I suppose it should be "enhanced"; anyway the question is the same: what do you mean by "enhanced results", and is it a task requirement?) --[[User:ShinTakezou|ShinTakezou]] 22:34, 4 May 2009 (UTC)
:This is an enchanced result:
:[[Image:YahooEnch.jpg]]
:This is a normal result:
:[[Image:Yahoo.jpg]]
:A simple regex change should fix it.
:Or just change your Yahoo settings to not display enchanced results, and use Yahoo cookies in the HTTP request.
:: OK, this means that handling them properly in the code is not a task requirement... Right? --[[User:ShinTakezou|ShinTakezou]] 09:43, 5 May 2009 (UTC)

No, it isn't a task requirement, but it is VERY RECOMMENDED, because some result titles will look like "Test - Wikipedia<nowiki><div id="ench-smb">...</div></nowiki>". There are 3 methods for fixing this:

1. Change the regular expressions to match enchanced results correctly.
2. Change the Yahoo settings so that enchanced results are not shown, and use your cookie (as Guest) in the HTTP request.
3. Remove anything after a "<div" in a result title.
--[[User:Guga360|Guga360]] 13:16, 5 May 2009 (GMT -3)

I don't know regular expressions very well. Can someone convert <lang python>i[:i.index("</a></h3></div>")]</lang> into a regular expression? --[[User:Guga360|Guga360]] 17:02, 5 May 2009 (UTC)

== Haha! ==
Alarms are going off at Yahoo! "Look, look! Our search engine is being used!"
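
== Enchanced results: regex sketch ==
Coming back to the question under "Enchanced results" about turning the slice into a regular expression: the following is only a minimal sketch, not the task's actual solution. It assumes the end marker is exactly the string quoted in that question, and the <code>sample</code> fragment is made up for illustration (only the "ench-smb" div comes from the discussion). The function names are hypothetical. It also shows method 3 from the list above, dropping everything from the first "<div" in a title.

```python
import re

# End marker taken from the slice expression quoted in the discussion.
END_MARKER = "</a></h3></div>"


def title_by_slice(fragment):
    """Original approach: everything before the first END_MARKER."""
    return fragment[:fragment.index(END_MARKER)]


# Regex counterpart: lazily capture everything up to the first END_MARKER.
# re.DOTALL lets "." match newlines too, as the slice would.
TITLE_RE = re.compile(r"(.*?)" + re.escape(END_MARKER), re.DOTALL)


def title_by_regex(fragment):
    """Return the text before the first END_MARKER, using a regex."""
    match = TITLE_RE.match(fragment)
    if match is None:
        raise ValueError("end marker not found")
    return match.group(1)


def strip_enhanced(title):
    """Method 3: drop everything from the first "<div" onwards."""
    return re.sub(r"<div.*", "", title, flags=re.DOTALL).rstrip()


# Hypothetical fragment of a result page, for illustration only.
sample = 'Test - Wikipedia<div id="ench-smb">...</div></a></h3></div> rest'

if __name__ == "__main__":
    raw_title = title_by_regex(sample)
    print(raw_title)
    print(strip_enhanced(raw_title))
```

The lazy <code>(.*?)</code> matters here: a greedy <code>(.*)</code> would run up to the ''last'' end marker in the fragment, whereas <code>index</code> stops at the first one. Titles without a "<div" pass through <code>strip_enhanced</code> unchanged.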