Talk:Web scraping: Difference between revisions

:I think it is worthwhile for the HTML parsing to be a separate task, because different languages are optimized for parsing vs. networking. For instance, [[XSLT]] is fine for parsing complex HTML, but incapable of pulling a page by itself. --[[User:IanOsgood|IanOsgood]] 18:10, 10 September 2008 (UTC)
::Agreed. More complex HTML parsing should be its own task. The focus of this one should be getting the code from the remote site and then doing a basic operation with it. If an HTML task is created, it should also go in the Networking and Web Interaction category and should probably be linked to from here. --[[User:Mwn3d|Mwn3d]] 20:10, 10 September 2008 (UTC)
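
The split discussed in this thread can be sketched as two independent steps: one that only fetches, and one that only parses. This is an illustrative sketch, not a proposed task solution. The names <code>fetch</code>, <code>TitleParser</code> and <code>parse_title</code> are hypothetical, and pulling out the page title stands in for the "basic operation"; only Python's standard library is assumed.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch(url):
    """Networking step: return the page body as text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class TitleParser(HTMLParser):
    """Parsing step: collect the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def parse_title(html):
    """Run the parser on an HTML string; no network access is needed."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()
```

Because <code>parse_title</code> takes a plain string, the parsing half could be replaced by a tool that cannot fetch pages on its own, while <code>fetch</code> stays unchanged.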

== The downside of web scraping ==

I noticed that several solutions anchored their match on "UTC" to the end of the line, but the page now outputs "UTC Universal Time" instead, so those patterns no longer match. --[[User:Glennj|glennj]] 15:32, 12 August 2009 (UTC)
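
To illustrate the breakage described in the comment above, here is a minimal sketch in Python. The two sample lines are hypothetical stand-ins for the old and new page output (the real markup may differ), and <code>ANCHORED</code>, <code>TOLERANT</code> and <code>extract_time</code> are names made up for this example.

```python
import re

# Hypothetical sample lines; the real page's markup may differ.
OLD_LINE = "<BR>Aug. 12, 15:32:00 UTC"
NEW_LINE = "<BR>Aug. 12, 15:32:00 UTC Universal Time"

# Anchored to end of line: matches only if nothing follows "UTC".
ANCHORED = re.compile(r"(\d{2}:\d{2}:\d{2}) UTC$")

# Unanchored: only requires a word boundary after "UTC".
TOLERANT = re.compile(r"(\d{2}:\d{2}:\d{2}) UTC\b")


def extract_time(pattern, line):
    """Return the captured HH:MM:SS string, or None if there is no match."""
    match = pattern.search(line)
    return match.group(1) if match else None
```

With these samples, the anchored pattern finds the time in <code>OLD_LINE</code> but returns <code>None</code> for <code>NEW_LINE</code>, while the unanchored pattern handles both. Dropping the anchor makes a solution less sensitive to trailing text, though any scraper remains tied to the page's current layout.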