User:Bukzor: Difference between revisions

Line 19:
#*The MediaWiki API is pretty straightforward. I feel done with that part.
# grab the HTML for those pages, put them into a DOM
#*I'm having trouble getting any of the built-in HTML or XML parsers to give me a DOM. [http://docs.python.org/library/htmlparser.html htmlparser] is just a bare-bones state machine that fires callbacks instead of building a tree, and the XML parsers are too strict (<nowiki>&nbsp;</nowiki> is an 'unknown entity').
#*I've posted a Stack Overflow question on this subject [http://stackoverflow.com/questions/2676872/how-to-parse-malformed-html-in-python-using-standard-libraries here]. --Bukzor 16:31, 20 April 2010 (UTC)
#*Despite everyone agreeing that Python doesn't have a built-in HTML->DOM parser, I've parsed the site A-Z with ElementTree with minimal effort. I had to fix a bunch of invalid HTML, though. See my edits from the past couple of days for details.
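One way to get past the 'unknown entity' problem with only the standard library is sketched below. It is not necessarily the approach used for the actual run: it assumes the page is otherwise well-formed XHTML and simply rewrites named HTML entities as numeric character references before handing the text to ElementTree. The helper names <code>html_entities_to_numeric</code> and <code>parse_page</code>, and the sample markup, are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Entities that XML already understands; these must be left alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def html_entities_to_numeric(text):
    """Rewrite named HTML entities (e.g. &nbsp;) as numeric references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            # Leave XML's own entities and unknown names untouched.
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)


def parse_page(text):
    """Parse (hypothetically well-formed) XHTML text into an Element tree."""
    return ET.fromstring(html_entities_to_numeric(text))


# Illustrative input only; real pages may still need invalid HTML fixed first.
sample = "<html><body><p>score:&nbsp;42 &amp; rising</p></body></html>"
root = parse_page(sample)
paragraph = root.find("body/p")
```

This only handles the entity problem. Unclosed tags or badly nested markup will still raise a <code>ParseError</code>, so the invalid HTML on the pages themselves has to be fixed at the source.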
Line 27:
# save a report of pages->scores
 
--[[User:Bukzor|Bukzor]] 01:43, 24 April 2010 (UTC)
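The report step could look something like the sketch below. The format is an assumption (CSV, highest score first), since the notes above do not say how the report is stored, and <code>write_report</code> and the example scores are made-up placeholders.

```python
import csv
import io


def write_report(scores, stream):
    """Write a page -> score mapping as CSV, highest score first."""
    writer = csv.writer(stream)
    writer.writerow(["page", "score"])
    # Sort by descending score, then by page title so ties are stable.
    for page, score in sorted(scores.items(), key=lambda item: (-item[1], item[0])):
        writer.writerow([page, score])


# Placeholder data; real scores would come from the earlier steps.
example_scores = {"Page A": 3, "Page B": 7, "Page C": 3}
buffer = io.StringIO()
write_report(example_scores, buffer)
report_text = buffer.getvalue()
```

To write to disk instead of an in-memory buffer, pass a file opened with <code>open(path, "w", newline="")</code> as the stream.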