User talk:Paddy3118/Vanity Search

From Rosetta Code

Archived From: User talk:Paddy3118

RC_vanity_search.py

Sad, but I like to see how my tasks are faring over time:

<lang python>
'''
Rosetta Code Vanity search:
    How many new pages has someone created?
'''
# Python 2 script: uses urllib.urlopen and print statements.

import urllib, re

user = 'Paddy3118'

site = 'http://www.rosettacode.org'
nextpage = site + '/wiki/Special:Contributions/' + user
nextpage_re = re.compile(
    r'<a href="([^"]+)" title="[^"]+" rel="next">older ')

newpages = []
pagecount = 0
while nextpage:
    page = urllib.urlopen(nextpage)
    pagecount += 1
    nextpage = ''
    for line in page:
        if not nextpage:
            # Search for URL to next page of results for download
            nextpage_match = re.search(nextpage_re, line)
            if nextpage_match:
                # hrefs in the HTML escape '&' as '&amp;'
                nextpage = (site + nextpage_match.groups()[0]).replace('&amp;', '&')
                #print nextpage
                npline = line
        # The marker string was lost from this listing; class="newpage" is
        # assumed to be what tags the "N" (new page) flag in the HTML.
        if 'class="newpage"' in line:
            # extract N page name from title
            newpages.append(line.partition(' title="')[2].partition('"')[0])
    page.close()

nontalk = [p for p in newpages if ':' not in p]

print "User: %s has created %i new pages of which %i were not Talk: pages, from approx %i edits" % (
    user, len(newpages), len(nontalk), pagecount*50 )

print "New pages created, in order, are:\n ",
print "\n  ".join(nontalk[::-1])


# Now rank the pages created by page views
nextpage = site + '/w/index.php?title=Special:PopularPages'
nextpage_re = re.compile(
    r'<a href="([^"]+)" class="mw-nextlink">next ')

# One list item per page: title, then "(N views)"
data_re = re.compile(
    r'^<li><a href="[^"]+" title="([^"]+)".*</a>.*\(([0-9,]+) views\)' )
title2rankviews = {}
rank = 1
pagecount = 0
while nextpage:
    page = urllib.urlopen(nextpage)
    pagecount += 1
    nextpage = ''
    for line in page:
        if not nextpage:
            # Search for URL to next page of results for download
            nextpage_match = re.search(nextpage_re, line)
            if nextpage_match:
                nextpage = (site + nextpage_match.groups()[0]).replace('&amp;', '&')
                # print nextpage
                npline = line
        datamatch = re.search(data_re, line)
        if datamatch:
            title, views = datamatch.groups()
            views = int(views.replace(',', ''))
            title2rankviews[title] = [rank, views]
            rank += 1
    page.close()

print "\n\n Highest page Ranks for user pages:"
fmt = "  %-4s %-6s %s"   # rank, views, title
print fmt % ('RANK', 'VIEWS', 'TITLE')
# Pages missing from the popularity list sort last with rank 99999
highrank = [title2rankviews.get(t, [99999, 0]) + [t] for t in nontalk]
highrank.sort()
for x in highrank:
    print fmt % tuple(x)
</lang>
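For anyone wanting to check the popularity regex without hitting the site, here is a minimal sketch in Python 3 (the script itself is Python 2). It reuses the `data_re` pattern; the function name `parse_popular_line` and the sample HTML line are made up for illustration, and the leading `<li>` is an assumption about how Special:PopularPages lays out each entry.

```python
import re

# Same pattern as data_re in the script; the leading <li> is an assumption
# about how each Special:PopularPages entry is marked up.
DATA_RE = re.compile(
    r'^<li><a href="[^"]+" title="([^"]+)".*</a>.*\(([0-9,]+) views\)')


def parse_popular_line(line):
    """Return (title, views) for one list entry, or None if it does not match."""
    match = DATA_RE.search(line)
    if match is None:
        return None
    title, views = match.groups()
    # Strip the thousands separators before converting to int.
    return title, int(views.replace(',', ''))


# Hypothetical line, shaped like the HTML the regex expects.
sample = ('<li><a href="/wiki/Knapsack_Problem" title="Knapsack Problem">'
          'Knapsack Problem</a> (2,501 views)</li>')
print(parse_popular_line(sample))
```

Keeping the parsing in a small function like this makes it possible to test the pattern on saved lines before running the full download loop.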

    Sample output on 21:28, 4 June 2009

    User: Paddy3118 has created 52 new pages of which 27 were not Talk: pages, from approx 500 edits
    New pages created, in order, are:
      Spiral
      Monty Hall simulation
      Web Scraping
      Sequence of Non-squares
      Anagrams
      Max Licenses In Use
      One dimensional cellular automata
      Conway's Game of Life
      Data Munging
      Data Munging 2
      Column Aligner
      Probabilistic Choice
      Knapsack Problem
      Yuletide Holiday
      Common number base conversions
      Octal
      Integer literals
      Command Line Interpreter
      First-class functions
      Y combinator
      Functional Composition
      Exceptions Through Nested Calls
      Look-and-say sequence
      Mutual Recursion
      Bulls and Cows
      Testing a Function
      Select
    
    
     Highest page Ranks for user pages:
      RANK VIEWS  TITLE
      102  2442   Monty Hall simulation
      106  2294   Knapsack Problem
      109  2234   Conway's Game of Life
      141  1798   Anagrams
      214  1131   Web Scraping
      218  1087   Max Licenses In Use
      230  1022   Spiral
      231  997    One dimensional cellular automata
      257  825    Sequence of Non-squares
      258  823    Yuletide Holiday
      274  762    Column Aligner
      314  645    Data Munging 2
      318  627    Data Munging
      320  623    Probabilistic Choice
      322  620    Y combinator
      323  614    First-class functions
      374  494    Command Line Interpreter
      385  446    Functional Composition
      403  412    Integer literals
      412  404    Mutual Recursion
      417  388    Bulls and Cows
      438  336    Look-and-say sequence
      439  336    Common number base conversions
      450  293    Octal
      468  250    Exceptions Through Nested Calls
      661  75     Select
      677  56     Testing a Function
    >>> 

    From which I deduce there must be a lot of students out there being set a knapsack-like problem ;-)

    Sample output on 21:24, 17 June 2009 (UTC)

    User: Paddy3118 has created 57 new pages of which 29 were not Talk: pages, from approx 550 edits
    New pages created, in order, are:
      Spiral
      Monty Hall simulation
      Web Scraping
      Sequence of Non-squares
      Anagrams
      Max Licenses In Use
      One dimensional cellular automata
      Conway's Game of Life
      Data Munging
      Data Munging 2
      Column Aligner
      Probabilistic Choice
      Knapsack Problem
      Yuletide Holiday
      Common number base formatting
      Octal
      Integer literals
      Command Line Interpreter
      First-class functions
      Y combinator
      Functional Composition
      Exceptions Through Nested Calls
      Look-and-say sequence
      Mutual Recursion
      Bulls and Cows
      Testing a Function
      Select
      Sort stability
      Moving Average
    
    
     Highest page Ranks for user pages:
      RANK VIEWS  TITLE
      101  2501   Knapsack Problem
      102  2499   Monty Hall simulation
      108  2309   Conway's Game of Life
      137  1876   Anagrams
      201  1283   One dimensional cellular automata
      214  1162   Web Scraping
      220  1114   Max Licenses In Use
      226  1073   Spiral
      258  844    Yuletide Holiday
      259  842    Sequence of Non-squares
      271  804    Column Aligner
      297  707    Y combinator
      310  685    Data Munging 2
      311  683    First-class functions
      315  666    Data Munging
      320  649    Probabilistic Choice
      364  525    Command Line Interpreter
      377  487    Mutual Recursion
      378  480    Functional Composition
      391  456    Bulls and Cows
      398  447    Integer literals
      425  386    Common number base formatting
      439  361    Look-and-say sequence
      449  320    Octal
      458  288    Exceptions Through Nested Calls
      518  184    Sort stability
      556  146    Testing a Function
      577  122    Select
      724  4      Moving Average

    So in two weeks I've created two new tasks, and the Knapsack Problem has finally moved to become my top viewed task.
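The comparison between two runs could also be scripted rather than done by eye. A rough Python 3 sketch, using only the top three rows copied from the 4 June and 17 June outputs above; the names `june_04`, `june_17` and `movement` are invented for this example and are not part of RC_vanity_search.py.

```python
# (rank, views) per title, copied from the 4 June and 17 June 2009 runs.
june_04 = {
    'Monty Hall simulation': (102, 2442),
    'Knapsack Problem': (106, 2294),
    "Conway's Game of Life": (109, 2234),
}
june_17 = {
    'Knapsack Problem': (101, 2501),
    'Monty Hall simulation': (102, 2499),
    "Conway's Game of Life": (108, 2309),
}


def movement(old, new):
    """Return [(title, places_gained, extra_views)] for titles in both runs."""
    rows = []
    for title, (new_rank, new_views) in new.items():
        if title in old:
            old_rank, old_views = old[title]
            # A smaller rank number is better, so old - new is places gained.
            rows.append((title, old_rank - new_rank, new_views - old_views))
    # Biggest gain in views first.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


for title, places, views in movement(june_04, june_17):
    print("%-24s %+3d places  %+5d views" % (title, places, views))
```

Pages that appear in only one run (such as newly created tasks) are skipped here; they would need separate handling.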

    Sample output on 07:06, 25 June 2009 (UTC)

    I have added no new pages, but I have broken into the top 100 list of page views!!:

     Highest page Ranks for user pages:
      RANK VIEWS  TITLE
      98   2595   Knapsack Problem
      102  2542   Monty Hall simulation
      108  2361   Conway's Game of Life
      133  1937   Anagrams
      199  1312   One dimensional cellular automata
      215  1179   Web Scraping
      220  1133   Max Licenses In Use
      222  1113   Spiral
      257  872    Yuletide Holiday
      260  870    Sequence of Non-squares
      267  829    Column Aligner
      296  739    Y combinator
      300  720    First-class functions
      311  695    Data Munging 2
      316  684    Data Munging
      320  664    Probabilistic Choice
      355  563    Command Line Interpreter
      373  519    Mutual Recursion
      376  517    Functional Composition
      380  496    Bulls and Cows
      394  468    Integer literals
      418  421    Common number base formatting
      435  378    Look-and-say sequence
      450  336    Octal
      460  305    Exceptions Through Nested Calls
      504  214    Sort stability
      543  169    Testing a Function
      563  146    Select
      582  130    Moving Average

    I did like writing the story around that Knapsack task.