Rosetta Code/Rank languages by number of users
Sort the most popular programming languages by their number of users on Rosetta Code. Show the languages with at least 100 users.
A way to solve the task:
Users of a language X are those referenced in the page https://rosettacode.org/wiki/Category:X_User, or preferably https://rosettacode.org/mw/index.php?title=Category:X_User&redirect=no to avoid redirections. To find the list of such categories, first parse the entries of http://rosettacode.org/mw/index.php?title=Special:Categories&limit=5000. Then download and parse each language's user category page to count its users.
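The parsing half of this approach can be sketched in Python as follows. This is a minimal sketch, not a reference solution: the download step is left out, and the two regular expressions assume that links appear as `href="/wiki/Category:X_User"` and `href="/wiki/User:Name"` (or `User_talk:`), which is a guess about the page markup. The function names and the sample HTML are hypothetical.

```python
import re
from urllib.parse import unquote

# Assumed link shapes; the real page markup may differ.
CATEGORY_LINK = re.compile(r'href="/wiki/Category:([^"]+)_User"')
USER_LINK = re.compile(r'href="/wiki/(User(?:_talk)?:[^"]+)"')


def user_categories(categories_html):
    """Map each language name to the title of its 'X_User' category."""
    result = {}
    for match in CATEGORY_LINK.finditer(categories_html):
        slug = match.group(1)
        language = unquote(slug).replace("_", " ")
        result[language] = "Category:" + slug + "_User"
    return result


def category_url(title):
    """Build the no-redirect URL given in the task description."""
    return "https://rosettacode.org/mw/index.php?title=" + title + "&redirect=no"


def count_users(category_html):
    """Count the distinct User (or User_talk) pages linked from a category page."""
    return len(set(USER_LINK.findall(category_html)))


# Hypothetical snippets standing in for the downloaded pages.
sample_categories = (
    '<li><a href="/wiki/Category:C_User">C User</a></li>'
    '<li><a href="/wiki/Category:UNIX_Shell_User">UNIX Shell User</a></li>'
    '<li><a href="/wiki/Category:C%2B%2B_User">C++ User</a></li>'
    '<li><a href="/wiki/Category:Programming_Tasks">Programming Tasks</a></li>'
)
sample_category_page = (
    '<li><a href="/wiki/User:Alice">Alice</a></li>'
    '<li><a href="/wiki/User:Bob">Bob</a></li>'
    '<li><a href="/wiki/User_talk:Carol">User talk:Carol</a></li>'
)

print(user_categories(sample_categories))
print(count_users(sample_category_page))
```

Links to Talk pages are counted as well, in line with the remark in the task that users declared on a Talk page need no special handling.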
Sample output on 17 December 2017:
Language      Users   Rank
--------------------------
C               373      1
C++             261      2
Java            257      3
Python          243      4
JavaScript      228      5
PHP             163      6
Perl            162      7
SQL             131      8
UNIX Shell      120      9
C sharp         113     10
Pascal          109     11
BASIC           102     12
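Once the user counts are known, the filtering, sorting and ranking behind a table like the one above could look like this in Python. It is only a sketch under stated assumptions: the function names are hypothetical, ties are broken alphabetically and still receive consecutive ranks (the task does not say how ties should be handled), and the "Example" entry is made up to show the 100-user cutoff.

```python
def rank_languages(user_counts, minimum=100):
    """Return (rank, language, users) rows, most users first."""
    kept = [(lang, n) for lang, n in user_counts.items() if n >= minimum]
    # Sort by descending user count, then by name (tie-break is an assumption).
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [(rank, lang, n) for rank, (lang, n) in enumerate(kept, start=1)]


def format_table(rows):
    """Lay the rows out in three fixed-width columns."""
    lines = [f"{'Language':<12}{'Users':>7}{'Rank':>7}", "-" * 26]
    for rank, lang, n in rows:
        lines.append(f"{lang:<12}{n:>7}{rank:>7}")
    return "\n".join(lines)


# A few counts from the sample output, plus one made-up entry below the cutoff.
counts = {"Java": 257, "C": 373, "BASIC": 102, "C++": 261, "Example": 42}
print(format_table(rank_languages(counts)))
```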
A Rosetta Code user usually declares the languages they use with the mylang template. This template is expected to appear on the User page, but in some cases it appears on a user Talk page instead. It is not necessary to take this into account. For instance, among the 373 C users in the table above, 3 are actually declared on a Talk page.
Stata
<lang stata>// Download the category index; each HTML line is read into v1
// (the "@" delimiter is assumed not to occur in the page).
copy "http://rosettacode.org/mw/index.php?title=Special:Categories&limit=5000" categ.html, replace
import delimited categ.html, delim("@") enc("utf-8") clear
keep if ustrpos(v1,"/wiki/Category:") & ustrpos(v1,"_User")

// s = page title taken from the href attribute, e.g. Category:C_User
gen i = ustrpos(v1,"href=")
gen j = ustrpos(v1,char(34),i+1)
gen k = ustrpos(v1,char(34),j+1)
gen s = usubstr(v1,j+7,k-j-7)

// lang = link text without the trailing " User"
replace i = ustrpos(v1,"title=")
replace j = ustrpos(v1,">",i+1)
replace k = ustrpos(v1," User",j+1)
gen lang = usubstr(v1,j+1,k-j-1)
keep s lang
gen users=.

// Download each category page and count the lines that link to a user page
forval i=1/`c(N)' {
	preserve
	copy `"https://rosettacode.org/mw/index.php?title=`=s[`i']'&redirect=no"' `i'.html, replace
	import delimited `i'.html, delim("@") enc("utf-8") clear
	count if ustrpos(v1,"/wiki/User")
	local m `r(N)'
	restore
	replace users=`m' in `i'
	erase `i'.html
}

// Rank by descending number of users and show languages with at least 100
drop s
gsort -users lang
list if users>=100
save rc_users, replace</lang>
Output
     +---------------------+
     |       lang    users |
     |---------------------|
  1. |          C      373 |
  2. |        C++      261 |
  3. |       Java      257 |
  4. |     Python      243 |
  5. | JavaScript      228 |
     |---------------------|
  6. |        PHP      163 |
  7. |       Perl      162 |
  8. |        SQL      131 |
  9. | UNIX Shell      120 |
 10. |    C sharp      113 |
     |---------------------|
 11. |     Pascal      109 |
 12. |      BASIC      102 |
     +---------------------+