(For up-to-date information (outages, ...) about this dataset, please consult the dataset's wiki page.)
Each request for a page, whether for editing or reading, whether for a "special page" such as a log of actions generated on the fly or for an article from Wikipedia or one of the other projects, reaches one of our squid caching hosts. The request is then sent via UDP to a filter, which discards requests from our internal hosts as well as requests for wikis that aren't among our general projects. This filter writes out the project name, the size of the page requested, and the title of the page requested.
Here are a few sample lines from one file:
fr.b Special:Recherche/Achille_Baraguey_d%5C%27Hilliers 1 624
fr.b Special:Recherche/Acteurs_et_actrices_N 1 739
fr.b Special:Recherche/Agrippa_d/%27Aubign%C3%A9 1 743
fr.b Special:Recherche/All_Mixed_Up 1 730
fr.b Special:Recherche/Andr%C3%A9_Gazut.html 1 737
In the above, the first column "fr.b" is the project name. The following abbreviations are used:
The second column is the title of the page retrieved, the third column is the number of requests, and the fourth column is the size of the content returned.
These are hourly statistics, so in the line
en Main_Page 242332 4737756101
we see that the main page of the English language Wikipedia was requested over 240 thousand times during the specific hour. These are not unique visits.
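As an illustration of the four-column layout described above, here is a minimal Python sketch that splits one line into its fields. The names `PageCount` and `parse_pagecount_line` are hypothetical and not part of any tool that produces these files. The sketch assumes fields are separated by single spaces and that titles are percent-encoded, as they appear in the sample lines.

```python
from typing import NamedTuple
from urllib.parse import unquote


class PageCount(NamedTuple):
    """One line of an hourly pagecount file (hypothetical helper type)."""
    project: str   # project name, e.g. "en" or "fr.b"
    title: str     # page title, percent-decoded
    requests: int  # number of (non-unique) requests in the hour
    size: int      # size of the content returned


def parse_pagecount_line(line: str) -> PageCount:
    # Split the two numeric columns off the right-hand side first, so a
    # stray space inside the title does not shift the numeric fields.
    rest, requests, size = line.rstrip("\n").rsplit(" ", 2)
    project, title = rest.split(" ", 1)
    return PageCount(project, unquote(title), int(requests), int(size))


record = parse_pagecount_line("en Main_Page 242332 4737756101")
print(record.project, record.title, record.requests, record.size)
```

Decoding the title with `unquote` is a convenience for display only; keep the raw title if you need to match it against other lines in the same file.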
In some directories you will see files which have names starting with "projectcount". These are total views per hour per project, generated by summing up the entries in the pagecount files. The first entry in a line is the project name, the second is the number of non-unique views, and the third is the total number of bytes transferred.
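The summing described above can be sketched roughly as follows. This is an assumption about the general approach, not the actual code used to generate the projectcount files. The function name `summarize_projects` is hypothetical, and lines that do not have exactly four fields are simply skipped.

```python
from collections import defaultdict


def summarize_projects(lines):
    """Sum requests and bytes per project from pagecount lines."""
    totals = defaultdict(lambda: [0, 0])  # project -> [views, bytes]
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed lines (assumed handling)
        project, _title, requests, size = parts
        totals[project][0] += int(requests)
        totals[project][1] += int(size)
    return {project: tuple(sums) for project, sums in totals.items()}


sample = [
    "fr.b Special:Recherche/All_Mixed_Up 1 730",
    "fr.b Special:Recherche/Andr%C3%A9_Gazut.html 1 737",
    "en Main_Page 242332 4737756101",
]
totals = summarize_projects(sample)
for project, (views, nbytes) in sorted(totals.items()):
    # Same column order as a projectcount line: project, views, bytes.
    print(project, views, nbytes)
```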
Domas Mituzas, a long-time volunteer db admin for WMF, started generating these statistics in 2007. Some of the older files (from 2010 through at least mid-2011) are also available at the Internet Archive thanks to Federico Leva.
The dataset is currently (2015) maintained by the Analytics team.
Up to 2015, the dataset was produced by Webstatscollector.
From 2015 onwards, the dataset is produced by stripping the extra information out of Pagecounts-all-sites.