Dataset added by Infochimps
This dataset records which pages are visited and how often. The statistics come from a Squid access-log stream. That stream is fed to a profiling agent (webstatscollector), which writes hourly snapshots in a very simple plain-text format. The data is useful for noticing unusual activity and for spotting trends. Specific events show up very clearly, whether a movie premiere, a national holiday, or a scandal.
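The following minimal sketch shows how two consecutive hourly snapshots could be compared to flag sudden jumps in traffic. The line format is an assumption, since the description only calls it "very trivial". It assumes one space-separated record per line of the form `project title count bytes`, with titles containing no spaces. The function names `parse_counts` and `spikes` and the thresholds `min_count` and `ratio` are hypothetical choices made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (project, page title)


def parse_counts(lines: Iterable[str]) -> Dict[Key, int]:
    """Map (project, title) to its request count for one hourly snapshot.

    ASSUMPTION: each line is 'project title count bytes', space-separated.
    """
    counts: Dict[Key, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or not parts[2].isdigit():
            continue  # skip malformed or unexpected lines
        project, title, count, _bytes = parts
        key = (project, title)
        counts[key] = counts.get(key, 0) + int(count)
    return counts


def spikes(prev: Dict[Key, int], curr: Dict[Key, int],
           min_count: int = 100, ratio: float = 5.0) -> List[Tuple[Key, int, int]]:
    """Return (key, previous, current) for titles whose hourly count jumped.

    A title is flagged when it has at least `min_count` requests this hour
    and at least `ratio` times its previous-hour count (a missing title is
    treated as 1 so that new titles can still be flagged).
    """
    result = []
    for key, n in curr.items():
        before = prev.get(key, 0)
        if n >= min_count and n >= ratio * max(before, 1):
            result.append((key, before, n))
    result.sort(key=lambda item: item[2], reverse=True)  # biggest first
    return result


if __name__ == "__main__":
    prev_hour = ["en Main_Page 1000 5000", "en Some_Film 20 100"]
    curr_hour = ["en Main_Page 1100 5500", "en Some_Film 900 4000"]
    print(spikes(parse_counts(prev_hour), parse_counts(curr_hour)))
```

A ratio test like this catches event-driven spikes such as premieres or scandals while ignoring steadily popular pages. A real analysis would likely compare against a longer baseline than a single previous hour.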
A typical snapshot contains about 3.5 million page titles and is over 100 MB uncompressed. Entries are grouped by project and listed in roughly alphabetical order.
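Each snapshot exceeds 100 MB uncompressed, so streaming it line by line is preferable to loading it whole. The minimal sketch below takes that approach, relying on the project grouping to total requests per project in a single pass. It makes three assumptions. The line format is assumed to be `project title count bytes`. The file is assumed to be gzip-compressed, since the description only says "extracted". The helper names `project_totals` and `read_snapshot` are hypothetical.

```python
import gzip
from itertools import groupby
from typing import Iterable, Iterator, Tuple


def project_totals(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (project, total requests) while streaming a snapshot.

    ASSUMPTION: lines look like 'project title count bytes'. Because entries
    are grouped by project, itertools.groupby sees each project as one
    consecutive run; if a project's lines were split up, it would be
    yielded more than once.
    """
    split_rows = (line.split() for line in lines)
    valid_rows = (r for r in split_rows if len(r) == 4 and r[2].isdigit())
    for project, group in groupby(valid_rows, key=lambda r: r[0]):
        yield project, sum(int(r[2]) for r in group)


def read_snapshot(path: str) -> Iterator[Tuple[str, int]]:
    """Stream per-project totals from a (hypothetically gzip-compressed) snapshot."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        yield from project_totals(fh)
```

Memory use stays proportional to one line rather than to the 3.5 million titles, because `groupby` only holds the current run.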