Paper Review: Online Tracking: A 1-million-site Measurement and Analysis

In this paper, the authors present the largest and most detailed measurement of online tracking to date, based on a crawl of the top 1 million websites. On each website, they make 15 types of measurements, including stateful (cookie-based) and stateless (fingerprinting-based) tracking, the effect of browser privacy tools, and the exchange of tracking data between different sites (“cookie syncing”). The authors argue that no general-purpose tool was available for measuring privacy on the web, so they built one intended to help regulators, self-regulators, the press, activists, and website operators measure web privacy.

For this study, the authors used OpenWPM, a web privacy measurement platform that automates a full-fledged browser. It has been used in several earlier studies. It also supports stateful measurements, which are important for studying the tracking ecosystem.

The crawl of 1 million websites made over 90 million requests, which to the authors' knowledge makes it the largest dataset on web tracking. According to the results, around 81,000 third parties are present, of which only 123 appear on more than 1% of sites. Grouping third parties by the entity that owns them makes the concentration even more striking: 12 of the top 20 third parties are Google-owned. As another example of shared ownership, Facebook and Liverail show up as separate third parties, but Liverail is owned by Facebook. In fact, Google, Facebook, Twitter, and AdNexus are the only third-party entities present on more than 10% of sites.
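To make the prevalence numbers concrete, here is a minimal Python sketch of how "present on X% of sites" could be computed, first per third-party domain and then per owning entity. It is not the authors' code: the `requests` records, the `OWNER` mapping, and the `prevalence` helper are all made up for illustration, and the `.example` site names are placeholders.

```python
from collections import defaultdict

# Hypothetical crawl records: (first-party site, third-party domain it loaded).
requests = [
    ("news.example", "google-analytics.com"),
    ("news.example", "doubleclick.net"),
    ("news.example", "facebook.com"),
    ("shop.example", "google-analytics.com"),
    ("shop.example", "liverail.com"),
    ("blog.example", "doubleclick.net"),
    ("wiki.example", "tracker.example"),
]

# Hypothetical domain-to-owner mapping (an assumption for this sketch).
# Domains not listed here are treated as their own entity.
OWNER = {
    "google-analytics.com": "Google",
    "doubleclick.net": "Google",
    "facebook.com": "Facebook",
    "liverail.com": "Facebook",
}


def prevalence(records, key=lambda domain: domain):
    """Return the fraction of first-party sites on which each third party appears.

    Only sites that occur in `records` are counted in the denominator.
    """
    sites = {site for site, _ in records}
    seen_on = defaultdict(set)
    for site, domain in records:
        # A set per third party, so repeated requests on one site count once.
        seen_on[key(domain)].add(site)
    return {name: len(found) / len(sites) for name, found in seen_on.items()}


by_domain = prevalence(requests)
by_entity = prevalence(requests, key=lambda d: OWNER.get(d, d))
```

In this toy data, each Google-owned domain appears on half of the four sites, but Google as an entity appears on three of the four. That is the effect the paper describes: counting by owner rather than by domain shows a more concentrated picture.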

Web privacy measurement has the potential to play a key role in keeping online privacy incursions and power imbalances in check. To achieve this potential, measurement tools must be made available broadly rather than just within the research community. In this work, the authors have tried to bring this ambitious goal closer to reality. One interesting finding is that news websites have the most trackers. To measure how tracking varies by type of site, the authors took Alexa’s top 500 websites in each of 16 categories and measured the tracking on them. News, arts, and sports are the three categories with the most trackers.

Link to the paper: http://randomwalker.info/publications/OpenWPM_1_million_site_tracking_measurement.pdf
