According to w3techs.com, only 36 percent of the web was "Google-free" in 2012. More recently (2015), Libert analyzed one million websites and found that nearly nine in ten leak information to third parties.
Web tracking involves collecting data about web surfing behavior for business, marketing, or other purposes. One disadvantage of web tracking is the potential loss of privacy for end users, as illustrated, for example, by Mayer et al. Engelhardt et al. also outline the surveillance implications of web tracking.
Embedding resources is a common technique for web tracking. HTML allows embedding content from local and remote servers. While parsing and interpreting a page, the browser automatically loads content from any location specified in the HTML code, and the server hosting that content sees a resource request from the visitor's browser. A resource (script, image, etc.) embedded in a website thus allows a third party to track visitors across different domains. Embedding content is not necessarily associated with tracking, but it can be used for this purpose. Each new connection from the browser to another domain transmits source/location information (the IP address) and may also reveal further protocol-specific information such as the HTTP referrer. Eckersley shows how this kind of information can be used for passive web tracking.
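The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the class name EmbeddedResourceParser, the function external_hosts, the example hostnames, and the choice of tags (script, img, iframe) are all assumptions made for this sketch. It also treats every host that differs from the page's own host as external, which is a simplification.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class EmbeddedResourceParser(HTMLParser):
    """Collects URLs of resources a browser would load automatically."""

    # Illustrative subset of tag/attribute pairs that trigger a request.
    AUTO_LOADED = {("script", "src"), ("img", "src"), ("iframe", "src")}

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.resource_urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (tag, name) in self.AUTO_LOADED:
                # Resolve relative and protocol-relative references.
                self.resource_urls.append(urljoin(self.page_url, value))


def external_hosts(page_url, html):
    """Return the hosts, other than the page's own, that would be contacted."""
    parser = EmbeddedResourceParser(page_url)
    parser.feed(html)
    own_host = urlparse(page_url).hostname
    hosts = {urlparse(url).hostname for url in parser.resource_urls}
    return {host for host in hosts if host and host != own_host}


# Hypothetical page: one local image and two third-party resources.
SAMPLE_HTML = """
<html><body>
<img src="/logo.png">
<script src="https://tracker.example/t.js"></script>
<img src="//ads.example/pixel.gif">
</body></html>
"""

print(sorted(external_hosts("https://shop.example/index.html", SAMPLE_HTML)))
# prints ['ads.example', 'tracker.example']
```

Each host in the result would receive a request, and with it the visitor's IP address and possibly the referrer, whenever the page is loaded.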
This makes web tracking a possible threat to both the privacy of end users and the security of companies. However, web tracking as a privacy threat and a potential source of data leakage seems to be underestimated in IT security research. One reason could be the wide range of tracking and advertising providers: because of this variety, it is not feasible to grasp the whole picture for a specific company or person. Another reason for this underestimation might be the novelty of the threat. Ten years ago, third-party web tracking was relatively rare, but it is now growing into a serious problem.
Web tracking has seen a remarkable increase in usage in recent years. Unfortunately, an overview of how web tracking evolved over the last ~15 years is missing. A retrospective analysis using archived data was therefore carried out to quantify the usage and distribution of web tracking and how they changed over the last decade.
A demonstration of the evolution of web tracking is provided below. Alternatively, you can watch the video. Please note that the set of tested domains (3,558) is the same for each year; missing nodes indicate that no external request was detected during the analysis for that year.
As the graph shows, the embedding of external content on popular websites has increased over time. We found a more than fivefold increase in external requests. In 2015, we found an average of around 6 external requests per website. This means that, on average, about 6 other hosts were informed about each visit to a website. The most frequently used external hosts could be connected to web tracking.
An important question is how many popular websites a small number of trackers covers, since a tracker embedded on many websites can track a large number of users. We analyzed what share of all websites is covered by the top N (1-50) trackers. In 2015, about 73 % of all analyzed websites were covered by the three most frequently included third-party trackers of that year. In 2005, only 10 % of the websites were covered by the top three of 2005.
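The top-N coverage figure can be computed along the following lines. This is a sketch under assumptions rather than the study's actual code: the function name top_n_coverage, the input layout (a mapping from each website to the set of third-party hosts it contacts), and the sample data are hypothetical.

```python
from collections import Counter


def top_n_coverage(site_hosts, n):
    """Fraction of sites that embed at least one of the n most-included hosts."""
    if not site_hosts:
        return 0.0
    # Count on how many sites each third-party host appears.
    inclusion_counts = Counter(
        host for hosts in site_hosts.values() for host in set(hosts)
    )
    top_hosts = {host for host, _ in inclusion_counts.most_common(n)}
    covered = sum(1 for hosts in site_hosts.values() if top_hosts & set(hosts))
    return covered / len(site_hosts)


# Hypothetical data: website -> third-party hosts contacted when loading it.
SITES = {
    "a.example": {"t1.example", "t2.example"},
    "b.example": {"t1.example"},
    "c.example": {"t1.example", "t3.example"},
    "d.example": {"t2.example"},
    "e.example": set(),
}

for n in (1, 2, 3):
    print(n, top_n_coverage(SITES, n))
```

In this toy data set the single most-included host already covers three of five sites, and adding the second covers four of five. The site without any external request can never be covered, which is why coverage stops short of 100 %.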
This kind of analysis is also interesting when the domains of a specific company are grouped together, for example to analyze the coverage of Google hosts. For this analysis, Google hosts are simply identified by "google" in their hostname; other services that belong to Google (as of 2015), such as doubleclick.net since 2007 or youtube.com since 2006, are not included. In 2005, only 5 % of the websites were covered by Google. This figure increased rapidly in the following years (2006-2015): 19 %, 36 %, 49 %, 57 %, 65 %, 72 %, 78 %, 80 %, 81 %, 82 %. This means that Google increased its coverage from 5 % to 82 % within 10 years. The same analysis for Facebook (identified by "facebook" or "fbcdn" in the hostname) yields the following figures for the years 2009-2015: 0.7 %, 7 %, 20 %, 27 %, 29 %, 29 %, 29 %.
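The hostname-based grouping can be illustrated as follows. Again, this is only a sketch: the function name company_coverage and the sample mapping are assumptions, and the substring match mirrors the simple rule described above rather than any more precise ownership list.

```python
def company_coverage(site_hosts, keywords):
    """Fraction of sites contacting a host whose name contains any keyword."""
    if not site_hosts:
        return 0.0

    def matches(host):
        return any(keyword in host for keyword in keywords)

    covered = sum(
        1 for hosts in site_hosts.values() if any(matches(h) for h in hosts)
    )
    return covered / len(site_hosts)


# Illustrative data: website -> third-party hosts contacted when loading it.
SITES = {
    "a.example": {"www.google-analytics.com", "connect.facebook.net"},
    "b.example": {"fonts.googleapis.com"},
    "c.example": {"static.xx.fbcdn.net"},
    "d.example": {"stats.doubleclick.net"},
}

print(company_coverage(SITES, ["google"]))             # prints 0.5
print(company_coverage(SITES, ["facebook", "fbcdn"]))  # prints 0.5
```

As in the analysis described above, the doubleclick.net host is not counted toward Google because its hostname does not contain "google", so figures obtained this way are a lower bound on a company's actual coverage.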
We showed that one reason for the underestimation of the consequences of third-party web tracking could be that it hardly existed 10 years ago. From a security point of view, considering web tracking and the use of PET (Privacy-Enhancing Technologies) should be part of every corporate security policy. Our analysis shows that this is more important today than it was 10 years ago.
Further information about the methodology, the implementation, and the results can be found in the research papers:
Currently, this website presents the results of the web tracking history analysis. Other (not yet finished) parts of the project deal with analyzing the current situation and with web tracking protection mechanisms. Furthermore, a goal of this project is to give an outlook on the future of web tracking. This study has been made possible by participation in the interdisciplinary project "Transformations of Privacy" (German), funded by VolkswagenStiftung.
The goal is to analyze how third-party web tracking has changed over the last decade. We identify a more than fivefold increase in external requests between 2005 and 2014.
"Prediction is very difficult, especially about the future." (Niels Bohr). However, the goal is an estimation about the future development of web tracking and how to retain the users' privacy.
Tim Wambach, M.Sc., CISSP, studied computer science at the University of Applied Science in Trier, then worked as a Security Consultant in Munich and is currently a PhD candidate and research assistant at the University of Koblenz.