Click through the three pages → Same ID.
Close the browser window and reopen the site → Same ID.
Turn off your computer and come back tomorrow → Same ID.
Check your cookies → The site neither drops nor reads any cookies.
Check the URL → No dubious query strings.
So how can I preserve the ID and know that your specific device is returning to the site without having you log in and without dropping a cookie?
Cookies Are On Their Way Out
If you are a somewhat active internet user, you will have heard about the ongoing controversy over browser cookies and how they are used. At present, browsers are increasingly phasing out cookie technology, and privacy regulations like the GDPR and the CCPA heavily restrict it. While this development is certainly an important step towards a more privacy-focused internet, it is also taking a huge toll on the core functionality of most websites, their UX, the economic structure of the internet, and the digital analytics industry. Yet while the demise of the browser cookie as a reliable identifier for returning users is all but certain, there are still other web technologies that rely on storing information on the local machine.
The Role Of Cache
Enter: Cache. In essence, web caching means storing data from the web on your device so the browser can reuse it later when the same resource is requested again. For instance, when a user loads a web page for the first time, the server sends the whole page back to the browser. If the page is cached and the user requests it again the following day, the browser can reuse its stored copy, the server does not have to send the page again, and it can be displayed from the browser cache right away. This is much faster and saves bandwidth. In general, caching significantly speeds up the delivery of web content while also reducing the work that has to be done server-side.

Cached copies can be validated using ETags. ETags are IDs that a server attaches to a resource it delivers (e.g. a web page or an image) via the ETag response header. When the browser asks for that resource again, it sends the stored value back in the If-None-Match request header. This is how the server knows whether the user has cached the newest version of the resource. When a resource on the server changes, a new ETag ID is generated for it.
User requests a website for the first time → No ETag in the request → Site is sent back with ETag 123 → Site is stored (cached) on the local device
User requests the same site again → ETag 123 is included in the request (If-None-Match header) → The server checks whether the resource has changed (‘Is the ETag ID still the same?’) → If the ETag has not changed, the server replies with 304 Not Modified, instructing the browser to simply use the copy it cached earlier → The resource does not have to be sent again, which saves time and bandwidth
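The exchange above can be sketched as a small Python function. This is a hedged illustration, not code from the original setup. The names `make_etag` and `conditional_get` are hypothetical. The ETag is assumed to be a truncated SHA-256 hash of the resource, although real servers derive ETags in various ways. No actual web server is involved.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive an ETag from the resource bytes (assumption: truncated SHA-256).

    HTTP requires the ETag value to be a quoted string.
    """
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a GET request the way a cache-aware server would.

    Returns (status, headers, payload): 304 Not Modified with an empty
    payload if the client's If-None-Match lists the current ETag,
    otherwise 200 OK with the full resource.
    """
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # The header may list several ETags, or "*". If-None-Match uses weak
        # comparison, so a "W/" prefix is ignored when matching.
        candidates = [tag.strip().removeprefix("W/") for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""
    return 200, headers, body


page = b"<html>hello</html>"
status, headers, _ = conditional_get(page, None)             # first visit: 200, full page
status, _, payload = conditional_get(page, headers["ETag"])  # revisit: 304, empty payload
```

Only the ETag travels back and forth on a revisit; the page body is sent again only after its content, and therefore its ETag, has changed.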
Using Cache Technology To Track And Identify Users
While ETags serve a useful purpose when used for caching, the feature can also be hijacked and intentionally misused for user tracking.
Here is how I did it for my example above:
- I built a website with three pages
- I embedded the same iFrame on each of the pages. This iFrame is simply a white 1x1 pixel, which is invisible to the user.
- When this iFrame resource is requested, I create a random ID via PHP on the server side. I use this ID to override the ETag ID that the server would otherwise issue automatically for the iFrame.
- Every time a user requests one of the three pages (and therefore requests that iFrame), the browser includes my ETag ID in the request. I then check on the server side whether that ID exists or whether this is a first-time request without an ETag.
→ If ETag exists: Returning visitor. Keep the ID and send the same one back.
→ If ETag does not exist: New visitor. New ID. From then on, this user's browser will include this ID in the headers of every request it makes for the iFrame on the site.
- As a last step — here is how this ETag ID finds its way into the analytics:
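The analytics hand-off is left out here, but the server-side logic of the steps above can be sketched in Python. This is a swapped-in illustration, not the original PHP code, and it uses the standard-library `http.server` module. The ID format (32 hex characters in quotes), the names `assign_tracking_etag`, `TrackingFrameHandler` and `serve`, the port, and the iFrame body are all hypothetical. The sketch sets `Cache-Control: no-cache` so that the browser keeps its copy but has to revalidate it on every page view. Revalidation is what makes the browser send If-None-Match each time.

```python
import re
import secrets
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple

# Hypothetical ID format: only echo back values that look like IDs we issued.
TRACKING_ID = re.compile(r'^"[0-9a-f]{32}"$')


def assign_tracking_etag(if_none_match: Optional[str]) -> Tuple[int, str]:
    """Return (status, etag) for a request to the tracking iFrame.

    Returning visitor: a known-format ID came back in If-None-Match,
    so reply 304 and keep the same ID.
    New visitor: no valid ID, so mint a random one and reply 200.
    """
    if if_none_match is not None and TRACKING_ID.match(if_none_match.strip()):
        return 304, if_none_match.strip()
    return 200, '"' + secrets.token_hex(16) + '"'


class TrackingFrameHandler(BaseHTTPRequestHandler):
    # Hypothetical iFrame document: an empty white page.
    FRAME_BODY = b"<!doctype html><body style='margin:0;background:#fff'></body>"

    def do_GET(self) -> None:
        status, etag = assign_tracking_etag(self.headers.get("If-None-Match"))
        self.send_response(status)
        self.send_header("ETag", etag)
        # no-cache: the browser may store the copy but must revalidate it
        # with the server (sending If-None-Match) before every reuse.
        self.send_header("Cache-Control", "no-cache")
        if status == 200:
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(self.FRAME_BODY)))
            self.end_headers()
            self.wfile.write(self.FRAME_BODY)
        else:
            self.end_headers()
        # `etag` now holds the visitor ID; forwarding it to analytics is not shown.


def serve(port: int = 8000) -> None:
    """Run the toy tracker on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), TrackingFrameHandler).serve_forever()
```

Calling `serve()` starts the toy server. If its URL is embedded as an iFrame on several pages, one browser should keep receiving the same ETag, subject to how that browser stores and partitions its cache.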
How To Prevent ETag Tracking
However, there are a few options for users to protect themselves from ETag tracking:
- Disable cache in the browser settings
Careful here: as mentioned above, caching can be very useful and has a lot of advantages. Without it, pages load more slowly and use more bandwidth.
- Modify headers with a browser add-on
While most browsers do not offer a built-in option to modify headers, there are plenty of browser extensions available, such as ModHeader. Why does this work? The ETag mechanism relies on request and response headers to exchange the ID. For instance, if a user overrides the If-None-Match header to be blank on every request, the server treats every request as a first visit and generates a new ETag value each time. This prevents the user's device from being identified.
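A toy simulation can show why blanking the header works. It assumes a tracker that behaves like the one described earlier: it echoes a known ID and otherwise mints a new one. The names `tracker` and `visit_ids` are hypothetical. The model ignores everything about real browser caching except the If-None-Match exchange, and it does not use ModHeader or any extension API.

```python
import secrets
from typing import List, Optional


def tracker(if_none_match: Optional[str]) -> str:
    """Toy tracking server: echo a non-empty ID, otherwise mint a new one."""
    return if_none_match if if_none_match else secrets.token_hex(16)


def visit_ids(visits: int, blank_if_none_match: bool) -> List[str]:
    """Simulate one browser requesting the tracked iFrame several times.

    The browser remembers the last ETag it received and sends it back in
    If-None-Match, unless a header-modifying extension blanks the header.
    """
    cached_etag: Optional[str] = None
    ids = []
    for _ in range(visits):
        sent = "" if blank_if_none_match else cached_etag
        cached_etag = tracker(sent)
        ids.append(cached_etag)
    return ids


tracked = visit_ids(5, blank_if_none_match=False)    # one ID, five times
protected = visit_ids(5, blank_if_none_match=True)   # five different IDs
```

When the header passes through, every visit reports the same ID. When the header is blanked, each visit looks like a brand-new visitor.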
Why This Is Important
Why am I testing these things? Why am I writing this article? I certainly do not intend to use this at scale. But while ETags can be used for evil, this example proves a larger point: like most other technology, ETags are not harmful by default. It always depends on the application.
I believe in open and transparent knowledge transfer in the industry: among analytics vendors, publishers, the advertising industry, and internet users. In my opinion, a lack of transparency, which has always plagued the internet ecosystem, is one of the main reasons why we ended up in this messy cookie war. Tech evolves too fast for legislation to keep pace, and it is impossible for the general public to understand the ins and outs of web technologies like cookies. When these technologies are used inappropriately, users understandably feel violated. But killing the technology as a result seems like a classic case of fighting the symptoms rather than the cause. Because many tech companies misuse technologies like cookies, those technologies are unfairly vilified in the public eye, which in turn leads to disproportionate measures by browsers and legislators. While these measures do a lot of good for personal privacy, they also harm good and meaningful technological innovation.
There are always nuances. I strongly believe in the legitimacy and importance of earnest digital analytics, as long as it is executed with the right level of privacy compliance. What's next in store for legitimate visitor identification? ETags surely aren't sustainable. But one thing is certain: this industry will never get boring.
— If you want to discuss the example above or if you think you have found the new holy grail for user identification, feel free to reach out. —