ArXiv TLDR

SST-Guard: Detecting and Characterizing Server-Side Google Analytics in the Wild

2604.27497

Muhammad Jazlan, Alexander Gamero-Garrido, Zubair Shafiq, Yash Vekaria

cs.CR cs.CY

TLDR

SST-Guard detects server-side Google Analytics, which evades endpoint-based blockers, by matching semantic value patterns across network requests, cookies, and the window object.

Key contributions

  • Highlights the shift to server-side tracking (SST), which evades blockers that rely on known tracker endpoints.
  • Introduces SST-Guard, a system detecting server-side Google Analytics (sGA) via semantic data patterns.
  • Uses a multi-modal "value-template" approach that applies regular expressions to network requests, cookies, and the window object.
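  • Validated on Tranco top-10k websites with over 93% accuracy across three modalities (99.8% for network requests); a wider deployment finds sGA on 4.21% (6,314) of Tranco top-150k websites.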

Why it matters

This paper addresses the shift to server-side tracking, which defeats privacy tools that block requests to known tracker endpoints. SST-Guard detects and blocks sGA by matching the data being collected rather than the endpoint it is sent to. Its measurement finds sGA on 4.21% of Tranco top-150k websites, which suggests that endpoint-independent defenses are needed.

Original Abstract

As web browsers increasingly restrict client-side tracking, the web tracking ecosystem is shifting from client-side to server-side tracking (SST). In SST, the browser sends tracking requests to an intermediate endpoint, which then forwards them to the tracker's endpoint, eliminating direct client-to-tracker requests. As a result, existing tracking protections that block requests to known tracker endpoints are rendered ineffective. In this paper, we investigate server-side implementation of Google Analytics, the most widely deployed third-party tracking service on the web today. We also present SST-Guard, a multi-modal, browser-based system for detecting and blocking server-side Google Analytics (sGA). Our key insight is that even when the tracker's endpoints change, sGA must necessarily still collect and share the same semantic information as client-side Google Analytics (e.g., identifiers, event metadata). Therefore, rather than detecting requests to known Google Analytics endpoints, SST-Guard aims to detect underlying artifacts of collection and sharing of these semantic values to any arbitrary endpoint. Operationalizing this insight is challenging because real-world sGA deployments commonly customize endpoints and obfuscate URLs/payloads. SST-Guard addresses this challenge using a value-template approach that employs regular expressions to match semantic value patterns across multiple modalities: network requests, cookies, and the window object. We validate SST-Guard on Tranco top-10k websites, detecting 4.02% (403) sGA domains with over 93% accuracy across three modalities, with network request classifier demonstrating the highest accuracy (99.8%). By deploying SST-Guard in the wild, we find 4.21% (6,314) of Tranco top-150k websites using sGA.
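
To make the value-template idea concrete, here is a minimal Python sketch of matching on values rather than on endpoints or parameter names. It is not the paper's implementation: the three patterns in VALUE_TEMPLATES, the function names, and the example host are all assumptions chosen for illustration, loosely modeled on the general shape of Google Analytics identifiers. Only the network-request and cookie modalities are shown; the window object is omitted because it requires a browser context.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Hypothetical value templates (assumptions, not the paper's actual patterns).
# Each one describes the shape of a value, independent of where it is sent
# or what the parameter or cookie is called.
VALUE_TEMPLATES = {
    "client_id": re.compile(r"\d{6,10}\.\d{10}"),
    "measurement_id": re.compile(r"G-[A-Z0-9]{6,12}"),
    "ga_cookie": re.compile(r"GA1\.\d\.\d{6,10}\.\d{10}"),
}


def match_values(values):
    """Return the names of all templates fully matched by any value."""
    hits = set()
    for value in values:
        for name, pattern in VALUE_TEMPLATES.items():
            if pattern.fullmatch(value):
                hits.add(name)
    return hits


def scan_request(url):
    """Check query-parameter values only; host and parameter names are ignored."""
    query = urlsplit(url).query
    return match_values(value for _, value in parse_qsl(query))


def scan_cookies(cookies):
    """Check cookie values only; cookie names are ignored."""
    return match_values(cookies.values())


if __name__ == "__main__":
    # A first-party endpoint with renamed parameters still carries GA-shaped values.
    url = "https://metrics.example.com/collect?a=G-ABC1234567&b=1234567890.1700000000"
    print(sorted(scan_request(url)))
    print(sorted(scan_cookies({"_x": "GA1.1.1234567890.1700000000"})))
```

Because the sketch never inspects the hostname or the parameter names, renaming the endpoint or the keys does not change the result, which is the property the abstract attributes to the value-template approach. A real detector would also need to handle encoded or obfuscated payloads, which this sketch does not attempt.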
