Summary – Why web ads, ad networks and HTML5 are both Good and Bad
- Data Collection and Tracking: Ad networks use cookies and tracking pixels to monitor user behavior, building comprehensive profiles of individual users.
- Device and Location Tracking: By gathering GPS and IP data, ad networks can track users’ movements, potentially invading their privacy.
- Social Profiling: Correlating interactions across platforms allows ad networks to infer users’ social circles, raising concerns about data misuse.
- Behavioral Analysis: Advanced algorithms predict user behavior, which can be used manipulatively in targeted advertising.
- Third-Party Data Sharing: Data shared with affiliates can lead to vulnerabilities, as third parties may not uphold strict security measures.
- Cross-Device Tracking: Users are tracked across devices, complicating efforts to maintain privacy and anonymity online.
- Government Access: Ad networks may be compelled to share user data with governments, often without user consent.
- Persistent Tracking Technologies: Cookies, supercookies, and device fingerprinting create challenges for users trying to evade tracking.
- Web APIs: Features like Local Storage, IndexedDB, and the Geolocation API allow for extensive data retention and tracking capabilities.
- Standardization and Implementation: W3C and WHATWG were instrumental in developing HTML5, with major tech companies driving its adoption. The result has been improved web applications but also greater risks to user privacy.
While HTML5 offers revolutionary web capabilities, its features also enable pervasive tracking and data collection that could compromise user privacy. As technology evolves, balancing innovation with ethical considerations remains crucial.
Full Content – Reasons – Danger of Ad Networks
Ad networks can be used as a cover for surveillance because they inherently gather a wide array of data about user behavior and preferences, ostensibly for targeted advertising. Here are some ways in which this data could be misused or repurposed for spying:
- Data Collection and Tracking: Ad networks track users’ activities across websites and apps through cookies, device fingerprints, and tracking pixels. This enables the ad network to build comprehensive profiles on users, including their interests, browsing habits, locations, and even potential social connections based on shared activity across sites.
- Device and Location Tracking: Many ad networks collect GPS or IP-based location data to serve location-specific ads. However, this data also allows for near-real-time tracking of a person’s movements, which can be especially invasive when combined with device identifiers. Over time, this tracking can reveal a person’s regular routes, places they visit frequently, and even daily routines.
- Social Profiling: By correlating users’ interactions with social media platforms, online services, and even specific ad interactions, ad networks can infer a person’s social network, including family, friends, and colleagues. This information can be combined with other data sources for social profiling, which could reveal sensitive aspects of personal relationships.
- Behavioral Analysis and Predictive Profiling: With advanced AI and machine learning, ad networks analyze data to predict future actions, such as travel plans, purchases, or lifestyle changes. In the hands of a third party with spying intentions, this could be used to monitor or manipulate individuals by predicting behavior and tailoring content accordingly.
- Third-Party Data Sharing: Ad networks often share data with multiple partners and affiliates, leading to a complex web of information sharing that is difficult to control. These third parties may lack stringent security measures, making the data vulnerable to misuse by parties interested in surveillance.
- Data Aggregation Across Devices and Platforms: Many ad networks use methods to link users across multiple devices and services. This cross-device tracking enables them to create an even more detailed profile by aggregating data from all devices an individual uses, making it nearly impossible for users to avoid tracking without severely limiting their internet usage.
- Government Access via Third-Party Requests: Governments can access ad network data through direct requests or subpoenas for investigative purposes. Ad networks often hold an extensive cache of information that may be turned over without user consent if required by law.
In essence, ad networks already have the data infrastructure and data points necessary for surveillance, which could be exploited by organizations with the resources or intent to do so. While this is typically intended for marketing, the overlap with potential surveillance capabilities makes ad networks a convenient cover for spying on people if misused or accessed by entities with different motives.
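To make the location point concrete, here is a minimal Python sketch of how a handful of timestamped location pings could be reduced to a likely home and workplace. The sample pings, the grid precision, and the night and office hour ranges are all illustrative assumptions, not a description of any real ad network's pipeline.

```python
from collections import Counter

# Hypothetical pings: (hour of day, latitude, longitude). Invented for illustration.
PINGS = [
    (1, 52.5201, 13.4049), (2, 52.5203, 13.4051), (23, 52.5199, 13.4048),
    (10, 52.5312, 13.3846), (11, 52.5309, 13.3851), (14, 52.5311, 13.3849),
    (19, 52.5007, 13.4253),
]

def grid_cell(lat: float, lon: float, digits: int = 3) -> tuple:
    """Round coordinates so that nearby pings fall into the same cell."""
    return (round(lat, digits), round(lon, digits))

def most_common_cell(pings, hours):
    """Return the cell seen most often during the given hours, or None."""
    cells = Counter(
        grid_cell(lat, lon) for hour, lat, lon in pings if hour in hours
    )
    return cells.most_common(1)[0][0] if cells else None

NIGHT_HOURS = set(range(0, 6)) | {22, 23}  # assumption: people are home at night
WORK_HOURS = set(range(9, 17))             # assumption: typical office hours

likely_home = most_common_cell(PINGS, NIGHT_HOURS)
likely_work = most_common_cell(PINGS, WORK_HOURS)
```

Even this crude grouping separates the night-time cluster from the daytime one, which is why location histories are sensitive even without a name attached.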
Technical
Ad networks employ a variety of technical mechanisms to track and profile users. Here’s a breakdown of these techniques and how they contribute to user tracking:
- User Agent Strings:
- What it is: The user agent string is sent in the User-Agent HTTP request header with every request a browser makes. It includes details about the browser, operating system, device type, and sometimes other software used.
- How it helps: By analyzing user agent strings, ad networks can distinguish between different browsers and devices. Even though it’s not unique on its own, it helps narrow down user identification when combined with other data points.
- Cookies and Supercookies:
- What they are: Cookies are small pieces of data that a website asks the browser to store and send back with later requests. Supercookies are identifiers kept outside the normal cookie store (historical examples include Flash Local Shared Objects, cached ETags, and HSTS flags), so clearing cookies does not remove them.
- How they help: Cookies can hold unique user IDs, which allow ad networks to track users across different websites. When the same network’s ads or scripts are embedded on many sites, its third-party cookie is sent from each of them, so the network can follow a user’s behavior across a large portion of the internet. Several browsers now block or partition third-party cookies by default, which limits this technique. Supercookies allow more persistent tracking, as they are harder for users to find and clear.
- Device Fingerprinting:
- What it is: Device fingerprinting collects attributes of a browser and device, such as screen resolution, time zone, installed fonts, and plugins. No single attribute is unique, but the combination often is.
- How it helps: Together these attributes form a “fingerprint” that stays fairly stable over time. Even if cookies are cleared, the fingerprint can re-identify a user when they revisit a site, allowing consistent tracking without relying on stored identifiers.
- IP Address Tracking:
- What it is: IP addresses are network identifiers assigned by internet service providers. They are not strictly unique to one device: a household or mobile carrier often shares a single public address among many devices, and addresses can change. Even so, many connections keep the same IP for extended periods.
- How it helps: By logging IP addresses, ad networks can identify approximate geographic locations and recognize the same connection across multiple websites. Combined with cookies or device fingerprints, a shared home or office IP also helps link several devices to the same household or user, making cross-device tracking more accurate.
- Tracking Pixels (1×1 Transparent Pixels):
- What they are: Tracking pixels are tiny (often invisible) images embedded on web pages or emails that, when loaded, send a request to the ad server.
- How it helps: Each time a tracking pixel loads, it signals the ad network’s server, providing data like the user’s IP address, user agent, and sometimes additional data. Tracking pixels let ad networks know if users opened an email or visited a webpage, enabling detailed behavior tracking and measurement of ad effectiveness.
- JavaScript and Browser APIs:
- What it is: Many websites and ads load JavaScript code, which can access extensive device and browser information through APIs.
- How it helps: JavaScript can read data such as network type, local storage contents, interaction patterns (e.g., mouse movements), and, in browsers that still expose it, battery status. These attributes add depth to device fingerprints, allowing ad networks to identify devices more accurately.
- Cross-Site Tracking via Third-Party Scripts:
- What it is: Ad networks embed the same JavaScript or HTML5 code on many different sites, so one party collects data from all of them. This is distinct from cross-site scripting (XSS), which is an injection attack rather than a tracking method.
- How it helps: By tracking users across multiple websites that include ad network code, they can create a holistic profile. They use consistent IDs to monitor user paths and preferences, inferring personal interests and patterns.
- Local Storage and IndexedDB:
- What they are: Local storage and IndexedDB are browser storage mechanisms that are separate from cookies. Depending on the browser and on which data the user chooses to clear, they can survive cookie deletion.
- How it helps: Ad networks sometimes store unique identifiers in local storage to re-identify users even if they clear their cookies. This persistence makes it harder for users to avoid tracking without advanced privacy tools or settings.
- Cross-Device Tracking (using login data and shared networks):
- What it is: Cross-device tracking combines data from multiple devices to identify users across all of them. This might rely on shared IP addresses, login information, or other common identifiers.
- How it helps: For instance, if a user logs into a service on both their phone and laptop, ad networks can link those devices based on shared identifiers. This helps track users consistently across different devices, creating a unified user profile.
- Location Data (GPS and Wi-Fi Data):
- What it is: On mobile devices, some apps and ads can use GPS or nearby Wi-Fi information, typically after the user grants location permission, to determine the user’s precise location.
- How it helps: Location data allows ad networks to build profiles based on physical movement patterns, revealing daily routines and frequently visited places. Over time, this data can create a highly detailed profile, showing where users live, work, and shop.
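The fingerprinting idea from the list above can be sketched in a few lines of Python. This is a minimal illustration, assuming the attributes have already been collected by a script in the browser; the attribute names, sample values, and the choice of a truncated SHA-256 hash are hypothetical and not taken from any particular ad network.

```python
import hashlib
import json

def fingerprint(attributes: dict) -> str:
    """Hash a set of browser/device attributes into a short identifier."""
    # Sort keys so the same attributes always serialise to the same string.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical attributes reported on a first visit.
visit_1 = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone": "Europe/Berlin",
    "fonts": ["DejaVu Sans", "Liberation Serif"],
}

# Same device later, after cookies were cleared: the attributes are unchanged.
visit_2 = dict(visit_1)

# A different device that differs in a single attribute.
other_device = dict(visit_1, screen="1366x768")

same_device_matches = fingerprint(visit_1) == fingerprint(visit_2)
other_device_differs = fingerprint(visit_1) != fingerprint(other_device)
```

No identifier is stored on the device here: the ID is recomputed from what the browser reports, which is why clearing cookies alone does not reset it.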
Together, these techniques give ad networks a nearly comprehensive profile of a user’s behavior, location, preferences, and social interactions. When these data points are aggregated and analyzed with AI, ad networks can predict behaviors and even influence actions by displaying targeted content, turning what might look like routine tracking into something much more pervasive and potentially invasive.
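The pixel-plus-cookie mechanism can be modeled in plain Python. The sketch below is a simulation rather than a real web server: the `PixelServer` class, its method name, and the example domains are hypothetical, and a real ad server would read the cookie and Referer from HTTP headers and answer with a 1×1 image.

```python
import uuid
from collections import defaultdict

class PixelServer:
    """Toy model of an ad server that answers tracking-pixel requests."""

    def __init__(self):
        # cookie ID -> list of logged pixel loads
        self.history = defaultdict(list)

    def handle_pixel_request(self, cookie_id, referer, ip, user_agent):
        """Log one pixel load and return the cookie ID the browser should keep."""
        if cookie_id is None:
            # First contact: mint an ID (a real server would send it via Set-Cookie).
            cookie_id = uuid.uuid4().hex
        self.history[cookie_id].append(
            {"page": referer, "ip": ip, "user_agent": user_agent}
        )
        return cookie_id

server = PixelServer()
browser_cookie = None  # the browser has no cookie for the ad domain yet

# The same browser loads the network's pixel on two unrelated sites.
for page in ["https://news.example/article", "https://shop.example/cart"]:
    browser_cookie = server.handle_pixel_request(
        cookie_id=browser_cookie,
        referer=page,
        ip="203.0.113.7",  # address from a documentation-only range
        user_agent="ExampleBrowser/1.0",
    )

pages_seen = [entry["page"] for entry in server.history[browser_cookie]]
```

Because the browser sends the same cookie back on both sites, the two visits end up in one history under one ID, which is the core of cross-site profiling.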
In Depth
- Local Storage:
- What it does: Provides a way to store key-value pairs in a persistent storage that remains even after the browser is closed.
- Impact: Enables websites to store more data about users without relying solely on cookies, which users tend to delete more readily. Local Storage has no built-in expiry and persists until the site or the user clears it, making it more reliable for tracking purposes.
- Session Storage:
- What it does: Similar to Local Storage but only lasts as long as the browser session. Once the tab or window is closed, Session Storage data is cleared.
- Impact: Useful for short-term tracking of user activity within a single session, which can still help profile user behavior across different parts of a website.
- IndexedDB:
- What it does: A more powerful and complex storage solution that allows storage of large amounts of structured data, including binary data.
- Impact: Provides a persistent storage option that can hold more detailed and complex data about users, making it a rich source of user information that can be used for profiling. It supports offline storage for web applications, but the data can also be repurposed for tracking.
- Geolocation API:
- What it does: Allows websites to request a user’s geographic location.
- Impact: When granted permission, this API provides precise location data that can be used for personalized content but also enables tracking of user movement patterns over time. Geolocation data, even in aggregate, can be highly sensitive.
- Web Workers:
- What it does: Enables background threads to run JavaScript independently of the user interface, allowing tasks to be processed without interrupting the main browser thread.
- Impact: Web Workers can facilitate complex tracking tasks or computations without impacting user experience, which means tracking scripts can run efficiently in the background, potentially without the user noticing any performance impact.
- Canvas and WebGL:
- What they do: Provide a way to render 2D and 3D graphics directly in the browser.
- Impact: The Canvas API, intended for graphics, is sometimes used in a technique called “canvas fingerprinting.” A script draws a hidden image and reads back the pixels, which vary slightly between devices due to differences in graphics hardware, drivers, and installed fonts, helping identify and track individual users across sessions.
- Audio and Video API:
- What it does: Allows native support for audio and video playback within the browser, without needing plugins.
- Impact: While primarily used for media-rich web experiences, these APIs, together with the related Web Audio API, can reveal system-level differences such as supported codecs and audio processing behavior. These differences are not necessarily unique to each device, but they contribute to device fingerprinting.
- File API:
- What it does: Provides methods to handle files in the browser, enabling users to upload, manipulate, and read local files through the browser.
- Impact: The File API exposes only files the user explicitly selects, so its tracking value is limited. A site can still observe what users upload, including file names, sizes, types, and last-modified times, and possibly metadata embedded in the files.
- WebSockets:
- What it does: Allows for real-time, bidirectional communication between the browser and server.
- Impact: WebSockets enable continuous data exchange and can be used to monitor user interactions more closely. This real-time data flow can allow for more immediate tracking and analytics.
- History and Navigation API:
- What it does: Provides control over the browser’s session history and allows for manipulation of the URL without reloading the page.
- Impact: While intended for smoother navigation, the History API can also help track user behavior within a single-page application by logging page changes and interactions even without reloading the page.
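One way these storage features support persistent tracking is identifier "respawning": the same ID is written to more than one store, and whichever copy survives is used to restore the others. The Python sketch below models that logic with two dictionaries standing in for the browser's cookie jar and Local Storage. The function name and the `uid` key are hypothetical, and in a browser the equivalent code would be JavaScript using `document.cookie` and `window.localStorage`.

```python
import uuid

def get_or_restore_id(cookies: dict, local_storage: dict, key: str = "uid") -> str:
    """Return a visitor ID, restoring it from whichever store still has it."""
    visitor_id = cookies.get(key) or local_storage.get(key)
    if visitor_id is None:
        visitor_id = uuid.uuid4().hex  # genuinely new visitor
    # Write the ID back to both stores so either one can restore the other.
    cookies[key] = visitor_id
    local_storage[key] = visitor_id
    return visitor_id

# Dictionaries standing in for the two browser stores (an assumption of this model).
cookies, local_storage = {}, {}
first_id = get_or_restore_id(cookies, local_storage)

cookies.clear()  # the user deletes cookies but leaves other site data in place
second_id = get_or_restore_id(cookies, local_storage)
```

In this model the ID only changes when every store is cleared at once, which is why browser options that clear cookies and site data together matter for privacy.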
Who brought in HTML5?
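HTML5 and the related specifications developed alongside it introduced these APIs and features to create richer, more interactive, and more responsive web experiences. Several of them, such as Geolocation, IndexedDB, and WebSockets, are defined in their own specifications but are commonly grouped under the HTML5 label. The goal was to enable web applications that feel closer to native applications, with offline capabilities, multimedia support, and enhanced data handling. However, these enhancements also introduced new ways to track users in a more persistent, detailed, and hard-to-detect manner.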
The W3C and WHATWG, with contributions from companies like Google, Apple, Mozilla, and Microsoft, were key players in standardizing HTML5. While the motivation behind HTML5 was largely to improve user experience, the tracking potential in these features highlights the privacy and security trade-offs that often come with technological progress.
Here’s a timeline and background on the introduction of Local Storage and IndexedDB:
1. Local Storage
- Introduction: Local Storage was introduced as part of HTML5 and became widely available in 2009-2010. HTML5 aimed to expand what browsers could do, making web applications behave more like desktop applications. Local Storage provided a simple way to store key-value pairs in a persistent manner.
- First Adoption: Local Storage shipped in stable releases of the major browsers within roughly a year of each other:
- Internet Explorer 8 (2009)
- Firefox 3.5 (2009)
- Safari 4 (2009)
- Chrome 4 (2010)
- Opera 10.50 (2010)
- Pioneers and Standardization: The Web Hypertext Application Technology Working Group (WHATWG) and the World Wide Web Consortium (W3C) were instrumental in standardizing HTML5, including Local Storage. Companies like Google, Mozilla, Apple, and Opera worked together to define the spec, which they saw as essential for building richer web applications.
2. IndexedDB
- Introduction: IndexedDB was designed to offer a more robust, queryable database for browsers. Unlike Local Storage, which only allows simple key-value pairs, IndexedDB can store complex data structures, making it powerful for larger data needs. IndexedDB was officially introduced by the W3C and first implemented around 2011-2012.
- First Adoption: IndexedDB support was rolled out in major browsers as follows:
- Firefox 4 (2011)
- Chrome 11 (2011)
- Internet Explorer 10 (2012)
- Safari 7.1 (2014)
- Opera 15 (2013)
- Pioneers and Standardization: The W3C was the main organization responsible for standardizing IndexedDB. Major companies, including Google and Mozilla, contributed heavily to its development. Microsoft also played a role, particularly by including IndexedDB in Internet Explorer 10 to support rich web applications.
Key Organizations and Companies
- WHATWG: WHATWG (Web Hypertext Application Technology Working Group) was initially formed by Apple, Mozilla, and Opera. This group pioneered the HTML5 specification, including Local Storage, with the goal of advancing the web’s capabilities.
- W3C (World Wide Web Consortium): W3C later took over aspects of HTML5 development and formalized standards for both Local Storage and IndexedDB. They worked closely with browser vendors to ensure interoperability.
- Google, Mozilla, Apple, and Microsoft: These companies were among the main contributors to the HTML5 standard and were early adopters of Local Storage and IndexedDB, integrating these features into their respective browsers. Their aim was to enable richer web applications and offline functionality, though the side effect was enabling more persistent tracking capabilities.
Motivation and Impact
The initial motivation behind Local Storage and IndexedDB was to allow for a new generation of web applications with offline capabilities and improved performance. However, these features also introduced new possibilities for data storage and tracking, because the information stored in them is less visible to users than cookies and is not always cleared along with them. This shift allowed advertisers and third-party trackers to use these methods as more persistent alternatives to cookies, which users could more easily delete or block.
Summary
Local Storage and IndexedDB represent significant steps in web technology, championed primarily by the WHATWG and W3C, and pushed forward by companies like Google, Mozilla, and Apple. While they have led to richer and more flexible web applications, they have also created new avenues for tracking, sparking ongoing debates about privacy and data control.