In the absence of effective enforcement, consumer rights are illusory. Effective enforcement may not only remedy harms that consumers have experienced, but also prevent harms from materialising and, in either event, it signals to market actors that consumer rights are to be taken seriously.1
EU consumer law may be enforced via private (individual and collective) and public means.2 Public enforcement is meant to address some of the potential shortcomings of private enforcement by individual consumers. As Faure and Weber explain, private enforcement by consumers is based on harm that has already occurred, whereas public authorities can engage in ex ante monitoring, which allows breaches to be discovered before they cause harm.3 Further, it may be the case that consumers do not enforce their rights because of rational apathy (not wanting to complain when the costs of doing so exceed the benefits), expecting other individuals to care more about enforcing said rights (the ‘free-riding problem’) or simply because they do not know that an infringement has occurred (information asymmetry).4 In digital markets, where (some) traders use dark patterns that exploit consumers’ cognitive biases – thereby impairing or superseding consumers’ decision-making – the problem of information asymmetry is exacerbated.5 Public enforcement becomes ever more important for keeping businesses in check.6 However, for authorities to enforce the law, potential infringements first need to be detected.7 The sheer scale of digital markets makes it likely that many potential infringements will fly under the radar of the enforcers, who have historically been under-resourced (in terms of financial, human and nowadays also technical capacity) and slow to act.8 Against this background, academics are calling for authorities to fight fire with fire, i.e. to use ‘consumer forensics’ methods9 or ‘EnfTech’10 (short for ‘enforcement technology’) tools to strengthen their surveillance of digital markets. The term EnfTech refers to authorities’ use of technology both for market monitoring and for ‘the active application of preventative measures, remedies or sanctions that support consumer protection’.11 Consumer forensics is another broad concept, covering ‘new insights and computational approaches that can help regulators, policymakers, and industry stakeholders better understand and prevent consumer harms on digital markets’.12 Technology could therefore be used both to uncover new harmful commercial practices and inform policy efforts, as well as throughout the enforcement life-cycle, from investigating the unlawfulness of practices deployed ‘in the wild’ to the direct execution of enforcement. In this chapter, I am solely concerned with the use of technology to investigate infringements on digital markets (i.e. market monitoring) as a precondition for taking enforcement action.
Some public authorities are rising to the occasion and developing both computational methods and the organisational capacity necessary to monitor digital markets,13 but overall the use of technological tools by enforcers is still in its infancy. As Goanta points out, ‘a lot of the technology that is needed in the exercise of investigation [...] powers by consumer [...] authorities simply does not yet exist’.14 That is also the case for technology that could facilitate the detection of unlawful dark patterns. It is therefore worthwhile to explore whether, and how, the detection of unlawful dark patterns could be automated.
Against this background, this chapter explores the technical feasibility of automating the detection of dark patterns that, under the current legal framework, are potentially unlawful. While dark patterns are used across a variety of digital environments, as Chapter 3 illustrates, this assessment is limited to Shopping dark patterns used on the web platform (although some of the points made herein may be applicable to other media that rely on graphical user interfaces, such as smartphones, and other types of dark patterns, such as Privacy dark patterns).
A computational framework for the automated detection of infringements needs to be able to facilitate automatic access to and collection of data from websites, and the subsequent analysis of that data. This chapter thus explores opportunities for the automated collection and analysis of data related to dark patterns, and reflects on the challenges and limitations of these methods. This exercise is based on a review of prior dark-pattern detection studies and web measurement literature more broadly.15 Web measurement is a relatively new subfield of privacy and cybersecurity studies that is concerned with observing websites and services at scale to detect, characterise and quantify web-based phenomena.16 Web measurement literature thus provides insights into what methods can be used to access, collect and analyse data from websites at scale, and what challenges may arise along the way.
As this chapter will show, some challenges of automated data collection and analysis are linked to the current regulation of web design through a predominantly technology neutral, principle-based substantive legal framework that employs vague concepts of general application. To be clear, I am not claiming that a technology-neutral legal framework is necessarily bad news altogether; as Chapter 4 explains, technology-neutral and technology-specific rules have advantages and drawbacks, and there are good reasons for policymakers to rely on a combination of both technology-neutral and technology-specific rules for the regulation of socio-technical artefacts like dark patterns. However, if we do want to use computational approaches to scale up infringement detection, more technology specificity is needed. Technology-neutral regulation introduces design variability, which complicates data collection efforts; it may entail data inaccessibility and design volatility, the latter of which may result in the deprecation of data collection and analysis methods; and it could lead to an inability to derive measurable infringement standards, which presents a significant obstacle for the automation of data analysis. This is the last piece in the theoretical proposition that this book puts forward: the effective regulation of socio-technical artefacts like dark patterns needs to rely on digital tools to detect infringements, and these, in turn, call for (some) technology specificity in the substantive legal framework. To illustrate this point, I focus on the three dark patterns that I devise policy solutions for in Chapter 7 – Hidden Costs, Hidden Subscription and Hard to Cancel.17 As will become clear when I discuss prior detection studies, however, this recommendation applies beyond these examples and consumer law: data protection law scholars have voiced similar concerns and recommendations.
This chapter proceeds as follows. Section 8.2 looks at what web measurement methods can and cannot do for the automated collection of data about dark patterns from websites. Section 8.3 takes stock of the methods used by dark patterns and web privacy measurement scholars to automatically analyse website data. Section 8.4 brings together the conclusions of my analysis in this chapter with those from prior chapters to reflect on how legal and technical solutions could make regulation targeting dark patterns (more) effective, and puts forward some recommendations for regulators.
This section discusses the use of crawling as a method for data collection in large-scale web measurement studies and how the extraction of data from websites can be optimised to cater to the collection of data about dark patterns. It then looks at some limitations of crawling as a data collection method.
Web crawling refers to visiting and browsing web pages automatically to collect data from them.18 A crawler that extracts the page source code or a targeted portion thereof is called a web scraper.19 Crawlers are used by search engines to index websites,20 by industry to research competitors (e.g. price monitoring),21 by malicious web actors (e.g. for ad fraud, comment spam and stealing content)22 and in research. Much of what we currently know about how the internet works is the result of research relying on web crawls.23 Crawls are a state-of-the-art method in web measurement studies, where they are used to study various aspects of web security and threats to privacy, as well as compliance with privacy and data protection laws;24 according to Ahmad et al., around 16% of the papers published in prominent security, privacy and network measurement venues in 2015–2018 relied on crawling for data collection.25 More recently, researchers have started using crawlers for purposes that are (more closely) related to consumer protection, such as the reclassification of online customer reviews by review platforms26 and, as we will see throughout this section, dark pattern measurement.
While crawling has gained considerable traction with the web measurement community, there is no standardised, principled methodology for using crawlers in such studies, but rather a mix and match assortment of pragmatic solutions to methodological challenges that researchers may encounter.27 As a research field, web measurement has developed in an ad hoc manner, mirroring the need to quickly devise methods to detect vectors of harm in the fast-changing privacy and cybersecurity threat landscapes.28 Demir et al.29 and Englehardt et al.30 have reviewed web measurement literature with a view to systematising methodological considerations in a bottom-up manner, and discuss alternative study design choices researchers may pursue and best practice for particular tasks. To the best of my knowledge, these are the only studies that attempt to assemble a guidebook for web measurement studies. In the following sub-section, I discuss the main design choices web measurement researchers are faced with according to Demir et al. and Englehardt et al. and the options available in respect of each methodological dimension, as well as the approaches dark patterns researchers have followed. Section 8.2.2 discusses the limitations of web measurement methods, and the challenges of using them for the detection of unlawful dark patterns. Section 8.2.3 summarises this discussion.
Web measurement researchers have to make choices in terms of the methods used for target identification, interacting with a website and collecting data from it, as well as the features of the experimental environment.
The term target identification refers to the selection of websites to be crawled (sub-section A).31 The web continues to grow, and measuring every website and page thereof is neither feasible32 nor desirable because some studies focus on particular subtypes of websites. Every study, therefore, has to identify websites and pages to analyse. This is not a trivial task. The web is not a centrally managed repository of information.33 One of the architectural principles of the internet is its openness, which reflects a commitment on the part of the internet design community to a culture of distributed authority,34 and is a consequence of the lack of strong public structures to regulate the internet in its early days.35
Next is the question of crawler design, or what Englehardt et al. call ‘infrastructure’ (sub-section B).36 Studies have to first pick an underlying crawler technology.37 The technology behind web measurement studies has considerably evolved over the years,38 and nowadays there are a wide variety of crawler technologies. As discussed below, not all of them are equally suitable for each and every crawling task, and researchers therefore have to make a choice that suits the goals of their study. Once a crawler technology has been chosen, researchers may need to customise the crawler and develop ways to interact with web pages and extract certain elements of interest from them.39 Websites are heterogeneous in both their content and structure.40 Certain website types and crawling tasks may therefore require extending a foundational crawler technology for the purpose of a specific investigation.
Lastly, researchers have to define the features of the experimental environment (sub-section C), i.e. some global attributes of the crawling experiment, such as their geolocation and the page-visit strategy.
In the following sub-sections, I discuss the options available in respect of each of these dimensions, and the approaches that dark patterns researchers have followed.
Absent a centralised, public repository of websites, researchers commonly focus their data collection efforts on the more popular websites as judged by top-website lists: a 2018 literature review found that over 22% of the web measurement studies published in 2017 relied on a top list.41 There are several website-ranking providers that are popular amongst web measurement researchers. The Alexa Top Sites ranking provided by Amazon used to dominate this research setting, having been used in over 90% of studies published in top networking venues in 2015–2019,42 including Mathur et al.’s measurement of dark patterns on shopping websites,43 but the service was retired in late 2022.44
A recent literature review by Ruth et al. found several other lists that researchers rely on:45 Cisco’s Umbrella 1 Million,46 the Majestic Million,47 the Secrank list,48 the Tranco Top Million List (formerly based on aggregated data from Alexa, Majestic and Umbrella)49 and the Trexa Top Million (based on aggregated data from Alexa and Tranco, now deprecated).50 The same study evaluated the relative accuracy of these lists and that of the Chrome User Experience Report (CrUX), a dataset that contains sites regularly visited by Google Chrome users,51 and found that CrUX provides the most accurate results when compared against ground truth data from Cloudflare, a content provider that serves the largest fraction of top websites.52 In response to the findings of this study, the Tranco list methodology was adapted to include data from CrUX and Cloudflare.53 Some recent web measurement studies rely directly on CrUX,54 and studies investigating dark patterns in cookie banners have used the Tranco ranking.55 Another advantage of CrUX (and Tranco, which uses CrUX in its aggregate rankings) is that they contain country- and region-specific data on website popularity, which is relevant for infringement monitoring, as infringements are jurisdiction specific.
For the purpose of conducting sector-specific (e.g. e-commerce) investigations, websites extracted from one of these resources need to be characterised by their purpose. A distinct advantage of the Alexa Top Sites ranking was that it categorised websites by topics, although Mathur et al. found the rate of false negatives in Alexa’s list of shopping websites to be very high.56 The researchers decided to rely instead on Webshrinker, a commercial solution for website categorisation,57 and validated the results by checking for the presence of an ‘Add to cart’ or similar button on random pages of the websites (the precise workings of this are discussed later in this sub-section).58 Aside from Webshrinker, web measurement researchers have relied on a wide array of website categorisation services developed for marketing, content filtering, threat assessment and content discovery purposes.59 Vallina et al. compared and evaluated 13 website categorisation services in terms of their suitability for academic research. They found that only a few services attained a sufficient level of coverage, with coverage varying substantially between services; services may return multiple or unexplained website categories; and there is great diversity in the website categories used within and across services.60 The researchers recommend manually inspecting random subsets of classified websites to determine whether the labelling is of sufficient quality, or developing sui generis classification methods. There are several studies in which researchers have relied on their own methods for website-purpose classification due to concerns about the accuracy and coverage of commercial services,61 and there is a research community that is active in the domain of website-purpose categorisation.62
Depending on the scope of the investigation, there may be a need to account for the linguistic variety of websites. The majority of dark pattern detection studies limit their scope to English-language websites. To detect the language of a website, researchers have used tools such as the polyglot63 and langdetect64 Python libraries and the Google Translate API.65 As we will see below, the language of the website is an important consideration in terms of designing interaction methods and detection methods.
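To make this step concrete, the following minimal Python sketch illustrates how a list of candidate websites could be filtered by language using the langdetect library mentioned above; the target URLs and the English-only filter are illustrative assumptions rather than a description of any of the cited studies.

```python
# Minimal sketch: filter candidate crawl targets by page language using langdetect.
# The URLs are placeholders; real studies would start from a top list or CrUX data.
import requests
from bs4 import BeautifulSoup
from langdetect import detect


def page_language(url):
    """Fetch a page and guess the language of its visible text (e.g. 'en', 'de')."""
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)
    try:
        return detect(text)
    except Exception:  # langdetect raises an error on empty or ambiguous text
        return None


targets = ["https://example.com", "https://example.org"]
english_targets = [url for url in targets if page_language(url) == "en"]
print(english_targets)
```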
Another matter that needs to be tackled in investigations is which pages of a website to focus the analysis on and how to identify them, as top lists only include website landing pages. Researchers have developed an alternative top list, Hispar, which contains both internal and landing web pages of popular websites.66 Hispar is, however, a generic list, and as such not suitable for thematic analyses (i.e. analyses which choose a particular page type to focus on); prior research has also questioned the validity of Hispar-generated results.67 In their crawl of e-commerce websites, Mathur et al. chose product pages as a starting point for data collection.68 The researchers developed a crawler that would randomly visit links on shopping websites. The crawler ranked the links according to their likelihood of leading to a product page, using a classifier that relied on the features of the URL, and identified visited pages as product pages where these had an ‘Add to cart’ or similar button. This button was detected by assigning a weighted score to visible HTML elements on a page based on their size, colour and whether they matched certain regular expressions. The crawler was able to extract and return product pages where they were present, and return no product pages where they were missing, on 86 out of the 100 websites on which it was validated.69
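A heavily simplified sketch of this kind of weighted scoring is shown below; it relies only on tag names and text patterns, whereas Mathur et al.’s approach additionally used the rendered size and colour of elements, which require a driven browser. The weights and the regular expression are illustrative assumptions.

```python
# Simplified sketch: score HTML elements as likely 'Add to cart' buttons and use
# the result to decide whether a page looks like a product page. The weights and
# patterns are illustrative; rendered size/colour signals are omitted here.
import re
from bs4 import BeautifulSoup

ADD_TO_CART = re.compile(r"add\s+to\s+(cart|bag|basket)", re.IGNORECASE)


def score_element(el):
    text = el.get_text(" ", strip=True) or (el.get("value") or "")
    score = 0.0
    if ADD_TO_CART.search(text):
        score += 2.0  # label matches an add-to-cart phrase
    if el.name == "button" or el.get("type") == "submit":
        score += 1.0  # native button/submit elements are more likely candidates
    if el.name == "a" and el.find("img"):
        score += 0.5  # image wrapped in a link: a common design alternative
    return score


def looks_like_product_page(html, threshold=2.0):
    soup = BeautifulSoup(html, "html.parser")
    return any(score_element(el) >= threshold
               for el in soup.find_all(["button", "a", "input"]))
```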
As the section introduction explained, websites are heterogeneous in both their content and structure, and many web measurement studies require the building of target-specific crawlers.70 This process entails selecting a foundational crawler technology and adjusting it for the purposes of the investigation (e.g. by developing ways to simulate user interaction with the website and adopting a data collection strategy).
A myriad of crawler technologies are available, and the choice of a particular crawler implies trade-offs between computational overhead, developer effort, data accuracy and completeness.71 On one end of the spectrum, there are simple-to-use, lightweight tools like WGet72 and cURL,73 which handle web content as static files.74 However, as Chapter 2 explains, nowadays most websites are dynamic, and these tools’ inability to execute JavaScript poses a risk that the collected data will be incomplete.75 On the other end, there are browser automation frameworks like Selenium,76 PhantomJS77 and Puppeteer;78 these are more computationally expensive to use, but enable the collection of more complete data and allow the simulation of user interaction.79 Researchers have also developed dedicated frameworks for web measurement, such as OpenWPM,80 which is built atop Selenium and comes with some pre-packaged data collection and user interaction functionalities for conducting large-scale privacy and web tracking measurements.81 While none of these tools are ready to be used out of the box for each and every crawling task, as they are generic technologies (with the exception of OpenWPM), they can be adjusted to fit the goals of a particular investigation. The ways in which crawler technology may be used to collect data related to dark patterns are discussed in the next sub-sections.
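The difference between a static fetch and a browser-automated visit can be illustrated with a short sketch; the target URL is a placeholder, and the headless Chrome set-up is just one of several possible configurations.

```python
# Sketch: contrast a static fetch (no JavaScript) with a browser-automated visit
# that returns the rendered DOM and a screenshot. The URL is a placeholder.
import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

url = "https://example.com"

static_html = requests.get(url, timeout=10).text  # JavaScript is never executed

options = Options()
options.add_argument("--headless=new")  # run Chrome without a visible window
driver = webdriver.Chrome(options=options)
driver.get(url)
rendered_html = driver.page_source      # DOM after scripts have run
driver.save_screenshot("page.png")      # visual record of the rendered page
driver.quit()

print(len(static_html), len(rendered_html))  # dynamically injected content only shows up in the latter
```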
In order to collect data related to dark patterns, researchers typically simulate users’ interactions with a service for its intended purpose; this is done either manually or automatically.82 In the case of shopping websites, Mathur et al. built a checkout crawler atop OpenWPM that automates users’ primary interaction path with this type of digital product: buying a product.83 The crawler accessed product pages, on which it selected product options such as size and colour, added the product to the virtual shopping cart, viewed the cart and then went to the checkout page without completing the purchase. To complete this shopping flow, the crawler used scoring functions (see the previous sub-section for a description of how these work) to output the most likely ‘Add to cart’, ‘View cart’ and ‘Checkout’ buttons amongst interface elements, and clicked these buttons to proceed to the next stage of the purchasing process. Because HTML markup and design vary between websites, crawlers need to account for design alternatives and several edge cases to ensure the scalability of the chosen approach.84 In terms of design alternatives, for example, an ‘Add to cart’ button can be implemented as an HTML <button> element or an <img> element wrapped in an <a> element, and use various labels (e.g. ‘Add to cart’ or ‘Add to bag’). Building a crawler therefore necessitates evaluating the generalisability of its interaction methods repeatedly and iteratively adjusting them.85 Another consideration in this respect is the website language(s). Interaction methods that rely on the button text to simulate user actions on a website may need to provide support for multilingual options. Bouhoula et al. have used machine translation (the open source translation API LibreTranslate)86 to detect cookie notices on websites and interact with them.87
Edge cases may be related to the performance of additional actions on a website, such as selecting a country of origin for websites that sell internationally, or dismissing pop-ups. Mathur et al.’s checkout crawler was not able to handle country-of-origin selection.88 The researchers dealt with pop-ups by programming the crawler to click the close button, where it could be located.89 Notably, in the EU some of these pop-ups may be consent banners required by the ePrivacy Directive (ePD) and the General Data Protection Regulation (GDPR), and these cannot always be dismissed by merely closing the banner,90 but may require registering a user’s preferences before they can interact with the website. Privacy researchers have developed user-facing tools to automatically interact with a consent banner and register preferences (e.g. Consent-O-Matic)91 which could be integrated in a web measurement framework; for example, Senol et al. used Consent-O-Matic to measure the effect of users’ preferences on web tracking and data collection.92 Due to edge cases and design alternatives that were not accounted for, Mathur et al.’s checkout crawler was ultimately able to complete the checkout process on 66 out of the 100 websites on which it was validated.93
Once the desired interaction has been defined and implemented, the crawler needs to know what data to collect from the pages it visits. Some studies investigating the use of dark patterns in consent notices have collected screenshots from automatically visited websites.94 However, using only screenshots for dark pattern detection limits the scope of practices that can be detected to those that are static (i.e. those that manifest on a single screen and are permanently visible),95 and disregards information about how a particular design is implemented (e.g. whether an Activity Notification96 has been falsely generated). A recent study by Kirkman et al. instead relied on CSS selectors97 to locate HTML elements containing potential consent notices, and extracted information about the location and text content of the element, as well as its HTML, and took screenshots of the element using Selenium. The researchers used the screenshots to support the manual validation of their automated approach to collecting data about, and detecting, dark patterns.98 Mathur et al.’s checkout crawler also extracted the page source code upon visiting a page, and took screenshots every time the state of the page changed.99 The screenshots were later used in the manual dark-pattern-labelling process, and the page source code helped the researchers ascertain the deceptiveness of some dark patterns.100 The researchers adopted an additional strategy in order to account for the transient nature of some dark patterns and the dynamic nature of websites generally. First, once a page loaded completely, the crawler would divide it into page segments, which the researchers defined as ‘meaningful smaller sections of a web page’. Fig. 1 below illustrates the results of Mathur et al.’s segmentation algorithm as applied to a product page on a shopping website.
Fig. 1: HTML textual segments identified by Mathur et al.’s segmentation algorithm101
Further, because some dark patterns may be transient (e.g. an Activity Notification may appear repeatedly on the interface for a short period of time), and some website content may be loaded dynamically (e.g. in response to user interaction), the researchers used the Mutation Summary library,102 which summarises the changes made to the DOM tree,103 in order to capture segments from page updates during measurement. Mathur et al.’s measurements resulted in ∼13 million segments from the visited pages.104 These segments formed the basis of the researchers’ analysis of dark patterns as discussed in sub-section 8.3.2. Processing this kind of data volume manually is a resource-intensive and error-prone task. Automating data analysis for such vast datasets could yield more accurate results in a more efficient way; section 8.3 explores the options for doing so.
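For illustration, the kind of element-level extraction described above (location, text, HTML and an element screenshot, as in Kirkman et al.’s set-up) could be approximated with a few lines of Selenium; the CSS selector used to locate candidate notices is a placeholder assumption.

```python
# Sketch: element-level data extraction with Selenium. The CSS selector for
# locating candidate consent notices is a placeholder, not the cited studies' own.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")

for el in driver.find_elements(By.CSS_SELECTOR, "div[id*='cookie'], div[class*='consent']"):
    record = {
        "location": el.location,                # x/y position on the page
        "size": el.size,                        # rendered width and height
        "text": el.text,                        # visible text content
        "html": el.get_attribute("outerHTML"),  # how the element is implemented
    }
    el.screenshot("notice.png")                 # screenshot of the element alone
    print(record["location"], record["size"], len(record["html"]))

driver.quit()
```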
The last type of design consideration concerns the geolocation from which crawls are conducted and the page-visit strategy, both of which could affect the results of a crawling exercise. Researchers may also adjust their page-visit strategy based on ethical concerns.
A key consideration in setting up the experimental environment is the geolocation of crawls. Websites may serve different content depending on location (e.g. IP address), and this could also be influenced by the regulations that are applicable in a particular location.105 An automated measurement of dark patterns performed, for example, in the United States, may not be reflective of the practices that are being deployed in the EU. To conduct measurements from a location of interest, researchers may therefore rely on commercial VPN services to manipulate the location of measurements. For example, Sheil et al.106 used Mullvad VPN to (manually) investigate and compare the subscription sign-up and cancellation flows of news websites in five locations (UK, Netherlands, Germany, Texas and California) in light of the varying regulation in these jurisdictions. Choosing a VPN service provider is also a decision that requires care – a 2018 study tested 2269 servers operated by seven VPN services and found that at least one third of the tested servers were not located in the advertised countries.107 Researchers have adopted measures such as using IP geolocation services like WhatIsMyIPAddress.com108 and IPLocation109 to test the validity of the statements made by the VPN services they employ.110
The page-visit strategy refers to how (e.g. with a clean slate or recognisable browser) and when a crawler visits (or revisits) a page, and how many pages a crawler visits on a website.
Depending on the goal of a study, researchers may conduct stateful or stateless web crawls. Stateful measurements do not clear the browser’s profile between page visits, which means that cookies and other browser storage persist from site to site and in between several visits (i.e. different sessions) to the same website.111 By contrast, a stateless browser always appears to be a new user.112 OpenWPM supports stateful crawls.113 Stateful crawls are useful for web tracking measurements114 as well as for investigating personalised practices, as they facilitate the creation and use of persistent (across sessions) user profiles.115 They can also be used to gauge the deceptiveness of a transient dark pattern like a Countdown Timer.116 Mathur et al. adopted a stateful strategy to repeatedly visit pages that displayed Countdown Timers at regular intervals, and took screenshots.117
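A plain-Selenium approximation of the difference between the two strategies is sketched below, using a persistent browser profile directory for stateful visits; this is not OpenWPM’s own implementation, which handles profiles through its configuration.

```python
# Sketch: stateful vs stateless page visits with plain Selenium. Reusing a profile
# directory lets cookies and other storage persist across visits; omitting it makes
# every visit look like a new user. This approximates, but is not, OpenWPM's approach.
import tempfile
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def visit(url, profile_dir=None):
    options = Options()
    options.add_argument("--headless=new")
    if profile_dir:
        options.add_argument(f"--user-data-dir={profile_dir}")  # persistent browser state
    driver = webdriver.Chrome(options=options)
    driver.get(url)
    html = driver.page_source
    driver.quit()
    return html


profile = tempfile.mkdtemp(prefix="crawl-profile-")
visit("https://example.com", profile_dir=profile)  # first visit: cookies are set
visit("https://example.com", profile_dir=profile)  # second visit: the site sees a returning user

visit("https://example.com")  # stateless visit: throwaway profile, appears as a new user
```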
The page-visit strategy may also be affected by ethical considerations. Badly implemented web crawlers (e.g. crawlers that send too many requests to a website within a short period of time) can effectively act as a denial-of-service attack118 against the visited website, slowing down its loading time or making it unavailable for actual users and generating costs for website operators.119 In order to tackle this issue, researchers have adopted tactics such as limiting automated visits to a certain number of web pages on a website,120 crawling websites during low-load periods (e.g. at night)121 and throttling their crawlers (i.e. reducing the rate at which they send requests to websites).122
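A basic politeness layer of this kind can be implemented in a few lines; the user agent string and the delay below are illustrative choices.

```python
# Sketch of basic crawler politeness: honour robots.txt and throttle requests.
# The user agent string and delay are illustrative choices.
import time
import requests
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "research-crawler-example"
DELAY_SECONDS = 5


def polite_get(url):
    parsed = urlparse(url)
    robots = RobotFileParser()
    robots.set_url(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
    robots.read()
    if not robots.can_fetch(USER_AGENT, url):
        return None  # the site asks crawlers not to visit this path
    time.sleep(DELAY_SECONDS)  # throttle so as not to overload the server
    return requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10).text
```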
Web measurement methods have some limitations. The limitations identified in prior literature concern websites’ bot detection strategies and adversarial behaviour (sub-section A), measurement bias (sub-section B), website multilingualism (sub-section C), access restrictions (sub-section D) and design and HTML variability both qualitatively and in time (sub-section E). This section discusses these limitations, reflects, where relevant, on how they may affect the task of automatically collecting data about unlawful dark patterns, and shows how a more technology-specific legal framework could help overcome some of these challenges.
As the introduction to this section explains, web crawlers may be deployed for both benign and malicious purposes, and site owners may therefore have good reasons to take measures against automated web agents. There are various techniques for detecting web bots; some websites rely on site traversal (the navigational paths of the bot) to flag bots, whereas others look at the fingerprint of automated browsers or interaction characteristics (e.g. speed of mouse movement).123 Site owners may deploy different measures in response to automated visits, such as denying access, implementing CAPTCHAs or serving different content (so-called ‘cloaking’).124
The bad news is that these measures affect both benign and malicious bots.125 Moreover, recent studies have shown that there are websites that specifically target crawling frameworks and methods commonly used in web measurement research; Krumnow et al. found that at least 16.7% of websites in the Tranco top 100K list execute scripts which access properties specific to Selenium (and thereby OpenWPM, which uses Selenium) and scripts accessing properties that are specific to OpenWPM.126 The researchers also demonstrated that adversarial websites could attack and poison OpenWPM’s data collection modules.127 Web measurement frameworks therefore commonly incorporate some kind of bot detection mitigation strategy,128 and researchers are constantly developing new bot mitigation strategies, such as methods for hiding the fingerprint of popular crawling frameworks129 and adding more human-like qualities to automated interactions with websites.130 None of these methods are completely foolproof, however, as both bot detection and bot detection mitigation strategies are constantly evolving. I discuss the implications of adversarial website operator behaviour for public authorities in section 8.4.
The use of crawls as a proxy for human browsing data may result in a biased picture131 of websites’ practices.132 The web measurement research community has only recently started investigating this issue. Recent studies have shown that the operating system133 and the crawler technology used in a study,134 as well as its configurations and the network vantage points, may lead to different results,135 and that crawling may produce results that are different from commodity browsing experience altogether.136
While these studies document measurement discrepancies due to several experimental factors, the root causes of measurement discrepancies are still not very well understood.137 For now, web measurement researchers therefore recommend deploying several different crawler configurations and executing multiple measurements in order to reduce the risk of measurement bias.138 Even so, crawls cannot capture all of the diversity of actual user environments, owing to factors such as diversity in operating systems and their versions, browsers and their versions, and user profile history.139 These sources of bias are inherent in web measurement research and not entirely preventable.140
Multilingualism is a distinct challenge for investigations focused on the EU market, which features 24 official languages.141 As the previous section showed, data collection methods relying on website interaction need to account for linguistic variety. One way to do so is to use machine translation. Bouhoula et al., for example, have used LibreTranslate to automate interaction with cookie banners in 11 EU languages (Danish, German, English, Spanish, Finnish, French, Italian, Dutch, Polish, Portuguese and Swedish).142 The researchers report that LibreTranslate produced inaccurate translations for the short texts of interactive elements in some languages (e.g. Greek), resulting in the removal of these languages from their analysis.143 While mistranslations were rare(r) in the languages ultimately included in the study, some mistakes needed to be manually fixed. This is not to say that automating interaction with non-English websites is impossible; it is merely (significantly) more resource-intensive, especially since automatic translation services may not perform equally well on smaller European languages.
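As an illustration of how machine translation can be slotted into such a pipeline, the sketch below sends short UI strings to a LibreTranslate instance so that English-language heuristics can be applied to their translations; the instance URL is an assumption (hosted instances may require an API key), and, as noted above, translations of short strings still need manual checking.

```python
# Sketch: translate short interface strings with a LibreTranslate instance so that
# English-language detection heuristics can be reused. The instance URL is an
# assumption; hosted instances may additionally require an API key.
import requests

LIBRETRANSLATE_URL = "http://localhost:5000/translate"  # assumed self-hosted instance


def to_english(text, source_lang):
    response = requests.post(
        LIBRETRANSLATE_URL,
        json={"q": text, "source": source_lang, "target": "en", "format": "text"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["translatedText"]


print(to_english("Alle akzeptieren", "de"))  # e.g. 'Accept all'
```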
The data necessary for assessing the presence of a practice on a website may not be accessible. This may pose issues for infringement detection. For example, some websites require users to create an account to have full access to their content. Logins are also sometimes required for making a purchase on the website or cancelling subscriptions. Further, some dark patterns like Pressured Selling144 may only manifest post-purchase. Many dark patterns therefore hide behind such login walls. Introducing standardised requirements for processes like contract termination (e.g. a cancellation button that is accessible via the homepage) would ensure that at least some of the data related to potential infringements is accessible (while acknowledging that some aspects of compliance checking, e.g. checking whether a submitted cancellation request is actually effective, will still require purchases to be completed).
Another related issue is the server-side generation of (some) web content. The consequence of this is that information about how the content was generated (e.g. whether it is deceptive) may not be accessible. This was the case, for example, for some of the deceptive Activity Notifications in the Mathur et al. dataset.145 Access restrictions of this sort are a question for procedural law, as section 8.4 discusses.
Websites differ in their design, and there is therefore a lot of variability in the way some dark patterns are instantiated. For example, Hidden Costs may entail costs that are disclosed in obscure locations or in various font colours and sizes; a cancellation button may be placed in various places on the website; and the cancellation process (if it can be accessed) may involve a varying number of clicks and the navigation of other dark patterns that are deployed in its course. This makes it very difficult to automate the website interaction that is necessary to collect infringement-related data for these specific practices – a task that researchers have not yet attempted. Further, even if interaction can be automated for some of the websites, this could result in dataset bias, i.e. the methods may not generalise to the larger population of websites. This could be remedied by implementing express legal requirements as to the timing and presentation of information (e.g. size, placement, colour) and design requirements for cancellation processes (e.g. a cancellation button on the homepage). Standardising user interface design in these respects would also ensure that tools automating website interaction do not become deprecated as easily (although as discussed below, there are still some challenges due to changes in website design and code that occur over time).
However, even in the event that it becomes clear(er) what websites are supposed to look like in order to be compliant, there are a myriad of ways to express this in the source code (HTML, CSS and JavaScript); for example, a button can be implemented as an HTML <button> element or an <img> element wrapped in an <a> element. Design mining research exploring the evolution of website design has found that while the visual design of websites homogenised during the period 2007–2019, their underlying source code has become more dissimilar over time.146 Having a representative and large sample of websites, and iteratively designing rules to extract the necessary information/automate interaction based on the source code of these websites, is therefore imperative in order to ensure the scaling of data collection methods. This concern has been repeatedly noted in studies that collect data from/automate interaction with cookie banners – most of these studies focus on cookie banners provided by consent management platforms due to the predictability of their designs, but may not work well, or at all, on generic websites.147
Website design and underlying code may also change over time. This complicates the task of automatically collecting infringement-related data, in three respects. First, a particular design may be part of an ongoing A/B test, i.e. not the final adopted design.148 This issue could be addressed by, for example, revisiting the website at a later point or checking whether the website source code includes libraries provided by popular A/B testing platforms or configuration files related to currently running experiments.149 Second, an infringement may be time-bound; it may therefore be worthwhile to collect some non-modifiable data from the website, such as screenshots. Lastly, leaving aside cases of adversarial behaviour, automated data collection efforts may fail because of some updates in the underlying code. Any automatic data collection tool will thus need maintenance and adjustments in time, unless not just the design but also its implementation is standardised. This has organisational implications (which I discuss in section 8.4).
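One way of operationalising the suggestion to check the source code for traces of experimentation platforms is sketched below; the list of script signatures is illustrative and incomplete, and would need to be curated and maintained over time.

```python
# Sketch: flag pages that may be running A/B tests by searching the page source
# for references to well-known experimentation platforms. The signature list is
# illustrative and incomplete, and would need to be maintained over time.
import re

AB_TEST_SIGNATURES = [
    r"cdn\.optimizely\.com",          # Optimizely snippet
    r"visualwebsiteoptimizer\.com",   # VWO
    r"googleoptimize\.com",           # Google Optimize (now retired)
    r"abtasty\.com",                  # AB Tasty
]


def may_be_ab_testing(page_source):
    return any(re.search(signature, page_source, re.IGNORECASE)
               for signature in AB_TEST_SIGNATURES)
```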
This section has shown that it is possible to collect data about dark patterns from shopping websites at scale using web measurement methods such as crawling and scraping. Researchers have devised ways to automatically interact with shopping websites the way a consumer would – adding a product to cart and checking it out – and to automatically extract information from them, including their source code. The source code of a website can reveal many things – not just the presence of a dark pattern, but also its deceptiveness (more on this in the following sub-section). Web measurement methods therefore seem (technically) well-suited for use in the discharge of public authorities’ market-monitoring functions.
That being said, while there is a great breadth of public information on the web which can be automatically accessed and extracted, web measurement is not easy. The web was not meant to be supervised, and the lack of a centralised website repository means that even a seemingly straightforward task like picking a starting point for an investigation is a challenge in and of itself. The web still does not want to be supervised: websites deploy ever-evolving bot mitigation techniques that affect both benign and malicious bots; and recent studies have shown that some websites target the tools used in web measurement specifically, and that some of these tools are vulnerable to purposeful attempts to jeopardise their data collection frameworks. This points to a need to constantly update a crawling infrastructure, as do regular, innocuous website updates.
We have also seen that automated browsing experiences may be different from genuine users’ experiences, for reasons that the web measurement research community are still trying to understand, and that different crawler settings may lead to different results. In the meantime, this means that there is some room for inaccurate results, which, as we will see in section 8.4, has some implications for the legitimacy of authorities’ use of web crawling for market monitoring. A related question is that of the generalisability of data collection methods across the EU, in light of the EU’s commitment to linguistic diversity. While scholars have recently started experimenting with automatic translation tools to adapt their crawling methods for websites in languages other than English, for now these tools are not perfect, and automating data collection from non-English-language websites will still require manual work.150
Further, not all information on the web is public: accessing a website or some of its pages may require creating accounts or conducting purchases, and some website resources may be generated server-side. This is something that more technology-specific policy could partially address – standardised requirements for processes like contract termination (e.g. a cancellation button that is accessible via the home page) could make some infringement-related data accessible. Access to server-side data is a matter of procedural law that I reflect on in section 8.4.
Lastly, the fact that the web was supposed to exist out of the reach of public authorities also means that there is not a great deal of uniformity in terms of website structure (e.g. where a cancellation option is placed) or in terms of design (e.g. what that option looks like), either qualitatively or in time. As this section has shown, the lack of uniform website design standards could impede the development of time-proof and generalisable methods for the collection of data about dark patterns. Again, more technology-specific regulation (the kind I outline in Chapter 7) could help bring automated infringement detection a step closer to becoming a reality.
This section explores whether potentially unlawful dark patterns can be automatically detected in website data. It reviews existing dark pattern detection studies, reflecting on the suitability of their methods for detecting infringements, as well as on possible obstacles to automation.
Before I proceed to outlining the available data analysis methods below, it should be noted that the feasibility of and the choice of method for automated data processing is influenced by the way data is represented. The data used by an algorithm needs to be processable by it;151 this means that raw data collected from websites (e.g. the HTML of a cookie banner) has to be pre-processed, which may involve developing ways to extract some features from it (e.g. clickable or text elements) that map to a quality of interest (e.g. the presence of a dark pattern). While my analysis in this section classifies dark pattern detection studies in terms of the data analysis method followed, the process of data pre-processing may involve a mixture of methods, which are described for each respective study.
Two main approaches to automated dark-pattern detection have been followed in prior studies: heuristics-based and machine-learning-based.
A heuristic is a ‘simple and quickly implemented solution to a problem’152 that may be based on common sense, experience, analogies, judgement or informed guesses.153 An example of a heuristic is an ‘if then’ rule. While heuristics offer ‘acceptably good’ rather than optimal solutions to a computational problem,154 and they may only give correct answers to some instances of a problem (i.e. they may not generalise well),155 they may be helpful when there is little data.156 Heuristics have been used for the development of privacy-enhancing technologies (e.g. ad-157 and tracker-blocking extensions),158 as well as in the study of unlawful behaviour by web trackers,159 and for phishing detection in the cybersecurity domain.160
Machine learning is a subfield of computer science that is concerned with ‘the question of how to build computer programs that improve their performance at some task through experience’.161 Broadly speaking, three main subtypes of machine learning can be distinguished based on the experience available to the algorithm: supervised, unsupervised and reinforcement learning.162 As reinforcement learning has not been used in any of the studies I review, I limit my discussion here to supervised and unsupervised learning.
The goal of supervised learning is to learn a function mapping some input (a collection of features that have been observed from an object or event of interest)163 to an output variable based on a training dataset of labelled input–output pairs;164 this function can then be used to predict outputs for new inputs.165 Often, it will be difficult to collect the outputs automatically, and these will have to be provided manually by human ‘supervisors’.166 When the output is a real-valued variable, the machine-learning task is called ‘regression’, whereas when it is categorical, it is referred to as ‘classification’,167 and a wide array of machine-learning algorithms may be used for each task.168 The detection of unlawful dark patterns could be framed as a classification task (e.g. distinguishing between potentially unlawful and lawful dark patterns).
Since the ultimate goal of supervised learning is to map future output values, a model’s ability to perform well on previously unobserved inputs (called ‘generalisation’) is crucial.169 A best practice in this regard is dividing the available dataset into training and test sets, and comparing the model’s performance on both sets.170 A model that performs well on the training set but worse on the testing set is said to be ‘overfitting’ as it is too finely attuned to the variations in training data, which are likely to be noise (irrelevant information).171 On the other hand, a model that does not perform well on either set is ‘underfitting’.172 A commonly used performance measure in classification tasks is accuracy, which is the proportion of examples for which a model is able to produce the correct output.173

The capacity of the machine-learning model and the size and quality of the dataset have a major influence on a model’s ability to generate accurate outputs.174 Model capacity refers to the model’s ability to fit a wide array of functions. A higher-capacity model is more likely to overfit the data, and a simpler one to underfit.175 Machine-learning experts therefore have to find a balance between over- and underfitting (the so-called ‘Goldilocks model’)176 by adjusting a model’s capacity or testing and choosing amongst models with various capacities.177 In terms of dataset size, a larger dataset is more likely to be representative of the real-life variations of the analysed phenomenon,178 and reduces the likelihood that the model will find patterns that do not exist in the data.179

In classification problems, it is also important for the dataset to be balanced (i.e. the classes should be more or less equally represented in terms of the number of examples present in the dataset) in order for the model to be able to learn from sufficient examples of each class.180 This is particularly important for the detection of rare phenomena like dark patterns, which occur rarely on web pages relative to regular page elements or other design patterns.181 Having little data about a class means that the model may only learn to discern the majority class (i.e. the one on which there is a lot of data) well.182 Accuracy as a performance metric may be misleading in cases of class imbalance. For example, if 9,900 out of 10,000 (99%) HTML textual elements on websites are not dark patterns, and the model classifies all HTML elements as not dark patterns, the model achieves 99% accuracy without actually being useful for the task it was trained for. In such cases, researchers can turn to more error-sensitive evaluation metrics to increase their chances of flagging bad performance,183 and use data manipulation techniques like sampling to balance the dataset.184 As to data quality, incorrect labels or the presence of a lot of noise in the dataset may also lead to errors.
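To make the classification framing concrete, the sketch below trains a simple text classifier on annotated page segments and evaluates it with per-class precision and recall rather than accuracy alone; `load_annotated_segments` is a hypothetical placeholder for a labelled dataset, and the model choice is illustrative.

```python
# Illustrative sketch: dark pattern detection framed as text classification, with a
# train/test split, a class-imbalance correction and error-sensitive evaluation.
# load_annotated_segments() is a hypothetical placeholder returning page-segment
# texts and 0/1 labels from a manually annotated dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

segments, labels = load_annotated_segments()  # hypothetical helper

X_train, X_test, y_train, y_test = train_test_split(
    segments, labels, test_size=0.2, stratify=labels, random_state=0
)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(class_weight="balanced", max_iter=1000),  # counteract class imbalance
)
model.fit(X_train, y_train)

# Per-class precision and recall are more informative than accuracy when dark
# patterns are rare relative to ordinary page elements.
print(classification_report(y_test, model.predict(X_test)))
```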
Unsupervised learning algorithms only experience features without labels.185 Their goal is to find interesting patterns in the data.186 An example of an unsupervised learning task is cluster analysis, which aims to find clusters of data points that are more similar to each other than to other clusters of data points in the dataset.187 While unsupervised learning is not a suitable approach for the automatic detection of infringements, as the results will still require (a great deal of) human interpretation, it can nevertheless be used in the process of dataset generation, as we will see below.
Machine-learning methods are used extensively in some subfields of artificial intelligence (AI), such as natural language processing (NLP) and computer vision (CV). NLP is the subfield of AI that broadly aims to ‘get computers to perform useful tasks using human language’ (i.e. text or speech).188 CV is concerned with enabling computers to derive meaningful information from visual data, such as images.189 As we will see in this section, dark patterns researchers have drawn on methods from both of these fields.
The next sub-sections examine studies that have developed heuristics-based (8.3.1) and machine-learning-based (8.3.2) detection methods, and take stock of the opportunities presented by these approaches, as well as their limitations and the difficulties researchers have reported in their efforts to create automated methods for the detection of dark patterns. The last sub-section (8.3.3) reflects on what the state of the art in terms of detection methods means for the detection of potentially unlawful dark patterns, and discusses some general challenges of automated data analysis and how (technology-specific) adjustments to the substantive legal regime could help overcome some of these.
Several recent efforts aimed at the detection of dark patterns on shopping websites, across platforms (websites and mobile apps) and in consent notices have relied on heuristics for the identification of dark patterns in UI elements.
In April 2023, the Dark Pattern Detection Project (DAPDE), funded by the German Federal Ministry of Justice and Consumer Protection, released a browser extension called the Dark Pattern Highlighter190 on GitHub.191 The Dark Pattern Highlighter is intended to work in a similar way to an ad blocker, but instead of blocking the use of dark patterns, it highlights them on a web page in order to increase consumer awareness, as Fig. 2 shows.
Fig. 2: Countdown Timer highlighted by the DAPDE Dark Pattern Highlighter on ashlen.co (May 2023)
The Highlighter stores two copies of a visited web page (its HTML DOM) with a time gap of 1.5 seconds. The pattern detection methods rely on applying regular expressions to textual elements and, for Countdown Timers,192 additionally verifying whether the text of the element has changed by comparing the two copies of the web page. Regular expressions are algebraic notations for characterising a set of strings, and they are particularly useful for searching in texts.193 The extension can currently detect and highlight Urgency/Scarcity dark patterns – Countdown Timers, Activity Notifications194 and Low-stock Messages195 – on English- and German-language websites. The Highlighter documentation also lists Forced Continuity/Hidden Subscription as one of the detected dark patterns, yet the detection tactic used – merely searching for mentions of a recurring payment on the web page (e.g. ‘$10.99/month after’) – is not indicative of this practice.196 While DAPDE is a research project, there is no associated paper for the Dark Pattern Highlighter and therefore no data on its performance. The reflections I present below on some of the pitfalls of the detection approach adopted by DAPDE are therefore based on an inspection of the extension code and the general limitations of heuristic methods like regular expressions.
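The detection tactic itself – comparing two snapshots of the page taken 1.5 seconds apart and applying regular expressions to timer-like text – can be approximated in a crawling context with a short Selenium sketch; the regular expression, target URL and leaf-element heuristic below are illustrative assumptions, and the Highlighter itself is implemented in JavaScript rather than Python.

```python
# Sketch approximating the Highlighter's countdown check in a crawler: read the
# text of leaf elements twice, 1.5 seconds apart, and flag timer-like texts that
# change. The regular expression and target URL are illustrative; elements removed
# from the DOM between the two reads would need stale-element handling.
import re
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

TIMER_PATTERN = re.compile(r"\b\d{1,2}:\d{2}(:\d{2})?\b")  # e.g. '01:59' or '00:01:59'

driver = webdriver.Chrome()
driver.get("https://example.com")

candidates = [el for el in driver.find_elements(By.XPATH, "//*[not(*)]")  # leaf elements only
              if TIMER_PATTERN.search(el.text or "")]

first = [el.text for el in candidates]
time.sleep(1.5)
second = [el.text for el in candidates]  # re-read the same elements after the gap

changed = sum(1 for before, after in zip(first, second) if before != after)
print(f"{changed} candidate Countdown Timer(s) detected")
driver.quit()
```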
Other detection projects have sought to identify dark patterns beyond textual data. AidUI is an automated approach to dark pattern detection in website and mobile app screenshots.197 AidUI uses CV and NLP to recognise a set of visual (icons) and textual cues in screenshots. It then applies heuristics to map the presence of certain lexical patterns and icons, colour intensity (determined using OpenCV)198 and size differences between UI elements to dark patterns in the Mathur et al., Gray et al. and Brignull taxonomies (an overview of the relationship between these taxonomies is provided in Chapter 3).199 Amongst its other capabilities, AidUI can reliably detect Urgency and Scarcity dark patterns in screenshots. AidUI performs best when its text, colour and spatial analysis modules are combined (although there is no data in the paper on how the different modules affect the detection of individual dark patterns, nor about the modules used for each dark pattern).
Heuristics have also been used in studies focusing on the detection of dark patterns and EU data protection law violations in consent notices. DarkDialogs is a system developed by Kirkman et al. that automatically extracts consent dialogs from a website and detects the presence of 10 Privacy dark patterns.200 For each cookie notice, the system collects information about its location, text content and HTML, takes a screenshot, and attempts to locate and classify all clickable elements, which are identified based on CSS selectors.201 The heuristics mapping website data to dark patterns were based on the presence of clickables (e.g. the lack of an opt-out button), their state (e.g. pre-selected preference sliders), relative brightness with respect to the background colour (determined using an arbitrary threshold set by the researchers), the area of the dialog relative to the area of the visible web page (also determined using an arbitrary threshold) and the comprehensibility of the text (based on the Flesch–Kincaid reading ease test).202 The system correctly classified over 99% of the 1375 dark pattern instances in a dataset manually labelled by two researchers.
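For illustration, a brightness heuristic of this kind can be expressed with the WCAG relative-luminance and contrast-ratio formulas; the threshold below is an assumption of mine, not the value used in the cited study.

```python
# Sketch of a brightness-comparison heuristic using the WCAG relative-luminance
# and contrast-ratio formulas. The threshold is an illustrative assumption, not
# the value used in the cited studies.
def relative_luminance(rgb):
    def channel(c):
        c = c / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    lighter, darker = sorted(
        (relative_luminance(rgb_a), relative_luminance(rgb_b)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


# e.g. a pale grey 'reject' option on a white dialog background
button, background = (230, 230, 230), (255, 255, 255)
if contrast_ratio(button, background) < 1.5:  # illustrative threshold
    print("Low-contrast clickable: possible interface interference")
```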
Bouhoula et al. developed a method for the automatic analysis of dark pattern presence and GDPR violations in cookie notices.203 Their method maps the results of crawling (screenshots and clickables) and machine-learning models (declared and actual cookie purposes) to GDPR violations and dark patterns using heuristics based on the presence of clickables (missing reject buttons); mismatches between the cookies set and UI elements (missing notices or undeclared purposes) or user preferences (ignored reject buttons); mismatches between the colours (determined with a cluster analysis of the screenshot, and compared in terms of a threshold set by the researchers) and the text styles (font, weight and colour) of banner interaction options; and the clickability of links not included in the cookie notice. Gundelach and Herrmann also compared the colours of clickables, but their method was based on determining the colour of the majority of the pixels in the clickable; to identify whether users were presented with balanced choices (accept and reject buttons), the researchers compared the structural similarity204 of page screenshots after interacting with clickables, based on the intuition that rejecting and accepting cookies would both lead to the same content on the web page.205
The main limitation of heuristic approaches is that the rules generated by experts may not generalise well to the design variations that are deployed ‘in the wild’; this could be the case, for example, where the rules are formulated based on a biased sample. For instance, the regular expression the DAPDE highlighter uses for Low-stock Messages (Fig. 3) detects variations of phrases referring to a particular quantity of pieces/counts/items/% that are available/sold/claimed/redeemed, or mentions of a last item/article.
Fig. 3: Regular expression used by DAPDE highlighter to detect Low-stock Messages
This expression would not be able to detect Low-stock Messages that refer instead to e.g. ‘units’ that are ‘left’ or that do not have a reference quantity, such as ‘Only few left’. Detection methods of the sort can be improved by either basing the heuristics on representative examples or relying on NLP approaches (e.g. language models) to detect semantically similar – but lexically varied – textual dark patterns (see next sub-section).206 At the same time, as the analysis in Chapter 6 shows, the use of non-deceptive Urgency/Scarcity dark patterns is most likely not an unfair commercial practice.207 Existing detection methods could be further developed to, for example, look for common JavaScript patterns linked to fake Scarcity messaging/Countdown Timers in the page source; or to revisit a page using a Countdown Timer either immediately (to check whether it resets upon reloading) or upon expiry of the timer (to check whether the offer is still available afterwards).
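One way of catching lexical variants that a fixed regular expression misses, as suggested above, is to compare candidate texts against a handful of seed phrases using sentence embeddings; the model, seed phrases and threshold in the sketch below are illustrative choices.

```python
# Sketch: use sentence embeddings to flag texts that are semantically similar to
# known Low-stock Messages even when the wording differs. The model name, seed
# phrases and similarity threshold are illustrative choices.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
seed_phrases = ["Only 3 items left in stock", "Hurry, almost sold out"]
candidates = ["Only a few units left!", "Free returns within 30 days"]

seed_embeddings = model.encode(seed_phrases, convert_to_tensor=True)
candidate_embeddings = model.encode(candidates, convert_to_tensor=True)
similarity = util.cos_sim(candidate_embeddings, seed_embeddings)  # candidates x seeds

for text, scores in zip(candidates, similarity):
    if scores.max() > 0.6:  # illustrative threshold
        print(f"Possible Low-stock Message: {text!r}")
```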
Screenshot-based approaches have the distinct advantage of working cross-device and cross-platform (both on the web and in mobile apps), thereby overcoming some of the challenges related to automated data collection and analysis on mobile devices.208 However, the indiscriminate collection and analysis of screenshots is very inefficient in a market-monitoring context. Further, using only screenshots for dark pattern detection limits the scope of detectable practices to those that are static (i.e. manifest on a single screen), and disregards information about how a particular design is implemented (e.g. whether an Activity Notification is falsely generated). As the previous section noted, screenshots can serve an evidentiary function (in case the design of a website changes) and provide visual support for labelling purposes, but looking at other types of website data can offer a more comprehensive insight into the design practices websites implement.
Heuristics-based dark pattern detection methods have some other limitations that they share with machine learning methods. These are discussed in section 8.3.3.
As we saw in the previous section, one of the main shortcomings of heuristics is that they may not handle design variations very well. Since machine learning aims to derive patterns from past data that are general enough to be useful in new, unseen scenarios, it is worthwhile to explore what machine learning methods could do for automated infringement detection. This section provides an overview of existing studies that use machine learning to detect dark patterns and violations of data protection law, offering insights into the potential of these methods as well as the challenges associated with their use for infringement detection purposes. I also include dataset and annotation papers in this analysis. As the section introduction explains, while it is possible to frame dark pattern or infringement detection as a classification problem that can be addressed using supervised machine-learning approaches, such methods require large volumes of high-quality, labelled data. Data labelling is therefore an integral step in developing machine-learning methods for the detection of (unlawful) dark patterns. Labelling can be done entirely manually, but it can also involve recourse to unsupervised approaches that simplify the annotation process. This section therefore starts by discussing a study that used unsupervised methods to create a dataset, and then explores supervised approaches.
As seen in the previous sections, Mathur et al.’s crawler extracted ∼13 million segments from the visited pages. To identify dark patterns in these segments, the researchers relied on the segments’ textual data and used HDBSCAN, a clustering algorithm that converts the data into hierarchies of connected components,209 to organise the segments in a way that would make them more amenable to manual analysis.210 The manual analysis entailed the inspection of the text segments from the resulting 1768 clusters by the researchers, who drew on prior literature on dark patterns and impulse shopping, as well as news articles on high-pressure marketing techniques, to develop a codebook of possible dark patterns, and relied on textual data, screenshots and website visits to annotate the data. The researchers found a total of 1818 dark pattern instances, split across seven categories and 15 types. While this approach to dark pattern detection is only semi-automated, Mathur et al.’s study resulted in a dataset linking text segment data to dark pattern subtypes; this could be used in supervised learning approaches, as the following section shows.
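A minimal sketch of this clustering step, loosely in the spirit of Mathur et al.’s approach (their feature representation and parameters differed), might look as follows; the toy segments and the TF-IDF features are placeholders.

```python
# Clustering text segments so that similar segments can be reviewed together.
# Loosely inspired by, but not reproducing, Mathur et al.'s pipeline; requires the
# scikit-learn and hdbscan packages. The segments below are toy placeholders.
import hdbscan
from sklearn.feature_extraction.text import TfidfVectorizer

segments = [
    "Only 2 left in stock!",
    "Only 5 items left - order soon",
    "Hurry, offer ends in 10 minutes",
    "Sale ends tonight at midnight",
    "Free shipping on orders over 50 EUR",
    "Sign up for our newsletter",
]

features = TfidfVectorizer().fit_transform(segments).toarray()
labels = hdbscan.HDBSCAN(min_cluster_size=2).fit_predict(features)

for label, segment in sorted(zip(labels, segments)):
    print(label, segment)  # -1 marks segments treated as noise
```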
As the section introduction notes, the detection of unlawful dark patterns could be framed as a classification task.
Yada et al.’s 2022 paper,211 which sought to automate the detection of Mathur et al.’s dark patterns using NLP approaches, represents the first – and, to date, only – attempt to detect Shopping dark patterns using machine-learning methods. The researchers extended the Mathur et al. dataset with textual segments unrelated to dark patterns sourced from shopping websites, as shown in Fig. 4, and approached the detection of dark patterns as a binary (dark-pattern versus non-dark-pattern) classification problem. They report high accuracy (over 90%) for all of the tested approaches.
Fig. 4: Examples of dark pattern and non-dark-pattern texts from Yada et al.212
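The following toy sketch illustrates the general shape of such a binary text-classification setup; it is a simplified stand-in (a TF-IDF plus logistic-regression pipeline) rather than the models Yada et al. actually tested, and the example texts and labels are invented.

```python
# Simplified sketch of a binary dark-pattern text classifier (not Yada et al.'s models).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Only 1 left in stock - order soon",     # dark pattern (Low-stock Message)
    "14 people are viewing this right now",  # dark pattern (Activity Notification)
    "Add to basket",                          # neutral interface text
    "Delivery within 3-5 working days",       # neutral interface text
]
labels = [1, 1, 0, 0]  # 1 = dark pattern, 0 = non-dark pattern

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(texts, labels)
print(classifier.predict(["Hurry! Only 2 items left"]))
```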
Approaching the detection of Mathur et al.’s dark patterns as a binary text-classification task may not be entirely sound, and presents some limitations when viewed through the lens of market monitoring. First, the text-classification task takes as its input a textual segment from a single screen, whereas some dark patterns, like Hidden Costs, are interactive in that they manifest across several screens. Second, for some dark patterns (e.g. all of the dark patterns in the Sneaking category),213 the Mathur et al. dataset contains very few examples, which may make it unsuitable as a basis for generalisation. Third, a binary dark-pattern classification is of limited use for market-monitoring purposes, as not all dark patterns map to legal violations.
Researchers have also attempted to uncover Privacy dark patterns and GDPR violations in cookie banners using machine-learning approaches. While it may not be possible to entirely or readily transpose the methods employed in these studies to the detection of Shopping dark patterns, some of those methods and the challenges associated with them are nevertheless informative for the task of detecting Shopping dark patterns that are potentially unlawful.
In the context of the DAPDE research project, Hausner and Gertz conducted an exploratory study in which they automatically extracted 2800 cookie banners and their clickables, along with text and CSS-style information of the clickable elements, from German websites.214 The researchers manually labelled the extracted data for a machine-learning (Support Vector Machine) algorithm to automatically distinguish between different ‘accept’ and ‘reject’ button types.215 Fig. 5 presents the outcome of their approach on a web page where the dissimilarly visualised user choices are highlighted. The researchers state that ‘the implemented framework is powerful enough to detect cookie banners on a wide range of web pages’, but the paper does not contain data on the performance of their approach.216
Fig. 5: Hausner and Gertz’s automated cookie banner analysis217
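The following sketch conveys the general idea of classifying banner clickables from their text and style attributes; it is not Hausner and Gertz’s code, and the feature encoding, toy examples and labels are assumptions.

```python
# Rough sketch of classifying consent-banner clickables from text and style features
# (in the spirit of, but not reproducing, an SVM-based approach); toy data only.
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

clickables = [
    {"text": "accept all", "background": "green", "font_weight": "bold"},
    {"text": "agree", "background": "blue", "font_weight": "bold"},
    {"text": "reject all", "background": "grey", "font_weight": "normal"},
    {"text": "more options", "background": "white", "font_weight": "normal"},
]
labels = ["accept", "accept", "reject", "other"]

classifier = make_pipeline(DictVectorizer(), SVC(kernel="linear"))
classifier.fit(clickables, labels)
print(classifier.predict([{"text": "accept cookies", "background": "green", "font_weight": "bold"}]))
```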
Other studies have sought ways to enable the detection of a wider range of Privacy dark patterns in cookie banners. Kocyigit et al. published an exploratory study attempting to identify measurable features for dark patterns that are commonly found in cookie-consent processes.218 The main motivation for this work was that prior studies lacked objective, measurable criteria for identifying dark patterns in online services, even though machine-learning-based systems require structured data in the training process.219 The researchers proposed 31 features of cookie-consent processes that could be useful for automatically recognising dark patterns.220
Soe et al. (2020)221 manually collected cookie-consent notices from 300 Scandinavian news outlets and labelled their distinguishing features using the Gray et al. dark pattern taxonomy (discussed in Chapter 3).222 The dataset contains the following 10 features: website name; widget equality (design differences between accept-all/reject-all options); the labels of ‘not yes’ options; pop-up element location; whether the website can still be used while the pop-up is active (content blocking); the number of words in the cookie banner; the number of clicks required to reject all consent; whether the website lists the purpose of the cookies; the existence and content of third-party cookies; and whether the website works after rejecting all cookies.223 The most significant challenge the researchers reported in compiling this dataset was the considerable disagreement between annotators in determining whether a given dark pattern was present. The researchers state that the low inter-rater reliability is unsurprising, given that the descriptions of dark patterns in taxonomies like Gray et al.’s are insufficient to characterise them. They posit that, for the automated detection of dark patterns to succeed, the concept needs to be refined for each context so that its characterising features are clearly identifiable and easily computer-detectable. Guribye et al.224 reflect on their experience annotating the Soe et al. (2020) dataset and posit that ‘regulation and artificial intelligence (AI) solutions need to meet each other half way’.225 More specifically, the researchers call for regulation to ‘exact and standardise the consent elicitation design to a form that can be characterised and whose infringement can be AI detected’.226 A further study by Soe et al. (2022)227 used the manually labelled dataset made available by Soe et al. (2020) and trained several classifiers (one for each dark pattern) that identify whether a cookie banner contains a dark pattern from the Gray et al. typology.
Fig. 6 shows the accuracy score for each of the predicted dark patterns.
Fig. 6: Accuracy scores reported by Soe et al. (2022)228
The researchers report that while the accuracy of the trained models is promising, it also leaves a lot of room for improvement. They note that their results were affected by the very low number of examples of certain dark patterns, as well as the wide variability of cookie banner designs, which meant that their analysis required a significant amount of manual labelling. Relatedly, they note that annotators experienced difficulties in determining which dark pattern was present in a particular cookie banner;229 in other words, dark pattern detection is difficult for AI because it is also difficult for people.230 They echo Soe et al.’s (2020) call for ‘a better, more context specific, definition [...] to [...] eliminate this human labelling uncertainty problem’ in regulatory initiatives (related to data protection), which at present feature few legal rules constraining the use of dark patterns and engaging with UI design.231
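Such annotation disagreement is typically quantified with inter-rater agreement statistics; the toy sketch below, with two hypothetical annotators labelling the same ten banners, illustrates one common measure (Cohen’s kappa).

```python
# Quantifying annotation disagreement with Cohen's kappa; the labels are invented.
from sklearn.metrics import cohen_kappa_score

annotator_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # 1 = dark pattern present
annotator_b = [1, 0, 0, 1, 0, 0, 0, 1, 1, 0]

print(f"Cohen's kappa: {cohen_kappa_score(annotator_a, annotator_b):.2f}")
```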
Santos et al. (2021) manually annotated 400 cookie banners found on popular English-language websites. They focused on the purposes of cookie banners and how these were expressed (e.g. using misleading or vague language, technical jargon or framing), and mapped these characteristics to ePD and GDPR violations.232 The researchers report that experts may find it challenging to parse banner text and map it to generic and hard-to-operationalise legal requirements. Van Hofslot et al. tested the performance of large, pre-trained language models for the classification of (discrete) legal violations on the dataset annotated by Santos et al.233 The results show that ‘using a state of the art classification model off the shelf or with minimal fine-tuning will not yield reliable results’, which, according to the researchers, suggests a need for more data annotation in this domain and for models specifically trained or fine-tuned on the task at hand.234 They also note that, given the small size of the dataset, even where the metrics suggest reasonable performance on certain classes, the models cannot be considered robust enough to be deployed in real-world settings.235
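For illustration, an ‘off-the-shelf’ setup of the kind whose limits are reported above might look like the following sketch; it does not reproduce Van Hofslot et al.’s models or label set, and the banner text and candidate labels are invented.

```python
# Sketch of an off-the-shelf, zero-shot classification setup (not the study's models).
# Requires the transformers package; the labels below are illustrative only.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

banner_text = "By continuing to browse, you agree to our use of cookies."
candidate_labels = ["valid consent request", "vague or misleading purpose", "technical jargon"]

result = classifier(banner_text, candidate_labels=candidate_labels)
print(result["labels"][0], result["scores"][0])
```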
Where does the state of the art in automated data analysis leave us? Researchers have been able to link various kinds of data extracted from websites – textual and design elements such as clickables and their style information (e.g. size and colour) – to dark patterns and legal infringements by using heuristic and machine-learning methods. While most of these efforts have been directed at Privacy dark patterns and data protection law infringements, and only a few projects target Shopping dark patterns specifically, these studies nevertheless illustrate that there is clear merit in using computational methods to detect dark patterns in website data. Further, some methods developed by researchers can already be built on and repurposed for the task of consumer law infringement detection. For example, DAPDE’s heuristics-based methods could be enhanced by searching for typical JavaScript patterns linked to fake Scarcity and Urgency messaging in the page source. The code could also be altered to revisit a page using a Countdown Timer to check whether the offer is still available upon expiry or whether the timer resets upon refreshing the page. These are examples where the legal framework is relatively clear: as we saw in Chapter 6, the UCPD is rather critical of outright falsehoods.
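As a rough sketch of the second suggestion, a tool could reload a page and compare the countdown readings before and after; the URL and CSS selector below are hypothetical placeholders, and a production tool would need more robust element discovery and waiting logic.

```python
# Minimal sketch of a countdown-reset check; URL and selector are placeholders.
from playwright.sync_api import sync_playwright

URL = "https://example-shop.test/deal"      # hypothetical page using a Countdown Timer
TIMER_SELECTOR = "span.countdown-timer"     # hypothetical selector for the timer element

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(URL)
    first_reading = page.inner_text(TIMER_SELECTOR)
    page.reload()
    second_reading = page.inner_text(TIMER_SELECTOR)
    browser.close()

# If the timer shows (roughly) its starting value again after a reload, it is likely
# a fake Urgency cue rather than a genuine time-limited offer.
print(first_reading, "->", second_reading)
```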
Nevertheless, infringement detection is a difficult technical problem, not least because of the uncertainties created by the legal framework. Researchers’ use of heuristics that rely on arbitrary thresholds, and the challenges reported in the manual labelling of dark-pattern-related datasets, reflect the difficulty of deriving measurable standards for the ‘consumer-friendliness’ of a UI design from technology-neutral legal frameworks. This means that even if a particular feature related to a dark pattern can be extracted from an interface (e.g. the size/colour/label of a cancellation button), its indeterminate legal treatment makes it impossible to establish computationally whether an infringement has occurred.
The lack of concrete technical benchmarks for lawfulness could help explain why so far researchers have not attempted the detection of some dark patterns like Hidden Costs, Hidden Subscription and Hard to Cancel. When are Hidden Costs substantial? When is the presentation of subscription-related information sufficiently prominent so as not to mislead consumers, and what kind of label on an order confirmation button would indicate explicit consent to an auto-renewing subscription contract (Hidden Subscription)? How many steps would make a Hard to Cancel subscription too hard to cancel? More direct interface design regulation (e.g. standardising cancellation procedures by introducing uniform buttons, standardising information presentation and introducing explicit bans of certain practices, as suggested in Chapter 7) could address these questions. Some other challenges remain, however.
Multilingualism in the EU-27 market is also a challenge in the context of data analysis, especially for approaches relying on NLP methods. English is the most widely researched language in NLP, and the resources available for other languages are not as rich or accurate.236 Recent work on the detection of unfair contract terms shows, however, that there may be merit in attempting to automatically transfer annotations made on English-language texts to other languages,237 and, more generally, large language models are making machine translation increasingly accessible.238
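One way such transfer could work in practice is sketched below with a multilingual sentence-embedding model: English exemplars of a textual dark pattern are compared against non-English candidates by cosine similarity. The model name is a real sentence-transformers checkpoint, but the phrases and any similarity threshold one would apply are assumptions.

```python
# Sketch of cross-lingual matching with multilingual sentence embeddings.
# Requires the sentence-transformers package; phrases are invented examples.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

english_exemplar = "Only 2 items left in stock"
candidates = [
    "Nur noch 2 Stueck auf Lager",    # German Low-stock Message
    "Il ne reste que 2 articles",     # French Low-stock Message
    "Livraison gratuite des 50 EUR",  # French, not a Low-stock Message
]

scores = util.cos_sim(model.encode(english_exemplar), model.encode(candidates))
print(scores)  # higher scores indicate closer semantic matches to the exemplar
```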
Further, like automated data-collection frameworks, methods for automated data analysis are also susceptible to adversarial behaviour; website operators may alter the source code of a website to avoid the detection of questionable practices.239 For example, the investigative group ProPublica developed a tool with which Facebook users could collect the ads they were shown, along with details about why they were targeted, in order to build a database of political ads and the populations advertisers were targeting. To recognise ads, the tool searched Facebook’s HTML for the word ‘sponsored’. Facebook added invisible letters to its HTML so that the word registered as ‘SpSonSsoSredS’ to the tool; it also started marking other content as ‘sponsored’.240 This points to the need to update tools to keep pace with evasive tactics.
Even setting aside intentional attempts to undermine existing detection approaches, new unlawful dark patterns may emerge in the future and existing detection methods may become obsolete. Implementing design standards (e.g. an EU-wide standardised cancellation button) or explicitly prohibiting certain vectors of manipulation (e.g. the use of colour or font size to obscure the presentation of information) could ensure that (at least) some aspects of harmful innovation are prevented through the uniformisation of design.
Still, even if regulation were to become more technology-specific, no computational method for the automated detection of potential infringements is entirely accurate and foolproof.241 As I discuss in more detail in the following section, this could give rise to concerns about potentially unjust outcomes if authorities were to rely on computational tools in their investigations.
Now that we have taken stock of what computational methods can and cannot do for the automated detection of unlawful dark patterns on e-commerce websites, it is a good time to look at the bigger picture of what it means to regulate the use of dark patterns in digital consumer markets effectively, in both legal and technical terms (8.4.1), and what lessons policymakers and enforcement authorities could extract from my investigation (8.4.2).
In the previous chapters we saw that the effective regulation of socio-technical artefacts like dark patterns is not an easy task. While the EU already has a system of protection – the UCPD and the CRD – which is intended to guard both online and offline consumers against traders’ exploitative conduct, these technologically neutral instruments have not been able to address the proliferation of dark patterns on e-commerce websites to date. Non-compliance abounds in digital markets. In Chapter 4, I have theoretically positioned non-compliance with technology regulation as a symptom of technology-neutral policies that may not be conducive to legal certainty, and in Chapter 6 I have shown that, when applied to dark patterns, the UCPD and CRD leave significant room for interpretation. Room for interpretation may be a good thing in terms of future-proofing technology regulation, but it may be bad news for consumer-friendly user interface design. As we saw in Chapter 2, digital product design is a process that is influenced by technological, economic and organisational considerations. Leaving the interpretation and concrete translation of technology regulation into user interface design to well-resourced technologists could lead to the hijacking of consumer interests by private, profit-driven goals. Meanwhile, less-resourced market players may be unsure of how to perform this translation to further consumer protection, and indiscriminately buy third-party products that may not be compliant, or which they could not modify even if they wanted to. Drafting effective regulation for consumer-facing digital products such as user interfaces may, therefore, require policymakers to engage more closely with the design of these products by regulating in a more technology-specific way. As Sinders and Pershan put it, ‘tech policy is not just about rules; it must involve design considerations to render regulation or transparency requirements valuable and actionable’.242 The philosophy of regulation in digital markets may need to change, as Kollnig argues.243
However, whatever policy we have, we need to be able to enforce it to guarantee consumer rights; otherwise, companies may simply choose not to comply with a law they see as toothless. Enforcement starts with being able to detect potential infringements of consumer law. As the chapter introduction explained, this is a gargantuan task for the public authorities policing digital markets. This is attributable not only to the heightened business-to-consumer informational asymmetries, which make it less likely that consumers will notice and complain about the use of dark patterns, but also to the sheer scale and dynamism of digital markets. Computational tools may provide a way out of this conundrum. This chapter has shown that, from a technical perspective, it is possible to automate the detection of some unlawful dark patterns on shopping websites. At the same time, however, the way we design technology regulation has a bearing on the technical feasibility of developing such methods. Technology-neutral regulation goes hand in hand with design variability, which significantly complicates the collection of infringement-related data. It also entails design volatility, which is not conducive to the development of durable infringement detection methods. Technology-neutral regulation could also render some data inaccessible because of website operators’ freedom to structure websites as they wish, and even where infringement-related data can be collected, the lack of measurable indicators of unlawfulness could impede the development of data analysis methods. More technology-specific substantive regulation, in the form of design standards and the prohibition of some harmful design practices, could help in this regard. In digital markets, technology specificity may therefore be a demand of both effective policymaking and effective policy monitoring.
What could policymakers learn from this? More technology specificity does not mean only technology specificity. As Chapter 4 explains and the introduction reiterates, both technology neutrality and technology specificity have potential benefits and drawbacks. Combining technology-neutral, principle-based and technology-specific rules is likely to be the best bet in a regulatory environment with regulatees who have various capacities and appetites for compliance. Technology-neutral rules also protect against gaps in legal protection that may occur as a result of changing landscapes of harm. In other words, as I argue in Chapter 7, the technology-neutral shape of the UCPD’s core prohibitions may be better left as is for now. This does not mean that the system is not ready for more incremental changes that would imbue it with more technology specificity, however.
What could these look like? As Chapter 7 shows, a general ban of dark patterns such as that laid down in Art. 25 of the Digital Services Act (DSA) may be an undesirable policy direction in the absence of a uniform definition of dark patterns, as it may not be very helpful when it comes to distinguishing between lawful and unlawful dark patterns. Further, this definition may never crystallise, as innovation in the e-commerce sector is a continuous process. We could instead regulate dark patterns in an (even) more technology-specific manner. Technology-specific policy could weed out some undesirable user interface design choices through prohibitions (e.g. prohibiting the placing of material information in drop-down menus) and prescriptions (e.g. requiring that all information be presented in the same font, size and colour on a particular page); provide uniform design standards (e.g. standardised presentations of material information or cancellation buttons); or also provide uniform, mandatory code for the implementation of these standards. These options reflect varying degrees of intrusion into traders’ freedom to conduct business, and the choice of the appropriate level is a question for thorough cost–benefit analysis.
The acid test for the effectiveness of user interface regulation is, however, more likely to be the policy’s adaptability than its level of technology specificity. As we saw in Chapter 4, socio-technical change scholars teach us that technology regulation is at continuous risk of regulatory disconnection. Technology will always change. Combining technology-specific and technology-neutral rules may reduce this risk, but does not eliminate it entirely. Adaptability, then, concerns how our regulatory environment operates over time. Given the complexity of designing effective regulation for socio-technical artefacts, policymakers could get it wrong, or a policy may simply be wrong for its time, and some mechanism for bouncing back from these situations seems necessary. My recommendation in this regard is to start small, by means of an incremental approach to the (technology-specific) regulation of certain dark patterns, and adjust policy as we go. As I argue in Chapter 7, in the case of some dark patterns – Hidden Costs, Hidden Subscriptions and Hard to Cancel – we have undeniable evidence of their potential to cause large-scale consumer financial detriment. In terms of adaptability, Chapter 7 shows that periodic evaluations and legislative reviews as foreseen in the DSA and Digital Markets Act; leaving the specification of some aspects of legislative acts and/or targeted changes thereto to the Commission via implementing and delegated acts respectively; and involving the industry in (co-)regulation through the New Approach could all be useful tools. The last option could provide both effective and efficient solutions in light of the large and growing heterogeneity of dark patterns and of the media through which they can be deployed, but it also presents the most challenges in terms of a comprehensive view of ‘good’ regulation. The New Approach has been under attack for its legitimacy deficits since its early days.244 While some issues with this regulatory technique have been addressed in the years since, as Micklitz explains, many gaps and concerns remain, especially surrounding stakeholder participation.245 Recourse to the New Approach is therefore an option that requires careful treading, particularly in light of the harsh criticism from academics246 and civil society organisations247 that the transplantation of the New Approach elsewhere in digital policy (the AI Act proposal)248 has faced.
As for enforcement authorities, even if more technology-specific policy could open the door to developing automated market-monitoring methods, it is important to acknowledge that this does not mean that all practices will become measurable. Computational methods are not a panacea. For example, testing the effectiveness of consumers’ right to termination may require authorities to take out and cancel subscriptions on e-commerce websites and other digital platforms in order to check that the cancellation has indeed taken effect. That being said, the (computational) detectability of some dark patterns may free up valuable resources for the manual detection of other potential infringements.249 Talking about automating infringement detection does, however, require some additional issues to be addressed. As highlighted elsewhere by academics and early adopters of computational tools, the uptake of technology in digital market investigations raises organisational and technological management questions,250 and may expose authorities to legal challenges.251
As the chapter introduction explained, a lot of the technology that authorities could use to surveil digital markets, including e-commerce websites that may be using unlawful dark patterns, does not exist yet. In organisational terms, authorities may therefore need to hire in-house staff who can develop, use and maintain digital-market-monitoring tools.252 As this chapter has shown, none of these is a trivial task. Reflecting this need, some consumer and competition authorities have started setting up units of technologists to assist in investigative work. In the Netherlands, the Autoriteit Consument & Markt has set up a Taskforce on Data and Algorithms, a team of data scientists, engineers and data governance experts that develops tools in-house and has an advisory role in enforcement cases. The Taskforce has developed tools that can detect misleading reference pricing and deceptive Countdown Timers.253 The UK Competition and Markets Authority has also been a pioneer in this respect, having established a Data, Technology and Analytics (DaTA) unit of almost 50 people with backgrounds in disciplines like data science, engineering and digital forensics.254 The DaTA unit develops tools for data collection and analysis, and provides the authority with insights on how technologies work and their legal implications.255
While some authorities are developing their own tools, authorities may also purchase software for the detection of infringements from private parties, or delegate its development and maintenance to them. However, adopting a ‘procurement mindset’ in the digitalisation of enforcement could, as Goanta and Spanakis argue, lead to the purchase of inadequate products, jeopardise efforts to develop technological capacity in public authorities by redirecting funds, and increase the opacity associated with decision-making.256 One way to minimise these risks while cutting expenses is for authorities to involve researchers in the development of these technologies.257 This could be achieved by facilitating formal collaborations through public tenders,258 or by creating forums for exchanges of expertise between academia and authorities: the US Federal Trade Commission, for example, co-organises the Technology and Consumer Protection workshop, a venue for research using computer science to further consumer welfare.259 Collaborations and tool sharing amongst authorities, as well as the centralised (EU-level) development of tools, could also pave the way towards increasing authorities’ digital enforcement capacity at a low(er) cost: the marginal cost of copying code is insignificant.260 Finally, authorities could be equipped with tools by the European Commission. In 2022, the Commission set up the EU eLab, a platform that aims to provide national authorities with a digital toolbox with which to conduct online investigations.261
There are also questions of technological management. Data units need to be supplied with adequate equipment, as the tools they develop rely on computing power that may exceed the capabilities of the equipment typically used by authorities;262 as Coglianese and Lai note, many governmental IT systems are old, if not antiquated.263 Some technological management questions may also arise as a consequence of procedural requirements; for example, authorities may need to take proper cybersecurity measures to ensure that evidence cannot be altered once collected. Governments’ cybersecurity capacity has historically been lacking as well.264
In terms of legal challenges, the legitimacy of using web measurement techniques for the purpose of conducting market investigations may be questioned. The Consumer Protection Cooperation (CPC) Regulation grants enforcement authorities wide-ranging powers to collect data from digital market participants: Art. 9(3)(a) empowers authorities to access any relevant documents, information or data related to an infringement of EU consumer law, and authorities can also carry out on-site inspections and enter any premises used by a trader for commercial purposes (by virtue of Art. 9(3)(c)). The CPC Regulation does not, however, explicitly refer to the use of technology in investigations. Goanta and Spanakis argue that web crawling and scraping can be interpreted as falling under either one of these broad provisions.265 The recently adopted General Product Safety Regulation (GPSR)266 is more explicit in this regard, and acknowledges in Recital 59 that market surveillance authorities may use ‘technological tools’ in the exercise of their powers under the Regulation.
There is, therefore, some legitimacy in the EU for market surveillance activities, including activities that involve automated tools. What is less clear, however, is what the consequences of these investigative activities ought to be and, more specifically, how to avoid unfair outcomes (for traders). Digital market monitoring is a form of technological control (of market participants' behaviour, that is). As Brownsword notes, technological control tools exist on a spectrum from ‘soft’ to ‘hard’.267 Technologies that improve regulatory performance by enhancing detection of non-compliance are on the soft end of the spectrum.268 On the hard end, regulators may use technology to prevent, disable or compel certain actions;269 in other words, digital enforcement tools can aim not just at perfect detection, but at perfect prevention.270 Brownsword explains that ‘when code and design leave regulatees with no option other than compliance [...] the legitimacy of the means employed by regulators needs urgent consideration’.271 While the kind of digital market monitoring this book advocates is on the soft end of the technological control spectrum, it may nevertheless give rise to legitimacy concerns. As Goanta et al. explain, ‘[i]njustices occurring in society often go generally unnoticed. However, technology can change that. The automation of legal enforcement could help in unveiling injustice at scale for the first time in human history [...]’.272 Authorities will therefore likely have to grapple with the proportionality of using digital market monitoring tools.273
In this respect, it is particularly important to consider the situation of companies that do not intentionally deploy unlawful dark patterns. As Chapter 2 shows, the platformisation and servitisation of web development and web design mean that smaller, less-resourced market players may unknowingly use non-compliant design elements and may not be fully able to adjust these because of a lack of either technical know-how, or control over third-party resources. The UCPD does not, however, care about the traders’ intentions or abilities. This gives rise to the need to consider whether the legal framework should be applied differently where the presence of dark patterns in an interface arises as a consequence of limited resources.274
Another related question concerns the degrees and types of errors that are acceptable when authorities use automated tools for market-monitoring purposes. As section 8.2.2 shows, web crawling methods are prone to measurement bias. Heuristics’ failure to generalise and the probabilistic nature of machine-learning methods could lead to detection errors (section 8.3.3). This could result not only in compliant conduct being labelled questionable (Type I errors, also known as false positives), but also in non-compliant conduct not being caught (Type II errors, or false negatives). False negatives may lead to questionable market behaviour not being penalised, whereas false positives could lead to the unjustified investigation of market participants’ actions. What value should be attributed to these types of errors, and what steps should be taken to prevent unjust outcomes, are further questions that need to be answered; scholars have raised similar questions with respect to administrative decision-making generally,275 in a computational antitrust context specifically276 and in relation to the criminal justice system.277
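To make the two error types concrete, the toy sketch below compares ten hypothetical detector outputs against a manually verified ground truth; the numbers are invented.

```python
# Toy illustration of Type I and Type II errors for an infringement detector.
from sklearn.metrics import confusion_matrix

ground_truth = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]  # 1 = infringement actually present
detected     = [1, 0, 1, 1, 0, 0, 0, 1, 0, 1]  # 1 = flagged by the tool

tn, fp, fn, tp = confusion_matrix(ground_truth, detected).ravel()
print(f"False positives (Type I): {fp}, false negatives (Type II): {fn}")
```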
Authorities will also need to make sure that the proper procedure is followed when using web measurement methods for investigative purposes.278 Procedural law is not fully harmonised in the EU: for a long time, the EU legislator confined its work to the harmonisation of substantive laws out of respect for Member States’ procedural autonomy.279 Echoing this principle, while the CPC Regulation grants national authorities stronger investigative powers, it also allows Member States to set the conditions for, and impose limits on, the exercise of these powers in national law (Recital 19), and stipulates that authorities ought to exercise these powers in compliance ‘with Union and national law, including with applicable procedural safeguards and with the principles of the Charter of Fundamental Rights of the European Union’ (Art. 10(2)). Procedural law does not, however, only present hurdles for digital enforcement; it can also set out the obligations of market players and curtail their opportunities to engage in adversarial behaviour. In legal domains where market monitoring is a more mature practice (product safety) and in recent legal instruments governing digital markets (e.g. the DSA), these issues have been addressed via adjustments to the procedural legal framework. For example, the GPSR acknowledges (in Recital 59) that ‘market surveillance authorities are constantly improving the technological tools they use for online market surveillance’ and that ‘[f]or those tools to be operational, providers of online marketplaces should grant access to their interfaces’. Art. 40(1) DSA obliges very large platforms to give national Digital Services Coordinators or the Commission, ‘upon their reasoned request and within a reasonable period, specified in the request, access to data that are necessary to monitor and assess compliance with this Regulation’. Additional regulation may therefore be required to give public authorities – and their tools – access to the full commercial infrastructure that sustains digital infringements.280 The opportunities for adversarial behaviour in digital markets may also require a different, more proactive mindset from enforcement authorities. Digital enforcement is a cat-and-mouse game. While this is true of sanction-based (rather than compliance-based) enforcement approaches generally, the pace at which traders can devise new methods to either harm consumers or game market-monitoring tools has changed. Authorities therefore need to keep abreast of technological innovations and their potential to cause harm to consumers, and make sure that the tools they use are up to the challenge. Market-monitoring tools do not offer a one-off magic solution to all of the challenges that enforcement authorities face in digital markets. The market has changed, and public administration now needs to follow.
While the discussion here has focused mostly on the challenges that enforcers interested in going digital may face, it should not discourage this pursuit. All of these challenges can be overcome.281 As with the adoption of any new technology by public authorities, digital-market-monitoring tools require proper planning.282