The article proposes archival thinking as an analytical framework for studying Facebook. Following recent debates on data colonialism, it argues that Facebook dialectically assumes the role of a new archon of public records while being unarchivable by design. It then puts forward counter-archiving – a practice developed to resist the epistemic hegemony of colonial archives – as a method for the critical study of the social media platform after it shut down researchers' access to public data through its application programming interface. After defining and justifying counter-archiving as a method for studying datafied platforms, two counter-archives are presented as proof of concept. The article concludes by discussing the shifting boundaries between the archivist, the activist and the scholar as an imperative of research methods after datafication.
Keywords
Application programming interface, archive, datafication, Facebook, methods
The Open University of Israel, Israel
Corresponding author(s):
Anat Ben-David, The Open University of Israel, The Dorothy De Rothschild Campus, One University Road, P.O. Box 808, Ra’anana 4353701, Israel. Email: anatbd@openu.ac.il
In recent years, data extraction has been a popular modus operandi in social media research. Platforms' application programming interfaces (APIs) allowed the extraction of large volumes of data, which were then analysed using various computational, quantitative and qualitative methods to answer a broad set of research questions on the political, societal, economic and technological aspects of datafication (Rieder, 2013). In 2018, this modus operandi had to change after Facebook – in response to the misuse of users' data in the Cambridge Analytica scandal – shut down hundreds of thousands of applications that used its API to extract both public and personal data, including the majority of the tools used by the research community. The revoked access had immediate consequences for media and communication scholars, who could no longer conduct independent, ethical and public-interest research into the societal effects of social media, such as the formation of online communities, information disorders, political mobilization or discriminatory practices. Without access to data, researchers also cannot independently audit or monitor the role platforms themselves play in shaping social and political processes, such as the use of political advertising around election campaigns. The revoked access to data extraction sparked methodological debates, as researchers sought alternatives for studying massively datafied, algorithmic and computational platforms without the convenience and totality afforded by APIs. Among the methods that have been proposed to address what is now called 'post-API' research are collaboration with external companies (Puschmann, 2019), web scraping (albeit in violation of platforms' terms of use; Bruns, 2019; Freelon, 2018), and returning to digital fieldwork to devise new methods (Venturini and Rogers, 2019).
This article contributes to this debate by suggesting archival theory as a framework for understanding Facebook as a contemporary archon, and by introducing counter-archiving as a method of dissent against platforms' appropriation of public data after datafication.
At first glance, archival thinking seems unfit for studying algorithmic media. Moreover, declarations of the end of theory that have characterized uncritical views of big data also entailed a death sentence for the archive (Bowker, 2014). In the late 20th century, the humanities and social sciences were afflicted with 'archive fever' (Derrida, 1996), expressed as a renewed interest in archives, archiving and materialities as foundations of post-modern critique. Twenty years later, this 'archive fever' was replaced by an uncritical 'data fever' (Agostinho, 2016). The datafication of everything turns everything into an archive. Why bother with appraisal, description and ordering when the data can answer any question?
The justification for applying archival thinking as a framework for studying Facebook (and other data-driven companies) is that these companies meticulously collect user data at an unprecedented scale, thereby constituting new commercial archives that document every aspect of human life (Gehl, 2011). Couldry and Mejias (2019) propose the notion of 'data colonialism' to describe the monopolization of data collection as a new form of capitalism. They draw parallels between Western colonial powers' appropriation of natural resources in past centuries and the contemporary datafication and commodification of everyday life by digital platforms. Following Couldry and Mejias, this article argues that data colonialism is manifested not only in the datafication of personal and social behaviour, but also in the monopolization of the public record. Although the argument may apply to other social media platforms, the article focuses on Facebook, and draws further parallels between colonial archives and the social media platform to argue that, by negotiating varying levels of access to its data and by monopolizing the power to discern between private and public records, Facebook dialectically functions as a new 'archon' while being unarchivable by design. I subsequently propose counter-archiving as a 'post-API' method for studying Facebook. Counter-archiving has previously been conceived as a form of epistemic resistance that questions colonial archives' hegemonic order and calls for understanding them as sites of knowledge production rather than knowledge retrieval (Stoler, 2002). In the context of data colonialism, therefore, I propose counter-archiving Facebook to provide alternatives to the platform's appropriation of public records, and to critique the epistemic affordances of the data it makes available as public.
To justify counter-archiving as a method, I begin by outlining the potential contribution of archival thinking to the study of social media and contextualize the argument on Facebook's appropriation of public records within wider discussions on web archiving. Subsequently, I introduce counter-archiving as a method of dissent that borrows from epistemic responses to colonial archives, and define how it may be applied as a post-API method for studying Facebook. After presenting counter-archives of Facebook as proof of concept, I conclude by discussing the limits of counter-archives as methods that are agonistic by design.
Derrida (1996) traces the origins of the archive to the Arkheion of Greek antiquity and refers to it primarily as a space of privilege commanded by archons – superior magistrates acting as guardians of documents. These archives gained their ability to represent the law by being situated at the intersection of private and public spaces: the documents were signed and stored in the private households of the archons, on account of the public recognition of their authority. In providing a post-colonial critique of the concept of the archive, Ariela Azoulay (2011) argues that Derrida's emphasis on archive-as-place downplays the archon's power. For Azoulay (2011), the archon's role is realized not only as the guardian of documents in his domicile, but also as the one in charge of 'distancing those wishing to enter the archive too early, before the materials stored within would become history, dead matter, the past' (n.p.).
Both Derrida's etymology of the archive as a public/private space and Azoulay's understanding of the archon's role in distancing citizens from information that may be of political relevance in real time are useful frameworks for understanding Facebook as a self-appointed archon in the context of data colonialism and post-API research. In this section, I attempt to situate the company's appropriation of the notion of the public archive within the history of web archiving, and within the company's recent attempts to brand temporary access to data sets of political advertisements as archives and libraries of transparency.
To web archivists and internet historians, post-API debates are not new. After nearly two decades of archiving large proportions of the open web, these practitioners and scholars were early to notice that social media platforms, and Facebook in particular, are unarchivable. Web archiving differs from data extraction in the sense that it is less concerned with how data will be used now, and more with how to preserve data for access and use in the future (Brügger, 2012). Digital preservation experts argue that the digital cultural heritage of our times is at risk, since digital media are prone to decay and deletion (UNESCO, 2003). Web archiving methods were developed to fight web decay, operating under the premise that the web constitutes an important part of humanity's public record, and that there is a pressing need to preserve it for posterity (Brügger and Milligan, 2018). From as early as 1996, the Internet Archive and national libraries around the world have been preserving petabytes of archived websites, and continue to do so daily (Costa et al., 2017). A growing community of researchers depends on web archives as scarce yet reliable born-digital primary sources that support historical internet research; without web archives, it would be nearly impossible to find online evidence of the web's past (Ben-David, 2016).
However, both the logic of web archiving and the ability to apply historical thinking in web research collapsed when the majority of the web's content migrated to commercial social media platforms. Due to the conditions specified in platforms' terms of use, social media data are no longer in the public domain, and while API access allows data extraction to some extent, archiving Facebook is legally impossible. To address the unarchivability of social media, web archivists began seeking 'post-API' workarounds years before Facebook's API lockout. The solutions that have been proposed resemble the methodological solutions to post-API research described above, and include attempts to reach collaborative agreements between platforms and cultural heritage institutions, the use of third-party services, and crowdsourcing (Hockx-Yu, 2014). Most of these solutions have registered only partial success. The most notorious among them is the Twitter archive at the Library of Congress. In 2010, the library reached an agreement with the social media company to archive every public tweet posted since 2006. The collected tweets were meant to be made available for viewing after a 2-year embargo. After several years of data collection, the initiative did not bear fruit, primarily because the library was unable to find solutions to the copyright and privacy challenges involved in republishing the data (Zimmer, 2015). Other creative examples include the National Library of New Zealand's initiative to create a crowdsourced 'time capsule' of Facebook by asking citizens to donate their data (Deguara, 2019), and the Internet Archive's use of the fictive Facebook account 'Charlie Archivist' to archive logged-in Facebook pages of public figures. This account has no friends, thereby ensuring that users' private data are not compromised during capture.
Nevertheless, since current web archiving crawlers cannot fully capture the dynamic content of social media, the eventual capture of the logged-in pages is rather incomplete (see Figure 1).