Research and analysis

The impact of algorithmically driven recommendation systems on music consumption and production - a literature review

Published 9 February 2023

Authors: David Hesmondhalgh, Raquel Campos Valverde, D. Bondy Valdovinos Kaye, Zhongwei Li

All of the School of Media and Communication, University of Leeds, UK

Acknowledgements

The team would like to thank Emily Jarratt and Ben Moore at the Centre for Data Ethics and Innovation, and the following people for their very helpful input and advice: Dan Angus, Fernando Diaz, Ellis Jones, Ariadna Matamoros-Fernández, Richard Osborne, Markus Schedl and Luke Troyner. We would welcome information concerning any key sources and perspectives that we missed in surveying this vast terrain, and can be contacted via our University of Leeds email addresses.

1. Introduction

1.1 Context and remit

This research was commissioned by the Centre for Data Ethics and Innovation (CDEI). The literature review is just one element of research that the CDEI has been charged with undertaking as part of the UK Government’s response to the Second Report of the House of Commons Digital, Culture, Media and Sport Select Committee’s Inquiry into The Economics of Music Streaming.

The Select Committee reported concerns about a number of issues regarding recommendation systems on music streaming platforms (quotations from DCMS Select Committee 2021, p.79):

  • “several submissions” to the Inquiry “warned that algorithms, as with any recommendation system, could reflect biases that may subsequently reduce new music discovery, homogenise taste and disempower self-releasing artists”

  • written evidence submitted by many creators was “critical of the opacity of algorithmic curation, and called for greater oversight”

  • an announcement in 2020 by Spotify that “companies would be enabled” to pay for the music to which they hold rights to be promoted, in return for Spotify retaining a percentage of the compensation due to those rights-holders

  • that UK artists might be disadvantaged by streaming platforms’ use of algorithms based on “global play counts”, given that “territories with relatively larger populations can skew towards their own domestic artists”, meaning for example that UK artists might gain less attention than US artists

Expressed in written and oral evidence submissions, as well as the final Report, these concerns echo some of those articulated in the lively public debates that have been taking place in recent years, in the UK and internationally, about the impact of streaming platforms on musical production, consumption and culture. Anxieties about “algorithms” have been a regular feature of these debates.

In the light of such concerns, the Select Committee report (which discussed many other concerns about music streaming besides algorithms) recommended that “the government should commission research into the impact of streaming services’ algorithms on music consumption” (p.79). In its formal response to the report, the government agreed with this recommendation. Noting the publication of the CDEI’s 2020 report on bias in algorithmic decision-making (CDEI 2020)[footnote 1] and the discussion of recommendation algorithms in the Music Creators’ Earnings research commissioned by the UK Intellectual Property Office (Hesmondhalgh et al. 2021), the government asked CDEI to take a lead on research related to the recommendation.

In commissioning this literature review as the first stage in that research, the CDEI asked us to include in our considerations how and to what degree existing research has addressed a number of issues relevant to the concerns above, with a focus on “algorithmically-driven music recommendation systems”, in particular:

  • the question of “bias” in music streaming algorithms: how might different groups of artists and consumers be affected by algorithms?

  • the question of diversity: positive and negative impacts of algorithms on musical diversity

  • questions of transparency, opacity and oversight

These are therefore the main issues we seek to address here. A distinctive feature of the review is that we seek to put academic computer science research and “critical” social science and humanities research (notably a sub-field known as critical algorithm studies) into dialogue with each other, to a much greater extent than has been evident in existing scholarship.[footnote 2]

1.2 Key findings of this literature review

Throughout, MSP = music streaming platforms; MRS = music recommender systems.

  • we identify two main bodies of academic research that pay sustained attention to algorithmic recommendation in the realm of culture: a) academic computer science and b) critical social science and humanities research on socio-technical systems, especially critical internet studies and critical algorithm studies

  • each of these bodies of research has its own strengths and weaknesses. Academic computer science is solution-oriented, relatively clear in its accounts of the functioning of recommender systems, and rigorous in its pursuit of evidence. But it often seeks solutions in primarily technical terms, ignoring or downplaying complex social and cultural aspects of problems. Critical researchers explore potential problems in a way that often goes valuably beyond the assumptions and rhetoric of many computer scientists, policy-makers and tech businesses. But they often provide very little insight into how recommender systems operate and, partly because of limited access to organisations and data, little concrete evidence to back up their claims

  • in relation to music recommendation, there has been very little collaboration between these two types of research

  • there is very little publicly available research that examines in any depth the actual design and operation of music recommender systems (MRS), and their use by music streaming platforms (MSPs), as opposed to idealised models and experiments

  • there is very little sustained, publicly available research that examines the impact of MRS on music markets and experience, and how consumers/users of MSPs understand systems of recommendation

  • research published by academic computer scientists and employees of tech companies that seeks to consider problems of MRS makes frequent use of the concepts of fairness and bias

  • research from the social sciences and humanities has pointed to some potential limitations of these concepts, including potential confusion between technical and ethical understandings of the term “bias” and a problematic implication that “unbiased” or “balanced” results can realistically be achieved in complex cultural domains such as music taste

  • critical researchers tend to focus instead on questions of structural (in)justice and (in)equality, though the way in which these ideas are used is not always thoroughly explained

  • academic computer science research on MRS provides contributions that could potentially be useful in bringing about better understanding on the part of citizens and policy-makers about the concerns voiced in the DCMS Inquiry and submissions. In particular, it can provide information about problems and distortions in data sets and the role of MRS in reinforcing or amplifying them

  • given the concerns expressed in the Inquiry, the most relevant forms of “bias” identified in the literature in relation to music recommender systems (MRS) are a) “popularity bias”, a supposed tendency for MRS to favour items that are already popular, thus reinforcing or amplifying the success of the most successful artists and companies, which might also potentially limit the access of audiences to a wider diversity of music; and b) biases based on demographic characteristics, whereby music produced by artists belonging to certain categories (such as men) is favoured at the expense of music produced by others (such as women), which in turn may mean that the tastes and preferences of users belonging to certain categories are favoured over those of others. For critical researchers, this raises questions of justice and inequality

  • the demographic category that has been most addressed in academic computer science research is gender. There is some work on nationality/national location

  • other “biases”, for example those concerning distortions regarding data and systems in relation to ethnicity, social class, sexuality, age and able-bodiedness/disability of artists and users seem not to have been the focus of sustained analysis

  • a further concept that is relevant to concerns about algorithmic music recommendation is diversity. Some of the computer science research casts light on anxieties about homogenisation and fragmentation of musical consumption and taste. Social science and humanities research may help clarify distinctions between different kinds of diversity

  • computer science research on bias and diversity has tended to depend on simulations based on relatively limited datasets, or in some cases the use by researchers employed by MSPs of the company’s own proprietary data, which is not available to other researchers. Publicly available research on these topics that is based on user studies and/or large-scale online experiments (where real users interact with MRS) is rare

  • problems of algorithmic opacity, transparency and accountability have been very widely discussed, often using the “black box” metaphor, and some researchers have identified simplifications and distortions in some uses of these terms. However, a more qualified and sophisticated notion of opacity may be appropriate for understanding the current way in which MRS operate

  • music is a commercial system and there is very little specific democratic oversight of the music industries, beyond copyright law and practice. There are some fragmented developments in regulation that are relevant to MRS, but more work needs to be done to develop policy thinking in this area

1.3 Key terms: algorithms, music streaming platforms and music recommender systems

The term “algorithm” has entered into everyday discourse but is potentially confusing and misleading. For most users of the term, including musicians, music industry professionals and music fans, it now refers rather vaguely to a computer-driven entity that shapes our experience of computing in mysterious ways. Technically, algorithm is merely a term for any process or set of rules to be followed in performing an operation or in solving a problem, especially by computer. But the algorithms that tend to be of greatest public concern today, such as those that lie behind Google’s search engines, Facebook’s News Feed or Spotify’s playlists, are much more complex than that definition indicates. They are constantly adjusted by people working for organisations that seek to create profit in order to pay shareholders and creditors, and that goal inevitably shapes how MSPs operate.

Algorithms, then, are better thought of as “algorithmic systems” (Seaver 2015) and, in relation to culture, as “part of broad patterns of meaning and practice” (Seaver 2017: 1). The challenge is to understand how humans, machines and culture interact in creating and maintaining these algorithmic systems.

There are many different ways in which algorithms are now used to shape everyday online experiences (Algoaware.eu 2018: iii). These include: retrieving information via search engines, filtering out unwanted material (e.g. spam emails), producing content (e.g. automatically creating news stories) and providing lists of potential options for consumption via recommendation algorithms.

Many different kinds of algorithms are used in ways that are relevant to the production and consumption of music, but the focus of recent debate about the role of “algorithms” in music - including the concerns expressed in many submissions to the DCMS Inquiry - really seems to concern the music recommender systems (MRS) operated by music streaming platforms (MSPs). This no doubt is why, in commissioning our review, the CDEI specified the impact of algorithmically-driven recommendation systems on music consumption and production, rather than just using the hopelessly vague term “algorithms”.

The major music streaming platforms (MSPs) in the UK at time of writing are those operated by YouTube, Spotify, Apple, Amazon, Deezer, Tidal, TikTok and Soundcloud.[footnote 3] The rise of these platforms has been an important part of a revival in the fortunes of the recorded music industry since around 2014, following the crisis (from around 2000 onwards) associated with the digitalisation of music, which had allowed easy and “free” downloading and sharing of music that, in a previous era, might have been paid for by purchase of recordings.

As already indicated, these streaming platforms have been controversial internationally, for a variety of reasons addressed in the DCMS Inquiry Report and the IPO’s Music Creators’ Earnings research. In these controversies, concerns about “algorithms” (i.e. about music recommendation) have often been blurred - including in many submissions to the DCMS Inquiry - with concerns about playlists, a key way in which these MSPs organise the musical experience of their users. We clarify the relationship between playlists and recommender systems below.

Because of the centrality of recommendation in recent concerns and debates, this literature review focuses heavily on computer science research about music recommender systems (MRS). But because we are dealing with algorithmic systems, and therefore with the interactions of people, institutions and technologies in the cultural domain of music, we also draw on relevant social science and humanities research. As we will show, MRS can and should be understood as a sub-set of “cultural recommender systems” - recommender systems that shape and guide people’s consumption of information and entertainment (news, audio-visual content and so on).

1.4 Why music recommendation matters

While the issues surrounding music recommendation may not be as urgent as those raised by critics of applications of algorithms in domains such as criminal justice, credit and insurance (e.g. O’Neil 2016; Eubanks 2018), or of the way in which algorithms used in search engines reproduce and reinforce racism (Noble 2018), they nevertheless matter hugely for musicians, people who work in the music industries, and those who enjoy music. MRS, and more generally the cultural recommender systems of which they are a prominent example, influence important aspects of our experiences of the world as audiences and consumers. They have a strong impact on which products and producers succeed and which fail, thereby shaping the lives and livelihoods of musicians and music industry professionals. Together, these factors also mean that MRS help shape cultural identity, and the crucial issue of whose voices and experiences get heard in society.

The implications of music for cultural identity go far beyond lyrics or the public statements of artists. Just as important for questions of identity and recognition, if not more so, are the distinctive musical ways in which the experiences and values of different groups are articulated in rhythms, textures, harmonies and melodies (Gilroy 1993: 72-110; Frith 2007). Music has been, and remains, a vitally important form of expression and identity for nearly all social groups (Manuel 1988). While some music originating in North America and Europe, whether classical or popular, has been disseminated across the world, many rich musical traditions remain unheard or unknown beyond the places where they flourish. As we shall see, there is an ongoing debate about whether this is changing, as the music produced in Latin America and Korea, or by companies and musicians originating there, begins to achieve greater international success. MSPs and MRS clearly have a key role to play in whether music circulates beyond its country of origin.

The way that MRS, and the MSPs they are often embedded within, shape musical production and consumption should be seen as the latest stage in a long history of how the music industries and associated media technologies have shaped musical production and consumption - a process often known as “mediation”.

A key difference between older forms of mediation and those apparent in MSPs is the automated and personalised nature of MRS. According to some critics, a further novel feature is the greater opacity that arises from the computerised nature of automated and personalised recommendation. As we explain in the next section, this mix of change (computerisation and automation) and continuity (musical production and consumption have long been industrially and technologically mediated) means that an examination of relevant research needs to extend across a variety of disciplines.

2. Technical and critical approaches to recommender systems: computer science and beyond

2.1 Explanation and justification of the bodies of literature reviewed

  • our main concentration is on academic computer science research where there is a considerable body of research on recommender systems, including work that specifically focuses on MRS

  • we also refer to relevant neighbouring fields in computer science such as those of information retrieval, including specifically music information retrieval (MIR), which is concerned with how to identify and classify music - thereby providing important resources that MRS draw upon

  • research published by academic computer scientists and employees of tech companies on problems of MRS (and cultural recommender systems) is strongly focused on goals such as relevance and user satisfaction, and problems tend to be conceptualised in terms of fairness and bias

  • for reasons already indicated above, i.e. because algorithms need to be understood as algorithmic systems, involving interactions of people, organisations and technologies, any adequate approach to recommender systems (whether in general or in cultural domains such as music) needs to draw on social science and humanities research

  • in a context such as the present one, where we are dealing explicitly with problems, we need to draw in particular on “critical” social science and humanities research, so called by its practitioners precisely because it seeks to identify and understand problems with prevailing systems, structures, practices and values

  • “critical” does not always mean “negative” or “pessimistic” or “disapproving”, as it often does in everyday life. Many researchers who would describe themselves as critical in the social science and humanities sense of the word offer nuanced appraisals of cultural developments, encompassing merits as well as faults, and recognising complexity

  • such critical research offers the potential to go beyond other conceptualisations of cultural processes and technologies - including the focus on user satisfaction and bias just identified - that are prevalent among computer scientists, policy makers and platform businesses. Research focused on user satisfaction, bias and, in the case of recommender systems, “relevance”, often concentrates on technical solutions at the expense of economic, social, organisational and cultural factors. We especially emphasise here the domains now often known as “critical internet studies” and “critical algorithm studies”[footnote 4]

  • researchers in these intertwined fields have backgrounds in disciplines such as sociology, anthropology, law, and media and communication studies. Among the problems and concepts that feature in their research are “datafication” (the tendency to turn more and more aspects of social and cultural life into data), privacy, surveillance, the increasing power of information technology companies, and a perceived lack of transparency, accountability and oversight in the realm of IT[footnote 5]

  • rather than bias, the overarching normative frameworks underlying such critical work tend to be concepts of (in)justice and (in)equality

  • we expand on these comments in the rest of the section, and emphasise the importance of understanding MRS as recent examples of a long history of how music is mediated

2.2 Computer science

Inevitably, much academic computer science is oriented to solving problems related to the efficiency, accuracy and functionality of technical systems. As the editors of an authoritative handbook on recommender systems write, “Recommender systems research, aside from its theoretical contribution, is generally aimed at practically improving industrial RSs and involves research about various practical aspects that apply to the implementation of the systems” (Ricci et al. 2015: 17). They also note that most recommender systems are examples of “large scale usage of machine learning and data mining algorithms in commercial practice”.

In the case of recommender systems, a key term used by academic computer scientists and (apparently) tech company product teams is relevance, defined in terms of whether users find recommendations appropriate to their tastes, preferences and circumstances. Relevance is valued because of its potential to encourage engagement, user satisfaction and ultimately business success. These terms of evaluation are measured statistically by computer scientists, and often on the basis of “implicit feedback” - i.e. user behaviour, rather than on the basis of qualitative or quantitative studies of what users actually think and perceive.[footnote 6] By contrast, as we have already seen, critical research tends to focus on the implications of computer systems for society and culture as a whole, and/or for marginalised and oppressed groups.
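To make the idea of measuring relevance from “implicit feedback” more concrete, the following minimal Python sketch derives relevance labels from listening behaviour and then scores a recommendation list against them. It is an illustration only, not a description of how any MSP operates: the `PlayEvent` structure, the 30-second threshold and the use of precision-at-k as the metric are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayEvent:
    """One hypothetical listening-log entry (structure assumed for illustration)."""
    user: str
    track: str
    seconds_played: float


# Assumed threshold: a play of at least this many seconds is read as a positive signal.
MIN_SECONDS = 30.0


def implicit_positives(events, user, min_seconds=MIN_SECONDS):
    """Return the tracks this user played for at least `min_seconds`."""
    return {
        e.track
        for e in events
        if e.user == user and e.seconds_played >= min_seconds
    }


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended tracks that appear in `relevant`."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for track in recommended[:k] if track in relevant)
    return hits / k


events = [
    PlayEvent("u1", "a", 200.0),
    PlayEvent("u1", "b", 5.0),   # abandoned quickly, so not counted as relevant
    PlayEvent("u1", "c", 95.0),
    PlayEvent("u2", "a", 12.0),
]
recommendations = ["a", "b", "c", "d"]

relevant = implicit_positives(events, "u1")
score = precision_at_k(recommendations, relevant, k=4)
print(sorted(relevant), score)
```

Note that nothing in this calculation records what the listener thought of the music: a long play is simply assumed to signal relevance, which is the limitation of behaviour-based evaluation noted above.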

However, within computer science there is a significant body of work that seeks to investigate ethical problems associated with computer systems, and the principal terms used in computer science to explore concerns about the problems of algorithms tend to be fairness, accountability and transparency.[footnote 7] The opposite or antonym of fairness tends to be conceived of as bias. In recent years, this term has also become widespread among many policy-makers and tech company product teams.[footnote 8]

The prevalence of the term “bias” in research that purports to deal with problems of computing systems, including recommender systems, undoubtedly derives from its use in statistics, to refer to some kind of systematic distortion, for computer science inevitably draws heavily on statistical methods to provide quantitative assessment of processes and outcomes. This technical understanding of bias no doubt makes it attractive as a concept to policy-makers who hope for ways of precisely measuring outcomes, in order to mitigate the subjectivity and uncertainty that always surround complex social and technological processes. In the IT industries, efforts to develop various techniques to identify and counter bias have intensified in recent years in the light of prominent media coverage of problems emerging from complex AI systems. Such techniques can be applied at various stages in a system: pre-processing (i.e. in gathering datasets), in-processing and post-processing (CDEI 2020: 169-70). We discuss some bias mitigation strategies in the context of MRS below, in section 4.
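As a rough illustration of what intervention at the post-processing stage can look like, the sketch below re-ranks a list of candidate tracks after a recommender has already scored them, applying a penalty that grows with each track’s existing play count. This is a hypothetical example and is not taken from CDEI (2020) or from any MSP: the function name, the logarithmic form of the penalty and the numbers are all assumptions chosen to keep the arithmetic easy to follow.

```python
import math


def rerank_with_popularity_penalty(scores, play_counts, penalty=0.05):
    """Re-order already-scored candidates, discounting heavily played tracks.

    Adjusted score = score - penalty * log10(1 + plays). The logarithmic
    penalty is an assumed form, used here purely for illustration.
    """
    adjusted = {
        track: score - penalty * math.log10(1 + play_counts.get(track, 0))
        for track, score in scores.items()
    }
    # Highest adjusted score first; ties are broken alphabetically.
    return sorted(adjusted, key=lambda t: (-adjusted[t], t))


# Hypothetical relevance scores produced by an upstream recommender.
scores = {"hit": 0.90, "mid": 0.85, "niche": 0.80}
# Hypothetical historical play counts.
play_counts = {"hit": 999_999, "mid": 9_999, "niche": 99}

original = sorted(scores, key=lambda t: (-scores[t], t))
reranked = rerank_with_popularity_penalty(scores, play_counts, penalty=0.05)
print(original, reranked)
```

With these toy numbers the penalty reverses the original ordering. How large the penalty should be, and whether such a reversal is desirable, are not technical questions; they depend on a prior view of what a good outcome looks like.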

Many critical researchers believe that the notions of bias and fairness currently used in policy and industry are limited. Critical internet studies challenges what it sees as narrow understandings of problems associated with AI and algorithms, including the focus on measuring bias and fairness in computer science, policy and industry. Whittaker et al. (2018: 25) for example, write that some of the problems that emerging algorithms and statistical techniques seek to diagnose and mitigate have “deep social and historical roots, some of which are more cleanly captured by discrete mathematical representations than others”. They also argue that “different mathematical fairness criteria are mutually exclusive” and that observational fairness methods may “provide a false sense of reassurance” (27). More generally, critical researchers tend to use terms such as (in)justice and (in)equality to assess how well digital systems perform for society and culture as a whole, and/or for marginalised and excluded groups, finding bias and fairness inadequate for capturing the systemic and structural ways in which injustice operates.
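The tension between fairness criteria can be seen in a small worked example. The Python sketch below uses invented toy data (not drawn from Whittaker et al. or any real system) for two groups with different underlying rates of positive outcomes. It compares two commonly discussed criteria: equal selection rates across groups (often called demographic parity) and equal error rates across groups.

```python
def selection_rate(preds):
    """Share of cases that receive a positive prediction."""
    return sum(preds) / len(preds)


def true_positive_rate(labels, preds):
    """Share of truly positive cases that are predicted positive."""
    on_positives = [p for y, p in zip(labels, preds) if y == 1]
    return sum(on_positives) / len(on_positives)


def false_positive_rate(labels, preds):
    """Share of truly negative cases that are predicted positive."""
    on_negatives = [p for y, p in zip(labels, preds) if y == 0]
    return sum(on_negatives) / len(on_negatives)


# Toy ground truth (assumed): half of group A is positive, a quarter of group B.
labels_a = [1, 1, 0, 0]
labels_b = [1, 0, 0, 0]

# Case 1: a perfectly accurate predictor. Error rates are equal across groups,
# but selection rates differ because the underlying rates differ.
accurate_a, accurate_b = list(labels_a), list(labels_b)
parity_gap = abs(selection_rate(accurate_a) - selection_rate(accurate_b))
tpr_gap = abs(
    true_positive_rate(labels_a, accurate_a)
    - true_positive_rate(labels_b, accurate_b)
)

# Case 2: force equal selection rates by selecting one extra case in group B.
# Selection rates now match, but group B acquires a false positive.
parity_b = [1, 1, 0, 0]
forced_parity_gap = abs(selection_rate(accurate_a) - selection_rate(parity_b))
forced_fpr_gap = abs(
    false_positive_rate(labels_a, accurate_a)
    - false_positive_rate(labels_b, parity_b)
)

print(parity_gap, tpr_gap, forced_parity_gap, forced_fpr_gap)
```

On this toy data, closing one gap opens the other. The example shows only one such tension under stated assumptions, but it indicates why a choice between fairness measures cannot be settled by measurement alone.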

In pushing back against the focus on bias and fairness, critical internet studies often questions optimistic views concerning the potential of information technology to bring great benefits to economies, societies and cultures. These views began to develop in the 1950s and arguably reached their peak in responses to the rise of the internet in the late 1990s and 2000s (Streeter 2011). At its most extreme, such “techno-optimism” or digital optimism sees technical solutions as superior to ones available through democratic deliberation. Some would claim that publications such as Wired magazine represent the apotheosis of such views.

Critical internet studies researchers have pointed to certain conceptual problems in the notion of bias. In doing so, some draw on older debates about the concept in critical social science. The idea that the media are “biased” for and against certain groups and political affiliations seems to have arisen in the mid twentieth century, and has been the basis of accusations from the left and right of the political spectrum. Yet as media researchers eventually pointed out (Hall et al. 1976; Hackett 1984), invocations of bias were a shaky foundation for evaluation of news and other media products. According to this critical view, the concept of bias ran the danger of simplifying the kinds of social realities that might be represented by media such as newspapers and television. It implicitly reduced the factors and elements involved, suggesting that a supposed bias towards a group or issue could be corrected by adjustments to reporting procedures and content, and that “balance” or “impartiality” might thereby be achieved. While some examples and forms of journalism can be deemed more truthful, accurate, careful or rigorous than others, the idea that any reporting could be free of inclinations or points of focus, or without any conscious or unconscious underlying values, became increasingly discredited. The extensive reliance on the concept of bias in digital policy may run into similar dangers of naivety regarding how social realities can be represented.

The way that the term “bias” is used in computer science, and therefore policy, tends to merge technical and ethical issues. Garcia-Gathright et al. (2018), computer science researchers seeking greater clarity about the concept of bias, note that, in machine learning contexts (as pointed out above, machine learning is fundamental to recommender systems), the term “bias” is used in divergent ways, “as unfair discrimination, or as a system having certain characteristics, some intended and some unintended”. They rightly point out that any dataset is “biased” in the latter interpretation. For them, “[t]his means we need to distinguish between intended and unintended/unfair outcomes”. But intention is moot - what matters more perhaps are outcomes, and the more important distinction may be between inevitable and avoidable bias, and the latter draws attention to the issues of justice and equality stressed by critical research. Ethicists Friedman and Nissenbaum (1996) come closer to this in their emphasis on systemic discrimination in their early definition of “computational bias”, quoted by Garcia-Gathright et al.: “Discrimination that is systemic and unfair in favoring certain individuals or groups over others in a computer system”.[footnote 9]

A further key distinction concerns the fact that overviews of bias from computer science research tend to be oriented to considering sources of bias, rather than outcomes. Olteanu et al. (2019) define data bias as “a systemic distortion in the data that compromises its representativeness” and identify biases introduced at different stages of data gathering and use (user biases, societal biases, data processing biases, analysis biases and biased interpretation of results). Baeza-Yates (2016) offers a related taxonomy, again emphasising stages and sources, but focusing more on the interplay between biases embedded in data, and those arising from how algorithms then act on that data. Their categorisation differentiates the following: activity bias arising from the characteristics of certain sub-groups of users, or unequal access on the part of users, data bias, sampling bias, algorithmic bias, interface bias and self-selection bias. But explicit normative discussion of what constitutes biased outcomes seems rare.

The terms “bias” and “biased” appeared in many submissions to the UK DCMS Inquiry in relation to the operation of algorithms in the realm of music. A remarkable number of submissions, for example, called for “oversight of platforms so that algorithms are not biased, and provide equal access to the streaming market for all artists, songwriters and performers”.[footnote 10] But the idea that computer systems might provide equal access to all musicians needs unpacking. Such a goal may involve similar simplifications to those involved in the idea that news reporting can ever be fully impartial, balanced or unbiased in its coverage of complex realities.

We raise these problems regarding the concept of bias as a caution about examining the problems of algorithmic recommendation too much through that lens. However, because it is such a key concept in the computer science literature (and it seems among many of the computer scientists and software engineers working for MSPs), we are compelled to use the term “bias” in this report. Moreover, in spite of its limitations, some of the computer science research using this term can cast light on some significant problems with algorithmic recommendation, as we show in what follows. The same may be true of the related concept of “diversity”, which we address in section 5 below.

2.3 Critical algorithm studies

What might critical internet studies contribute to our understanding of problems concerning automated cultural/musical recommendation, beyond the criticisms of bias already mentioned? The term “critical algorithm studies” is sometimes used to describe one important strand of critical internet studies. As already indicated, critical social science and humanities research on technological systems, including digital ones, tends to be focused on implications for society and culture, with an orientation towards questions of justice and equality, rather than technical ones. This sometimes includes concerns about the power of tech corporations in contemporary society (Zuboff 2019).

What kinds of implications for society and culture have been identified in the case of digital platforms and their recommender systems? A flavour of a very substantial research literature can be provided by referring to the work of Tarleton Gillespie, a critical researcher based at Microsoft Research New England, who has written about the specific implications that arise from the increasing use of algorithms “to select what is most relevant from a corpus of data composed of traces of our activities, preferences, and expressions”. He uses the term “public relevance algorithms” (Gillespie 2014) for this development, and categorises the implications as follows (Gillespie 2014: 168):

  • patterns of inclusion and exclusion: “the choices behind what makes it into an index in the first place, what is excluded, and how data is made algorithm ready”

  • cycles of anticipation: “the implications of algorithm providers’ attempts to thoroughly know and predict their users, and how the conclusions they draw can matter”

  • evaluation of relevance: “the criteria by which algorithms determine what is relevant, how those criteria are obscured from us, and how they enact political choices about appropriate and legitimate knowledge”

  • the promise of algorithmic objectivity: “the way the technical character of the algorithm is positioned as an assurance of impartiality, and how that claim is maintained in the face of controversy”[footnote 11]

  • entanglement with practice: “how users reshape their practices to suit the algorithms they depend on, and how they can turn algorithms into terrains for political contest, sometimes even to interrogate the politics of the algorithm itself”

  • the production of “calculated publics”: “how the algorithmic presentation of publics back to themselves shape a public’s sense of itself, and who is best positioned to benefit from that knowledge”

Research implicitly or explicitly related to such problems (and others) has grown exponentially since Gillespie’s chapter was published. There is no space to provide a comprehensive introduction to critical algorithm studies. Our focus is music, so we will refer to critical research on algorithms principally where it has contributed to studies of MSPs (see section 2.4 in particular). However, two preliminary points might usefully be made.

First, it would be wrong to think of the sub-field of critical algorithm studies as unified or monolithic. There is a tension, for example, between, on the one hand, studies that express concern about the role of algorithmic systems in shaping people’s behaviour while fulfilling the goals of powerful corporations and, on the other, research adopting a more measured and perhaps at times descriptive approach. There are significant differences in the degree to which the agency of users is emphasised. A hint of some of the normative differences is provided by Seaver (2017), who questions critical framings of recommenders based on what he sees as simplistic dualisms of freedom and coercion.

Second, while critical research offers the chance to go beyond the sometimes limited understandings of society and culture apparent in academic computer science, we do not claim that critical algorithm studies are superior to such computer science research. Alongside some valuable contributions, some critical internet research does little more than assert that algorithms, including recommendation systems, are important and opaque, and that we should study them more. Some of it pays insufficient attention to actual practices and operations of recommendation and other algorithms and is overly reliant on media coverage, or on sceptical scrutiny of public statements and promotional materials of platform companies. There is a lack of engagement with technical detail, at the expense of rigorous understanding. Nevertheless, at their best, critical internet and critical algorithm studies offer rich resources for understanding recommender systems, and MRS in particular. Again, there is no space to be comprehensive here; instead in the following sub-sections we refer to these sub-fields mainly where they have specifically addressed music.

However, we note the following encouraging tendencies:

  • the increasing presence of critical researchers with backgrounds in computer science who are undertaking research that combines technical knowledge with conceptual sophistication (e.g. Rieder 2020);

  • growing collaborations between critical researchers and computer scientists and programmers in order to examine issues of policy and/or public concern (e.g. Möller et al. 2018);

  • the publication of ethnographic research where critical academics engage with communities that design and operate recommendation and other systems (the most notable examples are actually in music, viz. Seaver 2015, Hodgson 2021).

2.4 Critical research on music streaming

Since around 2015, an enormous amount of social science and humanities research has analysed how MSPs might be changing musical production and consumption. Much of this work draws on, and contributes to, critical internet studies. It pushes back against some of the main ways in which MSPs and many commentators frame their activities - as providing abundance in a cheap and convenient form, while restoring prosperity to the music industry after the crisis attributed to digitalisation in the early 21st century.

In analysis of the music industries and musical production, various critical studies have emphasised the way in which music increasingly operates as a form of information and data (Morris 2015, Prey 2016), and concerns have been expressed that this undermines music’s value as culture (Negus 2019) and brings new, powerful tech corporations into the realm of music (Hesmondhalgh and Meier 2018, Meier and Manzerolle 2019), subjecting music users to new forms of surveillance (Drott 2018). A highly critical account by a team of Sweden-based researchers (Eriksson et al. 2019) focused on Spotify’s links to finance and advertising, and attributed many features of the MSP to those links. The role of playlists in the new musical landscape has featured prominently in some of the most critical accounts (Eriksson 2020, Prey 2020, Tofalvy and Koltai 2021).

There have also been numerous studies of how MSPs and the changing musical system of which they form a central part are affecting the working conditions of musicians and other intermediaries (Marshall 2015, Baym 2018, Hesmondhalgh 2020), including the growing and complex role of social media (Jones 2020). More recent research has started to investigate the ways in which musicians and music industry professionals might be responding to the affordances and demands of MSPs; Baym et al. (2021), for example, examine how these groups respond to and interpret the abundance of metrics now available via platforms. Others consider how musical content itself might be changing or not as a result of “platformisation” (Morris 2020 and Hesmondhalgh 2021).

Other social science and humanities research on MSPs has tended to focus more on the pleasures and benefits that users feel they experience in using MSPs (Hagen 2015, Nowak 2016, Johansson et al. 2017) and the degree to which music streaming is enabling “discovery” (Morris and Powers 2016, Kjus 2016). Here too there has been considerable focus on playlists, but with a less pessimistic appraisal of the forms of musical interaction and community enabled by them (Hagen and Lüders 2017, Dhaenens and Burgess 2019) and the way in which users themselves are shaping genre formation (Airoldi et al. 2016).

Some of the research is focused on how MSPs select and highlight content - a process often known as “curation”. Bonini and Gandini (2019), for example, clarify, on the basis of industry interviews, the mix of human and algorithmic recommendation that forms what they see as new forms of music gatekeeping; Morgan (2020) investigates how musicians and music businesses understand the blend of automated and human “curation” involved in playlists. While much of this research refers to algorithmically-driven music recommendation, and/or acknowledges its importance, as part of investigations of ways in which musical production and consumption are changing, only a relatively small portion of the research discusses algorithmic music recommendation in any detail.

The tone of the research that has considered algorithmically-driven music recommendation closely has been markedly pessimistic. Media scholar Robert Prey (2018: 1094) has argued that the ways in which recommender systems constantly reflect back to us “categorized images of our self” on the basis of commerce are “heavily influenced by categories that are defined and demanded by advertisers and brand needs, thus incorporating music ever more into a system of promotion and consumerism”. Some researchers emphasise the potentially negative aesthetic implications of such changes. Born et al. (2021: 6) for example suggest that recommender systems assume that listeners’ tastes and interests “evolve according to a universal logic derived from an analysis of the aggregated behaviour of millions of listeners”, downplaying unpredictable and unruly musical choices. Pedersen (2020) argues that algorithmic recommendation, and the “datafication” on which it depends, is used by Spotify (the main focus of so many critical accounts of MSPs) to push users towards more “functional” forms of music consumption, i.e. using music to accompany other activities, relax, get ready for activity and so on, at the expense of music’s aesthetic value and potential emotional depth (see Hesmondhalgh 2021 for debates about music “functionalism”).

Something of an exception to the generally very pessimistic tone of social science and humanities research on music recommendation was Webster et al. (2016). This makes a brave effort to combine the critical perspective of Pierre Bourdieu with the more descriptive and “neutral” approach of Actor Network Theory, emphasising the need to combine consideration of both human and non-human actors in music recommendation - a common theme in much of critical algorithm studies. Whether such an elaborate theoretical apparatus is necessary to pursue consideration of the role of human and non-human entities, as the article implies, is not clear, but the article was and remains unusual in combining social and cultural theory with some consideration of how recommender systems work.

A very few researchers have gone deeper by performing experiments using digital methods to examine the outcomes of music recommendation. Notably, Eriksson et al. (2019: 100-101) used a set of bots to investigate the recommendations produced by Spotify’s “Radio” algorithmic system (which automatically plays a selection of related tracks when a track or playlist has finished playing), concluding that music recommendation algorithms at Spotify “did not really take advantage of the archival infinity of the service”.[footnote 12]

Critical approaches to MSPs, including music recommendation, are often in line with many recent negative appraisals of music streaming by musicians, activists and some users, some of which were represented in the DCMS Inquiry. In this respect, an advantage of the best critical research is that it provides deeper and historically-informed versions of the kinds of critiques that ordinary users and musicians are making of platforms and algorithmically-driven recommendation systems. A downside is that very little understanding emerges from critical research of how MRS per se actually operate (rather than for example research on how musicians and music professionals think they operate, and how they respond).

There are various factors that might explain this lack of detailed and specific attention to MRS on the part of critical researchers. Perhaps the primary one is the difficulty of gaining access to information and data about the design and functioning of MRS, and what their effects on consumption might be. This factor is almost certainly shaped by anxieties on the part of tech companies about commercial confidentiality and negative publicity, and perhaps also partly by a sense among tech employees that they do not have time to co-operate with critical academics as the latter seek to explore systemic and historical questions that can feel somewhat removed from the exigencies of daily working life in a streaming company. In addition, no doubt many critical social science and humanities researchers feel intimidated by the technical and statistical detail in computer science papers. Only if such barriers and obstacles can be overcome are major systemic interdisciplinary studies likely to emerge.

A significant exception to this lack of access to organisations and professionals involved in MRS is the research of anthropologist Nick Seaver and ethnomusicologist Thomas Hodgson, already noted above. Seaver has studied the values and attitudes of the engineers and research scientists behind MRS and MIR. His analysis is informed by in-depth consideration of the intellectual histories behind their assumptions and values (Seaver 2015, Seaver 2019, Goldschmitt and Seaver 2019).[footnote 13] As founder of a music technology start-up, Hodgson gained access to research summits, conferences and networking events involving Spotify engineers and researchers, as well as to record companies and musicians. We discuss their contributions further below.

2.5 MSPs as the latest stage in the mediation of music

Social science and humanities research has another virtue in the present context. It can illuminate the crucial issue raised above in Section 1.4: that MSPs represent the latest stage in how music is mediated. While this historical context is recognised by many of the most astute computer scientists, a historical understanding of the systemic nature of musical mediation, and problems arising from such systems, is beyond their professional purview.

Terms such as “gatekeepers” and “intermediaries” have long been used to refer to the people and institutions who shape the culture to which audiences are exposed. How do news organisations select from the vast numbers of things that happen in the world to decide what gets put into a newspaper or a television news bulletin? How are decisions made about which music is played on a radio station, and how often? Sociologists began to explore these and other related questions from the early twentieth century onwards.[footnote 14] More recent, historically-informed contributions from the social sciences and humanities, especially from media and communication studies, have helped to more clearly delineate what is new about emerging forms of cultural intermediation - including those involving algorithms - and how transformation is intertwined with continuities.

Condensing a number of contributions across a range of cultural sectors (Lotz 2018, Johnson 2019, Hesmondhalgh 2019, Poell et al. 2021), we can identify three key developments emerging from the application of digital networks (computerisation, the internet, mobile telephony etc.) to the realm of culture (information, entertainment and the arts).

  • first, digital networks made cultural products even more abundant, by allowing for the relatively affordable and easy creation and uploading of cultural content

  • second, informed by prevailing cultural assumptions about how computing should operate (for example, emphasising individualism over community) they made the experience of consumers and citizens more personalised

  • third, in order to deal with this new abundance and provide personalisation, they made processes of selection and recommendation more automated: it is very often computers (programmed of course by humans) that make decisions that select, rank, order and list items

Alongside these transformations, critical research can also provide a sense of continuity. Processes of selection and recommendation continue to be driven primarily by the commercial goals of the companies that oversee production and distribution - in the case of the music industries, not only the MSPs that now combine the might of both radio and retail in a different era, but also the recording and publishing companies that have, for many decades, owned the rights that provide the means to make money and exert power in the realm of culture.

Understanding these and other related developments is not a luxury add-on to analysing problems surrounding recommendation systems. It is an essential part of any diagnosis. The best historical perspectives can help to push back against now-discredited optimistic claims that algorithmic recommendation would efficiently meet people’s cultural desires or needs and help democratise access to audiences. They can also militate against alarmist and simplistic views that technologies of personalisation and automation represent irreversible calamities for culture.

2.6 The need for technical and critical research

Critical research, then, tends to try to understand information technology, and recommender systems in particular, in long-term historical and systemic context, while recognising continuities. But in general critical research lacks engagement with specificities and mechanisms of cultural recommender systems, perhaps especially MRS. On the other hand, as we shall explore more below, the computer science literature is focused on identifying problems (often ones concerning relevance and user satisfaction, but also in some cases issues of “fairness” and “bias”) with mathematical precision, and finding primarily technical solutions to those problems. However, this technical and statistical focus often misses a bigger picture: the social and cultural systems within which MRS operate, and the effects of MRS on society and culture, not just individual consumers. We seek to engage with both research traditions in what follows and recommend that future research involves interdisciplinary teams where possible.

3. The role of algorithms in music recommendation

3.1 Music recommendation, algorithms, and playlists

A first step in understanding how algorithmically-driven recommendation systems shape the experiences and activities of users of MSPs is to consider the different forms that music recommendation in general takes on those platforms. An issue that needs to be taken into account here is that recommendation on MSPs is blurred with a number of other issues, such as placement on interfaces (an issue increasingly addressed as part of the emerging policy concept of “prominence”) and the importance of playlists.[footnote 15] Here are just a few examples of music recommendation in four of the most popular MSPs.

  • under “Browse” on its smartphone interface, Apple Music offers a horizontal scroll of playlists, the contents of some of which are personalised to the user, and therefore produced by MRS. On the date of writing, the first was “Today’s Hits. Apple Music Hits”, featuring a picture of Harry Styles

  • underneath is a horizontal scroll or “carousel” of “Your Go-To Playlists”, organised for one of us by genre, including some of Apple’s radio shows, such as Charlie Sloth’s Rap Show; a carousel of “Now in Spatial Audio”, followed by an expandable vertical list of “Hot Songs” and a carousel of New Releases; it is not clear whether this last is personalised

  • opening the Spotify home page on a laptop currently features, as its top and therefore leading item, under the heading “Good morning”, “Good afternoon” or “Good evening”, a mix of music and podcast items that the user has recently played, and items that are new and therefore based on some interpretation of what the user might enjoy listening to

  • further down the Spotify laptop home page is a list of “popular new releases”. This is undoubtedly a set of “recommendations”, but it is not clear whether these have been suggested algorithmically by an MRS, or whether they represent a selection chosen by a Spotify editorial team - perhaps geared to particular categories of user. In other words, the extent to which they are personalised is uncertain

  • playlists such as Spotify’s “Discover Weekly” and “Release Radar” provide personalised lists of tracks for individual users

  • services such as Spotify’s “radio” feature provide a series of tracks that are automatically played, once a playlist or album selected by a user has finished playing. Here there is no list of recommendations

  • YouTube Music curates “mix” playlists that recommend music tracks based on videos previously played on the platform, not necessarily confined to music videos. YouTube also suggests mix playlists to users based on others’ activity, such as users’ followers, or those followed by users

  • music is a key component in short videos shared on TikTok. TikTok algorithmically recommends videos using its “For You” feed. Songs embedded in videos, either manually or using TikTok’s internal audio library, are used to determine video recommendations to users

Platform music recommendation, then, takes quite varied forms. Some forms of recommendation involve playlists, some do not. Some recommendations are automatic and personalised and some are not. In fact many of the most widely-followed and -used MSP playlists are “curated” (compiled) by humans and are not personalised; all who access them at any time will see the same or similar content. In compiling such playlists, human curators undoubtedly draw on automated search and other computerised systems (Bonini and Gandini 2019), but ultimately humans make the final decisions about inclusion and exclusion.

By contrast, in those playlists that are algorithmically-driven, humans are of course integrally involved, but at the level of designing the systems (including machine learning ones) that then automatically generate inclusion/exclusion decisions. Each user gets a different playlist based on coding decisions made by those humans and by machine learning, as the algorithm learns from users’ practices. Spotify’s Discover Weekly playlist, for example - “your weekly mixtape of fresh music” - is different for all users, and changes every Monday.

This is important in the present context because musicians, music industry professionals and users sometimes blur concerns about playlists with concerns about “algorithms”. One especially strong point of debate has been the potential benefits that accrue from being placed on the most popular playlists and, relatedly, the opacity surrounding how decisions are made about which tracks and artists are included on those playlists and which are not. There is evidence that many of the most popular playlists are populated by artists signed to the “major” record companies,[footnote 16] at the expense of artists signed to independent record companies or uploading their own music and retaining control over their own rights (sometimes called DIY artists, though some of them have significant audiences) (Antal et al. 2021). Many submissions to the DCMS Inquiry complained about the difficulty and/or cost of achieving access to these playlists.[footnote 17] But the most influential forms of recommendation, widely-used humanly curated playlists such as Spotify’s “Rap Caviar”, are not generated by MRS. Concerns about algorithms and playlists are related, but the two need to be understood separately.

3.2 How do music recommendation systems (MRS) work, and how does this relate to other recommender systems?

In this sub-section, we discuss five techniques (3.2.1 to 3.2.5) that allow MRS to function. This is a necessary basis for understanding the issues of bias, diversity and opacity/transparency that will be addressed in Section 4. We draw here on the valuable survey of Schedl, Knees et al. (2015).[footnote 18] However, in 3.2.6, we also suggest ways in which, in order to understand MRS in practice, there is a need for critical research that examines the underlying practices and beliefs behind MRS, and user interactions with them.

3.2.1 Collaborative filtering

The key technique underpinning MRS is collaborative filtering: “the process of filtering or evaluating items through the opinions of other people” (Schafer et al. 2007). These opinions are usually expressed in the form of “ratings”, but as Schedl, Knees et al. (2015) point out, “in the music domain, explicit rating data is relatively rare, and even when available, tends to be sparser than in other domains […].” Some MSPs do have features that allow users to provide simple ratings, such as “like” (e.g. pressing a thumbs up symbol) or “love” (e.g. pressing a heart-shaped symbol) but this is limited compared with the ratings (up to five stars, or scores up to a hundred) that many film and video audiences provide on video platforms and related services such as IMDb and Rotten Tomatoes. Moreover, ratings for the same song or artist might differ considerably over time, as fashions and audiences change, meaning that in order to capture such change in data, ratings would need to be timestamped, whereas in current practice this is apparently rare (Schedl et al. 2018: 12). Another issue facing those building MRS is that most tracks on MSPs receive zero or very few plays, which means that there is very little information about their use that MRS can draw upon to shape recommendations.[footnote 19]

Instead, in music recommendation, “implicit positive feedback is often drawn from uninterrupted (or unrejected) listening events” (Schedl, Knees et al. 2015: 454) - in other words from information about behaviour such as users choosing to skip or not skip tracks. Adding tracks to playlists can also be interpreted as a form of positive rating.
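The idea of treating unskipped plays as implicit positive feedback can be illustrated with a minimal sketch of item-based collaborative filtering. Everything in it is an assumption made for illustration: the listening log, the rule that a skip counts as a rejection, and the use of cosine similarity between sets of listeners. It does not describe the system of any particular MSP.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical listening log: (user, track, skipped?)
EVENTS = [
    ("ann", "t1", False), ("ann", "t2", False), ("ann", "t3", True),
    ("bob", "t1", False), ("bob", "t2", False),
    ("cat", "t2", False), ("cat", "t4", False),
]

def implicit_likes(events):
    """Treat every unskipped play as implicit positive feedback."""
    likes = defaultdict(set)  # track -> users who played it without skipping
    for user, track, skipped in events:
        if not skipped:
            likes[track].add(user)
    return likes

def cosine(a, b):
    """Cosine similarity between two sets of users (binary vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(user, events, k=3):
    """Rank unheard tracks by similarity to tracks the user played through."""
    likes = implicit_likes(events)
    heard = {t for u, t, _ in events if u == user}
    mine = {t for t, users in likes.items() if user in users}
    scores = {}
    for cand, cand_users in likes.items():
        if cand in heard:
            continue  # only recommend tracks the user has not encountered
        scores[cand] = sum(cosine(cand_users, likes[t]) for t in mine)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Calling `recommend("bob", EVENTS)` suggests the one track Bob has not heard that shares a listener with his own plays. A track that has only ever been skipped never enters the similarity calculation at all, which mirrors the sparsity problem described above.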

3.2.2 Content-based approaches

Music recommendation also relies heavily upon content-based approaches. According to Schedl, Knees et al. (2015: 454), this has been relatively more common in music recommendation than in other recommender domains, precisely because of the lack of ratings data that drives collaborative filtering. Content information includes various kinds of metadata about tracks (artist and song titles, album covers), manual annotations of various kinds provided by “experts”[footnote 20] and various annotations data-mined from the web.

A separate branch of computer science, music information retrieval (MIR), develops systems that can provide such data. Just as importantly, MIR analyses musical content for significant features, using audio signal technologies. MRS could not function without MIR and the two need to be considered together in understanding either.[footnote 21] As Schedl, Knees et al. (2015) point out, the goal of finding content that is, to different degrees, similar to other content, based on any of a number of dimensions, is essential to collaborative filtering, and it is MIR that is the basis of judgements about similarity in MRS development and research.[footnote 22]
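A minimal sketch of a content-based similarity judgement follows. It assumes that each track has already been reduced to a short vector of audio descriptors; the three descriptors and their values are invented for illustration, and real MIR feature sets are far richer.

```python
from math import sqrt

# Hypothetical audio descriptors scaled to 0..1: (tempo, energy, acousticness)
FEATURES = {
    "track_a": (0.80, 0.90, 0.10),
    "track_b": (0.75, 0.85, 0.15),
    "track_c": (0.20, 0.10, 0.95),
}

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def most_similar(seed, features, k=2):
    """Rank the other tracks by audio-feature similarity to the seed."""
    others = [t for t in features if t != seed]
    return sorted(
        others,
        key=lambda t: cosine(features[seed], features[t]),
        reverse=True,
    )[:k]
```

Because the ranking depends only on the descriptors, a track with no listening history can still be placed next to similar-sounding tracks.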

3.2.3 Contextual approaches

There is a third key approach to music recommendation, in addition to collaborative filtering based on implicit and explicit evaluations, and “content-based” approaches: contextual approaches. This is a growing area of research, with potential future implications for musical experience. Schedl, Knees et al. (2015) distinguish between:

  • environment-related context: features that can be “measured by sensors on the user’s mobile device or obtained from external information services such as user location, the time [of day or week], weather, temperature, and so on”

  • user-related context: harder-to-measure information about the user’s activity, emotional state, or social environment (e.g. whether they are at a party, in the gym, or alone)
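One way in which the two kinds of context distinguished above could enter a recommender is as a re-ranking step applied to scores produced by some other method. The sketch below is purely illustrative: the `Context` fields, the tags attached to tracks, the time-of-day thresholds and the size of the boost are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Context:
    hour: int       # environment-related: local time of day, 0..23
    activity: str   # user-related: e.g. "gym", "party", "alone"

# Hypothetical base relevance scores and context tags per track.
BASE_SCORES = {"t1": 0.9, "t2": 0.7, "t3": 0.6}
TAGS = {"t1": {"party"}, "t2": {"gym", "morning"}, "t3": {"alone", "night"}}

def context_labels(ctx):
    """Turn raw context signals into the labels used to tag tracks."""
    labels = {ctx.activity}
    if 5 <= ctx.hour < 12:
        labels.add("morning")
    elif ctx.hour >= 22 or ctx.hour < 5:
        labels.add("night")
    return labels

def rerank(ctx, boost=0.2):
    """Add a fixed boost for every context label that a track matches."""
    labels = context_labels(ctx)
    scored = {t: s + boost * len(TAGS[t] & labels) for t, s in BASE_SCORES.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

With `Context(hour=7, activity="gym")` the gym-and-morning track overtakes the track with the highest base score. The user-related field is simply supplied by hand here, whereas in practice it is exactly the information that is hard to measure.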

In a later paper, Schedl et al. (2018) make a different distinction, in order to identify three trending topics in MRS, all of them aimed at creating more personalised recommendations.[footnote 23] None of these featured strongly in MRS at the time Schedl et al. wrote this paper, but they predicted that these techniques would soon become more significant in academic research and industry practice. Research might investigate the degree to which this is the case.

  • psychologically-inspired music recommendation. They further divide this category into (a) recommendations based on personality, which some sources claim have achieved superior outcomes for (non-musical) recommender systems; (b) recommendations based on emotion (inferring the emotional state the listener is in, attributing emotion to music, and then performing the crucial but difficult task of connecting the two). As Schedl et al. (2018) recognise in passing, a problem here is a reliance on notions of personality and emotion that are psychologically simplistic, and a reliance on psychological research that uses such simplistic notions. This is a “problem” not only in terms of whether users consider results satisfactory (the main criterion of evaluation considered by computer scientists), but also in terms of the rather simplistic ways in which musical taste, experience and activity are understood, for example the idea that “extravert” people like “upbeat” music

  • situation-aware music recommendation: the various issues of location, time of day, activity, weather, day of the week already mentioned (see Gillhofer and Schedl 2015). Schedl et al. (2018: 13) comment that those situation-aware MRS that do exist usually employ only one or very few of such signals, and those that attempt to construct a more comprehensive view suffer from “a low number of data instances or users, rendering it very hard to build accurate context models”. Here the danger might be sociological as well as psychological simplification, and there may also be privacy/surveillance concerns

  • culture-aware music recommendation: Schedl et al. (2018: 14) criticise MRS for its disregard of cultural location and background as a factor. By contrast, MIR research has sought to move beyond the dominance of western music, but in terms of musical content and production, rather than on how the cultural context of audiences or users might be taken into account. Bauer and Schedl (2019) identify a very small body of work that does attempt to relate consumption to cultural context, but these measures depend on identifying culture with nations, potentially downplaying and underestimating the internal diversity within nations

3.2.4 Combinations of the above

It seems that many MRS combine two or more of the above techniques (the following draws on Schedl, Knees et al. 2015: 467). Combinations of content and context have apparently been rare. But combinations of collaborative filtering (CF) and content-based approaches are considered to improve the quality of recommendations. For example, any new item will have no preference data - a version of the “cold start” problem that troubles computer science.[footnote 24] Audio content analysis can detect similarities to other tracks that have previously gained such data about preference. Importantly, in terms of the concerns raised in recent debates, this combination can therefore be used to avoid favouring already popular items, and to increase novelty and diversity. Preference data will inevitably favour popular items, but audio-based information is “agnostic” about whether music is popular or not, so it is possible to recommend newer and lesser-known items. Various other hybrids combine CF with context (rather than content) descriptors to try to predict what kinds of music users might wish to hear while undertaking such activities as driving, working out, trying to get to sleep and so on. Such combinations seem to be on the rise.
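The way a hybrid can soften the cold-start problem can be sketched as a simple weighted blend. The function below is a hypothetical illustration, not a documented method: the linear weighting and the `full_trust_at` threshold are assumptions chosen to keep the example short.

```python
def hybrid_score(cf_score, content_score, n_plays, full_trust_at=50):
    """Blend collaborative and content-based scores for one candidate track.

    cf_score      -- score from preference data, or None if the track has none
    content_score -- audio-similarity score, available even for new tracks
    n_plays       -- number of listening events behind the CF score
    full_trust_at -- hypothetical play count at which CF is fully trusted
    """
    if cf_score is None or n_plays == 0:
        return content_score  # cold start: rely on audio content alone
    weight = min(n_plays / full_trust_at, 1.0)  # trust CF more as data accumulates
    return weight * cf_score + (1 - weight) * content_score
```

A brand-new track is scored on its audio content only, so it is not penalised for lacking plays. As listening data accumulates, the popularity-sensitive CF score gradually takes over.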

3.2.5 Playlist generation

One of the reasons that playlists are so important in MSPs is that, by contrast with books and feature films, it is possible to consume quite a lot of individual items (i.e. tracks in the case of music) in a short period of time. So as well as recommending lists of albums from which a user might choose, MRS also involves automatic playlist generation (APG) and automatic playlist continuation (APC) (see examples in Section 3.1 above) and the latter plays a greater role in music than in other forms of cultural recommendation.

Schedl, Knees et al. (2015: 475) list three techniques that are used to generate playlists automatically:

  • constraint satisfaction, which attempts to construct playlists that satisfy user-specified search criteria

  • similarity heuristic methods, which build playlists by finding songs similar to a query or a “seed” track

  • machine learning approaches, used to improve playlists on the basis of what users do, for example only using tracks that occur together in other playlists
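The first two techniques in this list can be combined in a short sketch: a greedy similarity heuristic that starts from a seed track, subject to one simple constraint on how often an artist may appear. The catalogue, the two-dimensional feature vectors and the per-artist limit are hypothetical, and the machine learning approaches in the third bullet are not covered.

```python
# Hypothetical catalogue: track -> (artist, 2-d feature vector)
CATALOGUE = {
    "s1": ("A", (0.90, 0.10)),
    "s2": ("A", (0.88, 0.12)),
    "s3": ("B", (0.80, 0.20)),
    "s4": ("C", (0.10, 0.90)),
}

def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5

def build_playlist(seed, length, max_per_artist=1):
    """Greedy seed-based playlist with a simple per-artist constraint."""
    playlist = [seed]
    artist_count = {CATALOGUE[seed][0]: 1}
    while len(playlist) < length:
        last = CATALOGUE[playlist[-1]][1]
        candidates = [
            t for t, (artist, _) in CATALOGUE.items()
            if t not in playlist and artist_count.get(artist, 0) < max_per_artist
        ]
        if not candidates:
            break  # the constraint cannot be satisfied any further
        nxt = min(candidates, key=lambda t: distance(last, CATALOGUE[t][1]))
        playlist.append(nxt)
        artist = CATALOGUE[nxt][0]
        artist_count[artist] = artist_count.get(artist, 0) + 1
    return playlist
```

With the default limit of one track per artist, the nearest neighbour of the seed is passed over because it is by the same artist. This is one crude way of trading similarity for variety.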

A later article (Schedl et al. 2018: 5-6) identifies problems with APC, notably the challenge of inferring the intended purpose of a playlist, especially in cold-start situations. APG in general relies on machine learning to extract information from manually-curated playlists. There is apparently a lack of clarity concerning whether taking into account the order of tracks in playlists creates better models. Participants in user studies frequently complain about a lack of variety in APG and APC lists. Here again, it seems that serendipity rather than similarity is the aim of many designers and product teams.

3.2.6 Practices, aims and goals of MRS engineers and organisations

Although surveys such as that of Schedl, Knees et al. (2015) are valuable in understanding the techniques used by MRS, in order to understand these systems it is arguably just as important to understand the practices, aims and goals of MRS engineers, and of the companies that employ them. Public statements by engineers are rare - and often not very revealing, seemingly aimed more at publicity and promotion than enlightenment (e.g. Cowan 2017). The same is true of occasional announcements of strategy by MSPs, often couched in the benign language of human connection and democratisation (e.g. Daniel Ek’s “founder’s letter” when Spotify became a public company in 2018 - see Fagan 2018). Ethnographic research potentially offers some illumination of these issues. Seaver’s research, mentioned above, throws light for example on the complexity of algorithms and the problems of talking about “the algorithm” and on the complex organisations behind algorithmic systems. The Chief Scientist of an MSP recounted to Seaver how the company’s “algorithm” had grown and multiplied to become

“not one algorithm at all, but “dozens and dozens” of sub-algorithms, each of which parses a different signal: What does a song sound like? How often does a user click? What has a listener liked in the past? A master algorithm orchestrates the sub-algorithms’ outputs together into an ‘ensemble’ […] that makes a simple decision: What song should be played next?” (422)

The Chief Scientist had gone from being the main person working on “the algorithm” to managing “teams of teams”.

Just as important, Seaver casts light on how organisations understand what they are doing, for example about how the engineers he studied talked of “hooking” users, trying to get them addicted to the product (Seaver 2019). Rather than seeing this as sinister, Seaver emphasises how such systems can be facilitating as well as constraining - like many other modern systems.

Meanwhile, in his interactions with engineers, computer scientists and industry executives, Hodgson (2021: 8) was struck by “their absolute faith in the technology they were building, a deeply rooted belief that it will ultimately be a force for good”. They talked about empowering musicians, and enabling connections between musicians and listeners across the globe. Yet these techno-optimists must surely be aware of public critiques of the companies they work for. If access can be gained, future research might provide more in-depth analysis of the organisational practices and goals of MSPs, within the constraints of commercial confidentiality.

4. The question of “bias” in music recommendation: how might different groups of artists and consumers be affected by algorithms?

4.1 Evaluating MRS

As explained in Section 2, bias has become the prevalent term in computer science, policy and industry for thinking about problems of fairness in relation to the use of digital networks. As we saw there, critical internet and algorithm studies researchers tend to use concepts such as justice and inequality instead, and to focus on the impacts of digital systems on society and culture, rather than just for businesses or consumers. We also discussed in that section some potential limitations of the concept of bias.

Before addressing the question of bias in more detail in this section, we need to locate it in the broader context of how MRS are evaluated in academic computer science and industry. Schedl et al. (2018: 6-7) point out that recommender systems as a whole have tended to adapt evaluation metrics from machine learning and information retrieval, and “that accuracy and related quantitative measures, such as precision, recall, or error measures (between predicted and true ratings) are still the most commonly employed criteria to judge the recommendation quality of a recommender system”. Like CF systems, these evaluation metrics are fundamentally reliant on ratings data, so that the relevance of recommended items can be statistically evaluated - a problem given the sparsity of ratings data in the musical “long tail” - the vast number of obscurer items beyond the most popular artists and tracks.[footnote 25]
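As an illustration of the accuracy-oriented measures mentioned above, the following minimal Python sketch computes precision and recall for a single user's recommendation list, and a root-mean-square error between predicted and true ratings. The function names and the toy data are hypothetical; the sketch is not drawn from Schedl et al. (2018) or from any MSP's evaluation practice.

```python
import math

def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user's ranked recommendation list."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def rmse(predicted, actual):
    """Root-mean-square error between predicted and true ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Toy data: four recommended tracks, three tracks the user actually rated highly.
recommended = ["t1", "t2", "t3", "t4"]
relevant = {"t2", "t4", "t9"}
p, r = precision_recall_at_k(recommended, relevant, k=4)
error = rmse([4.0, 3.0], [5.0, 3.0])
```

Both measures presuppose that a set of “relevant” items or true ratings is known for each user, which is why sparse ratings data in the long tail makes them hard to apply there.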

There has been a significant move in recent years towards the adoption of “beyond-accuracy” evaluation measures, such as novelty, serendipity and diversity, and we shall see below that these are very relevant to issues raised in recent public debates about algorithms. However, Schedl et al. (2018: 13) warn that in academic computer science studies of recommender systems, the subjective perceptions of the human user (in the case of MRS, the listener) are “way too often” neglected or not properly addressed, in favour of quantitative measures derived from user behaviour. User studies of such perceptions appear to be limited in computer science, though there have been efforts there to develop holistic frameworks that combine “objective” and “subjective” evaluation criteria (Knijnenburg et al. 2012). It seems likely that many technology companies, including perhaps MSPs, undertake such user studies but if this is the case, their results are not made publicly available.

It is difficult to discern from state-of-the-art academic computer science research, such as that we have been discussing above, how technology companies - in this case MSPs - actually evaluate recommender systems in their daily practice, what role is played by the concepts mentioned above, such as precision, recall, novelty, serendipity and diversity, and the extent to which MSP companies draw on user studies to take into account the actual perceptions of listeners. Spotify research scientists have, however, published papers on using mixed methods to gain insight into user behaviour and attitudes (Garcia-Gathright et al. 2018), including the results of work involving users about music search (Hosey et al. 2019).[footnote 26]

Future research might provide more information about evaluation methods used by MSPs - if access can be gained. Some clues, however, may be offered by reference to papers published by research scientists based in MSP companies. A Spotify team, Mehrotra et al. (2018), identified a trade-off between consumer relevance and “supplier fairness”, on the basis that a system optimising for relevance might be unfair to unpopular suppliers because of the way in which recommender systems suffer from “superstar economics” or “popularity bias” (see section 4.2 below): a tendency to favour the already most popular items. They performed an experiment on a large real-world data set which sought to measure and compare the user satisfaction produced by five techniques that handled the trade-off in different ways. (We discuss these techniques in greater detail below as they are directly relevant to the issue of “popularity bias”.) “User satisfaction” here was measured purely on the basis of “the number of tracks the user listens to in a recommended set”. Again, we see how in some computer science research, subjective appraisals are often shunned: what users might think, feel or say about their levels of satisfaction is not considered relevant. But the investigation of this trade-off suggests that some firms and their computer scientists are willing to contemplate other goals besides short-term profit maximisation in designing systems - even if some critics of streaming might claim that any pursuit of “supplier fairness” is itself an indirect way of producing longer-term profit via improved PR.

Returning now to the concept of bias, we can identify two main categories of bias that have been the subject of substantial studies in relation to music recommendation: popularity bias (i.e. favouring the most popular items in recommendations) and bias according to demographic characteristics. We mean “demographic” here to refer in principle to many different characteristics, including race, ethnicity, gender, class, age, disability, sexuality and nationality. Naturally, these are identity categories that raise important questions of justice and equality beyond the bias concept; hence the title of the relevant sub-section below.

In other domains beyond MRS, technical and critical studies have examined issues such as the return by search engines of results that are deeply problematic in terms of, for example, race (Noble 2018). In MRS, as we shall see, across both technical and critical research, substantial studies have been confined to a rather limited set of demographic characteristics - principally gender, but also to some extent nationality.

We address popularity bias in section 4.2 and demographic biases in 4.4 respectively, and also mention other areas of concern. There are overlaps between these categories here: so-called popularity bias can lead to “unfair” treatment of long-tail items and of users that seek them, who might belong to certain disadvantaged groups. There seem to be some efforts on the part of platforms to counter popularity bias, raising issues about the transparency and observability of platforms. So section 4.3 discusses some methods that have been used to investigate recommendations, including emergent research on strategies of recommendation optimization adopted by content creators and distributors.

4.2 Popularity bias in MRS

A great deal of research has been conducted in the field of academic computer science on popularity bias: the supposed tendency for recommender systems to favour already popular items as part of their efforts to predict what users might benefit from experiencing - an example of what some refer to as a “Matthew Effect”, where the successful get more successful.[footnote 27] Chiming with this, we have already seen that the DCMS Inquiry reported that “several submissions” warned that “algorithms, as with any recommendation system, could reflect biases that may subsequently reduce new music discovery, homogenise taste and disempower self-releasing artists”. One submission argues, for example, that “platforms skew everything in favour of the artists who exist at the top of the pyramid and the record companies who provide the immense backing to ensure their visibility”.[footnote 28] Might the concept of popularity bias in the computer science literature provide information and evidence to inform these debates, including information about whether such a bias exists, and if so how it comes about?

If already popular items are favoured in recommender systems and therefore made more popular at the expense of other items, then this matters. As a great deal of previous research has shown, and as the IPO Music Creators’ Earnings report demonstrated (Hesmondhalgh et al. 2021), music has long been a domain where superstar artists and those companies associated with them tend to benefit disproportionately at the expense of less well-known or less-exposed artists. If already successful artists and their tracks were to continue to gain popularity at the expense of obscurer items, or if that tendency were to be intensified, then this would potentially reinforce or exacerbate unjust inequalities among musicians and those associated with them. It would make it difficult for artists in “the long tail” (see above) to earn a living, not necessarily because of lesser talent, or because their tracks were less interesting or important to audiences, but because of an entrenchment of existing patterns of success. As well as artists, such popularity bias – if it exists – might also be problematic for audiences/users, for those who might enjoy items in the long tail may be deprived of the opportunity to become aware of them.

Very few MRS used by MSPs appear to be based on popularity alone.[footnote 29] Pure popularity-based recommenders do of course exist; it is perfectly possible to create recommendations based simply on ranking the highest-rated or most-consumed items in a certain category. It might even seem that this would be the simplest and most effective way to please users: after all, it might be argued, if an item is popular, it must have some attractive features. In addition, people might well wish to know what other people are hearing and/or enjoying.[footnote 30] But the academic computer science literature suggests that most cultural recommender systems recognise the importance, for user experience and for the sake of society and culture, of incorporating novelty, discovery and serendipity into platforms. This implies that if popularity bias operates in MRS, it is something that emerges from a complex set of processes and factors rather than as a direct result of simply favouring the most popular items, as some contributions to recent debates seem to believe.

However, academic computer science research is unclear about the processes by which such bias emerges. It also provides little evidence about the existence and effects of popularity bias in real-life music streaming situations. This is in spite of a large amount of published research about the problem of popularity bias in general (rather than specifically in music). The problem may be a result of a lack of access to real-world data sets. Instead, academic computer scientists often seek to address popularity bias by running simulations based on a relatively small number of publicly available datasets, in order to develop solutions to various technical problems associated with popularity bias.[footnote 31] While there is frequent reference to popularity bias, the citations addressing fundamental questions about whether it happens are often rather old. Later papers often involve an exploration of some aspect of the potential problem of popularity bias – usually rather technical in nature – via simulation.

For example, Abdollahpouri (2019: 529), a leading computer science scholar of popularity bias in cultural recommender systems (including popularity bias in MRS), writes in an overview of his research that “[o]ne of the main biases in recommender systems is the problem of popularity bias” and that they “typically emphasize popular items (those with more ratings) over other ‘long-tail’, less popular ones that may only be popular among small groups of users”. But the two sources he cites (Bellogin et al. 2017 and Park and Tuzhilin 2008) do not really discuss evidence concerning the existence or otherwise of popularity bias. The first source, Bellogin et al. (2017), applies evaluations of recommendations derived from the information retrieval literature to recommender systems to assess their effectiveness. Their discussion of the sources of popularity bias in recommender systems is confined to how biases might be baked into the underlying datasets, ultimately concluding that “how users reach music tracks and decide to listen to them is the result of a complex combination of factors where some items have a higher prior probability to be reached than others” (623). This seems something of a truism. How do some items come to have this higher prior probability? Among the factors that critical researchers would point to is the aforementioned factor of greater marketing muscle on the part of some record companies. Perhaps future interdisciplinary research might bring together investigation of these underlying factors with explorations of popularity bias in algorithms themselves.

Among the few sources that appear to explore directly the question of the degree to which (cultural) recommenders push consumers towards the most popular items or to niches is Fleder and Hosanagar (2009). Noting the existence of contrary views about this issue, they performed various simulations that show how recommenders, including CF-based ones, “can lead to a reduction in sales diversity”, partly because they cannot recommend products with limited historical data. (The so-called “cold start problem” again.) However, since 2009, as we have already seen in Section 3, there have been advances that have sought to overcome the problem of sparsity of data about new or untried tracks and artists by using content and contextual information to supplement collaborative filtering.

A paper that deals directly with the existence or otherwise of popularity bias is Jannach et al. (2013). Comparing different recommendation algorithms, they found that some had a strong bias towards recommending only relatively popular items over others. But they also noted a move away from an earlier excessive focus on “accuracy” as a criterion for evaluating recommendations (accuracy is used here in a technical sense to mean whether a recommended item has previously gained a high score from the rater). Recognising the seemingly rather obvious point that users don’t want to be recommended items that they are already familiar with, researchers began to offer quality, novelty, diversity, serendipity and other terms as alternative means of evaluating recommender systems.[footnote 32] In line with this shift away from recommending the familiar, numerous RS papers offered solutions to popularity bias. For example, Kamishima et al. (2014) carried out a simulation that showed that popularity bias could be corrected by enhancing “neutrality with respect to information” about popularity. In a more recent paper, Zhang et al. (2021) have differentiated three categories of method intended to counter popularity bias:

  • inverse propensity scoring (IPS), which reweights the examples used in data training sets

  • causal embedding, “which uses bias-free uniform data to guide the model to learn unbiased embedding” – raising the question of what “bias-free” might mean

  • ranking adjustment, which performs “post-hoc re-ranking” on the recommendation lists produced, or which performs “model regularization” on the training data – both of these methods seek to increase the score of less popular items
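The third category above, post-hoc re-ranking, can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the method of Zhang et al. (2021) or of any MSP: the log-based penalty, the function name rerank and the toy play counts are all hypothetical choices made for the example.

```python
import math

def rerank(candidates, play_counts, penalty=0.1):
    """Re-rank (item, relevance) pairs, discounting each by its log popularity.

    penalty is a hypothetical tuning parameter; penalty=0 reproduces the
    original relevance ordering.
    """
    def adjusted(pair):
        item, relevance = pair
        return relevance - penalty * math.log1p(play_counts.get(item, 0))
    return [item for item, _ in sorted(candidates, key=adjusted, reverse=True)]

# Toy data: a very popular track and a lesser-known one with similar relevance.
candidates = [("hit", 0.90), ("niche", 0.85)]
play_counts = {"hit": 1_000_000, "niche": 100}
original = rerank(candidates, play_counts, penalty=0.0)
reranked = rerank(candidates, play_counts, penalty=0.05)
```

With no penalty the more popular track stays on top; with a modest penalty the lesser-known track overtakes it. How large such a penalty should be, and on what grounds, is exactly the kind of question that the criticisms of the bias concept discussed earlier bring into view.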

Future interdisciplinary research might investigate these and other methods intended to counter popularity bias in ways that take into account some of the criticisms of the bias concept discussed earlier.

It seems clear from the computer science literature then that a) research scientists are very much aware of popularity bias, and b) a great deal of research aims to counter it. What is not clear is what is happening in the recommendation systems used by MSPs. Are these services making use of these developments to shift attention away from the most popular items, at least for some users, and to encourage diversity and discovery?

To address this issue, it might be helpful to return to publications by MSP research scientists. In the paper already mentioned above, Mehrotra et al. (2018) compare five possible techniques involving different ways of dealing with the problems of relevance and fairness (see Section 3.6):

  1. Optimising for relevance (defined as when a recommendation “closely resembles user’s [sic] interest profile” (p.3)).

  2. Optimising for supplier fairness (where the content shown to users is spread well across the long tail of popularity).

  3. An “interpolated recommendation policy”, which jointly considers relevance and fairness, and allows the interplay between them to be assessed, by analysing the impact on satisfaction of different weighting of each.

  4. A probabilistic policy, whereby a weighting factor decides whether to recommend content based on fairness or relevance - i.e. introducing some randomness.

  5. A guaranteed relevance policy, which “guarantees a certain minimum amount of relevance, following which the model has the freedom to show content based on any criterion, including fairness”.
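The last three policies in the list can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the implementation of Mehrotra et al. (2018): each candidate set of tracks is assumed to carry a precomputed relevance score and fairness score between 0 and 1, and the names beta, min_relevance and choose, as well as the toy scores, are hypothetical.

```python
import random

def interpolated(relevance, fairness, beta=0.5):
    """Policy 3: weighted blend of relevance and fairness scores."""
    return beta * relevance + (1 - beta) * fairness

def probabilistic(relevance, fairness, beta=0.5, rng=random):
    """Policy 4: use the relevance score with probability beta, else fairness."""
    return relevance if rng.random() < beta else fairness

def guaranteed_relevance(relevance, fairness, min_relevance=0.6):
    """Policy 5: a candidate must clear a relevance floor before fairness counts."""
    return fairness if relevance >= min_relevance else float("-inf")

def choose(candidates, policy, **kwargs):
    """Return the name of the candidate set with the highest policy score."""
    best = max(candidates, key=lambda c: policy(c["relevance"], c["fairness"], **kwargs))
    return best["name"]

# Toy candidate sets with illustrative scores.
candidates = [
    {"name": "mostly_superstars", "relevance": 0.9, "fairness": 0.2},
    {"name": "mixed", "relevance": 0.7, "fairness": 0.6},
    {"name": "mostly_long_tail", "relevance": 0.3, "fairness": 0.9},
]
by_relevance = choose(candidates, interpolated, beta=1.0)   # same as policy 1
by_fairness = choose(candidates, interpolated, beta=0.0)    # same as policy 2
blended = choose(candidates, interpolated, beta=0.5)
floor = choose(candidates, guaranteed_relevance, min_relevance=0.6)
```

Setting beta to 1 or 0 in the interpolated policy recovers the first two policies, which is one way of seeing the five techniques as points along a single trade-off.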

Crucially, they also discussed “adaptive policies”, which sought to take into account the varying extent among users of “sensitivity towards fair content”, with some users “only interested in a particular group of suppliers” and others “more flexible around the distribution of suppliers”. In other words, some users seem to exhibit behaviour that shows tolerance of a wider range of artists.[footnote 33] To address this, they came up with a “user fairness affinity” measure and an “affinity aware recommender”, and then assessed their impact on user satisfaction (as explained in Section 4.1, assessed statistically in terms of the number of tracks listened to in a recommended set). They found that the fourth policy, the probabilistic one, performed best in terms of mean fairness, suggesting it should be used in cases where “designers do not want to severely impact fairness while keeping relevance higher” (p. 9). Examining costs and benefits across these trade-offs more holistically, they found that “[a]dapting to user’s [sic] affinity towards fair content gives us the best trade-off between fairness and relevance without negatively impacting satisfaction” (p. 10).

The UK Music Creators’ Earnings research (Hesmondhalgh et al. 2021: 199) drew on comprehensive UK streaming data from a sample month (October) for each of the years from 2014 to 2020 and discerned a slight shift in popularity down the long tail (i.e. away from the most popular tracks and artists) since 2018. They also noted (p. 42) evidence from data firm BuzzAngle, reported in the music business press (Ingham 2018) of such a shift. Hesmondhalgh et al. speculated that this may be a result of key MSPs seeking diversification of content for at least some of their audiences. The motives for such a shift might include i) recognition on the part of some or all MSPs that users oriented towards diversity and novelty (perhaps identified via MSPs’ own market research) will engage more (thereby increasing conversion from “free” tiers to subscription, or retention of users across free and subscription tiers) when offered recommendations that enhanced these aspects of their experience; ii) a desire to respond to public concerns about rewards to musicians on MSPs by diversifying the number of creators who achieve certain levels of play.[footnote 34]

If such a shift is happening, this would suggest a move in the opposite direction from that claimed by those who believe that MSP recommendation algorithms (as opposed to curated playlists) favour entrenched or successful artists. It is worth noting in this context that in the music industries of the 1950s to the 2000s, a huge amount of attention was devoted to those records that were placed in chart listings, or popularity listings, compared with those that fell just outside those charts or “hit parades”. Charts and their contents were the subject of enormous coverage in the media and many music fans were exposed to them on radio and television and in newspapers and magazines. Many radio stations would tend to play the most popular songs, at least when they were gaining in popularity. The popularity bias of that system was immense. We could find no research that undertook systematic historical comparison of these issues, even though it seems important in evaluating MSPs versus older systems.

4.3 Relevant contributions from critical research: vernaculars, imaginaries, folk theories and optimization

A strand of critical research on recommendation (e.g. Rieder et al. 2018), and related research that seeks greater “platform observability” (Rieder and Hofmann 2020), suggest potential routes towards further understanding of popularity “bias” and cognate matters such as effects of recommendation on superstar effects and the long tail (see above). These modes of critical research might also contribute to studies of other forms of bias, fairness and injustice such as those in 4.4 below. Such research combines consideration of public statements by platform companies about their intent with analysis of patterns in recommendations made by particular platform features. For example, Matamoros-Fernández et al. (2021) investigated the results of YouTube’s “Up Next” feature over time and across a number of issues (“coronavirus”, “feminism”, and “beauty”). Their research showed, in line with the company’s commitment to diversification in this feature, significant variation in recommended videos over time. They also found evidence of a move towards prioritizing certain channels, notably those which are considered “authoritative”, and a reliance by YouTube on popularity and recency in ranking. Yet recommendations for “beauty” and “feminism” often favoured older viral videos with highly problematic content. Popularity alone, Matamoros-Fernández et al. (2021) concluded, was not enough to explain why certain problematic videos and less popular channels kept appearing. They also argued that research needed to address “platform vernaculars” and “issue vernaculars”, understandings of how the platform works. These are used by content creators and distributors to increase the chances of their content being amplified by the recommendation system.

This latter emphasis on “platform vernaculars” chimes with increasing policy interest in “algorithmic awareness” on the part of ordinary users of platforms (Algoaware.eu 2018). It also echoes a strand of critical internet studies research which investigates understandings, experiences and perceptions of algorithms on the part of content creators/distributors and ordinary users. Among the most notable contributions are Eslami et al. (2016)’s research on “folk theories” of social media feeds, Bucher (2016)’s phenomenology-inspired work on users’ “algorithmic imaginaries” concerning Facebook, and Bishop’s research on “algorithmic gossip” among beauty vloggers (Bishop 2019).[footnote 35] A certain amount of research in related vein has been conducted on music recommendation (e.g. Morgan 2020). Siles et al. (2020)’s paper, based on interviews and focus groups with Spotify users in Costa Rica, is perhaps the most explicit effort to connect with the concept of “folk theories” concerning algorithms.

A related issue explored by critical internet studies is optimization. While there is a vast technical literature on optimization, critical researchers claim that optimization is far more than a technical matter, because it carries social and political implications regarding corporate power (McKelvey 2018, Petre et al. 2019). There are potential cultural implications too. Building on such work in the context of music, Morris (2020) argues that the need to stand out from internet abundance is pushing “musicians, labels and other stakeholders to think more like data curators” (p.3). Morris et al. (2021) develop the concept of cultural optimization to draw attention to “the process of measuring, engineering, altering, and designing elements (e.g. interfaces, metadata, features, functions etc. of digital cultural goods) … to make them more searchable, discoverable, usable, and valuable in both economic and cultural senses” (162-3). Research along these lines is potentially valuable for understanding MRS because it seeks to understand ways in which MRS might have problematic consequences for musicians, other content creators and music industry professionals.

Morris et al. (2021: 165) argue that “algorithmic playlists represent the most prominent way that music is made ‘contingent’ and ‘platform-dependent’”. Playlists achieve this by “permitting the recycling of older tracks, aligning them with new releases or alternative compilations and extending the shelf-life of the digital music commodity”. In similar vein, Eriksson (2020) argues that playlists serve to contain uncertainty and contingency by making music amenable to mathematical calculation. Some of the more normative, historical and conceptual claims surrounding optimization and other related issues are not necessarily amenable to testing and verification in the ways preferred by researchers trained in positivist methods. Nevertheless, these theorisations might inform research of many different kinds, regarding the playlist strategies of MSPs, and the role of MRS in creating playlists.

This area remains surrounded by speculation, however. There has been much public debate about how musicians might be reshaping musical content in order to respond to the requirements of platforms (see Hesmondhalgh 2021 for discussion), by means such as shorter songs, cramming attention-grabbing devices into the first thirty seconds (because it is well known that if users skip before the 30-second mark, a stream does not count), and packing more tracks on to albums.[footnote 36] Some of the issues and claims are outlined by Hesmondhalgh (2021) and Morris et al. (2021); the first piece offers a sceptical assessment of debates about the putative aesthetic effects ascribed to such tactics - and indeed whether they really exist in the forms that are sometimes claimed. These matters might be explored in future research.

4.4 Bias and injustice by demographic characteristic or form of identity

4.4.1 Demographics, identity and data

A second category of RS-related bias that has received some attention from computer scientists concerns the way that certain demographic characteristics affect the recommendations that certain users receive, and relatedly how artists from different demographic groups are recommended. Many demographic characteristics overlap with key ways in which people are assigned and construct individual and collective identities, such as race, ethnicity, gender, class, sexuality, disability, age, nationality and so on. These forms of identity are all in different ways potentially related to inequality, privilege and justice, and so there are significant implications in MRS for the reinforcement or exacerbation of the social advantages of some artists at the expense of others. There are ramifications too for how some people (whether understood as consumers, users or audiences) in certain groups might more easily access music that might enrich their lives than those in other groups.

By some way, gender appears to be the demographic category (or form of identity) most often addressed in treatments of bias and injustice in both computer science and critical internet research on MRS. The same appears to be true of studies of cultural recommenders beyond music.[footnote 37]

This greater degree of attention to gender appears to be a result of the way in which relevant data is captured in the registration processes for most digital platforms, including MSPs. Most registration processes involve users identifying as either male or female.[footnote 38] As we shall see, there is some work that examines biases related to nationality, such as “home bias” - the degree to which users in a particular country prefer music created in that country (Way et al. 2020: 705).[footnote 39] There are occasional references to age (most registration processes involve entering a date of birth and so such data is available to those with access to streaming data). Schedl, Hauger et al. (2015), for example, report as part of a larger study that younger people were easier to satisfy by recommending the most popular items whereas middle-aged and older listeners benefited from algorithms involving some element of CF (i.e. based on “ratings” rather than popularity). We found no substantial studies of biases concerning race, ethnicity and social class per se in relation to MRS, perhaps because of a lack of potentially sensitive information about racial and ethnic self-identification in the main existing datasets, and the complexity of assigning people to social classes. There is little work on sexuality.[footnote 40] These seem to be very unfortunate gaps in existing research.

4.4.2 Gender

Critical research offers the chance to place musical algorithmic gender bias in the context of histories of gender in music, how gender is constructed on digital platforms, and the construction and “performance” of gender in society and culture. Numerous writers have cast light on how, across classical and popular forms, music in general has been a male-dominated cultural practice. As a result of social gendering practices, in child-rearing, education and many other domains, men have dominated key musical professions (for example as composers, conductors, instrumentalists, DJs and music technologists) and many musical spaces are dominated by men (Bayton 1998, Leonard 2007). Music’s often intensely sexual and emotional nature makes it a contested site for relations between gender and sexuality (Reynolds and Press 1995). Genres strongly associated with women and girls, such as pop, have often been disparaged, while male-oriented genres such as rock have been celebrated (Hesmondhalgh 2013: 57-83). Misogyny and sexism have thrived in some musical cultures, but music has also served as a site where such behaviour and attitudes have been questioned and resisted (Brooks 2021).

What do we know about how MSPs and their MRS might be affecting these and other dynamics? A 2014 blog post by an influential MIR practitioner, Paul Lamere, founder of the Echo Nest, provided some early insights. Lamere analysed the most popular artists for 200,000 randomly selected listeners self-identifying as male or female and constructed top 40, top 200 and top 1000 charts (by number of listeners for each artist) for both genders combined, and for each gender. Across all these levels (top 40, 200, 1000), about 30% of artists on any gender-specific chart did not appear on the corresponding chart for the opposite gender. Lamere suggested that this information could be used to improve listener experience by recommending names that appeared only on the list of most popular artists for people of the same gender as the listener. But he also suggested that, even when a listener’s gender was unknown, user satisfaction could be improved by simply avoiding gender-polarizing artists in recommendations: artists that are popular for one gender, but not at all for the other (e.g. Avicii was much more popular with males than females, One Direction more popular with females). However, Lamere clarified that this was a way of dealing with the cold start problem. Once a new user had listened to “a dozen or so songs”, a more personalised set of recommendations could be produced.
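The kind of chart comparison Lamere describes can be sketched in a few lines of code. The following is a minimal illustration only: the artist names and chart contents are placeholders rather than Lamere's data, and the function names are our own hypothetical choices. It computes the share of one chart that is absent from the other, and identifies the "gender-polarizing" artists that appear on only one chart.

```python
def chart_only_share(chart_a, chart_b):
    """Share of artists on chart_a that do not appear on chart_b."""
    only_a = set(chart_a) - set(chart_b)
    return len(only_a) / len(chart_a)


def polarizing_artists(chart_a, chart_b):
    """Artists that appear on exactly one of the two charts."""
    return set(chart_a) ^ set(chart_b)


# Hypothetical top-10 charts (placeholder names, not real data).
male_top = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
female_top = ["A", "B", "C", "D", "E", "F", "G", "X", "Y", "Z"]

share = chart_only_share(male_top, female_top)
polarizing = polarizing_artists(male_top, female_top)

# Cold-start candidates: keep only artists that appear on both charts.
neutral = [artist for artist in male_top if artist in set(female_top)]
```

In this toy example, 3 of the 10 artists on each chart are missing from the other, which mirrors the roughly 30% figure reported above. The `neutral` list corresponds to the suggestion of avoiding polarizing artists when a listener's gender is unknown.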

Some critical researchers have found that recommendations favour male artists. Investigating which artists were suggested as “related artists” to the white, male, Irish folk-rock artist Damien Rice, critical media researcher Ann Werner (2020) found that predominantly white male artists were offered to users. Journalist Liz Pelly (2018) listened to a selection of Spotify’s most popular playlists for a month and took weekly logs of the playlist data, identifying from artist pages which gender they seemed to identify with.[footnote 41] She found these playlists to be “staggeringly male-dominated”. Pelly recognised that Spotify was acting within a wider music industry context, but concluded that “the sexist music industry status quo is upheld widely by Spotify, even as the platform exploits the woke optics of playlists like Feminist Friday”.

In the most thorough critical humanities research on music recommendation and gender that we have found, Eriksson and Johansson (2017) set up a “bot experiment” with 288 Spotify accounts to assess recommendations by gender. They found that out of the 485 artists and bands recommended to their bots, 386 (or 80 percent) were identified as male, and 73 (or 15 percent) were identified as female, while 24 (or 5 percent) were tagged as mixed duos or groups. Of the bots, 144 were registered as male and 144 as female. While Eriksson and Johansson found that male and female users were given some artist recommendations not corresponding to their gender, the proportions of male artists, female artists and mixed bands recommended were almost identical for the “male” and “female” bots. It was not possible for the authors to compare these proportions with the gender identification of artists across the vast Spotify repertoire as a whole. 80% male is clearly a high figure. Like Pelly, Eriksson and Johansson recognised that Spotify’s recommendations were just one factor in this male skewing, and that other factors, including the actions of music industry intermediaries, critics and so on, were also at play, but nevertheless noted that they found the mismatch between recommendation of male and female artists “remarkable” (176).
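The proportions quoted above can be checked with simple arithmetic. The short sketch below uses only the counts given in this paragraph; the variable names are our own. Note that, as quoted here, the three category counts sum to 483 rather than 485, and the text does not say how the remaining two artists were classified.

```python
# Counts as quoted in the paragraph above (Eriksson and Johansson 2017).
counts = {"male": 386, "female": 73, "mixed": 24}
total_reported = 485

# Percentage share of each category, to one decimal place.
shares = {
    label: round(100 * n / total_reported, 1)
    for label, n in counts.items()
}

# Artists in the reported total not covered by the three categories.
unaccounted = total_reported - sum(counts.values())
```

The shares round to the 80, 15 and 5 percent figures reported in the text.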

Computer scientists have also explored the impacts of algorithmic recommendation systems on music in relation to gender. Schedl, Hauger et al. (2015) investigated how a range of combinations of different music recommendation algorithms recommended artists for users of different genders. Measuring performance in terms of the computer science categories of precision, recall and satisfaction, they found little difference in the ability of the different algorithms to satisfy men and women, other than a slightly greater satisfaction for women produced by popularity-based algorithms. Shakespeare et al. (2020), in an exploratory study on a somewhat limited Last.fm data set, found that all the algorithmic systems they used on the data resulted in “bias disparity”, i.e. an amplification by the algorithms of biases already existing in the data regarding preference for male and female artists, with male artists being favoured.
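The idea of amplification can be made concrete with a small sketch. One simple way to express it, which is our own simplification and not necessarily the exact metric used by Shakespeare et al. (2020), is the relative change in a group's share between the listening data fed into an algorithm and the recommendations it produces. The data and function names below are hypothetical.

```python
def share(items, group):
    """Fraction of items whose label equals group."""
    return sum(1 for label in items if label == group) / len(items)


def relative_amplification(input_items, recommended_items, group):
    """Relative change in a group's share from input data to recommendations.

    Positive values mean the recommendations over-represent the group
    compared with the input data; zero means no amplification.
    """
    s_in = share(input_items, group)
    s_out = share(recommended_items, group)
    return (s_out - s_in) / s_in


# Hypothetical artist-gender labels for one user's history and recommendations.
history = ["male"] * 7 + ["female"] * 3
recommended = ["male"] * 9 + ["female"] * 1

amplification = relative_amplification(history, recommended, "male")
```

In this toy case the share of male artists rises from 70% in the listening history to 90% in the recommendations, a relative increase of about 29%.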

Studies based on larger data sets portray a more complex picture, however. A report prepared for the European Commission’s Joint Research Centre by Aguiar et al. (2021), on the effects of mainly curated Spotify playlists on the success of women on Spotify, found that the relatively low female share of successful songs at Spotify “mainly arose from the relatively low female share of songs on the platform as a whole rather than anti-female bias in playlist decisions”.[footnote 42] Building on an earlier draft of that study, a recent paper by Epps-Darling et al. (2020), a team including two Spotify research scientists, investigated a number of issues using extensive Spotify data. This is significant, because it means the study was based on observing what real-world listeners did in response to MRS, rather than building simulations out of publicly-available datasets. One finding was that listeners generally stream fewer female or mixed-gender creator groups than male artists - though this varied considerably by genre. Perhaps most intriguingly in the present context, they also found that recommendation-based streaming contains a slightly higher proportion of female creators than does “organic” listening, defined as “tracks that are not recommended by editors or algorithms”. That proportion, however, remains low in recommendation-driven listening. Their definition of recommendation-based streaming included both algorithmic and humanly curated recommendation, which means that the influence of automated MRS per se could not be discerned separately. More research using actual usage data from MSPs would be valuable in investigating these issues further, including where possible separating the effects of automated MRS from those of humanly curated recommendation.

4.4.3 Location and nationality

Computer scientists working on MRS have explored biases related to other demographic categories. In the study just cited above, Schedl, Hauger et al. (2015), for example, although they do not use the term bias (which became more fashionable shortly after their study was published), investigated how a range of combinations of different music recommendation algorithms recommended artists for users of different age groups and countries, as well as genders.

Compared with gender, however, there has generally been less focus on the issue of biases in location and nationality (of artists and users). Yet as Tofalvy and Koltai (2021) point out, location is a decisive economic and cultural factor in musical production and consumption. For a long time, various researchers on music history have pointed to core-periphery dynamics, with some countries and cities much more connected to international music industry networks than others. Multinational corporations owned in the global north dominated the distribution of the world’s most prominent and financially successful recorded music from the 1920s onwards (Negus 1992). The USA has long produced a high proportion of global hits. Laing (2008) showed that while the USA is the biggest music exporter, it also consumes very little non-domestically produced music. English has long been seen as the language of global popular music, and this has helped the UK punch above its weight in terms of global musical influence and success. Concerns about such cultural and linguistic inequalities have often been expressed in terms of the notion of “cultural imperialism” in the realm of music (Malm and Wallis 1992). But some researchers have claimed that in the age of globalisation the picture is more complex than one of outright US musical hegemony. Ferreira and Waldfogel (2013) found that the share of recorded music consumption worldwide originating from domestic artists increased from less than 50% in the 1980s to almost 70% in 2007. Some smaller, non-Anglophone countries such as Sweden have exerted considerable international musical influence (Baym 2011). There has been variation in global patterns of musical trade according to genre.

There were some early quantitative studies of how the digitalization of music might be affecting these issues.[footnote 43] The most notable, Gomez-Herrera et al. (2014), found that the shift towards local content had been reversed in the period of digitalization since 2006. There seems to have been much less research on how digital platforms and specifically their recommendation systems are affecting international distributions of attention and success. Some evidence is beginning to emerge, however. Critical researchers Tofalvy and Koltai (2021) collected information about the “related artists” page of 23 Hungarian metal bands and weighted their connections to other bands in the related artists feature in order to assess advantages or otherwise conferred by being signed to Hungarian or international record labels and by having English lyrics. Tofalvy and Koltai also examined the country of origin of the non-Hungarian bands most connected to Hungarian source bands by their outward and reciprocal ties. They found that Hungarian source bands signed to Hungarian record labels connected to 1.9 non-Hungarian bands on average, but source bands signed to international record labels connected to 17.3 non-Hungarian bands: a huge difference. Hungarian source bands whose lyrics were Hungarian had lower average connections compared to those whose lyrics were in English (10 to 12.9). The authors concluded that “the streaming platform replicates and reproduces local industry patterns, as the recommendation system represents and reproduces the bands’ geographical (dis)advantage” (20). As with some of the studies of gender discussed above, the authors acknowledged that other factors are at work besides algorithms - though these other factors are not discussed in detail.
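The comparison of average connections described above can be illustrated with a simple calculation over a "related artists" graph. This is a hedged sketch only: the authors weighted the connections between bands, whereas the version below simply counts unweighted ties, and all band names, countries and groupings are invented for illustration.

```python
from statistics import mean


def mean_foreign_ties(related, home_country, country_of, group_of):
    """Average number of related artists from outside home_country,
    computed separately for each group of source bands."""
    per_group = {}
    for band, neighbours in related.items():
        foreign = sum(1 for n in neighbours if country_of[n] != home_country)
        per_group.setdefault(group_of[band], []).append(foreign)
    return {group: mean(values) for group, values in per_group.items()}


# Hypothetical "related artists" lists for four source bands.
related = {
    "band1": ["x", "y", "z"],
    "band2": ["x"],
    "band3": ["h1", "h2"],
    "band4": ["h1", "x"],
}
country_of = {"x": "SE", "y": "US", "z": "DE", "h1": "HU", "h2": "HU"}
group_of = {
    "band1": "international label",
    "band2": "international label",
    "band3": "domestic label",
    "band4": "domestic label",
}

averages = mean_foreign_ties(related, "HU", country_of, group_of)
```

The same function could be reused with a grouping by lyric language rather than label type, which is the second comparison reported in the paragraph above.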

The most notable computer science study of nationality and location bias appears to be that of a Spotify team, Way et al. (2020), who found on the basis of Spotify streaming logs from 2014 to 2019, that “preferences for local content have increased through the streaming era, and that trend is consistent across different genres, listener age groups, and registration cohorts” (706). Strikingly, this represents a reversal of the trend noted by Gomez-Herrera et al. (2014). However, while Way et al. (2020) sought to account for a number of sources of potential complexity (including the evidence that later adopters of MSPs apparently are more inclined to local content, compared with early adopters) they also reflected on the very considerable challenges facing any attempt to analyse musical change. These include the way in which music itself may have changed over any significant period of analysis (such as the five-year period they studied).
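A very simple indicator of preference for local content is the share of each country's streams that go to artists from the same country. The sketch below assumes a hypothetical log of (listener country, artist country) pairs and is not the method of Way et al. (2020), who sought to account for several sources of complexity that a raw share like this ignores (for example, differences between registration cohorts).

```python
from collections import defaultdict


def local_share_by_country(streams):
    """Fraction of each listener country's streams that go to
    artists from that same country."""
    totals = defaultdict(int)
    local = defaultdict(int)
    for listener_country, artist_country in streams:
        totals[listener_country] += 1
        if listener_country == artist_country:
            local[listener_country] += 1
    return {country: local[country] / totals[country] for country in totals}


# Hypothetical stream log: (listener country, artist country).
streams = [
    ("GB", "GB"), ("GB", "US"), ("GB", "US"), ("GB", "GB"),
    ("SE", "SE"), ("SE", "US"), ("SE", "SE"), ("SE", "SE"),
]

local_shares = local_share_by_country(streams)
```

Tracking such a share year by year would show whether it rises or falls over time, although, as the authors caution, changes in the music itself make such trends difficult to interpret.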

Other significant computer science studies include the work of Bauer and Schedl (2019), which built on their earlier work on mainstreaminess, mentioned in section 4.2. They studied “mainstream-oriented listeners”, in order to differentiate the relationship of such listeners to the global mainstream from their relationship to national mainstreams. This was a way of trying to understand how national cultural contexts might influence users’ tastes, preferences and practices. Their experiments suggested that MRS might implement “demographic filtering” (i.e. taking account of the different musical tastes in different countries) before applying collaborative filtering, in order to improve recommendation. From a critical perspective, such a strategy might be deemed to reinforce a country’s existing levels of mainstreaminess, rather than seeking to achieve greater diversity by increasing users’ appetites for novelty and serendipity. But the recognition of different degrees of openness to local and global output on the part of users from different nations, apparent in the Bauer and Schedl (2019) study, suggests new ways that nuance might be added to debates about cultural imperialism and globalisation.

In the UK context, although the UK has almost certainly outperformed many other countries in terms of international music success, relative to its population and GDP, it still matters that UK repertoire continues to thrive within the UK. If music recommendation moves UK users towards non-UK repertoire (putting aside for now questions about overall diversity of national repertoire that UK consumers expose themselves to), then they might miss important ways in which music represents the experiences of people in the country. Related issues, though expressed in terms of business concerns, were raised during the DCMS Inquiry by the BPI, the trade association representing many UK record companies. They claimed that MRS could lead to UK artists losing out to those of bigger-population countries, because the “baking in” of the preferences of the population of larger countries for repertoire from their own compatriots - presumably referring here to the USA but perhaps also looking ahead to future Chinese global presence? - might make it harder for UK artists to be recommended in MSPs. Investigating such claims effectively would once again depend on access to international data.

5. The question of diversity: positive and negative impacts of MRS

5.1 Diversity as a topic in public debates about streaming

Some of the public submissions to the DCMS Select Committee Inquiry into The Economics of Music Streaming claimed negative impacts on the diversity of music experienced by users, and therefore on the diversity of music, musicians and businesses that gain attention and financial reward. The Select Committee report raised concerns about “homogenisation of taste” but did not expand on this. It may be that the Committee’s wording drew on a sense, sometimes expressed in public debate, that MSPs, via their recommender systems, somehow encourage users to listen to music that they are already familiar with, at the expense of expanding and enriching their musical experiences (Kang and Lam 2021; Koppe 2021; McDonald 2019). These might be seen as musical equivalents of concepts such as “echo chambers”, which refers to the idea that internet users often expose themselves to values and beliefs that they already agree with, or “filter bubbles”, which expresses different though related concerns about individualisation and fragmentation in society, though with greater emphasis on the role of the platforms and their algorithms.[footnote 44] While they were developed in the realm of concerns about information, especially news and social/political opinion, these concepts have been applied in some cases to computer science studies of music, where the concern is that shared experiences that might serve to build communal feeling would be diminished by digitalisation’s personalising effects (Allen et al. 2017; Forsblom et al. 2012; Taramigkou et al. 2013; Zhang et al. 2012). A huge literature has developed around these concepts, especially in the context of news and information (rather than culture, art and entertainment). A significant corpus of empirical research questions the assumptions behind these terms, i.e. that the internet and social media really do create such dynamics of fragmentation, individualisation and narcissism (Zuiderveen Borgesius et al. 2016).

In academic computer science, the concept that has been most widely used to investigate such issues is diversity, though the literature on diversity is less abundant than the literature on bias. Despite the lack of clear evidence supporting this perceived fragmentation, a number of studies have addressed whether recommender systems lower the diversity of consumed content, therefore fostering confinement and “bubble” dynamics, and whether those stem from users’ pre-existing preferences or from their online activity (Villermet et al. 2021; Roth et al. 2020; Chang et al. 2021). Related to the above discussion of location bias, MSPs have also reacted to public concern about diversity and cultural hegemony of US-based artists, claiming to improve their recommendations to highlight a greater breadth of music content (Cirisano 2019; Spotify Newsroom 2018a; 2018b; 2018c; 2018d). Recent research commissioned by Spotify (Anderson et al. 2020) seems to confirm that algorithmic recommender systems produce less diverse music consumption, and when users drift away from such recommendations, their consumption diversity increases. However, a recent study by Bello and Garcia (2021) concludes that top music charts in streaming services have been increasing in diversity since 2017 across multiple countries, which some may interpret as meaning that calls for greater protection of national culture industries have been overstated in recent years. These two perspectives are not in contradiction if charts are considered the result of both machine and human processes. Rather, they support the idea that diversity increases with human intervention.

5.2 Types of diversity

Critical media research has long paid significant conceptual attention to the issue of diversity. Distinctions have been made between diversity of source, content, and exposure diversity:

  • source diversity measures the number and variety of culture-producing actors in a media environment and often takes into consideration ownership (e.g. corporate or shareholder-controlled versus family and co-operatively owned) and workforce structures (Bruns 2019; Pariser 2011; Napoli 1999; Nasta et al. 2016). The degree to which producers might be considered diverse by measures such as age, race, education, gender, nationality, religion, sexuality, physical abilities, etc. (e.g. Haim et al. 2017; Picard & Wikström 2008) has also been a key concern

  • content diversity measures the availability of different types of media content, such as the ideas and narratives that are conveyed; the perspectives that are used; the characters that are portrayed; or the artistic styles that are apparent (Napoli 1999; Roessler 2007; Wikström et al. 2018)

  • exposure diversity measures how individual users select and are exposed to source or content diversity via a set of media outlets over a period of time (e.g. McQuail 1992; Helberger 2011)

  • finally, aggregate diversity measures source or content diversity on an aggregate level; either across all (or a number of) users or across all (or a number of) creators (e.g. Karakaya and Ayetekin 2018)

All these conceptualisations could be applied to music. In computer science research on music streaming too, the concept of diversity has been used in various ways. Porcaro et al. (2021) provide a categorisation, and an overview of computer science research on the topic. As well as distinguishing diversity among items (e.g. tracks and artists) from diversity in users, they also discuss the behavioural diversity that arises from interactions of items and users. In Bello and Garcia (2021), diversity is an aggregate concept that includes multiple elements such as acoustic disparity, variety of songs, and label balance in top streaming charts. Attempts to enhance recommendation diversity by covering wider musical areas are somewhat diminished by explicitly defining it as genre diversity (Vargas et al. 2014), a potentially limited solution that does not adequately address the complexity of a globalised music market. Genres may be understood in different ways according to local history or customs, and regional variations to music genres are the norm rather than the exception. The use of particular instruments and performance practices may also blur definitions of genre, and national repertoire may share a number of common traits with other adjacent regions. A specific artist may also encompass multiple music genres at once. This is particularly true in the case of non-western music and Seaver has argued that the classifications used for diverse cultural sounds in recommender systems (Cureño 2021) stem from a western vision of the designer and listener as placed in the global centre of musical knowledge (Seaver 2015), drawing on a kind of “audio tourism” that is common in other areas of the music industry and retail markets (Kassabian 2004).
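The difference between exposure diversity (measured for each user) and aggregate diversity (measured across all users) can be illustrated with a small sketch. It uses Shannon entropy over genre labels as the diversity measure, which is one common choice among many and an assumption on our part rather than the measure used in any of the studies cited. The sketch also treats genre labels as given, which is exactly the simplification that the paragraph above cautions against. All data and names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the distribution of labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def exposure_diversity(user_streams):
    """Diversity of the genres each individual user listened to."""
    return {user: shannon_entropy(genres) for user, genres in user_streams.items()}


def aggregate_diversity(user_streams):
    """Diversity of genres across the pooled listening of all users."""
    pooled = [g for genres in user_streams.values() for g in genres]
    return shannon_entropy(pooled)


# Hypothetical listening histories, one genre label per stream.
user_streams = {
    "u1": ["rock", "rock", "rock", "rock"],
    "u2": ["pop", "pop", "pop", "pop"],
}

per_user = exposure_diversity(user_streams)
overall = aggregate_diversity(user_streams)
```

In this toy case each user's exposure diversity is zero, because each listens to a single genre, while aggregate diversity is one bit, because the two genres are equally represented across the user base. A system can therefore look diverse in aggregate while each user's experience remains narrow, which is why the two measures need to be distinguished.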

The limitations of the datasets used in some of these experiments concerning diversity put into question their applicability in a field as genre-dynamic as music. For instance, in the groundbreaking study by Anderson et al. (2020) mentioned above, similarity between songs is inferred from their repeated appearance in user-generated playlists. Essential data used by MRS such as users’ ratings of music also varies across datasets (Wishwanath and Ahangama 2019), for instance when rating scales are based on two, four or more variables. Earlier scholarship (Tan et al. 2011) proposed the introduction of more complex knowledge graphs about music, indicating the difficulty of designing recommender systems based on high-accuracy pairwise relationships between items. When data used to establish these taste profiles stems from social sites like YouTube, relationships between songs and taste can be difficult to establish, as the same song can be tagged by users in seemingly contradictory categories, or even tagged or explicitly liked just to bookmark it (Tan et al. 2011; Freeman et al. 2022).

5.3 Diversity and user data

In the last decade, as already indicated in Section 3, there has been greater use in music recommendation of content data such as social tagging (Barragáns-Martínez et al. 2010) and social media data (Barragáns-Martínez et al. 2011; Sánchez Moreno et al. 2020), and of contextual data from other internet-enabled devices (Lozano Murciego et al. 2021). The aim is to enrich the quality and diversity of MRS and to create more complex knowledge graphs for use in recommendation systems. Putting aside for now the questions of privacy and ethics that arise from these latter approaches, user-generated data may lead to problematic categorisations, some of which may serve to homogenise diverse music genres (Cureño 2021). It might reflect users’ reproduction of certain keywords for their own commercial benefit (Bishop 2021). Users may create their own folk classifications according to their own language, musical background or access to music training, which in turn reflects global power structures (Besseny 2020). For example, there is a tendency for many western users to group many non-western musics under the “world music” or “pop” categories. Drawing on sources such as social media for such content data might therefore also amplify the success of exoticised forms of streamed music highlighted by music scholars (Kärjä 2018), especially when a misinterpretation of a song or genre becomes viral. In studies that focus more specifically on the end user as producer or reducer of content diversity (Garg et al. 2020), definitions of such diversity are again marred by unclear definitions of genre. Despite research addressing this topic directly (e.g. Seaver 2015), it is unclear which taxonomies of music are used by either social media users or even MSPs to design their recommender systems. A significant attempt to organise Spotify genre taxonomies is currently under way at Spotify itself (McDonald n.d.), but this classification effort does not explain why such music categories are organised in the way they are. As mentioned in section 4, the idea of diversity has also been invoked in studies that address the unequal performance of recommender systems and their algorithmic tools for different groups of people (Melchiorre et al. 2020; 2021), creating forms of bias that affect specific users only.

A final point on diversity: it is worth mentioning here approaches that address algorithmic tools as part of the social culture of digital spaces (Seaver 2017), including those that build “folk knowledge” about recommender systems (see above, plus DeVito et al. 2018) with information from a variety of sources, not least from previous understandings and anxieties about media technologies in society (Hesmondhalgh 2021). In doing so, people may anticipate the effects of algorithms in ways that modify their use of MSPs (Freeman et al. 2022; Haider and Sundin 2021), as well as train others to do so (Bishop 2020), further feeding this perceived diversity bias.

6. Transparency, opacity and oversight in relation to MRS

This final section briefly reviews the challenges that policymakers face in regulating MRS. Regulating algorithmic systems has presented several novel challenges for researchers and policymakers. The scale and complexity of these systems also present practical challenges for experts seeking to explain legibly how these systems function, in a manner that would be productive and usable for policymakers (Coyle and Weller 2020). Not only are platforms composed of systems that are technically sophisticated, they involve a wider array of governable objects that include people, institutions, and algorithms. Public policy focus is often trained on platform developers (e.g. Google, Facebook, Spotify) but platform governance necessarily includes end users, consumers and musicians in the context of MSPs. Taking this into account, scholars have called for cooperative modes of governance that seek to incorporate democratic participation from citizens and stakeholders in the governing process (e.g. Scholz and Schneider 2017). Governance approaches need also to account for the host of political factors that influence platform ecosystems (Gorwa 2019). Gillespie (2017) further distinguishes between governance of platforms (external policies and regulations that impose measures of governance on platform developers) and governance by platforms (internal guidelines and terms of use set by platforms to govern themselves and their users). As a result, platforms pose “extraordinary challenges” for crafting policies and regulatory mechanisms that are “sufficiently flexible, adaptable, and pragmatic” (Crawford and Lumby 2013, p. 279).

As Crawford and Lumby (2013) note, in the realm of culture the governance of digital platforms exposes the limitations of existing media policy for democratic oversight of convergent media systems comprised by platforms, such as protocols, networks, and algorithms. Concerns regarding opacity in the music industries long predate the streaming era. The major way in which musical production and consumption are governed and regulated is via copyright law and practice, but copyright systems are often impenetrable to the musicians who come to depend on them for their living (Kaye and Gray 2021; cf. Street and Phillips 2016). MSPs arguably add new layers of complexity, and this includes algorithmic recommendation systems, which are dense, complicated, and difficult to interrogate.

A central issue surrounding the governance of recommender systems in general concerns their opaque nature. Some critical algorithm studies use the term “black box” to capture how the functioning of algorithmic systems is difficult to comprehend, even by their developers (e.g. Pasquale 2015). Seaver (2017) is critical of this conception, and more generally of the issue of access that we discuss in this paper, offering his notion of algorithms as culture, as an alternative. While he is surely right to want to go beyond this metaphor as a way of understanding recommender systems, lack of public understanding of algorithms is undoubtedly a real issue. A major source of opacity in oversight of MSPs is limited access to data. Despite the vast caches of music streaming data being tracked and stored, few music streaming data sets are available for public use (Schedl, Knees et al., 2015). The majority of data from the largest streaming firms are commercially confidential or locked behind non-disclosure agreements. Unlike tracing revenue streams using financial data, a complicated and challenging enough task on its own, recommendation systems rely on a much wider range of factors and user inputs.

Reducing opacity of course implies improving transparency. But recent research has raised questions about what is meant by transparency and what improved transparency would look like in practice (Ananny and Crawford 2018). Rieder and Hofmann (2020) question the efficacy of methods to improve the transparency of algorithmic systems that are less legible than other governable entities or systems. Broad calls for increased transparency struggle to specify what new information would actually be disclosed, how that disclosure would take place, and what tangible impact it would have on meaningful oversight. Algorithmic systems are also highly fragmented and complex, posing additional challenges to making them more legible for scrutiny.

Yet some notion of platform and algorithmic opacity as a challenge for society, culture and democracy seems indispensable. Rieder and Hofmann’s proposed solution is to work towards making platforms more observable through concrete, actionable steps that can identify and calibrate regulatory approaches to better govern algorithmic black boxes. To make music platforms more observable for transparency regulation, Born et al. (2021) identify three key areas of music streaming transparency: i) consumer information and empowerment; ii) the nature and function of curation technologies; and iii) monitoring, auditing, and evaluating systems. Transparency regarding consumer information and empowerment concerns the moral and legal obligations of platforms to explain to users what personal data are being collected and how those data are used. Transparency regarding the nature and function of curation technologies attempts to circumvent the problems of interrogating algorithmic black boxes by indicating when and how algorithms are making decisions on MSPs, such as by flagging algorithmically curated playlists and offering additional user control inputs to allow individuals to change the way their music is recommended.

Transparency regarding monitoring, auditing, and evaluating MSPs, however, raises questions of oversight. Attempts to implement formal oversight have come from platform companies, government initiatives, and existing legal frameworks. Looking at terms of use and community guidelines of platforms can provide insights on governance by platforms (Gillespie, 2017). Going a step further, TikTok (2020) and Instagram (Mosseri, 2021) published press releases offering additional explanation about how their algorithmic recommender systems work. These internal sources provide a limited, curated look at how systems function, condensed for lay audiences, with many technical details removed. Facebook also launched an Oversight Board in 2020, a team of independent international experts to review content moderation decisions and advise on policy interpretation (Robertson, 2020).

Governments have also attempted to provide increased oversight. In the US, platform companies such as Google, Facebook, and Twitter have been called to testify before Congress. In the EU, the European Commission’s Directorate-General for Communications Networks, Content and Technology publishes annual reports to increase awareness and understanding of algorithmic decision-making systems.

The regulation of complex automated systems of recommendation looks set to be a major challenge for some years to come - including music recommendation. Only by promoting greater public understanding of automated recommendation can democracies undertake reasoned debate about the problems of algorithmic systems in this important domain.

7. Works cited

Abdollahpouri, H. (2019) ‘Popularity Bias in Ranking and Recommendation’, in Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. New York, NY, USA: Association for Computing Machinery (AIES ‘19), pp. 529-530. doi:10.1145/3306618.3314309.

Aguiar, L. and Waldfogel, J. (2018) Platforms, Promotion, and Product Discovery: Evidence from Spotify Playlists. Working Paper 24713. National Bureau of Economic Research. doi:10.3386/w24713.

Aguiar, L., Waldfogel, J. and Waldfogel, S. (2021) Playlisting Favorites: Measuring Platform Bias in the Music Industry. w29017. Cambridge, MA: National Bureau of Economic Research. doi:10.3386/w29017.

Airoldi, M., Beraldo, D. and Gandini, A. (2016) ‘Follow the algorithm: An exploratory investigation of music on YouTube’, Poetics, 57, pp. 1-13. doi:10.1016/j.poetic.2016.05.001.

Algoaware.eu (2018) Algo:Aware: Raising awareness on algorithms. The European Commission’s Directorate-General for Communications Networks, Content and Technology, pp. 1-133. Available at: https://actuary.eu/wp-content/uploads/2019/02/AlgoAware-State-of-the-Art-Report.pdf (Accessed: 24 March 2022).

Allen, D.P., Wheeler-Mackta, H.J. and Campo, J.R. (2017) The Effects of Music Recommendation Engines on the Filter Bubble Phenomenon. Available at: https://www.semanticscholar.org/paper/The-Effects-of-Music-Recommendation-Engines-on-the-Allen-Wheeler-Mackta/f608abc9e3f432fb47b9e3034e07ffae896de31a (Accessed: 24 March 2022).

Ananny, M. and Crawford, K. (2018) ‘Seeing without knowing: Limitations of the transparency ideal and its application to algorithmic accountability’, New Media & Society, 20(3), pp. 973-989. doi:10.1177/1461444816676645.

Anderson, A., Maystre, L., Mehrotra, R., Anderson, I. and Lalmas, M. (2020) ‘Algorithmic effects on the diversity of consumption on Spotify’, in International World Wide Web Conference. International World Wide Web Conference, Taipei, Taiwan. doi:10.1145/3366423.3380281.

Antal, D., Fletcher, A. and Ormosi, P.L. (2021) ‘Music Streaming: Is It a Level Playing Field?’, Competition Policy International, 23 February. Available at: https://www.competitionpolicyinternational.com/music-streaming-is-it-a-level-playing-field/ (Accessed: 24 March 2022).

Baeza-Yates, R. (2016) ‘Data and algorithmic bias in the web’, in Proceedings of the 8th ACM Conference on Web Science. New York, NY, USA: Association for Computing Machinery (WebSci ‘16). doi:10.1145/2908131.2908135.

Bagal, Y. (2019) ‘The Attention Economy is Changing Pop Music’, Medium, 10 April. Available at: https://medium.com/@yashbagal/the-attention-economy-is-changing-pop-music-5833d4f9721d (Accessed: 12 April 2022).

Barragáns-Martínez, A.B., Rey Lopez, M., Costa Montenegro, E., Mikic Fonte, F.A., Burguillo, J.C. and Peleteiro, A. (2010) ‘Exploiting Social Tagging in a Web 2.0 Recommender System’, IEEE Internet Computing, 14(6), pp. 23-30. doi:10.1109/MIC.2010.104.

Bauer, C. and Schedl, M. (2019) ‘Global and country-specific mainstreaminess measures: Definitions, analysis, and usage for improving personalized music recommendation systems’, PLOS ONE, 14(6). doi:10.1371/journal.pone.0217389.

Baym, N., Bergmann, R., Bhargava, R., Diaz, F., Gillespie, T., Hesmondhalgh, D., Maris, E. and Persaud, C.J. (2021) ‘Making Sense of Metrics in the Music Industries’, International Journal of Communication, 15(0). Available at: https://ijoc.org/index.php/ijoc/article/view/17635 (Accessed: 12 April 2022).

Baym, N.K. (2011) ‘The Swedish Model: Balancing Markets and Gifts in the Music Industry’, Popular Communication, 9(1), pp. 22-38. doi:10.1080/15405702.2011.536680.

Baym, N.K. (2018) Playing to the crowd: musicians, audiences, and the intimate work of connection. New York: New York University Press (Postmillennial pop).

Bayton, M. (1998) Frock Rock: Women Performing Popular Music. Oxford, New York: Oxford University Press.

Bello, P. and Garcia, D. (2021) ‘Cultural Divergence in popular music: the increasing diversity of music consumption on Spotify across countries’, Humanities and Social Sciences Communications, 8(1), pp. 1-8. doi:10.1057/s41599-021-00855-1.

Bellogín, A., Castells, P. and Cantador, I. (2017) ‘Statistical biases in Information Retrieval metrics for recommender systems’, Information Retrieval Journal, 20(6), pp. 606-634. doi:10.1007/s10791-017-9312-z.

Besseny, A. (2020) ‘Lost in spotify: folksonomy and wayfinding functions in spotify’s interface and companion apps’, Popular Communication, 18(1), pp. 1-17. doi:10.1080/15405702.2019.1701674.

Bishop, S. (2019) ‘Managing visibility on YouTube through algorithmic gossip’, New Media & Society, 21(11-12), pp. 2589-2606. doi:10.1177/1461444819854731.

Bishop, S. (2020) ‘Algorithmic Experts: Selling Algorithmic Lore on YouTube’, Social Media + Society, 6(1). doi:10.1177/2056305119897323.

Bishop, S. (2021) ‘Influencer Management Tools: Algorithmic Cultures, Brand Safety, and Bias’, Social Media + Society, 7(1). doi:10.1177/20563051211003066.

Bonini, T. and Gandini, A. (2019) ‘“First Week Is Editorial, Second Week Is Algorithmic”: Platform Gatekeepers and the Platformization of Music Curation’, Social Media + Society, 5(4). doi:10.1177/2056305119880006.

Bonini, T. and Gandini, A. (2020) ‘The Field as a Black Box: Ethnographic Research in the Age of Platforms’, Social Media + Society, 6(4). doi:10.1177/2056305120984477.

Born, G., Morris, J.W., Diaz, F. and Anderson, A. (2021) Artificial intelligence, music recommendation, and the curation of culture. Montreal: CIFAR. Available at: https://841.io/doc/BornMorrisDiazAnderson%20-%20Artificial%20Intelligence,%20Music%20Recommendation,%20and%20the%20Curation%20of%20Culture%20(2021)%20-%209.pdf.

Brooks, D. (2021) Liner notes for the revolution: the intellectual life of black feminist sound. Cambridge, Massachusetts: The Belknap Press of Harvard University Press.

Bruns, A. (2019) Are filter bubbles real? Medford, MA: Polity Press (Digital futures).

Bucher, T. (2016) ‘The algorithmic imaginary: exploring the ordinary affects of Facebook algorithms’, Information, Communication & Society, 20(1), pp. 30-44. doi:10.1080/1369118X.2016.1154086.

Cañamares, R. and Castells, P. (2014) ‘Exploring Social Network Effects on Popularity Biases in Recommender Systems’, in RSWeb@RecSys. The 6th Workshop on Recommender Systems and the Social Web, Foster City, CA, USA. Available at: http://ceur-ws.org/Vol-1271/Paper2.pdf.

Centre for Data Ethics and Innovation (2020) Review into bias in algorithmic decision-making. Centre for Data Ethics and Innovation.

Chang, J.-W., Chiou, C.-Y., Liao, J.-Y., Hung, Y.-K., Huang, C.-C., Lin, K.-C. and Pu, Y.-H. (2021) ‘Music recommender using deep embedding-based features and behavior-based reinforcement learning’, Multimedia Tools and Applications, 80(26-27), pp. 34037-34064. doi:10.1007/s11042-019-08356-9.

Cirisano, T. (2019) ‘Pandora’s Mobile App Gets a Hyper-Personalized Revamp’, Billboard, 1 October. Available at: https://www.billboard.com/pro/pandora-refreshes-mobile-app-personalized-for-you-station-controls/ (Accessed: 24 March 2022).

Cowan, M. (2017) ‘How Spotify chooses what makes it onto your Discover Weekly playlist’, Wired UK, 1 September. Available at: https://www.wired.co.uk/article/tastemakers-spotify-edward-newett (Accessed: 15 April 2022).

Coyle, D. and Weller, A. (2020) ‘“Explaining” machine learning reveals policy challenges’, Science, 368(6498), pp. 1433-1434. doi:10.1126/science.aba9647.

Crawford, K. and Lumby, C. (2013) ‘Networks of Governance: Users, Platforms, and the Challenges of Networked Media Regulation’, International Journal of Technology Policy and Law, 1(3), pp. 270-282.

Cremonesi, P., Koren, Y. and Turrin, R. (2010) ‘Performance of recommender algorithms on top-n recommendation tasks’, in Proceedings of the fourth ACM conference on Recommender systems. New York, NY, USA: Association for Computing Machinery (RecSys ‘10), pp. 39-46. doi:10.1145/1864708.1864721.

Cureño, E.L. (2021) Negotiating Artistic Representation in the Era of #Worldmusic: Trends, Challenges, Authenticity, and the Artist’s Perspective. Thesis. Arizona State University. Available at: https://www.proquest.com/openview/6f9df3895548b90740cd8c315a57c7f8/1?pq-origsite=gscholar&cbl=18750&diss=y (Accessed: 13 March 2022).

DeVito, M.A., Birnholtz, J., Hancock, J.T., French, M. and Liu, S. (2018) ‘How People Form Folk Theories of Social Media Feeds and What it Means for How We Study Self-Presentation’, in Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. New York, NY, USA: Association for Computing Machinery, pp. 1-12. Available at: https://doi.org/10.1145/3173574.3173694 (Accessed: 24 March 2022).

Dhaenens, F. and Burgess, J. (2019) ‘“Press play for pride”: The cultural logics of LGBTQ-themed playlists on Spotify’, New Media & Society, 21(6), pp. 1192-1211. doi:10.1177/1461444818808094.

Drott, E. (2018) ‘Why the Next Song Matters: Streaming, Recommendation, Scarcity’, Twentieth-Century Music, 15(3), pp. 325-357. doi:10.1017/S1478572218000245.

Ekstrand, M.D. and Kluver, D. (2020) ‘Exploring Author Gender in Book Rating and Recommendation’, arXiv:1808.07586 [cs] [Preprint]. Available at: http://arxiv.org/abs/1808.07586 (Accessed: 11 April 2022).

Epps-Darling, A., Cramer, H. and Bouyer, R.T. (2020) ‘Artist gender representation in music streaming’, in Proceedings of the 21st International Society for Music Information Retrieval Conference. Montreal, Canada, pp. 248-54.

Eriksson, M. (2020) ‘The editorial playlist as container technology: on Spotify and the logistical role of digital music packages’, Journal of Cultural Economy, 13(4), pp. 415-427. doi:10.1080/17530350.2019.1708780.

Eriksson, M., Fleischer, R., Johansson, A., Snickars, P. and Vonderau, P. (2019) Spotify Teardown: Inside the Black Box of Streaming Music. Cambridge, MA: MIT Press.

Eriksson, M. and Johansson, A. (2017) ‘Tracking Gendered Streams’, Culture unbound: Journal of current cultural research, 9(2), pp. 163-183. doi:10.25595/1449.

Eslami, M., Karahalios, K., Sandvig, C., Vaccaro, K., Rickman, A., Hamilton, K. and Kirlik, A. (2016) ‘First I “like” it, then I hide it: Folk Theories of Social Feeds’, in Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. CHI ‘16: CHI Conference on Human Factors in Computing Systems, San Jose, California, USA: ACM, pp. 2371-2382. doi:10.1145/2858036.2858494.

Eubanks, V. (2018) Automating inequality: how high-tech tools profile, police, and punish the poor. First Edition. New York, NY: St. Martin’s Press.

Fagan, K. (2018) Spotify’s 35-year-old co-founder wrote an emotional letter to investors promising to make users ‘empathize’ with each other and to ‘feel part of a greater whole’, Business Insider. Available at: https://www.businessinsider.com/spotify-ceo-daniel-ek-letter-to-investors-2018-2 (Accessed: 15 April 2022).

Ferreira, F. and Waldfogel, J. (2013) ‘Pop Internationalism: Has Half a Century of World Music Trade Displaced Local Culture?’, The Economic Journal, 123(569), pp. 634-664. doi:10.1111/ecoj.12003.

Fleder, D. and Hosanagar, K. (2009) ‘Blockbuster Culture’s Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity’, Management Science, 55(5), pp. 697-712. doi:10.1287/mnsc.1080.0974.

Fleder, D.M. and Hosanagar, K. (2007) Blockbuster Culture’s Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity. SSRN Scholarly Paper ID 955984. Rochester, NY: Social Science Research Network. doi:10.2139/ssrn.955984.

Forsblom, A., Nurmi, P., Åman, P. and Liikkanen, L. (2012) ‘Out of the bubble: serendipitous event recommendations at an urban music festival’, in Proceedings of the 2012 ACM international conference on Intelligent User Interfaces. New York, NY, USA: Association for Computing Machinery (IUI ‘12), pp. 253-256. doi:10.1145/2166966.2167011.

Freeman, S., Gibbs, M. and Nansen, B. (2022) ‘“Don’t mess with my algorithm”: Exploring the relationship between listeners and automated curation and recommendation on music streaming services’, First Monday [Preprint]. doi:10.5210/fm.v27i1.11783.

Friedman, B. and Nissenbaum, H. (1996) ‘Bias in computer systems’, ACM Transactions on Information Systems, 14(3), pp. 330-347. doi:10.1145/230538.230561.

Frith, S. (2007) Taking popular music seriously: selected essays. Aldershot, Hampshire, England; Burlington, VT: Ashgate (Ashgate contemporary thinkers on critical musicology series).

Garcia-Gathright, J., Springer, A. and Cramer, H. (2018) ‘Assessing and Addressing Algorithmic Bias - But Before We Get There’, arXiv:1809.03332 [cs] [Preprint]. Available at: http://arxiv.org/abs/1809.03332 (Accessed: 13 March 2022).

Garg, S., Saurabh and Breja, M. (2020) ‘Social Network Analysis of YouTube: A Case Study on Content Diversity and Genre Recommendation’, in Singh, V., Asari, V.K., Kumar, S., and Patel, R.B. (eds) Computational Methods and Data Engineering. Singapore: Springer Singapore (Advances in Intelligent Systems and Computing), pp. 25-37. doi:10.1007/978-981-15-6876-3_3.

Gillespie, T. (2014) ‘The Relevance of Algorithms’, in Gillespie, T., Boczkowski, P.J., and Foot, K.A. (eds) Media Technologies: Essays on communication, materiality, and society. Cambridge, MA: Massachusetts Institute of Technology, pp. 167-194.

Gillespie, T. (2017) ‘Governance of and by platforms’, in Burgess, J., Poell, T., and Marwick, A. (eds) SAGE Handbook of Social Media. London: SAGE Publications, pp. 254-278. Available at: http://culturedigitally.org/wp-content/uploads/2016/06/Gillespie-Governance-ofby-Platforms-PREPRINT.pdf (Accessed: 28 July 2016).

Gillhofer, M. and Schedl, M. (2015) ‘Iron Maiden While Jogging, Debussy for Dinner?’, in He, X., Luo, S., Tao, D., Xu, C., Yang, J., and Hasan, M.A. (eds) MultiMedia Modeling. Cham: Springer International Publishing, pp. 380-391. doi:10.1007/978-3-319-14442-9_44.

Gilroy, P. (2003) The black Atlantic: modernity and double consciousness. Cambridge, Mass: Harvard Univ. Press.

Goldschmitt, K.E. and Seaver, N. (2019) ‘Shaping the Stream: Techniques and Troubles of Algorithmic Recommendation’, in Trippett, D., Ingalls, M.M., and Cook, N. (eds) The Cambridge Companion to Music in Digital Culture. Cambridge: Cambridge University Press (Cambridge Companions to Music), pp. 63-81. doi:10.1017/9781316676639.006.

Gomez-Herrera, E., Martens, B. and Waldfogel, J. (2014) What’s Going On? Digitization and Global Music Trade Patterns since 2006. European Commission, Institute for Prospective Technological Studies. Available at: https://joint-research-centre.ec.europa.eu/publications/whats-going-digitization-and-global-music-trade-patterns-2006_en (Accessed: 11 April 2022).

Gorwa, R. (2019) ‘What is platform governance?’, Information, Communication & Society, 22(6), pp. 854-871. doi:10.1080/1369118X.2019.1573914.

Hackett, R.A. (1984) ‘Decline of a paradigm? Bias and objectivity in news media studies’, Critical Studies in Mass Communication, 1(3), pp. 229-259. doi:10.1080/15295038409360036.

Hagen, A. (2015) Using Music Streaming Services: Practices, Experiences and the Lifeworld of Musicking. PhD Dissertation. University of Oslo. Available at: https://www.hf.uio.no/imv/forskning/prosjekter/skyogscene/publikasjoner/hagen2015.pdf (Accessed: 12 April 2022).

Hagen, A.N. and Lüders, M. (2017) ‘Social streaming? Navigating music as personal and social’, Convergence, 23(6), pp. 643-659. doi:10.1177/1354856516673298.

Haider, J. and Sundin, O. (2021) ‘Information literacy as a site for anticipation: temporal tactics for infrastructural meaning-making and algo-rhythm awareness’, Journal of Documentation, 78(1), pp. 129-143. doi:10.1108/JD-11-2020-0204.

Haim, M., Graefe, A. and Brosius, H.-B. (2018) ‘Burst of the Filter Bubble?: Effects of personalization on the diversity of Google News’, Digital Journalism, 6(3), pp. 330-343. doi:10.1080/21670811.2017.1338145.

Hall, S., Connell, I. and Curti, L. (2007) ‘The “unity” of current affairs television’, in CCCS Selected Working Papers. Routledge.

Helberger, N. (2011) ‘Diversity by Design’, Journal of Information Policy, 1, pp. 441-469. doi:10.5325/jinfopoli.1.2011.0441.

Hesmondhalgh, D. (2013) Why music matters. Chichester, West Sussex, UK; Malden, MA, USA: John Wiley & Sons Ltd.

Hesmondhalgh, D. (2019) The cultural industries. 4th edition. Thousand Oaks, CA: SAGE Publications.

Hesmondhalgh, D. (2020) ‘Is music streaming bad for musicians? Problems of evidence and argument’, New Media & Society, 23(12), pp. 3593-3615. doi:10.1177/1461444820953541.

Hesmondhalgh, D. (2021) ‘Streaming’s Effects on Music Culture: Old Anxieties and New Simplifications’, Cultural Sociology, 16(1), pp. 3-24. doi:10.1177/17499755211019974.

Hesmondhalgh, D. and Meier, L.M. (2018) ‘What the digitalisation of music tells us about capitalism, culture and the power of the information technology sector’, Information, Communication & Society, 21(11), pp. 1555-1570. doi:10.1080/1369118X.2017.1340498.

Hesmondhalgh, D., Osborne, R., Sun, H. and Barr, K. (2021) Music Creators’ Earnings in the Digital Era. Newport, UK: UK Intellectual Property Office. Available at: https://www.gov.uk/government/publications/music-creators-earnings-in-the-digital-era.

Hodgson, T. (2021) ‘Spotify and the democratisation of music’, Popular Music, 40(1), pp. 1-17. doi:10.1017/S0261143021000064.

Hosey, C., Vujović, L., St. Thomas, B., Garcia-Gathright, J. and Thom, J. (2019) ‘Just Give Me What I Want: How People Use and Evaluate Music Search’, in Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. CHI ‘19: CHI Conference on Human Factors in Computing Systems, Glasgow, Scotland, UK: ACM, pp. 1-12. doi:10.1145/3290605.3300529.

House of Commons Digital, Culture, Media and Sport Committee (2021) Economics of Music Streaming. London: House of Commons.

Jannach, D., Lerche, L., Gedikli, F. and Bonnin, G. (2013) ‘What Recommenders Recommend – An Analysis of Accuracy, Popularity, and Sales Diversity Effects’, in Carberry, S., Weibelzahl, S., Micarelli, A., and Semeraro, G. (eds) User Modeling, Adaptation, and Personalization. Berlin, Heidelberg: Springer, pp. 25-37. doi:10.1007/978-3-642-38844-6_3.

Johansson, S., Werner, A., Åker, P. and Goldenzwaig, G. (2017) Streaming Music: Practices, Media, Cultures. London: Routledge. doi:10.4324/9781315207889.

Johnson, D. (2019) Transgenerational Media Industries: Adults, Children, and the Reproduction of Culture. University of Michigan Press. doi:10.3998/mpub.9894091.

Jones, E. (2020) DIY music and the politics of social media. London: Bloomsbury.

Kamishima, T., Akaho, S., Asoh, H. and Sakuma, J. (2014) ‘Correcting Popularity Bias by Enhancing Recommendation Neutrality’, in RecSys Posters.

Kang, J.R.W. and Lam, A. (2021) ‘How you may be trapped in a filter bubble of music due to AI’, The Strand, 20 August. Available at: https://thestrand.ca/how-you-may-be-trapped-in-a-filter-bubble-of-music-due-to-ai/ (Accessed: 24 March 2022).

Karakaya, M.Ö. and Aytekin, T. (2018) ‘Effective methods for increasing aggregate diversity in recommender systems’, Knowledge and Information Systems, 56(2), pp. 355-372. doi:10.1007/s10115-017-1135-0.

Kärjä, A.-V. (2018) ‘Why Is Uncle Paintbrush so Funny? The Case of YouTube Translation of a Syrian Kurdish Wedding Song into Finnish’, Yearbook for Traditional Music, 50, pp. 141-164. doi:10.5921/yeartradmusi.50.2018.0141.

Kassabian, A. (2004) ‘Would You Like Some World Music with your Latte? Starbucks, Putumayo, and Distributed Tourism’, Twentieth-Century Music, 1(2), pp. 209-223. doi:10.1017/S1478572205000125.

Kaye, D.B.V. and Gray, J.E. (2021) ‘Copyright Gossip: Exploring Copyright Opinions, Theories, and Strategies on YouTube’, Social Media + Society, 7(3). doi:10.1177/20563051211036940.

Kjus, Y. (2016) ‘Musical exploration via streaming services: The Norwegian experience’, Popular Communication, 14(3), pp. 127-136. doi:10.1080/15405702.2016.1193183.

Knijnenburg, B.P., Willemsen, M.C., Gantner, Z., Soncu, H. and Newell, C. (2012) ‘Explaining the user experience of recommender systems’, User Modeling and User-Adapted Interaction, 22(4), pp. 441-504. doi:10.1007/s11257-011-9118-4.

Koppe, M. (2021) Do algorithms keep playing the same old song?, CNRS News. Available at: https://news.cnrs.fr/articles/do-algorithms-keep-playing-the-same-old-song (Accessed: 24 March 2022).

Laing, D. (2008) ‘World Music and the Global Music Industry: Flows, Corporations and Networks’, Popular Music History, 3(3), pp. 213-231.

Lamere, P. (2014) ‘Gender Specific Listening’, Music Machinery, 10 February. Available at: https://musicmachinery.com/2014/02/10/gender-specific-listening/ (Accessed: 11 April 2022).

Leonard, M. (2007) Gender in the music industry: rock, discourse, and girl power. Aldershot, Hampshire, England; Burlington, VT: Ashgate (Ashgate popular and folk music series).

Lotz, A.D. (2018) We now disrupt this broadcast: how cable transformed television and the internet revolutionized it all. Cambridge, Massachusetts: The MIT Press.

Lozano Murciego, A., Jiménez-Bravo, D.M., Valera Román, A., De Paz Santana, J.F. and Moreno-García, M.N. (2021) ‘Context-Aware Recommender Systems in the Music Domain: A Systematic Literature Review’, Electronics, 10(13), p. 1555. doi:10.3390/electronics10131555.

Malm, K. and Wallis, R. (1992) Media policy and music activity. London; New York: Routledge. Available at: http://site.ebrary.com/id/10166571 (Accessed: 12 April 2022).

Manuel, P. (1990) Popular musics of the Non-Western world: an introductory survey. New York; Oxford: Oxford University Press.

Marshall, L. (2015) ‘“Let’s keep music special. F– Spotify”: on-demand streaming and the controversy over artist royalties’, Creative Industries Journal, 8(2), pp. 177-189. doi:10.1080/17510694.2015.1096618.

Matamoros-Fernández, A., Gray, J.E., Bartolo, L., Burgess, J. and Suzor, N. (2021) ‘What’s “Up Next”? Investigating Algorithmic Recommendations on YouTube Across Issues and Over Time’, Media and Communication, 9(4), pp. 234-249. doi:10.17645/mac.v9i4.4184.

McDonald, G. (2019) On Netflix and Spotify, algorithms hold the power. But there’s a way to get it back., Experience Magazine. Available at: https://expmag.com/2019/11/endless-loops-of-like-the-future-of-algorithmic-entertainment/ (Accessed: 24 March 2022).

McDonald, G. (n.d.) Every Noise at Once (map), everynoise.com. Available at: https://everynoise.com/everynoise1d.cgi?vector=popularity&scope=all (Accessed: 7 April 2022).

McKelvey, F. (2018) Internet daemons: digital communications possessed. Minneapolis; London: University of Minnesota Press (Electronic mediations, 56).

McQuail, D. (1992) Media performance: Mass communication and the public interest. Newbury Park, CA: Sage.

Mehrotra, R., McInerney, J., Bouchard, H., Lalmas, M. and Diaz, F. (2018) ‘Towards a Fair Marketplace: Counterfactual Evaluation of the trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems’, in. doi:10.1145/3269206.3272027.

Meier, L.M. and Manzerolle, V.R. (2019) ‘Rising tides? Data capture, platform accumulation, and new monopolies in the digital music economy’, New Media & Society, 21(3), pp. 543-561. doi:10.1177/1461444818800998.

Melchiorre, A.B., Rekabsaz, N., Parada-Cabaleiro, E., Brandl, S., Lesota, O. and Schedl, M. (2021) ‘Investigating gender fairness of recommendation algorithms in the music domain’, Information Processing & Management, 58(5). doi:10.1016/j.ipm.2021.102666.

Melchiorre, A.B., Zangerle, E. and Schedl, M. (2020) ‘Personality Bias of Music Recommendation Algorithms’, in Fourteenth ACM Conference on Recommender Systems. RecSys ‘20: Fourteenth ACM Conference on Recommender Systems, Virtual Event, Brazil: ACM, pp. 533-538. doi:10.1145/3383313.3412223.

Möller, J., Trilling, D., Helberger, N. and van Es, B. (2018) ‘Do not blame it on the algorithm: an empirical assessment of multiple recommender systems and their impact on content diversity’, Information, Communication & Society, 21(7), pp. 959-977. doi:10.1080/1369118X.2018.1444076.

Morgan, B.A. (2020) ‘Revenue, access, and engagement via the in-house curated Spotify playlist in Australia’, Popular Communication, 18(1), pp. 32-47. doi:10.1080/15405702.2019.1649678.

Morris, J.W. (2015) ‘Curation by code: Infomediaries and the data mining of taste’, European Journal of Cultural Studies, 18(4-5), pp. 446-463. doi:10.1177/1367549415577387.

Morris, J.W. (2020) ‘Music Platforms and the Optimization of Culture’, Social Media + Society, 6(3). doi:10.1177/2056305120940690.

Morris, J.W. and Powers, D. (2015) ‘Control, curation and musical experience in streaming music services’, Creative Industries Journal, 8(2), pp. 106-122. doi:10.1080/17510694.2015.1090222.

Morris, J.W., Prey, R. and Nieborg, D.B. (2021) ‘Engineering culture: logics of optimization in music, games, and apps’, Review of Communication, 21(2), pp. 161-175. doi:10.1080/15358593.2021.1934522.

Mosseri, A. (2021) Shedding More Light on How Instagram Works, Instagram. Available at: https://about.instagram.com/blog/announcements/shedding-more-light-on-how-instagram-works (Accessed: 10 April 2022).

Napoli, P.M. (1999) ‘Deconstructing the Diversity Principle’, Journal of Communication, 49(4), pp. 7-34. doi:10.1111/j.1460-2466.1999.tb02815.x.

Nasta, L., Pirolo, L. and Wikström, P. (2016) ‘Diversity in creative teams: a theoretical framework and a research methodology for the analysis of the music industry’, Creative Industries Journal, 9(2), pp. 97-106. doi:10.1080/17510694.2016.1154653.

Negus, K. (1992) Producing pop: culture and conflict in the popular music industry. London; New York; Melbourne: Edward Arnold.

Negus, K. (2019) ‘From creator to data: the post-record music industry and the digital conglomerates’, Media, Culture & Society, 41(3), pp. 367-384. doi:10.1177/0163443718799395.

Noble, S.U. (2018) Algorithms of oppression: how search engines reinforce racism. New York: New York University Press.

Nowak, R. (2016) ‘When is a discovery? The affective dimensions of discovery in music consumption’, Popular Communication, 14(3), pp. 137-145. doi:10.1080/15405702.2016.1193182.

Olteanu, A., Castillo, C., Diaz, F. and Kıcıman, E. (2019) ‘Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries’, Frontiers in Big Data, 2. doi:10.3389/fdata.2019.00013.

O’Neil, C. (2016) Weapons of math destruction: how big data increases inequality and threatens democracy. First edition. New York: Crown.

Pandora (2022) About Pandora, Pandora. Available at: https://www.pandora.com/about (Accessed: 24 March 2022).

Pariser, E. (2012) The filter bubble: what the Internet is hiding from you. London: Penguin books.

Park, Y.-J. and Tuzhilin, A. (2008) ‘The long tail of recommender systems and how to leverage it’, in Proceedings of the 2008 ACM conference on Recommender systems - RecSys ‘08. the 2008 ACM conference, Lausanne, Switzerland: ACM Press. doi:10.1145/1454008.1454012.

Pasquale, F. (2016) The black box society: the secret algorithms that control money and information. First Harvard University Press paperback edition. Cambridge, Massachusetts London, England: Harvard University Press.

Pedersen, R.R. (2020) ‘Datafication and the push for ubiquitous listening in music streaming’, MedieKultur: Journal of media and communication research, 36(69), pp. 71-89. doi:10.7146/mediekultur.v36i69.121216.

Pelly, L. (2018) Discover Weakly: Sexism on Spotify, The Baffler. Available at: https://thebaffler.com/latest/discover-weakly-pelly (Accessed: 24 March 2022).

Petre, C., Duffy, B.E. and Hund, E. (2019) ‘“Gaming the System”: Platform Paternalism and the Politics of Algorithmic Visibility’, Social Media + Society, 5(4). doi:10.1177/2056305119879995.

Picard, R. and Wikström, P. (2008) ‘Determinants of Domestic Music Repertoire: An Empirical Analysis’, in. 8th World Media Economics and Management Conference, Lisbon.

Poell, T., Nieborg, D.B. and Duffy, B.E. (2022) Platforms and cultural production. Medford: Polity Press.

Porcaro, L., Castillo, C. and Gómez, E. (2021) ‘Diversity by Design in Music Recommender Systems’, Transactions of the International Society for Music Information Retrieval, 4(1), pp. 114-126. doi:10.5334/tismir.106.

Prey, R. (2016) ‘Musica Analytica: The Datafication of Listening’, in Nowak, R. and Whelan, A. (eds) Networked Music Cultures. London: Palgrave Macmillan UK, pp. 31-48. doi:10.1057/978-1-137-58290-4_3.

Prey, R. (2018) ‘Nothing personal: algorithmic individuation on music streaming platforms’, Media, Culture & Society, 40(7), pp. 1086-1100. doi:10.1177/0163443717745147.

Prey, R. (2020) ‘Locating Power in Platformization: Music Streaming Playlists and Curatorial Power’, Social Media + Society, 6(2). doi:10.1177/2056305120933291.

Reynolds, S. and Press, J. (1995) The sex revolts: gender, rebellion, and rock’n’roll. Cambridge, Mass: Harvard Univ. Press.

Ricci, F., Rokach, L. and Shapira, B. (2015) ‘Recommender Systems: Introduction and Challenges’, in Ricci, F., Rokach, L., and Shapira, B. (eds) Recommender Systems Handbook. Boston, MA: Springer US, pp. 1-34. doi:10.1007/978-1-4899-7637-6_1.

Rieder, B. (2020) Engines of Order. Amsterdam: Amsterdam University Press. doi:10.5117/9789462986190.

Rieder, B. and Hofmann, J. (2020) ‘Towards platform observability’, Internet Policy Review, 9(4). doi:10.14763/2020.4.1535.

Rieder, B., Matamoros-Fernández, A. and Coromina, Ò. (2018) ‘From ranking algorithms to “ranking cultures”: Investigating the modulation of visibility in YouTube search results’, Convergence: The International Journal of Research into New Media Technologies, 24(1), pp. 50-68. doi:10.1177/1354856517736982.

Robertson, A. (2020) Facebook’s Oversight Board takes its first six cases, The Verge. Available at: https://www.theverge.com/2020/12/1/21755133/facebook-oversight-board-supreme-court-first-cases-hate-speech-pandemic-misinformation (Accessed: 10 April 2022).

Roessler, P. (2007) ‘Media Content Diversity: Conceptual Issues and Future Directions for Communication Research’, Annals of the International Communication Association, 31(1), pp. 464-520. doi:10.1080/23808985.2007.11679073.

Roth, C., Mazières, A. and Menezes, T. (2020) ‘Tubes and bubbles topological confinement of YouTube recommendations’, PLOS ONE, 15(4). doi:10.1371/journal.pone.0231703.

Sánchez-Moreno, D., López Batista, V., Vicente, M.D.M., Sánchez Lázaro, Á.L. and Moreno-García, M.N. (2020) ‘Exploiting the User Social Context to Address Neighborhood Bias in Collaborative Filtering Music Recommender Systems’, Information, 11(9), p. 439. doi:10.3390/info11090439.

Schafer, J.B., Frankowski, D., Herlocker, J. and Sen, S. (2007) ‘Collaborative Filtering Recommender Systems’, in Brusilovsky, P., Kobsa, A., and Nejdl, W. (eds) The Adaptive Web: Methods and Strategies of Web Personalization. Berlin, Heidelberg: Springer (Lecture Notes in Computer Science), pp. 291-324. doi:10.1007/978-3-540-72079-9_9.

Schedl, M. and Bauer, C. (2017) ‘Distance- and Rank-based Music Mainstreaminess Measurement’, in Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization. UMAP ‘17: 25th Conference on User Modeling, Adaptation and Personalization, Bratislava, Slovakia: ACM, pp. 364-367. doi:10.1145/3099023.3099098.

Schedl, M., Hauger, D., Farrahi, K. and Tkalčič, M. (2015) ‘On the Influence of User Characteristics on Music Recommendation Algorithms’, in Hanbury, A., Kazai, G., Rauber, A., and Fuhr, N. (eds) Advances in Information Retrieval. Cham: Springer International Publishing (Lecture Notes in Computer Science), pp. 339-345. doi:10.1007/978-3-319-16354-3_37.

Schedl, M., Knees, P., McFee, B., Bogdanov, D. and Kaminskas, M. (2015) ‘Music Recommender Systems’, in Ricci, F., Rokach, L., and Shapira, B. (eds) Recommender Systems Handbook. Boston, MA: Springer US, pp. 453-492. doi:10.1007/978-1-4899-7637-6_13.

Schedl, M., Zamani, H., Chen, C.-W., Deldjoo, Y. and Elahi, M. (2018) ‘Current challenges and visions in music recommender systems research’, International Journal of Multimedia Information Retrieval, 7(2), pp. 95-116. doi:10.1007/s13735-018-0154-2.

Scholz, T. and Schneider, N. (eds) (2017) Ours to Hack and to Own: The Rise of Platform Cooperativism, A New Vision for the Future of Work and a Fairer Internet. OR Books. doi:10.2307/j.ctv62hfq7.

Seaver, N. (2015) Computing Taste: The Making of Algorithmic Music Recommendation. PhD Dissertation. UC Irvine.

Seaver, N. (2017) ‘Algorithms as culture: Some tactics for the ethnography of algorithmic systems’, Big Data & Society, 4(2). doi:10.1177/2053951717738104.

Seaver, N. (2019) ‘Captivating algorithms: Recommender systems as traps’, Journal of Material Culture, 24(4), pp. 421-436. doi:10.1177/1359183518820366.

Shakespeare, D., Porcaro, L., Gómez, E. and Castillo, C. (2020) ‘Exploring Artist Gender Bias in Music Recommendation’, arXiv:2009.01715 [cs] [Preprint]. Available at: http://arxiv.org/abs/2009.01715 (Accessed: 13 March 2022).

Siles, I., Segura-Castillo, A., Solís, R. and Sancho, M. (2020) ‘Folk theories of algorithmic recommendations on Spotify: Enacting data assemblages in the global South’, Big Data & Society, 7(1). doi:10.1177/2053951720923377.

Spotify Newsroom (2018a) ‘Celebrating a Decade of Discovery on Spotify’, For the Record, 10 October. Available at: https://newsroom.spotify.com/2018-10-10/celebrating-a-decade-of-discovery-on-spotify/ (Accessed: 25 March 2022).

Spotify Newsroom (2018b) ‘Discover Hits From Around the World With Spotify’s Global Cultures Initiative’, For the Record, 28 September. Available at: https://newsroom.spotify.com/2018-09-28/discover-hits-from-around-the-world-with-spotifys-global-cultures-initiative/ (Accessed: 25 March 2022).

Spotify Newsroom (2018c) ‘EuroPride 2018: Identify-ing the Music of Diversity’, For the Record, 2 August. Available at: https://newsroom.spotify.com/2018-08-02/europride-2018-identify-ing-the-music-of-diversity/ (Accessed: 25 March 2022).

Spotify Newsroom (2018d) ‘Represent! Celebrate the Diversity of Latin Music for Hispanic Heritage Month’, For the Record, 9 October. Available at: https://newsroom.spotify.com/2018-10-09/represent-celebrate-the-diversity-of-latin-music-for-hispanic-heritage-month/ (Accessed: 25 March 2022).

Street, J. and Phillips, T. (2016) ‘What Do Musicians Talk About When They Talk About Copyright?’, Popular Music and Society, 40(4), pp. 422-433. doi:10.1080/03007766.2015.1126099.

Streeter, T. (2011) The Net effect: romanticism, capitalism, and the internet. New York: NYU Press.

Sunstein, C.R. (2001) Echo chambers: Bush v. Gore, impeachment, and beyond. Princeton, N.J.: Princeton University Press.

Tan, S., Bu, J., Chen, C., Xu, B., Wang, C. and He, X. (2011) ‘Using rich social media information for music recommendation via hypergraph model’, ACM Transactions on Multimedia Computing, Communications, and Applications, 7S(1), pp. 1-22. doi:10.1145/2037676.2037679.

Taramigkou, M., Bothos, E., Christidis, K., Apostolou, D. and Mentzas, G. (2013) ‘Escape the bubble: guided exploration of music preferences for serendipity and novelty’, in Proceedings of the 7th ACM conference on Recommender systems. New York, NY, USA: Association for Computing Machinery (RecSys ‘13), pp. 335-338. doi:10.1145/2507157.2507223.

TikTok (2020) How TikTok recommends videos #ForYou, Newsroom - TikTok. Available at: https://newsroom.tiktok.com/en-us/how-tiktok-recommends-videos-for-you (Accessed: 11 February 2021).

Tofalvy, T. and Koltai, J. (2021) ‘“Splendid Isolation”: The reproduction of music industry inequalities in Spotify’s recommendation system’, New Media & Society [Preprint]. doi:10.1177/14614448211022161.

Van Couvering, E. (2007) ‘Is Relevance Relevant? Market, Science, and War: Discourses of Search Engine Quality’, Journal of Computer-Mediated Communication, 12(3), pp. 866-887. doi:10.1111/j.1083-6101.2007.00354.x.

Vargas, S., Baltrunas, L., Karatzoglou, A. and Castells, P. (2014) ‘Coverage, redundancy and size-awareness in genre diversity for recommender systems’, in Proceedings of the 8th ACM Conference on Recommender Systems (RecSys ‘14). Foster City, Silicon Valley, California, USA: ACM Press, pp. 209-216. doi:10.1145/2645710.2645743.

Villermet, Q., Poiroux, J., Moussallam, M., Louail, T. and Roth, C. (2021) ‘Follow the guides: disentangling human and algorithmic curation in online music consumption’, in Fifteenth ACM Conference on Recommender Systems. RecSys ‘21: Fifteenth ACM Conference on Recommender Systems, Amsterdam Netherlands: ACM, pp. 380-389. doi:10.1145/3460231.3474269.

Way, S.F., Garcia-Gathright, J. and Cramer, H. (2020) ‘Local Trends in Global Music Streaming’, Proceedings of the International AAAI Conference on Web and Social Media, 14, pp. 705-714.

Webster, J., Gibbins, N., Halford, S. and Hracs, B.J. (2016) ‘Towards a theoretical approach for analysing music recommender systems as sociotechnical cultural intermediaries’, in Proceedings of the 8th ACM Conference on Web Science. WebSci ‘16: ACM Web Science Conference, Hannover Germany: ACM, pp. 137-145. doi:10.1145/2908131.2908148.

Werner, A. (2020) ‘Organizing music, organizing gender: algorithmic culture and Spotify recommendations’, Popular Communication, 18(1), pp. 78-90. doi:10.1080/15405702.2020.1715980.

White, D.M. (1950) ‘The “Gate Keeper”: A Case Study in the Selection of News’, Journalism Quarterly, 27(4), pp. 383-390. doi:10.1177/107769905002700403.

Whittaker, M., Crawford, K., Dobbe, R., Fried, G., Kaziunas, E., Mathur, V., Myers-West, S., Richardson, R., Schultz, J. and Schwartz, O. (2018) AI Now Report 2018. AI Now. Available at: https://ainowinstitute.org/AI_Now_2018_Report.pdf (Accessed: 24 March 2022).

Wikström, P., Moreau, F. and Bourreau, M. (2018) ‘Acoustic diversity of Western Popular music during a period of digital transformation’, in The 20th International Conference on Cultural Economics, Melbourne, Australia.

Wishwanath, C.H.P.D. and Ahangama, S. (2019) ‘A Personalized Music Recommendation System based on User Moods’, in 2019 19th International Conference on Advances in ICT for Emerging Regions (ICTer), Colombo, Sri Lanka: IEEE. doi:10.1109/ICTer48817.2019.9023727.

Zhang, Yang, Feng, F., He, X., Wei, T., Song, C., Ling, G. and Zhang, Yongdong (2021) ‘Causal Intervention for Leveraging Popularity Bias in Recommendation’, arXiv:2105.06067 [cs] [Preprint]. doi:10.1145/3404835.3462875.

Zhang, Y.C., Séaghdha, D.Ó., Quercia, D. and Jambor, T. (2012) ‘Auralist: introducing serendipity into music recommendation’, in Proceedings of the fifth ACM international conference on Web search and data mining. New York, NY, USA: Association for Computing Machinery (WSDM ‘12), pp. 13-22. doi:10.1145/2124295.2124300.

Zuboff, S. (2019) The age of surveillance capitalism: the fight for a human future at the new frontier of power. First edition. New York: PublicAffairs.

Zuiderveen Borgesius, F.J., Trilling, D., Möller, J., Bodó, B., de Vreese, C.H. and Helberger, N. (2016) ‘Should we worry about filter bubbles?’, Internet Policy Review, 5(1). doi:10.14763/2016.1.401.

  1. The CDEI report on algorithmic bias discussed four sectors: policing, social services, finance and recruitment. It argued that algorithmic bias needs to be understood in relation to other sources of bias, and that there is now an opportunity to recognise and mitigate bias in some key areas. 

  2. The meaning of “critical” is discussed in section 2.1 below. 

  3. YouTube of course hosts a vast amount of content besides music, and it operates both a music subscription service, and a “free”, advertising-supported tier used by vast numbers of people. In addition, TikTok makes available vast amounts of visual content – though very often accompanied by music. MSPs can more accurately be described as streaming audio rather than music, because of the increasing amount of spoken word content they contain, especially podcasts. However, our focus here is on music, and we do not consider podcasts. 

  4. “Critical AI studies” and “platform studies” are terms also widely used to describe relevant areas of research. 

  5. The main conference for critical research in this area is probably the annual conference of the Association of Internet Researchers (AoIR). Many researchers in this field identify themselves as part of a wider interdisciplinary field of social science and humanities known as science and technology studies (STS). 

  6. For an early critical study of these criteria of quality, in the context of search engines, see Van Couvering (2007). 

  7. There is now a flourishing conference of the Association for Computing Machinery (the vast scholarly association for computer science) that goes by the acronym FAccT (Fairness, Accountability and Transparency). 

  8. As is evident for example in the recent CDEI report on “bias in algorithmic decision-making” (CDEI 2020). 

  9. Garcia-Gathright and her colleagues are computer scientists seeking greater conceptual sophistication in the notion of bias, by drawing on interdisciplinary research. They are by no means alone in this endeavour, as the efforts of contributors to the FAccT conference mentioned earlier attest. 

  10. This wording is used in the submissions by Angela Reith, Anna Neale, Anonymous, Council of Music Makers, Fran O’Hanlon, Gareth Bonello, Iain Archer, Independent Music Publishers Forum, Irish Music Rights Organisation, Isaac Anderson, Joe Newman, Joshua Magill, Josienne Clarke, Just East of Jazz, Luke Williams, Natalia Wierzbicka, Oliver Julian, Renee Sheehan, and Wendy Kirkland. 

  11. Few users would now explicitly endorse the idea that algorithms offer objectivity - but it seems many of us still act upon that promise, behaving as though the results of algorithms are as near to objective as we can reasonably find within the time available. 

  12. We discuss related efforts towards “platform observability” in Section 4.2 below. See Eriksson and Johansson (2017) for another bot experiment. 

  13. A monograph based on Seaver’s 2015 doctoral dissertation and his further research and thinking since its completion is due in late 2022. 

  14. The term “gatekeeper” derives from White (1950), a study of how an employee of a newspaper decided which telegraphed news items to select as possible news stories. The study and the concept have been heavily criticised since, but the term has stuck, and has been used by some critical researchers investigating music streaming (e.g. Bonini and Gandini 2019). 

  15. Users can create and share their own playlists very easily on MSPs, but the more significant use of the term refers to playlists that are created by the platforms themselves. Playlists accounted for 31% of music listening time among USA listeners in 2016, compared with 22% for albums, according to the Music Business Association (Schedl et al. 2018: 5). 

  16. The “majors” are currently Universal, Sony and Warner, the three record companies that account for over 70% of global recorded music revenues. 

  17. For example, the testimony of musician Anna Neale. 

  18. Based on his ethnographic research, including a presentation and publication by Spotify recommender system engineers, Hodgson (2021: 6) offers a somewhat different breakdown of techniques, emphasising collaborative filtering, the natural language processing used to analyse what Schedl, Knees et al. (2015) present as content and context, and “audio analysis” or MIR techniques. 

  19. It is regularly claimed that Spotify and Apple Music host over 70 million audio tracks, the vast majority of which are music. Yet UK Official Charts Company data obtained by the Music Creators’ Earnings project show how few tracks receive streams. For example, in the UK in October 2020 only 4.28 million music recordings were streamed at all, and only around 395,000 were played more than a thousand times (Hesmondhalgh et al. 2021: 202). In music, the long tail is extremely long and thin. 

  20. The US streaming service Pandora used these content-based methods more than any other streaming service in its “music genome project” (Pandora 2022). 

  21. A significant development in Spotify’s consolidation of its music streaming pre-eminence was its purchase in 2014 of an MIR company, Echo Nest, an MIT start-up based in Greater Boston. 

  22. Apparently, MIR as a field of computer science refers only to musical content as content, and refers to metadata as metadata, whereas the field of music recommendation research tends to refer to both as content (Schedl, Knees et al. 2015: 454). 

  23. In this paper, Schedl, Knees et al. do not use the term “context” for these recommendation techniques, but Bauer and Schedl (2019) frame their study of culture-aware music recommendation as a type of “context-aware” MRS, and the psychologically-inspired and situation-aware categories clearly correlate with the discussion of context in Schedl, Knees et al. 

  24. The “cold start” problem is that of making inferences about (often new) users or items for which little information exists. 

  25. Whereas many film and book fans provide ratings – for example ratings out of five – via various websites and apps, music users tend to do this much less. 

  26. Spotify research scientists seem to publish more of their work on music streaming in scientific journals than do researchers in other MSPs. We offer no speculation on why this might be the case, but our own experience, and anecdotal evidence from colleagues, suggest that Spotify engages far more with academic computer science and critical social science and humanities research than do other MSPs. 

  27. The Matthew Effect was a term coined by sociologists Robert Merton and Harriet Zuckerman, named after the Parable of the Talents in the Christian New Testament’s Gospel of Matthew, to refer to the way that in terms of fame, status and wealth, the least well-off find it difficult to catch up: “to those that have, more will be given”. 

  28. Written evidence submitted by musician Matthew Tong. 

  29. There are numerous chart-based playlists such as Spotify’s Global Top 50 but such lists appear to be created by extracting global streaming data rather than algorithmic recommendation. 

  30. Cremonesi et al. (2010) report that, perhaps surprisingly, “a naive non-personalized algorithm can outperform some common recommendation approaches and almost match the accuracy of sophisticated algorithms”. Canamares and Castells (2014) report that “item popularity is actually an (intentional or accidental) ingredient of many state of the art recommendation algorithms” and discuss factors behind such effectiveness. 

  31. In the field of film recommendations, the MovieLens database from 2000 was widely used; in music recommendation, without doubt the most discussed datasets are derived from Last.fm. Until recently, these were relatively small and/or out of date, but a new, much larger Last.fm-derived LFM-2b database (see Melchiorre et al. 2021), covering 2005-2020, offers potential for better simulations. 

  32. In his 2010 book on music recommender systems, Celma advocated a shift towards evaluating recommender systems on the basis of “perceived quality” rather than predictive accuracy, and he associated perceived quality with items that were both personalised and novel. This perhaps represented a shift in thinking in tech companies and product teams. 

  33. A later study by Spotify research scientists (Anderson et al. 2020) distinguished between generalists and specialists. Academic computer scientists Schedl and Bauer (2017) built on their earlier work, which differentiated mainstream-oriented listeners, those unconcerned with consuming the most popular music, from “niche consumers”, those preferring music not much consumed by others, and sought to measure the proximity of a user’s music preference to the musical mainstream (“mainstreaminess”). 

  34. See Hesmondhalgh 2020 and Hesmondhalgh et al. 2021 for discussion of these concerns. 

  35. An under-explored issue appears to be the potential for fruitful collaboration between critical researchers working on these concepts and computer science researchers of human-computer interaction (HCI), who often seem to rely on positivist forms of psychology. 

  36. A flavour is provided by this quotation from a blog (Bagal 2019): “When digital music services started offering listeners millions of songs on demand, it became really easy to get lost in the chaos induced by this abundance of choice. Labels realized this, and molded the creative process to better suit the nature of the medium, reducing pop music to short, easily digestible bits of information, offering listeners instant gratification with no complexity, enabling an ‘easy listening experience’”. 

  37. For example, in a study of book recommendation, Ekstrand and Kluver (2020) found that common CF algorithms “tend to propagate at least some of each user’s tendency to rate or read male or female authors into their resulting recommendations”, though that tendency differed by algorithm. 

  38. Registration processes increasingly offer non-binary choices. Eriksson and Johansson (2017) report that only as late as autumn 2016 did Spotify introduce a third gender field of “non-binary” on its compulsory registration page. Perhaps for this reason, we have found no studies that address biases concerning non-binary and genderqueer artists and listeners. 

  39. Whereas many people would be troubled by a gender bias that reproduces societal sexism, home bias may well be perceived as positive in the context of US domination, because in most countries, a significant focus on domestic repertoire might increase diversity, and present experiences and subjectivities that are “local” and might be sidelined by globalised content. That bias can be used to describe positive and negative outcomes in this way might be seen by some as another problem of the term. 

  40. A fine critical study (Dhaenens and Burgess 2019) explores issues of sexuality but in relation to playlists rather than MRS. 

  41. Pelly (2018) approached the project by listening from a brand new account in order to confirm that gender bias would be reproduced by way of algorithmic recommendations – that when a user listens to mostly male-dominated playlists, what is produced are yet more male-dominated playlists. However, many of the playlists produced by Pelly appear to be humanly-created playlists rather than algorithmic or automated ones. Eriksson and Johansson (2017) describe a more elaborate and careful procedure for imputing gender, perhaps reflecting their greater resources as academics. They also offer theoretical rigour regarding gender. 

  42. This found that the relatively low female share of successful songs at Spotify “mainly arose from the relatively low female share of songs on the platform as a whole rather than anti-female bias in playlist decisions”. NB this is a study of the effects of playlists, not of algorithmically-driven music recommendation. 

  43. “Digitalization” is a term widely used in the context of music to refer to a shift away from sales of “physical” recorded music artefacts such as CDs, towards digital distribution, from the end of the 1990s onwards. This involved extensive “free” downloading of copyrighted recorded music, and eventually online sales of digital files - sometimes known as “the iTunes model”. That model quickly mutated into subscription and advertising-supported streaming from around 2010 onwards, as MSPs became established, and the term “platformisation” has become increasingly popular in critical research for the rather different issues raised in this later phase. For this part of the literature review, the key matter is how platformisation and the MRS integral to it may be affecting dynamics of gender, location, nationality and other factors. 

  44. “Echo chambers” refers to the concern, often attributed in classic form to Sunstein (2001), that many users of digital media would tend to use the Internet not to “expand their horizons” but in a way that would favour content “specifically tailored to their own interests and prejudices”, with implications for the ability of societies to deliberate and negotiate conflicting viewpoints. “Filter bubbles”, a term popularised by Pariser (2012), emphasises the role of “a new generation of Internet filters”, roughly corresponding to what would later tend to be called algorithms, in creating “a unique universe of information for each of us”.