Abstract

The BBC is the world’s largest public service broadcaster. Every week it reaches more than 90% of the UK’s adult population and 489 million people worldwide. To ensure our audiences get the most engaging experience, our team develops recommender systems which aim to provide users with the most relevant pieces of content among the thousands the BBC publishes every day. All BBC output should serve the organization’s mission to “act in the public interest, serving all audiences through the provision of impartial, high-quality and distinctive output and services which inform, educate, and entertain.” Recommendations are no exception and, since they determine what our audiences see, they are in effect editorial choices at scale. How can we ensure that our recommendations are consistent with our mission and public service values, avoiding some of the harmful effects that might be associated with recommenders? In addressing this question, we identified two main challenges: 1) methodological challenges: public service values are hard to measure through specific metrics, so we have no clearly defined optimization function for our recommenders; 2) cultural/operational challenges: domain knowledge around public service values sits with our editorial staff, whereas data scientists are the recommendations specialists. We need to create a shared understanding of the problem and a common language to describe objectives and solutions across data science and editorial. Our paper describes the approach we devised to tackle these challenges, presenting a use case from our work on a BBC product, and reporting the lessons learned.

1 Introduction

Recommendations have become a common feature of online media platforms to the extent that users now expect to find recommended content as part of a personalized experience (Jones, 2022). Online digital content providers, such as Netflix, Amazon, YouTube, and Spotify, have used recommender systems for years to sift through their large catalogs and surface relevant and tailored content to their users.

The BBC produces thousands of new pieces of content every day, published in different formats (including text, audio, and video) via a range of services, including BBC News, BBC Sport, BBC Sounds, BBC iPlayer, and World Service. Users cannot maintain a comprehensive view of such a vast catalog or easily find what is right for them, so our organization has deployed recommenders across its online portfolio, primarily with the aim of matching relevant and engaging content to each user.

Recommender systems in commercial settings are optimized to maximize engagement, i.e., “a set of user behaviors, generated in the normal course of interaction with the platform, which are thought to correlate with value to the user, the platform, or other stakeholders” (Stray, et al., 2022). However, the BBC is a public service organization focused on delivering against a set of values and purposes consistent with its public service remit. We want our recommendations to fulfill our public purposes and support our values and our mission to inform, educate, and entertain our audiences (British Broadcasting Corporation, 2023).

Metrics typically used to measure engagement, such as click-through-rate (CTR) or time spent (Davidson, et al., 2010), fall short of measuring the extent to which our aims are achieved. What should we optimize our recommendation algorithms for? And how? Answering these questions is primarily a methodological challenge: How do we encode these goals and values into our recommendation algorithms? The solutions are not purely technical but socio-technical, as the process we follow influences the technical artifacts created. This entails a further challenge, mainly cultural and organizational: What is the best way to bring together the relevant disciplines (editorial, data science, and product) to ensure that the relevant domain knowledge, and therefore our public purposes, are reflected in our recommender systems?

Addressing these challenges is relevant beyond the specific case of the BBC. First, on the technical side, “there is little public documentation of the actual processes by which values are engineered into large recommender systems” (Stray, et al., 2022). This paper seeks to address that gap, by reporting on our approach and lessons learned in developing recommenders in a public service organization. Second, public service media organizations across the globe have been facing several challenges, including losing audiences to online digital content providers, budget cuts, regulations framed for analogue broadcasting, and threats to their independence (Jones, 2022). In this context, recommender systems may help create additional value for public service media, allowing them to offer more personalized content discovery experiences to their audiences. However, many open questions remain around the aims and the process to develop recommendations in the public service (Jones, 2022)—our paper contributes to addressing some of those questions. Finally, in terms of the societal effects of recommender systems, every week the BBC reaches more than 90% of the UK’s adult population (British Broadcasting Corporation, 2023) and 489 million people worldwide (British Broadcasting Corporation, 2021). This amplification of content may significantly influence society within the UK and beyond.

The paper is structured as follows. Section 2 covers the background to this paper, both in terms of the technical aspects of recommender systems and the BBC’s unique position in relation to providing recommendations. Section 3 highlights the unique problem space of embedding the BBC’s values and goals into recommender systems. Section 4 dives deeper into this challenge, highlighting the specific nature of the values from an editorial, product, and data science perspective, and Section 5 discusses the approach we have taken.

2 Background

2.1 The BBC’s values and public purposes

The BBC is a values-driven organization. The BBC’s guiding principles relate to delivering public service value to audiences, meaning that our services should be available to everyone, regardless of socio-economic factors, heritage, background, or community. It has a strong trusted, creative, cultural, and societal identity, with a reputation for operating in the public interest that reaches beyond the UK. 

The BBC’s very existence is enshrined in law via its Royal Charter (British Broadcasting Corporation, 2023), which specifies that “the Mission of the BBC is to act in the public interest, serving all audiences through the provision of impartial, high-quality and distinctive output and services which inform, educate and entertain.” The Charter also outlines the Corporation’s public purposes: 

  • To provide impartial news and information to help people understand and engage with the world around them.
  • To support learning for people of all ages.
  • To show the most creative, highest quality, and distinctive output and services.
  • To reflect, represent, and serve the diverse communities of all of the United Kingdom’s nations and regions and, in doing so, support the creative economy across the United Kingdom.
  • To reflect the United Kingdom, its culture, and values to the world.

With all of this in mind, our approach to content for our broadcasting and digital media services is different from that of more commercial media organizations. It makes our approach to recommendations different too.

2.2 How recommender systems work

Recommender systems are ubiquitous: they are commonly used across social media (Lada, Wang, & Yan, 2021), music and video streaming services (Schedl, Knees, McFee, Bogdanov, & Kaminskas, 2015), online shopping websites, and ad targeting (Zhou, Ding, Tang, & Yin, 2018). Recommendations help users navigate through large quantities of products or content, narrowing down the material they are presented with. In terms of filtering information for users, recommenders are similar to search engines, albeit without the reliance on an explicit prompt or search query, and may be driven by both implicit and explicit user preferences (Jannach, Zanker, Felfernig, & Friedrich, 2011).

A typical architecture of recommender systems—see e.g., (Lada, Wang, & Yan, 2021) and (Google, 2020)—consists of a sequence where each subsequent step subsets or re-ranks a selection of content generated by the previous step:

  1. Candidate generation: the system selects a set of candidate content items from a large corpus.
  2. Ranking: the candidates are scored and ranked by score, and a subset is selected that is closer to the number of items which will be shown to the user.
  3. Re-ranking/post-processing: a final post-processing step may be included to ensure the final set of recommendations fulfills some business criteria.
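
The three steps above can be sketched as a chain of small functions. This is a minimal illustration rather than the BBC's implementation: the item fields (`id`, `topic`, `score`), the topic-overlap candidate rule, and the blocklist used as a business criterion are all assumptions made for the example.

```python
from typing import Callable, List, Set


def generate_candidates(corpus: List[dict], user_topics: Set[str], limit: int) -> List[dict]:
    """Step 1: select a broad candidate set from the corpus (here: topic overlap)."""
    matches = [item for item in corpus if item["topic"] in user_topics]
    return matches[:limit]


def rank(candidates: List[dict], score: Callable[[dict], float], keep: int) -> List[dict]:
    """Step 2: score the candidates, sort by descending score, keep a subset."""
    return sorted(candidates, key=score, reverse=True)[:keep]


def post_process(ranked: List[dict], blocked_ids: Set[str], slots: int) -> List[dict]:
    """Step 3: apply a business criterion (a hypothetical blocklist), then trim."""
    allowed = [item for item in ranked if item["id"] not in blocked_ids]
    return allowed[:slots]


# Toy corpus; the scores stand in for the output of a trained ranking model.
corpus = [
    {"id": "a", "topic": "news", "score": 0.9},
    {"id": "b", "topic": "sport", "score": 0.8},
    {"id": "c", "topic": "news", "score": 0.7},
    {"id": "d", "topic": "music", "score": 0.95},
    {"id": "e", "topic": "sport", "score": 0.4},
]

candidates = generate_candidates(corpus, {"news", "sport"}, limit=4)
ranked = rank(candidates, score=lambda item: item["score"], keep=3)
final = post_process(ranked, blocked_ids={"b"}, slots=2)
print([item["id"] for item in final])
```

Each step only narrows or reorders the output of the previous one, which is why a rule applied in the last step can never surface an item that candidate generation did not select.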

Ranking is based on scores which are optimized according to a desired objective. In commercial settings, this is often engagement, which may be measured by different metrics, depending on the type of platform and the signal which is deemed to be the most indicative for the desired user behavior. In a great many cases this metric is click-through-rate. Beyond engagement, other priorities may impact the development of a recommender system, such as the need to provide users with a diverse set of content (Vrijenhoek, et al., 2021), to ensure users have a varied and engaging experience (Ziarani & Ravanmehr, 2021), or to satisfy other business needs. To that end, the recommender system may include a step to re-rank the set of candidates, optimizing this time for a different metric, such as diversity, novelty, or serendipity (Kaminskas & Bridge, 2016), or apply business rules which may filter out undesired content (Boididou, Sheng, Mercer Moss, & Piscopo, 2021).
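
To make the re-ranking idea concrete, the sketch below greedily rebuilds a list while trading relevance against redundancy, in the style of maximal marginal relevance. It is one possible technique among many, not a description of any BBC system; the `genre` and `relevance` fields, the same-genre similarity measure, and the `trade_off` parameter are assumptions chosen for illustration.

```python
from typing import List


def rerank_for_diversity(items: List[dict], k: int, trade_off: float = 0.7) -> List[dict]:
    """Greedy re-ranking: relevance minus a penalty for repeating a selected genre.

    trade_off = 1.0 reproduces the pure relevance ordering; lower values
    favor genre variety more strongly.
    """
    selected: List[dict] = []
    remaining = list(items)
    while remaining and len(selected) < k:

        def marginal_value(item: dict) -> float:
            # Redundancy is 1.0 if an already selected item shares the genre.
            redundancy = max(
                (1.0 if item["genre"] == chosen["genre"] else 0.0 for chosen in selected),
                default=0.0,
            )
            return trade_off * item["relevance"] - (1 - trade_off) * redundancy

        best = max(remaining, key=marginal_value)
        selected.append(best)
        remaining.remove(best)
    return selected


items = [
    {"id": "n1", "genre": "news", "relevance": 0.90},
    {"id": "n2", "genre": "news", "relevance": 0.85},
    {"id": "n3", "genre": "news", "relevance": 0.80},
    {"id": "s1", "genre": "sport", "relevance": 0.60},
    {"id": "d1", "genre": "drama", "relevance": 0.50},
]

diverse = rerank_for_diversity(items, k=3)
by_relevance = rerank_for_diversity(items, k=3, trade_off=1.0)
print([i["id"] for i in diverse], [i["id"] for i in by_relevance])
```

With the default trade-off the three slots go to three different genres, whereas the pure relevance ordering fills them all with news items.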

2.3 Socio-technical challenges of recommender systems

2.3.1 Why socio-technical?

In the Introduction, we used the term “socio-technical” to describe how we develop recommenders at the BBC. This section explains what we mean by that term. We work from a socio-technical perspective in the sense that the social interactions between disciplines and the organizational requirements together enable the development of recommender systems at the BBC, and the technological artifacts result from those circumstances (Piscopo A., 2019). This intertwinement of technical solutions with social and organizational elements determines the shape of the recommender systems developed by our team. In other words, a combination of policies, editorial guidelines, and business objectives determines the problem space of our recommenders and, in turn, the technical approaches available to us.

With that in mind, we can see our team as part of a larger system, the BBC, which is itself part of an even larger one, i.e., its audiences and the other organizations and governing bodies that determine its structure and policies. These systems are extremely complex and, although we recognize that they may influence our team and its activities, we limit the scope of the current paper to the product manager, editors, and data scientists who specifically work on developing recommenders.

2.3.2 Recommender systems in context

The debate about the ethical, social, and legal aspects of recommender systems is ongoing. Recommender systems, such as those commonly used in social media (Huszár, et al., 2022) or online media platforms (Medveded, Wu, & Gordon, 2019) (Twitter, 2023), rank content higher to maximize a certain user behavior (e.g., engagement) based on user preference. By consistently ranking some content higher, recommenders may amplify that content, while reducing the visibility of other content (Huszár, et al., 2022).

Underlying biases in the input data and in the algorithm used may affect the fairness of recommendations, so that different categories of users (e.g., by gender, age, or ethnicity) receive recommendations of differing quality (Di Noia, Tintarev, Fatourou, & Schedl, 2022). Works such as (Mansoury, Abdollahpouri, Pechenizkiy, Mobasher, & Burke, 2020) have identified three stages in the recommendations lifecycle: (1) user interaction data is collected; (2) this data is fed into a recommendation algorithm; (3) the algorithm in turn generates the recommendations users interact with. These three stages create a feedback loop, where bias is created and amplified at each stage of the loop (Mansoury, Abdollahpouri, Pechenizkiy, Mobasher, & Burke, 2020). The authors of (Chen, et al., 2023) identify seven types of bias, depending on where they are generated in the feedback loop, their cause, and their effects: selection bias, exposure bias, conformity bias, position bias, inductive bias, popularity bias, and unfairness. For each of these types, they also outline the primary technical solutions (Chen, et al., 2023).

The effect of this feedback loop, where users discover and interact with content primarily through recommendations, has been called preference amplification: as users consume more recommended content, the algorithm interprets this as a positive signal, leading it to surface even more content of the same type. This loop may narrow users’ interests toward the recommended content (filter bubbles) and even reinforce their existing views (echo chambers) (Kalimeris & Bhagat, 2021).
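
A deliberately simplified, deterministic simulation can illustrate how such a loop concentrates attention. It is a toy model built on stated assumptions, not a result from the cited works: every item is assumed to be equally appealing, the recommender always shows the most-clicked items, and each shown item gains exactly one click per round.

```python
from typing import List, Tuple


def simulate_feedback_loop(
    initial_clicks: List[int], slots: int, rounds: int
) -> Tuple[List[int], List[float]]:
    """Show the `slots` most-clicked items each round; every shown item gains a click.

    Because all items are assumed equally appealing, any growth in
    concentration is produced by the loop itself, not by item quality.
    """
    clicks = list(initial_clicks)
    top_share = []  # share of all clicks held by the top `slots` items, per round
    for _ in range(rounds):
        shown = sorted(range(len(clicks)), key=lambda i: clicks[i], reverse=True)[:slots]
        for i in shown:
            clicks[i] += 1
        top = sorted(clicks, reverse=True)[:slots]
        top_share.append(sum(top) / sum(clicks))
    return clicks, top_share


# One item starts with a single extra click; the rest are tied.
clicks, top_share = simulate_feedback_loop([3, 2, 2, 2, 2, 2], slots=2, rounds=20)
print(clicks)
print(round(top_share[0], 3), round(top_share[-1], 3))
```

In this toy setting a one-click head start is enough for two items to capture every subsequent click, while the other four are never shown again.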

In recent years, the possible effects of recommender systems on society have been recognized at the legislative level. Provisions around the role of and obligations connected to the use of recommender systems are included in the EU Digital Services Act (European Parliament; Council of the European Union, 2022), which stresses the duty of online platforms hosting recommenders to provide their users with enough information about how recommendations are created and with the ability to influence that (e.g., by opting out). In the U.S., a recent petition to the Supreme Court asks the judges to rule upon the question of whether online content providers should be deemed liable for amplifying harmful or potentially criminal content through the use of recommendation algorithms (Gonzalez v. Google; Samuelson, 2023). Such cases show that recommender systems may have unintended damaging effects on individuals and on society as a whole.

3 Problem Statement: Optimizing for what?

As a public service organization, the BBC clearly wants to avoid these individual and societal harms. But what, overall, are we optimizing for? For the BBC there is a temptingly simple answer—we are optimizing for our public service values.

In addition to the values and purpose discussed in Section 2.1, the BBC’s content must be “duly accurate,” provide a “range and depth of analysis and content” and adhere to “the highest editorial standards” (British Broadcasting Corporation, 2023). The latter cannot be interpreted without turning to the BBC’s editorial guidelines (British Broadcasting Corporation, 2023), where the list of values lengthens again: fairness, accuracy, impartiality, avoiding causing unjustifiable offense, and more.

Additionally, the BBC must consider legal obligations (e.g., preventing contempt of court or libel) and regulatory considerations: for example, the UK broadcast regulator Ofcom has recently published guidance about how the BBC’s mission and public purposes should apply to our digital products (Ofcom, 2023).

Helpfully, and unlike many organizations deploying recommenders, the BBC controls the content catalog we are recommending from—we operate a “closed” recommendation system (Jones, 2022). The content we recommend, made by or commissioned for the BBC, already reflects our values. Individual pieces of content are thoroughly edited and, in one way or another, are creative, support learning, or showcase the UK’s culture and values.

However, when content is aggregated automatically by recommender systems there may be unintended consequences: amplifying some content, down-weighting others, or displaying content that should not be shown to certain groups (e.g., children) or in particular contexts (e.g., when recommending particularly sensitive content). In addition, engagement, measured through click-through-rate, is only a very rough approximation of our definition of value and can hardly be a complete signal of the many goals our recommenders should pursue. This represents a significant methodological challenge: How should goals and values be encoded into the BBC’s recommendation algorithms?

Overlapping with both values and regulatory considerations are issues arising from the algorithmic approach itself. This is a socio-technical challenge, with both organizational and cultural factors to consider. The process we follow in the development of machine learning products influences the technical solutions created. With our Machine Learning Engine Principles (British Broadcasting Corporation, 2021), the BBC is committed to taking a responsible approach to the development and deployment of AI/ML, where issues such as bias and fairness must be considered. Doing so effectively requires bringing together a variety of experts and voices. Therefore, a second question for the BBC is: How should different points of view (those represented by editorial, data, and product) be brought together to ensure relevant domain knowledge is represented in our recommender systems?

4 A Polyphony of Voices

The approach we take to address the challenges outlined above is primarily based on the close collaboration of different professionals, namely data scientists, editorial managers, and product managers. While all working towards the same objective, i.e., delivering recommender systems for services across the BBC, they have distinct approaches and, to some extent, differing priorities, as in a polyphony. To convey this tension, in this section we represent the three viewpoints of these roles separately. We bring these perspectives together and discuss the implications of our approach in Section 5.

4.1 The data scientist’s voice

Data scientists (DS) drive the development and implementation of our recommender systems. They provide insights based on content and user data, define the technical requirements of each system, and scope out the possible solutions. This work may be broken down into different stages: problem definition; exploration; model selection; optimization; and model refinement. These stages can be partially mapped to the machine learning lifecycle in (Fazelpour & De-Arteaga, 2022): problem formulation in their pipeline matches our problem definition, while model design and development, and partially training data, match our model selection and optimization stages. The algorithm-informed decision phase from (Fazelpour & De-Arteaga, 2022) cannot be directly connected to any of the stages in our pipeline, but its underlying questions, i.e., “how does deployment affect beliefs, values, and configurations of social groups?”, are considered throughout our development process.

Figure 1: A diagram representing the different stages of our data science development process. Please note that the process may not always include all the stages in this diagram, and we may even go back to a previous step (e.g., in refining a model, we may need to go back to the implementation and optimization stage). This typically happens when improving existing recommenders.

After the problem definition phase, in which all functions take part, DS move on to the exploration stage. One of the primary goals of this step is to sift through recommendation approaches, identifying those which are fit for purpose, i.e., likely to improve on the performance of previous recommenders in terms of engagement, without compromising on other aspects. At this stage, DS rely primarily on offline accuracy metrics, e.g., NDCG or MAP, and on coverage metrics, and focus significantly on the technical aspects of the models.
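
For readers unfamiliar with these offline metrics, the sketch below computes NDCG at a cut-off `k` and catalog coverage. It uses the common logarithmic discount and, as a simplifying assumption, takes the ideal ordering over the same judged items that appear in the list; the function names and toy inputs are illustrative only.

```python
import math
from typing import List, Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: relevance discounted by log2 of the position."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the best possible ordering of the same items."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def catalog_coverage(recommendation_lists: List[List[str]], catalog_size: int) -> float:
    """Fraction of the catalog that appears in at least one recommendation list."""
    recommended = {item for recs in recommendation_lists for item in recs}
    return len(recommended) / catalog_size


# Relevance labels in the order the recommender returned the items.
print(round(ndcg_at_k([1, 0, 1], k=3), 4))
print(catalog_coverage([["a", "b"], ["b", "c"]], catalog_size=10))
```

A model can score well on NDCG while recommending only a small part of the catalog, which is why accuracy and coverage are read together when comparing approaches.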

For each algorithm they select, DS identify possible trade-offs. These are analyzed in the model selection stage, where the aim is to select the algorithm which best represents a definition of quality as formulated by editors and product managers, reflects the BBC’s values and public purposes (see Section 2.1), and avoids harmful effects. Other considerations may also come into play, such as the amount of infrastructural change needed to implement the new approach and its running costs.

The objective of the optimization stage is to ensure that the chosen model generates recommendations that are relevant and engaging, but also reflect the audience’s diversity, have the necessary breadth and depth to help audiences understand and engage with the world, and are impartial. In our experience, the offline and online metrics normally used to measure the quality of recommendations fail to capture the complexity of these requirements. This means that there is no established set of metrics capturing our values that we can optimize our models towards. Hence, at present we ensure our recommenders meet a benchmark for editorial acceptance. One approach we have taken to achieve that, and to foster wider collaboration with editorial, has been to build an internal web-based assessment tool to facilitate the collection of feedback. This tool allows users to evaluate individual recommended items by means of a custom-designed qualitative scale and free text. In early prototypes, this tool uses a snapshot of our engine that includes only an editorially selected list of content and generates recommendations within these constraints.

Either after or alongside the optimization stage, DS work on model refinement, where we incorporate further editorial advice, especially with the aim of avoiding harmful associations of content, potential legal infringements, and anything that may lead to reputational damage to the BBC. The most common approach is to re-rank or subset candidate items (as per step 3 from Section 2.2). Behavioral tests are added to our code to ensure that quality remains constant and that all business rules are applied correctly. These two stages (i.e., optimization and model refinement) are those where collaboration with editorial and product managers is closest, with frequent catch-ups and feedback sessions.
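
As an illustration of how business rules and behavioral tests can fit together, the sketch below filters a ranked list and then checks invariants on the result. The three rules (never recommend the seed item, drop editorially excluded items, show only child-safe items next to children's content) and the field names are hypothetical examples, not the BBC's actual rule set.

```python
from typing import List


def apply_business_rules(ranked: List[dict], seed: dict, max_items: int) -> List[dict]:
    """Subset a ranked list using illustrative rules, preserving the ranking order."""
    kept = []
    for item in ranked:
        if item["id"] == seed["id"]:
            continue  # never recommend the item the user is already on
        if item["excluded"]:
            continue  # editorially excluded content
        if seed["for_children"] and not item["child_safe"]:
            continue  # children's content only leads to child-safe items
        kept.append(item)
    return kept[:max_items]


def check_behaviour(recs: List[dict], seed: dict) -> bool:
    """Behavioral test: the invariants must hold whatever model produced the ranking."""
    assert all(rec["id"] != seed["id"] for rec in recs)
    assert all(not rec["excluded"] for rec in recs)
    if seed["for_children"]:
        assert all(rec["child_safe"] for rec in recs)
    return True


seed = {"id": "kids-1", "for_children": True}
ranked = [
    {"id": "kids-1", "child_safe": True, "excluded": False},
    {"id": "doc-9", "child_safe": False, "excluded": False},
    {"id": "kids-2", "child_safe": True, "excluded": False},
    {"id": "quiz-4", "child_safe": True, "excluded": True},
    {"id": "nature-7", "child_safe": True, "excluded": False},
]

recs = apply_business_rules(ranked, seed, max_items=3)
print([rec["id"] for rec in recs], check_behaviour(recs, seed))
```

Because the checks are written against the output rather than the model, they can stay unchanged when the underlying algorithm is swapped or retrained.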

Our data science development process may not always include all these stages, and we may even go back to a previous step, if we have evidence that the approach we have taken does not lead to good results. This is part of an iterative approach, where we try to continuously improve the performance of our models or address their shortcomings, e.g., by implementing an approach to reduce popularity bias into one of our personalized recommenders, through a technique called Inverse Propensity Scoring (Yang, et al., 2018).
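
The intuition behind inverse propensity scoring can be sketched as follows: each interaction is weighted by the inverse of the estimated probability that its item was exposed, so that interactions with rarely shown items count for more. This is a generic illustration, not the estimator of (Yang, et al., 2018) nor our production code; estimating propensity from relative popularity, the `power` exponent, and the `min_propensity` floor are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def propensity_weights(
    interactions: List[Tuple[str, str]], power: float = 0.5, min_propensity: float = 0.05
) -> Dict[str, float]:
    """Return a weight per item equal to 1 / estimated exposure propensity.

    Propensity is approximated as (item count / max item count) ** power and
    floored at `min_propensity` so that very rare items cannot dominate.
    """
    counts = Counter(item for _user, item in interactions)
    if not counts:
        return {}
    most_popular = max(counts.values())
    weights = {}
    for item, count in counts.items():
        propensity = max((count / most_popular) ** power, min_propensity)
        weights[item] = 1.0 / propensity
    return weights


# Sixteen interactions with a popular item, one with a niche item.
interactions = [(f"user{i}", "hit") for i in range(16)] + [("user0", "niche")]
weights = propensity_weights(interactions)
print(weights)
```

These weights could then multiply each interaction's contribution to a training loss or an offline metric, reducing the advantage popular items gain simply from being shown more often.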

4.2 The product manager’s voice

Product managers lead on defining strategy and objectives, identifying new product opportunities, and ensuring teams have the context needed to build successful products. This includes defining a product that overcomes the ‘Four Big Risks’ associated with building successful products: feasibility, usability, viability, and value (Cagan, 2017). Product managers often focus most on the latter two. Viability is whether a solution fits with various aspects of the business; at the BBC, this includes complying with our Editorial Guidelines. Value is whether a customer would pay for or choose to use a product. In the case of the BBC, funded by a license fee paid by UK households, the relationship between payment and use of the product is complex. We typically find it more useful to consider whether a customer will choose to use the product and accept the opportunity cost, primarily time that could be spent elsewhere.

One way to view recommendations in the context of the BBC is as a means to maximize the value each user gets from our service. The BBC has a wealth of rich high-quality content, which we know our users love. However, they often struggle to find everything that may be useful or interesting to them. This is the problem recommendations aim to solve.

The most naïve approach to tracking our progress in solving this problem is to measure and optimize for engagement with content recommendations (measured through click-through-rate). Although it is limited with regard to understanding the aspects of user consumption which relate to our public purposes, this is the current default approach in our live systems. This metric only measures impact within a single session, though, and optimizing for it may lead to showing users content they already know they want or are already highly aware of. Another, more advanced, approach would be to measure and optimize for retention, i.e., whether seeing or interacting with a recommendation affects the likelihood of a user returning within a particular time frame. Spotify has developed an approach based on a metric capturing the time it takes for a user to become inactive for an entire week (Chandar, et al., 2022). While retention may provide a more meaningful signal as to whether our recommenders achieve our goals in a live system, it would likely still return a limited picture of their effects. Working with data scientists to define the desired user experience, and how we determine whether it has been successful, will continue to be crucial.
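
The difference between the two signals can be made concrete with a small sketch. Click-through-rate looks only at what happens when recommendations are shown, whereas a simple retention measure asks whether users came back afterwards. The seven-day window, the data layout, and the function names are assumptions for illustration; this is not the metric described in (Chandar, et al., 2022).

```python
from datetime import date, timedelta
from typing import Dict, List


def click_through_rate(impressions: int, clicks: int) -> float:
    """Clicks divided by impressions; 0.0 when nothing was shown."""
    return clicks / impressions if impressions else 0.0


def returned_within(visits: List[date], exposure_day: date, window_days: int) -> bool:
    """True if the user visited on a later day, no more than `window_days` after exposure."""
    deadline = exposure_day + timedelta(days=window_days)
    return any(exposure_day < day <= deadline for day in visits)


def retention_rate(
    visits_by_user: Dict[str, List[date]], exposure_day: date, window_days: int = 7
) -> float:
    """Share of exposed users who returned within the window."""
    if not visits_by_user:
        return 0.0
    returned = sum(
        returned_within(visits, exposure_day, window_days)
        for visits in visits_by_user.values()
    )
    return returned / len(visits_by_user)


exposure = date(2024, 1, 1)
visits_by_user = {
    "u1": [date(2024, 1, 1), date(2024, 1, 3)],   # returned after two days
    "u2": [date(2024, 1, 1)],                     # never returned
    "u3": [date(2024, 1, 1), date(2024, 1, 20)],  # returned, but outside the window
    "u4": [date(2024, 1, 8)],                     # returned on the last day of the window
}

print(click_through_rate(impressions=200, clicks=10))
print(retention_rate(visits_by_user, exposure))
```

A fair comparison would also need a group of users who did not see the recommendation, since a return visit on its own does not show that the recommendation caused it.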

Another challenge particular to the BBC is our obligation to universality. We need to serve value to everyone in the UK, rather than prioritizing an easier-to-serve subsection. This sets us apart from our competitors, who would typically choose a target market that is smaller and more homogeneous than the entire UK population. This is not a problem we have good solutions for, but something we are very aware of. Among many other benefits, this is a key reason for collaborating closely with editors, as they are experts in our obligations and how to meet them.

4.3 The editor’s voice

Editors provide the domain knowledge which drives the data scientists’ work with regard to the quality of recommendations and their adherence to editorial guidelines. For editorial colleagues there are inherent risks to any automated experience, whether it is a personalized curation, related content, topics, or something else. Building trust and good communication, and working towards a productive relationship with editorial as equal partners, is the best way to manage viability risk in this area.

The complexity of the BBC’s output and purposes is such that we will need to leverage the intelligence of humans and combine it with the colossal power of machines. The BBC is an editorial organization and we have learned that output from our recommenders is most in line with our values when we value human skills, such as editorial judgement and the specialist knowledge of content creators, curators, and journalists, in the process of development.

In-depth editorial knowledge is essential as so many factors can have an impact on tailoring and adjusting algorithms: the length of the text of an article, why editorial choose certain pieces of content to prioritize, production workflows, re-organization in editorial areas, content management systems, and the application of metadata. Insight into expertise like this can help with selecting the best data to train and test models.

‘Editorial colleagues’ include people who create content; curate homepages and collections; work with a specific format such as video, text, or audio; or work on a particular platform (TV, radio, online, social). They may be international, domestic, commercial, or public service. They may work in a different time zone or work unusual shift patterns. They may have very specific requirements and priorities to keep their work compliant. Usually, we will work with a subset of editorial colleagues who are digital specialists and have a bridging role between editorial and technical teams.

In assessing the output of recommender systems, these editorial colleagues oversee:

  • Compliance: keeping content safe from legal, editorial, and reputational risk.
  • Accuracy and harm: ensuring the content as surfaced in the product is as accurate, impartial, and relevant as it can be, and does not cause harm or offence, particularly when dealing with difficult content.
  • Representing BBC values.
  • Assessing the breadth and quality of the offer.

The continuous feedback of editorial colleagues helps teams fine-tune algorithms and develop effective business rules to manage recommender system outputs.

4.4 A collaborative approach

In the previous sections, we have provided an account of the specific challenges of developing recommender systems in a public service organization through the individual voices of a data scientist, a product manager, and an editor. We bring these three voices together in this section.

The main challenges we face in developing recommender systems consistent with our public service remit are of a socio-technical nature (Section 3). We need not only to devise technological solutions that embed the BBC’s values and public purposes in our recommenders, but also to determine suitable processes which bring together the different perspectives existing within the team and the whole organization. While we still use naïve engagement metrics to measure the success of our recommenders after they go live, data scientists, editors, and product managers work collaboratively throughout the development pipeline of our recommenders.

At the exploration stage all disciplines take part in a kick-off meeting to ensure that everyone understands the importance and role of the others. Explainability and transparency are among the keys to collaboration and building trust, so everyone continues to meet regularly throughout this discovery phase. Editors and data scientists keep decision logs of editorial issues and business rules, and their technical implementations. Data scientists document how a model works and what parameters affect its output.

At the model selection stage, the collaboration between the three parties becomes closer. The decision to move forward with the implementation of a new algorithm is made in concert with editorial and product management colleagues from the service where the new recommender will be deployed (e.g., iPlayer or Sounds), who evaluate whether the trade-offs implied by the algorithm (e.g., higher accuracy but lower coverage) are compatible with their priorities.

Moving to the optimization stage, editorial colleagues may conduct a content audit to assess the volume and breadth of content, how it is labelled with signals the machine can interpret, such as headlines, genres, and other metadata, and any niche or priority topics. Editors then compile a list of content that will deliberately stress test the recommender, such as content that contains political, legal, or distressing issues. The date, location, language, and priority of the content are also considered. Meanwhile, product managers attempt to understand whether recommenders fully satisfy user needs or the goals of the organization and, based on that, help steer optimization efforts and define success measures for the work. Data scientists translate feedback from both sides into technical solutions.

Sample content is then fed into a prototype recommender to begin the model refinement stage. Editors review its output via our internal web-based assessment tool. Based on the feedback received, in the form of scores and free text, data scientists adjust the algorithm and apply business rules which may filter out undesired content to improve the output. This process is repeated over several sessions until the scores reach a certain threshold and the recommender is deemed ready to be deployed. We then move to a monitoring phase, where editorial and product can monitor recommendations from a live endpoint. We deliver recommendations to live audiences only if our engine performs consistently well during the monitoring period, i.e., it achieves a quality threshold for n weeks in a row. At this final stage, data scientists, editors, and product managers collaborate on a release plan before making the new recommender available to the audience.
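
The release criterion described above (a quality threshold met for n weeks in a row) can be expressed as a simple check over weekly aggregate scores. The scale, the threshold value, and the number of weeks are not specified here, so the figures in this sketch are placeholders and the function name is hypothetical.

```python
from typing import Sequence


def ready_for_release(
    weekly_scores: Sequence[float], threshold: float, required_weeks: int
) -> bool:
    """True if the most recent `required_weeks` scores all meet the threshold."""
    if required_weeks <= 0 or len(weekly_scores) < required_weeks:
        return False
    return all(score >= threshold for score in weekly_scores[-required_weeks:])


# Placeholder weekly averages of editorial review scores, oldest first.
print(ready_for_release([3.1, 4.2, 4.0, 4.5], threshold=4.0, required_weeks=3))
print(ready_for_release([4.5, 3.9, 4.2, 4.4], threshold=4.0, required_weeks=3))
```

Requiring consecutive weeks means a single weak week resets the count, which favors recommenders whose quality is stable rather than occasionally high.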

5 What Does “Personalization” Mean at the BBC?

This section addresses a key question, which is perhaps overdue: What does personalization mean at the BBC? This is an important question for the organization and, far from being exhaustive, the following paragraphs summarize some of our current thoughts about the topic.

As a starting point, we could look at the issue from a utilitarian perspective: The BBC is primarily licence-fee funded, which allows the organization to remain free of advertisements and independent of shareholder and political interest, but at the same time it binds us to ensure that each and every one of our audience members gets value for money (British Broadcasting Corporation, 2023).

Following this perspective, personalization allows each user to get (ideally) exactly what they are interested in and what they are likely to consume, thereby attaining value and increasing their total usage of BBC products. This aim must be balanced against, and coexist with, the BBC’s mission and public service purposes.

Starting with a more general definition of personalization, in theory, and on the most simplistic of levels: if something is made for you, through being adapted or served to you uniquely, then it is regarded as being personalized (Evans, Daniel, & Jaron, 2023). However, in practice and on a discipline level, personalization lacks a universally and consistently shared definition, which is not a problem unique to the BBC. Personalization is a multifaceted, complex, and nuanced concept whose interpretation varies by discipline. Where data science focuses heavily on tailored recommendations (often based on consumption history), social sciences bring language, meaning, and culture into the mix. The discrepancies in definitional understanding are not problematic per se, but become complicated at the point where the differing views and priorities conflict. While product development often aims to increase consumption and give users more of what they like, editorial priorities lie in curating a universal experience that suits a wide, diverse audience base. These goals might seem at odds with each other: striving for more personal relevance versus offering diversity and inclusivity.

Data scientists have objectives that serve either product or editorial goals. Metrics play a critical role: if click-through rate is the measure of success, then creating algorithms that optimize the likelihood of users clicking on recommended content becomes the path forward. Editorial priorities might be regarded as the antithesis of personalization: the precedence given to a universal, diverse, and sometimes shared experience is the opposite of a personally relevant, interest-based, and often unique experience. This tension in priorities is particular to public service: assimilating editorial rigor and public purposes into a model of personalization is at the heart of what public service organizations like the BBC are grappling with on both a conceptual and practical level. In practice, both editorial and product perspectives can be represented by having multiple algorithms that amplify different priorities: for example, one that prioritizes content for the individual alongside another that shows content representing editorial values, chosen by editorial colleagues and ordered by an algorithm. In any case, these varying priorities reflect the unique objectives that different disciplines bring to the concept of personalization, adding further complexity to the challenge described in our paper.

6 Bringing All Voices Together

Our approach to address the challenges outlined above is primarily through the close collaboration of different roles. Working like this enables us to align with our organizational, public service and editorial values. It also means that we are creating a collaborative culture and positive working environment that helps us deliver products and services more efficiently. Data scientists aim to create the most technically accurate recommenders, while product management focuses on maximizing user satisfaction, and editorial prioritizes adhering to a set of standards, akin to acting in the public's best interest.

It is when these three voices are unbalanced that trade-offs or compromises can distort outcomes, biasing solutions towards narrower notions of success. In this paper, we show that by focusing only on developing recommenders with the highest technical excellence, we might compromise editorial values such as equity, while by opting for the most diverse set of recommendations, we might negatively impact user engagement. Given the unique set of objectives and challenges that comes with each discipline, trade-offs will ultimately need to be made. An algorithmically driven future where the value of engagement outranks long-standing values defined by editorial staff risks a loss of diversity in content (e.g., Bernstein, et al., 2021), of equity for content creators (Mehrotra, McInerney, Bouchard, Lalmas, & Diaz, 2018), and of overall balance of viewpoints in news (Allcott, Gentzkow, & Song, 2022; Bernstein, et al., 2021). It is clear that, unless explicitly built into a system, editorial or public service values can be disregarded in favor of technical and product goals. These tensions extend beyond public service and affect all data-driven systems: optimizing for clicks versus values, for reach versus breadth, or for user versus public service values.

Our collaborative approach leverages a diversity of viewpoints to develop our recommenders, bringing together people with different skills and task-relevant knowledge. These complementary mindsets are represented, albeit to different extents, at all stages of our machine learning pipeline. Moreover, although we have not empirically measured the effects of team diversity on our output, previous studies seem to back up our solution. Cognitive diversity may be epistemically beneficial for groups where individual strengths are complementary, especially in complex tasks that require innovation and creativity (Fazelpour & De-Arteaga, 2022). At the same time, diversity may also entail various trade-offs, e.g., between reliability of results and speed, which is consistent with our own experience (Fazelpour & De-Arteaga, 2022). Another trade-off entailed by having a cognitively diverse team is the need to find a ‘common language’ to communicate across roles. The terminology and concepts data scientists are familiar with differ from those of their editorial colleagues, and the same applies between editorial and product management, and so on, which creates friction. To smooth that friction, efforts have started within the organization to increase both data literacy and editorial knowledge among all staff.

With respect to our end product, i.e., recommender systems, it must be considered that they do not live in isolation within each of our services, but are part of complex ecosystems where algorithmically and manually selected content co-exist. Some of the solutions we adopt in our estate even mix the two approaches, applying personalization to lists of editorially selected content. Consequently, the objectives mentioned in Section 3 are to be considered holistically, e.g., aiming to serve a diverse set of content across multiple rails (e.g., in BBC Sounds), rather than focusing on diversity within each rail that makes use of recommendations. Besides, what counts as good or even valuable depends on the perspective of whoever the outcome benefits, which in turn brings different value sets and success criteria. This was shown in recent research on the multiple perspectives involved in defining human values in recommender systems (Stray, et al., 2022), where bringing together perspectives from social science, ethics, policy, and law, in addition to technical knowledge, resulted in over 30 value statements, which ranged from privacy, wellbeing, and fairness to inspiration, connection, and freedom of expression, among others. The authors concluded that differing definitions of success eventually lead to differences in outcomes, trade-offs, and sometimes conflict (Stray, et al., 2022).
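The difference between assessing diversity within each rail and across a whole page can be illustrated with a small sketch. It assumes, purely for illustration, that diversity is counted as the number of distinct genres and that every item carries a single genre label; the rail names, item identifiers, and function names are hypothetical.

```python
from typing import Dict, List, Set


def genres_in(items: List[str], genre_of: Dict[str, str]) -> Set[str]:
    """Distinct genres among the given items (items without a genre are skipped)."""
    return {genre_of[item] for item in items if item in genre_of}


def per_rail_genre_counts(rails: Dict[str, List[str]], genre_of: Dict[str, str]) -> Dict[str, int]:
    """Number of distinct genres within each rail, considered on its own."""
    return {name: len(genres_in(items, genre_of)) for name, items in rails.items()}


def page_genre_count(rails: Dict[str, List[str]], genre_of: Dict[str, str]) -> int:
    """Number of distinct genres across all rails on the page taken together."""
    seen: Set[str] = set()
    for items in rails.values():
        seen |= genres_in(items, genre_of)
    return len(seen)


# Hypothetical page with three rails, each internally homogeneous.
genre_of = {"p1": "news", "p2": "news", "p3": "comedy", "p4": "comedy", "p5": "music"}
rails = {"rail_1": ["p1", "p2"], "rail_2": ["p3", "p4"], "rail_3": ["p5"]}

print(per_rail_genre_counts(rails, genre_of))  # one genre per rail
print(page_genre_count(rails, genre_of))       # three genres on the page as a whole
```

Here every rail looks undiverse when judged in isolation, yet the page as a whole offers three genres, which is the holistic view described above.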

Finally, some of our approaches are still naïve. First, once our recommenders are deployed live, we still measure their success in terms of engagement (CTR). Although we are currently working towards adopting more nuanced measures, we are aware that this will be a long process, given the organizational and infrastructural constraints of a large organization like the BBC. Second, we codify some values and considerations into business rules, which blocklist certain types of content or up-/down-weight others. We aim to move towards a more probabilistic approach, with a more nuanced understanding of content, which would allow us to reduce the likelihood of certain items appearing in contexts where they would be inappropriate while not affecting other aspects. For those values that are not amenable to definition through business rules, we need to think about whether and how we might measure, evaluate, and optimize what we value, a challenge we are only beginning to address. Third, our efforts towards algorithmic transparency are still in their infancy and, although there have been isolated experiments in the team, we still have no way to make explicit to users how their recommendations are computed.
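As an illustration of the rule-based approach just described, the sketch below applies a blocklist and multiplicative up-/down-weights to a scored candidate list. It is a simplified example rather than our production logic: the `Item` structure, the tag names, and the weight values are hypothetical, and we assume that rules are keyed on content tags.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Item:
    item_id: str
    score: float           # relevance score produced by the recommender
    tags: FrozenSet[str]   # editorial or metadata labels attached to the item


def apply_business_rules(
    items: List[Item], blocked_tags: Set[str], tag_weights: Dict[str, float]
) -> List[Item]:
    """Drop items carrying a blocked tag, reweight the rest, and re-rank."""
    kept = []
    for item in items:
        if item.tags & blocked_tags:
            continue  # blocklisted content never reaches the audience
        weight = 1.0
        for tag in item.tags:
            weight *= tag_weights.get(tag, 1.0)  # untagged rules leave the score unchanged
        kept.append(Item(item.item_id, item.score * weight, item.tags))
    return sorted(kept, key=lambda item: item.score, reverse=True)


# Hypothetical candidates and rules.
candidates = [
    Item("a", 0.9, frozenset({"distressing"})),
    Item("b", 0.8, frozenset({"entertainment"})),
    Item("c", 0.5, frozenset({"local"})),
]
ranked = apply_business_rules(
    candidates,
    blocked_tags={"distressing"},
    tag_weights={"local": 2.0, "entertainment": 0.5},
)
print([item.item_id for item in ranked])
```

Rules of this kind are easy to audit and explain, but they act on whole categories of content regardless of context, which is the limitation a more probabilistic approach would aim to address.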

7 Conclusion

This paper describes how we develop recommender systems which support the BBC’s values and public purposes, while providing relevant and engaging content for our audiences. We identified two challenges in delivering on that goal: 1) How should goals and values be encoded into the BBC’s recommendation algorithms? and 2) How should different points of view (editorial, data, and product) be brought together to ensure the relevant domain knowledge is represented in our recommender systems? These challenges are not specific to the BBC; they affect any value-based organization. Rather than being merely technical challenges, they are socio-technical in nature, and so is our answer to them. Our approach brings together the perspectives of different roles, i.e., data science, editorial, and product management, at every stage of development, to ensure they are reflected in the final product. We have devised processes and tools that allow us to integrate the editorial point of view into our recommendations; these help us select accurate models, while ensuring that they follow our editorial guidelines and avoid harmful effects.

We are aware that our approach, especially in terms of technical solutions, still has room for improvement. We will invest effort in devising reliable measurements of the extent to which our recommender systems fulfill our public service values. Efforts in that direction have already started and we expect to be able to share widely the lessons learned from our experiment in future publications. At both the organization and team level, we are already taking steps to improve our process and make it more robust and future-proof. To that end, the recommendations team routinely collaborates with researchers, both inside (BBC R&D) and outside the BBC, and organizes secondments with the aim of exploring promising future solutions. Regarding transparency, we have started exploring approaches to explanations for recommendations, to provide greater transparency and agency to users. To that end, we have collaborated with academics and practitioners outside the organization to fill the current knowledge gaps within the team (Piscopo, Inel, Vrijenhoek, Millecamp, & Balog, 2023).

Furthermore, future plans include efforts to improve data maturity and data literacy, and to ensure appropriate governance and best practice in the introduction and implementation of algorithms and machine learning. Examples include the elevation and expansion of the data function within the BBC, including the appointment of a new director-level position, and the establishment of the pan-BBC Data Governance Committee, which includes data privacy and information security leaders and also elevates consideration of ethics and responsible ML/AI via the establishment of a Responsible ML/AI subgroup. The data science team will grow, allowing greater focus on the definition and operationalization of public service metrics. Finally, a Responsible ML/AI team will be established.

Creating recommender systems that fulfill the remit of public service organizations is a complex task. There will always be trade-offs, different perspectives, and tensions involved; we are not optimizing for a simple or a single thing. This is not unique to the BBC, as we can see from the surge in efforts to establish responsible AI and data-driven practices across the industry. However, the BBC is uniquely positioned in having both the luxury of defining success in ways that benefit society beyond pure user engagement and the obligation to balance a number of priorities that take into account public service values in addition to satisfying business and user needs. While non-public service entities might prioritize profit, the BBC seeks “quality” engagement and, as such, is in a strong position to redefine public service metrics for success.

Bibliography

Allcott, H., Gentzkow, M., and Song, L. (2022). Digital addiction. American Economic Review, 112(7), 2424-63.

Aridor, G., Goncalves, D., and Sikdar, S. (2020). Deconstructing the filter bubble: User decision-making and recommender systems. Proceedings of the 14th ACM Conference on Recommender Systems, (pp. 82-91).

Aydemir, F. B., and Dalpiaz, F. (2018). A roadmap for ethics-aware software engineering. Proceedings of the International Workshop on Software Fairness, (pp. 15-21).

Bernstein, A., De Vreese, C., Helberger, N., Schulz, W., Zweig, K., Heitz, L., and Tolmeijer, S. (2021). Diversity in News Recommendation. Dagstuhl Manifestos, 9(1), 43-61.

Boididou, C., Sheng, D., Mercer Moss, F. J., and Piscopo, A. (2021). Building Public Service Recommenders: Logbook of a Journey. Proceedings of the 15th ACM Conference on Recommender Systems, (pp. 538-540).

British Broadcasting Corporation. (2021, March 18). Across the UK. Retrieved April 2023, from Media Centre: https://www.bbc.co.uk/mediacentre/speeches/2021/across-the-uk

British Broadcasting Corporation. (2021, November 24). BBC on track to reach half a billion people globally ahead of its centenary in 2022. Retrieved April 2023, from Media Centre: https://www.bbc.co.uk/mediacentre/2021/bbc-reaches-record-global-audience

British Broadcasting Corporation. (2021). Responsible AI at the BBC: Our Machine Learning Engine Principles. Research and Development.

British Broadcasting Corporation. (2023). Charter and Agreement. Retrieved from About the BBC: https://www.bbc.com/aboutthebbc/governance/charter

British Broadcasting Corporation. (2023). Global news services. Retrieved from About the BBC: https://www.bbc.com/aboutthebbc/whatwedo/worldservice

British Broadcasting Corporation. (2023). Mission, values, and public purposes. Retrieved from About the BBC: https://www.bbc.com/aboutthebbc/governance/mission/

British Broadcasting Corporation. (2023, April 13). The BBC’s Editorial Values and Standards. Retrieved from Editorial Guidelines: https://www.bbc.co.uk/editorialguidelines/

British Broadcasting Corporation. (2023, April 12). The BBC’s services in the UK. Retrieved from About the BBC: https://www.bbc.com/aboutthebbc/whatwedo/publicservices

Cagan, M. (2017). Inspired: How to create tech products customers love. John Wiley & Sons.

Chandar, P., St. Thomas, B., Maystre, L., Pappu, V., Sanchis-Ojeda, R., Wu, T., . . . Jebara, T. (2022). Using Survival Models to Estimate User Engagement in Online Experiments. Proceedings of the ACM Web Conference 2022, (pp. 3186-3195).

Chen, J., Dong, H., Wang, X., Feng, F., Wang, M., and He, X. (2023). Bias and debias in recommender system: A survey and future directions. ACM Transactions on Information Systems, 41(3), 1-39.

Davidson, J., Liebald, B., Liu, J., Nandy, P., Van Vleet, T., Gargi, U., and Gupta, S. (2010). The YouTube video recommendation system. Proceedings of the fourth ACM conference on Recommender systems, 293-296.

Di Noia, T., Ostuni, V. C., Rosati, J., Tomeo, P., and Di Sciascio, E. (2014). An analysis of users' propensity toward diversity in recommendations. Proceedings of the 8th ACM Conference on Recommender Systems, (pp. 285-288).

Di Noia, T., Tintarev, N., Fatourou, P., and Schedl, M. (2022). Recommender systems under European AI regulations. Communications of the ACM, 65(4), 69-73.

European Commission. (2020). Proposal for a REGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL on a Single Market For Digital Services (Digital Services Act) and amending Directive 2000/31/EC, COM/2020/825 final.

European Parliament; Council of the European Union. (2022, October 19). Regulation (EU) 2022/2065 of the European Parliament and of the Council of 19 October 2022 on a Single Market For Digital Services and amending Directive 2000/31/EC (Digital Services Act) (Text with EEA relevance).

Fazelpour, S., and De-Arteaga, M. (2022, January-June). Diversity in sociotechnical machine learning systems. Big Data & Society, 9(1), 1-14.

Google. (2020). Recommendation Systems Overview. Retrieved April 2023, from Google Developers: https://developers.google.com/machine-learning/recommendation/overview/types

Huszár, F., Ktena, S. I., O’Brien, C., Belli, L., Schlaikjer, A., and Hardt, M. (2022). Algorithmic amplification of politics on Twitter. Proceedings of the National Academy of Sciences, 119(1).

Jannach, D., Zanker, M., Felfernig, A., and Friedrich, G. (2011). Recommender Systems: An Introduction. Cambridge University Press.

Jones, E. (2022). Inform, educate, entertain...and recommend? Ada Lovelace Institute.

Kalimeris, D., and Bhagat, S. K. (2021). Preference amplification in recommender systems. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, (pp. 805-815).

Kaminskas, M., and Bridge, D. (2016). Diversity, serendipity, novelty, and coverage: a survey and empirical analysis of beyond-accuracy objectives in recommender systems. ACM Transactions on Interactive Intelligent Systems (TiiS), 7(1), 1-42.

Lada, A., Wang, M., and Yan, T. (2021, January 26). How machine learning powers Facebook’s News Feed ranking algorithm. Retrieved from Engineering at Meta: https://engineering.fb.com/2021/01/26/ml-applications/news-feed-ranking/

Mansoury, M., Abdollahpouri, H., Pechenizkiy, M., Mobasher, B., and Burke, R. (2020). Feedback loop and bias amplification in recommender systems. Proceedings of the 29th ACM international conference on information & knowledge management, (pp. 2145-2148).

Medvedev, I., Wu, H., and Gordon, T. (2019, November 11). Powered by AI: Instagram’s Explore recommender system. Retrieved from Meta AI: https://ai.facebook.com/blog/powered-by-ai-instagrams-explore-recommender-system/

Mehrotra, R., McInerney, J., Bouchard, H., Lalmas, M., and Diaz, F. (2018). Towards a Fair Marketplace: Counterfactual Evaluation of the trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems. Proceedings of the 27th acm international conference on information and knowledge management (pp. 2243–2251). New York, NY, USA: ACM.

Nguyen, T. T., Hui, P. M., Harper, F. M., Terveen, L., and Konstan, J. A. (2014). Exploring the filter bubble: the effect of using recommender systems on content diversity. Proceedings of the 23rd international conference on World Wide Web, (pp. 677-686).

Ofcom. (2023). Modernising the BBC’s Operating Licence.

Piscopo, A., Inel, O., Vrijenhoek, S., Millecamp, M., and Balog, K. (2023, January). Report on the 1st Workshop on Measuring the Quality of Explanations in Recommender Systems (QUARE 2022) at SIGIR 2022. ACM SIGIR Forum, 56(2), 1-16.

Samuelson, P. (2023). A Legal Challenge to Algorithmic Recommendations. Communications of the ACM, 66(3), 32-34.

Schedl, M., Knees, P., McFee, B., Bogdanov, D., and Kaminskas, M. (2015). Music Recommender Systems. In Recommender Systems Handbook (pp. 453–92). Boston: Springer.

Stray, J., Halevy, A., Assar, P., Hadfield-Menell, D., Boutilier, C., Ashar, A., . . . Zha. (2022). Building Human Values into Recommender Systems: An Interdisciplinary Synthesis.

Twitter. (2023, March 31). Twitter's Recommendation Algorithm. Retrieved from https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm

Vrijenhoek, S., Kaya, M., Metoui, N., Möller, J., Odijk, D., and Helberger, N. (2021). Recommenders with a mission: assessing diversity in news recommendations. Proceedings of the 2021 conference on human information interaction and retrieval, (pp. 173-183).

Yang, L., Cui, Y., Xuan, Y., Wang, C., Belongie, S., and Estrin, D. (2018). Unbiased offline recommender evaluation for missing-not-at-random implicit feedback. Proceedings of the 12th ACM conference on recommender systems, (pp. 279-287).

Zhou, M., Ding, Z., Tang, J., and Yin, D. (2018). Micro Behaviors: A New Perspective in E-commerce Recommender Systems. Proceedings of WSDM 2018: The Eleventh ACM International Conference on Web Search and Data Mining (WSDM 2018). New York: ACM.

Ziarani, R. J., and Ravanmehr, R. (2021). Serendipity in recommender systems: a systematic literature review. Journal of Computer Science and Technology, 36, 375-396.

Acknowledgments

This paper primarily reflects the work done by the Datalab team over the last five and a half years. While it names only a handful of people as its authors, it is an account of a collective effort. All credit and our thanks go to every current and past member of Datalab.

 

© 2024, Alessandro Piscopo, Anna McGovern, Lianne Kerlin, North Kuras, James Fletcher, Calum Wiggins, and Megan Stamper.

 

Cite as: Alessandro Piscopo, Anna McGovern, Lianne Kerlin, North Kuras, James Fletcher, Calum Wiggins, and Megan Stamper, Recommenders With Values: Developing recommendation engines in a public service organization, 24-02 Knight First Amend. Inst. (Feb. 5, 2024), https://knightcolumbia.org/content/recommenders-with-values-developing-recommendation-engines-in-a-public-service-organization [https://perma.cc/DXK3-AYGV].