Unveiling User Satisfaction and Creator Productivity Trade-Offs in Recommendation Platforms

Fan Yao¹, Yiming Liao², Jingzhou Liu², Shaoliang Nie², Qifan Wang², Haifeng Xu³, Hongning Wang¹

¹Department of Computer Science, University of Virginia, USA
²Meta, USA
³Department of Computer Science, University of Chicago, USA

Abstract

On User-Generated Content (UGC) platforms, recommendation algorithms significantly impact creators’ motivation to produce content as they compete for algorithmically allocated user traffic. This phenomenon subtly shapes the volume and diversity of the content pool, which is crucial for the platform’s sustainability. In this work, we demonstrate, both theoretically and empirically, that a purely relevance-driven policy with low exploration strength boosts short-term user satisfaction but undermines the long-term richness of the content pool. In contrast, a more aggressive exploration policy may slightly compromise user satisfaction but promote higher content creation volume. Our findings reveal a fundamental trade-off between immediate user satisfaction and overall content production on UGC platforms. Building on this finding, we propose an efficient optimization method to identify the optimal exploration strength, balancing user and creator engagement. Our model can serve as a pre-deployment audit tool for recommendation algorithms on UGC platforms, helping to align their immediate objectives with sustainable, long-term goals.

1 Introduction

User-generated content (UGC) platforms have become an indispensable component of our daily lives (Bobadilla et al., 2013; Santos, 2022). These platforms, including social media (e.g., Facebook, Instagram), streaming services (e.g., YouTube, TikTok), and many more, rely on recommendation algorithms (Koren et al., 2009; Bobadilla et al., 2013) to help content consumers (i.e., users) navigate the vast ocean of content generated by creators. Unlike other content recommendation platforms such as Netflix and Spotify, user experience on UGC platforms critically relies on the active participation of creators (Zhuang et al., 2023), as the goal of enhancing user engagement is inherently linked to the abundance and diversity of the content pool.

Recent studies have begun to explore how a platform’s algorithmic decisions, such as its recommendation algorithms and revenue sharing agreements, might influence the behavior of content creators and subsequently affect user welfare (Yao et al., 2023, 2024b, 2024c; Jagadeesan et al., 2024; Hron et al., 2022; Immorlica et al., 2024; Hu et al., 2023; Prasad et al., 2023). A common technique used in these works is to model the competition among creators who strive to establish their brand and comparative advantages by selecting topics that maximize the traffic or rewards from the platform. While these competition models offer valuable insights into how platforms’ interventions could lead to suboptimal outcomes in terms of content diversity and issues related to popularity bias, they ignore the dimension of content creation volume, an equally critical aspect of such competition dynamics. In fact, there is evidence suggesting that the traffic a creator receives directly influences their productivity (Hu et al., 2024; Zeng et al., 2024). Hu et al. (2024) found that creators whose content received boosted exposure significantly increased their video production without compromising quality on a leading video recommendation platform, and the field experiment of Zeng et al. (2024) on a large-scale video sharing social network showed that while popularity-based recommendation strategies boost content consumption, they can reduce content production. Similar effects have also been documented on Instagram. For instance, when the recommender system allocates more traffic to targeted groups of creators, their production frequency increases, as indicated by metrics such as the number of daily active creators and the daily average creation volume per group. However, disproportionately boosting traffic to certain creator groups can negatively impact others; for instance, directing more traffic to popular, or “head”, creators often diminishes engagement from less prominent, or “tail”, creators. Overall, a statistically significant positive correlation exists between content viewership and the corresponding creator’s productivity. [Footnote 1: Further details regarding the Instagram findings are protected under a non-disclosure agreement (NDA) and may be disclosed in future versions of this work.]

Motivated by real-world evidence and the research gap in modeling production frequency within content creation competition, we introduce a new game-theoretical model, named Cournot Content Creation Competition ( $C^{4}$ ), aiming to study the impact of a platform’s recommendation strategy on creators’ production willingness. Our $C^{4}$ framework builds upon the Content Creator Competition ( $C^{3}$ ) framework introduced in Yao et al. (2023, 2024b, 2024c), assuming that creators compete for platform-allocated user traffic (Glotfelter, 2019; Hodgson, 2021). However, unlike $C^{3}$ and similar previous models, our approach models the competition where creators are aware of their expertise and consistently produce within their niche, but strategically adjust their production frequency to balance gain and cost, which is often observed on mature platforms such as YouTube and Instagram. Our proposed competition model resonates with the well-established Cournot competition (Cournot, 1838) in economics, where firms compete for revenue by strategically setting their production quantities. Hence, it inherits its name.

Our $C^{4}$ framework offers a powerful tool for analyzing the competition dynamics among creators, as it always yields a unique pure Nash equilibrium (PNE) that enables a precise prediction of the total content creation volume under any specific recommendation strategy. Furthermore, our in-depth analysis of $C^{4}$ ’s equilibrium reveals a critical and interesting insight: while increased recommendation accuracy boosts immediate user satisfaction, it simultaneously reduces creators’ motivation to produce content, potentially compromising long-term user engagement. This finding, supported by both theoretical analyses and simulations, suggests the necessity of balancing user and creator engagement through careful control of the recommendation algorithm’s exploration strength on a per-user basis. We formulate this mechanism design challenge as a bi-level optimization problem and tackle it using a projected gradient descent approach with an efficient gradient approximation scheme, providing an effective method to achieve the optimal trade-off between user satisfaction and creator productivity.

In summary, our contributions are threefold: (a) modeling-wise, we introduce a new game-theoretical framework, $C^{4}$ , to investigate how recommendation algorithms affect content creation frequency among creators; (b) conceptually, we reveal a new insight that managing the exploration strength of the recommendation algorithm can balance between short-term user satisfaction and long-term creator engagement at equilibrium; and (c) technique-wise, we reformulate the mechanism design problem of identifying the optimal engagement trade-off at the equilibrium into a solvable offline optimization problem, tackled using approximated gradient descent. Our $C^{4}$ framework and its derived solution serve as a pre-deployment audit tool for platforms, assessing the effects of algorithmic choices on creator and user engagement.

2 Related Work

The study of the online content creation economy has recently captured the attention of the machine learning community, leading to a diverse collection of models addressing the dynamics of content creator competition (Ghosh and McAfee, 2011; Ghosh and Hummel, 2013; Ben-Porat and Tennenholtz, 2018; Yao et al., 2024b, 2023; Zhu et al., 2023; Hu et al., 2023; Jagadeesan et al., 2024; Hron et al., 2022; Dean et al., 2024). In these models, creators strategically select the type (Xu et al., 2024), topic (Hron et al., 2022; Jagadeesan et al., 2024; Yao et al., 2023), or quality (Ghosh and McAfee, 2011; Hu et al., 2023) of their content, competing for resources such as traffic (Hron et al., 2022; Ben-Porat and Tennenholtz, 2017; Ghosh and Hummel, 2013), user engagement (Yao et al., 2023), or platform-provided incentives (Zhu et al., 2023; Yao et al., 2024b). Some models aim to explore the properties of creator-side equilibrium, investigating how creators specialize at equilibrium (Jagadeesan et al., 2024), the impact of creators’ strategic behaviors on social welfare (Yao et al., 2023), and the design of optimization methods for long-term welfare considering these behaviors (Ben-Porat and Tennenholtz, 2017, 2018; Yao et al., 2024b; Zhu et al., 2023; Hu et al., 2023; Immorlica et al., 2024; Mladenov et al., 2020). Unlike these works, which generally assume an equally paced creation frequency among creators, our proposed $C^{4}$ games consider scenarios where creators strategically control their creation quantities, allowing us to analyze how recommendation algorithms influence overall content creation volume at the equilibrium.

Our model of the platform’s recommendation algorithm draws inspiration from the proportional allocation concept in game theory, applicable to resource distribution (Caragiannis and Voudouris, 2016; Nguyen and Vojnovic, 2011) and contest design (Tullock, 1980). We assume that each user contributes a unit of traffic, which is allocated to creators based on both their merit and their effort (content creation frequency). This modeling is closely related to the Tullock contest (Tullock, 1980), also known as the lottery contest, where the probability of winning a fixed prize is proportional to the effort expended relative to the total effort of all contestants. While the Nash equilibrium of the one-dimensional Tullock contest with homogeneous costs is well understood (Ewerhart, 2015, 2017), our work extends this framework to include heterogeneous convex costs. A recent study (Yao et al., 2024a) also explored competition between human and Generative AI creators within a similar setup, examining its impact on total content creation volume. However, it did not address the influence of different recommendation algorithms, which we investigate in our work.

In a broader sense, our work contributes to a line of research evaluating the impact of recommender systems on individuals, specifically exploring how deployed algorithms shape user and content creator behavior (Dean et al., 2024; Cen et al., 2023, 2024; Lin et al., 2024; Dean and Morgenstern, 2022; Kalimeris et al., 2021; Eilat and Rosenfeld, 2023) and how we can design new algorithms to address these effects (Carroll et al., 2021; Yao et al., 2022a; Brantley et al., 2024; Yao et al., 2022b; Biyik et al., 2023; Prasad et al., 2023; Agarwal and Brown, 2023). Our study introduces a key insight: recommender algorithms optimized solely for user satisfaction can unintentionally reduce content creators’ willingness to engage, thereby impacting long-term user engagement. We address this challenge by proposing a solution that balances creator engagement and user satisfaction through imposing exploration strengths tailored to individual users.

3 The Cournot Content Creation Competition

In this section, we introduce the formulation of Cournot Content Creation Competition ( $C^{4}$ ), which models the competition among creators for traffic on UGC platforms. This model captures the potential impact of the platform’s traffic reallocation mechanism, such as its deployed recommendation algorithm, under which creators strategically choose their production frequencies to optimize their allocated traffic. Each $C^{4}$ game instance $\mathcal{G}$ is characterized by a tuple $(n,m,M,\{c_{i}\}_{i=1}^{n},\{\beta_{j}\}_{j=1}^{m})$ . We detail each component of this tuple as follows:

1. Basic setups. There is a set of content creators denoted by $[n]=\{1,\cdots,n\}$ , and a set of users denoted by $\{u_{j}\}_{j=1}^{m}$ . We assume each user $u_{j}$ has a stable preference over creators, and this relationship is captured by an $n$ -by- $m$ matrix $M$ whose $(i,j)$ -th entry $w_{ij}\in[0,1]$ denotes the strength of user $u_{j}$ ’s preference for creator $i$ ’s content. Each creator determines a production frequency $x_{i}\in\mathbb{R}_{\geq 0}$ in a unit amount of time (e.g., one week or one month). For the purpose of our analysis, $x_{i}$ can be interpreted interchangeably as either the production frequency or volume, provided there is no ambiguity. [Footnote 2: For the elegance of our theoretical analysis, we treat $x_{i}$ as a continuous variable, although the key messages and insights of this paper are preserved if $x_{i}$ is discrete.] Following the terminology of the game theory literature, $x_{i}$ is referred to as the action or pure strategy of creator $i$ . Each creator $i$ is associated with a cost function $c_{i}$ , which characterizes the cost of creating content at frequency $x_{i}$ . We make two assumptions about $c_{i}$ : (i) $c_{i}$ is increasing in $x_{i}$ and $c_{i}(x_{i})\rightarrow+\infty$ as $x_{i}\rightarrow+\infty$ , which reflects that content creation is never free; (ii) $c_{i}$ is convex in $x_{i}$ , indicating a non-decreasing marginal cost of increasing production frequency.
2. Platform intervention. We assume that each user $u_{j}$ contributes a unit amount of traffic, and the platform redistributes the total user traffic based on a relevance-based recommendation algorithm that adheres to a certain probabilistic principle. Specifically, the recommendation algorithm matches $u_{j}$ to each piece of content produced by creator $i$ with a probability proportional to $\exp(\beta_{j}\sigma_{ij})$ , where $\sigma_{ij}=w_{ij}+\epsilon_{ij}$ represents the algorithm’s estimated relevance score. This score combines the true preference score $w_{ij}$ with independent Gaussian noise $\epsilon_{ij}\sim\mathcal{N}(0,\sigma^{2})$ . The parameter $\beta_{j}\geq 0$ governs the exploration of the matching for user $u_{j}$ : a higher $\beta_{j}$ results in a more precise matching, while a lower $\beta_{j}$ introduces more exploration into the matching results. [Footnote 3: The randomness in matching results may stem from either the imperfect estimation of the preference score $w_{ij}$ or intentionally injected exploration strength based on the intervention mechanism. In this paper, we focus on the latter source of randomness and analyze how such exploration strength might affect the outcomes.] In this work, we analyze intervention mechanisms that operate under this Personalized Probabilistic Matching (PPM) principle, parameterized by $\bm{\beta}=\{\beta_{j}\}_{j=1}^{m}$ , and refer to it as PPM( $\bm{\beta}$ ) for ease of notation.

3. Creator utility. Creators are reward-seeking individuals who try to maximize the expected traffic for their created content while carefully balancing the costs. Under PPM( $\bm{\beta}$ ) and when each creator $i$ produces $x_{i}$ pieces of content, the platform allocates to $i$ an amount of traffic from $u_{j}$ proportional to $\mathbb{E}_{\epsilon_{ij}\sim\mathcal{N}(0,\sigma^{2})}\left[x_{i}e^{\beta_{j}(w_{ij}+\epsilon_{ij})}\right]=x_{i}e^{\beta_{j}w_{ij}}\cdot e^{\frac{\sigma^{2}\beta_{j}^{2}}{2}}$ . Since the factor $e^{\frac{\sigma^{2}\beta_{j}^{2}}{2}}$ is common to all creators, it cancels once the shares are normalized. Note that $x_{i}$ does not mean that the creator produces $x_{i}$ identical pieces of content, but rather $x_{i}$ pieces of content within his/her area of expertise. Therefore, we formulate creator $i$ ’s utility function as follows:

\begin{aligned}u_{i}(x_{i},\bm{x}_{-i};\bm{\beta})&=\sum_{j=1}^{m}\left(\frac{\mathbb{E}[x_{i}e^{\beta_{j}\sigma_{ij}}]}{\sum_{k=1}^{n}\mathbb{E}[x_{k}e^{\beta_{j}\sigma_{kj}}]}\right)-c_{i}(x_{i})\\&=\sum_{j=1}^{m}\left(\frac{x_{i}e^{\beta_{j}w_{ij}}}{\sum_{k=1}^{n}x_{k}e^{\beta_{j}w_{kj}}}\right)-c_{i}(x_{i}),\end{aligned}

(1)

where $\bm{x}_{-i}\in\mathbb{R}_{\geq 0}^{n-1}$ denotes the strategy profile of all creators except $i$ .
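To make Eq. (1) concrete, the following is a minimal sketch of the PPM( $\bm{\beta}$ ) traffic shares and the resulting creator utility. It uses only the Python standard library; the function names `ppm_allocation` and `creator_utility`, the toy relevance matrix, and the quadratic cost are illustrative assumptions, not part of the paper.

```python
import math

def ppm_allocation(x, W, beta):
    """Expected traffic share P[i][j] that creator i receives from user j.

    x    : list of n production frequencies (x[i] >= 0, not all zero)
    W    : n-by-m relevance matrix, W[i][j] in [0, 1]
    beta : list of m per-user exploration parameters (beta[j] >= 0)
    """
    n, m = len(x), len(beta)
    P = [[0.0] * m for _ in range(n)]
    for j in range(m):
        # Unnormalized weights x_i * exp(beta_j * w_ij); the noise factor
        # exp(sigma^2 * beta_j^2 / 2) is common to all creators and cancels.
        weights = [x[i] * math.exp(beta[j] * W[i][j]) for i in range(n)]
        total = sum(weights)
        for i in range(n):
            P[i][j] = weights[i] / total
    return P

def creator_utility(i, x, W, beta, cost):
    """Utility of creator i as in Eq. (1): allocated traffic minus cost."""
    P = ppm_allocation(x, W, beta)
    return sum(P[i]) - cost(x[i])

# Toy instance; every number below is an illustrative assumption.
W = [[0.9, 0.2],
     [0.4, 0.8],
     [0.1, 0.5]]
x = [1.0, 2.0, 0.5]
beta = [0.0, 3.0]                      # user 0 explores fully, user 1 is relevance-driven
quadratic_cost = lambda v: 0.5 * v * v  # hypothetical convex, increasing cost

P = ppm_allocation(x, W, beta)
utilities = [creator_utility(i, x, W, beta, quadratic_cost) for i in range(3)]
```

With $\beta_{j}=0$ the shares for that user depend only on production volume, so `P[i][0]` equals `x[i] / sum(x)` in this toy instance.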

The $C^{4}$ game models the scenario where creators are aware of their expertise (i.e., what topic to create), and compete purely on creation quantity. This concept is akin to the extensively studied Cournot competition (Cournot, 1838) model in economics, where firms independently determine the output of homogeneous products at different costs. However, our model diverges from the classic Cournot competition in several key aspects. First of all, they have different revenue functions: in Cournot competition, the revenue for each firm is calculated as the product of price and production quantity, and the price only depends on all firms’ joint strategy. In contrast, in the $C^{4}$ game, the gain of a creator not only relies on other creators’ decision but also hinges on the platform’s traffic allocation algorithm. In addition, while Cournot competition typically incorporates only linear cost functions, the $C^{4}$ game accommodates general convex cost functions, offering a more nuanced reflection of the real cost faced by creators. These distinctions highlight the unique aspects of our model, while maintaining a conceptual link to traditional economic theories of competition.

Research questions:

Under the $C^{4}$ framework, an important and natural research question is how we can predict creators’ strategic choices in the competition. This is a fundamental question in game theory and we employ the concept of Pure Nash Equilibrium (PNE) (Nash Jr, 1950) to characterize the outcome of $C^{4}$ games. The definition of PNE is given by the following:

Definition 1

A joint strategy profile of all creators $\bm{x}^{*}=(x^{*}_{1},\cdots,x^{*}_{n})$ forms a pure Nash equilibrium (PNE) if, for every creator $i$ , $x^{*}_{i}$ is a best response strategy that maximizes $u_{i}$ given the other creators’ strategies $\bm{x}^{*}_{-i}$ ; formally,

u_{i}(x^{*}_{i},\bm{x}^{*}_{-i})\geq u_{i}(x_{i},\bm{x}^{*}_{-i})\,\,\text{ for every }x_{i}\in\mathbb{R}_{\geq 0},\ \forall i\in[n].

(2)

In other words, a PNE represents a stable state in which every creator is satisfied with their strategy and has no incentive to deviate. As we will demonstrate in the subsequent section, a PNE always exists in $C^{4}$ under any PPM( $\bm{\beta}$ ) and can be computed efficiently. This finding forms the basis for our further theoretical analysis and empirical simulations.

In addition to the predictability and stability of creators’ production strategies, it is equally crucial for platform designers to develop metrics that encourage the prosperity of the content ecosystem. They must also devise algorithmic solutions to optimize these metrics, balancing the trade-off between engagement of users and creators, using the available “knob” PPM( $\bm{\beta}$ ). We will explore these issues in the upcoming technical discussions.

4 The PNEs of $C^{4}$ Games and Their Properties

It is well known that a PNE does not always exist in general games (Debreu, 1952; Fan, 1952; Glicksberg, 1952). However, our first main result establishes that, under mild assumptions, $C^{4}$ always admits a unique PNE.

Theorem 1

For any $C^{4}$ instance $\mathcal{G}$ $(n,m,M,\{c_{i}\}_{i=1}^{n},\bm{\beta})$ , if each $c_{i}$ is convex in $x_{i}$ , then $\mathcal{G}$ admits a unique PNE.

Note that the primary challenge in proving Theorem 1 is to show that $\mathcal{G}$ is a strictly monotone game; the existence and uniqueness of the PNE in such games then follow from a classic result of Rosen (1965). Theorem 1 is interesting from multiple perspectives. First, it strictly generalizes previous equilibrium existence results for the classic Tullock contest (Pérez-Castrillo and Verdier, 1992; Cornes and Hartley, 2005), which corresponds to the special case where $m=1$ and $w_{11}=\cdots=w_{n1}$ . Second, the fact that $C^{4}$ games are strictly monotone is significant because it is well known that the PNE of strictly monotone games can be found efficiently. For example, many natural multi-agent online learning dynamics such as mirror descent (Bravo et al., 2018), accelerated optimistic gradient (Cai and Zheng, 2023), and payoff-based learning (Tatarenko and Kamgarpour, 2020) guarantee last-iterate convergence to the unique PNE in strictly monotone games, even when players have only zeroth-order feedback about their utility functions. These results suggest that the PNE of $\mathcal{G}$ is achievable if all creators use a reasonable update rule for their strategies. This observation not only makes this equilibrium a plausible prediction of real-world competition but also paves the way for the simulation-based studies in our experiments, where we use multi-agent mirror descent with perfect gradients to numerically solve for the PNE of $C^{4}$ .
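The claim that simple learning dynamics reach the PNE can be illustrated numerically. The sketch below uses simultaneous projected gradient ascent, a plain stand-in for the mirror-descent dynamics cited above rather than the procedure used in the paper's experiments. The function names, step size, stopping rule, and the quadratic cost $c_{i}(x)=x^{2}/2$ are all assumptions made for illustration; convergence is only checked on the two toy instances shown.

```python
import math

def marginal_utility(x, W, beta, cost_grad):
    """du_i/dx_i = sum_j P_ij * (1 - P_ij) / x_i - c_i'(x_i) for every creator i."""
    n, m = len(x), len(beta)
    g = [-cost_grad[i](x[i]) for i in range(n)]
    for j in range(m):
        weights = [x[i] * math.exp(beta[j] * W[i][j]) for i in range(n)]
        total = sum(weights)
        for i in range(n):
            p = weights[i] / total
            g[i] += p * (1.0 - p) / x[i]
    return g

def solve_pne(W, beta, cost_grad, step=0.05, tol=1e-12, max_iter=100_000, floor=1e-8):
    """Simultaneous projected gradient ascent on every creator's own utility.

    `floor` keeps actions strictly positive so that the 1/x_i term stays finite.
    """
    n = len(W)
    x = [1.0] * n
    for _ in range(max_iter):
        g = marginal_utility(x, W, beta, cost_grad)
        x_new = [max(x[i] + step * g[i], floor) for i in range(n)]
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x

quad_grad = lambda v: v  # derivative of the hypothetical cost c(x) = x**2 / 2

# Tullock special case: one user, three creators with identical relevance.
# Symmetry gives P_i = 1/3, so the first-order condition yields x* = sqrt(2)/3.
x_sym = solve_pne([[0.5], [0.5], [0.5]], [1.0], [quad_grad] * 3)

# Two creators with relevance 1 and 0 and beta = 2: the first-order conditions
# give x1* = x2* = sqrt(P1 * (1 - P1)) with P1 = sigmoid(2).
x_asym = solve_pne([[1.0], [0.0]], [2.0], [quad_grad] * 2)
```

Both toy instances have closed-form equilibria under the assumed quadratic cost, which makes them convenient sanity checks for any solver of this kind.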

In addition to the existence and uniqueness properties, the following corollary gives the first-order characterization of $\mathcal{G}$ ’s PNE.

Corollary 1

The unique PNE $\bm{x}^{*}(\bm{\beta})$ of any $\mathcal{G}$ $(n,m,M,\{c_{i}\}_{i=1}^{n},\bm{\beta})$ satisfies the following first-order condition:

\frac{\partial u_{i}}{\partial x_{i}}\bigg|_{\bm{x}=\bm{x}^{*}(\bm{\beta})}=0,\quad 1\leq i\leq n.

(3)

Since we have already shown the existence and uniqueness of $\mathcal{G}$ ’s PNE, Corollary 1 follows immediately according to Definition 1. Corollary 1 is useful for establishing further properties of $C^{4}$ games in Section 5.
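For reference, the first-order condition in Eq. (3) can be written out explicitly by differentiating Eq. (1). The derivation below is a sketch that assumes differentiable costs and an interior equilibrium ( $x^{*}_{i}>0$ ); the shorthand $P_{ij}$ for the matching probability is introduced here for convenience.

```latex
% Assumes differentiable c_i and an interior equilibrium x_i^* > 0.
P_{ij} = \frac{x_i e^{\beta_j w_{ij}}}{\sum_{k=1}^{n} x_k e^{\beta_j w_{kj}}},
\qquad
\frac{\partial P_{ij}}{\partial x_i} = \frac{P_{ij}\,(1 - P_{ij})}{x_i},
\qquad\Longrightarrow\qquad
\frac{\partial u_i}{\partial x_i}
  = \sum_{j=1}^{m} \frac{P_{ij}\,(1 - P_{ij})}{x_i} - c_i'(x_i).

% Setting this to zero at x = x^*(\beta) gives, for every creator i,
c_i'(x_i^*)\, x_i^* = \sum_{j=1}^{m} P_{ij}\,(1 - P_{ij}).
```

For instance, with $m=1$ , identical relevance scores, and the hypothetical cost $c_{i}(x)=x^{2}/2$ , symmetry gives $P_{i}=1/n$ , so the condition reduces to $(x^{*})^{2}=(n-1)/n^{2}$ , i.e., $x^{*}=\sqrt{n-1}/n$ .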

5 The Trade-Off Between User and Creator Engagement

We have established that a unique PNE exists in any $C^{4}$ game under any PPM( $\bm{\beta}$ ) and can be naturally achieved by competing content creators. This raises a crucial question for platform designers: how should the quality of the PNE be evaluated? For any mature UGC platform, it is essential to balance user satisfaction, which is key to short-term prosperity, with creator engagement, which is crucial for long-term sustainability. Within our $C^{4}$ framework, this requires the platform designer to generate matching results that not only guarantee high user satisfaction (by improving the average matching quality at PNE) but also stimulate substantial content creation volume (by encouraging creators to increase their production frequency $x^{*}$ at PNE). We define these two objectives as follows:

Definition 2

For any $C^{4}$ instance $\mathcal{G}$ $(n,m,M,\{c_{i}\}_{i=1}^{n},\bm{\beta})$ , let $\bm{x}^{*}(\bm{\beta})=(x_{1}^{*},\cdots,x_{n}^{*})$ be the unique PNE under PPM( $\bm{\beta}$ ). Then the (short-term) total user satisfaction is defined as

U(\bm{x}^{*}(\bm{\beta});\bm{\beta})=\sum_{j=1}^{m}\sum_{i=1}^{n}\left(\frac{w_{ij}x^{*}_{i}e^{\beta_{j}w_{ij}}}{\sum_{k=1}^{n}x^{*}_{k}e^{\beta_{j}w_{kj}}}\right),

(4)

and the (long-term) total content creation volume is defined as

V(\bm{x}^{*}(\bm{\beta}))=\sum_{i=1}^{n}x_{i}^{*}.

(5)

The social welfare of the whole system is measured by a linear combination of $U$ and $V$ , defined as

W_{\lambda}(\bm{x}^{*}(\bm{\beta});\bm{\beta})=U(\bm{x}^{*}(\bm{\beta});\bm{\beta})+\lambda V(\bm{x}^{*}(\bm{\beta})).

(6)

We denote by $\pi^{*}_{j}(\bm{x}^{*}(\bm{\beta}),\bm{\beta})=\sum_{i=1}^{n}\left(\frac{w_{ij}x^{*}_{i}e^{\beta_{j}w_{ij}}}{\sum_{k=1}^{n}x^{*}_{k}e^{\beta_{j}w_{kj}}}\right)$ the indicator of an individual user’s satisfaction or utility, i.e., the expected matching score of user $j$ at the PNE under PPM( $\bm{\beta}$ ). The total user satisfaction measure $U=\sum_{j=1}^{m}\pi^{*}_{j}$ is then the accumulated user utility. We argue that $U$ primarily serves as a metric for short-term welfare evaluation, since it focuses solely on user satisfaction at a specific instance of matching outcomes but does not capture the dynamics of user engagement over time. This overlooks the crucial fact that sustained user engagement on a platform requires a continuous supply of relevant content, as users can hardly be satisfied by their previously consumed material. This limitation is also evident in $U$ ’s mathematical formulation: its value remains unchanged under a rescaling of $\{x_{i}^{*}\}$ , indicating that it fails to reflect changes in content volume or frequency that might affect long-term user engagement. On the other hand, the long-term prosperity of a UGC platform is fundamentally linked to the engagement of content creators. Therefore, we introduce the total content creation volume $V$ as an indicator of the long-term welfare of the platform.
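The three metrics of Definition 2, and the scale invariance of $U$ noted above, can be checked with a short sketch. The helper `welfare_metrics` and the toy numbers are assumptions for illustration; for simplicity it evaluates the formulas at an arbitrary profile $\bm{x}$ rather than at a computed PNE.

```python
import math

def welfare_metrics(x, W, beta, lam):
    """Return (U, V, W_lambda) from Eqs. (4)-(6), evaluated at the profile x."""
    n, m = len(x), len(beta)
    U = 0.0
    for j in range(m):
        weights = [x[i] * math.exp(beta[j] * W[i][j]) for i in range(n)]
        total = sum(weights)
        # pi_j: expected matching score of user j
        U += sum(W[i][j] * weights[i] / total for i in range(n))
    V = sum(x)
    return U, V, U + lam * V

# Toy instance; all values are illustrative assumptions.
W = [[0.9, 0.2],
     [0.4, 0.8],
     [0.1, 0.5]]
x = [1.0, 2.0, 0.5]
beta = [1.0, 3.0]
lam = 0.5

U1, V1, W1 = welfare_metrics(x, W, beta, lam)
U2, V2, W2 = welfare_metrics([2.0 * v for v in x], W, beta, lam)
# Doubling every x_i leaves U unchanged, while V (and hence W_lambda) grows.
```

Because $U$ depends on $\bm{x}$ only through the normalized shares, only the $\lambda V$ term in $W_{\lambda}$ responds to a uniform change in production volume.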

For the platform designers, it is essential to develop metrics that balance both short-term and long-term considerations. Thus, we propose a hybrid social welfare metric, $W$ , which combines $U$ and $V$ to reflect both user satisfaction and content supply sustainability. However, understanding the mechanisms to optimize $U$ and $V$ independently is critical. In the following sections, we will explore the optimal matching mechanisms tailored to the exclusive objectives of $U$ (short-term) and $V$ (long-term), and then present an efficient algorithm designed to optimize $W$ .

Interestingly, our findings suggest that both $U$ and $V$ exhibit monotonicity with respect to the parameter $\bm{\beta}$ , even in scenarios involving a homogeneous user population. This uniform behavior of $U$ and $V$ offers valuable insights into how the adjustment of exploration strength could potentially impact platform performance. Our forthcoming theorem formally characterizes these observations.

Theorem 2

Consider any $C^{4}$ game with $m=1$ . If the elements of $M=[w_{1},\cdots,w_{n}]^{\top}$ are not identical, it holds that:

1. $U(\beta)$ defined in Eq. (4) is strictly increasing in $\beta$ .
2. $V(\beta)$ defined in Eq. (5) is strictly decreasing in $\beta\in[\beta_{0},+\infty)$ for some $\beta_{0}>0$ .

Theorem 2 conveys two significant insights. The first one, though perhaps unsurprising, reveals that improving matching accuracy corresponds to an increase in expected user satisfaction. Despite its intuitiveness, this is a strong observation because it holds without relying on any specific structural assumptions about creator cost functions. This means that regardless of the potential complexity in equilibrium structures due to creator costs, and even when the order of $x^{*}_{i}$ does not align with a creator’s capability $w_{i}$ , the metric $U$ is still monotonically increasing with respect to $\beta$ . The second insight may surprise some readers: it suggests that while continually increasing the matching accuracy motivates some creators to produce more content, it demotivates others, resulting in a net decrease in the overall volume of content creation. This finding illustrates an intrinsic trade-off between short-term matching accuracy and long-term content supply: strategies that enhance short-term user satisfaction can inadvertently reduce content creation frequency across creators. To the best of our knowledge, this result is novel and has not been discussed in similar studies.

Here is an intuitive explanation for why a large $\beta$ diminishes creators’ willingness to produce content. As the traffic allocation becomes more deterministic, the marginal gain from increasing production frequency diminishes because the amount of traffic accrued is largely determined by the relevance score, rather than volume. In the extreme case where $\beta\rightarrow+\infty$ , only the most relevant creator captures all the user traffic, regardless of her production volume. Consequently, due to the presence of production costs, this creator, and others, will only sustain the minimum viable productivity. Conversely, in the other extreme scenario where $\beta=0$ , i.e., user traffic is distributed uniformly among creators irrespective of relevance, the gain for each creator depends solely on their production frequency, prompting a productivity arms race. Clearly, both extremes are suboptimal, but they effectively illustrate the rationale behind our theoretical findings.
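The trade-off can be seen in closed form on a deliberately small instance. The sketch below assumes $n=2$ creators, one user, relevance scores $w_{1}=1$ and $w_{2}=0$ , and identical hypothetical costs $c_{i}(x)=\kappa x^{2}/2$ . Under these assumptions the first-order conditions give $x_{1}^{*}=x_{2}^{*}=\sqrt{P_{1}(1-P_{1})/\kappa}$ with $P_{1}$ the logistic function of $\beta(w_{1}-w_{2})$ . This closed form is derived for this toy case only and is not a result stated in the paper.

```python
import math

def two_creator_equilibrium(beta, w1=1.0, w2=0.0, kappa=1.0):
    """Closed-form U and V at the PNE for n = 2, m = 1, c_i(x) = kappa * x**2 / 2.

    With identical quadratic costs both first-order conditions read
    kappa * x_i**2 = P1 * P2, so x1 = x2 and P1 = sigmoid(beta * (w1 - w2)).
    """
    p1 = 1.0 / (1.0 + math.exp(-beta * (w1 - w2)))
    x = math.sqrt(p1 * (1.0 - p1) / kappa)
    U = w1 * p1 + w2 * (1.0 - p1)   # expected matching score, Eq. (4)
    V = 2.0 * x                     # total creation volume, Eq. (5)
    return U, V

betas = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
results = [two_creator_equilibrium(b) for b in betas]
for b, (U, V) in zip(betas, results):
    print(f"beta={b:4.1f}  U={U:.4f}  V={V:.4f}")
```

On this grid $U$ rises from 0.5 toward 1 while $V$ falls from 1 toward 0, mirroring the two parts of Theorem 2 in the simplest possible setting.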

Although Theorem 2 considers the game instance $\mathcal{G}$ with $m=1$ as a representative snapshot of how creators compete for a single unit of user traffic (i.e., a homogeneous user population), extending the time frame to encompass a sequence of heterogeneous users suggests that the observed trade-off between $U$ and $V$ remains consistent. In our experiments, we will demonstrate this trade-off in broader settings through simulations, e.g., when $m>1$ and with various complex user distributions.

The proof of Theorem 2, while delivering a clear message, is far from trivial. Since $U$ and $V$ depend on $\beta$ indirectly through $x^{*}$ , which lacks a closed form, deriving their derivatives with respect to $\beta$ requires the implicit function theorem (Krantz and Parks, 2002) to express the derivative of $x^{*}$ with respect to $\beta$ . This involves a complex matrix inverse, which we simplify using the Sherman–Morrison formula (Sherman, 1949), since the matrix is a diagonal matrix with a rank-one update. This proof technique not only supports our theorem but also inspires a novel first-order optimization approach to address the hybrid social welfare optimization discussed in Section 6. Detailed proofs are provided in Appendix A.2.

6 Finding the Optimal Trade-off through Optimization

Our theory thus far indicates that optimizing both user satisfaction $U$ and creator engagement $V$ is non-trivial, even when the user population is homogeneous, as achieving the optima of $U$ and $V$ simultaneously is impossible. Consequently, an essential and intriguing question arises within any specific competitive environment $C^{4}$ : how can we identify the optimal trade-off between these two factors by optimizing any given welfare metric $W_{\lambda}$ ?

Generally, the welfare metric $W$ is influenced by three factors: the platform’s algorithmic recommendation policy PPM $(\bm{\beta})$ , the resulting content creation profile $\bm{x}^{*}$ at the PNE induced by $\bm{\beta}$ , and the relevance matrix $M$ . Thus, we can formulate the resulting optimization problem (OP) as follows:

\begin{aligned}\text{Find}\quad&\arg\max_{\bm{\beta}\in\mathbb{R}_{\geq 0}^{m}}W_{\lambda}(\bm{x}^{*}(\bm{\beta}),\bm{\beta})\\\text{s.t.}\quad&\bm{x}^{*}(\bm{\beta})\text{ is the PNE of }\mathcal{G}.\end{aligned}

(7)

In general, OP (7) presents a formidable challenge, as solving for a PNE of a game is known to be difficult (Daskalakis et al., 2009). Fortunately, the nice structure of $C^{4}$ allows us to utilize the implicit characterization of the PNE detailed in Corollary 1 to tackle OP (7) effectively. In the subsequent section, we demonstrate that the gradient of $W_{\lambda}$ w.r.t. $\bm{\beta}$ can be explicitly computed.

6.1 The Derivation of Exact Gradient

According to the chain rule, the first-order gradient of $W_{\lambda}$ w.r.t. $\bm{\beta}$ can be expressed as

\frac{dW_{\lambda}}{d\bm{\beta}}=\frac{dU(\bm{x}^{*}(\bm{\beta}),\bm{\beta})}{d\bm{\beta}}+\lambda\frac{dV(\bm{x}^{*}(\bm{\beta}))}{d\bm{\beta}}=\left(\frac{\partial U}{\partial\bm{x}^{*}}+\lambda\frac{\partial V}{\partial\bm{x}^{*}}\right)\cdot\frac{d\bm{x}^{*}}{d\bm{\beta}}+\frac{\partial U}{\partial\bm{\beta}}.

(8)
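As a sanity check on the chain rule in Eq. (8), the sketch below runs projected gradient ascent on $\beta$ for a toy instance where every term is available in closed form: $n=2$ , $m=1$ , $w_{1}=1$ , $w_{2}=0$ , and the hypothetical cost $c_{i}(x)=x^{2}/2$ , for which $x_{1}^{*}=x_{2}^{*}=\sqrt{P(1-P)}$ with $P$ the logistic function of $\beta$ . It uses exact gradients on this toy case only and is not the approximate-gradient method developed in this section; the function names, step size, and iteration count are assumptions.

```python
import math

def toy_welfare_gradient(beta, lam):
    """dW_lambda/dbeta via Eq. (8) for the two-creator toy instance."""
    p = 1.0 / (1.0 + math.exp(-beta))      # creator 1's traffic share at the PNE
    x = math.sqrt(p * (1.0 - p))           # x1* = x2* under c(x) = x**2 / 2
    dx_dbeta = 0.5 * (1.0 - 2.0 * p) * x   # identical for both creators
    # (dU/dx) . (dx/dbeta) vanishes: both actions move together and U is
    # invariant to a common rescaling of x.
    dU_dx_term = 0.0
    dV_dx_term = 2.0 * dx_dbeta            # dV/dx = (1, 1)
    partial_U_beta = p * (1.0 - p)         # dU/dbeta with x held fixed
    return dU_dx_term + lam * dV_dx_term + partial_U_beta

def optimize_beta(lam, beta0=0.0, step=0.5, iters=2000):
    """Projected gradient ascent on beta >= 0."""
    beta = beta0
    for _ in range(iters):
        beta = max(0.0, beta + step * toy_welfare_gradient(beta, lam))
    return beta

beta_star = optimize_beta(lam=1.0)
```

For $\lambda=1$ the stationarity condition reduces to $P(1-P)=(2P-1)^{2}$ , whose root in $(1/2,1)$ is $P=(5+\sqrt{5})/10$ ; the iterate should therefore settle near $\beta^{*}=2\ln\frac{1+\sqrt{5}}{2}\approx 0.962$ , an interior optimum strictly between full exploration and pure relevance ranking.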

The evaluation of the gradient of $W_{\lambda}$ relies on the calculation of three vectors, $\frac{\partial U}{\partial\bm{x}^{*}},\frac{\partial V}{\partial\bm{x}^{*}}\in\mathbb{R}^{1\times n}$ , and $\frac{\partial U}{\partial\bm{\beta}}\in\mathbb{R}^{1\times m}$ , as well as a Jacobian matrix $\frac{d\bm{x}^{*}}{d\bm{\beta}}\in\mathbb{R}^{n\times m}$ . The computations of $\frac{\partial U}{\partial\bm{x}^{*}}$ , $\frac{\partial V}{\partial\bm{x}^{*}}$ and $\frac{\partial U}{\partial\bm{\beta}}$ are straightforward and computationally light, which leaves the computation of $\frac{d\bm{x}^{*}}{d\bm{\beta}}$ as the main challenge. Fortunately, the first-order characterization of $\bm{x}^{*}$ in Corollary 1 enables us to express the Jacobian of $\bm{x}^{*}$ w.r.t. $\bm{\beta}$ via the implicit function theorem (Krantz and Parks, 2002) as $\frac{d\bm{x}^{*}}{d\bm{\beta}}=-\left(\frac{\partial F}{\partial\bm{x}^{*}}\right)^{-1}\cdot\frac{\partial F}{\partial\bm{\beta}}$ , where $F(\bm{x},\bm{\beta})=\left(\frac{\partial u_{i}}{\partial x_{i}}\right)_{i=1}^{n}$ is an $\mathbb{R}^{n}$ -valued function, and $\frac{\partial F}{\partial\bm{x}^{*}}$ and $\frac{\partial F}{\partial\bm{\beta}}$ are matrices of dimensions $n\times n$ and $n\times m$ , respectively. The following proposition provides the exact formula for this Jacobian. The calculation is straightforward and we omit the detailed derivation.

Proposition 1

Let $\bm{x}^{*}=(x_{1}^{*},\cdots,x_{n}^{*})$ be the PNE of $\mathcal{G}(n,m,M,\{c_{i}\}_{i=1}^{n},\bm{\beta})$ . Then, the Jacobian matrix of $\bm{x}^{*}$ as a function of $\bm{\beta}$ is

\frac{d\bm{x}^{*}}{d\bm{\beta}}=\left(D+YZ^{\top}\right)^{-1}B\in\mathbb{R}^{n% \times m},

(9)

where $D$ is an $n\times n$ diagonal matrix given by

D=\text{diag}\left(c^{\prime\prime}_{1}+\sum_{j=1}^{m}\frac{P^{2}_{1j}}{x^{*2}% _{1}},\cdots,c^{\prime\prime}_{n}+\sum_{j=1}^{m}\frac{P^{2}_{nj}}{x^{*2}_{n}}% \right),

(10)

and $B,Y,Z$ are $\mathbb{R}^{n\times m}$ matrices calculated as follows ( $1\leq i\leq n,1\leq j\leq m$ ):

Y=\left[\frac{P_{ij}(1-2P_{ij})}{x^{*}_{i}}\right]_{ij},\quad Z=\left[\frac{P_% {ij}}{x^{*}_{i}}\right]_{ij},\quad B=\left[\frac{P_{ij}(1-2P_{ij})}{x^{*}_{i}}% \cdot\left(w_{ij}-\sum_{k=1}^{n}w_{kj}P_{kj}\right)\right]_{ij},

(11)

where $c^{\prime\prime}_{i}$ is the second-order derivative of creator $i$ ’s cost function, and $P_{ij}=\frac{x^{*}_{i}\exp(\beta_{j}w_{ij})}{\sum_{k=1}^{n}x^{*}_{k}\exp(\beta% _{j}w_{kj})}$ is the probability that creator $i$ is matched with user $j$ at the PNE under PPM $(\bm{\beta})$ .
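As an illustration of Proposition 1, the following NumPy sketch assembles $D$, $Y$, $Z$, $B$ and solves for the Jacobian in Eq. (9). It is a minimal sketch rather than the authors' implementation: the function names are hypothetical, and the power cost $c_{i}(x)=c_{i}x^{\rho}$ from the experiment section is assumed so that $c^{\prime\prime}_{i}$ has a closed form.

```python
import numpy as np

def match_prob(x, w, beta):
    """P[i, j] = x_i exp(beta_j w_ij) / sum_k x_k exp(beta_j w_kj)."""
    logits = np.log(x)[:, None] + w * beta[None, :]
    logits -= logits.max(axis=0, keepdims=True)   # subtract column max for stability
    e = np.exp(logits)
    return e / e.sum(axis=0, keepdims=True)

def foc(x, w, beta, c, rho):
    """F_i = du_i/dx_i, assuming the power cost c_i * x**rho."""
    P = match_prob(x, w, beta)
    return (P * (1 - P)).sum(axis=1) / x - c * rho * x ** (rho - 1)

def jacobian_terms(x, w, beta, c, rho):
    """Return the diagonal of D (Eq. 10) and the matrices Y, Z, B (Eq. 11)."""
    P = match_prob(x, w, beta)
    Z = P / x[:, None]
    Y = Z * (1 - 2 * P)
    # c_i''(x_i) for the assumed power cost, plus sum_j P_ij^2 / x_i^2
    d = c * rho * (rho - 1) * x ** (rho - 2) + (Z ** 2).sum(axis=1)
    w_bar = (w * P).sum(axis=0, keepdims=True)    # sum_k w_kj P_kj for each user j
    B = Y * (w - w_bar)
    return d, Y, Z, B

def pne_jacobian(x, w, beta, c, rho):
    """dx*/dbeta = (D + Y Z^T)^{-1} B, evaluated at a given x (Eq. 9)."""
    d, Y, Z, B = jacobian_terms(x, w, beta, c, rho)
    return np.linalg.solve(np.diag(d) + Y @ Z.T, B)
```

The formula is only meaningful when `x` is the PNE under `beta`; the direct `solve` here costs $O(n^{3})$ and is the step that the next subsection speeds up.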

6.2 Optimization with Approximated Gradients

Proposition 1 together with Eq. (8) makes it possible to directly apply gradient-based approaches for solving OP (7). However, the gradient computation requires the inversion of an $n\times n$ matrix, whose time complexity is $O(n^{3})$ and thus prohibitive when $n$ is large. To reduce the computational burden, we propose to compute the gradient approximately, using the Sherman–Morrison–Woodbury formula (Sherman, 1949) to restructure the matrix inverse, inspired by the specific structure of the RHS of Eq. (9). According to the Sherman–Morrison–Woodbury formula, it holds that

(D+YZ^{\top})^{-1}=D^{-1}-D^{-1}Y\left(I+Z^{\top}D^{-1}Y\right)^{-1}Z^{\top}D^% {-1},

(12)

and the computation of the RHS of Eq. (12) now requires a time complexity of $O(n^{2}m+nm^{2}+m^{3})$ . However, the size of the user population $m$ in practical scenarios is often even larger than $n$ . To efficiently compute the RHS of Eq. (12), we propose a method to “sketch” the matrices $Y$ and $Z$ by sampling a subset of users. Initially, each column of $Y$ and $Z$ corresponds to a user index $j$ . We begin by sampling a sub-population of $\mathcal{X}$ , indexed by $\mathcal{I}$ , with $|\mathcal{I}|=\tilde{m}=[\delta m]$ , where $\delta\in(0,1]$ denotes the sampling rate. With this sampled index set $\mathcal{I}$ , we construct matrices $\tilde{Y},\tilde{Z}\in\mathbb{R}^{n\times m}$ , where the $(i,j)$ -th entries are defined as follows:

\tilde{Y}_{ij}=\frac{P_{ij^{\prime}}(1-2P_{ij^{\prime}})}{x^{*}_{i}},\quad% \tilde{Z}_{ij}=\frac{P_{ij^{\prime}}}{x^{*}_{i}},

(13)

with $j^{\prime}$ being uniformly sampled from $\mathcal{I}$ . Given that $\tilde{Y},\tilde{Z}$ now possess reduced ranks of $[\delta m]$ , the computational complexity of evaluating $(D+\tilde{Y}\tilde{Z}^{\top})^{-1}$ is significantly lowered to $O(n^{2}\tilde{m}+n\tilde{m}^{2}+\tilde{m}^{3})$ . Algorithm 1 describes the steps for addressing OP (7).
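The identity in Eq. (12) and the sketching step in Eq. (13) can be illustrated with a short NumPy sketch. The helper names are hypothetical, $[\delta m]$ is taken as the floor of $\delta m$ (an assumption), and the compact form used below, which keeps only the $\tilde{m}$ distinct sampled columns and weights each by how often it was drawn, is one reading of how the reduced rank translates into the stated complexity.

```python
import numpy as np

def woodbury_inverse(d, Y, Z):
    """(diag(d) + Y Z^T)^{-1} via Eq. (12); d holds the diagonal of D."""
    Dinv_Y = Y / d[:, None]                       # D^{-1} Y
    Zt_Dinv = (Z / d[:, None]).T                  # Z^T D^{-1}
    core = np.eye(Y.shape[1]) + Z.T @ Dinv_Y      # I + Z^T D^{-1} Y
    return np.diag(1.0 / d) - Dinv_Y @ np.linalg.solve(core, Zt_Dinv)

def sketch(Y, Z, delta, rng):
    """Sketch Y, Z as in Eq. (13) and return an equivalent compact form.

    Returns (Yc, Zc, cols): Yc, Zc have m_tilde columns, and cols[j] is the
    sampled user j' that replaces column j in the explicit n x m sketch.
    """
    m = Y.shape[1]
    m_tilde = max(1, int(delta * m))              # assumed rounding: floor
    subset = rng.choice(m, size=m_tilde, replace=False)   # index set I
    picks = rng.integers(0, m_tilde, size=m)      # j' drawn uniformly from I
    counts = np.bincount(picks, minlength=m_tilde)
    cols = subset[picks]
    Yc = Y[:, subset] * counts[None, :]           # weight by multiplicity
    Zc = Z[:, subset]
    return Yc, Zc, cols
```

With these helpers, `woodbury_inverse(d, Yc, Zc)` only solves an $\tilde{m}\times\tilde{m}$ system, while `Yc @ Zc.T` equals the product of the explicit $n\times m$ sketched matrices.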

Algorithm 1 Approximated Gradient Descent for Solving OP (7).

Input: The environment specified by $\mathcal{G}$ , maximum iteration number $T$ , sample rate $\delta$ , learning rate $\eta$ , initial mechanism PPM( $\bm{\beta}$ ).

for $t\in[T]$ do
    Find the PNE $\bm{x}^{*}$ of $\mathcal{G}$ under PPM( $\bm{\beta}$ ) using Algorithm 2,
    Uniformly sample $[\delta m]$ users from $\mathcal{X}$ and use them to compute matrices $\tilde{Y},\tilde{Z}$ in Eq. (13),
    Compute the approximated gradient $\frac{dW_{\lambda}}{d\bm{\beta}}$ using (8),(9),(12) with sketched matrices $\tilde{Y},\tilde{Z}$ ,
    Update $\bm{\beta}=\bm{\beta}+\eta\frac{dW_{\lambda}}{d\bm{\beta}}$ .

Algorithm 1 requires solving for the PNE of $\mathcal{G}$ each time $\bm{\beta}$ is updated. To accomplish this, we employ the multi-agent mirror descent method, as proposed in Bravo et al. (2018) and detailed in Algorithm 2 in the Appendix, as a subroutine. (In Bravo et al. (2018), the algorithm is guaranteed to converge to the PNE under zeroth-order feedback. Here we use the perfect gradient as input and thus the convergence is also guaranteed.) To accelerate the convergence of Algorithm 1, the PNE strategy $\bm{x}^{*}$ obtained under the previous $\bm{\beta}$ is used as the initial strategy for computing the new PNE after updating $\bm{\beta}$ . Further implementation details are provided in the experiment section.
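To make the role of the PNE subroutine and the warm start concrete, here is a minimal sketch that uses plain projected gradient ascent on each creator's own utility. This is a simplified stand-in, not the mirror descent procedure of Algorithm 2; the function names, step size, iteration budget, and the power cost $c_{i}x^{\rho}$ are all assumptions made for illustration.

```python
import numpy as np

def creator_gradients(x, w, beta, c, rho):
    """du_i/dx_i for every creator, assuming the cost c_i * x**rho."""
    logits = np.log(x)[:, None] + w * beta[None, :]
    logits -= logits.max(axis=0, keepdims=True)   # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=0, keepdims=True)             # matching probabilities P_ij
    return (P * (1 - P)).sum(axis=1) / x - c * rho * x ** (rho - 1)

def solve_pne(w, beta, c, rho=1.5, x0=None, step=0.05,
              iters=20000, tol=1e-10, floor=1e-8):
    """Simultaneous gradient ascent; x0 allows warm-starting from a previous PNE."""
    n = w.shape[0]
    x = np.ones(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        g = creator_gradients(x, w, beta, c, rho)
        if np.max(np.abs(g)) < tol:               # first-order condition met
            break
        x = np.maximum(x + step * g, floor)       # keep strategies positive
    return x
```

Passing the previous equilibrium as `x0` after a small change in `beta` mirrors the warm start described above; whether the fixed step size is adequate depends on the instance.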

7 Experiments

To validate our theoretical findings and demonstrate the performance of Algorithm 1, we conduct simulations on instances of $\mathcal{G}$ constructed from both synthetic data and the MovieLens-1m dataset (Harper and Konstan, 2015). In our experiments, Algorithm 2 is employed to solve the PNE for each instance of $\mathcal{G}$ . Below, we first outline the specifications of these two simulation environments and then present our results.

Synthetic environment For the synthetic environment, we construct the user population $\mathcal{X}$ by setting an embedding dimension $d=32$ and independently sampling $50$ cluster centers, denoted as $\{\mathbf{c}_{1},\dots,\mathbf{c}_{50}\}$ , from the unit sphere $\mathbb{S}^{d-1}$ . For each center $\mathbf{c}_{i}$ , users belonging to cluster- $i$ are generated by sampling independently from a Gaussian distribution $\mathcal{N}(\mathbf{c}_{i},0.5^{2}\mathbf{I}_{d})$ . The sizes of the $50$ user clusters are determined uniformly at random, ensuring the total size of $\mathcal{X}$ is $m=1000$ . Similarly, $n=200$ creators are generated, and the relevance matrix $M\in\mathbb{R}^{n\times m}$ is defined by the dot product between each user-creator pair, which are then normalized to the range $[0,1]$ . This synthetic dataset encapsulates a class of clustered user and creator preference distributions. On the creators’ side, their cost functions are set to $c_{i}(x)=c_{i}x^{\rho}$ , with the default $\rho=1.5$ . The marginal costs $\{c_{i}\}_{i=1}^{n}$ are randomly sampled from a uniform distribution $\mathcal{U}[0.1,0.5]$ .
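The construction above can be sketched in a few lines of NumPy. Several details are not pinned down in the text and are assumptions here: cluster sizes are drawn from a multinomial with equal probabilities, creators are generated around the same cluster centers as users, and the normalization to $[0,1]$ is a global min-max rescaling. Function names are hypothetical.

```python
import numpy as np

def sample_sphere(k, d, rng):
    """k points drawn uniformly from the unit sphere S^{d-1}."""
    v = rng.standard_normal((k, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def clustered_points(total, centers, sigma, rng):
    """Gaussian points around the centers; random cluster sizes summing to total."""
    k, d = centers.shape
    sizes = rng.multinomial(total, np.full(k, 1.0 / k))   # assumed size scheme
    labels = np.repeat(np.arange(k), sizes)
    return centers[labels] + sigma * rng.standard_normal((total, d))

def make_synthetic_env(n=200, m=1000, d=32, k=50, sigma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    centers = sample_sphere(k, d, rng)
    users = clustered_points(m, centers, sigma, rng)
    creators = clustered_points(n, centers, sigma, rng)   # assumed: shared centers
    scores = creators @ users.T                           # n x m dot products
    w = (scores - scores.min()) / (scores.max() - scores.min())  # assumed min-max
    c = rng.uniform(0.1, 0.5, size=n)                     # marginal costs
    return w, c
```

Calling `make_synthetic_env()` with the defaults returns a $200\times 1000$ relevance matrix and $200$ marginal costs.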

Environment constructed from MovieLens-1m dataset We use deep matrix factorization (Fan and Cheng, 2018) to train user and movie embeddings (with dimension set to $32$ ) by fitting the observed ratings in the range of 1 to 5. To ensure the quality of the trained embeddings, we performed a 5-fold cross-validation and obtained an averaged RMSE $=$ 0.739 on the test sets. With the same hyper-parameters, we train the user/item embeddings on the complete dataset. We randomly select $m=1000$ user embeddings to construct the population $\mathcal{X}$ and $n=200$ movie embeddings as the creator profiles. As in the synthetic environment, the relevance matrix $M\in\mathbb{R}^{n\times m}$ is given by the dot product between each user-creator pair normalized to $[0,1]$ , and creators’ cost functions are the same as those specified in the synthetic environment.

7.1 The Empirical Trade-Offs Between $U$ and $V$

Figure 1 illustrates the content creation frequency $x_{i}^{*}$ , user utility $\pi_{j}^{*}$ , and their corresponding aggregated values $U=\sum_{j}\pi_{j}^{*}$ , $V=\sum_{i}x_{i}^{*}$ under the PNE induced by different homogeneous $\beta$ (i.e., all users share the same $\beta$ ). The result in the right panel shows that a larger $\beta$ enhances overall user satisfaction $U$ but undermines total content creation $V$ . As $\beta$ increases, the drop in $V$ becomes more significant. This empirical finding supports Theorem 2 and suggests it holds under broader settings without the assumptions on creator cost function and user population structure. The left and middle plots illustrate each creator $i$ ’s creation frequency $x_{i}^{*}$ and each user $j$ ’s utility $\pi^{*}_{j}$ at the PNE, such that both $x_{i}^{*}$ and $\pi^{*}_{j}$ are rearranged in descending order. They show that when $\beta$ is shared across all users, its change affects $x^{*}_{i}$ and $\pi^{*}_{j}$ in the same direction.

Figure 1: Left and middle panels: the empirical distributions of content creation frequency $x_{i}^{*}$ and each user’s individual utility $\pi_{j}^{*}$ . Different colors represent results for PNEs induced by different $\beta$ . Right panel: the total content creation $V$ and total user satisfaction $U$ obtained under different $\beta$ . Error bars are obtained from 10 independently generated environments.

7.2 The Optimal PPM $(\beta)$ Found by Algorithm 1

Next, we use Algorithm 1 to find the optimal PPM( $\bm{\beta}$ ) and investigate the properties of the optimal $\bm{\beta}$ . We set $\lambda=0.5$ and aim to maximize the objective $W_{\lambda}=U+0.5V$ ; more results under different choices of $\lambda$ can be found in Appendix B. The initial $\bm{\beta}$ is set to $(100,\cdots,100)$ , representing a nearly deterministic matching for every user. Algorithm 1 is then run to update $\bm{\beta}$ . The sample rate and learning rate are set to $\delta=0.1$ and $\eta=200$ . In addition to searching for a personalized $\beta_{j}$ for each user $j$ , we also attempt to find a homogeneous $\beta$ (i.e., a single value shared by all $\beta_{j}$ ) using Algorithm 1. (The gradient of $W_{\lambda}$ with respect to a homogeneous $\beta$ can be readily obtained by summing all the partial derivatives of $W_{\lambda}$ with respect to $\beta_{j}$ .)

The first and third panels in Figure 2 show the evolution of $W_{\lambda}$ during the optimization process in both synthetic and MovieLens environments. As illustrated, Algorithm 1 successfully finds a better PPM( $\bm{\beta}$ ) compared to the baseline of exact matching for all users, with a significant gain of over 20% in the welfare metric. Furthermore, in both environments, personalized $\beta$ leads to a slightly better outcome compared to homogeneous $\beta$ .

The second and fourth panels depict the optimal $\beta_{j}$ for each user $j$ , arranged in descending order. These panels provide insights into how such a mechanism achieves better trade-offs. For each user index $j$ on the $x$ -axis, we also plot the average and the standard deviation of the relevance scores $\{w_{ij}\}_{i=1}^{n}$ associated with each user $j$ over all creators, shown as orange and green lines. Based on the definition, users with smaller average scores and higher standard deviations are considered more “picky” or selective, indicating high relevance scores with a small group of creators and low scores with many others. Conversely, users with higher average scores and smaller standard deviations are less selective and more open to exploration. The results show that the optimal $\bm{\beta}$ tends to increase the exploration strengths (by deploying smaller $\beta_{j}$ ) for less selective users. This approach is intuitive, as it safely increases exploration while minimizing losses in user engagement.

8 Conclusion

In this work, we introduced a new game-theoretical model $C^{4}$ (Cournot Content Creation Competition) to explore how creators strategically determine their creation frequency under a UGC platform’s recommendation algorithm. Our investigations reveal a critical balance between user satisfaction and creator engagement, mediated by the exploration strength of the recommendation. The existence and uniqueness of the PNE of $C^{4}$ games provide a predictive framework for assessing the effects of algorithmic choices on content diversity and volume. Through both theoretical analysis and empirical simulations, we demonstrated how varying the exploration strength can either enhance user engagement at the cost of reduced content diversity or encourage richer content creation at the expense of immediate user satisfaction. These findings disclose the delicate trade-offs platform designers face and highlight the utility of our model as a pre-deployment audit tool for optimizing recommendation algorithms to balance platforms’ long-term and short-term objectives.

While our $C^{4}$ model offers insights into strategic differentiation among creators regarding production quantity, it relies on a simplified assumption that creators maintain a fixed niche, consistently producing content on the same topic with similar quality. This assumption, though useful for modeling purposes, may be restrictive in real-world scenarios where creators dynamically adjust topics, vary content quality, and scale production quantity. Exploring the dynamics where creators compete across heterogeneous dimensions (such as topic variety, content quality, and production quantity) is an intriguing problem that we leave for future work.

Acknowledgment.

This work is supported in part by the NSF Award IIS-2128019, NSF Award CCF-2303372, AI2050 program at Schmidt Sciences (Grant G-24-66104) and Army Research Office Award W911NF-23-1-0030.

References

Agarwal and Brown (2023) Arpit Agarwal and William Brown. Online recommendations for agents with discounted adaptive preferences. arXiv preprint arXiv:2302.06014, 2023.
Ben-Porat and Tennenholtz (2017) Omer Ben-Porat and Moshe Tennenholtz. Shapley facility location games. In International Conference on Web and Internet Economics, pages 58–73. Springer, 2017.
Ben-Porat and Tennenholtz (2018) Omer Ben-Porat and Moshe Tennenholtz. A game-theoretic approach to recommendation systems with strategic content providers. Advances in Neural Information Processing Systems, 31, 2018.
Biyik et al. (2023) Erdem Biyik, Fan Yao, Yinlam Chow, Alex Haig, Chih-wei Hsu, Mohammad Ghavamzadeh, and Craig Boutilier. Preference elicitation with soft attributes in interactive recommendation. arXiv preprint arXiv:2311.02085, 2023.
Bobadilla et al. (2013) Jesús Bobadilla, Fernando Ortega, Antonio Hernando, and Abraham Gutiérrez. Recommender systems survey. Knowledge-based systems, 46:109–132, 2013.
Brantley et al. (2024) Kianté Brantley, Zhichong Fang, Sarah Dean, and Thorsten Joachims. Ranking with long-term constraints. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, pages 47–56, 2024.
Bravo et al. (2018) Mario Bravo, David Leslie, and Panayotis Mertikopoulos. Bandit learning in concave n-person games. Advances in Neural Information Processing Systems, 31, 2018.
Cai and Zheng (2023) Yang Cai and Weiqiang Zheng. Doubly optimal no-regret learning in monotone games. In International Conference on Machine Learning, pages 3507–3524. PMLR, 2023.
Caragiannis and Voudouris (2016) Ioannis Caragiannis and Alexandros A Voudouris. Welfare guarantees for proportional allocations. Theory of Computing Systems, 59:581–599, 2016.
Carroll et al. (2021) Micah Carroll, Dylan Hadfield-Menell, Stuart Russell, and Anca Dragan. Estimating and penalizing preference shift in recommender systems. In Proceedings of the 15th ACM Conference on Recommender Systems, pages 661–667, 2021.
Cen et al. (2023) Sarah H Cen, Andrew Ilyas, and Aleksander Madry. User strategization and trustworthy algorithms. arXiv preprint arXiv:2312.17666, 2023.
Cen et al. (2024) Sarah H Cen, Andrew Ilyas, Jennifer Allen, Hannah Li, and Aleksander Madry. Measuring strategization in recommendation: Users adapt their behavior to shape future content. arXiv preprint arXiv:2405.05596, 2024.
Cornes and Hartley (2005) Richard Cornes and Roger Hartley. Asymmetric contests with general technologies. Economic theory, 26:923–946, 2005.
Cournot (1838) Antoine Augustin Cournot. Recherches sur les principes mathématiques de la théorie des richesses, volume 48. L. Hachette, 1838.
Daskalakis et al. (2009) Constantinos Daskalakis, Paul W Goldberg, and Christos H Papadimitriou. The complexity of computing a nash equilibrium. SIAM Journal on Computing, 39(1):195–259, 2009.
Dean and Morgenstern (2022) Sarah Dean and Jamie Morgenstern. Preference dynamics under personalized recommendations. In Proceedings of the 23rd ACM Conference on Economics and Computation, pages 795–816, 2022.
Dean et al. (2024) Sarah Dean, Evan Dong, Meena Jagadeesan, and Liu Leqi. Recommender systems as dynamical systems: Interactions with viewers and creators. In Workshop on Recommendation Ecosystems: Modeling, Optimization and Incentive Design, 2024.
Debreu (1952) Gerard Debreu. A social equilibrium existence theorem. Proceedings of the National Academy of Sciences, 38(10):886–893, 1952.
Eilat and Rosenfeld (2023) Itay Eilat and Nir Rosenfeld. Performative recommendation: diversifying content via strategic incentives. In International Conference on Machine Learning, pages 9082–9103. PMLR, 2023.
Ewerhart (2015) Christian Ewerhart. Mixed equilibria in tullock contests. Economic Theory, 60:59–71, 2015.
Ewerhart (2017) Christian Ewerhart. The lottery contest is a best-response potential game. Economics Letters, 155:168–171, 2017. ISSN 0165-1765. doi: https://doi.org/10.1016/j.econlet.2017.03.030. URL https://www.sciencedirect.com/science/article/pii/S0165176517301325.
Fan and Cheng (2018) Jicong Fan and Jieyu Cheng. Matrix completion by deep matrix factorization. Neural Networks, 98:34–41, 2018.
Fan (1952) Ky Fan. Fixed-point and minimax theorems in locally convex topological linear spaces. Proceedings of the National Academy of Sciences, 38(2):121–126, 1952.
Ghosh and Hummel (2013) Arpita Ghosh and Patrick Hummel. Learning and incentives in user-generated content: Multi-armed bandits with endogenous arms. In Proceedings of the 4th conference on Innovations in Theoretical Computer Science, pages 233–246, 2013.
Ghosh and McAfee (2011) Arpita Ghosh and Preston McAfee. Incentivizing high-quality user-generated content. In Proceedings of the 20th international conference on World wide web, pages 137–146, 2011.
Glicksberg (1952) Irving L Glicksberg. A further generalization of the Kakutani fixed point theorem, with application to Nash equilibrium points. Proceedings of the American Mathematical Society, 3(1):170–174, 1952.
Glotfelter (2019) Angela Glotfelter. Algorithmic circulation: how content creators navigate the effects of algorithms on their work. Computers and composition, 54:102521, 2019.
Harper and Konstan (2015) F Maxwell Harper and Joseph A Konstan. The movielens datasets: History and context. Acm transactions on interactive intelligent systems (tiis), 5(4):1–19, 2015.
Hodgson (2021) Thomas Hodgson. Spotify and the democratisation of music. Popular Music, 40(1):1–17, 2021.
Hron et al. (2022) Jiri Hron, Karl Krauth, Michael I Jordan, Niki Kilbertus, and Sarah Dean. Modeling content creator incentives on algorithm-curated platforms. arXiv preprint arXiv:2206.13102, 2022.
Hu et al. (2024) Qinlu Hu, Ni Huang, and Renyu Philip Zhang. Viewer traffic allocation for small creator development: Experimental evidence from short-video platforms. Available at SSRN 4888995, 2024.
Hu et al. (2023) Xinyan Hu, Meena Jagadeesan, Michael I Jordan, and Jacob Steinhardt. Incentivizing high-quality content in online recommender systems. arXiv preprint arXiv:2306.07479, 2023.
Immorlica et al. (2024) Nicole Immorlica, Meena Jagadeesan, and Brendan Lucier. Clickbait vs. quality: How engagement-based optimization shapes the content landscape in online platforms. In Proceedings of the ACM on Web Conference 2024, pages 36–45, 2024.
Jagadeesan et al. (2024) Meena Jagadeesan, Nikhil Garg, and Jacob Steinhardt. Supply-side equilibria in recommender systems. Advances in Neural Information Processing Systems, 36, 2024.
Kalimeris et al. (2021) Dimitris Kalimeris, Smriti Bhagat, Shankar Kalyanaraman, and Udi Weinsberg. Preference amplification in recommender systems. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 805–815, 2021.
Koren et al. (2009) Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, 2009.
Krantz and Parks (2002) Steven George Krantz and Harold R Parks. The implicit function theorem: history, theory, and applications. Springer Science & Business Media, 2002.
Lin et al. (2024) Tao Lin, Kun Jin, Andrew Estornell, Xiaoying Zhang, Yiling Chen, and Yang Liu. User-creator feature dynamics in recommender systems with dual influence. arXiv preprint arXiv:2407.14094, 2024.
Mladenov et al. (2020) Martin Mladenov, Elliot Creager, Omer Ben-Porat, Kevin Swersky, Richard Zemel, and Craig Boutilier. Optimizing long-term social welfare in recommender systems: A constrained matching approach. In International Conference on Machine Learning, pages 6987–6998. PMLR, 2020.
Nash Jr (1950) John F Nash Jr. Equilibrium points in n-person games. Proceedings of the national academy of sciences, 36(1):48–49, 1950.
Nguyen and Vojnovic (2011) Thành Nguyen and Milan Vojnovic. Weighted proportional allocation. ACM SIGMETRICS Performance Evaluation Review, 39(1):133–144, 2011.
Pérez-Castrillo and Verdier (1992) J David Pérez-Castrillo and Thierry Verdier. A general analysis of rent-seeking games. Public choice, 73(3):335–350, 1992.
Prasad et al. (2023) Siddharth Prasad, Martin Mladenov, and Craig Boutilier. Content prompting: Modeling content provider dynamics to improve user welfare in recommender ecosystems. arXiv preprint arXiv:2309.00940, 2023.
Rosen (1965) J Ben Rosen. Existence and uniqueness of equilibrium points for concave n-person games. Econometrica: Journal of the Econometric Society, pages 520–534, 1965.
Santos (2022) Marcelo Luis Barbosa dos Santos. The “so-called” ugc: an updated definition of user-generated content in the age of social media. Online Information Review, 46(1):95–113, 2022.
Sherman (1949) Jack Sherman. Adjustment of an inverse matrix corresponding to changes in the elements of a given column or row of the original matrix. Ann. Math. Statist., 20:621, 1949.
Tatarenko and Kamgarpour (2020) Tatiana Tatarenko and Maryam Kamgarpour. Bandit learning in convex non-strictly monotone games. arXiv preprint arXiv:2009.04258, 2020.
Tullock (1980) Gordon Tullock. Efficient rent seeking. In Toward a theory of the rent-seeking society. College Station, TX: Texas A&M University Press, 1980.
Xu et al. (2024) Renzhe Xu, Haotian Wang, Xingxuan Zhang, Bo Li, and Peng Cui. Ppa-game: Characterizing and learning competitive dynamics among online content creators. arXiv preprint arXiv:2403.15524, 2024.
Yao et al. (2022a) Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, and Haifeng Xu. Learning from a learning user for optimal recommendations. In International Conference on Machine Learning, pages 25382–25406. PMLR, 2022a.
Yao et al. (2022b) Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, and Haifeng Xu. Learning the optimal recommendation from explorative users. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 9457–9465, 2022b.
Yao et al. (2023) Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, and Haifeng Xu. How bad is top- $k$ recommendation under competing content creators? In International Conference on Machine Learning. PMLR, 2023.
Yao et al. (2024a) Fan Yao, Chuanhao Li, Denis Nekipelov, Hongning Wang, and Haifeng Xu. Human vs. generative ai in content creation competition: Symbiosis or conflict? In International Conference on Machine Learning. PMLR, 2024a.
Yao et al. (2024b) Fan Yao, Chuanhao Li, Karthik Abinav Sankararaman, Yiming Liao, Yan Zhu, Qifan Wang, Hongning Wang, and Haifeng Xu. Rethinking incentives in recommender systems: Are monotone rewards always beneficial? Advances in Neural Information Processing Systems, 36, 2024b.
Yao et al. (2024c) Fan Yao, Yiming Liao, Mingzhe Wu, Chuanhao Li, Yan Zhu, James Yang, Jingzhou Liu, Qifan Wang, Haifeng Xu, and Hongning Wang. User welfare optimization in recommender systems with competing content creators. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 3874–3885, 2024c.
Zeng et al. (2024) Zhiyu Zeng, Zhiqi Zhang, Dennis Zhang, and Tat Chan. The impact of recommender systems on content consumption and production: Evidence from field experiments and structural modeling. Available at SSRN, 2024.
Zhu et al. (2023) Banghua Zhu, Sai Praneeth Karimireddy, Jiantao Jiao, and Michael I Jordan. Online learning in a creator economy. arXiv preprint arXiv:2305.11381, 2023.
Zhuang et al. (2023) Wei Zhuang, Qingfeng Zeng, Yu Zhang, Chunmei Liu, and Weiguo Fan. What makes user-generated content more helpful on social media platforms? insights from creator interactivity perspective. Information processing & management, 60(2):103201, 2023.

Appendix to Unveiling User Satisfaction and Creator Productivity Trade-Offs in Recommendation Platforms

A Omitted Proofs

A.1 Proof of Theorem 1

Proof First of all, we argue that given any $\mathcal{G}(n,m,M,\{c_{i}\},\bm{\beta})$ , for any creator $i$ , there exists a $\delta_{i}>0$ such that no $x_{i}\in[0,\delta_{i})$ can be an equilibrium strategy. This is because given any $\bm{x}_{-i}\in\mathbb{R}^{n-1}_{\geq 0}$ , $u_{i}$ as a function of $x_{i}$ has a continuous and strictly positive gradient at $x_{i}=0$ , meaning that there exists a $\delta_{i}>0$ such that $\frac{\partial u_{i}}{\partial x_{i}}\bigg{|}_{x_{i}=t}>0,\forall t\in[0,\delta_{i}]$ regardless of what other creators’ strategies are. In other words, for any $x_{i}<\delta_{i}$ , creator $i$ can always increase her strategy to strictly improve her utility. As a result, any potential PNE $\bm{x}^{*}$ must satisfy $x_{i}^{*}\geq\delta_{i}$ .

On the other hand, since $c_{i}(x_{i})\rightarrow+\infty$ when $x_{i}\rightarrow+\infty$ but the traffic gain for each creator is at most $m$ , we have $u_{i}(x_{i},\bm{x}_{-i})\rightarrow-\infty,\forall\bm{x}_{-i}\in\mathbb{R}^{n-% 1}_{\geq 0}$ when $x_{i}\rightarrow+\infty$ . As a result, any equilibrium strategy must also be upper bounded by a uniform constant $\Delta>0$ .

To argue the existence and uniqueness of the PNE of $\mathcal{G}$ , in the following we may without loss of generality restrict each creator $i$ ’s strategy set to a convex set $[\delta_{i},\Delta]$ .

For any fixed $\bm{\beta}=(\beta_{1},\cdots,\beta_{m})$ , let $a_{ij}=\exp(\beta_{j}w_{ij})$ and Eq. (1) can be simplified to

\displaystyle u_{i}(x_{i},\bm{x}_{-i};\bm{\beta})=\sum_{j=1}^{m}\left(\frac{x_% {i}a_{ij}}{\sum_{k=1}^{n}x_{k}a_{kj}}\right)-c_{i}(x_{i}),

(14)

For simplicity we denote $g_{j}(\bm{x})=\sum_{k=1}^{n}x_{k}a_{kj}$ and $u_{i}$ can be expressed as $u_{i}(x_{i},\bm{x}_{-i})=\sum_{j=1}^{m}\frac{x_{i}a_{ij}}{g_{j}(\bm{x})}-c_{i}% (x_{i})$ . Our proof starts from a sufficient condition from Rosen (1965) for a game to be monotone. A game is said to satisfy the diagonal strict concavity (DSC) condition if (1) each player has a concave utility function in his own strategy in a convex strategy space; and (2) there exists some non-zero parameter $\lambda=(\lambda_{1},\cdots,\lambda_{n})$ such that the Hessian matrix given by

H_{kl}(\bm{x};\lambda)\triangleq\frac{\lambda_{k}}{2}\frac{\partial^{2}u_{k}(% \bm{x})}{\partial x_{k}\partial x_{l}}+\frac{\lambda_{l}}{2}\frac{\partial^{2}% u_{l}(\bm{x})}{\partial x_{l}\partial x_{k}}

(15)

is strictly negative-definite. In Rosen (1965), it is shown that any game satisfying the $\lambda$ -DSC condition has a unique pure Nash equilibrium (PNE); such games are often referred to as monotone games.

First of all, we already argued that each creator $i$ ’s strategy set is $[\delta_{i},\Delta]$ , which is a convex set. Core to our proof is to show that game $\mathcal{G}$ is $\bm{1}$ -DSC under the theorem conditions. Direct calculation shows that for any $1\leq k<l\leq n$ ,

\frac{\partial^{2}u_{k}(\bm{x})}{\partial x_{k}\partial x_{l}}=\sum_{j=1}^{m}\frac{a_{kj}a_{lj}}{g_{j}^{3}}\cdot(-g_{j}+2a_{kj}x_{k}),\qquad\frac{\partial^{2}u_{l}(\bm{x})}{\partial x_{l}\partial x_{k}}=\sum_{j=1}^{m}\frac{a_{kj}a_{lj}}{g_{j}^{3}}\cdot(-g_{j}+2a_{lj}x_{l}),

while the diagonal terms are $\frac{\partial^{2}u_{k}(\bm{x})}{\partial x_{k}^{2}}=-\sum_{j=1}^{m}\frac{2a_{kj}^{2}}{g_{j}^{3}}\cdot(g_{j}-a_{kj}x_{k})-\frac{\partial^{2}c_{k}}{\partial x_{k}^{2}}$ ,

and therefore the Hessian matrix of $\mathcal{G}$ specified by the RHS of Eq. (15) is equal to

\begin{aligned}-[H(\bm{x})]&=\sum_{j=1}^{m}g_{j}^{-3}\,\mathrm{diag}(\bm{a}_{j})\begin{bmatrix}2\sum_{i\neq 1}x_{i}a_{ij}&\sum_{i\notin\{1,2\}}x_{i}a_{ij}&\ldots\\ \sum_{i\notin\{1,2\}}x_{i}a_{ij}&2\sum_{i\neq 2}x_{i}a_{ij}&\ldots\\ \vdots&\vdots&\ddots\end{bmatrix}\mathrm{diag}(\bm{a}_{j})+\begin{bmatrix}\frac{\partial^{2}c_{1}}{\partial x_{1}^{2}}&0&\ldots\\ 0&\frac{\partial^{2}c_{2}}{\partial x_{2}^{2}}&\ldots\\ \vdots&\vdots&\ddots\end{bmatrix}\\ &\triangleq\sum_{j=1}^{m}g_{j}^{-3}\,\mathrm{diag}(\bm{a}_{j})H_{j}\,\mathrm{diag}(\bm{a}_{j})+H_{0},\end{aligned}

(16)

where $g_{j}$ in the above expressions denotes $g_{j}(\bm{x})$ , $\bm{a}_{j}=(a_{1j},\cdots,a_{nj})^{\top}$ , and $\mathrm{diag}(\bm{a}_{j})$ is the diagonal matrix with the entries of $\bm{a}_{j}$ on its diagonal. If all the cost functions are strictly convex, the diagonal matrix $H_{0}$ on the RHS of Eq. (16) is strictly positive-definite (PD). Since every $a_{ij}$ is strictly positive, $\mathrm{diag}(\bm{a}_{j})H_{j}\,\mathrm{diag}(\bm{a}_{j})$ is PD whenever $H_{j}$ is. Therefore, it suffices to show that $H_{j}$ is PD for all $j\in[m]$ . To see this, let $z_{i}=x_{i}a_{ij}$ ; we show that for any $\bm{y}=(y_{1},\cdots,y_{n})\in\mathbb{R}^{n}$ , $\bm{y}^{\top}H_{j}\bm{y}\geq 0$ , and the equality holds if and only if $\bm{y}=\bm{0}$ . In fact, note that

\begin{aligned}\bm{y}^{\top}H_{j}\bm{y}&=2\sum_{i=1}^{n}y_{i}^{2}\left(\sum_{j\neq i}z_{j}\right)+2\sum_{i<j}y_{i}y_{j}\left(\sum_{k\notin\{i,j\}}z_{k}\right)\\ &=\sum_{i=1}^{n}y_{i}^{2}\left(\sum_{j\neq i}z_{j}\right)+\left[\sum_{i=1}^{n}y_{i}^{2}\left(\sum_{j\neq i}z_{j}\right)+2\sum_{i<j}y_{i}y_{j}\left(\sum_{k\notin\{i,j\}}z_{k}\right)\right]\\ &=\sum_{i=1}^{n}y_{i}^{2}\left(\sum_{j\neq i}z_{j}\right)+\sum_{k=1}^{n}z_{k}\left[\sum_{j\neq k}y_{j}^{2}+2\sum_{i<j,i\neq k,j\neq k}y_{i}y_{j}\right]\\ &=\sum_{i=1}^{n}y_{i}^{2}\left(\sum_{j\neq i}z_{j}\right)+\sum_{i=1}^{n}z_{i}\left(\sum_{j\neq i}y_{j}\right)^{2}\geq 0.\end{aligned}

(17)

Because $x_{i}$ and $a_{ij}$ are all strictly positive, each $z_{i}$ must also be strictly positive. Hence, Eq. (17) can take value zero if and only if $y_{i}=0,\forall i\in[n]$ . Therefore, $H_{j}$ is PD for any $j\in[m]$ , which completes the proof.
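As a numerical sanity check on the last step, the following sketch builds the matrix $H_{j}$ from a positive vector $\bm{z}$ , evaluates the closed form on the last line of Eq. (17), and lets one compare it with the quadratic form directly. The helper names are hypothetical and the check is illustrative only; it does not replace the argument above.

```python
import numpy as np

def build_Hj(z):
    """H_j with diagonal 2*sum_{k != i} z_k and off-diagonal sum_{k not in {i,l}} z_k."""
    S = z.sum()
    H = S - z[:, None] - z[None, :]          # off-diagonal entries
    np.fill_diagonal(H, 2.0 * (S - z))       # diagonal entries
    return H

def quad_form_closed(z, y):
    """Last line of Eq. (17): sum_i y_i^2 sum_{j!=i} z_j + sum_i z_i (sum_{j!=i} y_j)^2."""
    S, T = z.sum(), y.sum()
    return np.sum(y ** 2 * (S - z)) + np.sum(z * (T - y) ** 2)
```

For example, with $\bm{z}=(1,2)$ and $\bm{y}=(1,1)$ , $H_{j}=\text{diag}(4,2)$ and both expressions evaluate to $6$ .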

A.2 Proof of Theorem 2

First let’s recall the definition of $U,V$ when $m=1$ :

U(\bm{x}^{*}(\beta);\beta)=\sum_{i=1}^{n}\left(\frac{w_{i}x^{*}_{i}e^{\beta w_% {i}}}{\sum_{k=1}^{n}x^{*}_{k}e^{\beta w_{k}}}\right),

(18)

V(\bm{x}^{*})=\sum_{i=1}^{n}x_{i}^{*}.

(19)

In the following, we prove the monotonicity of $U(\beta)$ and $V(\beta)$ by showing $\frac{d\ln U}{d\beta}>0$ and $\frac{dV}{d\beta}<0$ , respectively. Before presenting the detailed proof, we first derive some relevant definitions and their properties that will be used in the proof.

For simplicity we omit the superscript $*$ in $\bm{x}^{*}$ and simply use $\bm{x}$ to refer to the PNE of $\mathcal{G}$ . When $m=1$ , the creator utility function writes

u_{i}(x_{i},x_{-i})=\frac{x_{i}e^{\beta w_{i}}}{\sum_{k=1}^{n}x_{k}e^{\beta w_% {k}}}-c_{i}(x_{i}),i\in[n].

(20)

Let $P_{i}=\frac{x_{i}a_{i}}{\sum_{k=1}^{n}x_{k}a_{k}}$ , where $a_{i}=e^{\beta w_{i}}$ . First of all, we claim that it is without loss of generality to consider the regime where $P_{i}\leq\frac{1}{3}$ . To see this, consider the following two $C^{4}$ instances:

\begin{aligned}&\mathcal{G}_{1}(n,m=1,\bm{w}=(w_{1},\cdots,w_{n}),\bm{c}=(c_{1},\cdots,c_{n}),\bm{\beta}),\\ &\mathcal{G}_{2}(3n,m=1,[\bm{w},\bm{w},\bm{w}],[\bm{c}/3,\bm{c}/3,\bm{c}/3],[\bm{\beta},\bm{\beta},\bm{\beta}]).\end{aligned}

Clearly, both games $\mathcal{G}_{1},\mathcal{G}_{2}$ have a unique PNE. Let the PNE of $\mathcal{G}_{1}$ be denoted by $\bm{x}_{1}^{*}$ . In $\mathcal{G}_{2}$ , since its $3n$ players are divided into three identical groups, its PNE can be represented as $(\bm{x}_{2}^{*},\bm{x}_{2}^{*},\bm{x}_{2}^{*})$ , where each group of $n$ players follows the same strategy. Moreover, for any $1\leq i\leq n$ , it is straightforward to observe that the $i$ -th player’s utility functions in $\mathcal{G}_{1}$ and $\mathcal{G}_{2}$ differ only by a multiplicative constant of $3$ . Consequently, we have $\bm{x}_{1}^{*}=\bm{x}_{2}^{*}$ . Therefore, for any $C^{4}$ instance $\mathcal{G}_{1}$ with $m=1$ , we can always construct an equivalent instance $\mathcal{G}_{2}$ that shares the same PNE structure, while ensuring $P_{i}\leq\frac{1}{3}$ . This justifies the assumption that, without loss of generality, $P_{i}\leq\frac{1}{3}$ .

Another property we need is that the PNE strategy $x_{i}$ of any player $i$ in $\mathcal{G}_{1}$ lies in a compact interval $[0,L]$ for some constant $L$, regardless of the value of $\bm{\beta}$. To see this, note that $c_{i}(x_{i})$ is increasing in $x_{i}$ and goes to infinity as $x_{i}\rightarrow+\infty$, while $P_{i}$ is upper bounded by $1$. As a result, any $x_{i}$ with $1-c_{i}(x_{i})<-c_{i}(0)$ satisfies $u_{i}(x_{i},\bm{x}_{-i})\leq 1-c_{i}(x_{i})<-c_{i}(0)=u_{i}(0,\bm{x}_{-i})$, so it cannot be a PNE strategy, as switching to $0$ increases player $i$’s utility. Therefore, if we take

L=\max_{i\in[n]}\left\{\inf\{x\geq 0:1-c_{i}(x)<-c_{i}(0)\}\right\},

it holds that $x_{i}\leq L,\forall i\in[n]$ .
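As a concrete illustration of this bound, assume a hypothetical power cost $c_{i}(x)=\kappa_{i}x^{\rho}$ with $\kappa_{i}>0$ and $\rho>1$. This functional form and the symbols $\kappa_{i},\rho$ are assumptions introduced here for illustration only. Under that assumption the constant $L$ has a closed form:

```latex
% Assumed cost (illustrative only): c_i(x) = \kappa_i x^{\rho}, \kappa_i > 0, \rho > 1, so c_i(0) = 0.
1 - c_i(x) < -c_i(0)
\;\Longleftrightarrow\; \kappa_i x^{\rho} > 1
\;\Longleftrightarrow\; x > \kappa_i^{-1/\rho},
\qquad\text{hence}\qquad
L = \max_{i\in[n]} \kappa_i^{-1/\rho}.
```

In words, any production level above $\kappa_{i}^{-1/\rho}$ costs more than the entire unit of traffic is worth, so it is dominated by producing nothing.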

Let $F(\bm{x},\beta)=\left(\frac{\partial u_{i}}{\partial x_{i}}\right)_{i=1}^{n}$ be an $n$-valued function. From Corollary 1 we know that $\bm{x},\beta$ satisfy $F(\bm{x},\beta)=0$. Moreover, Theorem 1 guarantees that for any $\beta\geq 0$, the $\bm{x}$ implicitly determined by $F(\bm{x},\beta)=0$ exists and is unique. Therefore, by the implicit function theorem (Krantz and Parks, 2002), the derivative of $\bm{x}$ w.r.t. $\beta$ can be written as

\frac{d\bm{x}}{d\beta}=-\left(\frac{\partial F}{\partial\bm{x}}\right)^{-1}\cdot\frac{\partial F}{\partial\beta},

where $\left[\frac{\partial F}{\partial\bm{x}}\right]_{n\times n}$ is the Jacobian matrix (which is also the Hessian of $\mathcal{G}$ when $\lambda=\bm{1}$ , see Eq. (15)) and $\frac{\partial F}{\partial\beta}\in\mathbb{R}^{n\times 1}$ is the partial derivative of $F$ w.r.t. $\beta$ .

The first-order derivative of $u_{i}$ can be calculated as

\frac{\partial u_{i}}{\partial x_{i}}=\frac{e^{\beta w_{i}}}{\sum_{k=1}^{n}x_{k}e^{\beta w_{k}}}-x_{i}\left(\frac{e^{\beta w_{i}}}{\sum_{k=1}^{n}x_{k}e^{\beta w_{k}}}\right)^{2}-c^{\prime}_{i}(x_{i}),

(21)

and we can use it to further obtain the following explicit expressions in terms of the derivatives of $F$ :

		$\displaystyle\left(\frac{\partial F_{i}}{\partial x_{i}}\right)=-\frac{1}{x_{i}^{2}}P_{i}^{2}\left(1-2P_{i}\right)-\frac{1}{x_{i}^{2}}P_{i}^{2}-c_{i}^{\prime\prime}(x_{i}),$
		$\displaystyle\left(\frac{\partial F_{i}}{\partial x_{j}}\right)=-\frac{1}{x_{i}x_{j}}P_{i}P_{j}\left(1-2P_{i}\right),\quad j\neq i,$
		$\displaystyle\left(\frac{\partial F_{i}}{\partial\beta}\right)=\frac{P_{i}}{x_{i}}\cdot(1-2P_{i})\cdot\left(w_{i}-\sum_{k=1}^{n}P_{k}w_{k}\right).$		(22)
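The closed forms in Eq. (22) can be sanity-checked against central finite differences of Eq. (21). The sketch below is only illustrative: it assumes a quadratic cost $c_{i}(x)=\kappa_{i}x^{2}$, and the names `shares`, `F`, `closed_form`, `finite_difference`, and `kappa`, as well as the test point, are hypothetical choices made here.

```python
import math

def shares(x, w, beta):
    """P_i = x_i a_i / sum_k x_k a_k with a_i = exp(beta * w_i)."""
    a = [math.exp(beta * wi) for wi in w]
    total = sum(xi * ai for xi, ai in zip(x, a))
    return [xi * ai / total for xi, ai in zip(x, a)]

def F(x, w, beta, kappa):
    """Eq. (21) under the ASSUMED cost c_i(x) = kappa_i * x^2 (so c_i' = 2 kappa_i x)."""
    P = shares(x, w, beta)
    # a_i / S equals P_i / x_i, so Eq. (21) reads P_i/x_i - P_i^2/x_i - c_i'(x_i).
    return [P[i] / x[i] - P[i] ** 2 / x[i] - 2.0 * kappa[i] * x[i] for i in range(len(x))]

def closed_form(x, w, beta, kappa):
    """Jacobian dF_i/dx_j and partial dF_i/dbeta as stated in Eq. (22)."""
    n = len(x)
    P = shares(x, w, beta)
    T = sum(P[k] * w[k] for k in range(n))
    jac = [[-P[i] * P[j] * (1 - 2 * P[i]) / (x[i] * x[j]) for j in range(n)]
           for i in range(n)]
    for i in range(n):
        jac[i][i] -= P[i] ** 2 / x[i] ** 2 + 2.0 * kappa[i]  # extra diagonal terms, c_i'' = 2 kappa_i
    dbeta = [P[i] / x[i] * (1 - 2 * P[i]) * (w[i] - T) for i in range(n)]
    return jac, dbeta

def finite_difference(x, w, beta, kappa, h=1e-6):
    """Central differences of F in each x_j and in beta."""
    n = len(x)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        Fp, Fm = F(xp, w, beta, kappa), F(xm, w, beta, kappa)
        for i in range(n):
            jac[i][j] = (Fp[i] - Fm[i]) / (2 * h)
    Fp, Fm = F(x, w, beta + h, kappa), F(x, w, beta - h, kappa)
    return jac, [(p - m) / (2 * h) for p, m in zip(Fp, Fm)]

# Arbitrary interior test point (not a PNE; Eq. (22) holds as an identity in x and beta).
x, w, beta, kappa = [0.4, 0.7, 1.1], [0.2, 0.5, 0.9], 1.5, [1.0, 0.8, 1.2]
jac_cf, db_cf = closed_form(x, w, beta, kappa)
jac_fd, db_fd = finite_difference(x, w, beta, kappa)
max_err = max(
    max(abs(jac_cf[i][j] - jac_fd[i][j]) for i in range(3) for j in range(3)),
    max(abs(p - q) for p, q in zip(db_cf, db_fd)),
)
print(max_err)
```

The check is evaluated at an arbitrary interior point because Eq. (22) is an algebraic identity and does not require $\bm{x}$ to be an equilibrium.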

Let’s define a positive definite diagonal matrix

D=\text{diag}\left(\frac{P_{1}^{2}}{x_{1}^{2}}+c^{\prime\prime}_{1},\cdots,\frac{P_{n}^{2}}{x_{n}^{2}}+c^{\prime\prime}_{n}\right).

Since $c^{\prime\prime}_{i}>0$, we can introduce variables

\delta_{i}\in(0,1)\text{ such that }\frac{P_{i}^{2}}{x_{i}^{2}}+c^{\prime\prime}_{i}=\frac{P_{i}^{2}}{\delta_{i}x_{i}^{2}}.

(23)

In addition, let’s also define

\bm{y}=\left(\frac{P_{1}(1-2P_{1})}{x_{1}},\cdots,\frac{P_{n}(1-2P_{n})}{x_{n}}\right)^{\top},\quad\bm{z}=\left(\frac{P_{1}}{x_{1}},\cdots,\frac{P_{n}}{x_{n}}\right)^{\top},

(24)

then by Eq. (22) we have $-\frac{\partial F}{\partial\bm{x}}=D+\bm{y}\bm{z}^{\top}$, and from the Sherman–Morrison formula (Sherman, 1949), it holds that

	$\displaystyle-\left(\frac{\partial F}{\partial\bm{x}}\right)^{-1}$	$\displaystyle=(D+\bm{y}\bm{z}^{\top})^{-1}$
		$\displaystyle=D^{-1}-\frac{D^{-1}\bm{y}\bm{z}^{\top}D^{-1}}{1+\bm{z}^{\top}D^{-1}\bm{y}},$		(25)

where

	$\displaystyle D^{-1}=\text{diag}\left(\frac{\delta_{1}x_{1}^{2}}{P_{1}^{2}},\cdots,\frac{\delta_{n}x_{n}^{2}}{P_{n}^{2}}\right),$
	$\displaystyle D^{-1}\bm{y}=\left(\frac{\delta_{1}x_{1}(1-2P_{1})}{P_{1}},\cdots,\frac{\delta_{n}x_{n}(1-2P_{n})}{P_{n}}\right)^{\top},$
	$\displaystyle\bm{z}^{\top}D^{-1}=\left(\frac{\delta_{1}x_{1}}{P_{1}},\cdots,\frac{\delta_{n}x_{n}}{P_{n}}\right).$
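To see how Eq. (25) combines with the implicit function theorem, the following sketch computes $\frac{d\bm{x}}{d\beta}=(D+\bm{y}\bm{z}^{\top})^{-1}\frac{\partial F}{\partial\beta}$ through the Sherman–Morrison form and compares it with a finite difference of numerically solved equilibria. It is a minimal check under stated assumptions, not part of the proof: the quadratic cost $c_{i}(x)=\kappa_{i}x^{2}$, the simple projected-gradient solver `solve_pne`, the helper `dx_dbeta`, and the instance values are all hypothetical choices made here.

```python
import math

def solve_pne(w, beta, kappa, eta=0.1, tol=1e-12, max_iter=100000):
    """Projected gradient ascent on Eq. (20) with ASSUMED cost c_i(x) = kappa_i x^2."""
    a = [math.exp(beta * wi) for wi in w]
    x = [1.0] * len(w)
    for _ in range(max_iter):
        S = sum(xi * ai for xi, ai in zip(x, a))
        # Own-gradient from Eq. (21) with c_i'(x) = 2 kappa_i x.
        g = [ai / S - xi * (ai / S) ** 2 - 2.0 * ki * xi for xi, ai, ki in zip(x, a, kappa)]
        if max(abs(gi) for gi in g) < tol:
            break
        x = [max(xi + eta * gi, 0.0) for xi, gi in zip(x, g)]
    return x

def dx_dbeta(x, w, beta, kappa):
    """dx/dbeta = (D + y z^T)^{-1} dF/dbeta, evaluated with the Sherman-Morrison form (25)."""
    n = len(x)
    a = [math.exp(beta * wi) for wi in w]
    S = sum(xi * ai for xi, ai in zip(x, a))
    P = [xi * ai / S for xi, ai in zip(x, a)]
    T = sum(Pi * wi for Pi, wi in zip(P, w))
    d = [(P[i] / x[i]) ** 2 + 2.0 * kappa[i] for i in range(n)]  # diagonal of D, c_i'' = 2 kappa_i
    y = [P[i] * (1 - 2 * P[i]) / x[i] for i in range(n)]         # Eq. (24)
    z = [P[i] / x[i] for i in range(n)]                          # Eq. (24)
    v = [y[i] * (w[i] - T) for i in range(n)]                    # dF/dbeta from Eq. (22)
    dinv_v = [v[i] / d[i] for i in range(n)]
    dinv_y = [y[i] / d[i] for i in range(n)]
    scale = sum(z[i] * dinv_v[i] for i in range(n)) / (
        1.0 + sum(z[i] * dinv_y[i] for i in range(n)))
    return [dinv_v[i] - dinv_y[i] * scale for i in range(n)]

# Hypothetical three-creator instance.
w, kappa, beta, h = [0.2, 0.5, 0.9], [1.0, 1.0, 1.0], 1.0, 1e-5
x_star = solve_pne(w, beta, kappa)
analytic = dx_dbeta(x_star, w, beta, kappa)
numeric = [(p - m) / (2 * h)
           for p, m in zip(solve_pne(w, beta + h, kappa), solve_pne(w, beta - h, kappa))]
gap = max(abs(s - t) for s, t in zip(analytic, numeric))
print(analytic, gap)
```

Because $D$ is diagonal, the Sherman–Morrison form only needs elementwise divisions and two inner products, which is why no general linear solver appears in the sketch.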

Since $P_{i}\leq\frac{1}{3}$ , we have $1-2P_{i}\geq\frac{1}{3}>0$ . With all the notations introduced so far we are now ready to give the formal proof of Theorem 2.

Proof

We prove the monotonicity of $U(\beta)$ and $V(\beta)$ by showing $\frac{d\ln U}{d\beta}>0$ and $\frac{dV}{d\beta}<0$ .

The monotonicity of $U(\beta)$ : The first-order derivative of $\ln U$ w.r.t. $\beta$ is given by

	$\displaystyle\frac{d\ln U}{d\beta}$	$\displaystyle=\frac{1}{U}\cdot\left(\frac{\partial U}{\partial\bm{x}}\cdot\frac{d\bm{x}}{d\beta}+\frac{\partial U}{\partial\beta}\right)$
		$\displaystyle=-\frac{1}{U}\cdot\frac{\partial U}{\partial\bm{x}}\cdot\left(\frac{\partial F}{\partial\bm{x}}\right)^{-1}\cdot\frac{\partial F}{\partial\beta}+\frac{1}{U}\cdot\frac{\partial U}{\partial\beta},$		(26)

where $\frac{\partial U}{\partial\bm{x}}\in\mathbb{R}^{1\times n}$ is the partial derivative of $U$ w.r.t. $\bm{x}$. Recalling that $a_{i}=e^{\beta w_{i}}$, we first show $\frac{1}{U}\cdot\frac{\partial U}{\partial\beta}\geq 0$. Indeed, a direct calculation gives

	$\displaystyle\frac{1}{U}\cdot\frac{\partial U}{\partial\beta}$	$\displaystyle=\frac{\sum_{k=1}^{n}x_{k}a_{k}}{\sum_{k=1}^{n}w_{k}x_{k}a_{k}}\cdot\left(\frac{(\sum_{k=1}^{n}w_{k}^{2}x_{k}a_{k})(\sum_{k=1}^{n}x_{k}a_{k})-(\sum_{k=1}^{n}w_{k}x_{k}a_{k})^{2}}{(\sum_{k=1}^{n}x_{k}a_{k})^{2}}\right)$
		$\displaystyle=\frac{\sum_{k=1}^{n}w_{k}^{2}x_{k}a_{k}}{\sum_{k=1}^{n}w_{k}x_{k}a_{k}}-\frac{\sum_{k=1}^{n}w_{k}x_{k}a_{k}}{\sum_{k=1}^{n}x_{k}a_{k}}.$		(27)

By the Cauchy–Schwarz inequality, it holds that

\sum_{k=1}^{n}w_{k}^{2}x_{k}a_{k}\cdot\sum_{k=1}^{n}x_{k}a_{k}\geq\left(\sum_{k=1}^{n}\sqrt{w_{k}^{2}x_{k}a_{k}\cdot x_{k}a_{k}}\right)^{2}=\left(\sum_{k=1}^{n}w_{k}x_{k}a_{k}\right)^{2}.

Therefore, the RHS of Eq. (27) is greater than or equal to $0$ . Hence, it suffices to show

-\frac{1}{U}\cdot\frac{\partial U}{\partial\bm{x}}\cdot\left(\frac{\partial F}{\partial\bm{x}}\right)^{-1}\cdot\frac{\partial F}{\partial\beta}>0.

(28)
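The nonnegativity of Eq. (27) also has a direct probabilistic reading. Dividing the numerator and denominator of each fraction in Eq. (27) by $\sum_{k}x_{k}a_{k}$ and using $P_{k}=\frac{x_{k}a_{k}}{\sum_{j}x_{j}a_{j}}$ gives the following, where $\mathbb{E}_{P}$ and $\mathrm{Var}_{P}$ are shorthand introduced here (not notation from the main text) for the mean and variance of $w$ under the traffic shares $P$, and $\mathbb{E}_{P}[w]>0$ is assumed:

```latex
\frac{1}{U}\cdot\frac{\partial U}{\partial\beta}
= \frac{\sum_{k=1}^{n} P_k w_k^{2}}{\sum_{k=1}^{n} P_k w_k} - \sum_{k=1}^{n} P_k w_k
= \frac{\sum_{k=1}^{n} P_k w_k^{2} - \left(\sum_{k=1}^{n} P_k w_k\right)^{2}}{\sum_{k=1}^{n} P_k w_k}
= \frac{\mathrm{Var}_{P}(w)}{\mathbb{E}_{P}[w]} \;\geq\; 0 .
```

So the direct effect of $\beta$ on $\ln U$ vanishes only when all creators with positive traffic share have the same $w_{k}$.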

Also note that

$\displaystyle\frac{1}{U}\cdot\frac{\partial U}{\partial x_{i}}$	$\displaystyle=\frac{\sum_{k=1}^{n}x_{k}a_{k}}{\sum_{k=1}^{n}w_{k}x_{k}a_{k}}\cdot\left(\frac{w_{i}a_{i}(\sum_{k=1}^{n}x_{k}a_{k})-a_{i}\sum_{k=1}^{n}w_{k}x_{k}a_{k}}{(\sum_{k=1}^{n}x_{k}a_{k})^{2}}\right)$
	$\displaystyle=\frac{w_{i}a_{i}}{\sum_{k=1}^{n}w_{k}x_{k}a_{k}}-\frac{a_{i}}{\sum_{k=1}^{n}x_{k}a_{k}}$
	$\displaystyle=\frac{w_{i}a_{i}}{\sum_{k=1}^{n}w_{k}x_{k}a_{k}}-\frac{P_{i}}{x_{i}}.$	(29)

Substituting Eqs. (22), (25), and (29) into the LHS of Eq. (28), we obtain

	$\displaystyle-\frac{1}{U}\cdot\frac{\partial U}{\partial\bm{x}}\cdot\left(\frac{\partial F}{\partial\bm{x}}\right)^{-1}\cdot\frac{\partial F}{\partial\beta}$
$\displaystyle=$	$\displaystyle\left[\frac{w_{i}a_{i}}{\sum_{k=1}^{n}w_{k}x_{k}a_{k}}-\frac{P_{i}}{x_{i}}\right]^{\top}_{i\in[n]}\cdot\left[D^{-1}-\frac{D^{-1}\bm{y}\bm{z}^{\top}D^{-1}}{1+\bm{z}^{\top}D^{-1}\bm{y}}\right]\cdot\left[\frac{P_{i}}{x_{i}}\cdot(1-2P_{i})\cdot\left(w_{i}-\sum_{k=1}^{n}P_{k}w_{k}\right)\right]_{i\in[n]}$
$\displaystyle=$	$\displaystyle\frac{1}{T}\sum_{i=1}^{n}\delta_{i}(1-2P_{i})(w_{i}-T)^{2}-\frac{\left(\sum_{i=1}^{n}\delta_{i}(1-2P_{i})(w_{i}-T)\right)^{2}}{T(1+\sum_{i=1}^{n}\delta_{i}(1-2P_{i}))},$	(30)

where $T=\frac{\sum_{k=1}^{n}w_{k}x_{k}a_{k}}{\sum_{k=1}^{n}x_{k}a_{k}}=\sum_{i=1}^{n}P_{i}w_{i}$. Therefore, it suffices to prove

\sum_{i=1}^{n}\delta_{i}(1-2P_{i})(w_{i}-T)^{2}-\frac{\left(\sum_{i=1}^{n}\delta_{i}(1-2P_{i})(w_{i}-T)\right)^{2}}{1+\sum_{i=1}^{n}\delta_{i}(1-2P_{i})}>0.

(31)

Since $1-2P_{i}>0$, by the Cauchy–Schwarz inequality it holds that

		$\displaystyle\sum_{i=1}^{n}\delta_{i}(1-2P_{i})(w_{i}-T)^{2}\cdot\left(1+\sum_{i=1}^{n}\delta_{i}(1-2P_{i})\right)$
	$\displaystyle=$	$\displaystyle\sum_{i=1}^{n}\delta_{i}(1-2P_{i})(w_{i}-T)^{2}\cdot\sum_{i=1}^{n}\delta_{i}(1-2P_{i})+\sum_{i=1}^{n}\delta_{i}(1-2P_{i})(w_{i}-T)^{2}$
	$\displaystyle\geq$	$\displaystyle\left(\sum_{i=1}^{n}\delta_{i}(1-2P_{i})(w_{i}-T)\right)^{2}+\sum_{i=1}^{n}\delta_{i}(1-2P_{i})(w_{i}-T)^{2}$
	$\displaystyle>$	$\displaystyle\left(\sum_{i=1}^{n}\delta_{i}(1-2P_{i})(w_{i}-T)\right)^{2},$

where the last inequality holds because the $\{w_{i}\}$ are not all identical, so there exists at least one $j\in[n]$ such that $\delta_{j}(1-2P_{j})(w_{j}-T)^{2}>0$. Dividing both sides by $1+\sum_{i=1}^{n}\delta_{i}(1-2P_{i})>0$ yields Eq. (31), and hence $\frac{d\ln U}{d\beta}>0$.
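The Cauchy–Schwarz step in fact gives a quantitative bound: the LHS of Eq. (31) is at least $\frac{\sum_{i}\delta_{i}(1-2P_{i})(w_{i}-T)^{2}}{1+\sum_{i}\delta_{i}(1-2P_{i})}$. The short sketch below evaluates both quantities on made-up inputs; the function name `eq31_lhs` and the numbers are hypothetical and only meant to illustrate the inequality.

```python
def eq31_lhs(delta, P, w):
    """Return (LHS of Eq. (31), the lower bound implied by Cauchy-Schwarz)."""
    T = sum(Pi * wi for Pi, wi in zip(P, w))                 # T = sum_i P_i w_i
    q = [di * (1 - 2 * Pi) for di, Pi in zip(delta, P)]      # weights delta_i (1 - 2 P_i) > 0
    s2 = sum(qi * (wi - T) ** 2 for qi, wi in zip(q, w))     # weighted second moment around T
    s1 = sum(qi * (wi - T) for qi, wi in zip(q, w))          # weighted first moment around T
    total = 1.0 + sum(q)
    return s2 - s1 ** 2 / total, s2 / total

# Made-up values with P_i <= 1/3, sum_i P_i = 1, and delta_i in (0, 1).
delta = [0.5, 0.4, 0.7, 0.9]
P = [0.3, 0.3, 0.2, 0.2]
w = [0.1, 0.4, 0.6, 0.9]
lhs, lower_bound = eq31_lhs(delta, P, w)
print(lhs, lower_bound)
```

The lower bound is strictly positive whenever at least one $w_{i}$ differs from $T$, which is exactly the non-identical condition used in the proof.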

The monotonicity of $V(\beta)$: Next we show $\frac{dV}{d\beta}<0$. Since $\frac{\partial V}{\partial\beta}=0$ and $\frac{d\bm{x}}{d\beta}=-\left(\frac{\partial F}{\partial\bm{x}}\right)^{-1}\cdot\frac{\partial F}{\partial\beta}$, this is equivalent to showing

\frac{\partial V}{\partial\bm{x}}\cdot\left(\frac{\partial F}{\partial\bm{x}}\right)^{-1}\cdot\frac{\partial F}{\partial\beta}>0.

(32)

Since $\frac{\partial V}{\partial x_{i}}=1$ for every $i\in[n]$, we have

	$\displaystyle\frac{\partial V}{\partial\bm{x}}\cdot\left(\frac{\partial F}{\partial\bm{x}}\right)^{-1}\cdot\frac{\partial F}{\partial\beta}$
$\displaystyle=$	$\displaystyle-\left[1,1,\cdots,1\right]\cdot\left[D^{-1}-\frac{D^{-1}\bm{y}\bm{z}^{\top}D^{-1}}{1+\bm{z}^{\top}D^{-1}\bm{y}}\right]\cdot\left[\frac{P_{i}}{x_{i}}\cdot(1-2P_{i})\cdot\left(w_{i}-\sum_{k=1}^{n}P_{k}w_{k}\right)\right]_{i\in[n]}$
$\displaystyle=$	$\displaystyle-\sum_{i=1}^{n}\frac{\delta_{i}x_{i}(1-2P_{i})(w_{i}-T)}{P_{i}}+\frac{1}{1+\sum_{i=1}^{n}\delta_{i}(1-2P_{i})}\sum_{i=1}^{n}\frac{\delta_{i}x_{i}(1-2P_{i})}{P_{i}}\sum_{i=1}^{n}\delta_{i}(1-2P_{i})(w_{i}-T).$	(33)

To show the RHS of Eq. (33) is positive, it suffices to show

\left(1+\sum_{i=1}^{n}\delta_{i}(1-2P_{i})\right)\sum_{i=1}^{n}\frac{\delta_{i}x_{i}(1-2P_{i})(w_{i}-T)}{P_{i}}<\sum_{i=1}^{n}\frac{\delta_{i}x_{i}(1-2P_{i})}{P_{i}}\sum_{i=1}^{n}\delta_{i}(1-2P_{i})(w_{i}-T).

(34)

Plugging $T=\sum_{i=1}^{n}P_{i}w_{i}$ into Eq. (34) and cancelling the common term $T\sum_{i=1}^{n}\frac{\delta_{i}x_{i}(1-2P_{i})}{P_{i}}\sum_{i=1}^{n}\delta_{i}(1-2P_{i})$ from both sides, it remains to show that

		$\displaystyle\left(1+\sum_{i=1}^{n}\delta_{i}(1-2P_{i})\right)\sum_{i=1}^{n}\frac{\delta_{i}x_{i}(1-2P_{i})w_{i}}{P_{i}}-\sum_{i=1}^{n}P_{i}w_{i}\sum_{i=1}^{n}\frac{\delta_{i}x_{i}(1-2P_{i})}{P_{i}}$
	$\displaystyle<$	$\displaystyle\sum_{i=1}^{n}\frac{\delta_{i}x_{i}(1-2P_{i})}{P_{i}}\sum_{i=1}^{n}\delta_{i}(1-2P_{i})w_{i}.$

Since $\sum_{i=1}^{n}P_{i}=1$, this is in turn equivalent to

\sum_{i=1}^{n}[\delta_{i}(1-2P_{i})+P_{i}]\sum_{i=1}^{n}\frac{\delta_{i}x_{i}(1-2P_{i})w_{i}}{P_{i}}<\sum_{i=1}^{n}\frac{\delta_{i}x_{i}(1-2P_{i})}{P_{i}}\sum_{i=1}^{n}[\delta_{i}(1-2P_{i})+P_{i}]w_{i}.

(35)

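With a slight abuse of notation (from here on $a_{i}$ no longer denotes $e^{\beta w_{i}}$), let $a_{i}=\frac{\delta_{i}x_{i}(1-2P_{i})}{P_{i}}>0$ and $b_{i}=\delta_{i}(1-2P_{i})+P_{i}>0$. Then Eq. (35) is equivalent to

\sum_{i=1}^{n}b_{i}\cdot\sum_{i=1}^{n}a_{i}w_{i}<\sum_{i=1}^{n}a_{i}\cdot\sum_{i=1}^{n}b_{i}w_{i}\Longleftrightarrow\sum_{i>j}a_{i}a_{j}(w_{i}-w_{j})\left(\frac{b_{i}}{a_{i}}-\frac{b_{j}}{a_{j}}\right)>0.

Because every weight $a_{i}a_{j}$ is positive, it suffices to show that each summand with $w_{i}\neq w_{j}$ is positive.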
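The pairwise rewriting follows from expanding both double sums: $\sum_{i}a_{i}\sum_{j}b_{j}w_{j}-\sum_{i}b_{i}\sum_{j}a_{j}w_{j}=\sum_{i>j}a_{i}a_{j}(w_{i}-w_{j})\left(\frac{b_{i}}{a_{i}}-\frac{b_{j}}{a_{j}}\right)$, where the positive weights $a_{i}a_{j}$ do not affect the sign of any individual summand. A small numerical check on made-up positive inputs (the function name `both_sides` and the values are hypothetical):

```python
def both_sides(a, b, w):
    """Return (sum a * sum b w - sum b * sum a w, the weighted pairwise sum over i > j)."""
    n = len(w)
    gap = (sum(a) * sum(bi * wi for bi, wi in zip(b, w))
           - sum(b) * sum(ai * wi for ai, wi in zip(a, w)))
    pairwise = sum(
        a[i] * a[j] * (w[i] - w[j]) * (b[i] / a[i] - b[j] / a[j])
        for i in range(n) for j in range(i)   # j < i, i.e. the pairs with i > j
    )
    return gap, pairwise

# Made-up positive inputs, unrelated to any particular game instance.
a = [1.0, 2.0, 0.5]
b = [0.3, 0.7, 0.4]
w = [0.2, 0.5, 0.9]
gap, pairwise = both_sides(a, b, w)
print(gap, pairwise)
```

Both quantities agree up to floating-point rounding, so the inequality in Eq. (35) holds exactly when the weighted pairwise sum is positive.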

Next, without loss of generality we show that for any $1\leq i<j\leq n$ , if $w_{i}>w_{j}$ then it also holds that $\frac{b_{i}}{a_{i}}>\frac{b_{j}}{a_{j}}$ for sufficiently large $\beta$ . In fact, from $P_{i}=\frac{x_{i}e^{\beta w_{i}}}{\sum_{k=1}^{n}x_{k}e^{\beta w_{k}}}$ and Eq. (23) we obtain

$\displaystyle\frac{b_{i}}{a_{i}}\cdot\frac{a_{j}}{b_{j}}=$	$\displaystyle\frac{\delta_{j}(\delta_{i}(1-2P_{i})+P_{i})(1-2P_{j})}{\delta_{i% }(\delta_{j}(1-2P_{j})+P_{j})(1-2P_{i})}\cdot e^{\beta(w_{i}-w_{j})}$
$\displaystyle=$	$\displaystyle\frac{1+P_{i}\delta_{i}/(1-2P_{i})}{1+P_{j}\delta_{j}/(1-2P_{j})}% \cdot e^{\beta(w_{i}-w_{j})}$
$\displaystyle=$	$\displaystyle\frac{1+\frac{P_{i}^{3}}{(1-2P_{i})(P_{i}^{2}+x_{i}^{2}c_{i}^{% \prime\prime}(x_{i}))}}{1+\frac{P_{j}^{3}}{(1-2P_{j})(P_{j}^{2}+x_{j}^{2}c_{j}% ^{\prime\prime}(x_{j}))}}\cdot e^{\beta(w_{i}-w_{j})}.$	(36)

On the one hand, because $1-2P_{i},1-2P_{j}\in[\frac{1}{3},1]$, $x_{i},x_{j}\in[0,L]$, and $c_{i}^{\prime\prime}(x_{i}),c_{j}^{\prime\prime}(x_{j})>0$, the first factor on the RHS of Eq. (36) is a positive number bounded away from zero. On the other hand, $w_{i}>w_{j}$ ensures that the second factor $e^{\beta(w_{i}-w_{j})}$ becomes arbitrarily large as $\beta$ grows. Therefore, there must exist a $\beta_{0}>0$ such that for any $\beta>\beta_{0}$, $\frac{b_{i}}{a_{i}}\cdot\frac{a_{j}}{b_{j}}>1$ holds for every pair $(i,j)$ with $w_{i}>w_{j}$. As a result, $(w_{i}-w_{j})\left(\frac{b_{i}}{a_{i}}-\frac{b_{j}}{a_{j}}\right)>0$ holds for every such pair, which completes the proof.

B Additional Experiments

We use the following Multi-agent Mirror Descent (MMD) algorithm as the PNE solver for $C^{4}$ games, whose convergence is guaranteed by Bravo et al. (2018). Since each creator’s strategy set is $\mathcal{X}_{i}=[0,+\infty)$, we can simply choose the projection mapping $\text{Proj}_{\mathcal{X}_{i}}(\bm{x})=(\max(x_{i},0))_{i=1}^{n}$. The gradients of the utility functions can be implemented directly since they have closed forms. Throughout our experiments, we use the defaults $T=10000,\eta=0.1,\epsilon=10^{-2},x_{i}^{(0)}=1.0$. Algorithm 2 is a simplified version of Algorithm 1 in Bravo et al. (2018) in which we replace the gradient estimate with the exact gradient. According to Theorem 5.1 in Bravo et al. (2018), Algorithm 2 converges to the unique PNE of any $C^{4}$ game with probability 1.

Algorithm 2 Multi-agent Mirror Descent (MMD) with perfect gradient

Input: Maximum iteration number $T$, step size $\eta$, each player $i$’s utility function $u_{i}$, error tolerance $\epsilon$, initial strategy $\bm{x}_{i}=\bm{x}^{(0)}_{i}$

repeat

    Compute the exact gradient $\bm{g}_{i}=\nabla_{i}u_{i}(\bm{x}_{i},\bm{x}_{-i}),\forall i\in[n]$

    Update $\bm{x}_{i}\leftarrow\text{Proj}_{\mathcal{X}_{i}}(\bm{x}_{i}+\eta\bm{g}_{i}),\forall i\in[n]$

until Maximum iteration number is reached or $\|(\bm{g}_{1},\cdots,\bm{g}_{n})\|_{2}<\epsilon$.

Output: $(\bm{x}_{1},\cdots,\bm{x}_{n})$
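For the single-user case $m=1$, Algorithm 2 can be sketched in a few lines of Python using the closed-form gradient in Eq. (21). This is a minimal illustration rather than the implementation used in the experiments: the function names, the `cost_grad(i, x_i)` callback, and the quadratic cost in the usage example are assumptions made here.

```python
import math

def exact_gradients(x, w, beta, cost_grad):
    """Gradient of Eq. (20) w.r.t. each player's own x_i, i.e. Eq. (21)."""
    a = [math.exp(beta * wi) for wi in w]
    S = sum(xi * ai for xi, ai in zip(x, a))  # assumed > 0, which holds while some x_i > 0
    return [ai / S - xi * (ai / S) ** 2 - cost_grad(i, xi)
            for i, (xi, ai) in enumerate(zip(x, a))]

def mmd_exact_gradient(w, beta, cost_grad, T=10000, eta=0.1, eps=1e-2, x0=1.0):
    """Algorithm 2 for m = 1, with the projection max(., 0) onto [0, +inf)."""
    x = [x0] * len(w)
    for _ in range(T):
        g = exact_gradients(x, w, beta, cost_grad)
        x = [max(xi + eta * gi, 0.0) for xi, gi in zip(x, g)]   # projected ascent step
        if math.sqrt(sum(gi * gi for gi in g)) < eps:           # stopping rule of Algorithm 2
            break
    return x

# Hypothetical instance: three creators, ASSUMED cost c_i(x) = x^2 for everyone.
def quadratic_cost_grad(i, xi):
    return 2.0 * xi

x_pne = mmd_exact_gradient([0.2, 0.5, 0.9], beta=1.0,
                           cost_grad=quadratic_cost_grad, eps=1e-8)
print(x_pne)
```

As a quick check of the sketch, a symmetric instance with $w=(0.5,0.5,0.5)$ and the same assumed cost has first-order condition $\frac{1}{3x}\cdot\frac{2}{3}=2x$, so every creator should end up at $x_{i}=\frac{1}{3}$.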

Figure 3 illustrates the trade-off between $U$ and $V$ in the MovieLens environment, and Figure 4 plots the same information as shown in Figure 2 but with a different value of $\lambda=0.1$ . As we can see, the optimal PPM( $\bm{\beta}$ ) found by Algorithm 1 conveys a consistent message: it prioritizes the recommendation accuracy for users with more determined preferences while increasing the exploration strength for less selective users.