Transparency of popularity scoring algorithm used in trending posts, hashtags, people, news #1458

Open
JohannesBuchner opened this issue Jun 3, 2024 · 0 comments

Given that Mastodon tries to avoid algorithms that impose preferences on users, it would be good to thoroughly document the algorithms used in the "Explore" sections.

Currently the documentation has vague statements. On https://docs.joinmastodon.org/methods/trends/ :

  • "View hashtags that are currently being used more frequently than usual."
  • "Links that have been shared more than others."
  • "Tags that are being used more frequently within the past week." (this also looks incorrect: the code below compares against the previous day, not the past week)

The links at the bottom of that page lead to code with no comments.

What is the threshold? How is frequency determined? How is the parent pool of hashtags found?

After digging for an hour, I found these functions in the code:

      expected  = 1.0
      observed  = (status.reblogs_count + status.favourites_count).to_f

      score = if expected > observed || observed < options[:threshold]
                0
              else
                ((observed - expected)**2) / expected
              end

      decaying_score = if score.zero? || !eligible?(status)
                         0
                       else
                         score * (0.5**((at_time.to_f - status.created_at.to_f) / options[:score_halflife].to_f))
                       end

and, for hashtags:

      expected  = tag.history.get(at_time - 1.day).accounts.to_f
      expected  = 1.0 if expected.zero?
      observed  = tag.history.get(at_time).accounts.to_f
      max_time  = tag.max_score_at
      max_score = tag.max_score
      max_score = 0 if max_time.nil? || max_time < (at_time - options[:max_score_cooldown])

      score = if expected > observed || observed < options[:threshold]
                0
              else
                ((observed - expected)**2) / expected
              end

      if score > max_score
        max_score = score
        max_time  = at_time

        # Not interested in triggering any callbacks for this
        tag.update_columns(max_score: max_score, max_score_at: max_time)
      end

      decaying_score = max_score * (0.5**((at_time.to_f - max_time.to_f) / options[:max_score_halflife].to_f))

      next unless decaying_score >= options[:decay_threshold]

      items << { score: decaying_score, item: tag }

From this, I can see that the admin can alter the behaviour with options (threshold, score_halflife, max_score_cooldown, max_score_halflife, decay_threshold).
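
To make the hashtag calculation easier to follow, here is a minimal Python sketch of my reading of the snippet above. It is not Mastodon's code: `TagState` and `tag_trend_score` are hypothetical names, timestamps are plain seconds, and the option values in the usage lines are made up for illustration, not instance defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TagState:
    """Hypothetical stand-in for the stored max_score / max_score_at columns."""
    max_score: float = 0.0
    max_score_at: Optional[float] = None  # timestamp in seconds


def tag_trend_score(accounts_today, accounts_yesterday, state, at_time, *,
                    threshold, max_score_cooldown, max_score_halflife):
    """Return the decaying score of a hashtag at `at_time` (seconds)."""
    # Baseline: accounts using the tag one day earlier, at least 1.0.
    expected = float(accounts_yesterday) or 1.0
    observed = float(accounts_today)

    # A stored peak older than the cooldown no longer counts.
    max_time = state.max_score_at
    max_score = state.max_score
    if max_time is None or max_time < at_time - max_score_cooldown:
        max_score = 0.0

    # Chi-squared-like score, zero if usage fell or is below the threshold.
    if expected > observed or observed < threshold:
        score = 0.0
    else:
        score = (observed - expected) ** 2 / expected

    # A new peak replaces the stored one.
    if score > max_score:
        max_score, max_time = score, at_time
        state.max_score, state.max_score_at = max_score, max_time

    if max_time is None:
        return 0.0  # never peaked, nothing to decay

    # The peak halves every max_score_halflife seconds.
    return max_score * 0.5 ** ((at_time - max_time) / max_score_halflife)


# Illustrative option values only (assumptions, not instance defaults).
opts = dict(threshold=5, max_score_cooldown=86400, max_score_halflife=7200)
state = TagState()
first = tag_trend_score(30, 10, state, 1000.0, **opts)  # (30-10)**2 / 10 = 40.0
later = tag_trend_score(10, 10, state, 8200.0, **opts)  # one half-life later: 20.0
```

In the original snippet, a tag is only listed if this decaying score is at least `decay_threshold`; a caller of the sketch would apply that comparison to the returned value.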

I think it would be nice to be transparent about the algorithms used, both to users and to developers.

I suggest two improvements:

  1. On the "Explore" page, add a "more information" link to the popup that reads "These are posts from across the social web that are gaining traction today. Newer posts with more boosts and favorites are ranked higher." The link should lead to a documentation page that presents the algorithm configuration used on this instance. Do the same for each of Posts, Hashtags, People, and News.
  2. On that documentation page, give the algorithm used, together with the option values of the instance.

For the posts, this could look something like:

<details><summary>These are posts from across the social web that are gaining traction today. Newer posts with more boosts and favorites are ranked higher (click for details)</summary>
<p>
This instance uses the algorithm below with the options:
<ul>
  <li>threshold = 100</li>
  <li>score_halflife = 1234s</li>
</ul>
The popularity score of an eligible post is computed from its number of reblogs and favourites and its age in seconds as:
<blockquote>
      expected  = 1.0
      observed  = reblogs_count + favourites_count
      if expected > observed or observed < threshold:
          score = 0
      else:
          score = ((observed - expected)**2) / expected * (0.5**(age / score_halflife))
</blockquote>
</details>
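
As a check on the mock-up, the pseudocode can be written as a small runnable Python function. This is only a sketch under assumptions: `post_trend_score` is a hypothetical name, the `eligible` flag stands in for the `eligible?(status)` check in the Ruby code, and the option values are the made-up ones from the mock-up, not real instance settings.

```python
def post_trend_score(reblogs_count, favourites_count, age_seconds, *,
                     threshold, score_halflife, eligible=True):
    """Return the decaying popularity score of a post."""
    expected = 1.0
    observed = float(reblogs_count + favourites_count)

    # Ineligible posts and posts with too few interactions score zero.
    if not eligible or expected > observed or observed < threshold:
        return 0.0

    score = (observed - expected) ** 2 / expected
    # The score halves every score_halflife seconds of post age.
    return score * 0.5 ** (age_seconds / score_halflife)


# Example values from the mock-up (illustrative, not real settings).
opts = dict(threshold=100, score_halflife=1234)
popular = post_trend_score(80, 21, 1234, **opts)  # 100**2 * 0.5 = 5000.0
quiet = post_trend_score(60, 39, 0, **opts)       # 99 < threshold, so 0.0
```

With these values, a post needs at least 100 combined reblogs and favourites to get a non-zero score, and that score halves every 1234 seconds.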