Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use order_by argument as cursor #3

Closed
domangi opened this issue Mar 24, 2021 · 2 comments · Fixed by #7
Closed

Use order_by argument as cursor #3

domangi opened this issue Mar 24, 2021 · 2 comments · Fixed by #7
Assignees
Labels
bug Something isn't working enhancement New feature or request
Projects

Comments

@domangi
Copy link

domangi commented Mar 24, 2021

What

  • The order_by argument should accept not only field names, but also sql statements, e.g. CONCAT(field, otherfield) or FIELD(field, 'value1', 'value2')
  • The order_by value, which defaults to id, should be used as the cursor, for filtering and for ordering
  • All custom ordering logic should be removed from the gem
  • Update readme, to specify that the order_by column is expected to be a unique and should be indexed, and describing how to use field concatenation in case the user does not care about the performance impact

Why

Currently, whenever you pass a value in the order_by argument, the paginator takes care of building a unique cursor by concatenating the order_by column and the id.

sql =
    if custom_order_field?
      "CONCAT(#{@order_field}, '-', #{id_column})"
    else
      id_column
    end

Since it is not possible to add an index on a computed value like concat(field, other_field), this can lead to performance issues, when paginating on big tables. See this article for more details about performance in cursor pagination.

For this reason the responsibility of ordering by a unique attribute should me moved outside of the gem. When using the gem, if you need to order by a column that is not unique, you should find the best solution for you to make it unique, and pass that to the paginator.
This could be done by adding a new column to the table that represents the wanted ordering and is unique, or similarly using a view.

To allow ordering by a non unique column, without changing the database, the gem should allow to order by a SQL statement.

Paginator.new(..., order_by: "concat('name','-','id')", ...)

This would still ensure the pagination works correctly, since it will order and filter by a unique cursor, but it could lead to performance issues, since there is no index on the cursor.

@nicolas-fricke nicolas-fricke added enhancement New feature or request bug Something isn't working labels Mar 24, 2021
@nicolas-fricke
Copy link
Member

Thanks for filing this issue @domangi! I've added the bug label since the documentation currently recommends to add an index on the concatenated field, which, as it turns out, is not possible in MySQL and Postgres databases. Neither support indexes on calculated fields.

This change would remove functionality from the gem and therefore require a major version bump. But as from the article you referenced, it probably makes sense to let the user actively decide if they really want to pay this performance price and not do this behind the scenes in the gem.

@nicolas-fricke nicolas-fricke self-assigned this Apr 7, 2021
@nicolas-fricke nicolas-fricke added this to In progress in Development Apr 7, 2021
@nicolas-fricke
Copy link
Member

After more investigation, we found a way of maintaining support to order by any arbitrary column while also being able to set appropriate indices to the database (works for MySQL as well as Postgres).

If our query is:

RailsCursorPagination::Paginator
  .new(relation, order_by: :author, first: 2, after: "WyJKYW5lIiw0XQ==")
  .fetch

and our cursor encodes something like ['Jane', 4], the generated SQL query so far was

SELECT *
FROM "posts"
WHERE CONCAT(author, '-', "posts"."id") > 'Jane-4'
ORDER BY CONCAT(author, '-', "posts"."id") ASC
LIMIT 2

which was performing poorly since contrary to this gem's documentation neither MySQL nor Postgres support indices on functions like CONCAT.

But we can rewrite this query to something like this:

SELECT *
FROM "posts"
WHERE (author > 'Jane') OR (author = 'Jane' AND "posts"."id" > 4)
ORDER BY author ASC, "posts"."id" ASC
LIMIT 2

If we then add a compound index to (author, id), the database can use this to resolve both the WHERE clauses as well as the ORDER BY condition. I created an index on MySQL like this:

CREATE INDEX index_posts_on_author_and_id ON posts (author, id);

And then an EXPLAIN on the SELECT query returned this (on a database with 10000 records):

+----+-------------+-------+------------+-------+--------------------------------------+------------------------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys                        | key                          | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+-------+------------+-------+--------------------------------------+------------------------------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | posts | NULL       | range | PRIMARY,index_posts_on_author_and_id | index_posts_on_author_and_id | 776     | NULL | 5016 |   100.00 | Using index condition |
+----+-------------+-------+------------+-------+--------------------------------------+------------------------------+---------+------+------+----------+-----------------------+

So it managed to use the index for a range query.

This still doesn't solve the requested support for arbitrary SQL queries, but I think it would fix the biggest issue of the bad performance. And with this change, the ordering also respects the column type and does not treat everything as string. It would allow users to just use the gem, order by any column in their table and then add indices if performance becomes an issue.

In the case that more complex ordering logic is required (e.g. a FIELD query in MySQL), this would have to be solved by a separate field in the database.

Do you think this would be a viable solution @domangi ?

Development automation moved this from In progress to Done Apr 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
Development

Successfully merging a pull request may close this issue.

2 participants