Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add caching to queries #19121

Open
wants to merge 22 commits into
base: trunk
Choose a base branch
from
Open

Add caching to queries #19121

wants to merge 22 commits into from

Conversation

aristath
Copy link
Contributor

@aristath aristath commented Nov 3, 2022

Context

Adds caches to queries - See #17005 for details on the issue.

Summary

This PR adds some caching methods to the ORM class, to handle storing, getting, and flushing caches.
These caching methods are then used to set/get/flush caches when appropriate.

After a discussion with @leonidasmi we figured it would be better if the caches have a relatively short lifespan of 10 minutes. This time is enough to avoid any and all performance issues, without any SEO drawbacks.

This PR can be summarized in the following changelog entry:

  • Implements caches for indexable queries.

Relevant technical choices:

Test instructions

Test instructions for the acceptance test before the PR gets merged

This PR can be acceptance tested by following these steps:

Relevant test scenarios

  • Changes should be tested with the browser console open
  • Changes should be tested on different posts/pages/taxonomies/custom post types/custom taxonomies
  • Changes should be tested on different editors (Block/Classic/Elementor/other)
  • Changes should be tested on different browsers
  • Changes should be tested on multisite

Test instructions for QA when the code is in the RC

  • QA should use the same steps as above.

QA can test this PR by following these steps:

Impact check

This PR affects the following parts of the plugin, which may require extra testing:

UI changes

  • This PR changes the UI in the plugin. I have added the 'UI change' label to this PR.

Other environments

  • This PR also affects Shopify. I have added a changelog entry starting with [shopify-seo], added test instructions for Shopify and attached the Shopify label to this PR.

Documentation

  • I have written documentation for this change.

Quality assurance

  • I have tested this code to the best of my abilities
  • I have added unit tests to verify the code works as intended
  • If any part of the code is behind a feature flag, my test instructions also cover cases where the feature flag is switched off.
  • I have written this PR in accordance with my team's definition of done.

Fixes #19555

Copy link
Member

@diedexx diedexx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work so far!

lib/orm.php Outdated Show resolved Hide resolved
lib/orm.php Outdated Show resolved Hide resolved
lib/orm.php Outdated Show resolved Hide resolved
lib/orm.php Outdated Show resolved Hide resolved
lib/orm.php Show resolved Hide resolved
lib/orm.php Outdated Show resolved Hide resolved
lib/orm.php Outdated Show resolved Hide resolved
@aristath aristath requested a review from diedexx November 7, 2022 07:58
@aristath aristath marked this pull request as ready for review November 7, 2022 08:12
Copy link
Member

@diedexx diedexx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! One minor comment remains.

lib/orm.php Outdated Show resolved Hide resolved
Comment on lines +2717 to +2720
// Bail early if the site doesnt have persistent caching.
if ( ! self::has_persistent_cache() ) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that's unnecessary? This method is not supposed to be run anyway if there's no persistent cache and even if something goes wrong in the future and it does run, it's not much of a performance hit.

Also if we don't bail early, we can make the @return type into a string.

lib/orm.php Outdated Show resolved Hide resolved
*
* @return bool
*/
private static function has_persistent_cache() {
Copy link
Contributor

@leonidasmi leonidasmi Apr 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I would like us to explore the possibility of depending only on the wp_using_ext_object_cache() method for assessing whether there is persistent cache and not on the existence of files. A site might as well have a WP_CONTENT_DIR . '/object-cache.php' file, without it meaning that persistent object cache is active there.

It would be awesome if we had something like this for the has_persistent_cache() method (this also adds filters to hard enable/disable our caching):

	private static function has_persistent_cache() {
		if ( self::$has_persistent_cache === false || self::$has_persistent_cache === true ) {
			return self::$has_persistent_cache;
		}

		 /**
		 * Filter: 'wpseo_force_enable_indexable_cache' - Allows force enabling indexable caching.
		 *
		 * @api bool Whether indexable cache is forced to be enabled.
		 */
		if ( \apply_filters( 'wpseo_force_enable_indexable_cache', false ) ) {
			self::$has_persistent_cache = true;
			return true;
		}

		// Check if the site is using an external object cache.
		if ( \function_exists( 'wp_using_ext_object_cache' ) ) {
			/**
			 * Filter: 'wpseo_force_disable_indexable_cache' - Allows force disabling indexable caching.
			 *
			 * @api bool Whether indexable cache is forced to be disabled.
			 */
			if ( \apply_filters( 'wpseo_force_disable_indexable_cache', false ) ) {
				self::$has_persistent_cache = false;
				return false;
			}

			self::$has_persistent_cache = true;
			return true;
		}

		// No persistent caching was detected.
		self::$has_persistent_cache = false;
		return false;
	}

but I'm worried about the

		// This _should_ be covered by the `wp_using_ext_object_cache()` check,
		// but sometimes isn't depending on the timing of the cache.

remark you've added @aristath 🤔 Can we talk about this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like wp_using_ext_object_cache should be enough here. The rest seems like overkill.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for your input! We're probably gonna rely solely on that then.

(We do want to keep the filters proposed above, for allowing users to hard enable/disable the enhancement, for taking care of edge cases)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you care that the object cache in persistent? WordPress fallbacks to an in memory cache if persistent cache is not present. But even this may mean saving some database queries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the plugin already employs a custom cache implementation (via Meta_Tags_Context_Memoizer) that covers a lot of the benefits of a memory cache, within single page loads.

There's the possibility of enabling this for every setup regardless of object cache being persistent or not (thus providing performance benefits for all Yoast websites), but this should be staggered among multiple releases imo, for risk management reasons amongst others.

Comment on lines +2620 to +2623
if ( \file_exists( WP_CONTENT_DIR . '/object-cache.php' ) ) {
self::$has_persistent_cache = true;
return true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The WordPress performance plugin installs a object-cache.php for testing but object caching is not enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, good point!

*
* Individual cache-groups are purged when a record is updated or deleted.
*/
\wp_cache_set( $key, $value, self::get_cache_group( $group_id ), ( 10 * MINUTE_IN_SECONDS ) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this, by do correct cache invalidation. there are lots of actions in core to let you do this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're already invalidating cache in this PR, but we want to take the better-safe-than-sorry route here and roll out the enhancement with a short TTL as a first iteration.

Once that's out and battle tested in the wild, we can then consider to increase the TTL in later iterations, for greater performance impact.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can (and should, probably) make the TTL filterable though, to allow customization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are doing cache invalidation, you should not need to only cache for 10 minutes. Just let the invalidation handle itself. A 10 minute cache is not specially useful. For most high traffic sites, they would be behind a CDN and traffic might not hit the origin for a url every 10 minutes. Object caches like this are useful for sites with lots of traffic and content. So a site with 100k posts, might only get 1-5 hits a day on an article from 5 years ago. But if google indexes the site, that number of db queries can take a site down.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, now I'm confused, can you please elaborate? Because I'm probably missing context.

I thought what prompted all this is the need for putting queries behind cache to avoid repeated db load within a short period of time. Copying from your original post:

but for sites under heavily load or under a DDOS attack, 3 queries a page should take the database server and website down.

For that reason, we came up with that staggered strategy I mention above. To take care of such cases (and then expand our feature to a longer caching period) in a safe and contained way.

We have to keep in mind that what we're trying to cache here has serious SEO implications and any inconsistency brought by cache has the potential to hurt websites in that regard.

So a site with 100k posts, might only get 1-5 hits a day on an article from 5 years ago.
If so, are we caching here just to make those 1-5 queries a day into 0?

If that's the case, we should also re-consider our whole current implementation. Because they way we do it now, cache invalidation happens with every insert/update/delete to the table in play with each query, which has the potential to happen rather frequently (with every new post, with every updated post, with every post visit (if SEO optimization hasn't been performed in that site), with every user addition, with every term addition, etc. etc.)

@aristath Do you have any thoughts on all that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache key should be generated from the object and id. Cache invalidation should done on that cache key when only that object is changed. Not every update to the database.

Comment on lines +2679 to +2693
public static function flush_group_caches( $group_id = '' ) {
// Bail early if the site doesnt have persistent caching.
if ( ! self::has_persistent_cache() ) {
return;
}
// Bail early if the cache doesn't support group-flushing.
if ( \function_exists( 'wp_cache_supports' ) && ! \wp_cache_supports( 'flush_group' ) ) {
return;
}
// Bail early if the `wp_cache_flush_group` function doesn't exist.
if ( ! \function_exists( 'wp_cache_flush_group' ) ) {
return;
}
\wp_cache_flush_group( self::get_cache_group( $group_id ) );
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wp_cache_flush_group is a very new feature of object caching and is not very well supported. their needs to be a fallback here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're going to disable the feature completely if on a setup with no support of wp_cache_flush_group

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why validate everythings through?

Copy link
Contributor

@spacedmonkey spacedmonkey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR really need unit tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cache indexables when they are retrieved
4 participants