Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Last Mod. column in sitemaps contains wrong data #15801

Open
poluhovich opened this issue Aug 5, 2020 · 5 comments
Open

Last Mod. column in sitemaps contains wrong data #15801

poluhovich opened this issue Aug 5, 2020 · 5 comments

Comments

@poluhovich
Copy link

poluhovich commented Aug 5, 2020

Please give us a description of what happened.

Source of the issue is in the file: inc/sitemaps/class-post-type-sitemap-provider.php:

$sql = "
	SELECT post_modified_gmt
	    FROM ( SELECT @rownum:=0 ) init 
	    JOIN {$wpdb->posts} USE INDEX( type_status_date )
	    WHERE post_status IN ('" . implode( "','", $post_statuses ) . "')
	      AND post_type = %s
	      AND ( @rownum:=@rownum+1 ) %% %d = 0
	    ORDER BY post_modified_gmt ASC
";

Since that query contains WHERE clause for filtering by post_type and post_status, the numeration of rows isn't what is expected.

Example of the source query:

SELECT post_modified_gmt FROM ( SELECT @rownum:=0 ) init JOIN wp_posts USE INDEX( type_status_date )
WHERE post_status IN ('publish') AND post_type = 'post'
AND ( @rownum:=@rownum+1 ) % 1000 = 0 ORDER BY post_modified_gmt ASC;

In my case that query returns next data (just for example, I will use these date below for searching):

post_modified_gmt
2015-08-04 14:20:20
2017-03-14 13:36:57
2017-04-10 13:52:36
2017-07-07 10:50:59
2017-07-10 14:08:37
2017-09-06 10:31:39
2017-12-29 14:09:19
2018-01-08 10:47:58
2018-03-22 16:35:27
2020-05-29 11:13:40
2020-07-08 12:16:20

But if we compare with data from the next query, which outputs a number of rows, and search for these dates, we will see that numbers of rows aren't what we expected to see. We expected to see numbers of a row as multiple of 1000 (due to ( @rownum:=@rownum+1 ) % 1000 = 0 ) but these numbers are not.

Example of SQL query with a number of rows output:

SELECT post_modified_gmt, @rownum := @rownum + 1 as row FROM ( SELECT @rownum:=0 ) init JOIN wp_posts USE INDEX( type_status_date )
WHERE post_status IN ('publish') AND post_type = 'post'
ORDER BY post_modified_gmt ASC;

Example of results:

post_modified_gmt	row
...
2015-08-04 14:20:20	198
...
2017-03-14 13:36:57	3210
...
2017-04-10 13:52:36	3530
...

and so on

Solution

The solution is to wrap the whole SQL query into another SQL query and filter results by row number:

SELECT post_modified_gmt FROM (
    SELECT post_modified_gmt, (@rownum := @rownum + 1) as row FROM ( SELECT @rownum:=0 ) init
    JOIN wp_posts USE INDEX( type_status_date )
    WHERE post_status IN ('publish') AND post_type = 'post'
    ORDER BY post_modified_gmt ASC
) t
WHERE row % 1000 = 0;

and output of row column prooves that data is correct now:

row	post_modified_gmt
1000	2016-01-12 14:00:02
2000	2016-11-18 10:44:40
3000	2017-02-21 15:50:37
4000	2017-06-08 08:30:16
5000	2017-10-23 10:06:55
6000	2017-12-07 13:58:39
7000	2018-01-11 12:50:57
8000	2018-03-09 15:46:52
9000	2018-09-12 07:31:43
10000	2019-06-25 14:06:21
11000	2020-06-16 08:53:55

So, the original code in the mentioned file needs to be next:

$sql = "
	SELECT post_modified_gmt FROM (
		SELECT post_modified_gmt, (@rownum := @rownum + 1) as row
		FROM ( SELECT @rownum:=0 ) init 
		    JOIN {$wpdb->posts} USE INDEX( type_status_date )
		    WHERE post_status IN ('" . implode( "','", $post_statuses ) . "')
		      AND post_type = %s
		    ORDER BY post_modified_gmt ASC
	) t WHERE row %% %d = 0
";
@Djennez
Copy link
Member

Djennez commented Aug 6, 2020

Hi @poluhovich and thank you for your detailed report. However, it's kind of unclear what problem this is trying to fix, could you provide a problem description for this solution?

@poluhovich
Copy link
Author

@Djennez the problem is that "Last Modified" column in the sitemaps contains wrong data, I don't know if it has any impact on SEO:
image

@Djennez
Copy link
Member

Djennez commented Aug 6, 2020

Do you have any information on how to recreate wrong data in that column? Because my data seems correct at the moment.

@poluhovich
Copy link
Author

poluhovich commented Aug 6, 2020

@Djennez I guess it might be related to the fact that I have a few hundreds of excluded post IDs, I've described related issue here: #15807

@DinaSkills
Copy link

Hey, i notice same in my sitemap but notice that if I update category or tag it does not chage last mod. Aslo no new categories at all eventhough I cached everything done everthing on their page.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants