-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: MDEV-28724: Add stats sampling based on % of total pages #2173
base: 11.0
Are you sure you want to change the base?
WIP: MDEV-28724: Add stats sampling based on % of total pages #2173
Conversation
I'm looking into all these and will have something shortly. |
Thank you for the contribution. I only had a quick look at the proposed interface. I think that it would be better to use the MariaDB Could we have a single parameter, say, I would avoid putting any ticket numbers in test names, and instead try to use subsystem or feature names in tests. For example, the file name |
Thanks a lot, @dr-m !
Really good point and I agree. I'm thinking of adding the parameter to
Yes! That was my thought initially. I then changed my mind because I wanted the setting to represent one thing and one thing only. However, after typing 2 parameters repeatedly during my testing, I think a single parameter is a wonderful idea!
Agreed. I'm working on revamping this based on your input. I will also target 10.10. |
Sounds good. I would use this existing parameter for reference: static MYSQL_THDVAR_UINT(default_encryption_key_id, PLUGIN_VAR_RQCMDARG,
"Default encryption key id used for table encryption.",
NULL, innodb_default_encryption_key_id_update,
FIL_DEFAULT_ENCRYPTION_KEY, 1, UINT_MAX32, 0);
…
HA_TOPTION_SYSVAR("ENCRYPTION_KEY_ID", encryption_key_id, default_encryption_key_id), Unlike this, no static MYSQL_THDVAR_UINT(stats_percentage, PLUGIN_VAR_RQCMDARG,
"Insert description here",
nullptr, nullptr, 0, 0, 100, 0);
…
HA_TOPTION_SYSVAR("STATS_PERCENTAGE", stats_percentage, stats_percentage), If we want to avoid a duplicated meaning of the value 0 (does it make sense to use percentage-based sampling with that value?), we might repurpose a previously impossible bit pattern to enable percentage-based sampling: diff --git a/include/my_base.h b/include/my_base.h
index f96e569cae4..f45a4b92d78 100644
--- a/include/my_base.h
+++ b/include/my_base.h
@@ -341,7 +341,7 @@ enum ha_base_keytype {
storage engine and used by the optimizer for query optimization will be
stored on disk and will not change after a server restart.
*/
-#define HA_OPTION_STATS_PERSISTENT 4096U
+#define HA_OPTION_STATS_PERSISTENT (1U << 12)
/*
STATS_PERSISTENT=0 has been specified in CREATE/ALTER TABLE. Statistics
for the table will be wiped away on server shutdown and new ones recalculated
@@ -350,7 +350,11 @@ enum ha_base_keytype {
explicitly set at table level and the corresponding table will use whatever
is the global server default.
*/
-#define HA_OPTION_NO_STATS_PERSISTENT 8192U
+#define HA_OPTION_NO_STATS_PERSISTENT (2U << 12)
+/*
+ STATS_PERSISTENT=2 has been specified...
+*/
+#define HA_OPTION_STATS_PERCENTAGE (3U << 12)
/* .frm has extra create options in linked-list format */
#define HA_OPTION_TEXT_CREATE_OPTIONS_legacy (1U << 14) /* 5.2 to 5.5, unused since 10.0 */
diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy
index 172ea82b1a6..ba3966bb54d 100644
--- a/sql/sql_yacc.yy
+++ b/sql/sql_yacc.yy
@@ -5621,6 +5621,9 @@ create_table_option:
case 1:
Lex->create_info.table_options|= HA_OPTION_STATS_PERSISTENT;
break;
+ case 2:
+ Lex->create_info.table_options|= HA_OPTION_STATS_PERCENTAGE;
+ break;
default:
thd->parse_error();
MYSQL_YYABORT; I will be on vacation between July 4 and July 24. |
I think one setting I'm working on this now. Hopefully it will be in a good state before your vacation. |
d61fd4a
to
e3aeb84
Compare
@dr-m There are still work to be done, but I wanted to push in some changes for your review before your vacation, if you have time. For example, the Thanks! |
Weird that github says |
I made upstream contributions previously as an independent developer, as a result I signed the CLA. Now I'm contributing as an AWS employee and using the BSD license, can I unsign the CLA I previously signed? I suppose I could create a different Github account, but don't want to do that if not absolutely necessary. |
All of the build failed on the same test: mysql-test/main/subselect_sj2_stats_percentage.test. It is identical to I need some help to get the reject file so I can do a comparison. Suggestions/comments welcome. By the way, the Thanks! |
@LinuxJedi Can you give some advice here what you want Haidong to take into account additionally while polishing the submission? |
In general this looks good to me, only minor thing I saw was a code comment closed without a space before the |
storage/innobase/dict/dict0stats.cc
Outdated
#define N_SAMPLE_PAGES(index) \ | ||
static_cast<ib_uint64_t>( \ | ||
(index)->table->stats_percentage != 0 \ | ||
? (index)->stat_index_size * \ | ||
(index)->table->stats_percentage / 100 + 1 \ | ||
: (index)->table->stats_sample_pages != 0 \ | ||
? (index)->table->stats_sample_pages \ | ||
: srv_stats_persistent_sample_pages) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be defined as an inline
function with some if
and else
instead of multiple ternary operators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. It was a define macro, but with a little bit of added logic, it's getting unwieldy. I'll change in a new commit that'll be pushed in today.
/* With this option the user defines the key identifier used for the encryption */ | ||
HA_TOPTION_SYSVAR("ENCRYPTION_KEY_ID", encryption_key_id, default_encryption_key_id), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "This option identifies the key for encrypting the data file"? This would be best changed in a comment on its own, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. This change is unrelated. I'll revert in a new push.
storage/innobase/include/dict0mem.h
Outdated
/** The percentage of pages to sample for this table during persistent | ||
stats estimation. If this is 0, then the value of the stats_sample_pages | ||
will be used instead. */ | ||
ulint stats_percentage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need 32 or 64 bits for this?
The sizeof(dict_index_t)
is already large enough. Could we combine all fields into a single word:
unsigned stat_persistent:2;
unsigned stats_auto_recalc:2;
unsigned stats_percentage:7;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where will ha_table_option_struct::stats_percentage
be assigned to dict_table_t::stats_percentage
? I did not find any code to do that. For the pre-existing statistics attributes that is being done in innobase_copy_frm_flags_from_table_share()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding combining all fields into a single word, I tried it, but received compiler warning because of type inconsistency with definitions inside dict0stats.inl
. I ignored the warning, but that produced failures for test case innodb_stats_create_on_corrupted
. In the end, I opted to use ib_uint32_t
, which made all tests pass.
Where will ha_table_option_struct::stats_percentage be assigned to dict_table_t::stats_percentage?
Addressed in a new push.
5adacc6
to
7636f8d
Compare
I did rebase against 10.11, and also addressed Marko's comments. The commit message is rewritten in the latest commit. Please review it. Thanks! |
There seems to be an extra cherry-picked commit before the main commit. Furthermore, I believe that meanwhile the target for new features has moved to the 11.0 release or its not-yet-existing successor branch. I’d still like to see the use of bit fields in |
When running `ANALYZE TABLE` against `innodb` tables, index stats are calculated by sampling `STATS_SAMPLE_PAGES` number of pages. This new feature allows a user to define a percentage of total indexe pages to sample, between 0 and 100. Here is how 3 competing database products implement this feature: - PostgreSQL As far as I can tell, PostgreSQL doesn't have a similar feature. I checked both the `CREATE STATISTICS`[^1] and `ANALYZE`[^2] documentation - SQL Server SQL Server has the setting of `PERSIST_SAMPLE_PERCENT`[^3]. This name maybe the most relevant when considering the naming of our setting variables; - Oracle Like SQL Server, Oracle has something similar[^4]. Here is the proposed solution implemented in this PR: - 1 new `innodb` setting `STATS_PERCENTAGE` for `CREATE` or `ALTER TABLE`. Its value ranges between 0 and 100, inclusive; - To minimize memory footprint, the new `stats_percentage` uses bit field. The `stat_persistent` and `stats_auto_recalc` are also refactored to use bit fields. - During index stats sampling (ANALYZE TABLE, for example) 1. Check the table's `STATS_PERCENTAGE` setting. If `> 0`, `N_SAMPLE_PAGES` will be calculated using its `STATS_PERCENTAGE`. Otherwise: 2. Check the table's `STATS_SAMPLE_PAGES` value. If greater than 0, then set `N_SAMPLE_PAGES` to it. Otherwise: 3. Set `N_SAMPLE_PAGES` to `INNODB_STATS_PERSISTENT_SAMPLE_PAGES` New feature usage examples 1. `CREATE TABLE` using `STATS_PERCENTAGE` ``` MariaDB [test]> CREATE TABLE t(a INT UNSIGNED PRIMARY KEY) ENGINE=InnODB STATS_PERSISTENT=1 STATS_PERCENTAGE=20; Query OK, 0 rows affected (0.007 sec) MariaDB [test]> SHOW CREATE TABLE t; +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | t | CREATE TABLE `t` ( `a` int(10) unsigned NOT NULL, PRIMARY KEY (`a`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci STATS_PERSISTENT=1 `STATS_PERCENTAGE`=20 | [^5] +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.000 sec) ``` 2. `ALTER TABLE` using `STATS_PERCENTAGE` ``` MariaDB [test]> ALTER TABLE t STATS_PERCENTAGE=35; Query OK, 0 rows affected (0.007 sec) Records: 0 Duplicates: 0 Warnings: 0 MariaDB [test]> SHOW CREATE TABLE t; +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | t | CREATE TABLE `t` ( `a` int(10) unsigned NOT NULL, PRIMARY KEY (`a`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci STATS_PERSISTENT=1 `STATS_PERCENTAGE`=35 | [^5] +-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.000 sec) ``` All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc. [^1]: https://www.postgresql.org/docs/current/sql-createstatistics.html [^2]: https://www.postgresql.org/docs/current/sql-analyze.html [^3]: https://docs.microsoft.com/en-us/sql/t-sql/statements/update-statistics-transact-sql?view=sql-server-ver16 [^4]: https://docs.oracle.com/en/database/oracle/oracle-database/21/tgsql/gathering-optimizer-statistics.html [^5]: I'm aware of the issue with unwanted tilde and possibly pipe characters. I wonder if it is related to MDEV-29001 and should be addressed there.
7636f8d
to
e44b580
Compare
I have implemented the suggested change:
Stats is such an important part of the query processing engine so please review carefully. I had to rewrite a handful of tests, and re-record some result files. Additionally, This PR is now rebased on @dr-m Your input welcome and much appreciated. Thanks! |
Description
Currently when running
ANALYZE TABLE
againstinnodb
tables, indexstats are calculated by sampling
STATS_SAMPLE_PAGES
number of pages.This new feature allows a user to define a percentage of total index
pages to sample, between 0 and 100.
Here is how 3 competing database products implement this feature:
As far as I can tell, PostgreSQL doesn't have a similar feature. I
checked both the CREATE STATISTICS and ANALYZE documentation
SQL Server has the setting of PERSIST_SAMPLE_PERCENT This name maybe the
most relevant when considering the naming of our setting variables;
Like SQL Server, Oracle has something similar.
Here is the proposed solution implemented in this PR:
innodb
settings--
innodb_stats_persistent_percent_sampling
This is a boolean setting,
true
means the sampling is based on percentage.Default value should be
false
, which is consistent with the current stateof not having this feature and won't break things;
--
innodb_stats_persistent_sample_percentage
This is an int, between 0 and 100.
stas_percent_sampling
setting. Iftrue
,N_SAMPLE_PAGES
will be calculated using itsstats_sample_percentage
. Otherwise:stats_sample_pages
value. If greater than 0, thenset
N_SAMPLE_PAGES
to it. Otherwise:innodb_stats_persistent_percent_sampling
. Iftrue
,N_SAMPLE_PAGES
will be calculated usinginnodb_stats_persistent_sample_percentage
. Otherwise:N_SAMPLE_PAGES
toinnodb_stats_persistent_sample_pages
New feature usage examples
How can this PR be tested?
New test cases added and verified. Suggestions on how to best organize them welcome!
Basing the PR against the correct MariaDB version
Backward compatibility
N/A
Copyright
All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.