-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MDEV-29209] Introduce the connection delay feature #2393
base: 11.0
Are you sure you want to change the base?
[MDEV-29209] Introduce the connection delay feature #2393
Conversation
b901a50
to
6355d2d
Compare
a0c9c84
to
2f0e9a9
Compare
@vuvova Would you like to have a look at code and design? |
link pr #2331 |
@vuvova Any comments would be great. |
@dingweiqings (just an FYI whilst it is in review) some of our builders are failing with this:
|
@LinuxJedi Fix it already . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.tags
is checked in by mistake
more comments inline.
mysql-test/main/mysqld--help.result
Outdated
@@ -299,6 +299,8 @@ The following specify which files/extra groups are read (specified before remain | |||
--extra-port=# Extra port number to use for tcp connections in a | |||
one-thread-per-connection manner. 0 means don't use | |||
another port | |||
--failed-connections-threshold=# | |||
The failed login threshold connection control |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this help text doesn't really help. I don't understand what "failed login threshold connection control" mean.
better something like "The number of failed authentication attempts before the delay is applied.
But even better, I'd say, just hard-code some reasonable value, like 3. And remove the variable.
#define FAILED_ATTEMPTS_BEFORE_DELAY 3
mysql-test/main/mysqld--help.result
Outdated
@@ -592,6 +594,8 @@ The following specify which files/extra groups are read (specified before remain | |||
If there is more than this number of interrupted | |||
connections from a host this host will be blocked from | |||
further connections | |||
--max-connection-delay=# | |||
The maximum number of connection response delay |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better: "The maximum connection response delay in seconds"
mysql-test/main/mysqld--help.result
Outdated
@@ -1655,6 +1662,7 @@ max-binlog-cache-size 18446744073709547520 | |||
max-binlog-size 1073741824 | |||
max-binlog-stmt-cache-size 18446744073709547520 | |||
max-connect-errors 100 | |||
max-connection-delay 18446744073709551615 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is rather long, may be use some realistic value by default? Like, a minute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that a delay larger than a connection timeout practically means that the account is blocked — connecting is no longer possible. So, this creates an interesting dependency with --max-connect-errors
value.
First the user can do --failed-connections-threshold
connections without a delay. Then user gets increasingly longer delays and at --max-password-errors
he can no longer connect. No longer connect means the delay is equal to the timeout.
Considering this, you need neither --max-connection-delay
nor --min-connection-delay
nor hard-coded step of 1000ms. The delay is calculated as
timeout * (failure_count - failed_connections_threshold)/(max_password_errors - failed_connections_threshold)
and in this logic you cannot use hard-coded FAILED_ATTEMPTS_BEFORE_DELAY=3
, as you need at least some way to enable/disable this feature, so there must be at least one variable to control it.
Don't forget to make sure you don't lock out SUPER
users. They will always have to have delay < timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for reviewing code . Yes you are right , when failure_count = max_password_errors , the max delay is timeout*1=timeout. But here timeout value maybe set to
min(net_wait_timeout,net_interactive_timeout) - 10
not connect timeout from client , minus 10 to keep the max delay is less than timeout , which server send access denied response then server close this connection .
VARIABLE_TYPE INT UNSIGNED | ||
VARIABLE_COMMENT The maximum number of connection response delay | ||
NUMERIC_MIN_VALUE 0 | ||
NUMERIC_MAX_VALUE 4294967295 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't match what mysqld--help has. How comes?
sql/mysqld.cc
Outdated
@@ -3962,7 +3965,7 @@ static int init_common_variables() | |||
exit(1); | |||
} | |||
#endif | |||
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please, try to avoid invisible whitespaces
sql/sql_acl.cc
Outdated
check_connection_delay_for_user(thd, record); | ||
delete_user_login_failed_record(thd, record); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
user_resource
aren't authentication in my opinion, so this code should be before user_resource
checks. May be next to handle_password_errors
that has a clean comment Login succeeded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it'd be better to have this code inside handle_password_errors()
function. You will have it all in one place, not split over two if()
as now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here maybe is a limitation , it is depending on return result for auth plugin that calling handle_password_errors. For example, auth_test_plugin just return CR_ERROR , and will not call handle_password_errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And it'd be more better to put two if()
into one function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, this feature is a protection against brute force. So only delaying on CR_AUTH_USER_CREDENTIALS looks appropriate. It's a plugin bug to return incorrect status, you cannot do much if the plugin has a bug.
sql/sql_connect.cc
Outdated
PSI_mutex_key key_connection_delay_mutex= PSI_NOT_INSTRUMENTED; | ||
PSI_cond_key key_connection_delay_wait= PSI_NOT_INSTRUMENTED; | ||
PSI_stage_info stage_waiting_in_user_login_failed_delay= { | ||
0, "Waiting in connection_control stage", 0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no connection_control plugin, so better rename everything that carries this name to match the new design.
sql/sql_connect.cc
Outdated
|
||
/* | ||
Get structure for logging connection data for the current user | ||
*/ | ||
|
||
#ifndef NO_EMBEDDED_ACCESS_CHECKS | ||
static HASH hash_user_connections; | ||
HASH user_login_failed_hash; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps you could use a C++ wrapper of HASH to get type checks and other C++ goodies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, do you need a hash at all? failed password attempts are counted in ACL_USER
already, you can simply use that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, use password errors in ACL_USER is very simple, and don't to introduce additional HASH and less lock . But here maybe one problem , ACL_USER is count failed password attempts by acl user not connect user/host . In this case , acl user is app@% , if someone login as app user with many wrong times , it will delay application server in production connect. I don't know this is acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is. I'd say it's important to behave consistently in all cases of "wrong password" reactions. If the whole ACL_USER is locked, then the timeout should also apply to the ACL_USER.
Supposedly, it was done to prevent distributed password brute-forcing from many different IPs.
sql/sql_connect.cc
Outdated
my_printf_error(ME_NOTE, "%s have login failed %u times", | ||
MYF(ME_NOTE | ME_ERROR_LOG), record->key, | ||
record->failed_count); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you need to wrap my_printf_error
in #ifndef DBUG_OFF
it means you should simply use DBUG_PRINT()
sql/sql_connect.cc
Outdated
int delete_user_login_failed_record(THD *thd, | ||
st_user_login_failed_record *record) | ||
{ | ||
my_hash_delete(&user_login_failed_hash, (uchar *) record); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sure that FLUSH PRIVILEGES
empties the hash
9079add
to
6c7911f
Compare
plugin/auth_examples/test_plugin.c
Outdated
/** | ||
dialog test plugin mimicking the ordinary auth mechanism. Used to test the clear text plugin API. | ||
The only difference between auth_cleartext_plugin and auth_cleartext_plugin2 is that the second plugin return CR_AUTH_USER_CREDENTIALS when password isn't matched. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to do that. Simply change auth_cleartext_plugin()
.
sql/sql_connect.cc
Outdated
/* Time unit for delay step is seconds , convert to milliseconds*/ | ||
uint64 timeout= (p->net_wait_timeout < p->net_interactive_timeout | ||
? p->net_wait_timeout | ||
: p->net_interactive_timeout) * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better use
uint64 timeout= thd->net->read_timeout;
but please test that it's the correct timeout to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I notice that read timeout value is 30 seconds , which means max delay is 30 seconds . Think from another point , client send login packet by network socket and server create a thread serving for connection. So in this thread context only when preceding instructions are executed poll function will be called
to check if timeout. Simplely first delay response and then check read timeout. As a result , delay time and timeout is independent. Maybe I dont't get full picture of vio framework. Would you give me some explanation? @vuvova
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yes, you're right. The connection will not be automatically aborted because of the timeout. And indeed delay time and timeout are thus independent.
Still, connect_timeout
(which is also vio->read_timeout
and vio->write_timeout
until the user is fully authenticated) is supposed to limit the duration of the login phase, so it kind of makes sense to artificially limit delay
by vio->read_timeout
. As a bonus it reduces the number of degrees of freedom, independent variables to configure, and makes two separate features parts of one larger feature
sql/sql_connect.cc
Outdated
DBUG_PRINT("info",("The step for connection delay is %d milliseconds",step)); | ||
/* min delay time is 1 seconds and max delay time is timeout */ | ||
uint64 delay= | ||
(step > 1000 ? step : 1000) * (count - p->failed_attempts_before_delay); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you cap the value with 1 second?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have calculated this case : if max_password_errors is UNIT_MAX and timeout is 24h , delay time is nearly =24*3600*1000/40_0000_0000=24*36/40000=216/10000=0.0216
ms . The result is too small , which influence nothing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I personally wouldn't do it, but up to you. Just, please, explain this logic in a comment.
bae59a5
to
fda926c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! That was quite good, I didn't have any comments about the actual implementation logic.
just some minor style comments, like, fix the spacing, rephrase the comment, redundant definition. in other words, making it ready to be pushed :)
sql/sql_connect.cc
Outdated
/* | ||
* Based on that invalid password errors will cause user can't connect server | ||
ever and server close connection if it got message from client after | ||
wait_timeout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as you've found out, this is not correct, the delay can be longer than wait_timeout and the server will not disconnect. We artificially limit the delay by wait_timeout. So, better reflect that in the comment, for example
/*
connect_timeout is designed to limit the connect time, every single network read and
write is limited by it. Thus we'll say here that the maximal delay is limited by the
connect_timeout too, and if it takes longer - the connection times out and the user
cannot connect at all. This means that the delay must reach the connect_timeout
when the count reaches max_password_errors
*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am very sorry for late response. Recently , I have some personal business.
Hmm, what do you think about this?
/*
Connect_timeout is designed to limit the connect time, every single network read and
write is limited by it. Thus we'll say here that the maximal delay is limited by the
connect_timeout too. As a bonus it reduces the number of degrees of freedom, independent variables to
configure
*/
Here , my main point is the delay time can be longer than connect timeout , it could be better from reducing config variables perspective.
sql/sql_connect.cc
Outdated
read_timeout / (max_password_errors - password_errors_before_delay); | ||
/* If max_password_errors is large, step is very small which | ||
* influence nothing, so 1 second is minim delay step */ | ||
step= step > 1000 ? step : 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking more about it, I believe this is not a good idea. Yes, the delay will be growing slowly if max_password_errors is big. But with 1s step, the delay will very quickly reach connect_timeout (for example, if connect_timeout is 10, then the delay will be 10s after 10 wrong passwords) and then it will not grow at all. Constant linear growth seems to be more adequate here
--source ../inc/check_connection_delay.inc | ||
|
||
let $SERVER_RESPONSE_TIME= $DELAY_MIN_RESPONSE_TIME; | ||
--source ../inc/check_response_delay_time.inc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not very convincing. If you look for the wait to be not found, better make it
let $SERVER_RESPONSE_TIME= .*;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have redesigned test cases. check_delay.inc is used to check expect delay, check_no_delay.inc is used to check expect not delay. In check_no_delay.inc file , it use
let $SERVER_RESPONSE_TIME= \d+;
I was reviewing your test cases, and when looking at It was implemented in MDEV-3909 remote user enumeration. See also fa7d0c4 and this comment
But this delay feature does. There is no delay for non-existent users, so by trying the same username few times and observing the delay one can determine whether such a user exists or not. At the moment I don't yet know how to solve this |
We could add a hash map for count non-existent users, so we could call handle_password_errors for all users. This design expand max_password_errors and delay response feature for all users . |
I'm afraid, yes, it has to be a hash. I would've been great and very simply to use But if there will be a hash, there must be some mechanism to limit its growth to prevent it from eating up all the memory. One way of doing it would be to keep only the latest N (configurable) usernames. If the N+1 username connects it simply gets the max possible delay without being added to the hash. Another approach would be to remove the oldest username from the hash, but it's not worth the extra complexity. |
I find that we can simply make non-existent users get max possible delay . For example ,
|
No, of course, we cannot just use max possible delay for non-existent users. Because then it will be a property different between existing users and non-existent. Like, if you connect, and get a "wrong password" error after a delay which is less than max possible delay then the user must definitely exist. And this is exactly what we need to prevent, a way to tell whether a user exists. |
|
@@ -14393,6 +14398,41 @@ static bool check_password_lifetime(THD *thd, const ACL_USER &acl_user) | |||
return false; | |||
} | |||
|
|||
static void handle_nonexistent_user_login_failed(THD *thd, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vuvova What do you think about this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vuvova following up on this, it looks like it needs a reply from you.
Description
It should be possible to configure:
1.1 Number of consecutive unsuccessful attempts before which delay is introduced
2.2 Maximum amount of delay introduced
Feature Requirements
The same as mysql worklog FR. Look at https://dev.mysql.com/worklog/task/?id=8885
High Level Design
Low Level Design
net_read_timeout/(max_password_errors - password_errors_before_delay)*(failed_count -password_errors_before_delay)
.Note and Limitation
Connection_control_delay_generated and
information_schema.CONNECTION_CONTROL_FAILED_LOGIN_ATTEMPTS
is not included in this PR , which upstream mysql implemented.net_read_timeout/(max_password_errors - password_errors_before_delay)*(failed_count -password_errors_before_delay)
is delay time calculated formula. As a result, max delay time is equal to net_read_timeout.How can this PR be tested?
./mtr --suite=connection_delay
Basing the PR against the correct MariaDB version
Backward compatibility