Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Question - help needed] My Master - Master setup stop working due to an error I can not solve #561

Open
BobWs opened this issue Feb 21, 2024 · 2 comments
Labels
question Usability question, not directly related to an error with the image

Comments

@BobWs
Copy link

BobWs commented Feb 21, 2024

Hi,

My Mariadb master-master replication setup stop working due to an error and I don't know how to solve this problem.
If anyone could help this noob;-) to solve this...

Here is the error log of the replications.
db-source log:

Slave_IO_State 	Waiting for master to send event
Master_Host 	192.168.178.100
Master_User 	replicator
Master_Port 	3307
Connect_Retry 	10
Master_Log_File 	source10-bin.001177
Read_Master_Log_Pos 	15520
Relay_Log_File 	replica20-relay-bin.000002
Relay_Log_Pos 	690
Relay_Master_Log_File 	source10-bin.001173
Slave_IO_Running 	Yes
Slave_SQL_Running 	No
Replicate_Do_DB 	
Replicate_Ignore_DB 	
Replicate_Do_Table 	
Replicate_Ignore_Table 	
Replicate_Wild_Do_Table 	
Replicate_Wild_Ignore_Table 	
Last_Errno 	1396
Last_Error 	Error 'Operation DROP USER failed for 'authelia'@'%'' on query. Default database: 'mysql'. Query: 'DROP USER 'authelia'@'%''
Skip_Counter 	0
Exec_Master_Log_Pos 	4289
Relay_Log_Space 	226680
Until_Condition 	None
Until_Log_File 	
Until_Log_Pos 	0
Master_SSL_Allowed 	No
Master_SSL_CA_File 	
Master_SSL_CA_Path 	
Master_SSL_Cert 	
Master_SSL_Cipher 	
Master_SSL_Key 	
Seconds_Behind_Master 	

db-replica log:

Slave_IO_State 	Waiting for master to send event
Master_Host 	192.168.178.110
Master_User 	replicator
Master_Port 	3307
Connect_Retry 	10
Master_Log_File 	replica20-bin.000575
Read_Master_Log_Pos 	190559
Relay_Log_File 	source10-relay-bin.000002
Relay_Log_Pos 	692
Relay_Master_Log_File 	replica20-bin.000547
Slave_IO_Running 	Yes
Slave_SQL_Running 	No
Replicate_Do_DB 	
Replicate_Ignore_DB 	
Replicate_Do_Table 	
Replicate_Ignore_Table 	
Replicate_Wild_Do_Table 	
Replicate_Wild_Ignore_Table 	
Last_Errno 	1452
Last_Error 	Error 'Cannot add or update a child row: a foreign key constraint fails (`hass`.`events`, CONSTRAINT `events_ibfk_1` FOREIGN KEY (`data_id`) REFERENCES `event_data` (`data_id`))' on query. Default database: 'hass'. Query: 'INSERT INTO events (event_type, event_data, origin, origin_idx, time_fired, time_fired_ts, context_id, context_user_id, context_parent_id, data_id, context_id_bin, context_user_id_bin, context_parent_id_bin, event_type_id) VALUES (NULL, NULL, NULL, 1, NULL, 1707944134.049929e0, NULL, NULL, NULL, 4684, '���gE��4�p��}���', 'P�b�k�O���5 ���', NULL, 6) RETURNING events.event_id'
Skip_Counter 	0
Exec_Master_Log_Pos 	87503
Relay_Log_Space 	68621741
Until_Condition 	None
Until_Log_File 	
Until_Log_Pos 	0
Master_SSL_Allowed 	No
Master_SSL_CA_File 	
Master_SSL_CA_Path 	
Master_SSL_Cert 	
Master_SSL_Cipher 	
Master_SSL_Key 	
Seconds_Behind_Master 	

In the db-source the error is about USER authelia wich I manually deleted via phpMyAdmin but the apparently something went wrong.

The error in db-replica is related to Home-Assistant (hass) and I don't have any clue what that's about.

TIA

@grooverdan
Copy link
Member

So db-source - given its receiving replication drop of user authenlia, then phpMyAdmin dropped the user on db-replica (its master?).

To get past this on db-source (missing the drop of a missing user isn't that important):

set global sql_slave_skip_counter=1; start slave;

db-replica:

foreign key constraints. bit hard to say without table structure of hass.events and what events_ibfk_1 is.

The usual scenario is that in a master-master scenario, only ever one node is written to at a time. Replication must be caught up and no more writes on a master before writing to the other master. Make sure this is true somehow, or revert to a master and replica.

pt-table-sync be used to get the two instances in sync again (once replication is going again and you know which copy is current).

@grooverdan grooverdan added the question Usability question, not directly related to an error with the image label Feb 22, 2024
@BobWs
Copy link
Author

BobWs commented Feb 22, 2024

@grooverdan Thank you for replying!

The first problem I manage to solve by skipping the error message and reseting the master (replica) using phpMyAdmin.

This is the log status of db-replica (Master):
(Which looks good I think no more errors in phpMyAdmin)

Slave_IO_State 	Waiting for master to send event
Master_Host 	192.168.178.100
Master_User 	replicator
Master_Port 	3307
Connect_Retry 	10
Master_Log_File 	source10-bin.001201
Read_Master_Log_Pos 	6236
Relay_Log_File 	replica20-relay-bin.000019
Relay_Log_Pos 	5641
Relay_Master_Log_File 	source10-bin.001201
Slave_IO_Running 	Yes
Slave_SQL_Running 	Yes
Replicate_Do_DB 	
Replicate_Ignore_DB 	
Replicate_Do_Table 	
Replicate_Ignore_Table 	
Replicate_Wild_Do_Table 	
Replicate_Wild_Ignore_Table 	
Last_Errno 	0
Last_Error 	
Skip_Counter 	0
Exec_Master_Log_Pos 	6236
Relay_Log_Space 	5500
Until_Condition 	None
Until_Log_File 	
Until_Log_Pos 	0
Master_SSL_Allowed 	No
Master_SSL_CA_File 	
Master_SSL_CA_Path 	
Master_SSL_Cert 	
Master_SSL_Cipher 	
Master_SSL_Key 	
Seconds_Behind_Master 	0 

But the second problem with the hass database has not been solved yet.

Here are the steps I have taken so far:

  • tried truncating the database on db-source (no success)
  • tried deleting all the tables on db-source (no success)
  • Then I created a new database on db-replica for the hass database called db_hass and copied all the data to that new database. After making sure that everything was working with Home-Assistant with the new database db_hass, I then deleted the "old hass database" on db-source where the problem occurred with phpMyAdmin, but that too did not go well. On db-replica the synchronization went fine and "hass" was removed there without problems.

So now I'm getting this error on db-source:

Slave_IO_State 	Waiting for master to send event
Master_Host 	192.168.178.110
Master_User 	replicator
Master_Port 	3307
Connect_Retry 	10
Master_Log_File 	replica20-bin.000587
Read_Master_Log_Pos 	872364
Relay_Log_File 	source10-relay-bin.000002
Relay_Log_Pos 	3301
Relay_Master_Log_File 	replica20-bin.000547
Slave_IO_Running 	Yes
Slave_SQL_Running 	No
Replicate_Do_DB 	
Replicate_Ignore_DB 	
Replicate_Do_Table 	
Replicate_Ignore_Table 	
Replicate_Wild_Do_Table 	
Replicate_Wild_Ignore_Table 	
Last_Errno 	1146
Last_Error 	Error 'Table 'hass.states' doesn't exist' on query. Default database: 'hass'. Query: 'INSERT INTO states (entity_id, state, attributes, event_id, last_changed, last_changed_ts, last_updated, last_updated_ts, old_state_id, attributes_id, context_id, context_user_id, context_parent_id, origin_idx, context_id_bin, context_user_id_bin, context_parent_id_bin, metadata_id) VALUES (NULL, '6886.437', NULL, NULL, NULL, NULL, NULL, 1707944904.574685e0, 11598857, 26, NULL, NULL, NULL, 0, '���s�~�(c+� ���K', NULL, NULL, 6), (NULL, '0.0403200000000288', NULL, NULL, NULL, NULL, NULL, 1707944904.576475e0, 11598858, 39431, NULL, NULL, NULL, 0, '���s���F`��/�(ۙ', NULL, NULL, 16) RETURNING states.state_id, states.state_id AS state_id__1'
Skip_Counter 	0
Exec_Master_Log_Pos 	171062
Relay_Log_Space 	118689185
Until_Condition 	None
Until_Log_File 	
Until_Log_Pos 	0
Master_SSL_Allowed 	No
Master_SSL_CA_File 	
Master_SSL_CA_Path 	
Master_SSL_Cert 	
Master_SSL_Cipher 	
Master_SSL_Key 	
Seconds_Behind_Master 	

I have also tried to skip the error and reset the replica but nothing is working, the error persists and I don't know how to solve this.
TIA

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Usability question, not directly related to an error with the image
Development

No branches or pull requests

2 participants