Add manage_service settings to get puppet out of the way. #349
Conversation
This still has a problem or two to be worked out, specifically errors against rabbitmq.
    bind_host => $bind_host,
  }
  contain cinder::api
This contain, along with contain cinder::scheduler, caused a dependency cycle error like
Error: Could not apply complete catalog: Found 1 dependency cycle:
(Cinder_config[DEFAULT/glance_api_version] => Service[cinder-api] => Class[Cinder::Api] => Class[Quickstack::Cinder] => Class[Quickstack::Cinder_volume] => Cinder_config[DEFAULT/glance_api_version])
when the parameter $backend_rbd in quickstack::pacemaker::cinder was set to true.
Removing the two contains made the dependency cycle error go away. However, I had to add two extra dependencies to ensure the Services were started before we executed the one-time stop and disable. See
https://github.com/cwolferh/astapor/compare/jguiditta:add_manage_service_ha...cwolferh:service_unmanage_tinkering?expand=1
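A minimal sketch of the change described above, based only on this comment rather than the actual astapor code (class and resource names are illustrative):

```puppet
# Sketch only: drop `contain` in favour of `include` plus an explicit
# ordering edge, so Class[Cinder::Api] is no longer pulled inside this
# class's containment boundary (which is what closed the cycle through
# Cinder_config[DEFAULT/glance_api_version]).
class quickstack::cinder (
  $bind_host = '0.0.0.0',
) {
  class { 'cinder::api':
    bind_host => $bind_host,
  }
  # `contain cinder::api` removed to break the dependency cycle.

  # Instead, order the service explicitly before the one-time
  # stop-and-disable exec (exec name is hypothetical):
  Service['cinder-api'] -> Exec['one-time-cinder-api-disable']
}
```

The trade-off: without `contain`, relationships against `Class[Quickstack::Cinder]` no longer imply ordering against `cinder::api`'s resources, which is why the extra explicit dependencies were needed.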
Results of successful run:
Note that services that should be A/P (active/passive), like neutron (anything but server) and heat engine, are shown as active on only one node and inactive on the others.
Looks good, services have been disabled: http://ur1.ca/i4jhy
@@ -71,6 +78,9 @@
  try_sleep => 10,
  command   => "/tmp/ha-all-in-one-util.bash all_members_include rabbitmq",
} ->
quickstack::pacemaker::manual_service { "rabbitmq-server":
  stop => $_enabled,
} ->
pacemaker did not bring up rabbitmq across all nodes after a couple of fresh attempts. E.g.:
# pcs status
Clone Set: rabbitmq-server-clone [rabbitmq-server]
Started: [ c1a1.example.com c1a2.example.com ]
Stopped: [ c1a3.example.com ]
From the puppet output on the control node:
Debug: Exec[one-time-rabbitmq-server-disable]: Executing '/sbin/chkconfig rabbitmq-server off'
Debug: Executing '/sbin/chkconfig rabbitmq-server off'
Notice: /Stage[main]/Quickstack::Pacemaker::Rabbitmq/Quickstack::Pacemaker::Manual_service[rabbitmq-server]
/Exec[one-time-rabbitmq-server-disable]/returns: executed successfully
Debug: /Stage[main]/Quickstack::Pacemaker::Rabbitmq/Quickstack::Pacemaker::Manual_service[rabbitmq-server]
/Exec[one-time-rabbitmq-server-disable]: The container Quickstack::Pacemaker::Manual_service[rabbitmq-server] will propagate my refresh event
Debug: Quickstack::Pacemaker::Manual_service[rabbitmq-server]: The container
Class[Quickstack::Pacemaker::Rabbitmq] will propagate my refresh event
Debug: /usr/sbin/pcs resource show rabbitmq-server > /dev/null 2>&1
Debug: /usr/sbin/pcs resource create rabbitmq-server systemd:rabbitmq-server op monitor interval=30s start-delay=35s interval=30s --clone
Error: Unable to create resource/fence device
Call cib_create failed (-206): Application of an update diff failed
If I comment out the three lines above (quickstack::pacemaker::manual_service { "rabbitmq-server": ...), rabbitmq comes up fine for me.
Clone Set: rabbitmq-server-clone [rabbitmq-server]
Started: [ c1a1.example.com c1a2.example.com c1a3.example.com ]
The cause isn't clear to me, but I've got the logs handy.
I had success throwing in a "sleep 60" after manual_service { "rabbitmq-server": . I'm not suggesting that as the final solution, but perhaps we need to let systemd finish reloading (if applicable) across nodes before a pacemaker resource is added.
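The diagnostic workaround described here could be sketched like this (the exec name is made up; the placement mirrors the diff hunk above):

```puppet
quickstack::pacemaker::manual_service { 'rabbitmq-server':
  stop => $_enabled,
} ->
# Diagnostic only, not a proposed fix: give systemd time to settle on
# all nodes before the pacemaker clone resource for rabbitmq-server
# is created, to avoid the "Call cib_create failed (-206)" error.
exec { 'wait-for-systemd-settle':
  command => '/bin/sleep 60',
  path    => ['/bin', '/usr/bin'],
}
```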
Hmm, that is odd, since all we are doing is a chkconfig off, not even stopping the service. I wonder if it would be better to move the disable to happen at the very end of the first run?
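One way the suggestion above could be sketched in Puppet (hypothetical class and stage names; the real patch may have taken a different route) is to push the one-time disable into a run stage that comes after `main`:

```puppet
# Sketch: run the one-time chkconfig-off at the very end of the first
# puppet run by placing it in a stage ordered after `main`, so all
# pacemaker resource creation has already happened.
stage { 'disable_services':
  require => Stage['main'],
}

class quickstack::pacemaker::late_disable {
  exec { 'one-time-rabbitmq-server-disable':
    command => '/sbin/chkconfig rabbitmq-server off',
    path    => ['/sbin', '/usr/sbin', '/bin'],
  }
}

class { 'quickstack::pacemaker::late_disable':
  stage => 'disable_services',
}
```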
https://bugzilla.redhat.com/show_bug.cgi?id=1123303 This does not include rabbit, due to unique issues with that service. This patch should make starting services more consistent for the included services: pacemaker will have full control of starting and stopping them, without the chance that puppet has already done so, which could confuse pacemaker.
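The general manage_service pattern the PR title refers to can be sketched as follows (class, service, and parameter names are illustrative, not necessarily the actual astapor code):

```puppet
# Sketch: a manage_service knob that tells puppet to keep its hands
# off the service, leaving start/stop entirely to pacemaker.
class quickstack::example_service (
  $manage_service = true,
) {
  if $manage_service {
    service { 'example-api':
      ensure => running,
      enable => true,
    }
  }
  # When $manage_service is false, puppet declares no Service resource
  # at all, so pacemaker alone controls the service's lifecycle.
}
```

Setting the parameter to false in the HA/pacemaker profiles is what "gets puppet out of the way" without changing behavior for non-HA deployments, where it defaults to true.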
Looks good. Services disabled after 2nd puppet run.
Add manage_service settings to get puppet out of the way.
This should make starting services more consistent, as pacemaker will have full control of starting and stopping the services without the chance of puppet having already done so, which potentially could cause confusion for pacemaker.