icinga2 randomly does not reload objects after adding new objects via api #6957

Closed
raffis opened this issue Feb 19, 2019 · 9 comments
Labels
area/api REST API area/db-ido Database output

Comments

@raffis

raffis commented Feb 19, 2019

Newly created objects do not show up in the web interface (and I'm pretty sure they do not get checked either).

The objects are visible after manually restarting icinga.

The problem looks quite random (see steps to reproduce):

  • Sometimes I see all 10 objects
  • Sometimes I see none
  • Sometimes I see some of the 10 objects

Again, only restarting icinga solves this problem.

Expected Behaviour

New objects (10 test servicegroups here, but it can be any object type) are visible in icingaweb.

Current Behaviour

Servicegroups are not visible in the servicegroup list in the icinga web ui.
As far as I can see, the web UI fetches its information not from the API but directly from the MySQL database. GET https://localhost:5665/v1/objects/servicegroups lists all those objects, and so does icinga2 object list.

As soon as I restart icinga the objects are visible in the web ui.
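
The mismatch described here (objects known to the API but absent from the database that the web UI reads) could be checked with a small comparison helper. This is only a sketch: `missing_in_db` is a hypothetical name, and the commented commands that produce its inputs assume that `jq` is installed, that the IDO database is called `icinga`, and that servicegroups are stored in `icinga_objects` with `objecttype_id` 4. None of that is stated in this report.

```shell
#!/bin/sh
# missing_in_db API_FILE DB_FILE
# Print every name listed in API_FILE that does not appear in DB_FILE.
# Both files hold one object name per line.
missing_in_db() {
    api_sorted=$(mktemp) || return 1
    db_sorted=$(mktemp) || return 1
    sort -u "$1" > "$api_sorted"
    sort -u "$2" > "$db_sorted"
    # comm -23 keeps only the lines that are unique to the first file.
    comm -23 "$api_sorted" "$db_sorted"
    rm -f "$api_sorted" "$db_sorted"
}

# One way the two inputs might be produced (assumptions, see above):
#   curl -k -s -u root:root 'https://localhost:5665/v1/objects/servicegroups' \
#     | jq -r '.results[].name' > /tmp/api_names
#   mysql -N -e 'SELECT name1 FROM icinga_objects WHERE objecttype_id = 4 AND is_active = 1' \
#     icinga > /tmp/db_names
#   missing_in_db /tmp/api_names /tmp/db_names
```

An empty result would mean the database has caught up with the API; any printed name is an object the web UI cannot see yet.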

Possible Solution

Not sure how this can happen but it looks like a major problem.

Steps to Reproduce (for bugs)

Create file /tmp/test:

curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test10' -d '{ "attrs": { "display_name":"test10", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test11' -d '{ "attrs": { "display_name":"test11", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test12' -d '{ "attrs": { "display_name":"test12", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test13' -d '{ "attrs": { "display_name":"test13", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test14' -d '{ "attrs": { "display_name":"test14", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test15' -d '{ "attrs": { "display_name":"test15", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test16' -d '{ "attrs": { "display_name":"test16", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test17' -d '{ "attrs": { "display_name":"test17", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test18' -d '{ "attrs": { "display_name":"test18", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test19' -d '{ "attrs": { "display_name":"test19", "groups": [] }}'
curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/test20' -d '{ "attrs": { "display_name":"test20", "groups": [] }}'

cat /tmp/test | while read l; do sh -c "$l"; done

Objects are not visible in the web ui.
(You may need to repeat this a couple of times; it does not happen every time, but most of the time.)

Waiting a short time between requests raises the chance that at least some objects show up:
cat /tmp/test | while read l; do sh -c "$l"; sleep 2; done
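
The hand-written command file above could also be generated. The sketch below prints the same PUT requests for test10 through test20; `make_put_cmd` and `make_repro_file` are hypothetical helper names, while the URL, credentials and request body are copied from the commands above.

```shell
#!/bin/sh
# Print the PUT command for one test servicegroup.
# Host, port and the root:root credentials are taken from the report above.
make_put_cmd() {
    name=$1
    printf "curl -k -s -u root:root -H 'Accept: application/json' -H 'Content-Type: application/json' -X PUT 'https://localhost:5665/v1/objects/servicegroups/%s' -d '{ \"attrs\": { \"display_name\":\"%s\", \"groups\": [] }}'\n" "$name" "$name"
}

# Print one command per servicegroup, test10 through test20.
make_repro_file() {
    i=10
    while [ "$i" -le 20 ]; do
        make_put_cmd "test$i"
        i=$((i + 1))
    done
}

# Usage (writes the same file as above):
#   make_repro_file > /tmp/test
```

Nothing is sent to the API until the generated file is run with one of the loops above.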

Context

I have this issue in kube-icinga https://github.com/gyselroth/kube-icinga.
This app makes many API calls within a short time, and does so asynchronously.

Your Environment

  • Version used (icinga2 --version): r2.10.2-1
  • Operating System and version: docker image jordan/icinga2 but same problem on classic installation
  • Enabled features (icinga2 feature list):
Disabled features: elasticsearch gelf influxdb opentsdb perfdata statusdata syslog
Enabled features: api checker command compatlog debuglog graphite ido-mysql livestatus mainlog notification
  • Icinga Web 2 version and modules (System - About): 2.6.2
  • Config validation (icinga2 daemon -C):
[2019-02-19 11:15:34 +0000] information/cli: Icinga application loader (version: r2.10.2-1)
[2019-02-19 11:15:34 +0000] information/cli: Loading configuration file(s).
[2019-02-19 11:15:34 +0000] information/ConfigItem: Committing config item(s).
[2019-02-19 11:15:34 +0000] information/ApiListener: My API identity: icinga2
[2019-02-19 11:15:34 +0000] warning/ApplyRule: Apply rule 'ssh' (in /etc/icinga2/conf.d/services.conf: 47:1-47:19) for type 'Service' does not match anywhere!
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 1 ScheduledDowntime.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 11 Services.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 1 LivestatusListener.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 1 IcingaApplication.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 2 Hosts.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 2 FileLoggers.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 2 NotificationCommands.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 12 Notifications.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 1 NotificationComponent.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 2 HostGroups.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 1 ApiListener.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 1 Downtime.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 1 GraphiteWriter.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 1 CheckerComponent.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 3 Zones.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 1 Endpoint.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 2 ApiUsers.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 1 CompatLogger.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 1 User.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 215 CheckCommands.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 1 UserGroup.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 14 ServiceGroups.
[2019-02-19 11:15:34 +0000] information/ConfigItem: Instantiated 3 TimePeriods.
[2019-02-19 11:15:34 +0000] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2019-02-19 11:15:34 +0000] information/cli: Finished validating the configuration file(s).
  • If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.
@dnsmichi
Contributor

Please see #6012.

@dnsmichi dnsmichi added area/db-ido Database output area/api REST API labels Feb 19, 2019
@raffis
Author

raffis commented Feb 19, 2019

> Please see #6012.

Nice, fast response! Looks like #5205/#6927 (or the mentioned parent task).

So basically I need to trigger lots of restarts, since kube-icinga may add lots of objects (and also has to remove them first, since there is no way to trigger apply rules for changed objects...)

Is a restart the only workaround?

@dnsmichi
Contributor

Until the underlying problem inside the IDO feature is fixed, a restart is the only workaround, yes.

@raffis
Author

raffis commented Feb 19, 2019

> Until the underlying problem inside the IDO feature is fixed, a restart is the only workaround, yes.

What's the difference between a service reload via init and a POST /v1/actions/restart-process?

After sending a POST /v1/actions/restart-process my object list in icingaweb is empty and adding the same object again ends in error 500:

"Cannot create object 'test10'. Configuration file '/var/lib/icinga2/api/packages/_api//conf.d/servicegroups/test10.conf' already exists."

(Which is a different error compared to just doing a restart via systemd.)

Sending a POST /v1/actions/restart-process would be the only workaround for my app. Otherwise this is going to be impossible with the current version of the Icinga API.

[2019-02-19 15:59:11 +0000] information/HttpServerConnection: Request: POST /v1/actions/restart-process (from [172.19.0.1]:50396), user: icinga2-director)
[2019-02-19 15:59:11 +0000] information/HttpServerConnection: HTTP client disconnected (from [172.19.0.1]:50396)
[2019-02-19 15:59:12 +0000] information/Application: Got reload command: Starting new instance.
[2019-02-19 15:59:12 +0000] information/Application: Reload requested, letting new process take over.
[2019-02-19 15:59:12 +0000] information/ApiListener: 'api' stopped.
[2019-02-19 15:59:12 +0000] information/CheckerComponent: 'checker' stopped.
[2019-02-19 15:59:12 +0000] information/CompatLogger: 'compatlog' stopped.
[2019-02-19 15:59:12 +0000] information/ExternalCommandListener: 'command' stopped.
[2019-02-19 15:59:13 +0000] information/FileLogger: 'main-log' started.
[2019-02-19 15:59:13 +0000] information/ApiListener: 'api' started.
[2019-02-19 15:59:13 +0000] information/ApiListener: Copying 2 zone configuration files for zone 'director-global' to '/var/lib/icinga2/api/zones/director-global'.
[2019-02-19 15:59:13 +0000] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones/director-global' (0 Bytes). Received timestamp '2019-02-19 15:59:13 +0000' (1550591953.329891), Current timestamp '2019-02-19 15:52:47 +0000' (1550591567.355288).
[2019-02-19 15:59:13 +0000] information/ApiListener: Copying 1 zone configuration files for zone 'master' to '/var/lib/icinga2/api/zones/master'.
[2019-02-19 15:59:13 +0000] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones/master' (0 Bytes). Received timestamp '2019-02-19 15:59:13 +0000' (1550591953.330238), Current timestamp '2019-02-19 15:52:47 +0000' (1550591567.355010).
[2019-02-19 15:59:13 +0000] information/ApiListener: Started new listener on '[0.0.0.0]:5665'
[2019-02-19 15:59:13 +0000] information/ExternalCommandListener: 'command' started.
[2019-02-19 15:59:13 +0000] information/GraphiteWriter: 'graphite' started.
[2019-02-19 15:59:13 +0000] information/LivestatusListener: 'livestatus' started.
[2019-02-19 15:59:13 +0000] information/LivestatusListener: Created UNIX socket in '/run/icinga2/cmd/livestatus'.
[2019-02-19 15:59:13 +0000] information/CheckerComponent: 'checker' started.
[2019-02-19 15:59:13 +0000] information/NotificationComponent: 'notification' started.
[2019-02-19 15:59:13 +0000] information/DbConnection: 'ido-mysql' started.
[2019-02-19 15:59:13 +0000] information/CompatLogger: 'compatlog' started.
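
For an app that creates many objects, the restart workaround discussed here could be batched: create everything first, then request a single restart through the API. The following is a sketch under assumptions, not a confirmed fix. `api_call` and `create_then_restart` are hypothetical helpers, the credentials are the ones from the reproduction steps, and `DRY_RUN` defaults to 1 so the script only prints what it would send.

```shell
#!/bin/sh
# Sketch: create several objects, then request one restart at the end.
# DRY_RUN defaults to 1, so the calls are only printed, not sent.
DRY_RUN=${DRY_RUN:-1}
API='https://localhost:5665/v1'   # host and port as used in this thread
AUTH='root:root'                  # credentials from the reproduction steps

# Hypothetical wrapper around curl; prints the call in dry-run mode.
api_call() {
    method=$1 path=$2 body=${3:-}
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "curl -X $method $API$path${body:+ $body}"
        return 0
    fi
    if [ -n "$body" ]; then
        curl -k -s -u "$AUTH" -H 'Accept: application/json' \
             -H 'Content-Type: application/json' -X "$method" "$API$path" -d "$body"
    else
        curl -k -s -u "$AUTH" -H 'Accept: application/json' -X "$method" "$API$path"
    fi
}

# Create all servicegroups first, then restart once instead of once per object.
create_then_restart() {
    for name in "$@"; do
        api_call PUT "/objects/servicegroups/$name" \
            "{ \"attrs\": { \"display_name\": \"$name\" } }" || return 1
    done
    api_call POST /actions/restart-process
}

# Usage:
#   create_then_restart test10 test11 test12             # dry run, prints the calls
#   DRY_RUN=0 create_then_restart test10 test11 test12   # sends them
```

Whether a restart triggered this way behaves the same as a restart via systemd is exactly the open question raised in this comment, so the result should be verified in the web UI afterwards.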

@dnsmichi
Contributor

> "Cannot create object 'test10'. Configuration file '/var/lib/icinga2/api/packages/_api//conf.d/servicegroups/test10.conf' already exists."

The stage name is missing after _api/, so most likely the API package got broken somehow in the process of restarting.
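
The symptom pointed out here, an empty path segment where the stage name should be, is easy to test for in a script that parses API error messages. A minimal sketch, with `stage_missing` as a hypothetical helper name:

```shell
#!/bin/sh
# stage_missing PATH
# Return 0 (true) if PATH has an empty segment right after the _api package
# directory, as in ".../packages/_api//conf.d/...".
stage_missing() {
    case $1 in
        */packages/_api//*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   stage_missing '/var/lib/icinga2/api/packages/_api//conf.d/servicegroups/test10.conf' \
#     && echo "stage name missing, the _api package may be broken"
```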

@raffis
Author

raffis commented Feb 20, 2019

> "Cannot create object 'test10'. Configuration file '/var/lib/icinga2/api/packages/_api//conf.d/servicegroups/test10.conf' already exists."
>
> The stage name is missing after _api/, so most likely the API package got broken somehow in the process of restarting.

Argh, my fault. I manually removed the content of /var/lib/icinga2/api/packages/_api while debugging and only just noticed that files like active-stage.conf and active.conf were missing after the restart. If I create new objects via the API in that state, they get created in a conf.d folder directly under _api (/var/lib/icinga2/api/packages/_api/conf.d/xxxx). After restarting the service, the added services are gone again (but the files are still there).

Maybe a check for that would be helpful (or a log entry somewhere saying that the stage folder is gone or not active). The API should probably respond with a 500 error and not accept new objects in the first place.
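
Until such a check exists in Icinga itself, a client-side guard could look roughly like this. It is a sketch only: `check_api_package` is a hypothetical helper, and the file names in the usage example are the ones mentioned in this comment, which may differ between Icinga 2 versions.

```shell
#!/bin/sh
# check_api_package DIR FILE...
# Return 0 if every FILE exists inside DIR; otherwise name the missing ones
# on stderr and return 1.
check_api_package() {
    dir=$1
    shift
    status=0
    for f in "$@"; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage, with the path and file names mentioned in the comment above:
#   check_api_package /var/lib/icinga2/api/packages/_api active-stage.conf active.conf \
#     || echo "refusing to create objects: _api package looks incomplete"
```

The check has to run on the Icinga host itself, since it inspects the local package directory rather than the API.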

@dnsmichi
Contributor

I've created #6959 as follow-up. I just don't have the time to code any further here, maybe you'd like to catch up on this.

@raffis
Author

raffis commented Feb 20, 2019

> I've created #6959 as follow-up. I just don't have the time to code any further here, maybe you'd like to catch up on this.

👍, yes as soon as I have some spare time.

@dnsmichi
Contributor

Will be superseded by IcingaDB; the old tracking issue for the IDO is #6012.
