Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(journaldreceiver): fail if unsufficient permissions for journalctl command #23276

Merged

Conversation

sumo-drosiek
Copy link
Member

@sumo-drosiek sumo-drosiek commented Jun 12, 2023

Description:

Wait for journalctl to finish and log error in case there is any issue

Link to tracking Issue: #20906

Testing:

2023-06-13T11:07:31.283+0200    info    service/telemetry.go:104        Setting up own telemetry...
2023-06-13T11:07:31.283+0200    info    service/telemetry.go:127        Serving Prometheus metrics      {"address": ":8888", "level": "Basic"}
2023-06-13T11:07:31.284+0200    info    [email protected]/exporter.go:275        Development component. May change in the future.        {"kind": "exporter", "data_type": "logs", "name": "logging"}
2023-06-13T11:07:31.285+0200    info    service/service.go:131  Starting otelcontribcol...      {"Version": "0.79.0-dev", "NumCPU": 16}
2023-06-13T11:07:31.285+0200    info    extensions/extensions.go:30     Starting extensions...
2023-06-13T11:07:31.285+0200    info    adapter/receiver.go:45  Starting stanza receiver        {"kind": "receiver", "name": "journald", "data_type": "logs"}
2023-06-13T11:07:31.285+0200    info    service/service.go:148  Everything is ready. Begin running and processing data.
2023-06-13T11:07:31.288+0200    error   journald/journald.go:240        journalctl command failed {"kind": "receiver", "name": "journald", "data_type": "logs", "operator_id": "journald_input", "operator_type": "journald_input", "error": "exit status 1", "output": "No journal files were opened due to insufficient permissions.\n"}
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/journald.(*Input).Start.func1
        github.com/open-telemetry/opentelemetry-collector-contrib/pkg/[email protected]/operator/input/journald/journald.go:240


Documentation: N/A

@sumo-drosiek
Copy link
Member Author

sumo-drosiek commented Jun 13, 2023

@djaglowski Could you take a look at the code?

I have two issues:

  • not sure how to write unit tests to this feature

  • should we force to exit collector if journalctl command fails?

    I would say yes, but cannot find interface to do it from within goroutine, and cannot really do it outside of gorouine as Wait is blocking operation

    There are three cases where journalctl may fail:

    • fail on start
    • killed by external process
    • killed during operator stop

@djaglowski
Copy link
Member

@sumo-drosiek, for unit testing, maybe you can mock the cmd interface?

I don't think we need to force the collector to exit. If a failure happens in Start, returning an error would be enough to stop the collector. Otherwise, I think just trying to fail gracefully and return an error should be fine.

@sumo-drosiek
Copy link
Member Author

sumo-drosiek commented Jun 14, 2023

@sumo-drosiek, for unit testing, maybe you can mock the cmd interface?

The feature is logging that the command failed and why. So unit test should check logs (at least for current code)

I don't think we need to force the collector to exit. If a failure happens in Start, returning an error would be enough to stop the collector. Otherwise, I think just trying to fail gracefully and return an error should be fine.

In the case I'm working on, journalctl starts correctly, but it fails few moments later and therefore it is not catched by Start(). So, I run Wait() in goroutine as it is blocking operation and in case the command fails I print out the reason and stops the operator, but Otelcol is still working, so customer will not know that something is wrong until they check logs

@djaglowski
Copy link
Member

In the case I'm working on, journalctl starts correctly, but it fails few moments later and therefore it is not catched by Start(). So, I run Wait() in goroutine as it is blocking operation and in case the command fails I print out the reason and stops the operator, but Otelcol is still working, so customer will not know that something is wrong until they check logs

If the problem isn't detectable during Start (in a reasonable amount of time), then I think these guidelines about error handling are relevant:

What you're describing sounds about right to me. In the future, I think we can improve upon the way this error state is exposed, either by shipping logs via the otel-go library or reporting per-component status.

@sumo-drosiek
Copy link
Member Author

sumo-drosiek commented Jun 19, 2023

(in a reasonable amount of time)

how do you define it?

I believe it is, it took 3 ms in the example

@djaglowski
Copy link
Member

@sumo-drosiek, I'm not sure if we have a precise definition, but the idea is that Start should fail if there is an obvious problem that will not typically resolve without further input from the user, but that the collector should not take a long time to start up just because one component wants to take a lot of time to assess something.

In my opinion, if this is an error that will occur every time journalctl is run, we could run a test during Start and return an error if necessary. 3ms or anything on that order seems reasonable to me.

@sumo-drosiek sumo-drosiek marked this pull request as ready for review June 26, 2023 13:59
@sumo-drosiek sumo-drosiek requested a review from a team as a code owner June 26, 2023 13:59
@sumo-drosiek sumo-drosiek changed the title feat: log error if journald fails feat(journaldreceiver): fail if not sufficient permissions for journalctl command Jun 27, 2023
@sumo-drosiek sumo-drosiek changed the title feat(journaldreceiver): fail if not sufficient permissions for journalctl command feat(journaldreceiver): fail if unsufficient permissions for journalctl command Jun 27, 2023
@sumo-drosiek
Copy link
Member Author

@kovrus Can we proceed with this PR?

@djaglowski djaglowski merged commit 37a8518 into open-telemetry:main Jul 5, 2023
87 checks passed
@github-actions github-actions bot added this to the next release milestone Jul 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants