BUG: Port cannot be reused even when the process that opened it has been terminated. #337

Open
Routhinator opened this issue Oct 12, 2022 · 5 comments

Comments

@Routhinator

Routhinator commented Oct 12, 2022

I've been fighting a problem with my graphs from this module: they show strange decimal values for the number of inserts/deletes/updates of models. After digging into it more, I found that the counters for these metrics randomly drop to 0 and then come back to their original values. There are no restarts of the application. Simply refreshing the /metrics/ endpoint shows it: if 2 users were created since the last restart of the app, the count will be 2, then suddenly 0 for up to a minute, then back to 2 again.

This results in highly unreliable metrics. I'm uncertain where the problem could lie.

I am using the ExportModelOperationsMixin in all my models, as per the docs:

Example:

class Member(ExportModelOperationsMixin('member'), AbstractBaseUser, PermissionsMixin,
             RulesModelMixin, TimestampedModel, metaclass=RulesModelBase):
    """
    Main Member model
    """

And I am using the django_prometheus.db.backends.postgresql engine.

Versions:

django_prometheus: 2.2
Django: 4.0.8
Python: 3.9

=========

Update

The remaining problem with this implementation is outlined in #337 (comment): once a port has been opened, it cannot be reused until the host is rebooted, even after the container running it has been killed and reaped.

@Routhinator
Author

So, this seems to be related to #325 - and is resolved by setting the PROMETHEUS_MULTIPROC_DIR environment variable.
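To illustrate why per-process counters would produce this kind of flapping, here is a toy simulation in plain Python. It does not use Django or prometheus_client; the `Worker` class and the round-robin routing are made up for illustration. Each worker keeps its own counter, and a scrape reports whichever worker happens to answer. Summing across workers, which is roughly what multiprocess mode is meant to achieve through the files in PROMETHEUS_MULTIPROC_DIR, gives a stable value.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Worker:
    """Hypothetical stand-in for one server worker process."""
    inserts: int = 0


def scrape_single(worker: Worker) -> int:
    # Without shared state, /metrics reports only the answering worker's count.
    return worker.inserts


def scrape_aggregated(workers: list[Worker]) -> int:
    # With aggregation, the scrape sums every worker's count.
    return sum(w.inserts for w in workers)


workers = [Worker(), Worker(), Worker()]
workers[0].inserts += 2  # both user creations happened to land on worker 0

# Three consecutive scrapes, routed round-robin across the workers.
answering = cycle(workers)
per_worker_view = [scrape_single(next(answering)) for _ in range(3)]
aggregated_view = [scrape_aggregated(workers) for _ in range(3)]

print(per_worker_view)  # [2, 0, 0]
print(aggregated_view)  # [2, 2, 2]
```

The 2, 0, 0 pattern matches the symptom in the original report: the value depends on which worker serves the scrape, not on any reset.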

A couple of points here:

  • The current doc page that mentions this variable is not linked from the readme; it really should be highlighted more. I only found it by stumbling across #325 ("Incorrect number of CRUD operations in multiworker context") and then searching this repo.
  • It seems strange that an environment variable is the only way to configure this. The module should really look for it in settings.py; that way, if the app developer wants to load it from env vars or a file, it's up to them.
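As a rough sketch of the second point, a project could bridge a settings value into the environment itself. Everything here is an assumption for illustration: `export_multiproc_dir` is a hypothetical helper, and reusing the name PROMETHEUS_MULTIPROC_DIR as a settings key is not something django_prometheus supports as far as I know.

```python
import os
from typing import Mapping, MutableMapping, Optional

ENV_VAR = "PROMETHEUS_MULTIPROC_DIR"


def export_multiproc_dir(
    settings: Mapping[str, object],
    environ: Optional[MutableMapping[str, str]] = None,
) -> Optional[str]:
    """Copy a (hypothetical) settings value into the environment.

    An environment variable that is already set wins, so deployments
    that rely on the env var keep working unchanged.
    """
    environ = os.environ if environ is None else environ
    if ENV_VAR in environ:
        return environ[ENV_VAR]
    value = settings.get(ENV_VAR)
    if value is None:
        return None
    environ[ENV_VAR] = str(value)
    return environ[ENV_VAR]
```

As far as I understand, prometheus_client decides between single-process and multiprocess storage when it is first imported, so a helper like this would only help if it ran before that import.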

@Routhinator
Author

I spoke too soon. PROMETHEUS_MULTIPROC_DIR does not seem to solve the issue.

@Routhinator
Author

Ruling out other things: I am using Gunicorn, not uWSGI, and what the lazy-apps setting does for uWSGI is already the default behaviour for Gunicorn.

@Routhinator
Author

Routhinator commented Oct 15, 2022

Ok, I managed to sort this out by switching to Gunicorn/Gevent instead of Gunicorn/Gthread, dropping to one worker per container, and defining PROMETHEUS_MULTIPROC_DIR. Metrics are at least stable now.
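For reference, the setup described above might look roughly like this as a Gunicorn config file. This is a sketch, not my exact config: the bind address and the multiproc directory path are assumptions, and the `child_exit` hook follows the pattern from the prometheus_client multiprocess documentation.

```python
# gunicorn.conf.py (sketch; bind address and directory path are assumptions)

bind = "0.0.0.0:8000"      # assumed application port
workers = 1                # one worker per container
worker_class = "gevent"    # requires the gevent package to be installed

# Make the multiprocess directory visible to the worker. The directory is
# expected to exist and to be emptied between application runs.
raw_env = ["PROMETHEUS_MULTIPROC_DIR=/tmp/prometheus_multiproc"]


def child_exit(server, worker):
    """Gunicorn server hook: tell prometheus_client that a worker has died."""
    # Imported lazily so this file can be loaded without prometheus_client.
    from prometheus_client import multiprocess

    multiprocess.mark_process_dead(worker.pid)
```

It would be loaded with something like `gunicorn -c gunicorn.conf.py myproject.wsgi`, where `myproject` is a placeholder.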

I also need the PROMETHEUS_METRICS_EXPORT_PORT_RANGE = range(8001, 8002) setting, because the metrics export has to run on a dedicated thread in order to be reliable. The metrics work without it, but if the application threads are all tied up the endpoint stops answering, as alluded to in the docs.
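One detail worth spelling out about that setting: Python ranges are half-open, so range(8001, 8002) offers exactly one candidate port. A small sketch of what that means; the wider range at the end is only an illustration, not what I am running.

```python
# settings.py fragment (sketch)
PROMETHEUS_METRICS_EXPORT_PORT_RANGE = range(8001, 8002)

# range() excludes its upper bound, so only port 8001 is a candidate.
candidate_ports = list(PROMETHEUS_METRICS_EXPORT_PORT_RANGE)
print(candidate_ports)  # [8001]

# A wider range (illustrative only) would leave fallback ports available
# if the first one could not be bound.
wider_candidates = list(range(8001, 8004))
print(wider_candidates)  # [8001, 8002, 8003]
```

With a single candidate there is no fallback, which is why a port that cannot be rebound takes the exporter down entirely.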

I am having one problem with this though. Under Docker, once a port has been opened it cannot be reused until the host is rebooted. After a container has used the port, stopping, restarting, or deleting that container does not free it. The port is not actually in use; rather, this seems to be related to the same file descriptor being reused over and over, because the Python HTTP client that prometheus_client uses is not cleaning up the FD or marking it reusable, as mentioned in the comments on prometheus/client_python#155.
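To check from inside a container whether a port is really free, a small standard-library probe like the following can help. This is a diagnostic sketch only: `first_bindable_port` is a hypothetical helper, it is not what prometheus_client or django_prometheus do internally, and the exact semantics of SO_REUSEADDR differ between operating systems.

```python
import socket
from typing import Iterable, Optional


def first_bindable_port(ports: Iterable[int], host: str = "127.0.0.1") -> Optional[int]:
    """Return the first port in `ports` that can be bound, or None."""
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            # Allow rebinding an address that is only held by sockets in
            # TIME_WAIT, which is the usual meaning of "marking it reusable".
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is unavailable, try the next candidate
            return port  # the probe socket is closed on leaving the with-block
    return None


if __name__ == "__main__":
    print(first_bindable_port(range(8001, 8002)))
```

If this reports the port as bindable while the exporter still fails to bind it, the problem is more likely in how the exporter opens its socket than in the port itself.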

Unfortunately this is making it nigh impossible to nail down this Django module and ensure production readiness.

@Routhinator changed the title from "BUG: Exporter randomly zeroes counters for django_db_* metrics for models" to "BUG: Port cannot be reused even when the process that opened it has been terminated." on Oct 15, 2022
@Routhinator
Author

This behaviour seems to be related to https://peps.python.org/pep-0446/#non-inheritable-file-descriptors
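For anyone not familiar with that PEP: since Python 3.4, file descriptors created by Python, sockets included, are non-inheritable by default, and a process has to opt in explicitly for a child to inherit one. A minimal standard-library demonstration of that default follows; whether this is actually what keeps the port stuck is still only my guess.

```python
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    # PEP 446: newly created descriptors are not inherited by child processes.
    default_inheritable = sock.get_inheritable()

    # Inheritance has to be requested explicitly.
    sock.set_inheritable(True)
    after_opt_in = sock.get_inheritable()

print(default_inheritable)  # False
print(after_opt_in)         # True
```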
