Docker engine swarm api service discovery #1766
This will need to be bypassed for Prometheus service discovery. We may want to wait a release or two for this to stabilise before adding it, and to ensure there's sufficient interest to justify the maintenance effort of another SD.
This feature would be amazing. It would allow us to simplify some of the dependencies we currently have to manage to maintain a dynamic Prometheus environment.
You could use dns_sd_configs.
It's a workaround, but it's functional.
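For illustration (the service name and port here are assumptions): Swarm's internal DNS resolves `tasks.<service>` to one A record per running task, which `dns_sd_configs` can consume directly:

```yaml
scrape_configs:
  - job_name: 'swarm-node-exporter'
    dns_sd_configs:
      # Swarm DNS returns one A record per task for "tasks.<service-name>".
      # "node-exporter" and port 9100 are example values.
      - names: ['tasks.node-exporter']
        type: A
        port: 9100
```

The caveat, as discussed below, is that DNS gives you task IPs but no Swarm metadata (service name, node, labels) to relabel on.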
Any progress on this? I'm really looking forward to using this feature.
@michaelharrer Unfortunately, there is no way to determine which node a given node_exporter instance is running on. Only node_exporter itself knows, and there's no option to expose it in its metrics (prometheus/node_exporter#319).
I just hacked together a proof of concept that syncs tasks from the Swarm manager to Prometheus: https://github.com/function61/prometheus-docker-swarm The current limitation is that Prometheus has to be running on a Swarm manager node.
@genki, @joonas-fi: I've updated the description of the image I created for getting the metrics: https://github.com/bvis/docker-prometheus-swarm. It's not perfect, but it is very useful and the best I've seen so far.
@joonas-fi I'll try your solution when I get some time; it's probably a better alternative. And you don't need to run it on a swarm manager node if you expose the metrics to the cluster through a proxy. A similar approach to:
Or one that gets the Docker Swarm events and exposes them. On the other hand, I've tried it, but I couldn't get it to work, as it tries to obtain the data from the ingress network instead of the specific network where both services are attached. I think you should allow defining that network as well; do you want me to open an issue in your project?
@bvis I have implemented your second suggestion: genki@2f49d37. This injects the meta labels "__domain", "__service", "__task" and "__host" after query execution time, using the Docker API.
@genki Do you have a Prometheus image ready for use? It works! At least it's a first approach to a system that provides the host. Nice work! What I've seen is that these values do not appear in the "Console" column, which is why I didn't see them at first. If you fix that, it would be nice to have a public image with your changes. Could this be acceptable as a PR in this project?
@bvis Thank you for reporting :)
@bvis: oh man, thanks for the tip about creating a service that exposes the manager Docker socket (via a placement constraint) over TCP; I didn't think of that as a way to loosen the requirement of running it on a Swarm manager node. :) I will make the Docker URL given to the Docker client configurable, as you pointed out!

I'm not sure what you mean by "as it tries to obtain the data from the ingress network". To my understanding, the ingress network is only for published ports and the routing mesh. So if you publish the socat port, it will be public and therefore visible both from the ingress network and at the container's own IP. Publishing seems unnecessary, as the port shouldn't be public anyway (security issue), and you can reach the socat service by its name without the port being public (provided the socat service and monitoring are on the same network), if I understand correctly. :)

I haven't given much thought to, or researched, services running on different networks (business services and monitoring on separate networks). Currently my assumption is that everything runs on the same network. I'll document that caveat. It might be easy to implement, I just don't know yet.

Just to be super clear to everyone, my project and @bvis's project achieve different things:
@genki The problem I see with your solution is that it does not allow filtering queries based on these values, so I cannot use it in my dashboard to get values from one or more hosts.

@joonas-fi You are right that it's unnecessary to publish the exporters' ports in the routing mesh; I had used that just for debugging purposes. The moment I removed the "--publish" option from cadvisor and node-exporter, your system started to scrape the values correctly. But to make it usable under different environments and conditions, I suggest implementing the network-selection feature. Another suggestion: it would be better to split your "docker-prometheus-bridge" binary into a separate image to allow process isolation; with both services running in the same container, problems could arise. Or try to add it into Prometheus itself. On the other hand, my dashboard shows the container metrics cadvisor provides, and it's easy to extend. It would also be good to let me create issues in your project for better follow-up.

And a third option: create a Prometheus fork combining both of your features, @joonas-fi and @genki. It could be very useful until the Prometheus project adds support for Docker Swarm service discovery, or maybe they could accept your changes; that's one of the best things about the open source model. ;)
@bvis Injected labels are only usable for things like legend labels, because they are not real labels in the scope of a query. Prometheus uses labels as the identifier of targets, so inserting something into them causes duplication of targets when containers are recreated. My motivation was just to use the injected labels as legend labels in Grafana, like "{{__host}}".
Based on the Swarm discovery from @ContainerSolutions, I've coded a PoC that is working fine in our staging env. It takes some of the great ideas from the original solution, but tries to fit better in a deployment where Prometheus is executed on a (dedicated) Swarm worker without mounting shared volumes between workers/masters (which is fairly complex with some cloud providers), and it provides more Swarm metadata. It also removes the "autoconnection to Swarm networks" feature, leaving that responsibility to the Swarm operator who interconnects services (although this feature can easily be brought back). The original motivation was using the hostname from the worker. The client/server duality could be simplified by dropping the client completely if Prometheus implemented a generic SD interface. Please let me know what you think about this approach.
After several months of waiting, I have implemented a simple Swarm discovery in my fork repo; maybe you guys also need it. Or download the image directly.
Configuration for Prometheus:

```yaml
  - job_name: swarm
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    swarm_sd_configs:
      - api_server: http:https://docker-proxy:2375
        # api_version: 1.32
        # group: xxx
        # network: xxx
        # refresh_interval: 10s
        # timeout: 15s
    relabel_configs:
      # Add a service label
      - source_labels: [__meta_swarm_service]
        target_label: service
      # Add a node ip label
      - source_labels: [__meta_swarm_node_ip]
        target_label: node_ip
      # Add a node name label
      - source_labels: [__meta_swarm_node_name]
        target_label: node_name
```

For a Swarm service, you can add several labels to control scraping:
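The exact label keys the fork recognizes aren't reproduced here, so as a purely hypothetical illustration (consult the fork's README for the real key names), opting a service into scraping from a stack file might look like:

```yaml
# docker-compose stack file (version 3) for `docker stack deploy`.
# The "prometheus.*" label keys below are illustrative assumptions,
# not the fork's documented names.
version: "3"
services:
  app:
    image: my-app:latest
    deploy:
      labels:
        prometheus.enable: "true"   # opt this service in to scraping
        prometheus.port: "8080"     # port serving the metrics endpoint
        prometheus.path: "/metrics" # metrics path, if not the default
```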
We really need this as well. Could we get an indication from the Prometheus team on whether they want to include the functionality provided by @cuigh?
I'm closing the issue as, unfortunately, we are currently not accepting new integrations. ContainerSolutions/prometheus-swarm-discovery is listed in the Prometheus documentation to integrate Docker Swarm via the file service discovery.

We can only provide the stability and performance we want if we can properly maintain the codebase. This includes, amongst others, testing integrations in an automated and scalable fashion. For this reason, we suggest people integrate using our generic interfaces. We have an integrations page on which integrations using our generic interfaces are listed.

Even if existing integrations cannot be tested in an automated fashion, we will not remove them, for reasons of compatibility. This also means that any additions we take on, or any changes to existing integrations we make or accept, will need to be maintained and tested until at least the next major version, and realistically beyond that.

Feel free to question this answer on our developer mailing list, but be aware that it's unlikely you will get a different answer.
Be aware that ContainerSolutions/prometheus-swarm-discovery is not yet ready for production usage, due to file descriptor leaks: ContainerSolutions/prometheus-swarm-discovery#9.
I updated my old proof of concept to use a better strategy: https://github.com/function61/promswarmconnect Previously it used the file service discovery type, dynamically writing the file to disk based on the info in Swarm. Its drawback was that we had to make changes to the Prometheus container, overriding the entrypoint and launching both the file-synchronizer binary and Prometheus. This is not robust, because we would have had to write logic to deal with either of the binaries crashing. My new approach emulates the API of the existing Triton service discovery, so we can run the released Prometheus container from Docker Hub 100% unchanged. All you have to do is write configuration for the Triton SD in the Prometheus config file.
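For illustration, pointing Prometheus at a Triton-SD-emulating endpoint looks roughly like this. The account, dns_suffix, endpoint and port values are assumptions; check promswarmconnect's README for the exact settings it expects:

```yaml
scrape_configs:
  - job_name: 'swarm'
    triton_sd_configs:
      # promswarmconnect answers Triton discovery requests, so the
      # stock Prometheus binary needs no changes. Values are placeholders.
      - account: 'swarm'
        dns_suffix: 'promswarmconnect'
        endpoint: 'promswarmconnect'
        port: 443
        version: 1
```

The trick is that Prometheus thinks it is talking to Triton's container-monitoring API, while the endpoint actually serves targets derived from the Swarm API.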
Docker Swarm mode is popular enough that we can make an exception to the SD moratorium. We also previously discussed adding support for it, according to @brian-brazil.
I see no reason to make any exceptions; we continue to have issues maintaining what we already have. We also previously decided not to support it, and it sounds like what exists now is not what existed then.
@pdambrauskas Unfortunately, there is no practical option for this in Go; otherwise I guess pluggable SD would have been done a long time ago...
Have you guys seen cuigh's post above? His implementation (https://github.com/cuigh/prometheus) is complete, fully integrated and confirmed working. Considering that he has already done all the heavy lifting, why not simply integrate his implementation? By the looks of it, he even seems more than happy to maintain it...
I think it would be great. @cuigh Would you be willing to open a PR to add it?
Hey, I just tested the @cuigh fork and it fills my needs, but I'd prefer to stay on the main Prometheus repo. As I understand it, the right solution is to create a new custom SD mechanism like the example here: https://github.com/prometheus/prometheus/tree/master/documentation/examples/custom-sd using the code from the @cuigh fork. Has anybody started working on this?
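For reference, the custom-SD adapter approach boils down to a sidecar writing a JSON file that Prometheus watches through file-based service discovery. A minimal sketch of the Prometheus side (the paths here are examples):

```yaml
scrape_configs:
  - job_name: 'swarm'
    file_sd_configs:
      - files: ['/etc/prometheus/targets/swarm-*.json']
        # file_sd also picks up changes via inotify;
        # refresh_interval is a fallback re-read.
        refresh_interval: 30s
```

The sidecar then keeps those files up to date with entries of the form `[{"targets": ["10.0.0.5:9100"], "labels": {"swarm_service": "node-exporter"}}]`.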
I would propose re-submitting #3687 as a new PR. We can take an official vote on the Prometheus developers list to decide if it's good enough to merge, rather than having one person on prometheus-team object.
I still don't think it's a good idea to implement swarm_sd on top of file_sd, unless HTTP is supported in file_sd.
Yeah, agreed.
See https://prometheus.io/blog/2018/07/05/implementing-custom-sd/
No, we don't want to remove SD from the core. We do want to make it easier to add new methods outside the core.
OK, so who can resubmit the PR for a vote? @cuigh I would like to improve some of the code in your fork a bit; can you enable issues on your fork so we can exchange on that?
We discussed this at our monthly meeting today; the moratorium remains. Currently we're awaiting integration testing for a good swathe of our existing SDs, which any new SD would be expected to follow.
@brian-brazil could you then please add HTTP support to the file SD (so the SD JSON can be fetched over HTTP), so we'd at least get a clean integration point for SD agents running outside of Prometheus' container? See the use case of https://github.com/function61/promswarmconnect; this would be much cleaner if it could produce JSON compatible with the file SD!
We have a moratorium on new SDs, and we already have a clean generic interface for integrations.
That interface just passes complexity management to the users. With that interface I need to have the SD binary (let's say promswarmconnect) running either:
I ask again: is all this complexity justified just because you don't want to add remote JSON support to the file SD? I can totally understand not wanting to add 138 different SD plugins that you have to maintain, for the trendiest service platform of the week, but we're asking for an olive branch here, because what you're suggesting is far from elegant, and especially not in the microservice philosophy that Prometheus otherwise so elegantly fits. TL;DR: a generic HTTP-based SD integration is the only elegant way we'll be able to build SD integrations outside of Prometheus' tree.
The sidecar model is pretty standard, and not something you can really avoid if you're using Prometheus. We assume a POSIX system, and that includes processes being able to share filesystems, send each other signals etc.
I've done it in the past, the bash scripting is a little finicky, but it's quite doable. Especially if you can use a non-ancient version of bash.
I disagree here, and there are many out there that build fine on what we have. Writing code and deploying it are separate concerns, and I don't think we should add features just because one particular deployment system happens to lack a basic feature.
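To make the sidecar model above concrete, here is a minimal hypothetical sketch (not code from any of the projects mentioned in this thread) of the write side: a sidecar producing a file_sd-compatible JSON file atomically, so Prometheus never observes a half-written file. The target addresses and label names are invented examples.

```python
# Sketch of a file_sd sidecar's write path. The sidecar would gather
# target groups from the Swarm API; here they are hardcoded examples.
import json
import os
import tempfile

def write_file_sd(target_groups, path):
    """Atomically replace `path` with file_sd-compatible JSON:
    write to a temp file in the same directory, then rename, which
    is atomic on POSIX filesystems."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(target_groups, f)
        os.rename(tmp, path)  # readers see either the old or new file
    except BaseException:
        os.unlink(tmp)
        raise

# Example target groups, as a Swarm-aware sidecar might produce them.
groups = [
    {"targets": ["10.0.0.3:9100"],
     "labels": {"job": "node", "swarm_service": "node-exporter"}},
]
write_file_sd(groups, "swarm-targets.json")
```

In a real deployment the sidecar would loop, re-query the Swarm API, and rewrite the file whenever tasks change; Prometheus' file_sd picks the change up via inotify.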
The PR was already merged, and I also enabled the issues setting. Thanks.
Docker Swarm rocks 🤘
Looks like this issue has gone stale... I'm looking for mechanisms to implement metrics discovery for Swarm-hosted containers and came across this thread. Any further progress/thoughts on whether this will be supported in the master branch? Thanks.
@kz1000fan I think there's no way forward here.
I am aware that there have been several discussions around this subject - has a decision since been made on whether to natively support swarm service discovery? |
@darkl0rd Yes, we're willing to accept new discovery mechanisms. The new rules are:
I'm happy to be the sponsor for the docker swarm discovery, but someone needs to write the code. :) |
There are several variants out there; however, I've yet to see one of a standard where it could live inside Prometheus, for example with no hardcoding of business logic.
I am working on this in #7420 |
In Docker 1.12, the Docker engine will ship with swarm mode built in. This means it is now possible to stand up a swarm cluster using a bunch of nodes with just Docker installed. In addition, swarm mode comes with DNS and health checks built in, negating the need to run Consul or some other service discovery mechanism. More info here: https://docs.docker.com/engine/swarm/
It would be nice if Prometheus could directly use the new services API to discover services running in a swarm cluster: https://docs.docker.com/engine/reference/api/docker_remote_api_v1.24/#3-8-services
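To sketch what consuming that API could look like (this is an illustration, not Prometheus code), the JSON returned by Docker's `GET /services` endpoint can be mapped to scrape targets. The top-level `Spec`/`Name`/`Labels` field names follow the v1.24 API; the `prometheus.port` opt-in label is a made-up convention:

```python
# Hypothetical mapping from Docker /services JSON to scrape targets.
def services_to_targets(services, port_label="prometheus.port"):
    targets = []
    for svc in services:
        spec = svc.get("Spec", {})
        labels = spec.get("Labels") or {}
        port = labels.get(port_label)
        if port is None:
            continue  # only scrape services that opt in via a label
        name = spec.get("Name", "")
        # Swarm's built-in DNS resolves the service name to its VIP.
        targets.append({"targets": ["%s:%s" % (name, port)],
                        "labels": {"__meta_swarm_service": name}})
    return targets

# Trimmed-down sample of a /services response.
sample = [{"Spec": {"Name": "web", "Labels": {"prometheus.port": "8080"}}},
          {"Spec": {"Name": "db", "Labels": {}}}]
print(services_to_targets(sample))
```

A real implementation would also expose node, task and label metadata as `__meta_*` labels for relabeling, as the other SDs do.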
Perhaps the config option could be called `docker_swarm_sd`.