Support question: metrics with Swarm #879

Closed
poerwiyanto opened this issue Sep 19, 2018 · 11 comments · Fixed by #880

Comments

@poerwiyanto

Expected Behaviour

gateway_service_count returns the number of replicas and auto scaling works.

Current Behaviour

gateway_service_count returns a "no data points" error in Prometheus, and auto scaling does not work (checked with the docker service ps command) even though APIHighInvocationRate is firing.

Possible Solution

Steps to Reproduce (for bugs)

  1. Deploy stack in Docker Swarm with 2 nodes.
  2. Deploy First Python Function tutorial.
  3. curl the function using while true to trigger APIHighInvocationRate.

Context

I am trying to learn OpenFaaS and need the auto scaling.

Your Environment

  • Docker version docker version (e.g. Docker 17.0.05 ):
    Docker 18.06.1-ce, Docker Machine 0.14.0

  • Are you using Docker Swarm or Kubernetes (FaaS-netes)?
    Docker Swarm

  • Operating System and version (e.g. Linux, Windows, MacOS):
    Ubuntu 18.04

  • Link to your project or a code example to reproduce issue:

  • Please also follow the troubleshooting guide and paste in any other diagnostic information you have:

@poerwiyanto poerwiyanto changed the title Gateway service count returns 0 and auto scaling is not working Gateway service count returns "no data points" and auto scaling is not working Sep 19, 2018
@alexellis
Member

Hi, thanks for raising an issue. Please could you work through the troubleshooting guide and report back to us? We are unaware of any issues with metrics or autoscaling at this time.

@poerwiyanto
Author

Hi, Alex. Thanks for your fast response. I will try the troubleshooting and get back to you.

@alexellis alexellis changed the title Gateway service count returns "no data points" and auto scaling is not working Support question: metrics with Swarm Sep 19, 2018
@alexellis
Member

I'd also advise taking a look at this - https://github.com/openfaas/workshop/blob/master/lab9.md

@poerwiyanto
Author

poerwiyanto commented Sep 19, 2018

Actually, I followed the workshop and got stuck in lab9 because Prometheus returns "no data points" for gateway_service_count, and the number of replicas for my function stayed at 1 even after APIHighInvocationRate was triggered. I'm still figuring it out. Thanks a lot.

@alexellis
Member

So you're using a docker-machine VM on top of Ubuntu using the KVM driver, right?

Is this all with what's in master?

What function did you deploy? Maybe try the store too.

@poerwiyanto
Author

screenshot from 2018-09-19 10-40-53
screenshot from 2018-09-19 10-42-25
screenshot from 2018-09-19 10-43-15
Hi, Alex. Sorry for bothering you. I captured my metrics (I used stefanprodan's faas-grafana). When I used Prometheus, there was no gateway_service_count in the metrics options.

Yes, I used 2 docker-machine VMs on top of Ubuntu with the VirtualBox driver, but I also tried on DigitalOcean (https://github.com/openfaas/faas/blob/master/guide/deployment_digitalocean.md) and faced the same problem. Is there anything I am doing wrong?

@LucasRoesler
Member

I deployed my own 2 node swarm and found this in my gateway logs

func_gateway.1.jaut08aptbtz@myvm2    | 2018/09/19 18:24:21 Alert received.
func_gateway.1.jaut08aptbtz@myvm2    | 2018/09/19 18:24:21 {"receiver":"scale-up","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"APIHighInvocationRate","function_name":"nodeinfo","monitor":"faas-monitor","service":"gateway","severity":"major","value":"7.4"},"annotations":{"description":"High invocation total on ","summary":"High invocation total on "},"startsAt":"2018-09-19T18:24:16.175098924Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"https://5356532ccb6a:9090/graph?g0.expr=sum+by%28function_name%29+%28rate%28gateway_function_invocation_total%7Bcode%3D%22200%22%7D%5B10s%5D%29%29+%3E+5\u0026g0.tab=1"}],"groupLabels":{"alertname":"APIHighInvocationRate","service":"gateway"},"commonLabels":{"alertname":"APIHighInvocationRate","function_name":"nodeinfo","monitor":"faas-monitor","service":"gateway","severity":"major","value":"7.4"},"commonAnnotations":{"description":"High invocation total on ","summary":"High invocation total on "},"externalURL":"https://4ba1a6cc8430:9093","version":"4","groupKey":"{}:{alertname=\"APIHighInvocationRate\", service=\"gateway\"}"}
func_gateway.1.jaut08aptbtz@myvm2    |
func_gateway.1.jaut08aptbtz@myvm2    | 2018/09/19 18:24:21 [Scale] function=nodeinfo 0 => 4.
func_gateway.1.jaut08aptbtz@myvm2    | 2018/09/19 18:24:21 error scaling HTTP code 401, https://faas-swarm:8080/system/scale-function/nodeinfo
func_gateway.1.jaut08aptbtz@myvm2    | 2018/09/19 18:24:21 [error scaling HTTP code 401, https://faas-swarm:8080/system/scale-function/nodeinfo]
func_gateway.1.jaut08aptbtz@myvm2    | 2018/09/19 18:24:21 Alert received.

Note the 401 errors on the scaling request. A scaling request is being made, but faas-swarm rejects it as unauthorized.
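To read the alert payload in the log above more easily, it can be decoded into a couple of small structs. This is only a minimal sketch: the type and function names (alert, alertPayload, firingFunctions) are hypothetical and are not the gateway's own types, and the sample body is a trimmed copy of the JSON from the log.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// alert and alertPayload are hypothetical types that model only the
// fields of the logged webhook body that matter for scaling.
type alert struct {
	Status string            `json:"status"`
	Labels map[string]string `json:"labels"`
}

type alertPayload struct {
	Receiver string  `json:"receiver"`
	Status   string  `json:"status"`
	Alerts   []alert `json:"alerts"`
}

// firingFunctions returns the function_name label of every alert
// in the payload whose status is "firing".
func firingFunctions(body []byte) ([]string, error) {
	var p alertPayload
	if err := json.Unmarshal(body, &p); err != nil {
		return nil, err
	}
	var names []string
	for _, a := range p.Alerts {
		if a.Status == "firing" {
			names = append(names, a.Labels["function_name"])
		}
	}
	return names, nil
}

func main() {
	// Trimmed copy of the payload shown in the gateway log.
	body := []byte(`{"receiver":"scale-up","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"APIHighInvocationRate","function_name":"nodeinfo","service":"gateway"}}]}`)

	names, err := firingFunctions(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

For the logged payload this yields the single function nodeinfo, which matches the "[Scale] function=nodeinfo" line that follows it in the log.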

I also noticed that when we try to scrape the service count here

get, _ := http.NewRequest(http.MethodGet, endpointURL.String()+"system/functions", nil)
services := []requests.Function{}
res, err := proxyClient.Do(get)
if err != nil {
    log.Println(err)
    continue
}

It does not seem to set any authentication headers, which would explain why gateway_service_count is missing.
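As an illustration of the missing step, here is a minimal, self-contained sketch of attaching basic-auth credentials to a request like the one above. It is not the actual fix from the linked PR: fetchStatus, newProtectedServer, and the admin/secret credentials are hypothetical, and a local httptest server stands in for faas-swarm. Only req.SetBasicAuth and r.BasicAuth are standard net/http calls.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// fetchStatus sends GET <baseURL>/system/functions and returns the HTTP
// status code. Credentials are attached only when user is non-empty.
func fetchStatus(client *http.Client, baseURL, user, password string) (int, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/system/functions", nil)
	if err != nil {
		return 0, err
	}
	if user != "" {
		// Adds the "Authorization: Basic ..." header to the request.
		req.SetBasicAuth(user, password)
	}
	res, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer res.Body.Close()
	return res.StatusCode, nil
}

// newProtectedServer starts a local stand-in for a provider that
// rejects requests without the expected basic-auth credentials.
func newProtectedServer(user, password string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		u, p, ok := r.BasicAuth()
		if !ok || u != user || p != password {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, "[]") // empty function list
	}))
}

func main() {
	srv := newProtectedServer("admin", "secret")
	defer srv.Close()

	without, err := fetchStatus(srv.Client(), srv.URL, "", "")
	if err != nil {
		panic(err)
	}
	with, err := fetchStatus(srv.Client(), srv.URL, "admin", "secret")
	if err != nil {
		panic(err)
	}
	fmt.Println("without credentials:", without)
	fmt.Println("with credentials:", with)
}
```

Against the stand-in server, the unauthenticated call is rejected with 401, the same status seen in the gateway log, while the call with credentials succeeds.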

@LucasRoesler
Member

@poerwiyanto can you try disabling auth on the faas-swarm component? You can do this by editing the docker-compose.yaml file so that basic_auth: "false" is set in the environment section of the faas-swarm service.

alexellis added a commit that referenced this issue Sep 19, 2018
- this is a blocking issue for auth with Docker Swarm
fixes #879

Signed-off-by: Alex Ellis (VMware) <[email protected]>
@alexellis
Member

The issue template asks users to provide logs as part of the troubleshooting steps.

Once @LucasRoesler was able to do this, I could track the error down to the faas-swarm component needing additional authentication. I've mentioned the workarounds on the linked PR above.

@poerwiyanto please see the instructions for updating your local environment. This will get you past the issue and thanks for reporting it.

@poerwiyanto
Author

Okay, thanks for your help, Alex and Lucas.

ewilde pushed a commit to ewilde/faas that referenced this issue Sep 21, 2018
- this is a blocking issue for auth with Docker Swarm
fixes openfaas#879

Signed-off-by: Alex Ellis (VMware) <[email protected]>
@alexellis
Member

Derek lock: old thread

@derek derek bot locked and limited conversation to collaborators Jun 18, 2019