Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BusinessController creation order issue #296

Closed
brzyangg opened this issue Oct 12, 2021 · 15 comments
Closed

BusinessController creation order issue #296

brzyangg opened this issue Oct 12, 2021 · 15 comments
Milestone

Comments

@brzyangg
Copy link
Contributor

I uses EtcdServiceRegistry in my HttpPipeline.
When the easegress is started, the BussinessController will be created in turn.
If the HttpPipeline is created before the EtcdServiceRegistry, an error will be reported.
The details are as follows:

2021-10-12T16:12:25.143+08:00   ERROR   proxy/server.go:139  get service palfish-service-registry/base/gateway-controller failed: palfish-service-registry not found

Errors are often reported during startup. Although there is no impact on the program, there is an ERROR level log, I always feel that there is something wrong.

@xxx7xxxx
Copy link
Contributor

Please give us complete config and log messages.

@brzyangg
Copy link
Contributor Author

brzyangg commented Oct 12, 2021

MyHTTPPipelineConfig
name: gateway-controller-pipeline
kind: HTTPPipeline
flow:
  - filter: request-adaptor
  - filter: proxy
filters:
  - name: request-adaptor
    kind: RequestAdaptor
    path:
      trimPrefix: "/gateway-controller"
  - name: proxy
    kind: Proxy
    mainPool:
      serviceName: base/gateway-controller
      serviceRegistry: etcd-service-registry
MyServiceRegistry
kind: EtcdServiceRegistry
name: etcd-service-registry
endpoints: ['127.0.0.1:2379']
prefix: "/services/"
cacheTimeout: 5s
MyLog
021-10-12T16:12:24.62+08:00    INFO    server/main.go:57       Easegress release: UNKNOWN, repo: UNKNOWN, commit: UNKNOWN
2021-10-12T16:12:24.622+08:00   INFO    cluster/config.go:125   etcd config: init-cluster:member-001=https://127.0.0.1:12380 cluster-state:existing force-new-cluster:false
2021-10-12T16:12:25.1+08:00     INFO    cluster/cluster.go:675  server is ready
2021-10-12T16:12:25.1+08:00     INFO    cluster/cluster.go:389  client connect with endpoints: [https://127.0.0.1:12380]
2021-10-12T16:12:25.101+08:00   INFO    cluster/cluster.go:402  client is ready
2021-10-12T16:12:25.122+08:00   INFO    cluster/cluster.go:499  lease is ready(use existed one: 75467c73874cb005)
2021-10-12T16:12:25.122+08:00   INFO    cluster/cluster.go:213  cluster is ready
2021-10-12T16:12:25.124+08:00   INFO    supervisor/supervisor.go:137    create TrafficController
2021-10-12T16:12:25.124+08:00   INFO    supervisor/supervisor.go:137    create RawConfigTrafficController
2021-10-12T16:12:25.124+08:00   INFO    supervisor/supervisor.go:137    create ServiceRegistry
2021-10-12T16:12:25.125+08:00   INFO    supervisor/supervisor.go:137    create StatusSyncController
2021-10-12T16:12:25.125+08:00   INFO    cluster/cluster.go:575  session is ready
2021-10-12T16:12:25.125+08:00   INFO    api/api.go:72   register api group admin
2021-10-12T16:12:25.125+08:00   INFO    api/server.go:78        api server running in 127.0.0.1:2382
2021-10-12T16:12:25.142+08:00   INFO    trafficcontroller/trafficcontroller.go:182      create namespace default
2021-10-12T16:12:25.142+08:00   INFO    trafficcontroller/trafficcontroller.go:190      create http server default/server-palfish-2
2021-10-12T16:12:25.142+08:00   INFO    supervisor/supervisor.go:182    create globalFilters
2021-10-12T16:12:25.143+08:00   WARN    serviceregistry/watcher.go:168  watch service etcd-service-registry/base/gateway-controller: list service instances failed: etcd-service-registry not found
2021-10-12T16:12:25.143+08:00   ERROR   proxy/server.go:139  get service etcd-service-registry/base/gateway-controller failed: etcd-service-registry not found
2021-10-12T16:12:25.143+08:00   INFO    trafficcontroller/trafficcontroller.go:424      create http pipeline default/gateway-controller-pipeline
2021-10-12T16:12:25.144+08:00   INFO    trafficcontroller/trafficcontroller.go:190      create http server default/server-palfish
2021-10-12T16:12:25.144+08:00   INFO    supervisor/supervisor.go:182    create etcd-service-registry

@xxx7xxxx
Copy link
Contributor

Thx. And won't this log appear again after supervisor/supervisor.go:182 create etcd-service-registry, will it?

2021-10-12T16:12:25.143+08:00   ERROR   proxy/server.go:139  get service etcd-service-registry/base/gateway-controller failed: etcd-service-registry not found

@brzyangg
Copy link
Contributor Author

Thx. And won't this log appear again after supervisor/supervisor.go:182 create etcd-service-registry, will it?

2021-10-12T16:12:25.143+08:00   ERROR   proxy/server.go:139  get service etcd-service-registry/base/gateway-controller failed: etcd-service-registry not found

@xxx7xxxx yes,If create etcd-service-registry is in front, no error log
I think the main problem is the order of creation here. https://github.com/megaease/easegress/blob/main/pkg/supervisor/supervisor.go#L176

@xxx7xxxx
Copy link
Contributor

Yes, the unordered feature of the map will cause it randomly. We will figure it out.

@brzyangg
Copy link
Contributor Author

@xxx7xxxx thank you

@localvar localvar added this to the v1.4.0 milestone Oct 13, 2021
@nevill
Copy link
Contributor

nevill commented Oct 13, 2021

Hi @brzyangg I'd like to help on this issue. But I cannot really reproduce the bug after several tries.

Can you paste your full config from the output of egctl object list here ?

@brzyangg
Copy link
Contributor Author

@nevill Hi, This is my object list and log.
I tried many times this time.

MyObjectList
- cacheTimeout: 5s
  endpoints:
  - 127.0.0.1:2379
  kind: EtcdServiceRegistry
  name: etcd-service-registry
  prefix: /services/
- filters:
  - kind: RequestAdaptor
    name: request-adaptor
    path:
      trimPrefix: /gateway-controller
  - kind: Proxy
    mainPool:
      loadBalance:
        policy: ipalfishConcurrent
      serviceName: base/gateway-controller
      serviceRegistry: etcd-service-registry
    name: proxy
  flow:
  - filter: request-adaptor
    jumpIf: {}
  - filter: proxy
    jumpIf: {}
  kind: HTTPPipeline
  name: gateway-controller-pipeline
- cacheSize: 0
  certBase64: ""
  certs: {}
  globalFilter: globalFilters
  http3: false
  https: false
  keepAlive: true
  keepAliveTimeout: 60s
  keyBase64: ""
  keys: {}
  kind: HTTPServer
  maxConnections: 10240
  name: server-palfish
  port: 10081
  rules:
  - host: ""
    hostRegexp: test-(.*).palfish.com
    paths:
    - backend: gateway-controller-pipeline
      headers: []
      pathRegexp: /gateway-controller(.*)
      rewriteTarget: ""
  tracing: null
  xForwardedFor: false
MyLog
cluster-join-urls [https://127.0.0.1:12380] changed to empty because it tries to join itself
2021-10-13T19:36:18.843+08:00   INFO    server/main.go:57       Easegress release: UNKNOWN, repo: UNKNOWN, commit: UNKNOWN
2021-10-13T19:36:18.844+08:00   INFO    cluster/config.go:125   etcd config: init-cluster:member-001=https://127.0.0.1:12380 cluster-state:existing force-new-cluster:false
2021-10-13T19:36:19.394+08:00   INFO    cluster/cluster.go:675  server is ready
2021-10-13T19:36:19.394+08:00   INFO    cluster/cluster.go:389  client connect with endpoints: [https://127.0.0.1:12380]
2021-10-13T19:36:19.395+08:00   INFO    cluster/cluster.go:402  client is ready
2021-10-13T19:36:19.504+08:00   INFO    cluster/cluster.go:535  lease is ready (grant new one: 75467c79705b3005)
2021-10-13T19:36:19.504+08:00   INFO    cluster/cluster.go:213  cluster is ready
2021-10-13T19:36:19.506+08:00   INFO    supervisor/supervisor.go:137    create TrafficController
2021-10-13T19:36:19.507+08:00   INFO    supervisor/supervisor.go:137    create RawConfigTrafficController
2021-10-13T19:36:19.533+08:00   INFO    trafficcontroller/trafficcontroller.go:182      create namespace default
2021-10-13T19:36:19.533+08:00   INFO    trafficcontroller/trafficcontroller.go:190      create http server default/server-palfish
2021-10-13T19:36:19.533+08:00   INFO    supervisor/supervisor.go:137    create ServiceRegistry
2021-10-13T19:36:19.533+08:00   INFO    supervisor/supervisor.go:137    create StatusSyncController
2021-10-13T19:36:19.533+08:00   INFO    cluster/cluster.go:575  session is ready
2021-10-13T19:36:19.533+08:00   INFO    api/api.go:72   register api group admin
2021-10-13T19:36:19.533+08:00   INFO    api/server.go:78        api server running in 127.0.0.1:2382
2021-10-13T19:36:19.535+08:00   ERROR   proxy/server.go:139  get service etcd-service-registry/base/gateway-controller failed: etcd-service-registry not found
2021-10-13T19:36:19.535+08:00   WARN    serviceregistry/watcher.go:168  watch service etcd-service-registry/base/gateway-controller: list service instances failed: etcd-service-registry not found
2021-10-13T19:36:19.535+08:00   INFO    trafficcontroller/trafficcontroller.go:424      create http pipeline default/gateway-controller-pipeline
2021-10-13T19:36:19.535+08:00   INFO    supervisor/supervisor.go:182    create etcd-service-registry
2021-10-13T19:36:19.538+08:00   INFO    proxy/server.go:171  use dynamic service: etcd-service-registry/base/gateway-controller
2021-10-13T19:36:24.531+08:00   INFO    cluster/member.go:154   store clusterMembers: member-001(a30c34bf7ec77546)=https://127.0.0.1:12380
2021-10-13T19:36:24.532+08:00   INFO    cluster/member.go:155   store knownMembers  : member-001(a30c34bf7ec77546)=https://127.0.0.1:12380

@nevill
Copy link
Contributor

nevill commented Oct 13, 2021

@brzyangg I'm not sure if it's relevant, but your config looks really weird.

  1. What's the globalFilter in HTTPServer ?
  2. What's the loadbalance policy ipalfishConcurrent in Proxy ?
  3. Do you really have a etcd running at 127.0.0.1, which is listening on the same port 2379 as easegress-server ?

@brzyangg
Copy link
Contributor Author

@nevill Sorry i forgot to explain. We developed easegress.

  1. globalFilter is a BusinessController developed by me. #253
  2. ipalfishConcurrent is a strategy we have customized.
  3. I have a running etcd. My easegress-server is not listening on port 2379

I can delete these weird configurations and run it again

@brzyangg
Copy link
Contributor Author

@nevill Hi, I have completely removed those weird configurations and the error still appears.

MyLog
cluster-join-urls [https://127.0.0.1:12380] changed to empty because it tries to join itself
2021-10-13T21:10:37.91+08:00    INFO    server/main.go:57       Easegress release: UNKNOWN, repo: UNKNOWN, commit: UNKNOWN
2021-10-13T21:10:37.91+08:00    INFO    cluster/config.go:125   etcd config: init-cluster:member-001=https://127.0.0.1:12380 cluster-state:existing force-new-cluster:false
2021-10-13T21:10:38.881+08:00   INFO    cluster/cluster.go:675  server is ready
2021-10-13T21:10:38.881+08:00   INFO    cluster/cluster.go:389  client connect with endpoints: [https://127.0.0.1:12380]
2021-10-13T21:10:38.882+08:00   INFO    cluster/cluster.go:402  client is ready
2021-10-13T21:10:38.929+08:00   INFO    cluster/cluster.go:535  lease is ready (grant new one: 75467c79c6b4fd05)
2021-10-13T21:10:38.929+08:00   INFO    cluster/cluster.go:213  cluster is ready
2021-10-13T21:10:38.949+08:00   INFO    supervisor/supervisor.go:137    create TrafficController
2021-10-13T21:10:38.949+08:00   INFO    supervisor/supervisor.go:137    create RawConfigTrafficController
2021-10-13T21:10:38.95+08:00    INFO    supervisor/supervisor.go:137    create ServiceRegistry
2021-10-13T21:10:38.951+08:00   INFO    supervisor/supervisor.go:137    create StatusSyncController
2021-10-13T21:10:38.951+08:00   INFO    cluster/cluster.go:575  session is ready
2021-10-13T21:10:38.951+08:00   INFO    api/api.go:72   register api group admin
2021-10-13T21:10:38.951+08:00   INFO    api/server.go:78        api server running in 127.0.0.1:2382
2021-10-13T21:10:38.968+08:00   INFO    trafficcontroller/trafficcontroller.go:416      create namespace default
2021-10-13T21:10:38.969+08:00   ERROR   proxy/server.go:139     get service etcd-service-registry/base/gateway-controller failed: etcd-service-registry not found
2021-10-13T21:10:38.969+08:00   WARN    serviceregistry/watcher.go:168  watch service etcd-service-registry/base/gateway-controller: list service instances failed: etcd-service-registry not found
2021-10-13T21:10:38.969+08:00   INFO    trafficcontroller/trafficcontroller.go:424      create http pipeline default/gateway-controller-pipeline
2021-10-13T21:10:38.969+08:00   INFO    trafficcontroller/trafficcontroller.go:190      create http server default/server-palfish
2021-10-13T21:10:38.969+08:00   INFO    supervisor/supervisor.go:182    create etcd-service-registry
2021-10-13T21:10:38.971+08:00   INFO    proxy/server.go:171     use dynamic service: etcd-service-registry/base/gateway-controller
2021-10-13T21:10:43.956+08:00   INFO    cluster/member.go:154   store clusterMembers: member-001(a30c34bf7ec77546)=https://127.0.0.1:12380
2021-10-13T21:10:43.956+08:00   INFO    cluster/member.go:155   store knownMembers  : member-001(a30c34bf7ec77546)=https://127.0.0.1:12380
MyObjectList
- cacheTimeout: 5s
  endpoints:
  - 127.0.0.1:2379
  kind: EtcdServiceRegistry
  name: etcd-service-registry
  prefix: /services/
- filters:
  - kind: RequestAdaptor
    name: request-adaptor
    path:
      trimPrefix: /gateway-controller
  - kind: Proxy
    mainPool:
      loadBalance:
        policy: roundRobin
      serviceName: base/gateway-controller
      serviceRegistry: etcd-service-registry
    name: proxy
  flow:
  - filter: request-adaptor
    jumpIf: {}
  - filter: proxy
    jumpIf: {}
  kind: HTTPPipeline
  name: gateway-controller-pipeline
- cacheSize: 0
  certBase64: ""
  certs: {}
  http3: false
  https: false
  keepAlive: true
  keepAliveTimeout: 60s
  keyBase64: ""
  keys: {}
  kind: HTTPServer
  maxConnections: 10240
  name: server-palfish
  port: 10081
  rules:
  - host: ""
    hostRegexp: sea-(.*).test.com
    paths:
    - backend: gateway-controller-pipeline
      headers: []
      pathRegexp: /gateway-controller(.*)
      rewriteTarget: ""
  tracing: null
  xForwardedFor: false

@nevill
Copy link
Contributor

nevill commented Oct 13, 2021

I still cannot reproduce the bug, but I will dig into the source code later.

Since you're developing a new feature, can you switch to themain branch without any changes and make build_server there, to make sure the bug is still there?

@nevill
Copy link
Contributor

nevill commented Oct 15, 2021

Finally, I can reproduce the bug. But I doubt that if this is a bug.

I think it should be designed as a declarative system, like Kubernetes. Which means part of the system can be lost at any time. But the system will wait and try to recover itself later based on the Spec definition.

Let's go back to easegress and think about replacing a registry as an example. Basically, we need to add a new one and remove the old one in the process. ObjectEntityWatcher will make all the things in the end, even this etcd-service-registry not found error still can happen during the process.

@brzyangg
Copy link
Contributor Author

Yes, this may not be a bug, but there is an ERROR log during the startup process, which will always make people feel something abnormal.

So I want to discuss whether the BusinessController can execute in a certain order at startup.

@localvar
Copy link
Collaborator

hi @nevill and @brzyangg , I agree with you that Easegress should not rely on a certain creation order of BusinessControllers at startup.

So I think we replace the error with a warning and make the message more clear is fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants