ENV 'FLT_METRICS_LIST' and timestamps looks incorrectly parsed #160
Comments
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /lifecycle stale |
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /lifecycle rotten |
/remove-lifecycle rotten |
Rotten issues close after 30d of inactivity. /close |
@sesheta: Closing this issue. In response to this:
Hello!
I have looked into how pad handles its configuration. I deployed pad both as a container (podman) and in OpenShift, using different methods of setting the environment variables, and would like to share my two main concerns.
Concern 1 affects both deployment models, and Concern 2 affects only OpenShift. I think this is a bug, but if it is intended behavior, please clarify. I hope this helps to improve the tool.
Here are the ConfigMap test samples which I used for the pad Deployment
Or with two metrics using a regex
Pad works correctly only with values like these; it breaks without any warning in the following cases:
1. Quotes in timestamp variables
For `FLT_ROLLING_TRAINING_WINDOW_SIZE` we can fill in `'7d'`, `'7'`, or just `7`. In this variable the value must always carry the `d` suffix, which is a little confusing because another variable takes a number without a suffix (e.g. `FLT_RETRAINING_INTERVAL_MINUTES: '30'`).
Otherwise it leads to an incorrect rolling training window size (for example, 4096 days instead of 7). The wrong value is shown in the pad logs, but no error is reported and the web server simply does not start (503). Running `curl localhost:8080` inside the container returns `connection refused`. I also tested it without host networking, using the standard bridge mode with port mapping, and got the same results.
So I think these values should be processed in a more consistent way, and perhaps the Python script should validate them.
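To illustrate the kind of validation suggested above, here is a minimal sketch. It is not pad's actual code: the helper name `parse_window_size_days`, the decision to tolerate surrounding quotes, and the rule of accepting only a positive integer followed by `d` are all assumptions made for this example.

```python
import re


def parse_window_size_days(raw: str) -> int:
    """Hypothetical validator: accept only '<positive integer>d', e.g. '7d'.

    Surrounding whitespace and quote characters are stripped first
    (an assumption, so that a quoted value such as "'7d'" is tolerated).
    Anything else raises ValueError instead of being silently misparsed.
    """
    value = raw.strip().strip("'\"").strip()
    match = re.fullmatch(r"(\d+)d", value)
    if match is None or int(match.group(1)) == 0:
        raise ValueError(
            "FLT_ROLLING_TRAINING_WINDOW_SIZE must look like '7d', "
            f"got {raw!r}"
        )
    return int(match.group(1))
```

Failing fast at startup with a clear message would make a bare `7` visible in the logs as an error, rather than as an unexpected window size.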
2. Incorrect Parsing of value from FLT_METRICS_LIST
If we use YAML folded style with quotes like this, pad just starts the web server and does not make any calculations, so we cannot see any `*_prophet` metrics from the web server endpoint.
logs
---> Running application from Python script (app.py) ...
2021-09-03 11:10:34,461:INFO:configuration: Metric data rolling training window size: 6 days, 23:59:59.895134
2021-09-03 11:10:34,461:INFO:configuration: Model retraining interval: 30 minutes
2021-09-03 11:10:34,548:ERROR:prophet.plot: Importing plotly failed. Interactive plots will not work.
2021-09-03 11:10:34,565:INFO:__main__: Training models using ProcessPool of size:1
2021-09-03 11:10:34,583:INFO:__main__: Initializing Tornado Web App
2021-09-03 11:10:34,594:INFO:__main__: Will retrain model every 30 minutes
It stays in this state indefinitely, without any errors or further output.
WebServer Output
Pad works correctly only when we do not use YAML folded style with quotes
logs
I think pad needs to handle the value of this variable and strip the quotes if it cannot process them, or else change the logic for starting the web server or for building the metrics list.
See the output from the pad container showing how it stores the value of the variable:
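As a rough sketch of the quote handling proposed here, the helper below cleans up a raw `FLT_METRICS_LIST` value before it is used. It is an illustration only, not pad's implementation: the name `normalize_metrics_list` is hypothetical, and the `;` separator between metrics is an assumption about how the list is delimited.

```python
from typing import List


def normalize_metrics_list(raw: str, sep: str = ";") -> List[str]:
    """Hypothetical cleanup for a FLT_METRICS_LIST value.

    Splits on `sep` (assumed to be ';'), trims whitespace and the trailing
    newline that YAML folded style adds, and removes one pair of matching
    quotes wrapped around each entry. Quotes inside a selector, such as
    up{job="a"}, are left alone. A quote pair wrapped around the whole
    multi-entry value is not handled by this sketch.
    """
    metrics = []
    for part in raw.split(sep):
        item = part.strip()
        # Remove only a matching pair of outer quotes around this entry.
        if len(item) >= 2 and item[0] == item[-1] and item[0] in "'\"":
            item = item[1:-1].strip()
        if item:
            metrics.append(item)
    return metrics


if __name__ == "__main__":
    print(normalize_metrics_list("'up{job=\"a\"}'\n"))
```

If the resulting list is empty, refusing to start with an explicit error would be easier to diagnose than a web server that runs but never publishes `*_prophet` metrics.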
(Working OK)
(No prophet calculations from Tornado, only the initial Python data, but no errors in the log)
Here is my testing Deployment in OpenShift