[BEAM-4775] Converting MonitoringInfos to MetricResults in PortableRunner #9843
Conversation
Force-pushed from 5440307 to 67e73d3.
Run Python PreCommit
We'll have to rely on r: @Ardagan as Alex is not working on this anymore.
    monitoring_infos._is_user_distribution_monitoring_info(x)
]

return beam_job_api_pb2.GetJobMetricsResponse(
FYI, here is some background. I was not aware that MetricResult existed in proto form now. Creating a proto was suggested but not pursued at the time.
https://s.apache.org/get-metrics-api
When I last worked here, MetricResult did not have a proto format. The plan was to use MonitoringInfos as the language-agnostic format, and MetricResult would then be a language-specific format (Python, Java, Go, etc.). Each runner should provide a way to return MonitoringInfos, and each language would have a library to convert MonitoringInfos to MetricResult protos.
It seems like you might be using a different approach: using MetricResult protos as the language-agnostic solution.
It's hard for me to review; I don't think I am up to date on the current plan/usage of these protos.
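As a rough illustration of the plan described above (the helper names below are hypothetical, not the actual Beam API), a per-language conversion library might look like:

```python
# Hypothetical sketch: the runner hands back MonitoringInfos, and a
# per-language library turns them into language-specific MetricResults.
# `extract_value` and the returned tuples are placeholders, not Beam's API.
def monitoring_infos_to_metric_results(monitoring_info_list, extract_value):
  results = []
  for mi in monitoring_info_list:
    labels = dict(mi.labels)
    key = (labels.get('NAMESPACE', ''),
           labels.get('NAME', ''),
           labels.get('PTRANSFORM', ''))
    results.append((key, extract_value(mi)))
  return results
```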
Args:
  monitoring_info_list: An iterable of MonitoringInfo objects.
  user_metrics_only: If true, includes user metrics only.
Please add a Returns section to the pydoc comment.
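A hedged sketch of what the requested docstring could look like (the function name and the Returns wording are assumptions about the function's actual shape):

```python
def from_monitoring_infos(monitoring_info_list, user_metrics_only=False):
  """Groups MonitoringInfo objects into metric results.

  Args:
    monitoring_info_list: An iterable of MonitoringInfo objects.
    user_metrics_only: If true, includes user metrics only.

  Returns:
    Counters, distributions and gauges, each keyed by MetricKey
    (assumed shape; adjust to what the function really returns).
  """
```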
+1
@staticmethod
def _create_metric_key(monitoring_info):
  step_name = ParseMonitoringInfoMixin._get_step_name(monitoring_info)
_get_step_name seems like it should live in monitoring_infos.py.
@staticmethod
def _get_step_name(monitoring_info):
  keys_to_check = [monitoring_infos.PTRANSFORM_LABEL,
This doesn't seem correct. Why would you consider the step name to be under labels other than PTRANSFORM_LABEL?
See the MonitoringInfo specs defined here:
https://github.com/apache/beam/blob/d4afbabf38a3ab557625c9c091ed5f06ca6731ce/model/pipeline/src/main/proto/metrics.proto
PTRANSFORM_LABEL is the only one used for this purpose.
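A minimal sketch of the suggested approach, reading the step name from PTRANSFORM_LABEL only (the helper name is illustrative):

```python
from apache_beam.metrics import monitoring_infos

def get_step_name(monitoring_info):
  # Only the PTRANSFORM label identifies a step; PCollection-scoped metrics
  # (e.g. element counts) simply have no step name here.
  return monitoring_info.labels.get(monitoring_infos.PTRANSFORM_LABEL)
```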
I'm running a simple test (sdks/python/apache_beam/testing/load_tests/pardo_test.py) on FlinkRunner and I see a lot of metrics with a PCOLLECTION label. They have different PCOLLECTION label values (u'ref_PCollection_PCollection_16', u'ref_PCollection_PCollection_10', and so on), but their urn is the same: beam:metric:element_count:v1. Shouldn't we include all of these metrics in the final result? If I didn't consider the step name to be under labels other than PTRANSFORM_LABEL, their MetricKey would be like ('', MetricName('beam', 'metric:element_count:v1')). In that case, some of the metrics would be lost, because their MetricKeys would be the same.
No, the intention is that URN + labels defines the metric instance.
As an analogy, think of the URN as the class of the metric; URN + labels then defines the object instance.
I don't quite remember exactly what MetricKey contained, but generally our collection objects for metrics need to account for URN + labels to correctly identify the metric instance.
MetricResult was originally designed just for user metrics, which did not have labels, only a name and namespace. The labels concept was introduced to MonitoringInfos later; the name and namespace were then reworked to be labels.
See the user metric MonitoringInfoSpec, which defines what should be populated on a MonitoringInfo:
required_labels: ["PTRANSFORM", "NAMESPACE", "NAME"],
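Illustrative only, not the actual MetricKey implementation: treating URN plus labels as the identity of a metric instance keeps two element_count MonitoringInfos with different PCOLLECTION labels distinct.

```python
def metric_instance_id(monitoring_info):
  # The URN picks the "class" of metric; the full label set picks the instance.
  return (monitoring_info.urn,
          tuple(sorted(monitoring_info.labels.items())))
```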
In the Dataflow runner we have to translate SDK PTransform and PCollection names back to user-defined names. I'm still checking the code to see if that is required here as well.
Some additional background:
The ElementCount metric is defined only by a PCollection name. If you need to get a step name from it, you need a mapping from PCollections to step names; a PCollection name is not correct to treat as a step name.
As Alex mentioned, each metric is defined by a URN + a list of labels. Each URN can have a different set of labels that uniquely identify the metric, so for now we need to treat each URN differently. For example, the same PCollection can technically be relevant to different step names, and choosing a specific step name might not be trivial.
I see your point now.
I think it might be too difficult (if possible at all) to implement such a mapping right here. Since this is a portable runner, such logic would have to be aware of every runner's implementation details.
I'm going to stay with checking PTRANSFORM_LABEL only for now.
If you want to aggregate away the other labels, you could take this approach and sum everything that has the same PTRANSFORM label.
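A hedged sketch of that aggregation (`counter_value` is a hypothetical accessor for the decoded payload, not a Beam function):

```python
from collections import defaultdict

from apache_beam.metrics import monitoring_infos

def sum_by_ptransform(monitoring_info_list, counter_value):
  # Collapse every label except PTRANSFORM by summing values per transform.
  totals = defaultdict(int)
  for mi in monitoring_info_list:
    step = mi.labels.get(monitoring_infos.PTRANSFORM_LABEL, '')
    totals[step] += counter_value(mi)
  return dict(totals)
```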
@@ -361,16 +363,31 @@ def add_runner_options(parser):
       state_stream, cleanup_callbacks)


-class PortableMetrics(metrics.metric.MetricResults):
+class PortableMetrics(metric.MetricResults,
+                      portable_metrics.ParseMonitoringInfoMixin):
Please remove the use of double inheritance. You can use the methods from portable_metrics.ParseMonitoringInfoMixin by having that file define the methods at module level instead of in a class, since it only defines static methods.
from apache_beam.metrics.metric import MetricName


class ParseMonitoringInfoMixin(object):
Please remove this class and instead define the methods at module level, without a class. All the methods here are static, so there is no need for a class/object instance.
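A sketch of the suggested layout, with the helpers defined at module level in portable_metrics.py rather than on a mixin class (the NAMESPACE/NAME lookups are illustrative, not necessarily how Beam extracts them):

```python
# portable_metrics.py
from apache_beam.metrics import monitoring_infos
from apache_beam.metrics.execution import MetricKey
from apache_beam.metrics.metric import MetricName


def _get_step_name(monitoring_info):
  return monitoring_info.labels.get(monitoring_infos.PTRANSFORM_LABEL, '')


def _create_metric_key(monitoring_info):
  labels = monitoring_info.labels
  return MetricKey(
      _get_step_name(monitoring_info),
      MetricName(labels.get('NAMESPACE', ''), labels.get('NAME', '')))


# Callers such as PortableMetrics then import this module and call
# portable_metrics._create_metric_key(mi); no second base class is needed.
```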
    bq_table=self.metrics_namespace,
    bq_dataset=self.metrics_dataset,
)
self.metrics_monitor = MetricsReader(
I am not familiar with MetricsReader.
We use MetricsReader to push metrics obtained from PipelineResult to BigQuery in load tests. This diff is just a fix to prevent system metrics from being published, since we don't need them in BigQuery.
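Not MetricsReader's actual API, just an illustration of the fix's intent: keep system metrics out of the BigQuery publish step by retaining only results in the user-defined metrics namespace.

```python
def user_metric_results_only(metric_results, metrics_namespace):
  # Drop runner/system metrics; the load tests only publish user metrics.
  return [r for r in metric_results
          if r.key.metric.namespace == metrics_namespace]
```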
Happy to keep reviewing, but please have Ardagan review as well.
Thanks for your review so far, I've pushed some fixes.
Run Python PreCommit
def _create_metric_key(monitoring_info):
  step_name = monitoring_infos.get_step_name(monitoring_info)
  if not step_name:
    raise ValueError("Step name is empty")
Dump the monitoring info into the error message; otherwise this error is not descriptive. Ideally, change the text to state what you expect here, for example: "Monitoring info should contain XXX field. Monitoring info: {}", or in this case: "Failed to deduce step_name from MonitoringInfo: {}".
"Failed to deduce step_name..." looks good.
try:
  key = _create_metric_key(mi)
except ValueError:
  # Skip MonitoringInfos for which a metric key cannot be created.
  continue
Please have @Ardagan approve as well. I think you've addressed my concerns.
Force-pushed from 092e32f to cbeda3e.
FnApiRunner and PortableRunner share the same set of unit tests. Tests that check metrics were, however, disabled in the PortableRunner test suites. This commit removes that limitation.
Using the committed property could have been a source of errors if the runner doesn't support committed metrics.
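For example, a test reading metrics portably would use the attempted values rather than committed ones (the counter name is a placeholder, and `result` is assumed to be a PipelineResult from a portable runner):

```python
from apache_beam.metrics.metric import MetricsFilter

# Given a PipelineResult `result` from a portable runner:
counters = result.metrics().query(
    MetricsFilter().with_name('my_counter'))['counters']
total = sum(c.attempted for c in counters)  # committed may be unsupported
```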
Force-pushed from cbeda3e to f53e47a.
run python precommit
Need to get tests green before merging. LGTM overall.
There are no more comments, so I'll merge the PR.
Thanks!
This PR enables users to retrieve metrics from portable runners via the query() call.
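A minimal end-to-end usage sketch of what this enables (the namespace, counter name, and runner options are placeholders):

```python
import apache_beam as beam
from apache_beam.metrics.metric import Metrics, MetricsFilter
from apache_beam.options.pipeline_options import PipelineOptions

def count_it(element):
  Metrics.counter('my_namespace', 'my_counter').inc()
  return element

# e.g. pass --runner=PortableRunner --job_endpoint=localhost:8099
options = PipelineOptions()
p = beam.Pipeline(options=options)
_ = p | beam.Create([1, 2, 3]) | beam.Map(count_it)
result = p.run()
result.wait_until_finish()

# Query the user counter back from the runner via MonitoringInfos.
query_result = result.metrics().query(
    MetricsFilter().with_namespace('my_namespace').with_name('my_counter'))
for counter in query_result['counters']:
  print(counter.key.step, counter.key.metric.name, counter.attempted)
```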
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
- Choose reviewer(s) and mention them in a comment (R: @username).
- Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.

See the Contributor Guide for more tips on how to make the review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.