ClickHouse - Materialized view contains duplicates on the grouping key #26301

Closed
StarpTech opened this issue Aug 30, 2023 · 9 comments
Labels
exporter/clickhouse question Further information is requested

Comments

@StarpTech
Contributor

StarpTech commented Aug 30, 2023

Component(s)

exporter/clickhouse

What happened?

Description

Hello, we experience duplicates in the otel_traces_trace_id_ts_mv when ingesting spans from multiple services. This looks like a race condition or limitation of how the schema was designed. We work around this by querying the otel_traces_trace_id_ts table directly.

image

According to the ClickHouse blog post https://clickhouse.com/blog/storing-traces-and-spans-open-telemetry-in-clickhouse, the value of the materialized view is very limited. I recommend removing it to avoid this problem. I also noticed that the README examples don't use the materialized view at all.

Steps to Reproduce

Expected Result

No duplicates in otel_traces_trace_id_ts_mv, because the idea is to use it as an overview of root spans.

Actual Result

Duplicates. See image above.

Collector version

0.81.0

Environment information

Environment

OS: Pop!_OS 22.04 LTS
Compiler (if manually compiled): 1.2.0

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

@StarpTech StarpTech added bug Something isn't working needs triage New item requiring triage labels Aug 30, 2023
@github-actions
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@hanjm
Member

hanjm commented Aug 30, 2023

The duplicate rows are expected: the view inserts rows each time a batch of spans is written to ClickHouse, and a TraceId may appear in more than one batch. That is why we query with min(Start) and max(End) + 1 as the time index for a TraceId.

The mv is highly effective when the span count grows large, especially beyond the billion level. #13442 (comment)

If looking up a TraceId is already fast on a small dataset without this time-index mv, you can just delete it.
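
The per-batch behaviour described above can be reproduced with a small sketch. The table and view names below (`demo_spans`, `demo_trace_id_ts`, `demo_trace_id_ts_mv`) and their columns are hypothetical stand-ins, not the exporter's actual schema. The sketch assumes each INSERT statement is processed as its own block.

```clickhouse
-- Hypothetical demo tables; names and columns are illustrative only.
CREATE TABLE demo_spans
(
    TraceId   String,
    Timestamp DateTime64(9)
)
ENGINE = MergeTree
ORDER BY (TraceId, Timestamp);

CREATE TABLE demo_trace_id_ts
(
    TraceId String,
    Start   DateTime64(9),
    End     DateTime64(9)
)
ENGINE = MergeTree
ORDER BY (TraceId, Start);

-- The view runs its SELECT on each inserted block, not on the whole source table.
CREATE MATERIALIZED VIEW demo_trace_id_ts_mv TO demo_trace_id_ts AS
SELECT
    TraceId,
    min(Timestamp) AS Start,
    max(Timestamp) AS End
FROM demo_spans
WHERE TraceId != ''
GROUP BY TraceId;

-- Two separate inserts (two batches) carrying spans of the same trace.
INSERT INTO demo_spans VALUES ('trace-a', '2023-08-30 10:00:00.000'), ('trace-a', '2023-08-30 10:00:01.000');
INSERT INTO demo_spans VALUES ('trace-a', '2023-08-30 10:00:02.000');

-- Under the assumption above, 'trace-a' appears once per insert.
SELECT * FROM demo_trace_id_ts ORDER BY Start;
```

Because the destination table in this sketch is a plain MergeTree, nothing merges the per-batch rows afterwards, so they have to be combined at query time.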

@StarpTech
Contributor Author

@hanjm thanks for clarifying. Does that mean that, in order to get correct results, I'd have to GROUP BY again on the otel_traces_trace_id_ts_mv table?

SELECT
	TraceId,
	min(Timestamp) as Start,
	max(Timestamp) as End
FROM
otel_traces_trace_id_ts_mv
WHERE TraceId !=''
GROUP BY TraceId;

@hanjm
Member

hanjm commented Aug 30, 2023

@StarpTech Yes, as in the README.md example:

```clickhouse
WITH
'391dae938234560b16bb63f51501cb6f' as trace_id,
(SELECT min(Start) FROM otel_traces_trace_id_ts WHERE TraceId = trace_id) as start,
(SELECT max(End) + 1 FROM otel_traces_trace_id_ts WHERE TraceId = trace_id) as end
SELECT Timestamp as log_time,
TraceId,
```

@StarpTech
Contributor Author

@hanjm I mean when making use of the mv.

@Frapschen Frapschen added question Further information is requested and removed bug Something isn't working needs triage New item requiring triage labels Aug 31, 2023
@hanjm
Member

hanjm commented Aug 31, 2023

@StarpTech No, this mv is just an insert trigger. You can select from the destination table otel_traces_trace_id_ts, but you cannot query otel_traces_trace_id_ts_mv directly. https://clickhouse.com/blog/using-materialized-views-in-clickhouse
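
As a rough sketch of what reading from the destination table could look like when one row per trace is wanted, the per-batch rows can be collapsed at query time. The column names `Start` and `End` are taken from the README query quoted earlier in this thread. The aliases `TraceStart` and `TraceEnd` are made up here to avoid clashing with those column names.

```clickhouse
-- Collapse the per-batch rows into one row per trace at query time.
SELECT
    TraceId,
    min(Start) AS TraceStart,
    max(End)   AS TraceEnd
FROM otel_traces_trace_id_ts
WHERE TraceId != ''
GROUP BY TraceId;
```

This is the same min/max idea the README query applies to a single TraceId, extended to all traces.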

@StarpTech
Contributor Author

@hanjm Thank you, that makes sense.

@StarpTech
Contributor Author

StarpTech commented Aug 31, 2023

I think this is not 100% correct, and I now know where my confusion came from. In ClickHouse, you can also use an mv differently. See https://altinity.com/blog/clickhouse-materialized-views-illuminated-part-1, where the mv is queried directly because it creates a hidden table.

@hanjm
Member

hanjm commented Sep 1, 2023

The difference is whether a TO table is specified when creating the mv. You can see the best practices from Altinity: https://kb.altinity.com/altinity-kb-schema-design/materialized-views/#best-practices
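
To make the two styles concrete, here is a minimal sketch of both forms. All names (`demo_events`, `demo_totals`, `demo_totals_mv`, `demo_totals_inner_mv`) are hypothetical and unrelated to the exporter's schema. The engine choice is only for illustration.

```clickhouse
-- Hypothetical source table for illustration.
CREATE TABLE demo_events
(
    Key   String,
    Value UInt64
)
ENGINE = MergeTree
ORDER BY Key;

-- Form 1: explicit destination table via TO.
-- Rows produced by the view land in demo_totals, which is the table you query.
CREATE TABLE demo_totals
(
    Key   String,
    Total UInt64
)
ENGINE = SummingMergeTree
ORDER BY Key;

CREATE MATERIALIZED VIEW demo_totals_mv TO demo_totals AS
SELECT Key, sum(Value) AS Total
FROM demo_events
GROUP BY Key;

-- Form 2: no TO clause.
-- ClickHouse creates a hidden inner table with the given engine,
-- and the view name itself is what you query.
CREATE MATERIALIZED VIEW demo_totals_inner_mv
ENGINE = SummingMergeTree
ORDER BY Key
AS
SELECT Key, sum(Value) AS Total
FROM demo_events
GROUP BY Key;

-- In both forms, aggregate again at read time, since rows are written per insert block.
SELECT Key, sum(Total) AS GrandTotal FROM demo_totals GROUP BY Key;
SELECT Key, sum(Total) AS GrandTotal FROM demo_totals_inner_mv GROUP BY Key;
```

With the TO form the destination table is visible and managed separately from the view. This appears to be the setup used for otel_traces_trace_id_ts in this thread.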
