
Follower server redownloads segments upon server restart due to CRC change #11004

Open
itschrispeck opened this issue Jun 29, 2023 · 4 comments


@itschrispeck
Collaborator

When restarting servers in our clusters running on 0.12+, we observe numerous Helix pending messages due to 'Failed to load LLC segment' errors. This is caused by a CRC mismatch between ZK and the restarting server.

Initial investigation showed that the 'leader' server commits and updates ZK, while the 'follower' server catches up and builds the segment locally with a different CRC. When a server is restarted, all the segments that it 'followed' fail to load and are redownloaded from our deep store. We've confirmed that startOffset/endOffset match, and the difference between the two segments lies in the columns.psf file.

Logs for a segment:

2023-06-29T13:45:25-07:00 [host] Adding segment: <segment_name> to table: <table_name>
2023-06-29T13:45:25-07:00 [host] Segment: <segment_name> of table: <table_name> has crc change from: 3828021013 to: 1125725625
2023-06-29T13:45:25-07:00 [host] Failed to load LLC segment: <segment_name>, downloading a new copy
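
For anyone reading along, here is a minimal sketch of the kind of restart-time check described above; the class, method, and variable names are hypothetical, not Pinot's actual API. The server compares the CRC it recorded locally when it built the segment against the CRC the committing replica wrote to ZK, and any mismatch forces a fresh download from deep store.

// Hypothetical sketch of the restart-time CRC check, not Pinot's actual loader code.
public class CrcCheckSketch {

  /** Returns true if the locally built segment can be reused, false if it must be re-downloaded. */
  static boolean canReuseLocalSegment(String segmentName, long localCrc, long zkCrc) {
    if (localCrc != zkCrc) {
      // The follower built the same rows but ended up with different bytes on disk,
      // so its CRC no longer matches what the committing (leader) replica stored in ZK.
      System.out.printf("Segment: %s has crc change from: %d to: %d%n", segmentName, localCrc, zkCrc);
      return false; // the caller would then download a fresh copy from deep store
    }
    return true;
  }

  public static void main(String[] args) {
    // CRC values taken from the log lines above.
    canReuseLocalSegment("<segment_name>", 3828021013L, 1125725625L);
  }
}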

This behavior isn't seen on our clusters running on a 0.11 base. Is it possible some non-determinism was introduced in the segment build process?

@chenboat
Contributor

@Jackie-Jiang @mayankshriv

@Jackie-Jiang
Contributor

I think this issue also exists in 0.11. Here are 2 related fixes, which will be picked up in the next release: #10231 #10468

@itschrispeck
Collaborator Author

itschrispeck commented Jul 17, 2023

You're right; I think it's a coincidence that the clusters we saw the issue with are running a newer version of Pinot. We've jumped ahead to a commit that includes both fixes, and we're still seeing the behavior.

It looks like the Lucene index may also not be created deterministically:

Binary files replica_1/json_data.lucene.index/_0.cfe and replica_2/json_data.lucene.index/_0.cfe differ
Binary files replica_1/json_data.lucene.index/_0.cfs and replica_2/json_data.lucene.index/_0.cfs differ
Binary files replica_1/json_data.lucene.index/_0.si and replica_2/json_data.lucene.index/_0.si differ
Binary files replica_1/json_data.lucene.index/segments_1 and replica_2/json_data.lucene.index/segments_1 differ
Binary files replica_1/message_dictionaryVars.lucene.index/_0.cfe and replica_2/message_dictionaryVars.lucene.index/_0.cfe differ
Binary files replica_1/message_dictionaryVars.lucene.index/_0.cfs and replica_2/message_dictionaryVars.lucene.index/_0.cfs differ
Binary files replica_1/message_dictionaryVars.lucene.index/_0.si and replica_2/message_dictionaryVars.lucene.index/_0.si differ
Binary files replica_1/message_dictionaryVars.lucene.index/segments_1 and replica_2/message_dictionaryVars.lucene.index/segments_1 differ
Binary files replica_1/message_logtype.lucene.index/_0.cfe and replica_2/message_logtype.lucene.index/_0.cfe differ
Binary files replica_1/message_logtype.lucene.index/_0.cfs and replica_2/message_logtype.lucene.index/_0.cfs differ
Binary files replica_1/message_logtype.lucene.index/_0.si and replica_2/message_logtype.lucene.index/_0.si differ
Binary files replica_1/message_logtype.lucene.index/segments_1 and replica_2/message_logtype.lucene.index/segments_1 differ

And I think this issue would also show up if we use certain transform functions (e.g. storing ingestion time via now()).
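
To make that last point concrete, a tiny illustration (not Pinot code): a transform such as now() is evaluated independently on each replica at ingestion time, so replicas persist different values for the derived column and produce different CRCs even for identical input rows.

// Illustrative only: two replicas evaluating now() for the same row at slightly different
// wall-clock times store different values, so their segments are not byte-identical.
public class NowTransformSketch {
  public static void main(String[] args) throws InterruptedException {
    long ingestTimeReplica1 = System.currentTimeMillis(); // replica_1 evaluates now()
    Thread.sleep(5);                                      // replicas never run in lockstep
    long ingestTimeReplica2 = System.currentTimeMillis(); // replica_2 evaluates now()

    // Same input row, different derived column value on each replica.
    System.out.println("replica_1 ingestTime=" + ingestTimeReplica1);
    System.out.println("replica_2 ingestTime=" + ingestTimeReplica2);
  }
}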

What necessitates computing the CRC for all segments on each restart? Assuming no data corruption has happened, it seems that any data difference was already being 'served' as valid data. How do others feel about adding a config to skip the CRC check?
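
For concreteness, a rough sketch of what such an opt-out could look like; the property name and surrounding code are invented for illustration and do not correspond to an existing Pinot config.

// Hypothetical opt-out flag for the restart-time CRC check; the key name is made up.
public class SkipCrcCheckSketch {
  // e.g. -Dskip.crc.check.on.restart=true (hypothetical property, not a real Pinot config)
  static final boolean SKIP_CRC_CHECK =
      Boolean.parseBoolean(System.getProperty("skip.crc.check.on.restart", "false"));

  static boolean shouldRedownload(long localCrc, long zkCrc) {
    if (SKIP_CRC_CHECK) {
      // Trust the locally built segment: its data was already being served as valid.
      return false;
    }
    return localCrc != zkCrc;
  }

  public static void main(String[] args) {
    System.out.println("redownload? " + shouldRedownload(3828021013L, 1125725625L));
  }
}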

@Jackie-Jiang
Contributor

Good point on the non-deterministic index and functions. We should consider always downloading from deep store for non-committing servers, or fetching the CRC from ZK and setting it into the local segment (this is kind of hacky).
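
For the second option, a minimal sketch with hypothetical names (this is not the actual Pinot loader code, and the file layout and "segment.crc" key are invented): after the follower finishes catching up, it overwrites the CRC recorded in its local segment metadata with the value the committing server wrote to ZK, so the comparison on the next restart passes.

// Hypothetical sketch of "fetch CRC from ZK and set it into the local segment".
// The metadata file layout and the "segment.crc" key are illustrative, not Pinot's actual format.
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class AdoptZkCrcSketch {

  /** Rewrites the CRC in a local metadata properties file with the value fetched from ZK. */
  static void adoptZkCrc(Path localMetadataFile, long zkCrc) throws IOException {
    Properties props = new Properties();
    try (InputStream in = Files.newInputStream(localMetadataFile)) {
      props.load(in);
    }
    props.setProperty("segment.crc", Long.toString(zkCrc)); // hypothetical key
    try (OutputStream out = Files.newOutputStream(localMetadataFile)) {
      props.store(out, "CRC adopted from ZK after catch-up (sketch)");
    }
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("creation", ".meta.properties");
    adoptZkCrc(tmp, 1125725625L); // CRC value from ZK, taken from the logs above
    System.out.println(Files.readString(tmp));
  }
}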
