PGSync version: 3.2.0
Postgres version: 16.4.0
Elasticsearch/OpenSearch version: 7.16.3
Redis version: 7.4.0
Python version: 3.9

Problem Description:

Maybe related to #481, #552, #569.

First, the schema:

We are experiencing a strange issue when we bulk-insert data into a table that is the child in a one-to-one parent-child relation. Assume the parent table "document" already holds 20,000 documents, indexed in Elasticsearch via pgsync. When we insert 20,000 rows into the empty child table "content" (committing between each insert), only 20 rows (the "content" field) get synced to Elasticsearch. With REDIS_READ_CHUNK_SIZE: "1" everything works correctly (all 20,000 content inserts are reflected in Elasticsearch), but the run takes about 10 minutes. With REDIS_READ_CHUNK_SIZE: "10" we get around 2,000 correct syncs. So it seems that most of each Redis read chunk is dropped on every sync. We have tried every combination of the relevant env vars (Elasticsearch bulk streaming, different chunk sizes for Elasticsearch and Redis, and so on). As far as we can tell, pgsync processes only one record from each Redis read chunk. A sketch of our insert loop follows below.

Here is the log with the default REDIS_READ_CHUNK_SIZE (you can clearly see that only a few updates reach Elasticsearch per chunk; in the end, only 20 documents have their content correctly synced):

Here is the same run with REDIS_READ_CHUNK_SIZE set to 1 (you can see that everything eventually gets synced; in the end, all 20,000 documents have their content synced):

Edit: I narrowed this behavior down from 20,000 inserts to just two consecutive inserts into the content table: only one of them gets processed. So it seems that all but one record of the Redis read chunk is discarded when the insert/update touches a child relationship. Inserts and updates to the parent "document" table, on the other hand, work as expected.

Here is a debug log where only one of the two inserts gets synced:
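For reference, here is roughly how we trigger the problem (a minimal sketch; the connection string and the document_id/content column names are simplified placeholders for our actual setup):

```python
# Minimal sketch of the insert loop that triggers the problem.
# Connection parameters and column names are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres host=localhost")
cur = conn.cursor()

# The parent table "document" already holds 20,000 rows and is indexed
# by pgsync. We then fill the empty child table "content", committing
# between each insert. With the default REDIS_READ_CHUNK_SIZE only 20
# of these rows reach Elasticsearch; with REDIS_READ_CHUNK_SIZE=1 set
# on the pgsync process, all of them do (but the run takes ~10 minutes).
for doc_id in range(1, 20001):
    cur.execute(
        "INSERT INTO content (document_id, content) VALUES (%s, %s)",
        (doc_id, f"content for document {doc_id}"),
    )
    conn.commit()  # one commit per insert, as in our application

cur.close()
conn.close()
```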
From the logs included, it seems things worked as expected up until `DEBUG:pgsync.sync: tg_op: INSERT table: public.content`.
My guess (until I see your schema) is that your relationship with a manual foreign_key config might not be set up correctly.
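For reference, a one-to-one child node with the foreign key spelled out explicitly would look roughly like this, written here as the Python structure you would json.dump into schema.json. The database, index, table, and column names are only guesses from your description:

```python
# Sketch of a pgsync schema for the document/content relationship with
# an explicit foreign_key block. All names below are placeholders.
import json

schema = [
    {
        "database": "mydb",      # placeholder
        "index": "document",     # placeholder
        "nodes": {
            "table": "document",
            "schema": "public",
            "columns": ["id", "title"],
            "children": [
                {
                    "table": "content",
                    "schema": "public",
                    "columns": ["content"],
                    "relationship": {
                        "variant": "object",
                        "type": "one_to_one",
                        # pgsync can usually infer this from the DB
                        # schema, but it can also be pinned manually:
                        "foreign_key": {
                            "child": ["document_id"],
                            "parent": ["id"],
                        },
                    },
                }
            ],
        },
    }
]

with open("schema.json", "w") as f:
    json.dump(schema, f, indent=2)
```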
I tried with a completely fresh database and no manual foreign key setup (it's already defined by the DB schema, and pgsync seems to detect it correctly). Sadly, I get exactly the same behavior.
Also worth mentioning: it only misbehaves on INSERTs; UPDATEs and DELETEs are handled correctly.
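For completeness, this is roughly how we count the synced rows after each test run (a sketch; the index name "document" and the "content" field path are from our setup and may differ in yours):

```python
# Count how many parent documents actually received their child
# "content" field in Elasticsearch. Index name and field path are
# placeholders from our setup.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

total = es.count(index="document")["count"]
with_content = es.count(
    index="document",
    body={"query": {"exists": {"field": "content"}}},
)["count"]

print(f"{with_content} of {total} documents have their content synced")
```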