Possible to combine "on conflict do nothing" and "stream_to"? #881
I doubt ON CONFLICT would work with streams, but I can't think of a better way either, so it's worth a try.
Oh my goodness, thank you so much for responding! I'm in deep water with a job that will currently take over 200 hours to finish each time it runs if I don't optimize the DB side. I'm processing 90+ million chess games and the moves of each of those games. Needless to say, that's a LOT of inserts. How would I supply that? This is how I do it elsewhere:

```cpp
const pqxx::table_path games_table_path = {"games"};
auto games_table = pqxx::stream_to::table(tx, games_table_path, {"column1", "column2", ... });
for (auto const &game : games) {
  auto row = std::make_tuple(...);
  games_table << row;
}
```

Also, my current non-stream version of the real work of the process (saving the individual chess boards per move) prepares a statement and then, with a `pqxx::work`, runs `exec_prepared` separately for each row I need to insert. Is there a way to send them all at once?
I feel like there's probably a better way than what I'm doing.

How I'm preparing the query:

```cpp
void prepare_insert_boards(pqxx::connection &conn) {
  conn.prepare(
      "insert_boards",
      "insert into boards (hash, white_bishops, white_rooks, white_queens, "
      "white_knights, white_king, white_pawn, black_bishops, black_rooks, "
      "black_queens, black_knights, black_king, black_pawn, white_to_move, "
      "castle_rights, enpassant_sq) values ($1, $2, $3, $4, $5, $6, $7, $8, "
      "$9, $10, $11, $12, $13, $14, $15, $16) on conflict (hash) do nothing");
}
```

How I'm inserting all the boards in a batch of games:

```cpp
void insert_boards(pqxx::work &tx, const std::vector<Game> &games) {
  std::map<std::string, bool> seen;
  // Add missing boards
  for (const auto &game : games) {
    for (const auto &continuation : game.continuations) {
      const auto &board = continuation.board;
      if (seen[board.hash]) {
        continue;
      }
      tx.exec_prepared("insert_boards", board.hash,
                       to_bits(board.white_bishops), to_bits(board.white_rooks),
                       to_bits(board.white_queens),
                       to_bits(board.white_knights), to_bits(board.white_king),
                       to_bits(board.white_pawn), to_bits(board.black_bishops),
                       to_bits(board.black_rooks), to_bits(board.black_queens),
                       to_bits(board.black_knights), to_bits(board.black_king),
                       to_bits(board.black_pawn), board.white_to_move,
                       board.castle_rights, board.enpassant_sq);
      seen[board.hash] = true;
    }
  }
}
```

And here's the `to_bits` code. (The columns used for it are bit(64) in PostgreSQL.)

```cpp
std::string to_bits(std::uint64_t value) {
  std::bitset<64> bits(value);
  return bits.to_string();
}
```
A prepared statement is probably not saving you much time, if anything. Instead, you could try composing a big INSERT statement ad hoc that inserts multiple rows at once.
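To make the suggestion concrete, here is a hypothetical sketch of composing one multi-row INSERT. The real table has 16 columns; only three appear here, and `BoardRow` is an assumed stand-in for the thread's `Board`. The values are a hash plus bit(64) literals, which contain no characters that need escaping; for arbitrary strings you would escape via pqxx (e.g. `tx.quote()`) rather than splicing values in by hand.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Same conversion as in the thread: a 64-bit board becomes a bit(64) literal.
std::string to_bits(std::uint64_t value) {
  return std::bitset<64>(value).to_string();
}

struct BoardRow {  // assumed shape, mirroring the thread's Board
  std::string hash;
  std::uint64_t white_pawn;
  std::uint64_t black_pawn;
};

// Build one INSERT covering every row, ending in ON CONFLICT DO NOTHING,
// so a whole batch costs a single round trip.
std::string build_batch_insert(const std::vector<BoardRow> &rows) {
  std::string sql = "insert into boards (hash, white_pawn, black_pawn) values ";
  for (std::size_t i = 0; i < rows.size(); ++i) {
    if (i != 0) sql += ", ";
    sql += "('" + rows[i].hash + "', b'" + to_bits(rows[i].white_pawn) +
           "', b'" + to_bits(rows[i].black_pawn) + "')";
  }
  return sql + " on conflict (hash) do nothing";
}

// With pqxx: tx.exec(build_batch_insert(rows));
```

In practice you would cap the batch size (a few thousand rows per statement, say) so individual statements stay a manageable size.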
There are also some tricks that only work under very specific circumstances, such as temporarily disabling constraints or temporarily making the table unlogged. But there are risks to those tricks, so I'd stay away from them as long as possible.
Is there a way, in a WHERE clause, to express `in ($1, $2, $3 ... $n)` without building that part of the query manually? For example, in some frameworks you can write `in (?)` and pass an array for that value. Also, I found my above code failed with a COPY error about missing column data (for some reason it truncated after the first column I wrote, despite working fine for a different query) until I rewrote it to use `write_values` instead of `<<`.
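One commonly suggested answer to the IN-list question (an assumption here, not something the thread confirms): Postgres accepts `where hash = any($1)` with a single array parameter, replacing `in ($1, ..., $n)` entirely. A minimal sketch of building the array literal by hand follows; this is only safe because the hashes contain no quotes, backslashes, commas, or braces, so for arbitrary strings you would escape each element properly.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Render a vector of simple strings as a Postgres array literal,
// e.g. {"a", "b"} -> {"a","b"}. No escaping: assumes plain hash strings.
std::string to_pg_array(const std::vector<std::string> &items) {
  std::string out = "{";
  for (std::size_t i = 0; i < items.size(); ++i) {
    if (i != 0) out += ",";
    out += "\"" + items[i] + "\"";
  }
  return out + "}";
}

// Assumed pqxx usage (adjust the column type to your schema):
//   tx.exec_params("select * from boards where hash = any($1::text[])",
//                  to_pg_array(hashes));
```

The advantage over `in ($1, $2, ...)` is that the statement text stays fixed no matter how many values you pass.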
Have you tried just putting the values in an array? You can pass the whole list as one array parameter and match against it, rather than generating a placeholder per value.
(Some of the same questions came up in #879.)
I'm going to chime in about the code as well. One simple optimization that would already help a bit would be changing the `std::map<std::string, bool>` to an `std::unordered_set<std::string>`, since you don't need ordering or the bool values. I'd also check whether it's possible to make `to_bits` cheaper. Maybe it's also possible to eliminate duplicates before inserting, so that you don't have to track them at insert time at all? The cost of converting the representation of these bitboards may also matter at this volume. In the end, what I'm proposing may not yield significant results, so I'd benchmark anyway if you care that much about speed, but I wanted to leave an insight, since it's an important matter.
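One concrete reading of the map suggestion above (my interpretation, not the commenter's exact code): the `seen` map in `insert_boards` can become an `std::unordered_set`, whose `insert` reports in one hashed operation whether the element was new, instead of a tree lookup followed by a tree insert.

```cpp
#include <string>
#include <unordered_set>

// insert() returns {iterator, bool}; the bool is true only the first
// time this hash is added, so dedup check and bookkeeping collapse
// into a single call.
bool first_sighting(std::unordered_set<std::string> &seen,
                    const std::string &hash) {
  return seen.insert(hash).second;
}
```

In the loop, `if (!first_sighting(seen, board.hash)) continue;` would replace both the `seen[board.hash]` check and the later `seen[board.hash] = true;`.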
I guess we're kind of stuck on this one. I don't think there's a way to "upsert" using `stream_to`, since the underlying COPY has no ON CONFLICT support.
The core motivation was to be able to use the bitwise operators in queries, as matching bitboards against masks and such is standard. I was hoping to do that DB-side.
Sorry, late-occurring thought @chao-mu: have you considered streaming the data into a temporary table, and then doing the merging from the temporary table into the destination table?
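A sketch of that staging-table idea, assuming `boards` has a unique index on `hash` and that the staging table mirrors its layout. The temp table absorbs the full-speed COPY from `stream_to`; one server-side merge then applies ON CONFLICT once. The SQL is kept in plain-string helpers so the flow is easy to see; the pqxx calls in the comment are an assumed outline, not tested against a live server.

```cpp
#include <string>

// DDL for a per-transaction staging table shaped like the destination.
std::string stage_ddl() {
  return "create temporary table boards_stage "
         "(like boards including defaults) on commit drop";
}

// The merge: move everything across, silently skipping duplicate hashes.
std::string merge_sql() {
  return "insert into boards select * from boards_stage "
         "on conflict (hash) do nothing";
}

// Assumed pqxx flow:
//   pqxx::work tx{conn};
//   tx.exec(stage_ddl());
//   auto stream = pqxx::stream_to::table(tx, {"boards_stage"});
//   // stream << row; for every row, exactly as with the real table
//   stream.complete();
//   tx.exec(merge_sql());
//   tx.commit();
```

Because the staging table is temporary and per-session, concurrent loaders don't contend with each other until the final merge, which sidesteps the check-before-insert races described in the original post.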
Hello,
I am inserting a massive number (hundreds of millions) of rows. Ideally, I want to use the streaming API, but there will likely be conflicts from past iterations. I had an implementation where I would process the workload in chunks and check ahead of time whether something existed before inserting, but that is rife with concurrency issues and seems far too complex.
Is there a way to get `stream_to::table` to ignore duplicates?
Thank you,
Me.