Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterator performance #770

Open
alexolog opened this issue Dec 16, 2023 · 14 comments
Open

Iterator performance #770

alexolog opened this issue Dec 16, 2023 · 14 comments

Comments

@alexolog
Copy link

In C++ iterators are presumed to be lightweight objects and are therefore commonly passed by value.

However, the internal implementation of pqxx iterators makes them not so lightweight:

pqxx::const_row_iterator inherits from pqxx::field.
pqxx::field holds a pqxx::result by value
pqxx::result holds 2 std::shared_ptrs.

The problem is that an std::shared_ptr has to increment/decrement its counter atomically, which has a performance hit.

A quick test showed that passing an object that included two std::shared_ptrs by value was about 15 times slower than passing it by reference in a debug build, and 50 times slower in a release build.

Can something be done to make copying iterators faster?

@tt4g
Copy link
Contributor

tt4g commented Dec 16, 2023

I have not compared performance, but perhaps the stream_query_input_iterator (https://github.com/jtv/libpqxx/blob/7.8.1/include/pqxx/internal/stream_query_impl.hxx) used by the Stream API is more lightweight.

See the test for how to use the Stream API, such as tx.stream("SELECT ...").
https://github.com/jtv/libpqxx/blob/7.8.1/test/unit/test_stream_query.cxx

@jtv
Copy link
Owner

jtv commented Dec 16, 2023

Food for thought.

Originally the result & row iterators did not have the shared_ptr. That was a long time ago.

IIRC I added it so that a function like exec1() could return a row (and other similar use-cases).

Perhaps we could break the inheritance relationships between row, field, and these iterators. That way row and field can hold shared_ptr but the iterators would work with regular pointers.

It could be a bit of an undertaking though. And technically it would be a breaking change. We're in the run-up to 8.0, so it's the perfect time for that really.

@alexolog
Copy link
Author

I don't think you need to make it "safer" than normal STL iterators. An iterator into std::vector is basically a pointer under the hood, and will become dangling if the underlying container goes away. That seems like a reasonable approach to take.

@jtv
Copy link
Owner

jtv commented Dec 17, 2023

Indeed. The catch though is that row and field do need the shared_ptrs. And the iterator classes are derived from those classes.

That wasn't a great design choice, but it avoided an overwhelming amount of extra work and duplicated code. I'll have to see whether I can break those inheritance relationships. It looks like a serious amount of work, since the C++ iterator API is pretty broad. It may also break some client code out there.

@jtv
Copy link
Owner

jtv commented Dec 20, 2023

I've been looking into this, and I'm digging up other complications:

  • A result iterator is also a container. This creates conflicts in how it should define some of its nested types!
  • In row and field I still need a full result (which contains a shared_ptr) so that a query function can return a row or field without it becoming an instant dangling reference.
  • Someone may want conversions from a result iterator to a row, and from a row iterator to a field.

So I'm still struggling with this. We may end up with a different division of labour between classes.

@jtv
Copy link
Owner

jtv commented Dec 23, 2023

Yet another complication is that it's not immediately clear when to return a lightweight iterator and when to return a full row or field. It quickly becomes non-obvious, which means it's error-prone, so for now I'm giving up on the idea of removing the shared_ptr from the iterator classes.

I'm not actually sure that's a big problem for iterators, because you don't actually need to copy them very often, so long as you iterate sensibly and pass references where appropriate. Really it's random access with operator[] that's costly. It gets better in C++23 with multi-dimensional operator[], but it's still pretty costly. We can optimise the loop in pqxx::result::for_each(), but if that's what you want you're probably better off streaming the query.

@jtv
Copy link
Owner

jtv commented Dec 29, 2023

I'm forming some answers to the questions. For example, I'm now leaning towards making all iteration and indexing return iterators, not pqxx::row or pqxx::field. If you wanted to keep the result set in memory, you'd have to construct a row or field yourself. Everything would then be fast by default, with the option to get the heavier, "safer" objects.

Unfortunately all the work I did on this is on the laptop that broke today, so I'm kind of stuck for now. :-( One fun little side quest was to replace most code in the reverse_iterator types for result and row with std::reverse_iterator.

@alexolog
Copy link
Author

Will it break backward compatibility?

@jtv
Copy link
Owner

jtv commented Dec 29, 2023

Well yes, of course - the change you asked for breaks compatibility.

@jtv
Copy link
Owner

jtv commented Jan 19, 2024

I've got my laptop back, but unfortunately there was no chance to make a backup. There was a lot of incomplete work on these iterators on there. :-(

Copy link

There has been no activity on this ticket. Consider closing it.

Copy link

There has been no activity on this ticket. Consider closing it.

@jtv
Copy link
Owner

jtv commented Jun 16, 2024

I think this might be a good thing to put on the list for libpqxx 8.0, and combine it with at least the preparation for #846.

@jtv
Copy link
Owner

jtv commented Aug 15, 2024

It's been ages. But I've had some more time to think about it, and I think in libpqxx 8.0 we can go for a much stronger separation between row/field on the one hand, and result::iterator/row::iterator on the other.

The row and field classes will still have a shared_ptr to the result, and so keep it alive in memory, but the iterators will not.

When I experimented with this, I found that it didn't affect a lot of test code. Which I hope is a good sign. And... C++ compilers have advanced "analyzers" now which help find bugs that this might cause.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants