Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++ transition, guidelines and todo #3103

Open
pmatilai opened this issue May 17, 2024 · 0 comments
Open

C++ transition, guidelines and todo #3103

pmatilai opened this issue May 17, 2024 · 0 comments
Labels

Comments

@pmatilai
Copy link
Member

pmatilai commented May 17, 2024

For the lack of better place and to allow commenting/questions, putting this here for now. Think of this as a forever work-in-progress doc on the subject.

Rpm being an old-school C project started in the nineties, transitioning it to anything resembling a modern C++ codebase will be a long road. We're doing this in multiple stages over years as time permits, with the initial priority being on enabling use of C++ STL types and algorithms internally throughout the code: stages 1 and 2. We are relative C++ newbies and will be missing tricks and best practises, so questions and suggestions are quite welcome, but, please understand the constraints we're living with:

  • these guidelines are specifically for controlled, gradual transition towards C++, for this particular project. It's not how a new C++ project should be written.
  • C++17 is the C++ standard versions we'll be using initially. We will upgrade the standard as tooling matures, but fussing and buzzing how this or that is done in C++23 is not helpful just now. I expect us to move to C++20 in a couple of years. There are several things to look forward in that.
  • rpm needs to maintain a stable C API and ABI, this places many constraints on what can be done
  • rpm is maintained at essentially release quality at all times, all change has to be gradual under-the-hood uninterrupted service

Stage 1, native allocation

Convert heap-allocated data structures to use new/delete or STL containers instead of malloc/calloc/realloc/free. This is a practical requirement to allow those structs to contain non-trivial C++ types like strings and containers.

Practicalities:

  • things that API user needs to free() cannot be converted, new/delete must always pair up. This may not always not always be obvious at the outset: if we have fooNew() and fooFree() API, its okay to convert, but eg headerExport() returns a blob that must be free()'d, and sometimes the implementation leaks into the API in surprising ways.
  • memset() cannot be used for initialization in C++, always use new {} to ensure similar effect (mytype_s foo {} for local structs)
  • realloc() always calls for an STL container (or string, we realloc those a lot too)
  • DON'T use raw C++ arrays (ie new[]/delete[]), they're just painful

STL notes:

  • watch out when storing data that requires manual freeing in containers (see stage 2)
  • vector is your Swiss army knife for transition work, in particular it exposes its raw data via non-const .data() which can be passed around to unsuspecting C code without modifications to keep changes local and minimal (as long as it's not free()'d)
  • rpmhash is a flexible beast that supports multiple values per key and also set semantics, replacing rpmhash calls for one of unordered_multimap, unordered_map (depending on use), or unordered_set.
  • when passing containers as arguments, prefer references over raw pointers (for safety, but also eg vector[index] becomes annoying otherwise)

Stage 2, native objects

This is basically the flip-side of the stage 1 coin, the objective being to make rpm structs RAII compatible native objects and safe for use in STL containers. Meaning construcors/destructors and copy-control (see rule-of-five)

Practicalities:

  • default-initialize the struct members in the struct declaration itself, this ensures you never end up with uninitialized members no matter what
  • if unsure, mark it as deleted. The compiler will tell you if it needs it, but it will not tell you the default-generated method is unsafe for a given datatype

Stage ??, fancy pants

Look at things like smart pointers. Rpm is full of pointers that we'd prefer automatically freed, and many of those are manually reference counted too. Many of them will also get passed to C which limits our options.

Source (re)naming

We'll rename all the C++ sources to an appropriate extension somewhere around rpm 4.20 beta/rc release, at any rate during 2024. This is lessons learned from similar transition in gcc who only renamed things 10+ years after the fact. Having C++ in files called .c is painful for a whole number or reasons, including

  • build systems will fight you back
  • syntax highlighting will fight you back
  • its just confusing

C++ APIs

Sooner than later we'll need to start adding C++ native APIs to move things forward. Initially these will be internal only to allow us to learn the ropes before heading to open seas.

Some immediate needs that have risen already and have a wide impact internally:

  • macro expansion that accepts and returns STL strings
  • argvSplit() and argvJoin() counterparts, should be templates to work with any container

We need to figure out how and where to put our C++ includes. I guess the public-oriented includes should go to a separate directory (think, include/rpmxx) that we can make public when we're ready to do so. As a first aid we can use the various _internal.h headers.

Any new APIs should be written with RAII in mind. Macros make for a nice example: the macro context is mutex-protected, and each rpmExpand() call is forced to take and release a lock, and anything at all can happen between two expands. The sane way to do this is to acquire a local context object which is allows you to do your business and at the end of scope the variable goes out of scope and so does the lock.

C++ hygiene

  • change C standard library includes to C++ counterparts, eg #include <stdlib.h> to `#include
  • introduce rpm:: namespace to our own stuff
  • figure out what to do with the struct foo_s mess in a way that's compatible with the C API, having rpmThisStruct_s::rpmThisStruct_s() and all gets really ugly and old real fast

Useful references

General stuff

Keep it simple, Sam. We're using C++ mainly for STL convenience, not because OMG OOP! Polymorphism is good when it's called for, but this is not anybody's Design Patterns master thesis. There are at most a handful of places in rpm that call for it. Don't use a C++ feature just because it exists, some of them are just terrible. As the core guidelines shows, C++ is so big it's easier to tell what to do than what not to. I expect us to develop some more detailed guidelines over time.

auto is a sanity-saver with verbose types coming from STL, but doesn't make sense with something like int - don't overuse it (reminder to self too).

@pmatilai pmatilai added the INFO label May 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Todo
Development

No branches or pull requests

1 participant