You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
For the lack of better place and to allow commenting/questions, putting this here for now. Think of this as a forever work-in-progress doc on the subject.
Rpm being an old-school C project started in the nineties, transitioning it to anything resembling a modern C++ codebase will be a long road. We're doing this in multiple stages over years as time permits, with the initial priority being on enabling use of C++ STL types and algorithms internally throughout the code: stages 1 and 2. We are relative C++ newbies and will be missing tricks and best practises, so questions and suggestions are quite welcome, but, please understand the constraints we're living with:
these guidelines are specifically for controlled, gradual transition towards C++, for this particular project. It's not how a new C++ project should be written.
C++17 is the C++ standard versions we'll be using initially. We will upgrade the standard as tooling matures, but fussing and buzzing how this or that is done in C++23 is not helpful just now. I expect us to move to C++20 in a couple of years. There are several things to look forward in that.
rpm needs to maintain a stable C API and ABI, this places many constraints on what can be done
rpm is maintained at essentially release quality at all times, all change has to be gradual under-the-hood uninterrupted service
Stage 1, native allocation
Convert heap-allocated data structures to use new/delete or STL containers instead of malloc/calloc/realloc/free. This is a practical requirement to allow those structs to contain non-trivial C++ types like strings and containers.
Practicalities:
things that API user needs to free() cannot be converted, new/delete must always pair up. This may not always not always be obvious at the outset: if we have fooNew() and fooFree() API, its okay to convert, but eg headerExport() returns a blob that must be free()'d, and sometimes the implementation leaks into the API in surprising ways.
memset() cannot be used for initialization in C++, always use new {} to ensure similar effect (mytype_s foo {} for local structs)
realloc() always calls for an STL container (or string, we realloc those a lot too)
DON'T use raw C++ arrays (ie new[]/delete[]), they're just painful
STL notes:
watch out when storing data that requires manual freeing in containers (see stage 2)
vector is your Swiss army knife for transition work, in particular it exposes its raw data via non-const .data() which can be passed around to unsuspecting C code without modifications to keep changes local and minimal (as long as it's not free()'d)
rpmhash is a flexible beast that supports multiple values per key and also set semantics, replacing rpmhash calls for one of unordered_multimap, unordered_map (depending on use), or unordered_set.
when passing containers as arguments, prefer references over raw pointers (for safety, but also eg vector[index] becomes annoying otherwise)
Stage 2, native objects
This is basically the flip-side of the stage 1 coin, the objective being to make rpm structs RAII compatible native objects and safe for use in STL containers. Meaning construcors/destructors and copy-control (see rule-of-five)
Practicalities:
default-initialize the struct members in the struct declaration itself, this ensures you never end up with uninitialized members no matter what
if unsure, mark it as deleted. The compiler will tell you if it needs it, but it will not tell you the default-generated method is unsafe for a given datatype
Stage ??, fancy pants
Look at things like smart pointers. Rpm is full of pointers that we'd prefer automatically freed, and many of those are manually reference counted too. Many of them will also get passed to C which limits our options.
Source (re)naming
We'll rename all the C++ sources to an appropriate extension somewhere around rpm 4.20 beta/rc release, at any rate during 2024. This is lessons learned from similar transition in gcc who only renamed things 10+ years after the fact. Having C++ in files called .c is painful for a whole number or reasons, including
build systems will fight you back
syntax highlighting will fight you back
its just confusing
C++ APIs
Sooner than later we'll need to start adding C++ native APIs to move things forward. Initially these will be internal only to allow us to learn the ropes before heading to open seas.
Some immediate needs that have risen already and have a wide impact internally:
macro expansion that accepts and returns STL strings
argvSplit() and argvJoin() counterparts, should be templates to work with any container
We need to figure out how and where to put our C++ includes. I guess the public-oriented includes should go to a separate directory (think, include/rpmxx) that we can make public when we're ready to do so. As a first aid we can use the various _internal.h headers.
Any new APIs should be written with RAII in mind. Macros make for a nice example: the macro context is mutex-protected, and each rpmExpand() call is forced to take and release a lock, and anything at all can happen between two expands. The sane way to do this is to acquire a local context object which is allows you to do your business and at the end of scope the variable goes out of scope and so does the lock.
C++ hygiene
change C standard library includes to C++ counterparts, eg #include <stdlib.h> to `#include
introduce rpm:: namespace to our own stuff
figure out what to do with the struct foo_s mess in a way that's compatible with the C API, having rpmThisStruct_s::rpmThisStruct_s() and all gets really ugly and old real fast
Keep it simple, Sam. We're using C++ mainly for STL convenience, not because OMG OOP! Polymorphism is good when it's called for, but this is not anybody's Design Patterns master thesis. There are at most a handful of places in rpm that call for it. Don't use a C++ feature just because it exists, some of them are just terrible. As the core guidelines shows, C++ is so big it's easier to tell what to do than what not to. I expect us to develop some more detailed guidelines over time.
auto is a sanity-saver with verbose types coming from STL, but doesn't make sense with something like int - don't overuse it (reminder to self too).
The text was updated successfully, but these errors were encountered:
For the lack of better place and to allow commenting/questions, putting this here for now. Think of this as a forever work-in-progress doc on the subject.
Rpm being an old-school C project started in the nineties, transitioning it to anything resembling a modern C++ codebase will be a long road. We're doing this in multiple stages over years as time permits, with the initial priority being on enabling use of C++ STL types and algorithms internally throughout the code: stages 1 and 2. We are relative C++ newbies and will be missing tricks and best practises, so questions and suggestions are quite welcome, but, please understand the constraints we're living with:
Stage 1, native allocation
Convert heap-allocated data structures to use new/delete or STL containers instead of malloc/calloc/realloc/free. This is a practical requirement to allow those structs to contain non-trivial C++ types like strings and containers.
Practicalities:
new {}
to ensure similar effect (mytype_s foo {}
for local structs)STL notes:
Stage 2, native objects
This is basically the flip-side of the stage 1 coin, the objective being to make rpm structs RAII compatible native objects and safe for use in STL containers. Meaning construcors/destructors and copy-control (see rule-of-five)
Practicalities:
Stage ??, fancy pants
Look at things like smart pointers. Rpm is full of pointers that we'd prefer automatically freed, and many of those are manually reference counted too. Many of them will also get passed to C which limits our options.
Source (re)naming
We'll rename all the C++ sources to an appropriate extension somewhere around rpm 4.20 beta/rc release, at any rate during 2024. This is lessons learned from similar transition in gcc who only renamed things 10+ years after the fact. Having C++ in files called .c is painful for a whole number or reasons, including
C++ APIs
Sooner than later we'll need to start adding C++ native APIs to move things forward. Initially these will be internal only to allow us to learn the ropes before heading to open seas.
Some immediate needs that have risen already and have a wide impact internally:
We need to figure out how and where to put our C++ includes. I guess the public-oriented includes should go to a separate directory (think, include/rpmxx) that we can make public when we're ready to do so. As a first aid we can use the various _internal.h headers.
Any new APIs should be written with RAII in mind. Macros make for a nice example: the macro context is mutex-protected, and each rpmExpand() call is forced to take and release a lock, and anything at all can happen between two expands. The sane way to do this is to acquire a local context object which is allows you to do your business and at the end of scope the variable goes out of scope and so does the lock.
C++ hygiene
#include <stdlib.h>
to `#includeUseful references
General stuff
Keep it simple, Sam. We're using C++ mainly for STL convenience, not because OMG OOP! Polymorphism is good when it's called for, but this is not anybody's Design Patterns master thesis. There are at most a handful of places in rpm that call for it. Don't use a C++ feature just because it exists, some of them are just terrible. As the core guidelines shows, C++ is so big it's easier to tell what to do than what not to. I expect us to develop some more detailed guidelines over time.
auto
is a sanity-saver with verbose types coming from STL, but doesn't make sense with something likeint
- don't overuse it (reminder to self too).The text was updated successfully, but these errors were encountered: