Waiter! There's a VLA in my C!
Real world ayekat has chosen to pass on some of his knowledge to the microtechnics and electricity people on his daily living site. Mostly this consists of annoying some poor students with elaborate in-depth facts about the beauty of Unix and the C programming language.
While it might sound like a rather doable job, one never ceases to learn new things. Or as for this matter: one never ceases to encounter the ugly sides in the things one so loves and cherishes.
In my case it was a simple sequence of code that I thought would never compile — and the moment it did, I feared I would not be on good terms with it:
int n;
scanf("%d", &n);
int array[n];
-Wall! -Wextra! -pedantic! man gcc! -WHowCanThisNotThrowAWarning
Variable Length Arrays
What is happening? Obviously the array is allocated at runtime, yet it doesn't seem to require a manual free afterwards. And indeed, a quick test program confirmed my assumption: it is allocated on the stack.
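The test program itself is not reproduced here, so the following is only a minimal sketch of the idea, not the original test. The function name vla_bytes is made up for illustration. It shows the two properties mentioned above: the size is only known once execution reaches the declaration, and the storage disappears on its own when the function returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the number of bytes an n-element VLA of int
 * occupies. */
size_t vla_bytes(int n)
{
    assert(n > 0);        /* a VLA with a non-positive length is undefined */
    int array[n];         /* size is fixed when execution reaches this line */
    return sizeof array;  /* for a VLA, sizeof is evaluated at run time */
}                         /* array is gone here; there is nothing to free() */
```

Calling vla_bytes(5) should yield 5 * sizeof(int), a value the compiler cannot know ahead of time.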
That doesn't seem to be that much black magick. After all, alloca (while non-standard) is also used to allocate memory on the stack, and apparently a VLA is roughly C99 notation for
int n, *array;
scanf("%d", &n);
array = alloca(n * sizeof(int));
So why do I react so allergically to those VLAs?
First, I do program in C99, but I tend to put all variable declarations at the top of the functions. That may be a relic from the C89 times, but for me, this also makes it easier to comprehend what is happening on the stack; I don't like random variable declarations popping up in the middle of the code (and to all people arguing about keeping variable declarations as local as possible: they are local to the function — if you argue with "but for longer functions", modularise your bloody code).
VLAs mess around with my mental picture of what's happening, because a variable declaration depends on executed code (the alloca way makes it clearer, while keeping the same functionality).
As for a technical reason: again, it is allocated on the stack, and the stack has a limited size, and all you can do is sit there and watch and pray… and subsequently enter Damnation:
RETURN VALUE
The alloca() function returns a pointer to the beginning of the allocated space. If the allocation causes stack overflow, program behavior is undefined.
Undefined.
Try entering 192837465647382910 into the above program and watch the World burn.
This is no longer some non-standard, unsafe technique — it is now some
standard, unsafe technique. And to make matters worse, the abovementioned
students use a two-dimensional VLA to store image data on the stack.
What could possibly go wrong…
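For comparison, here is a sketch of how the size could be validated before anything is allocated, with the memory taken from the heap instead of the stack. The names parse_count and alloc_checked and the limit MAX_ELEMENTS are assumptions made up for this example; pick whatever bound suits your data.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_ELEMENTS 1000000L  /* hypothetical upper bound, chosen arbitrarily */

/* Parses text as an element count. Returns 0 and stores the count on
 * success, -1 if the input is not a number in 1..MAX_ELEMENTS. */
int parse_count(const char *text, long *count)
{
    char *end;
    long value;

    assert(text != NULL && count != NULL);
    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || errno == ERANGE)
        return -1;                     /* no digits, or out of range for long */
    if (*end != '\0' && *end != '\n')
        return -1;                     /* trailing garbage after the number */
    if (value <= 0 || value > MAX_ELEMENTS)
        return -1;                     /* refuse absurd sizes up front */
    *count = value;
    return 0;
}

/* Returns a zero-initialised heap array of *count ints, or NULL if the
 * input is rejected or the allocation fails. The caller must free() it. */
int *alloc_checked(const char *text, long *count)
{
    if (parse_count(text, count) != 0)
        return NULL;
    return calloc((size_t)*count, sizeof(int));
}
```

Unlike the VLA, a failure here is reported through a return value that can be checked, so entering 192837465647382910 gets rejected instead of setting the stack on fire.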
The Rabbit Hole
As stated above, the students use two-dimensional arrays. So while messing around with those VLAs, I stumbled over another peculiarity in the C language.
void do_something(int width, int height, int **array);
// ...
int array[w][h];
do_something(w, h, array);
Now let's see what interesting things gcc has to say about this:
warning: passing argument 3 of ‘do_something’ from incompatible pointer type [-Wincompatible-pointer-types]
do_something(w, h, array);
^
note: expected ‘int **’ but argument is of type ‘int (*)[(sizetype)(a)]’
void do_something(int width, int height, int **array);
^
So apparently a 2D array is not the same thing as **, fine. But what in the name of the Seven Hells is an int (*)[(sizetype)(a)]?
Multi-Dimensional Arrays
This might seem pretty logical, but I've nevertheless managed to write programs in C for about 8 years without paying proper attention to the subtle differences between pointers and arrays; I knew that if they were statically allocated, the program would handle them differently for sizeof, yet I failed to grasp the exact reason for that, and subsequently also what so radically separates multi-dimensional arrays from pointers.
Yet, if we look at how they are stored in memory, it makes perfect sense: a 2D array is not a 1D array of pointers to separate 1D arrays, but rather one big contiguous 1D array that has been cut into n pieces. We can't just put classical "double-pointers" there (think about pointer arithmetic violently blowing up and cities burning and people dying and the air stinking and everything being really, really bad).
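The memory layout described above can be checked with a few lines of code. This is only a sketch; the names grid and flat_offset are made up for illustration. It measures how far an element sits from the start of the array, counted in ints.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 3, COLS = 4 };

static int grid[ROWS][COLS];  /* one contiguous block of ROWS * COLS ints */

/* Hypothetical helper: distance of grid[i][j] from the start of grid,
 * counted in ints. Measured through char pointers over the whole object. */
ptrdiff_t flat_offset(int i, int j)
{
    ptrdiff_t bytes;

    assert(i >= 0 && i < ROWS && j >= 0 && j < COLS);
    bytes = (const char *)&grid[i][j] - (const char *)grid;
    return bytes / (ptrdiff_t)sizeof(int);
}
```

Element [i][j] ends up at offset i * COLS + j, and sizeof grid is exactly ROWS * COLS * sizeof(int): there are no pointers stored anywhere that an int ** could follow.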
So how can we reference a 2D array?
int a[3][4];
int **p;
p = a; // OK
// ... or no, wait - no, actually: BOOM!
Not like this.
Let us have another look at that bizarre int (*)[(sizetype)(a)]. It resembles a function pointer, yet that would make no sense. And it isn't, anyway.
It is simply a notation. The (*) indicates that it is a pointer to something, and the brackets to the right indicate that the "something" is an array of a elements. Consequently, a 2D array is like a 1D array of some "units", where each unit holds a elements. And sizetype is simply gcc's way of telling us that it expects a type that can be used to indicate a size: might be an int, a size_t, or something similar.
So we would need to write this instead:
int a[3][4];
int (*p)[4];
p = a; // yay!
a[3][4] is an array of three 4-element units. (*p)[4] is a pointer to the first element in an array of 4-element units (not an array of pointers) and the notation suddenly starts to make "sense", or whatever you would call that for some weird C notation.
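To make the "units" a bit more tangible, here is a small sketch using such a pointer. The function names row_sum and row_stride are invented for this example. Indexing through p steps over whole 4-element units, and sizeof *p is the size of one such unit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums one row of a table whose rows hold 4 ints. */
int row_sum(int (*p)[4], size_t rows, size_t row)
{
    int sum = 0;

    assert(row < rows);
    for (size_t j = 0; j < 4; j++)
        sum += p[row][j];  /* p[row] is *(p + row): it skips row * 4 ints */
    return sum;
}

/* Hypothetical helper: size in bytes of the unit that p points to. */
size_t row_stride(int (*p)[4])
{
    (void)p;               /* sizeof does not evaluate its operand here */
    return sizeof *p;      /* 4 * sizeof(int), not sizeof(int *) */
}
```

Passing int a[3][4] to either function works without a cast, because a decays to exactly this int (*)[4] type.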
Equally, a function would need to be declared like this:
void do_something(int width, int height, int (*array)[height]);
// or with some "sugar":
void do_something(int width, int height, int array[][height]);
// or with even MORE "sugar":
void do_something(int width, int height, int array[width][height]);
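The same pointer-to-array type also offers a way out of the stack problem from earlier: the 2D indexing can be kept while the storage comes from malloc. The following is a sketch under that assumption, not something taken from the students' code; fill and heap_image_value are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Fills a width x height array so that array[x][y] == x * height + y. */
void fill(int width, int height, int array[width][height])
{
    for (int x = 0; x < width; x++)
        for (int y = 0; y < height; y++)
            array[x][y] = x * height + y;
}

/* Hypothetical demo: allocates the image on the heap, fills it, and
 * returns the value stored at (x, y), or -1 if the allocation fails. */
int heap_image_value(int width, int height, int x, int y)
{
    int value;

    assert(width > 0 && height > 0);
    assert(x >= 0 && x < width && y >= 0 && y < height);

    /* Pointer to rows of `height` ints; only the pointer lives on the stack. */
    int (*image)[height] = malloc((size_t)width * sizeof(int[height]));
    if (image == NULL)
        return -1;         /* a failure that can actually be detected */

    fill(width, height, image);
    value = image[x][y];
    free(image);
    return value;
}
```

The type is still variably modified, so this needs a compiler with VLA support, but a 640 by 480 image no longer has to fit on the stack.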
Science! Source: Tech Talk About C99 — What Are Variable Length Arrays?