[EN] Writing a Dynamic Array in C
Edit on October 28th 2023: This article was written on November 28th 2020, almost three years ago. Since then, I have noticed issues with the current implementation of my dynamic C array, as noted by some readers in the comments below. I will probably rewrite a new dynamic array in C some time in the future addressing these issues.
Although C is a very, very popular language, it is also known to be quite tiny: memory is handled manually, and much of what is available in its standard library is a given in all other languages. But C being a low level language also means it lacks a lot of other stuff other popular languages have; for instance, dynamic arrays are present in the library of most popular languages, be it JavaScript, C++, Rust and so on, but C’s simplicity forbids them from being there. If you want it in C, you have to implement it –which is exactly what I did!
Introduction
When I wrote this library, I was mostly inspired by C++’s std::vector
and Rust’s std::vec::Vec
, but my library lacks some features both
have: it’s still a simple one. Here is the list of what it is able to
do:
- Create a dynamic array, with or without an initial capacity specified by the user
- Store a function pointer to the destructor of the elements that will be stored in the vector for when they are destroyed
- Append new elements at the end of the array
- Get elements by position, safely or not, or get the first and last elements in the array
- Get the length of the vector as well as its capacity
- Shrink the size of the allocated array to the size of the vector
- Remove an element at a specific index, or the last element
- Completely destroy the vector and its elements
Elements that will be stored in the vector will need to be dynamically allocated in memory since the vector will not store the elements themselves, but rather pointers to them. This way, we avoid copying data when inserting it to the vector, and handling these elements is also a tad easier. And since we do not know what we will be storing, we will be storing void pointers. The user will be able to cast them to their desired type later on.
Before defining the vector, there are a few things I want to define. First, there is an attribute I will often use with my functions:
#indef NONNULL
# define NONNULL __attribute__((nonnull))
#endif
This will forbid passing to functions marked with this attribute NULL
pointers, because we will use a lot of them.
We will also need to include some headers:
assert.h
- so we can make sure memory is allocated and reallocated correctly
string.h
- for some memory operations such as
memcpy
#include <assert.h>
#include <string.h>
We also need to define a type that will be used as the destructor type. The functions we want to accept as destructors are functions that accept a void pointer to an element and return nothing, hence this definition:
typedef void (*Destructor)(void *element);
Now, onto the structure itself.
The Data Structure of the Vector
With our vector, we will need to keep track a couple of things:
- the size of the vector
- the capacity of the vector
- the destructor
- the array itself
With this, we can describe our structure for the vector:
struct Vector_s {
size_t capacity;
size_t length;
void ** elements;
Destructor destructor;
};
typedef struct Vector_s Vector;
We have now four elements:
elements
- an array of void pointers pointing themselves either to
elements stored in the vector or to nothing (initialized to
NULL
) (note this forbids storingNULL
elements in the vector), length
- the number of elements currently stored in the vector,
capacity
- the size of the allocated memory pointed to by
elements
divided by the size of a void pointer. This gives us the amount of elements that can be stored in the vector without any reallocation at most, destructor
- pointer to the function used to free elements stored in the vector
Now, onto the functions associated with this data structure. They are
all prefixed with vec_
in order to avoid any collisions with other
libraries and functions.
Building Vectors
The first function for building vectors is vec_new()
. Here is its
definition:
Vector *vec_new(Destructor const destructor);
It is quite straightforward: when creating a new, standard vector,
simply pass as its arguments a pointer to the destructor of this
vector, either a NULL
pointer for trivial data types, or a pointer to
an existing function you declared somewhere. Once you do that, you get
yourself a pointer to the newly created vector with which you can now
store elements. Let’s see how it works under the hood:
Vector *vec_new(Destructor const destructor)
{
Vector *self;
self = (Vector *)malloc(sizeof(Vector));
assert(self);
*self = (Vector){.length = 0,
.capacity = VEC_INITIAL_CAPACITY,
.elements = (void *)malloc(sizeof(void *) * VEC_INITIAL_CAPACITY),
.destroy = destructor};
assert(self->elements);
return self;
}
A new pointer is created, which will be the pointer returned to the user. To this pointer, we allocate enough memory to hold a vector. Once that is done, we initialize this new memory buffer with an actual vector, with its members initialized as described above. An assertion is done in order to ensure both the vector but also its storage are correctly allocated.
The second function, vec_with_capacity
, is quite similar though not
the same as vec_new
: it allows for an initialization of
vec_with_capacity
with a user-defined amount of capacity in the
storage of the vector. That is, if vec_with_capacity(14)
is called,
the library will return a pointer to a vector which can contain and
has the size of precisely fourteen elements. That way, if the user
knows they’ll need a certain amount of elements to be stored in a
vector, they’ll be able to reserve that exactly and limit the amount
of reallocations when adding new elements. Its definition is the
following:
Vector *vec_with_capacity(Destructor const destructor, size_t const capacity);
Under the hood, it calls vec_new
, then it will reallocate the memory
already allocated for the member elemements
.
Vector *vec_with_capacity(Destructor const t_destructor,
size_t const t_capacity)
{
Vector *self = vec_new(t_destructor);
free(self->elements);
(*self).elements = (void *)malloc(sizeof(void *) * t_capacity);
assert(self->elements);
(*self).capacity = t_capacity;
return self;
}
Adding Data
The main feature of vectors is to hold data, so let’s make them able to take new data from the user. But first, let me explain a bit how this dynamic array which I call vector works in C.
As you saw earlier, a vector is initialized with a fixed amount of
memory allocated to the vector, so people can store their data in
these arrays. Now, imagine you have an array of four elements, and you
wish to add one more, what to do? You can reallocate your array with
realloc
with one more slot for your element, so now you have an array
for five elements with your four original elements and a free slot for
your fifth. Cool, now you can add new elements as you need them!
Except that if you want to add some tens of thousands of new elements,
you would end up calling some tens of thousands times realloc
, and
that is slow. Seriously, try it, you’ll understand what I mean. And
all these calls to realloc
are an opportunity for it to fail. Let’s
limit calls to this function, OK ? If we end up short on slots in our
current array, let’s actually double the amount of slots in it. So, if
we have a four-slots array, let’s make it an eight-slots array, and
then a sixteen-slots array. And in a couple more calls to realloc
,
we’ll quickly reach our tens of thousands slots array, way faster than
by incrementing its capacity one by one.
“But, we’ll end up with a lot of unused memory if we need just one more element than 216 elements! We don’t need a 217 elements array for 216+1 elements!”
You’re completely right, but that’s a tradeoff. Would you rather have a slow but memory-efficient program, or a fast but memory-hungry software? Plus, as you’ll see later, there is a function to shrink the size of the allocated array down to the actual amount of elements you stored in it, making it possible to temporarily have a 217 elements array, and immediately after shrink it down to 216+1, once you know you won’t be adding any other elements.
With this out of the way, let’s see how to add new elements to our vector. First, let’s declare a static function that reallocates the memory of a vector. Here is its declaration:
static void vec_realloc(Vector *const self) NONNULL;
Its implementation is rather simple: double its capacity, and reallocate its array twice its previous size. Of course, there is an assertion on whether the arrays have been correctly reallocated to ensure memory safety.
void vec_realloc(Vector *const self)
{
self->capacity *= 2;
self->elements = realloc(self->elements, sizeof(void *) * vec_capacity(self));
assert(self->elements);
return;
}
Now, we can proceed to element insertion. Here is the definition of
vec_push
, which adds a new element at the end of the vector:
void *vec_push(Vector *const self, void *const element) NONNULL;
As you can see, it takes as its arguments a pointer to the vector (the
same returned by its constructor) as well as a pointer to the element
to be added to the vector. This is an important point: the vector does
not store elements themselves, only their pointer. If the function
detects there is not enough space for a new element, a call will be
made to vec_realloc
described above. Once the function is done, it
will return a pointer to the newly inserted element.
void *vec_push(Vector *const self, void *const t_element)
{
if (vec_length(self) >= vec_capacity(self)) {
vec_realloc(self);
}
self->elements[(*self).length++] = t_element;
return vec_last(self);
}
And this is it! There may be a function added later that will allow
the insertion of a new value in any valid position between the first
and last position of an array (not counting the unused slots of said
array), and if I implement this it will imply a reimplementation of
vec_push
so that vec_push
relies on this potential new vec_insert
.
Retrieving Data
Two functions are available when retrieving data: vec_safe_at
which
safely retrieves the element at a certain index, and vec_at
, which is
a bit more performant but without the safety of the former. Let’s see
the definition of both:
void *vec_safe_at(Vector const *const self, size_t const index) NONNULL;
void *vec_at(Vector const *const self, size_t const index) NONNULL;
Both have the same arguments: the former is a pointer to the vector we
want to manipulate, and the latter is the index at which we want to
retrieve our data. To see the difference in how both work, let’s first
see the definition of vec_at
:
void *vec_at(Vector const *const self, size_t const index)
{
return self->elements[index];
}
vec_at
is really straightforward and is just syntax sugar around the
vector’s elements
member and will behave exactly like the square
brackets in standard C. However, vec_safe_at
performs some additional
checks as you can see below:
void *vec_safe_at(Vector const *const self, size_t const t_index)
{
return (t_index >= vec_length(self)) ? NULL : vec_at(self, t_index);
}
If the requested index is larger than the furthest index possible, a
NULL
pointer will be returned, otherwise the pointer to the requested
element is. With this function, it is possible to check whether an
element has been returned or not while avoiding a possible segfault or
something similar. It could be used in a loop for instance in order to
check we only have valid elements.
It is also possible to retrieve directly the last element with
vec_last
. Here is its definition:
void *vec_last(Vector const *const self) NONNULL;
Just as the previous functions, its declaration is really straightforward:
void *vec_last(Vector const *const self)
{
return vec_at(self, vec_length(self) - 1);
}
For the sake of the Object-Oriented Programming paradigm, two functions were also declared in order to retrieve some data that could otherwise be easily accessible:
size_t vec_length(Vector const *const self) NONNULL;
size_t vec_capacity(Vector const *const self) NONNULL;
Their implementation is extremely trivial and doesn’t really need any explanation.
size_t vec_length(Vector const *const self)
{
return self->length;
}
size_t vec_capacity(Vector const *const self)
{
return self->capacity;
}
Deleting Data
While this chapter is about destroying data, this first function will
not exactly destroy data, or at least not data we care about:
vec_shrink_to_fit
will reallocate the memory in our vector to make it
so that the member elements
is exactly large enough to store all of
our data with no more space than that. Here is its definition:
void vec_shrink_to_fit(Vector *const self) NONNULL;
There’s nothing too exciting about its implementation: a simple
reallocation exactly the size of the number of elements currently
stored times the size of a void pointer, and we verify with an assert
if it has been correctly reallocated. Nothing is returned.
void vec_shrink_to_fit(Vector *const self)
{
if (self->length <= 0) {
return;
}
self->capacity = self->length;
self->elements = realloc(self->elements, sizeof(void *) * vec_capacity(self));
assert(self->elements);
return;
}
Notice that a check is done to see if the vector exists, because
otherwise calling shrink_to_fit
on an empty vector would result in an
error while asserting the reallocation.
Next, we have two functions: vec_pop_at
and vec_pop
. The latter relies
on the former, which can delete an element at any valid position.
Beware: these functions return nothing and simply deletes the element.
Here is their definition:
void vec_pop_at(Vector *const self, size_t const index) NONNULL;
void vec_pop(Vector *const self) NONNULL;
In order to insure memory safety, a static function is declared in
src/vector.c
which will delete an element if a destructor has been
provided to the vector when it has been built. Its definition is the
following:
static void vec_maybe_delete_element(Vector const *self,
size_t const t_index) NONNULL;
Its implementation is quite simple: if a destructor exists, then the
element at the requested index will be destroyed through this
destructor. Otherwise, nothing is done with the destructor, hence the
name of the function vec_maybe_delete_element
. However, it should be
noted that the element will be freed from memory, so if the user needs
it before popping it, they need to retrieve it with something like
vec_at
and store it elsewhere.
void vec_maybe_delete_element(Vector const *self, size_t const t_index)
{
void *element = vec_at(self, t_index);
if (self->destroy) {
self->destroy(element);
}
free(element);
}
Now that we have this function sorted out, we can implement our pops.
Here is the implementation of vec_pop_at
:
void vec_pop_at(Vector *const t_self, size_t const t_index)
{
if (vec_safe_at(t_self, t_index) == NULL) {
return;
}
vec_maybe_delete_element(t_self, t_index);
if (t_index + 1 < vec_length(t_self)) {
memcpy(vec_at(t_self, t_index), vec_at(t_self, t_index + 1),
sizeof(void *) * (t_self->length - (t_index + 1)));
}
--(*t_self).length;
}
A check is performed at the beginning of the function: that the
element we want to pop actually exists. If it does not, the function
does nothing, otherwise the function deletes the element if needed.
The call to vec_maybe_delete_element
will free the requested element.
Then, a check is performed to see if the requested element was at the
end of the array or not. If it was not, then the elements located
after the destroyed element are shifted one element closer to the
beginning of the array; otherwise, if the requested element was at the
end of the array, nothing is done particularly. Lastly, the count of
elements stored in the vector is decreased by one.
vec_pop
uses the above function in order to provide a simpler call if
we want to delete the last element of the array. We can see how it
relies on vec_pop_at
in its implementation:
void vec_pop(Vector *const self)
{
vec_pop_at(self, vec_length(self));
}
Finally, vec_delete
allows for the complete destruction and
deallocation of a vector, including all of its elements. Here is its
definition:
void vec_delete(Vector *const self) NONNULL;
In its implementation, we can see three distinct steps:
- The deletion of all its elements if a destructor exists
- The deletion of the array of the vector
- The deletion of the vector itself.
void vec_delete(Vector *const self)
{
if (self->destroy) {
for (size_t i = 0; i < vec_length(self); ++i) {
self->destroy(self->elements[i]);
}
}
free(self->elements);
free(self);
}
The Final Source Code
Finally, we can see the whole source code. Here is the header for the
library: vector.h
#ifndef VECTOR_H_
#define VECTOR_H_
#indef NONNULL
# define NONNULL __attribute__((nonnull))
#endif
struct Vector_s {
size_t capacity;
size_t length;
void ** elements;
Destructor destructor;
};
typedef struct Vector_s Vector;
Vector *vec_new(Destructor const destructor);
Vector *vec_with_capacity(Destructor const destructor, size_t const capacity);
void *vec_push(Vector *const self, void *const element) NONNULL;
void *vec_safe_at(Vector const *const self, size_t const index) NONNULL;
void *vec_at(Vector const *const self, size_t const index) NONNULL;
void *vec_last(Vector const *const self) NONNULL;
size_t vec_length(Vector const *const self) NONNULL;
size_t vec_capacity(Vector const *const self) NONNULL;
void vec_shrink_to_fit(Vector *const self) NONNULL;
void vec_pop_at(Vector *const self, size_t const index) NONNULL;
void vec_pop(Vector *const self) NONNULL;
void vec_delete(Vector *const self) NONNULL;
#endif /* VECTOR_H_ */
And here is the implementation file: vector.c
#include "vector.h"
#include <assert.h>
#include <string.h>
static void vec_realloc(Vector *const self) NONNULL;
static void vec_maybe_delete_element(Vector const *self,
size_t const t_index) NONNULL;
Vector *vec_new(Destructor const destructor)
{
Vector *self;
self = (Vector *)malloc(sizeof(Vector));
assert(self);
*self = (Vector){.length = 0,
.capacity = VEC_INITIAL_CAPACITY,
.elements = (void *)malloc(sizeof(void *) * VEC_INITIAL_CAPACITY),
.destroy = destructor};
assert(self->elements);
return self;
}
Vector *vec_with_capacity(Destructor const t_destructor,
size_t const t_capacity)
{
Vector *self = vec_new(t_destructor);
free(self->elements);
(*self).elements = (void *)malloc(sizeof(void *) * t_capacity);
assert(self->elements);
(*self).capacity = t_capacity;
return self;
}
void vec_realloc(Vector *const self)
{
self->capacity *= 2;
self->elements = realloc(self->elements, sizeof(void *) * vec_capacity(self));
assert(self->elements);
return;
}
void *vec_push(Vector *const self, void *const t_element)
{
if (vec_length(self) >= vec_capacity(self)) {
vec_realloc(self);
}
self->elements[(*self).length++] = t_element;
return vec_last(self);
}
void *vec_at(Vector const *const self, size_t const index)
{
return self->elements[index];
}
void *vec_safe_at(Vector const *const self, size_t const t_index)
{
return (t_index >= vec_length(self)) ? NULL : vec_at(self, t_index);
}
void *vec_last(Vector const *const self)
{
return vec_at(self, vec_length(self) - 1);
}
size_t vec_length(Vector const *const self)
{
return self->length;
}
size_t vec_capacity(Vector const *const self)
{
return self->capacity;
}
void vec_shrink_to_fit(Vector *const self)
{
if (self->length <= 0) {
return;
}
self->capacity = self->length;
self->elements = realloc(self->elements, sizeof(void *) * vec_capacity(self));
assert(self->elements);
return;
}
void vec_pop(Vector *const self)
{
vec_pop_at(self, vec_length(self));
}
void vec_maybe_delete_element(Vector const *self, size_t const t_index)
{
void *element = vec_at(self, t_index);
if (self->destroy) {
self->destroy(element);
}
free(element);
}
void vec_pop_at(Vector *const t_self, size_t const t_index)
{
if (vec_safe_at(t_self, t_index) == NULL) {
return;
}
vec_maybe_delete_element(t_self, t_index);
if (t_index + 1 < vec_length(t_self)) {
memcpy(vec_at(t_self, t_index), vec_at(t_self, t_index + 1),
sizeof(void *) * (t_self->length - (t_index + 1)));
}
--(*t_self).length;
}
void vec_pop(Vector *const self)
{
vec_pop_at(self, vec_length(self));
}
void vec_delete(Vector *const self)
{
if (self->destroy) {
for (size_t i = 0; i < vec_length(self); ++i) {
self->destroy(self->elements[i]);
}
}
free(self->elements);
free(self);
}
And with that, we should be good! I used this library in a SOM (Kohonen, 1982) implementation and ran it through valgrind, and there were no memory leaks. If you find one though, don’t hesitate telling me in the comments, through social media such as Twitter, or by email.
Happy programming!