[EN] Writing a Dynamic Array in C // Phundrak’s rambling

Edit on October 28th 2023: This article was written on November 28th 2020, almost three years ago. Since then, I have noticed issues with the current implementation of my dynamic C array, as noted by some readers in the comments below. I will probably rewrite a new dynamic array in C some time in the future addressing these issues.

Although C is a very, very popular language, it is also known to be quite tiny: memory is handled manually, and much of what is available in its standard library is a given in all other languages. But C being a low level language also means it lacks a lot of other stuff other popular languages have; for instance, dynamic arrays are present in the library of most popular languages, be it JavaScript, C++, Rust and so on, but C’s simplicity forbids them from being there. If you want it in C, you have to implement it –which is exactly what I did!

Introduction
The Data Structure of the Vector
Building Vectors
Adding Data
Retrieving Data
Deleting Data
The Final Source Code

Introduction

When I wrote this library, I was mostly inspired by C++’s std::vector and Rust’s std::vec::Vec, but my library lacks some features both have: it’s still a simple one. Here is the list of what it is able to do:

Create a dynamic array, with or without an initial capacity specified by the user
Store a function pointer to the destructor of the elements that will be stored in the vector for when they are destroyed
Append new elements at the end of the array
Get elements by position, safely or not, or get the first and last elements in the array
Get the length of the vector as well as its capacity
Shrink the size of the allocated array to the size of the vector
Remove an element at a specific index, or the last element
Completely destroy the vector and its elements

Elements that will be stored in the vector will need to be dynamically allocated in memory since the vector will not store the elements themselves, but rather pointers to them. This way, we avoid copying data when inserting it to the vector, and handling these elements is also a tad easier. And since we do not know what we will be storing, we will be storing void pointers. The user will be able to cast them to their desired type later on.

Before defining the vector, there are a few things I want to define. First, there is an attribute I will often use with my functions:

#indef NONNULL
# define NONNULL __attribute__((nonnull))
#endif

This will forbid passing to functions marked with this attribute NULL pointers, because we will use a lot of them.

We will also need to include some headers:

assert.h: so we can make sure memory is allocated and reallocated correctly
string.h: for some memory operations such as memcpy

#include <assert.h>
#include <string.h>

We also need to define a type that will be used as the destructor type. The functions we want to accept as destructors are functions that accept a void pointer to an element and return nothing, hence this definition:

typedef void (*Destructor)(void *element);

Now, onto the structure itself.

The Data Structure of the Vector

With our vector, we will need to keep track a couple of things:

the size of the vector
the capacity of the vector
the destructor
the array itself

With this, we can describe our structure for the vector:

struct Vector_s {
  size_t     capacity;
  size_t     length;
  void **    elements;
  Destructor destructor;
};
typedef struct Vector_s Vector;

We have now four elements:

elements: an array of void pointers pointing themselves either to elements stored in the vector or to nothing (initialized to NULL) (note this forbids storing NULL elements in the vector),
length: the number of elements currently stored in the vector,
capacity: the size of the allocated memory pointed to by elements divided by the size of a void pointer. This gives us the amount of elements that can be stored in the vector without any reallocation at most,
destructor: pointer to the function used to free elements stored in the vector

Now, onto the functions associated with this data structure. They are all prefixed with vec_ in order to avoid any collisions with other libraries and functions.

Building Vectors

The first function for building vectors is vec_new(). Here is its definition:

Vector *vec_new(Destructor const destructor);

It is quite straightforward: when creating a new, standard vector, simply pass as its arguments a pointer to the destructor of this vector, either a NULL pointer for trivial data types, or a pointer to an existing function you declared somewhere. Once you do that, you get yourself a pointer to the newly created vector with which you can now store elements. Let’s see how it works under the hood:

Vector *vec_new(Destructor const destructor)
{
  Vector *self;
  self = (Vector *)malloc(sizeof(Vector));
  assert(self);
  *self = (Vector){.length   = 0,
                   .capacity = VEC_INITIAL_CAPACITY,
                   .elements = (void *)malloc(sizeof(void *) * VEC_INITIAL_CAPACITY),
                   .destroy  = destructor};
  assert(self->elements);
  return self;
}

A new pointer is created, which will be the pointer returned to the user. To this pointer, we allocate enough memory to hold a vector. Once that is done, we initialize this new memory buffer with an actual vector, with its members initialized as described above. An assertion is done in order to ensure both the vector but also its storage are correctly allocated.

The second function, vec_with_capacity, is quite similar though not the same as vec_new: it allows for an initialization of vec_with_capacity with a user-defined amount of capacity in the storage of the vector. That is, if vec_with_capacity(14) is called, the library will return a pointer to a vector which can contain and has the size of precisely fourteen elements. That way, if the user knows they’ll need a certain amount of elements to be stored in a vector, they’ll be able to reserve that exactly and limit the amount of reallocations when adding new elements. Its definition is the following:

Vector *vec_with_capacity(Destructor const destructor, size_t const capacity);

Under the hood, it calls vec_new, then it will reallocate the memory already allocated for the member elemements.

Vector *vec_with_capacity(Destructor const t_destructor,
                          size_t const     t_capacity)
{
  Vector *self = vec_new(t_destructor);
  free(self->elements);
  (*self).elements = (void *)malloc(sizeof(void *) * t_capacity);
  assert(self->elements);
  (*self).capacity = t_capacity;
  return self;
}

Adding Data

The main feature of vectors is to hold data, so let’s make them able to take new data from the user. But first, let me explain a bit how this dynamic array which I call vector works in C.

As you saw earlier, a vector is initialized with a fixed amount of memory allocated to the vector, so people can store their data in these arrays. Now, imagine you have an array of four elements, and you wish to add one more, what to do? You can reallocate your array with realloc with one more slot for your element, so now you have an array for five elements with your four original elements and a free slot for your fifth. Cool, now you can add new elements as you need them!

Except that if you want to add some tens of thousands of new elements, you would end up calling some tens of thousands times realloc, and that is slow. Seriously, try it, you’ll understand what I mean. And all these calls to realloc are an opportunity for it to fail. Let’s limit calls to this function, OK ? If we end up short on slots in our current array, let’s actually double the amount of slots in it. So, if we have a four-slots array, let’s make it an eight-slots array, and then a sixteen-slots array. And in a couple more calls to realloc, we’ll quickly reach our tens of thousands slots array, way faster than by incrementing its capacity one by one.

“But, we’ll end up with a lot of unused memory if we need just one more element than 2¹⁶ elements! We don’t need a 2¹⁷ elements array for 2¹⁶+1 elements!”

You’re completely right, but that’s a tradeoff. Would you rather have a slow but memory-efficient program, or a fast but memory-hungry software? Plus, as you’ll see later, there is a function to shrink the size of the allocated array down to the actual amount of elements you stored in it, making it possible to temporarily have a 2¹⁷ elements array, and immediately after shrink it down to 2¹⁶+1, once you know you won’t be adding any other elements.

With this out of the way, let’s see how to add new elements to our vector. First, let’s declare a static function that reallocates the memory of a vector. Here is its declaration:

static void vec_realloc(Vector *const self) NONNULL;

Its implementation is rather simple: double its capacity, and reallocate its array twice its previous size. Of course, there is an assertion on whether the arrays have been correctly reallocated to ensure memory safety.

void vec_realloc(Vector *const self)
{
  self->capacity *= 2;
  self->elements = realloc(self->elements, sizeof(void *) * vec_capacity(self));
  assert(self->elements);
  return;
}

Now, we can proceed to element insertion. Here is the definition of vec_push, which adds a new element at the end of the vector:

void   *vec_push(Vector *const self, void *const element) NONNULL;

As you can see, it takes as its arguments a pointer to the vector (the same returned by its constructor) as well as a pointer to the element to be added to the vector. This is an important point: the vector does not store elements themselves, only their pointer. If the function detects there is not enough space for a new element, a call will be made to vec_realloc described above. Once the function is done, it will return a pointer to the newly inserted element.

void *vec_push(Vector *const self, void *const t_element)
{
  if (vec_length(self) >= vec_capacity(self)) {
    vec_realloc(self);
  }
  self->elements[(*self).length++] = t_element;
  return vec_last(self);
}

And this is it! There may be a function added later that will allow the insertion of a new value in any valid position between the first and last position of an array (not counting the unused slots of said array), and if I implement this it will imply a reimplementation of vec_push so that vec_push relies on this potential new vec_insert.

Retrieving Data

Two functions are available when retrieving data: vec_safe_at which safely retrieves the element at a certain index, and vec_at, which is a bit more performant but without the safety of the former. Let’s see the definition of both:

void   *vec_safe_at(Vector const *const self, size_t const index) NONNULL;
void   *vec_at(Vector const *const self, size_t const index) NONNULL;

Both have the same arguments: the former is a pointer to the vector we want to manipulate, and the latter is the index at which we want to retrieve our data. To see the difference in how both work, let’s first see the definition of vec_at:

void *vec_at(Vector const *const self, size_t const index)
{
  return self->elements[index];
}

vec_at is really straightforward and is just syntax sugar around the vector’s elements member and will behave exactly like the square brackets in standard C. However, vec_safe_at performs some additional checks as you can see below:

void *vec_safe_at(Vector const *const self, size_t const t_index)
{
  return (t_index >= vec_length(self)) ? NULL : vec_at(self, t_index);
}

If the requested index is larger than the furthest index possible, a NULL pointer will be returned, otherwise the pointer to the requested element is. With this function, it is possible to check whether an element has been returned or not while avoiding a possible segfault or something similar. It could be used in a loop for instance in order to check we only have valid elements.

It is also possible to retrieve directly the last element with vec_last. Here is its definition:

void   *vec_last(Vector const *const self) NONNULL;

Just as the previous functions, its declaration is really straightforward:

void *vec_last(Vector const *const self)
{
  return vec_at(self, vec_length(self) - 1);
}

For the sake of the Object-Oriented Programming paradigm, two functions were also declared in order to retrieve some data that could otherwise be easily accessible:

size_t  vec_length(Vector const *const self) NONNULL;
size_t  vec_capacity(Vector const *const self) NONNULL;

Their implementation is extremely trivial and doesn’t really need any explanation.

size_t vec_length(Vector const *const self)
{
  return self->length;
}

size_t vec_capacity(Vector const *const self)
{
  return self->capacity;
}

Deleting Data

While this chapter is about destroying data, this first function will not exactly destroy data, or at least not data we care about: vec_shrink_to_fit will reallocate the memory in our vector to make it so that the member elements is exactly large enough to store all of our data with no more space than that. Here is its definition:

void    vec_shrink_to_fit(Vector *const self) NONNULL;

There’s nothing too exciting about its implementation: a simple reallocation exactly the size of the number of elements currently stored times the size of a void pointer, and we verify with an assert if it has been correctly reallocated. Nothing is returned.

void vec_shrink_to_fit(Vector *const self)
{
  if (self->length <= 0) {
    return;
  }
  self->capacity = self->length;
  self->elements = realloc(self->elements, sizeof(void *) * vec_capacity(self));
  assert(self->elements);
  return;
}

Notice that a check is done to see if the vector exists, because otherwise calling shrink_to_fit on an empty vector would result in an error while asserting the reallocation.

Next, we have two functions: vec_pop_at and vec_pop. The latter relies on the former, which can delete an element at any valid position. Beware: these functions return nothing and simply deletes the element. Here is their definition:

void    vec_pop_at(Vector *const self, size_t const index) NONNULL;
void    vec_pop(Vector *const self) NONNULL;

In order to insure memory safety, a static function is declared in src/vector.c which will delete an element if a destructor has been provided to the vector when it has been built. Its definition is the following:

static void vec_maybe_delete_element(Vector const *self,
                                     size_t const  t_index) NONNULL;

Its implementation is quite simple: if a destructor exists, then the element at the requested index will be destroyed through this destructor. Otherwise, nothing is done with the destructor, hence the name of the function vec_maybe_delete_element. However, it should be noted that the element will be freed from memory, so if the user needs it before popping it, they need to retrieve it with something like vec_at and store it elsewhere.

void vec_maybe_delete_element(Vector const *self, size_t const t_index)
{
  void *element = vec_at(self, t_index);
  if (self->destroy) {
    self->destroy(element);
  }
  free(element);
}

Now that we have this function sorted out, we can implement our pops. Here is the implementation of vec_pop_at:

void vec_pop_at(Vector *const t_self, size_t const t_index)
{
  if (vec_safe_at(t_self, t_index) == NULL) {
    return;
  }
  vec_maybe_delete_element(t_self, t_index);
  if (t_index + 1 < vec_length(t_self)) {
    memcpy(vec_at(t_self, t_index), vec_at(t_self, t_index + 1),
           sizeof(void *) * (t_self->length - (t_index + 1)));
  }
  --(*t_self).length;
}

A check is performed at the beginning of the function: that the element we want to pop actually exists. If it does not, the function does nothing, otherwise the function deletes the element if needed. The call to vec_maybe_delete_element will free the requested element. Then, a check is performed to see if the requested element was at the end of the array or not. If it was not, then the elements located after the destroyed element are shifted one element closer to the beginning of the array; otherwise, if the requested element was at the end of the array, nothing is done particularly. Lastly, the count of elements stored in the vector is decreased by one.

vec_pop uses the above function in order to provide a simpler call if we want to delete the last element of the array. We can see how it relies on vec_pop_at in its implementation:

void vec_pop(Vector *const self)
{
  vec_pop_at(self, vec_length(self));
}

Finally, vec_delete allows for the complete destruction and deallocation of a vector, including all of its elements. Here is its definition:

void    vec_delete(Vector *const self) NONNULL;

In its implementation, we can see three distinct steps:

The deletion of all its elements if a destructor exists
The deletion of the array of the vector
The deletion of the vector itself.

void vec_delete(Vector *const self)
{
  if (self->destroy) {
    for (size_t i = 0; i < vec_length(self); ++i) {
      self->destroy(self->elements[i]);
    }
  }
  free(self->elements);
  free(self);
}

The Final Source Code

Finally, we can see the whole source code. Here is the header for the library: vector.h

#ifndef VECTOR_H_
#define VECTOR_H_

#indef NONNULL
# define NONNULL __attribute__((nonnull))
#endif

struct Vector_s {
  size_t     capacity;
  size_t     length;
  void **    elements;
  Destructor destructor;
};
typedef struct Vector_s Vector;

Vector *vec_new(Destructor const destructor);
Vector *vec_with_capacity(Destructor const destructor, size_t const capacity);
void   *vec_push(Vector *const self, void *const element) NONNULL;
void   *vec_safe_at(Vector const *const self, size_t const index) NONNULL;
void   *vec_at(Vector const *const self, size_t const index) NONNULL;
void   *vec_last(Vector const *const self) NONNULL;
size_t  vec_length(Vector const *const self) NONNULL;
size_t  vec_capacity(Vector const *const self) NONNULL;
void    vec_shrink_to_fit(Vector *const self) NONNULL;
void    vec_pop_at(Vector *const self, size_t const index) NONNULL;
void    vec_pop(Vector *const self) NONNULL;
void    vec_delete(Vector *const self) NONNULL;

#endif /* VECTOR_H_ */

And here is the implementation file: vector.c

#include "vector.h"

#include <assert.h>
#include <string.h>

static void vec_realloc(Vector *const self) NONNULL;
static void vec_maybe_delete_element(Vector const *self,
                                     size_t const  t_index) NONNULL;

Vector *vec_new(Destructor const destructor)
{
  Vector *self;
  self = (Vector *)malloc(sizeof(Vector));
  assert(self);
  *self = (Vector){.length   = 0,
                   .capacity = VEC_INITIAL_CAPACITY,
                   .elements = (void *)malloc(sizeof(void *) * VEC_INITIAL_CAPACITY),
                   .destroy  = destructor};
  assert(self->elements);
  return self;
}

Vector *vec_with_capacity(Destructor const t_destructor,
                          size_t const     t_capacity)
{
  Vector *self = vec_new(t_destructor);
  free(self->elements);
  (*self).elements = (void *)malloc(sizeof(void *) * t_capacity);
  assert(self->elements);
  (*self).capacity = t_capacity;
  return self;
}

void vec_realloc(Vector *const self)
{
  self->capacity *= 2;
  self->elements = realloc(self->elements, sizeof(void *) * vec_capacity(self));
  assert(self->elements);
  return;
}

void *vec_push(Vector *const self, void *const t_element)
{
  if (vec_length(self) >= vec_capacity(self)) {
    vec_realloc(self);
  }
  self->elements[(*self).length++] = t_element;
  return vec_last(self);
}

void *vec_at(Vector const *const self, size_t const index)
{
  return self->elements[index];
}

void *vec_safe_at(Vector const *const self, size_t const t_index)
{
  return (t_index >= vec_length(self)) ? NULL : vec_at(self, t_index);
}

void *vec_last(Vector const *const self)
{
  return vec_at(self, vec_length(self) - 1);
}

size_t vec_length(Vector const *const self)
{
  return self->length;
}

size_t vec_capacity(Vector const *const self)
{
  return self->capacity;
}

void vec_shrink_to_fit(Vector *const self)
{
  if (self->length <= 0) {
    return;
  }
  self->capacity = self->length;
  self->elements = realloc(self->elements, sizeof(void *) * vec_capacity(self));
  assert(self->elements);
  return;
}

void vec_pop(Vector *const self)
{
  vec_pop_at(self, vec_length(self));
}

void vec_maybe_delete_element(Vector const *self, size_t const t_index)
{
  void *element = vec_at(self, t_index);
  if (self->destroy) {
    self->destroy(element);
  }
  free(element);
}

void vec_pop_at(Vector *const t_self, size_t const t_index)
{
  if (vec_safe_at(t_self, t_index) == NULL) {
    return;
  }
  vec_maybe_delete_element(t_self, t_index);
  if (t_index + 1 < vec_length(t_self)) {
    memcpy(vec_at(t_self, t_index), vec_at(t_self, t_index + 1),
           sizeof(void *) * (t_self->length - (t_index + 1)));
  }
  --(*t_self).length;
}

void vec_pop(Vector *const self)
{
  vec_pop_at(self, vec_length(self));
}

void vec_delete(Vector *const self)
{
  if (self->destroy) {
    for (size_t i = 0; i < vec_length(self); ++i) {
      self->destroy(self->elements[i]);
    }
  }
  free(self->elements);
  free(self);
}

And with that, we should be good! I used this library in a SOM (Kohonen, 1982) implementation and ran it through valgrind, and there were no memory leaks. If you find one though, don’t hesitate telling me in the comments, through social media such as Twitter, or by email.

Happy programming!