Some time back, while discussing memory pools with a colleague of mine who writes Rust full time, I was surprised to find that his approach to them was fairly different from mine. His natural approach was for the user to create objects and move them into the pool, rather than allocate objects from the pool itself. This design intrigued me, as it is not suitable for most of the use cases for which I would typically reach for a memory pool, which I will discuss later. It started to make a lot more sense to me after I looked for Rust memory pool crates using my preferred design, to show him what I had in mind: they essentially didn’t exist[1]! Since he had never needed memory pools himself, he was conditioned by the APIs he had seen when looking into them out of curiosity. After that, I started to wonder why Rust memory pools all look like that: is my preferred design too unergonomic in Rust, or does it involve more unsafe than Rust programmers are comfortable with? To find out, I set out to write one myself and see what would happen.
First off, I should clarify that what I mean by memory pool is a data structure that allows for low-latency and low-jitter object allocation and deallocation. It is not a general purpose memory allocator, and it allows for independent per-object lifetimes, as long as those are shorter than the pool’s.
Here is what I want out of a memory pool: low-latency, low-jitter allocation and deallocation of independently-lived objects, with as little unsafe exposed to the user as possible.
I decided to go with 2 arrays of fixed size: one holding the objects, the other used as a stack holding the indices of the free objects. To allocate an object, we pop an index and return a reference to the associated object. To free an object, we push the index back onto the stack and reset the object.
Here is a naïve implementation, sans the reset:
// We require `T` to be `Default` for convenience, but we could also ask the
// user to provide initial values instead.
struct Pool<T: Default> {
    // `Vec` is more powerful than what we need and slightly wasteful, but works
    // for our purpose.
    data: Vec<T>,
    free: Vec<usize>,
}

impl<T: Default> Pool<T> {
    pub fn new() -> Self {
        Self {
            data: (0..1024).map(|_| T::default()).collect(),
            free: (0..1024).collect(),
        }
    }

    pub fn allocate<'a>(&mut self) -> Option<&'a mut T> {
        self.free
            .pop()
            .map(|i| unsafe { self.data.as_mut_ptr().add(i).as_mut().unwrap() })
    }

    pub unsafe fn deallocate(&mut self, reference: &T) {
        let ptr = std::ptr::from_ref(reference);
        let element_index = unsafe { ptr.offset_from(self.data.as_ptr()) };
        self.free.push(element_index as usize);
    }
}
Note that deallocate is unsafe because it accepts any reference, even one that was not allocated via the pool. We could make it safe by adding runtime checks on the reference’s provenance.
The code above is nice and short (if incomplete), except that it is also… unsound. Unsurprisingly, the issue has to do with our usage of unsafe. Perhaps more surprisingly, the bug does not really live within the unsafe block in allocate, but rather in the otherwise safe logic of deallocate. What happens is that we hand out manually crafted references in allocate, but we do not take them back in deallocate, we merely use them. Rust’s ownership system rightfully lets many functions take the same reference, because references are not linear types. Note that this would be no different if deallocate took a mutable/exclusive reference: multiple borrows, shared or exclusive, are fine as long as they don’t overlap.
More concretely, what is going on is that after we deallocate an object, the caller still has a reference to it, which it may use. This is fine until the object gets reused and handed out again as another mutable reference, which violates Rust’s aliasing rules and introduces undefined behavior:
fn main() {
    let mut pool = Pool::<usize>::new();
    let x = pool.allocate().unwrap();
    *x = 1234;
    unsafe { pool.deallocate(x) };
    let y = pool.allocate().unwrap();
    // If the values are identical, we can reasonably assume that both `x` and
    // `y` point to the same object.
    assert_eq!(*y, 1234);
    // Use after free!
    *x = 2000;
    // This reference was exclusive!
    assert_eq!(*y, 2000);
}
So what do we do? We need to somehow add produce/consume semantics to the references we hand out, to make sure that the caller cannot use an object after freeing it. Since this is exactly what ownership semantics for structs give us in Rust, we can simply wrap our reference in a struct: the compiler will do the rest for us:
pub struct OwnedRef<'a, T> {
    reference: &'a mut T,
}
We just have to wrap the references we hand out…
pub fn allocate<'a>(&mut self) -> Option<OwnedRef<'a, T>> {
    self.free.pop().map(|i| OwnedRef {
        reference: unsafe { self.data.as_mut_ptr().add(i).as_mut().unwrap() },
    })
}
… and unwrap them when getting them back:
pub unsafe fn deallocate(&mut self, owned_ref: OwnedRef<T>) {
    let reference = owned_ref.reference;
    let ptr = std::ptr::from_ref(reference);
    let element_index = unsafe { ptr.offset_from(self.data.as_ptr()) };
    self.free.push(element_index as usize);
}
Here is what the caller will look like:
fn main() {
    let mut pool = Pool::<usize>::new();
    let x = pool.allocate().unwrap();
    // We can access the underlying reference through the `OwnedRef`.
    *x.reference = 1234;
    unsafe { pool.deallocate(x) };
    // Does not compile anymore!
    *x.reference = 2000;
}
Note that taking references to that OwnedRef, whether shared or exclusive, adds overhead when accessing the underlying object, since there are two pointers to dereference to get to it. This is not a concern to me, as both the OwnedRef and the underlying object should be in cache. If the extra indirection is required but unacceptable for your use case, I’m not sure how best to handle it. If your requirements are simple and static enough, using the typestate pattern to convert an OwnedRef into multiple SharedRefs by consuming it might work for you, but I have not explored that, and I can imagine it not being very simple or ergonomic. At this point, it may be fair to introduce more unsafe, though that would not be ideal.
Now that we have a sound memory pool, let’s make it support the feature which arguably started all this: deep reuse of objects. To do so, we can simply require the type that we pool to implement a trait that lets us reset objects, and call that when deallocating:
trait Reset {
    fn reset(&mut self);
}
After adding the Reset trait bound to the Pool struct, we call reset in deallocate:
pub unsafe fn deallocate(&mut self, owned_ref: OwnedRef<T>) {
    let reference = owned_ref.reference;
    reference.reset();
    let ptr = std::ptr::from_ref(reference);
    let element_index = unsafe { ptr.offset_from(self.data.as_ptr()) };
    self.free.push(element_index as usize);
}
We favor running this logic during deallocation since it is typically less latency-sensitive than allocation. We assume that calling reset is cheap; otherwise, we could buffer it and defer it for later, adding a new API that can be called during idle times.
Since we own this trait, we can implement it for any type we want, including standard ones:
impl Reset for String {
    fn reset(&mut self) {
        self.clear();
    }
}
fn main() {
    let mut pool = Pool::<String>::new();
    let s = pool.allocate().unwrap();
    *s.reference = "Hello, World!".to_owned();
    unsafe { pool.deallocate(s) };
    let s = pool.allocate().unwrap();
    assert_eq!(s.reference.len(), 0);
    assert_eq!(s.reference.capacity(), 13);
}
Depending on your needs, an implementation of reset might be a no-op, a trivial reset (e.g. setting integers to 0), a clearing of a container like String or Vec, or something fancier. In particular, dynamically nested structures that require expensive construction beyond the top level (e.g. Vec<String>) are typically not trivial to handle.
Here is what the toy pool looks like after all the modifications that this article went over:
trait Reset {
    fn reset(&mut self);
}

// We require `T` to be `Default` for convenience, but we could also ask the
// user to provide initial values instead.
struct Pool<T: Default + Reset> {
    // `Vec` is more powerful than what we need and slightly wasteful, but works
    // for our purpose.
    data: Vec<T>,
    free: Vec<usize>,
}

pub struct OwnedRef<'a, T> {
    reference: &'a mut T,
}

impl<T: Default + Reset> Pool<T> {
    pub fn new() -> Self {
        Self {
            data: (0..1024).map(|_| T::default()).collect(),
            free: (0..1024).collect(),
        }
    }

    pub fn allocate<'a>(&mut self) -> Option<OwnedRef<'a, T>> {
        self.free.pop().map(|i| OwnedRef {
            reference: unsafe { self.data.as_mut_ptr().add(i).as_mut().unwrap() },
        })
    }

    pub unsafe fn deallocate(&mut self, owned_ref: OwnedRef<T>) {
        let reference = owned_ref.reference;
        reference.reset();
        let ptr = std::ptr::from_ref(reference);
        let element_index = unsafe { ptr.offset_from(self.data.as_ptr()) };
        self.free.push(element_index as usize);
    }
}

impl Reset for String {
    fn reset(&mut self) {
        self.clear();
    }
}
Note that this is simple enough that it is, in my opinion, not worth having a dedicated crate for it. It is generic, so we don’t need to copy-paste it for each type that we want to pool, but also simple enough that we can adapt it to our needs by making a new type, rather than adding complexity to a single implementation.
The pool implemented for this article is a toy. Here are various improvements one could make to it, some of which I consider almost necessary:
- The deallocate function should have checks to ensure that the reference passed to it is valid, so that we can remove the unsafe from its signature. This can be done at runtime by checking that the reference points to an object within the pool, or possibly at compile time by tagging references with a tag unique to the pool they come from. This would make call sites more cumbersome, but ensure that a reference can only be given back to the pool that created it. I have not yet tried that approach myself.
- OwnedRef should implement Deref and DerefMut: this makes it behave almost like a regular reference for the caller, which greatly improves ergonomics.
- Conversions between OwnedRef and usize can be added to pass references around as handles/user data. This is useful when interacting with APIs such as io_uring, but note that the conversion from usize to OwnedRef is unsafe.
- One could get rid of Vec in favor of something more minimal (a Box<[T; SIZE]> for the objects and a fixed-size stack for the indices), but Vec will likely show up anyway during initialization: boxed objects are initialized on the stack and then moved to the heap in Rust, which can cause stack overflows, and Vec::into_boxed_slice is often used as a workaround.
workaround.[1] I have since had a closer look, and some actually come closer than I initially thought, but none are exactly what I have in mind, and I want to write this article anyway, so here we go anyway. Jump back