
Working with infinite memory

Written on 2022-11-05

Infinite memory is what I like to call a set of tricks based on huge virtual memory allocations for single data structures.

This can have several benefits:

- stable addresses: elements keep a fixed location in memory
- no reallocation or copying when the data grows

at the expense of:

- heavy virtual address space consumption
- a hard dependency on the virtual memory system

This article presents patterns and data structures for working with infinite memory.

Virtual array

Virtual arrays are a typical way of working with large arrays so that they have a fixed location in memory. The idea is to allocate a huge array upfront, but have the virtual memory system only map to physical memory the pages that are actually touched. A decent explanation is given here [1]. Many codebases have an implementation of that data structure, so you might have encountered it before, perhaps under a different name.
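
As a rough sketch, here is what a virtual array could look like on Linux (the VirtualArray name and interface are mine, not taken from [1]): the whole range is reserved with a single mmap call, and physical pages are only faulted in on first touch.

#include <sys/mman.h>
#include <cstddef>

// Reserve a huge range up front; resident memory stays proportional to the
// pages that are actually written, thanks to demand paging.
template <typename T>
struct VirtualArray {
    T *data = nullptr;
    size_t capacity = 0; // in elements

    bool init(size_t max_elements) {
        void *p = mmap(nullptr, max_elements * sizeof(T),
                       PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) return false;
        data = static_cast<T *>(p);
        capacity = max_elements;
        return true;
    }
};

On Windows, VirtualAlloc with MEM_RESERVE (plus MEM_COMMIT on demand) plays the same role.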

SoA

Just as regular arrays extend easily to SoA, infinite memory scales well from virtual arrays to virtual SoA. However, the address space consumption goes up very quickly, as each member gets its own huge allocation.
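
Reusing the VirtualArray sketch from above, a virtual SoA is then simply one huge reservation per member (the Particles layout here is a made-up example, assuming a u32 typedef):

struct Particles {
    VirtualArray<float> x, y, z;
    VirtualArray<u32> flags;

    // Each member reserves max_elements worth of address space on its own.
    bool init(size_t max_elements) {
        return x.init(max_elements) && y.init(max_elements) &&
               z.init(max_elements) && flags.init(max_elements);
    }
};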

Polymorphic SoA

This technique is similar to the data layout currently used in the self-hosted Zig compiler, which Andrew Kelley presented at Handmade Seattle 2021.

It is simply an SoA where one of the arrays contains tags and the others contain untyped data (unless some fields are common to all types). Each element is identified by its index, and (some of) the data arrays are interpreted based on the type stored in the tags array.

As an example, consider 4 data types: A and B are each 32 bits wide while C and D are each 64 bits wide. Our SoA can then look something like:

enum Tag { A, B, C, D };

struct SoA {
    Tag tags[SIZE];  // which type lives at each index
    u32 data1[SIZE]; // first 32-bit field, used by every type
    u32 data2[SIZE]; // second 32-bit field, used only by C and D
};

This layout obviously assumes that neither C nor D is a single 64 bit value, as it would be inefficient and cumbersome to constantly split it when writing and recompose it when reading. Documenting the Tag enum greatly helps in knowing how much data each type uses and how to interpret the data arrays.

The ergonomics are not too bad, but one has to be careful, especially when refactoring, due to the lack of type safety. They can however be improved by returning a proxy object once the type is known, although this isn't perfect.
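
A minimal sketch of such a proxy, assuming the SoA above and a u32 typedef (the names are hypothetical):

#include <cassert>

// Proxy for an element already known to be of type C: it captures the SoA
// and the index, and exposes typed accessors for C's two 32-bit fields.
struct CProxy {
    SoA &soa;
    u32 index;

    u32 &first()  { return soa.data1[index]; }
    u32 &second() { return soa.data2[index]; }
};

CProxy as_c(SoA &soa, u32 index) {
    assert(soa.tags[index] == C); // the tag must have been checked already
    return CProxy{soa, index};
}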

A big advantage of this approach is that there is only a single SoA for all types, so a single index fully identifies an object; it does not need to be accompanied by a tag, unlike C++'s std::variant or Rust's enums. In other words, the type information is stored in the referent rather than the referrer. This may or may not be desirable depending on the access pattern, but it enables better memory locality if/when there is no need to access the tag, since the tag array does not have to be touched. This is particularly useful in the context of the next trick.

Partitioned polymorphic SoA

Because the backing arrays can be huge, it is possible to partition them ahead of time and still have plenty of room in each partition to store all the data handled by the program. The partitioning criterion can be anything, such as data size, domain properties or type.

Partitioning by size

Partitioning by size can maximize cache efficiency. In the previous example, an A or a B makes no use of the data2 array, so with all objects stored in the same pool, half of the cache lines fetched from data2 would be useless (and therefore wasted) on average, assuming an equal representation of all types. By partitioning by data type size, As and Bs would be stored at the lower SIZE / 2 indices, while Cs and Ds would be stored at the upper indices. With such a scheme, any data fetched from data2 would be useful, as part of either a C or a D, and all data fetched from data1 would remain useful, as every type uses data1.
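
As a sketch, the split can be expressed as fixed index ranges over the SoA from earlier (the names and the even 50/50 split are hypothetical):

constexpr u32 SMALL_BASE = 0;        // A and B live in [0, SIZE / 2)
constexpr u32 LARGE_BASE = SIZE / 2; // C and D live in [SIZE / 2, SIZE)

// Whether an element has a second 32-bit field, derived from its index alone.
bool uses_data2(u32 index) {
    return index >= LARGE_BASE;
}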

Note that it is easy to hit diminishing returns: each new partition adds cache pressure, since another segment of each of the arrays that it uses (tag and some data) has to be loaded in cache. Having a few partitions is fine though, as modern caches can fit quite a few lines.

Partitioning by domain properties

Partitioning by domain properties enables ECS-like processing. For example, in a compiler, type declarations (structs, enums, unions, etc.) could be stored in the same partition, so that operations over type declarations could be performed efficiently, without the need to iterate over the whole AST/IR.

This approach can be useful, but it typically does not scale as well as a dedicated ECS framework (or a custom-tailored alternative), since it favors a certain set of queries over the data: the ones that determine the partitioning criterion.

Partitioning by type

Each type gets its own partition. This is an extreme strategy that can quickly hit the diminishing returns outlined previously. On the other hand, it enables removing the tag array altogether, as the type can be computed from the index itself.
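
For instance, with one equally sized partition per type, in declaration order, a small helper can recover the tag from the index alone, and the tags array can be dropped (a sketch based on the earlier example):

constexpr u32 PARTITION_SIZE = SIZE / 4; // one partition per type

Tag tag_of(u32 index) {
    return static_cast<Tag>(index / PARTITION_SIZE);
}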

Limitations

Surprisingly, infinite memory is actually rather finite. Physical memory is not the issue: an "infinite memory program" will allocate vast amounts of memory, but it should not page in any more RAM than a traditional program [2]. Rather, while 64 bit systems can theoretically address 2^64 addresses, virtual address spaces are typically much smaller than that. On Linux, only 2^48 bytes are addressable, which corresponds to 256 TiB of addressable memory. While this remains a terrific amount of memory, allocating 4 billion elements per struct member adds up quickly with both the number of types and the size of their components, and can itself consume that terrific amount.
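
To make the arithmetic concrete: 2^32 elements of a 4 byte member already reserve 2^34 bytes (16 GiB) of address space, so a 2^48 byte address space tops out at 2^14 = 16384 such member arrays in total.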

The memory layout also has an impact on the way that objects are manipulated. For example, modern approaches, like primarily storing objects as individual elements in a hash map, are not directly applicable.

Lastly, those techniques are only applicable on modern, mostly non-embedded systems, since they require a virtual memory system.


[1] I originally had a different/better source in mind, but could not find it anymore. You might stumble upon it if you research that topic though!

[2] It might in fact use less memory due to lower fragmentation and padding, although every end of segment might page in almost a page of overhead.