Infinite memory is my name for a set of tricks based on huge virtual memory allocations backing single data structures.
This can have several benefits, which this article explores through patterns and data structures for working with infinite memory.
Virtual arrays are a common way of working with large arrays that need a fixed location in memory. The idea is to allocate a huge array upfront, but rely on the virtual memory system to map only the pages that are actually touched to physical memory. A decent explanation is given here [1]. Many codebases have an implementation of that data structure, so you might have encountered it before, perhaps under a different name.
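On Linux, for instance, such a reservation could look like the following sketch (MAX_ELEMS and reserve_virtual_array are illustrative names, not from any particular codebase; on Windows the equivalent would be VirtualAlloc with MEM_RESERVE, committing pages separately):

```cpp
#include <sys/mman.h>
#include <cstddef>

// Reserve room for far more elements than we ever expect to use; the kernel
// only backs a page with physical memory once it is first touched.
constexpr std::size_t MAX_ELEMS = std::size_t(1) << 32;

template <typename T>
T* reserve_virtual_array() {
    void* base = mmap(nullptr, MAX_ELEMS * sizeof(T),
                      PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                      /*fd=*/-1, /*offset=*/0);
    return base == MAP_FAILED ? nullptr : static_cast<T*>(base);
}
```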
Just like regular arrays easily extend to SoA (structure of arrays), virtual arrays extend to virtual SoA. However, the virtual memory consumption goes up very quickly, as each member gets its own big allocation.
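A sketch of what that can look like, reusing the hypothetical reserve_virtual_array helper above (ParticlesSoA is an illustrative example, not taken from the article):

```cpp
// Virtual SoA sketch: every member gets its own huge, sparsely-backed array,
// so the reserved address space grows with the number of members.
struct ParticlesSoA {
    float* pos_x = reserve_virtual_array<float>();
    float* pos_y = reserve_virtual_array<float>();
    float* vel_x = reserve_virtual_array<float>();
    float* vel_y = reserve_virtual_array<float>();
};
```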
This technique is similar to the data layout currently used in the self-hosted Zig compiler, which Andrew Kelley presented at Handmade Seattle 2021.
It is simply an SoA where one of the arrays contains tags and the others contain untyped data (unless some fields are common to all types). Each element is identified by its index, and (some of) the data arrays are interpreted based on the type stored in the tags array.
As an example, consider 4 data types: A and B are each 32 bits wide, while C and D are each 64 bits wide. Our SoA can then look something like:

```cpp
enum Tag { A, B, C, D };

struct SoA {
    Tag tags[SIZE];
    u32 data1[SIZE];
    u32 data2[SIZE];
};
```

This layout obviously assumes that neither C nor D is a single 64 bit value, as it would be inefficient and cumbersome to constantly split them when writing and recompose them when reading. Documenting the Tag enum greatly helps in knowing how much data each variant uses and how to interpret the data arrays.
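To make the interpretation step concrete, here is a sketch of raw accessors over that struct (CFields, read_c, read_a and the per-type field meanings are assumptions made up for illustration):

```cpp
#include <cassert>

// Hypothetical raw accessors over the SoA above. The meaning of data1/data2
// depends entirely on the tag: assume a C is made of two independent 32-bit
// fields, while an A is a single 32-bit value that only uses data1.
struct CFields { u32 first; u32 second; };

CFields read_c(const SoA& soa, u32 index) {
    assert(soa.tags[index] == C);
    return CFields{ soa.data1[index], soa.data2[index] };
}

u32 read_a(const SoA& soa, u32 index) {
    assert(soa.tags[index] == A);   // an A only uses data1
    return soa.data1[index];
}
```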
The ergonomics are not too bad, but one has to be careful, especially when refactoring, due to the lack of type safety. They can however be improved by returning a proxy object once the type is known, although this isn't perfect.
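A minimal sketch of such a proxy, under the same assumptions as above (CProxy and as_c are illustrative names, not the article's code):

```cpp
// Hypothetical proxy, handed out once the tag is known to be C: it names the
// columns for that type without copying anything.
struct CProxy {
    SoA* soa;
    u32  index;

    u32& first()  { return soa->data1[index]; }
    u32& second() { return soa->data2[index]; }
};

CProxy as_c(SoA& soa, u32 index) {
    assert(soa.tags[index] == C);
    return CProxy{ &soa, index };
}
```

This is still not type safe (nothing stops code from keeping a CProxy around after the element is repurposed), but it concentrates the interpretation of the data arrays in one place.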
A big advantage of this approach is that there is only a single SoA for all types, so a single index fully identifies an object; it does not need to be accompanied by a tag, unlike C++'s std::variant or Rust's enums. In other words, the type information is stored in the referent rather than the referrer. This may or may not be desirable depending on the access pattern, but it enables better memory locality if/when there is no need to access the tag, since the tag array does not have to be touched. This is particularly useful in the context of the next trick.
Because the backing arrays can be huge, it is possible to partition them ahead of time and still have plenty of room in each partition to store all the data handled by the program. The partitioning criterion can be anything, such as data size, domain properties or type.
Partitioning by size can maximize cache efficiency. In the previous example, an A or a B makes no use of the data2 array, so when all objects are stored in the same pool, half of the data fetched from data2 would be useless (and therefore wasted) on average, assuming an equal representation of all types. By partitioning by data type size, As and Bs would then be stored at the lower SIZE / 2 indices, while Cs and Ds would be stored in the upper indices. With such a scheme, any data fetched from data2 would be useful, as part of either a C or a D. All data fetched from data1 would also be useful, as all types use data1.
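A sketch of the bookkeeping such a split could use (the Partitions struct and allocate_index are hypothetical):

```cpp
// Hypothetical size-based partitioning: 32-bit types (A, B) take indices in
// [0, SIZE / 2), 64-bit types (C, D) take indices in [SIZE / 2, SIZE).
struct Partitions {
    u32 next_small = 0;          // next free index for As and Bs
    u32 next_large = SIZE / 2;   // next free index for Cs and Ds
};

u32 allocate_index(Partitions& partitions, Tag tag) {
    bool is_small = (tag == A || tag == B);
    u32 index = is_small ? partitions.next_small++ : partitions.next_large++;
    assert(is_small ? index < SIZE / 2 : index < SIZE);
    return index;
}
```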
Note that it is easy to hit diminishing returns: each new partition adds cache pressure, since another segment of each array it uses (the tag array and some of the data arrays) has to be loaded into cache. Having a few partitions is fine though, as modern caches can fit quite a few lines.
Partitioning by domain properties enables ECS-like processing. For example in a compiler, type declarations (structs, enums, unions, etc.) could be stored in the same partition, so that operations over type declarations could be performed efficiently, without the need to iterate over the whole AST/IR.
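For instance, a pass over type declarations could be a plain loop over that partition's index range (TYPE_DECLS_BEGIN, TYPE_DECLS_END and process_type_decl are hypothetical names for this sketch):

```cpp
// Hypothetical: all type declarations occupy one contiguous index range, so a
// pass over them never touches the rest of the AST/IR.
void for_each_type_decl(SoA& soa) {
    for (u32 i = TYPE_DECLS_BEGIN; i < TYPE_DECLS_END; ++i) {
        process_type_decl(soa, i);   // hypothetical per-declaration work
    }
}
```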
This approach can be useful, but it typically does not scale as well as a dedicated ECS framework (or a custom-tailored alternative), since it favors a certain set of queries over the data: the ones that determine the partitioning criterion.
Partitioning by type gives each type its own partition. This is an extreme strategy that can quickly hit the diminishing returns outlined previously. On the other hand, it enables removing the tag array altogether, as the type can be computed from the index itself.
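If the partitions are equally sized, the mapping from index to type can be a one-liner; a sketch assuming four equal per-type partitions (PARTITION_SIZE and tag_from_index are illustrative):

```cpp
// Hypothetical layout: four equally-sized per-type partitions, so the tag is
// implied by which quarter of the index space an index falls into.
constexpr u32 PARTITION_SIZE = SIZE / 4;

Tag tag_from_index(u32 index) {
    return Tag(index / PARTITION_SIZE);   // 0 -> A, 1 -> B, 2 -> C, 3 -> D
}
```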
Surprisingly, infinite memory is actually rather finite. Physical memory is not the issue: an "infinite memory program" will allocate vast amounts of memory, but it should not page in any more memory from RAM than a traditional program [2]. Rather, while 64 bit systems can theoretically address 2^64 addresses, virtual address spaces are typically much smaller than that. On Linux, only 2^48 bytes are addressable, which corresponds to 256 TiB of addressable memory. While this remains a terrific amount of memory, reserving 4 billion elements per struct member adds up quickly as both the number of types and the size of their components grow.
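To put a rough, back-of-the-envelope number on it (the figures are assumptions for illustration): reserving 2^32 elements of 4 bytes is a 16 GiB reservation per array, so a tagged SoA with a dozen member arrays already reserves close to 200 GiB, and on the order of a thousand such structures would exhaust a 256 TiB address space.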
The memory layout also has an impact on the way objects are manipulated. For example, modern approaches, like primarily storing objects as individual entries in a hash map, are not directly applicable.
Lastly, those techniques are only applicable on modern, mostly non-embedded systems, since they require a virtual memory system.
[1] I originally had a different/better source in mind, but could not find it anymore. You might stumble upon it if you research that topic though!
[2] It might in fact use less memory due to lower fragmentation and padding, although every end of segment might page in almost a page of overhead.