John Carmack on inlined code and functional programming

Written on 2022-03-26

The following are lightly edited emails and articles from John Carmack, kept here for archival purposes but also because I've noticed that the opposite (harmful) advice has been very popular for years ("Write many more functions! Keep them as short as possible!"). Other people are pushing against it though, like Chandler Carruth in his There are no zero-cost abstractions.

Email on inlining code (2007-03-13)

From: John Carmack <johnc@idsoftware.com>
Date: Tue, Mar 13, 2007 at 4:17 PM
Subject: inlining code

This is going to be an unusual email – I want to talk about coding style. I'm not going to issue any mandates, but I want everyone to seriously consider some of these issues. I'm not talking about petty things like spaces around operators, bracing styles, or pointers by type or variable (although we probably should settle that one), but about larger organization of code. There aren't any silver bullets, but months can be shaved off of a development project with a few percent improvement in productivity.

This email was getting too long, so I am going to follow up with some other thoughts later. I have a lot of general things to discuss, but there is a specific tactical direction that I want to advocate.

A year ago I was involved in a discussion about writing extremely reliable software for aerospace applications, as I have been several times over the years. Typically I am there to rail against the people that talk about using threads and an RTOS for such things, when a simple polled loop that looks like a primitive video game is much more clear and effective. However, this particular discussion brought up a new wrinkle for me:

Indeed, if memory serves (it's been a while since I read about this)…

The fly-by-wire flight software for the Saab Gripen (a lightweight fighter) went a step further. It disallowed both subroutine calls and backward branches, except for the one at the bottom of the main loop. Control flow went forward only. Sometimes one piece of code had to leave a note for a later piece telling it what to do, but this worked out well for testing: all data was allocated statically, and monitoring those variables gave a clear picture of most everything the software was doing. The software did only the bare essentials, and of course, they were serious about thorough ground testing.

No bug has ever been found in the "released for flight" versions of that code.

Henry Spencer

Now, a great deal of stuff that goes on in the aerospace industry should not be emulated by anyone, and is often self destructive. Most of you have probably read various popular articles about the development process that produces the space shuttle software, and while some people might think that the world would be better if all software developers were that "careful", the truth is that we would be decades behind where we are now, with no PC's and no public internet if everything was developed at that snail's pace. This particular anecdote seemed to have some practical value, so I decided to give it a try. The flight control code for the Armadillo rockets is only a few thousand lines of code, so I took the main tic function and started inlining all the subroutines. While I can't say that I found a hidden bug that could have caused a crash (literally…), I did find several variables that were set multiple times, a couple control flow things that looked a bit dodgy, and the final code got smaller and cleaner.

After living with the code in that style for quite some time, I haven't found any drawbacks to it, and I have started applying the general approach to my code at Id. In lots of places there is a choice between organizing code a few ways:

------- style A:

void MinorFunction1( void ) {
}

void MinorFunction2( void ) {
}

void MinorFunction3( void ) {
}

void MajorFunction( void ) {
        MinorFunction1();
        MinorFunction2();
        MinorFunction3();
}

--------- style B:

void MajorFunction( void ) {
        MinorFunction1();
        MinorFunction2();
        MinorFunction3();
}

void MinorFunction1( void ) {
}

void MinorFunction2( void ) {
}

void MinorFunction3( void ) {
}


---------- style C:

void MajorFunction( void ) {
        // MinorFunction1

        // MinorFunction2

        // MinorFunction3 
}

I have historically used "style A" to allow for not prototyping in all cases, although some people prefer "style B". The difference between the two isn’t of any consequence. Michael Abrash used to write code in "style C", and I remember actually taking his code and converting it to "style A" in the interest of perceived readability improvements. At this point, I think there are some definite advantages to "style C", but they are development process oriented, rather than discrete, quantifiable things, and they run counter to a fair amount of accepted conventional wisdom, so I am going to try and make a clear case for it. There isn’t any dogma here, but considering exactly where it is and isn’t appropriate is worthwhile.

In no way, shape, or form am I making a case that avoiding function calls alone directly helps performance.

An exercise that I try to do every once in a while is to "step a frame" in the game, starting at some major point like common->Frame(), game->Frame(), or renderer->EndFrame(), and step into every function to try and walk the complete code coverage. This usually gets rather depressing long before you get to the end of the frame. Awareness of all the code that is actually executing is important, and it is too easy to have very large blocks of code that you just always skip over while debugging, even though they have performance and stability implications.

C++ is no friend of this goal, with operator overloading, implicit constructors, and so on. A lot of things done in the name of flexibility are somewhat misguided, and at the root of a lot of development problems. Games, with the continuously recurring real time tic structure, also have some goals and constraints that encourage a programming style somewhat different than, say, a content creation application or transaction processor.

If something is going to be done once per frame, there is some value to having it happen in the outermost part of the frame loop, rather than buried deep inside some chain of functions that may wind up getting skipped for some reason. For example, our usercmd_t generation code is buried off of asyncServer, and it really should be in the main common loop. Related to this is a topic of hardware versus software design – it is often better to go ahead and do an operation, then choose to inhibit or ignore some or all of the results, than try to conditionally perform the operation. The usercmd_t generation relates to this (and other messes related to the interaction with game over bindsets), where the usercmd_t are only generated when they "need to be".

The way we have traditionally measured performance and optimized our games encouraged a lot of conditional operations – recognizing that a particular operation doesn’t need to be done in some subset of the operating states, and skipping it. This gives better demo timing numbers, but a huge amount of bugs are generated because skipping the expensive operation also usually skips some other state updating that turns out to be needed elsewhere.

We definitely still have tasks that are performance intensive enough to need optimization, but the style gets applied as a matter of course in many cases where a performance benefit is negligible, but we still eat the bugs. Now that we are firmly decided on a 60hz game, worst case performance is more important than average case performance, so highly variable performance should be looked down on even more.

It is very easy for frames of operational latency to creep in when operations are done deeply nested in various subsystems, and things evolve over time. This can lay hidden as a barely perceptible drop in input quality, or it can be blatantly obvious as a model trailing an attachment point during movement. If everything is just run out in a 2000 line function, it is obvious which part happens first, and you can be quite sure that the later section will get executed before the frame is rendered.

Besides awareness of the actual code being executed, inlining functions also has the benefit of not making it possible to call the function from other places. That sounds ridiculous, but there is a point to it. As a codebase grows over years of use, there will be lots of opportunities to take a shortcut and just call a function that does only the work you think needs to be done. There might be a FullUpdate() function that calls PartialUpdateA(), and PartialUpdateB(), but in some particular case you may realize (or think) that you only need to do PartialUpdateB(), and you are being efficient by avoiding the other work. Lots and lots of bugs stem from this. Most bugs are a result of the execution state not being exactly what you think it is.

Strictly functional functions that only read their input arguments and just return a value without examining or modifying any permanent state are safe from these types of errors, and the nice ability to formally speak about them makes them a good ivory tower topic, but very little of our real code falls into this category. I don’t think that purely functional programming writ large is a pragmatic development plan, because it makes for very obscure code and spectacular inefficiencies, but if a function only references a piece or two of global state, it is probably wise to consider passing it in as a variable. It would be kind of nice if C had a "functional" keyword to enforce no global references.

Const parameters and const functions are helpful in avoiding side effect related bugs, but the functions are still susceptible to changes in the global execution environment. Trying to make more parameters and functions const is a good exercise, and often ends in casting it away in frustration at some point. That frustration is usually due to finding all sorts of places that state could be modified that weren’t immediately obvious – places for bugs to breed.

C++ object methods could be thought of as almost-functional, with an implicit overwriting assignment on return, but with larger objects that contain lots of variables you don’t have much awareness of what is modified by the method, and again, there is no guarantee that the function isn’t running off and doing something horribly global, like parsing a decl.

The function that is least likely to cause a problem is one that doesn’t exist, which is the benefit of inlining it. If a function is only called in a single place, the decision is fairly simple.

In almost all cases, code duplication is a greater evil than whatever second order problems arise from functions being called in different circumstances, so I would rarely advocate duplicating code to avoid a function, but in a lot of cases you can still avoid the function by flagging an operation to be performed at the properly controlled time. For instance, having one check in the player think code for health <= 0 && !killed is almost certain to spawn less bugs than having KillPlayer() called in 20 different places.

On the subject of code duplication, I tracked all the bugs (problems that surfaced after I thought everything was working correctly) I fixed for a while, and I was quite surprised at how often copy-paste-modify operations resulted in subtle bugs that weren’t immediately obvious. For small vector operations on three or four things, I would often just past and modify a few characters like this:

v[0] = HF_MANTISSA(*(halfFloat_t *)((byte *)data + i*bytePitch+j*8+0));
v[1] = HF_MANTISSA(*(halfFloat_t *)((byte *)data + i*bytePitch+j*8+1));
v[2] = HF_MANTISSA(*(halfFloat_t *)((byte *)data + i*bytePitch+j*8+2));
v[3] = HF_MANTISSA(*(halfFloat_t *)((byte *)data + i*bytePitch+j*8+3));

I now strongly encourage explicit loops for everything, and hope the compiler unrolls it properly. A significant number of my bugs related to things like this, and I am now rethinking even two dimensional cases where I commonly use discrete _X, _Y or _WIDTH, _HEIGHT variables. I find that much easier to read than a two element array, but it is difficult to argue with my data on how often it made me screw up.

Some practical Matters

Using large comment blocks inside the major function to delimit the minor functions is a good idea for quick scanning, and often enclosing it in a bare braced section to scope the local variables and allow editor collapsing of the section is useful. I know there are some rules of thumb about not making functions larger than a page or two, but I specifically disagree with that now – if a lot of operations are supposed to happen in a sequential fashion, their code should follow sequentially.

Enclosing pages of code inside a conditional or loop statement does have readability and awareness drawbacks, so leaving that code in a separate function may still be justified, but in some cases it is still possible to move the code to another place where its execution wouldn’t be conditional, or just execute it at all times and inhibit the results in some way with a very small conditional block. An execute-and-inhibit style usually takes more absolute time, but it reduces the variability in frame times, and eliminates a class of bugs.

Inlining code quickly runs into conflict with modularity and OOP protections, and good judgment must be applied. The whole point of modularity is to hide details, while I am advocating increased awareness of details. Practical factors like increased multiple checkouts of source files and including more local data in the master precompiled header forcing more full rebuilds do need to be weighed. Currently I am leaning towards using heavyweight objects as the reasonable break point for combining code, and trying to reduce the use of medium sized helper objects, while making any very lightweight objects as purely functional as possible if they have to exist at all.

To sum up:

If a function is only called from a single place, consider inlining it.
If a function is called from multiple places, see if it is possible to arrange for the work to be done in a single place, perhaps with flags, and inline that.
If there are multiple versions of a function, consider making a single function with more, possibly defaulted, parameters.
If the work is close to purely functional, with few references to global state, try to make it completely functional.
Try to use const on both parameters and functions when the function really must be used in multiple places.
Minimize control flow complexity and "area under ifs", favoring consistent execution paths and times over "optimally" avoiding unnecessary work.

Discussion?

John Carmack

Article on functional programming in C++ (2012-04-30)

Probably everyone reading this has heard "functional programming" put forth as something that is supposed to bring benefits to software development, or even heard it touted as a silver bullet. However, a trip to Wikipedia for some more information can be initially off-putting, with early references to lambda calculus and formal systems. It isn't immediately clear what that has to do with writing better software.

My pragmatic summary: A large fraction of the flaws in software development are due to programmers not fully understanding all the possible states their code may execute in. In a multithreaded environment, the lack of understanding and the resulting problems are greatly amplified, almost to the point of panic if you are paying attention.

Programming in a functional style makes the state presented to your code explicit, which makes it much easier to reason about, and, in a completely pure system, makes thread race conditions impossible. I do believe that there is real value in pursuing functional programming, but it would be irresponsible to exhort everyone to abandon their C++ compilers and start coding in Lisp, Haskell, or, to be blunt, any other fringe language. To the eternal chagrin of language designers, there are plenty of externalities that can overwhelm the benefits of a language, and game development has more than most fields. We have cross-platform issues, proprietary tool chains, certification gates, licensed technologies, and stringent performance requirements on top of the issues with legacy codebases and workforce availability that everyone faces. If you are in circumstances where you can undertake significant development work in a non-mainstream language, I'll cheer you on, but be prepared to take some hits in the name of progress. For everyone else: No matter what language you work in, programming in a functional style provides benefits. You should do it whenever it is convenient, and you should think hard about the decision when it isn't convenient. You can learn about lambdas, monads, currying, composing lazily evaluated functions on infinite sets, and all the other aspects of explicitly functionally oriented languages later if you choose. C++ doesn't encourage functional programming, but it doesn't prevent you from doing it, and you retain the power to drop down and apply SIMD intrinsics to hand laid out data backed by memory mapped files, or whatever other nitty-gritty goodness you find the need for.

Pure Functions

A pure function only looks at the parameters passed in to it, and all it does is return one or more computed values based on the parameters. It has no logical side effects. This is an abstraction of course; every function has side effects at the CPU level, and most at the heap level, but the abstraction is still valuable. It doesn't look at or update global state. it doesn't maintain internal state. It doesn't perform any IO. it doesn't mutate any of the input parameters. Ideally, it isn't passed any extraneous data – getting an allMyGlobals pointer passed in defeats much of the purpose. Pure functions have a lot of nice properties.

Thread safety

A pure function with value parameters is completely thread safe. With reference or pointer parameters, even if they are const, you do need to be aware of the danger that another thread doing non-pure operations might mutate or free the data, but it is still one of the most powerful tools for writing safe multithreaded code. You can trivially switch them out for parallel implementations, or run multiple implementations to compare the results. This makes it much safer to experiment and evolve.

Reusability

It is much easier to transplant a pure function to a new environment. You still need to deal with type definitions and any called pure functions, but there is no snowball effect. How many times have you known there was some code that does what you need in another system, but extricating it from all of its environmental assumptions was more work than just writing it over?

Testability

A pure function has referential transparency, which means that it will always give the same result for a set of parameters no matter when it is called, which makes it much easier to exercise than something interwoven with other systems. I have never been very responsible about writing test code; a lot of code interacts with enough systems that it can require elaborate harnesses to exercise, and I could often convince myself (probably incorrectly) that it wasn't worth the effort. Pure functions are trivial to test; the tests look like something right out of a textbook, where you build some inputs and look at the output. Whenever I come across a finicky looking bit of code now, I split it out into a separate pure function and write tests for it. Frighteningly, I often find something wrong in these cases, which means I'm probably not casting a wide enough net.

Understandability and maintainability

The bounding of both input and output makes pure functions easier to re-learn when needed, and there are less places for undocumented requirements regarding external state to hide. Formal systems and automated reasoning about software will be increasingly important in the future. Static code analysis is important today, and transforming your code into a more functional style aids analysis tools, or at least lets the faster local tools cover the same ground as the slower and more expensive global tools. We are a "Get 'er done" sort of industry, and I do not see formal proofs of whole program "correctness" becoming a relevant goal, but being able to prove that certain classes of flaws are not present in certain parts of a codebase will still be very valuable. We could use some more science and math in our process. Someone taking an introductory programming class might be scratching their head and thinking "aren't all programs supposed to be written like this?" The reality is that far more programs are Big Balls of Mud than not. Traditional imperative programming languages give you escape hatches, and they get used all the time. If you are just writing throwaway code, do whatever is most convenient, which often involves global state. If you are writing code that may still be in use a year later, balance the convenience factor against the difficulties you will inevitably suffer later. Most developers are not very good at predicting the future time integrated suffering their changes will result in.

Purity In Practice

Not everything can be pure; unless the program is only operating on its own source code, at some point you need to interact with the outside world. It can be fun in a puzzly sort of way to try to push purity to great lengths, but the pragmatic break point acknowledges that side effects are necessary at some point, and manages them effectively. It doesn't even have to be all-or-nothing in a particular function. There is a continuum of value in how pure a function is, and the value step from almost-pure to completely-pure is smaller than that from spaghetti-state to mostly-pure. Moving a function towards purity improves the code, even if it doesn't reach full purity. A function that bumps a global counter or checks a global debug flag is not pure, but if that is its only detraction, it is still going to reap most of the benefits. Avoiding the worst in a broader context is generally more important than achieving perfection in limited cases. If you consider the most toxic functions or systems you have had to deal with, the ones that you know have to be handled with tongs and a face shield, it is an almost sure bet that they have a complex web of state and assumptions that their behavior relies on, and it isn't confined to their parameters. Imposing some discipline in these areas, or at least fighting to prevent more code from turning into similar messes, is going to have more impact than tightening up some low-level math functions.

The process of refactoring towards purity generally involves disentangling computation from the environment it operates in, which almost invariably means more parameter passing. This seems a bit curious – greater verbosity in programming languages is broadly reviled, and functional programming is often associated with code size reduction. The factors that allow programs in functional languages to sometimes be more concise than imperative implementations are pretty much orthogonal to the use of pure functions — garbage collection, powerful built-in types, pattern matching, list comprehensions, function composition, various bits of syntactic sugar, etc. For the most part, these size reducers don't have much to do with being functional, and can also be found in some imperative languages. You should be getting irritated if you have to pass a dozen parameters into a function; you may be able to refactor the code in a manner that reduces the parameter complexity.

The lack of any language support in C++ for maintaining purity is not ideal. If someone modifies a widely used foundation function to be non-pure in some evil way, everything that uses the function also loses its purity. This sounds disastrous from a formal systems point of view, but again, it isn't an all-or-nothing proposition where you fall from grace with the first sin. Large scale software development is unfortunately statistical. It seems like there is a sound case for a pure keyword in future C/C++ standards. There are close parallels with const – an optional qualifier that allows compile time checking of programmer intention and will never hurt, and could often help, code generation. The D programming language does offer a pure keyword. Note their distinction between weak and strong purity – you need to also have const input references and pointers to be strongly pure. In some ways, a language keyword is over-restrictive — a function can still be pure even if it calls impure functions, as long as the side effects don't escape the outer function. Entire programs can be considered pure functional units if they only deal with command line parameters instead of random file system state.

Object Oriented Programming

OO makes code understandable by encapsulating moving parts. - Michael Feathers @mfeathers

FP makes code understandable by minimizing moving parts. The "moving parts" are mutating states. Telling an object to change itself is lesson one in a basic object-oriented programming book, and it is deeply ingrained in most programmers, but it is anti-functional behavior. Clearly there is some value in the basic OOP idea of grouping functions with the data structures they operate on, but if you want to reap the benefits of functional programming in parts of your code, you have to back away from some object-oriented behaviors in those areas. Class methods that can't be const are not pure by definition, because they mutate some or all of the potentially large set of state in the object. They are not thread safe, and the ability to incrementally poke and prod objects into unexpected states is indeed a significant source of bugs. const object methods can still be technically pure if you don't count the implicit const this pointer against them, but many object are large enough to constitute a sort of global state all their own, blunting some of the clarity benefits of pure functions. Constructors can be pure functions, and generally should strive to be – they take arguments and return an object. At the tactical programming level, you can often work with objects in a more functional manner, but it may require changing the interfaces a bit. At id we went over a decade with an idVec3 class that had a self-mutating void Normalize() method, but no corresponding idVec3 Normalized() const method. Many string methods were similarly defined as working on themselves, rather than returning a new copy with the operation performed on it – ToLowerCase(), StripFileExtension(), etc.

Performance Implications

In almost all cases, directly mutating blocks of memory is the speed-of-light optimal case, and avoiding this is spending some performance. Most of the time this is of only theoretical interest; we trade performance for productivity all the time. Programming with pure functions will involve more copying of data, and in some cases this clearly makes it the incorrect implementation strategy due to performance considerations. As an extreme example, you can write a pure DrawTriangle() function that takes a framebuffer as a parameter and returns a completely new framebuffer with the triangle drawn into it as a result. Don't do that. Returning everything by value is the natural functional programming style, but relying on compilers to always perform return value optimization can be hazardous to performance, so passing reference parameter for output of complex data structures is often justifiable, but it has the unfortunate effect of preventing you from declaring the returned value as const to enforce single assignment. There will be a strong urge in many cases to just update a value in a complex structure passed in rather than making a copy of it and returning the modified version, but doing so throws away the thread safety guarantee and should not be done lightly. List generation is often a case where it is justified. The pure functional way to append something to a list is to return a completely new copy of the list with the new element at the end, leaving the original list unchanged. Actual functional languages are implemented in ways that make this not as disastrous as it sounds, but if you do this with typical C++ containers you will die. A significant mitigating factor is that performance today means parallel programming, which usually requires more copying and combining than in a single threaded environment even in the optimal performance case, so the penalty is smaller, while the complexity reduction and correctness benefits are correspondingly larger. When you start thinking about running, say, all the characters in a game world in parallel, it starts sinking in that the object-oriented approach of updating objects has some deep difficulties in parallel environments. Maybe if all of the object just referenced a read only version of the world state, and we copied over the updated version at the end of the frame… Hey, wait a minute…

Action Items

Survey some non-trivial functions in your codebase and track down every bit of external state they can reach, and all possible modifications they can make. This makes great documentation to stick in a comment block, even if you don't do anything with it. If the function can trigger, say, a screen update through your render system, you can just throw your hands up in the air and declare the set of all effects beyond human understanding. The next task you undertake, try from the beginning to think about it in terms of the real computation that is going on. Gather up your input, pass it to a pure function, then take the results and do something with it. As you are debugging code, make yourself more aware of the part mutating state and hidden parameters play in obscuring what is going on. Modify some of your utility object code to return new copies instead of self-mutating, and try throwing const in front of practically every non-iterator variable you use.

Additional references:

Email comment about the previous article (2014-09-2014)

In the years since I wrote this, I have gotten much more bullish about pure functional programming, even in C/C++ where reasonable:

The real enemy addressed by inlining is unexpected dependency and mutation of state, which functional programming solves more directly and completely. However, if you are going to make a lot of state changes, having them all happen inline does have advantages; you should be made constantly aware of the full horror of what you are doing. When it gets to be too much to take, figure out how to factor blocks out into pure functions (and don't let them slide back into impurity!).

As an additional confirmation of the point of the article, a couple years later when I was working on the Doom 3 BFG edition release, the exactly predicted off-by-one-frame-of-latency input sampling happened, and very nearly shipped. That was a cold-sweat moment for me: after all of my harping about latency and responsiveness, I almost shipped a title with a completely unnecessary frame of latency.

To make things more complicated, the do always, then inhibit or ignore strategy, while a very good idea for high reliability systems, is less appropriate in power and thermal constrained environments like mobile.

John Carmack