I've recently been working a little bit on a compiler, which has made me ask
myself the following question: just how slow is it compared to mainstream
languages? Not only does my compiler perform no optimizations yet (no inlining,
loop unrolling, constant folding, strength reduction, dead code elimination,
nothing!), it generates very naïve assembly. Because it has no register
allocation strategy, it stores everything on the stack and only uses `rax` (and `rbx` when using binary instructions). This leads to comically bad assembly, such as:
```
movq $1, %rax
movq %rax, %rbx
```
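To give an idea of where that comes from, code generation for a binary expression boils down to something like the sketch below. This is not the compiler's actual code (`Expr` and `emit_expr` are just stand-ins for its AST and expression codegen), but it illustrates the scheme: evaluate the left operand, spill it to the stack, evaluate the right operand, then reload and combine using only `rax` and `rbx`.

```c
#include <stdio.h>

typedef struct Expr Expr;               /* stand-in for the compiler's AST node type        */
void emit_expr(FILE *out, Expr *expr);  /* assumed to emit code leaving its result in %rax  */

/* Sketch of the stack-based scheme for a binary operation. */
void emit_binary_op(FILE *out, Expr *lhs, Expr *rhs, const char *op)
{
    emit_expr(out, lhs);                        /* left operand: result in %rax      */
    fprintf(out, "    push %%rax\n");           /* spill it to the stack             */
    emit_expr(out, rhs);                        /* right operand: result in %rax     */
    fprintf(out, "    movq %%rax, %%rbx\n");    /* move it into %rbx                 */
    fprintf(out, "    pop %%rax\n");            /* reload the left operand into %rax */
    fprintf(out, "    %s %%rbx, %%rax\n", op);  /* e.g. addq or subq                 */
}
```

Every intermediate value takes a round trip through memory, which is exactly why the output looks the way it does.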
Anyways, I was curious to know just how slow the code my compiler outputs is compared to mainstream languages. Obviously, it would be slower than the equivalent C code, and presumably even slower than Python, but by how much? I wanted to get a feel for this, but also for how close I could realistically get with basic optimizations.
It's common knowledge that you can't compete with LLVM/GCC in terms of code generation (at least not as a one-man effort), but also that all mainstream languages benefit from tons of performance work done by lots of experts. My intuition was therefore that if I could get within an order of magnitude of their speed after implementing optimizations, I should be quite satisfied. I estimate that, on the scale of a whole program, not generating ridiculously bad assembly plus some basic optimizations should be able to buy me an order of magnitude in performance (maybe too optimistic, but it's really bad right now). Being an order of magnitude away from mainstream languages wouldn't be too bad: I don't intend to use my language for performance-critical work, and I can drop down to assembly if needed.
So I've decided to go ahead and compare the same program, written in my language, in Python and in C. I've gone for a naïve Fibonacci function (the recursive one, which doesn't use dynamic programming) and asked it for the 40th number. For reference, here's the (x64) assembly that my compiler generated for it:
```
fibo:
    push %rbp
    movq %rsp, %rbp
    movq 16(%rbp), %rax
    push %rax
    movq $0, %rax
    pop %rbx
    cmpq %rax, %rbx
    movq $0, %rax
    sete %al
    push %rax
    movq 16(%rbp), %rax
    push %rax
    movq $1, %rax
    pop %rbx
    cmpq %rax, %rbx
    movq $0, %rax
    sete %al
    pop %rbx
    or %rbx, %rax
    cmpq $0, %rax
    je else_0
    movq 16(%rbp), %rax
    jmp end_0
else_0:
    movq 16(%rbp), %rax
    push %rax
    movq $1, %rax
    movq %rax, %rbx
    pop %rax
    subq %rbx, %rax
    push %rax
    call fibo
    addq $8, %rsp
    push %rax
    movq 16(%rbp), %rax
    push %rax
    movq $2, %rax
    movq %rax, %rbx
    pop %rax
    subq %rbx, %rax
    push %rax
    call fibo
    addq $8, %rsp
    pop %rbx
    addq %rbx, %rax
end_0:
    movq %rbp, %rsp
    pop %rbp
    ret
```
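The C and Python programs aren't reproduced here, but they are the direct equivalent of that function; f.c is essentially the following (give or take how the result is printed):

```c
#include <stdio.h>

/* Naïve recursive Fibonacci: no memoization, exponential number of calls. */
long fibo(long n)
{
    if (n == 0 || n == 1)
        return n;
    return fibo(n - 1) + fibo(n - 2);
}

int main(void)
{
    printf("%ld\n", fibo(40));
    return 0;
}
```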
Without further ado, here are the results:
```
~ > compiler > out.s && gcc out.s && time ./a.out

real    0m0.925s
user    0m0.925s
sys     0m0.000s

~ > gcc -O0 f.c && time ./a.out

real    0m0.575s
user    0m0.575s
sys     0m0.000s

~ > gcc -O1 f.c && time ./a.out

real    0m0.394s
user    0m0.394s
sys     0m0.000s

~ > gcc -O2 f.c && time ./a.out

real    0m0.149s
user    0m0.149s
sys     0m0.000s

~ > gcc -O3 f.c && time ./a.out

real    0m0.137s
user    0m0.137s
sys     0m0.000s

~ > time python3 f.py

real    0m19.200s
user    0m19.191s
sys     0m0.008s
```

My binary is within a factor of two of GCC's `-O0` output (which is very slow), within an order of magnitude of `-O3`, and... almost 20 times faster than Python?! This shocked me quite a bit when I first ran the benchmarks, since I expected to score lower than even Python.
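To put numbers on that: 0.925 s versus 0.575 s for `-O0` is a factor of about 1.6, versus 0.137 s for `-O3` a factor of about 6.8, and 19.2 s versus 0.925 s a factor of about 20.8.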
My conclusion is twofold, but before getting to it, let me address a few objections I can already anticipate:
**This is a cherry-picked benchmark!** No, I've just written the first simple, long-running function that came to my mind.
**This is not a realistic benchmark, and it disproportionately disadvantages Python!** Sure, but my assembly is also really terrible. My point is that I still expected to be slower than Python. I didn't know by how much, and with hindsight it makes sense that I would be faster, but I still would not have expected a 20x gap.
**Even if your assembly is naïve, it's low-level, and all the data will likely fit in cache!** Well, first, the second point also applies to Python. As for the first point, I know that CPython doesn't JIT, but even then, I cannot imagine what causes the overhead to be that big. Either way, I'm now convinced that it is relatively easy to outperform the decades of performance work that went into CPython.
I will be happy to perform some more realistic benchmarks later, but my compiler is not ready for that yet :^)