About & Projects

About me

Hi! I'm David Miličević, a software engineer specializing in all things low-level.
I used to work at a small hardware startup (not really, but it operated like one) and now work at a SaaS company.

My main area of interest is compilers, specifically tracing just-in-time compilers. I've dabbled in everything from hand-writing assembly and writing AVX-vectorized code to video streaming and game development.
Anything to do with C, C++, Rust and assembly is fair game.

About the blog

The name Bitsdraumar is a reference to Baldrs draumar (Baldur's dreams), an Old Norse poem appearing in the Poetic Edda.
Just as the poem is about the dreams of the god Baldur, the blog's name is about the dreams of a single bit to become part of something more: a nibble, a byte or even something greater.

Most of the content of the blog will be about low-level programming, so expect a lot of C, assembly and compilers.

Projects (WIP - links coming later)

Zonn - An optimizing tracing JIT compiler

This is an optimizing tracing JIT compiler for a simple statically-typed language. The language is relatively minimal, just enough to write a few benchmarks.

The JIT uses the NLF (Natural Loop First) region selection heuristic, bytecode-integrated profile counters for precise, low-overhead profiling, and store sinking via dense per-guard snapshots. It performs both OSR (on-stack replacement) and deoptimization.

Now onto the details:

The parser directly emits a flat register-based bytecode which is then interpreted by an indirect-threaded VM.

The VM has two dispatch tables: one for profiling and one for recording. Profiling runs until a loop hit counter (stored inside the instruction itself) reaches a predefined threshold, indicating that the loop is "hot". The tables are then switched and recording begins. Recording emits the executed bytecode instructions and records guards and branch biases for branching instructions. A few conditions trigger a bailout (and eventually blacklisting): jumping outside of the loop, jumping to a different loop header, encountering an inner loop, or reaching the loop end with an unexpected branch bias.
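To make the table switch concrete, here is a minimal sketch in C. It is not Zonn's actual code: the opcodes, the `Insn` and `VM` layouts, `HOT_THRESHOLD` and all function names are made up for illustration, and it dispatches through plain function pointers rather than true indirect threading. It also skips guards, branch biases and bailouts entirely, so a loop exit during recording simply ends the trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_ADDI, OP_LOOP, OP_HALT, OP_COUNT };
enum { HOT_THRESHOLD = 3, TRACE_MAX = 16 };   /* hypothetical values */

typedef struct {
    uint8_t  op;
    uint8_t  reg;     /* register operand */
    int32_t  imm;     /* ADDI: addend, LOOP: exclusive upper bound */
    uint32_t target;  /* LOOP: index of the loop header */
    uint32_t hits;    /* LOOP: profile counter stored in the instruction */
} Insn;

typedef struct VM VM;
typedef void (*Handler)(VM *vm, Insn *in);

struct VM {
    int64_t regs[4];
    size_t pc;
    int running;
    const Handler *table;        /* currently active dispatch table */
    uint8_t trace[TRACE_MAX];    /* opcodes captured while recording */
    size_t trace_len;
};

static void prof_addi(VM *vm, Insn *in);
static void prof_loop(VM *vm, Insn *in);
static void rec_addi(VM *vm, Insn *in);
static void rec_loop(VM *vm, Insn *in);
static void op_halt(VM *vm, Insn *in);

static const Handler profiling_table[OP_COUNT] = {
    [OP_ADDI] = prof_addi, [OP_LOOP] = prof_loop, [OP_HALT] = op_halt,
};
static const Handler recording_table[OP_COUNT] = {
    [OP_ADDI] = rec_addi, [OP_LOOP] = rec_loop, [OP_HALT] = op_halt,
};

/* Shared branch semantics: returns 1 when the back edge is taken. */
static int take_back_edge(VM *vm, const Insn *in) {
    if (vm->regs[in->reg] < in->imm) { vm->pc = in->target; return 1; }
    vm->pc++;
    return 0;
}

static void record(VM *vm, const Insn *in) {
    assert(vm->trace_len < TRACE_MAX);
    vm->trace[vm->trace_len++] = in->op;
}

static void prof_addi(VM *vm, Insn *in) { vm->regs[in->reg] += in->imm; vm->pc++; }

static void prof_loop(VM *vm, Insn *in) {
    /* Count taken back edges; at the threshold, swap tables to start recording. */
    if (take_back_edge(vm, in) && ++in->hits == HOT_THRESHOLD)
        vm->table = recording_table;
}

static void rec_addi(VM *vm, Insn *in) { record(vm, in); prof_addi(vm, in); }

static void rec_loop(VM *vm, Insn *in) {
    /* Reaching the loop end closes the trace; a real JIT would compile it here. */
    record(vm, in);
    take_back_edge(vm, in);
    vm->table = profiling_table;
}

static void op_halt(VM *vm, Insn *in) { (void)in; vm->running = 0; }

void vm_run(VM *vm, Insn *code) {
    vm->pc = 0;
    vm->trace_len = 0;
    vm->running = 1;
    vm->table = profiling_table;
    while (vm->running) {
        Insn *in = &code[vm->pc];
        vm->table[in->op](vm, in);   /* indirect call through the active table */
    }
}
```

Because the counter lives in the `OP_LOOP` instruction, the profiling handler needs no side table lookup, and the recording handlers stay out of the way until the loop is hot. Running a three-instruction program (add 1 to r0, loop back while r0 < 10, halt) through `vm_run` records exactly one iteration: an `OP_ADDI` followed by an `OP_LOOP`.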

The IR is a region-based pseudo sea-of-nodes SSA representation.
Most of the usual optimizations are here: algebraic reassociation, strength reduction, constant folding, constant propagation, copy propagation, common subexpression elimination, loop inversion, dead code elimination, store sinking, function inlining, on-stack replacement, loop-invariant code motion and linear scan register allocation. (Some of them don't apply in all cases due to implementation time constraints.)
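As a small illustration of two of these passes, here is a sketch of constant folding and strength reduction in C. It works on a plain expression tree, not on the region-based sea-of-nodes IR described above, and the `Node` type and `simplify` function are hypothetical names chosen for the example. Values are unsigned so that arithmetic wraps instead of overflowing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { N_CONST, N_PARAM, N_ADD, N_MUL, N_SHL } NodeKind;

typedef struct Node {
    NodeKind kind;
    uint64_t value;          /* N_CONST: the constant, N_PARAM: parameter index */
    struct Node *lhs, *rhs;  /* operands of N_ADD, N_MUL and N_SHL */
} Node;

static int is_pow2(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

static uint64_t log2_exact(uint64_t v) {
    uint64_t n = 0;
    while (v > 1) { v >>= 1; n++; }
    return n;
}

/* Simplifies an expression tree in place, operands first.
   Assumes nodes are not shared between parents. */
void simplify(Node *n) {
    if (n->kind == N_CONST || n->kind == N_PARAM) return;
    assert(n->lhs != NULL && n->rhs != NULL);
    simplify(n->lhs);
    simplify(n->rhs);

    /* Constant folding: both operands are known, so evaluate now. */
    if (n->lhs->kind == N_CONST && n->rhs->kind == N_CONST) {
        uint64_t a = n->lhs->value, b = n->rhs->value;
        if (n->kind == N_SHL && b >= 64) return;  /* leave undefined shifts alone */
        n->value = n->kind == N_ADD ? a + b
                 : n->kind == N_MUL ? a * b
                 : a << b;
        n->kind = N_CONST;
        n->lhs = n->rhs = NULL;
        return;
    }

    /* Strength reduction: x * 2^k becomes x << k. */
    if (n->kind == N_MUL && n->rhs->kind == N_CONST && is_pow2(n->rhs->value)) {
        n->rhs->value = log2_exact(n->rhs->value);
        n->kind = N_SHL;
    }
}
```

Simplifying operands first matters here: in `x * (2 + 6)` the addition folds to 8 before the multiplication is inspected, which is what lets the multiply turn into `x << 3`.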

Once all of the above is done, the IR is passed through a simple template assembler that uses DynASM for codegen (with some minor peephole optimizations). At the end, the machine code is written to virtual memory, the pages are protected as RX (read and execute), and the original loop header instruction is switched to the JIT version.
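The last step, getting bytes into executable memory, can be sketched with plain POSIX calls. This assumes a POSIX system such as Linux and is not how Zonn or DynASM do it internally; `install_code` is a hypothetical helper name. The pages are writable first and executable only afterwards, so they are never writable and executable at the same time. Some hardened systems may refuse the switch to executable, in which case the helper returns NULL.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copies `len` bytes of machine code into fresh pages, then flips the
   pages from read-write to read-execute. Returns NULL on failure and
   stores the mapped size in `*mapped_len` on success. */
void *install_code(const unsigned char *code, size_t len, size_t *mapped_len) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0) return NULL;

    /* Round the size up to a whole number of pages. */
    size_t psz = (size_t)page;
    size_t size = (len + psz - 1) / psz * psz;

    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return NULL;

    memcpy(mem, code, len);

    /* Drop write permission and add execute permission in one step. */
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, size);
        return NULL;
    }
    *mapped_len = size;
    return mem;
}
```

On x86-64, passing the six bytes `B8 2A 00 00 00 C3` (`mov eax, 42; ret`) and casting the returned pointer to `int (*)(void)` gives a callable function. Those bytes are architecture-specific, so calling them on anything else is not valid. Release the mapping with `munmap` once the code is no longer needed.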

Now, every time the interpreter reaches the new jitted loop header, it switches to the compiled trace. Neat.

Parnasse - Dynamic RISC-V assembler

An attempt at writing a dynamic RISC-V assembler that's currently on pause, waiting for a serious ground-up rewrite.