June 28, 2011

Generating Shaders on the Fly: Stitching

Some time ago, back in 2009, we somehow managed to win the 4k intro competition at the annual Assembly Summer event. What we released was Muon Baryon. A making-of may follow later, for those who are interested; but for now I would like to concentrate not on the intro itself but on a technique that arose from its final version.

You can download the sources for this method at the bottom of the post.

New challenges

When Jake asked me to code a final version of the intro, I also wanted to make sure it ran at an acceptable speed. For this reason I tried out a few tricks and some run-of-the-mill techniques. They did help a little, but none of them turned out to be a silver bullet. So I conducted some experiments to figure out what was making it this slow. It turned out that GPUs back then didn’t like if-then-else statements (d’oh, it’s a stream processor). On top of that, the vertex shader, which was responsible for controlling the position and the orientation of the camera, was made almost entirely of such conditionals.

So, I decided to port it to the CPU, which is on really great terms with branching. However, the situation was quite dreadful; there was a lot to do. What made it even worse was that this shader was essentially setting up a state machine through a bunch of varying variables, which were in turn read by the fragment shader to render the final scene. Now, let’s pause here for a second. What you just read means that for every possible combination of these varying variables passed to the fragment shader, the fragment shader has to change its behaviour radically.

For unrestricted software, packing this much functionality into a single unit would be an enormous design mistake. But within a budget of four kilobytes it becomes very much huggable and loveable. So the actual task was to generate a permutation of the fragment shader for each one of the if-then-else blocks that make up the intro’s timeline. Surely, this sounds like an impossible task! And indeed, I thought I had bitten off more than I could chew; however, after a few minutes of hacking I had enough confidence to go on with the rest of the implementation. What I came up with is possibly a new idea: a shader stitcher.

What is a shader stitcher?

A shader stitcher is a simple piece of logic that combines fragments of shader source, driven by an array of integers used to address those fragments. It might sound quite complicated, but it is actually very simple: you have an array of strings, the shader fragments you’ll be combining to generate the desired final shader; and you have an array of integers, the indices of shader fragments within that array of strings. You then run a simple loop that joins the indexed fragments into a single piece of meaningful shader source code that provides the desired functionality when read linearly.

The description above explains the theoretical workflow of the logic behind the shader stitcher. For the sake of clarity, however, below is a step-by-step walkthrough in pseudocode.

stitches[] = {
"void main() { vec4 color = vec4(",
"1.0, 0.0, 0.0, 1.0",
"0.0, 1.0, 0.0, 1.0",
"); gl_FragData[0] = color; }"
}

The stitches array contains the shader source fragments that we can stitch together. In this case the presence of gl_FragData signifies that these stitches are meant for constructing fragment shader source code in GLSL.

indices[][] = {
{ 0, 1, 3 },
{ 0, 2, 3 }
}

And the indices array contains arrays of integers that are the indices of the shader stitches to be joined together. This deviates a little from the theoretical explanation above: instead of a single array of integers, an array of arrays is used so the stitcher can produce varying results. Each subarray holds the indices of source snippets from the stitches array; hence each subarray describes a single shader source.

foreach (group in indices) {
sourceCode = ""
foreach (index in group) {
sourceCode += stitches[index]
}
compileShader(sourceCode)
}

The rest is the actual combinatory logic that constructs a single, linear shader source code for each one of the subarrays of the indices array. Afterwards, the logic compiles this shader source code to obtain the final, compiled shader object.

If you paid attention, you can see that the example above joins strings by growing the sourceCode string, which implies dynamic memory allocations and copying. This is not a wise thing to do, especially if you are targeting size-savvy applications of the idea. In Muon Baryon I used the source-string parameter type of glShaderSource (which is GLchar **) to my advantage: instead of actually concatenating the strings, I just created a table of string pointers and fed that to the glShaderSource function. This way I didn’t need to copy the strings or perform any memory allocations. If you are able to do something similar, I strongly recommend it.

By using a stitcher I was able to generate unique GPU programs, each of which corresponds to a unique slice of the timeline. I was also able to reuse many parts of the shader without retyping them alongside the slightly differing parts. So I could actually have a single function that returns the distance to two completely different objects when called within different timeline slices. It might seem like a small optimisation that one can do without, but you’re actually hitting three birds with the same shot.

First of all, your vertex-shader-controlled camera still functions correctly, because a unique shader runs for each specific time interval. Secondly, there is no branching inside your shader to determine which function to call for different objects at different times. And lastly, you don’t have to keep track of another function: remember, two functions can’t have the same name, unless you use a stitcher! By not having a separate function, you actually save quite a bit of space, because the compressor won’t need to take that extra function into account.

So, there you go. I had been planning to write about this topic for quite some time. This was more about the theory behind the implementation; but if you would like to see a quick realisation of the idea, take a look at Muon Baryon’s sources. Additionally, for a better-documented and a little more human-friendly implementation of a GLSL shader stitcher, you can always use this little library I put together from a size-coding framework I coded, called Stun.

About the author, Göksel Göktas

Göksel is a dream-chaser from the homeland of perkele and sauna: Finland. He is a coder, creating experimental, audio-visual, sometimes interactive and mostly real-time, virtual macro- and microcosms of all sorts. He frequently contributes to a well-fed stream of demoscene productions by Youth Uprising. On the professional stage he works for many different companies including one major phone manufacturer.
