It was a quiet Sunday evening...
At my university's rocketry club we're using a particular piece of simulation software. Let's call it ContosoSim. You enter the various parameters of the rocket into it (mass, shape, engine power, etc.), and the software simulates the rocket's flight. Pretty neat.
At one point we decided to install the simulator on the workstation we have at the lab, so that everybody would be able to use it[2]. Except when we did... the program failed to launch. As in, you double-click the shortcut and nothing happens.
At first, we thought it was a bug somewhere in the software (which it was, sorta), so we contacted customer service. After some back-and-forth the service rep asked us to send the application's crash log from the Reliability Monitor.
The log clearly showed that the application crashed with an access violation. At this point my curiosity was piqued. Either our lab workstation was configured not to display the usual "send error report" dialog box, or the application suppressed it itself. In any case, this had the makings of an interesting bug.
It was time to fire up WinDbg.
An initial state of failure
Running the simulator under a debugger yields the following call stack:
So the offending code is not in the simulator itself, but rather in some external library (vtkRenderingOpenGL2-7.1.dll). Throwing it into Ghidra reveals the following:
The crash occurs at the highlighted line, upon calling __glewBlendFuncSeparate, which Ghidra shows to be exported from vtkglew-7.1.dll. The unusual thing here is that __glewBlendFuncSeparate is not exported as a function, but rather as a function pointer. Since this pointer is NULL for some reason, the whole thing crashes.
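To make the failure mode concrete, here is a minimal C++ sketch of a "function" that is really a pointer variable. All names (demo_blend_func_separate, fake_driver_blend, try_blend) are hypothetical stand-ins, not symbols from VTK or GLEW, and the four-argument signature is an assumption for illustration. The point is only that calling through a pointer nobody initialized jumps to address zero.

```cpp
// Signature loosely modeled on a four-argument blend function (an assumption
// for illustration; the real GLEW typedef is not reproduced here).
using BlendFn = void (*)(unsigned, unsigned, unsigned, unsigned);

// A pointer variable standing in for the exported __glewBlendFuncSeparate.
// Until some init routine assigns it, it stays null.
BlendFn demo_blend_func_separate = nullptr;

// Hypothetical implementation that an init routine could install.
int blend_calls = 0;
void fake_driver_blend(unsigned, unsigned, unsigned, unsigned) { ++blend_calls; }

// Guarded call: reports failure instead of calling through a null pointer,
// which is what produces the access violation in the unguarded case.
bool try_blend(unsigned a, unsigned b, unsigned c, unsigned d) {
    if (demo_blend_func_separate == nullptr) {
        return false;
    }
    demo_blend_func_separate(a, b, c, d);
    return true;
}
```

With the pointer left at null, try_blend returns false; once something assigns fake_driver_blend to it, the same call goes through.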
Digging further, we find that the troublesome pointer is initialized inside vtkglew-7.1.dll, via a call to _glewInit_GL_VERSION_1_4. Unfortunately, the decompilation output for _glewInit_GL_VERSION_1_4 is not exactly readable, so it's time to look for another way.
The power of open source
A quick search reveals that all those VTK libraries are actually part of
The Visualization Toolkit, which is open source under the BSD license!
Great, that simplifies things. After cloning the repo and checking out the
v7.1.0 tag (which should correspond to the version ContosoSim is using) we can search
for the crashing code.
From the call stack above we know that the function called just before
SetupPixelFormatPaletteAndContext. In the source it can be
Here's the relevant part:
And this is
Looking in Ghidra confirms that there is a tail-call optimization in
which is why the call stack doesn't show this function, only
Here's the relevant part from its code:
From our reversing we already know that functions such as glEnable are plain function exports (from opengl32.dll). On the other hand, glBlendFuncSeparate is a macro that expands to the __glewBlendFuncSeparate pointer we have seen earlier (declared with __declspec(dllimport)). It's clear now that this code assumes __glewBlendFuncSeparate to be properly initialized, since there's no NULL check here.
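The indirection is easy to miss because the call site reads like an ordinary function call. A rough sketch of the pattern follows, using hypothetical names (demoBlend, demo_blend_ptr, fake_blend, init_state_unchecked) rather than the actual GLEW definitions, and arbitrary argument values:

```cpp
using BlendFn = void (*)(unsigned, unsigned, unsigned, unsigned);

// What the caller sees as a function name is a macro over a global pointer.
BlendFn demo_blend_ptr = nullptr;
#define demoBlend demo_blend_ptr

// Hypothetical implementation that records its first argument.
int last_first_arg = -1;
void fake_blend(unsigned first, unsigned, unsigned, unsigned) {
    last_first_arg = static_cast<int>(first);
}

// Same shape as the code described above: the line below expands to
// demo_blend_ptr(1u, 2u, 3u, 4u) with no null check, so it is only safe
// if demo_blend_ptr was assigned beforehand.
void init_state_unchecked() {
    demoBlend(1u, 2u, 3u, 4u);
}
```

Nothing at the call site hints that a pointer is involved, which is why the missing check only shows up when the pointer was never assigned.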
Right, so we have thus confirmed our reversing findings. Time to finish this.
What happens inside OpenGLInitContext? The (almost) first thing this function does is call into vtkglew-7.1.dll. The code there queries the OpenGL version, parses the returned string to determine the major and minor version, then calls _glewInit_GL_VERSION_1_4 if the version is at least 1.4. And this last function is what initializes __glewBlendFuncSeparate, as we have seen in Ghidra (and confirmed by the source).
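The version check described above can be sketched roughly as follows. This is not GLEW's actual parser; GlVersion, parse_gl_version and at_least are hypothetical names, and the sketch assumes the version string starts with "major.minor".

```cpp
#include <cstdio>

// Hypothetical holder for the two numbers the check cares about.
struct GlVersion {
    int major_ver = 0;
    int minor_ver = 0;
};

// Parse the leading "major.minor" of a version string.
// Returns false if the string does not start with two dot-separated integers.
bool parse_gl_version(const char* text, GlVersion& out) {
    if (text == nullptr) {
        return false;
    }
    int major_ver = 0;
    int minor_ver = 0;
    if (std::sscanf(text, "%d.%d", &major_ver, &minor_ver) != 2) {
        return false;
    }
    out.major_ver = major_ver;
    out.minor_ver = minor_ver;
    return true;
}

// True when v is at least major_ver.minor_ver.
bool at_least(const GlVersion& v, int major_ver, int minor_ver) {
    return v.major_ver > major_ver ||
           (v.major_ver == major_ver && v.minor_ver >= minor_ver);
}
```

Under these assumptions, a string such as 1.1.0 parses fine but fails the at_least(v, 1, 4) gate, so the 1.4 initialization step would be skipped.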
No matter how we look at it, this is clearly a bug in VTK. The code assumes that glBlendFuncSeparate is available, but that depends on the OpenGL version. Indeed, even if the version is at least 1.4, wglGetProcAddress can still technically return NULL, but in that case that would be a bug in the OpenGL implementation.
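A more defensive loader would treat both cases (version too old, or the lookup coming back empty) as a reportable failure rather than leaving a null pointer behind silently. Below is a sketch under stated assumptions: Resolver is a stand-in for a wglGetProcAddress-like lookup, and every name here (fake_resolver, empty_resolver, load_blend_entry_point) is hypothetical.

```cpp
#include <cstring>

using GenericFn = void (*)();
// Stand-in for a wglGetProcAddress-like lookup: returns null when the
// implementation does not know the requested name.
using Resolver = GenericFn (*)(const char* name);

GenericFn demo_blend_ptr = nullptr;

void fake_blend_impl() {}

// Hypothetical resolver that knows exactly one entry point.
GenericFn fake_resolver(const char* name) {
    return std::strcmp(name, "glBlendFuncSeparate") == 0 ? &fake_blend_impl
                                                         : nullptr;
}

// Hypothetical resolver that knows nothing at all.
GenericFn empty_resolver(const char*) { return nullptr; }

// Resolves the pointer only when the reported version allows it, and tells
// the caller whether the pointer is actually usable afterwards.
bool load_blend_entry_point(bool version_at_least_1_4, Resolver resolve) {
    demo_blend_ptr = nullptr;
    if (!version_at_least_1_4) {
        return false;  // old version: the pointer is never filled in
    }
    demo_blend_ptr = resolve("glBlendFuncSeparate");
    return demo_blend_ptr != nullptr;  // the lookup may still come back empty
}
```

A caller that checks the return value can show an "OpenGL too old" message instead of crashing on the first call through the pointer.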
Setting a breakpoint on glGetString, we can see that on our lab workstation (where ContosoSim crashes) it returns a version number of 1.1.0, which explains why __glewBlendFuncSeparate is never initialized.
Case closed. Send an email to the vendor telling them to upgrade VTK[3] and wait for a fix. After the fix we may just get a message telling us our OpenGL version is too old, but at least that's progress.
Except... The simulator does work on other machines. So what gives?
When a DLL is more than the sum of its exports
Maybe on the machines where the simulator doesn't crash it simply goes through a different control path, bypassing the NULL-dereference? A likely hypothesis, but upon closer inspection it can be quickly tossed out: on my machine __glewBlendFuncSeparate is not NULL, and is indeed called from the same place.
Okay, so maybe we just have different versions of OpenGL? Nope, again. Both systems have the same opengl32.dll.
Alright, this is not funny anymore.
Taking another look at __glewBlendFuncSeparate, we see that it's not NULL, and it's also not inside opengl32.dll. In fact, it points into ig8icd32.dll, which is the "OpenGL(R) Driver for Intel(R) Graphics Accelerator". But glGetString should still return 1.1.0, right? Right?! It's the same DLL!
I don't know why it surprised me, but sure enough, the version string returned was 4.4.0 - Build 18.104.22.16824.
Somehow, this Intel DLL manages to override a legitimate Windows one. My immediate thought was that Intel hooked it somehow[4], but the truth is more prosaic.
Setting a breakpoint on the load of this DLL (sxe ld ig8icd32.dll), we can see that there is a function in opengl32.dll which is responsible for loading it. In other words, opengl32.dll loads a GPU vendor's OpenGL implementation and delegates to it. If the GPU vendor implements OpenGL version 1.4 or later, VTK will work as expected. Otherwise, it'll crash.
But why doesn't this delegation happen on our lab workstation? Well... Because it doesn't have a GPU. At all.
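As a toy model of that behavior (not how opengl32.dll is actually implemented), the delegation boils down to something like the sketch below. VendorDriver and reported_gl_version are hypothetical names; the 1.1.0 fallback is simply the value observed on the lab workstation.

```cpp
#include <string>

// Hypothetical description of a vendor-supplied OpenGL driver.
struct VendorDriver {
    std::string version;
};

// Toy dispatch: delegate to the vendor driver when one is present, otherwise
// fall back to the built-in implementation that reports 1.1.0.
std::string reported_gl_version(const VendorDriver* vendor) {
    if (vendor != nullptr) {
        return vendor->version;
    }
    return "1.1.0";
}
```

Passing a null driver models the GPU-less workstation and yields 1.1.0, while a driver object carrying the Intel version string yields that string instead.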
Who to blame and what to do?
And so, a combination of an old PC with a buggy version of VTK means we can't use ContosoSim. Sure, we can wait for the simulator vendor to update VTK on their end, but we need the software now! And, as stated previously, maybe we do actually need a GPU with modern OpenGL support. Unfortunately, upgrading the workstation is not exactly an option.
Perhaps there is a way...
During my wanderings through the interwebs, I noticed that Qt has the ability to emulate OpenGL in software. Although the simulator does use Qt, setting the environment variables mentioned here does not help with the crash. But it does suggest that if we could find a software implementation of OpenGL, we might be able to fool it...
As luck would have it, there is such a thing: Mesa3D. And, what's even better, there is a Windows build. Just dropping it in ContosoSim's installation directory makes it launch. Granted, it's bound to be slower than GPU-assisted OpenGL, and perhaps it's even going to crash because of implementation issues.
But for now — it works.
[2] Until that point it was installed only on the personal computers of some of the aerodynamics team members.
[3] AFAICT the issue has been fixed somewhere around commit 6498240fd590654cc9f7dd9aedc17c0dbc867c2b, but I kinda lost myself in the commit history so this might not be an accurate estimate. In any event, as of this writing the latest version of VTK is 9.0.1, so it's a safe bet they've fixed it.
[4] Hey, why should AV vendors have all the fun? 😈