In my continuing efforts to confuse the linear nature of this blog, today I’m documenting the process of fixing a bug that I’ll be showing off in Wednesday’s upcoming video, recorded last Friday. So it goes.
The bug looked like this.
I had seen this bug a handful of times over the last few months, dating back to October when I initially implemented shallow water. I first noticed it when bombs began falling through the world, and I was later able to reproduce it with the player character. But it popped up so rarely (largely due to the absence of water in most of my test maps) that I kept forgetting or choosing to ignore it.
I did finally find a consistent repro, as seen in the GIF above, and that has allowed me to fix the bug. I’ll get to the fix in a bit, but first let’s take a look at what’s happening here.
I fabricated this repro case based on an observation that this tended to occur when the player hit the corner of a solid tile adjacent to water while falling. This didn’t immediately offer any indication of what might be causing the bug, but it did help me find a more consistent repro. As it turned out, the exact situation necessary to reproduce this issue was slightly framerate-dependent and involved the player character moving far enough in a single tick to collide against both the surface of the water and the adjacent wall, as shown below.
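To make the framerate dependence concrete, here is a toy check with made-up numbers. The positions, speeds, and the `crosses_both` helper are hypothetical, not values or code from the game. It asks whether the straight path travelled in one tick spans both the water surface (the line y = 10, with y growing downward) and the wall face (the line x = 10). The per-tick displacement is velocity times tick length, so a longer tick covers more ground in one step.

```python
def crosses_both(pos, vel, dt, water_y, wall_x):
    """Return True if one tick of straight-line motion spans both the water
    surface (horizontal line y = water_y) and the wall face (vertical line
    x = wall_x). Hypothetical helper for illustration only."""
    x0, y0 = pos
    # Displacement per tick is velocity * dt, so it grows as framerate drops.
    x1, y1 = x0 + vel[0] * dt, y0 + vel[1] * dt
    spans_water = min(y0, y1) <= water_y <= max(y0, y1)
    spans_wall = min(x0, x1) <= wall_x <= max(x0, x1)
    return spans_water and spans_wall


# Hypothetical numbers: y grows downward, player falling and drifting right.
start, velocity = (9.85, 9.9), (6.0, 12.0)
print(crosses_both(start, velocity, 1 / 60, 10.0, 10.0))  # False at 60 Hz
print(crosses_both(start, velocity, 1 / 30, 10.0, 10.0))  # True at 30 Hz
```

At 60 Hz the first tick only reaches the water, and the wall is met in a later tick; at 30 Hz the same motion covers both in a single sweep.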
Once I’d had this realization, I found it even easier to reproduce the bug, and I could begin tracking down exactly why the player fell out of the world when this occurred. To fully explain why this was, I need to start by explaining how some parts of my collision system work.
As I discussed in one of my earliest posts on this blog (and these later ones), the collidable surfaces of the environment are dynamically generated (and then cached and retrieved for better performance) in response to sweep attempts made through the world. I chose to implement this by representing the world’s collision as a single primitive whose geometry may vary arbitrarily depending on the nature of the thing colliding against it. This is unusual in comparison to most other things in the game, which are typically represented by a single box primitive that may be decomposed into its four sides for sweep tests.
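As a rough illustration of generating world surfaces on demand and caching them, here is a minimal sketch. Everything in it is an assumption for illustration: the tile dictionary, the `Surface` and `WorldPrimitive` names, and the choice to emit all four edges of a solid tile plus only the top edge of a water tile. A real implementation would presumably also cull or merge edges shared between neighbouring tiles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Surface:
    start: tuple
    end: tuple
    solid: bool  # True for a blocker, False for non-solid surfaces like water


class WorldPrimitive:
    """A single collision primitive whose geometry is built lazily per tile."""

    def __init__(self, tiles, tile_size=1.0):
        self.tiles = tiles          # hypothetical: {(tx, ty): "solid" | "water"}
        self.tile_size = tile_size
        self._cache = {}            # (tx, ty) -> list of Surface
        self.built = 0              # how many tiles have actually been generated

    def _build_tile(self, tx, ty):
        s = self.tile_size
        x0, y0, x1, y1 = tx * s, ty * s, (tx + 1) * s, (ty + 1) * s
        kind = self.tiles.get((tx, ty))
        if kind == "solid":
            # Naive: all four edges, no merging of edges shared with neighbours.
            corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
            return [Surface(corners[i], corners[(i + 1) % 4], True) for i in range(4)]
        if kind == "water":
            return [Surface((x0, y0), (x1, y0), False)]  # top edge only (y grows down)
        return []

    def surfaces_in(self, min_tile, max_tile):
        """Surfaces for every tile in an inclusive tile range, e.g. the tiles a
        sweep's bounding box overlaps. Each tile is generated once, then cached."""
        found = []
        for tx in range(min_tile[0], max_tile[0] + 1):
            for ty in range(min_tile[1], max_tile[1] + 1):
                if (tx, ty) not in self._cache:
                    self._cache[(tx, ty)] = self._build_tile(tx, ty)
                    self.built += 1
                found.extend(self._cache[(tx, ty)])
        return found


world = WorldPrimitive({(0, 0): "solid", (1, 0): "water"})
first = world.surfaces_in((0, 0), (1, 0))
second = world.surfaces_in((0, 0), (1, 0))  # served from the cache
print(len(first), world.built)  # 5 2
```

Note that one primitive ends up holding both solid and non-solid surfaces, which matters for the bug below.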
When a sweep test collides against something solid (a “blocker” in my engine’s vocabulary), the sweep path is interrupted and possibly redirected based on how the sweeper wants to react. A typical reaction is to slide along the blocking surface until another collision occurs.
When a sweep test collides against something non-solid, things work a little differently. We cache off the result of the collision such that we can react to it, but we do not interrupt the sweep and instead continue testing against everything else in our path.
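The difference between blocking and non-blocking results can be sketched as a single pass over a sweep's contacts, ordered by how far along the sweep they occur. This is a hypothetical simplification: `Hit` and `resolve_sweep` are illustrative names, and the slide only projects the remaining motion onto the blocking surface rather than sweeping along it again until the next collision.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    t: float        # fraction of the sweep (0..1) at which contact happens
    solid: bool     # True = "blocker", False = non-solid (e.g. water)
    normal: tuple   # unit-length surface normal
    name: str


def resolve_sweep(delta, hits):
    """Walk the contacts in order. Non-solid hits are recorded and the sweep
    keeps going; the first solid hit interrupts it.
    Returns (fraction travelled, non-solid hits, blocker, slide vector)."""
    touched = []
    for hit in sorted(hits, key=lambda h: h.t):
        if not hit.solid:
            touched.append(hit)  # remember it so we can react to it later
            continue             # ...but do not interrupt the sweep
        # Blocker: stop at the contact and keep only the part of the remaining
        # motion that runs along the surface (the normal component is removed).
        rx, ry = (1.0 - hit.t) * delta[0], (1.0 - hit.t) * delta[1]
        nx, ny = hit.normal
        into = rx * nx + ry * ny
        return hit.t, touched, hit, (rx - into * nx, ry - into * ny)
    return 1.0, touched, None, (0.0, 0.0)


hits = [Hit(0.75, True, (-1.0, 0.0), "wall"), Hit(0.25, False, (0.0, -1.0), "water")]
t, touched, blocker, slide = resolve_sweep((2.0, 4.0), hits)
print(t, [h.name for h in touched], blocker.name, slide)  # 0.75 ['water'] wall (0.0, 1.0)
```

Here the water is noted and passed through, while the wall stops the sweep and leaves only downward motion to slide along it.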
The world’s dynamic collision primitive is once again highly unusual in that it may consist of both solid (blocking) and non-solid surfaces. Floors and walls that the player can stand on or collide against are solid. Water is non-solid. This dual nature turned out to be a fundamental part of the problem.
If, in a single tick, the player character’s sweep passed through both the surface of the water and the adjacent wall, here’s how it would go.
First, we would touch the surface of the water. We would react to this appropriately, altering our physical constants to slow our movement. The surface of the water is flagged as non-solid, so we would continue our traversal. In fact (and this may be an error in and of itself, and is certainly something to investigate in any case), we would effectively repeat the last step of the sweep from beginning to end, rather than continuing from the last known point of intersection.
We’d test against every primitive in our range, including the world itself, and here’s where the bug was: we’d see that we had a previous collision result against the world, and that that result had been non-solid, and we’d therefore assume that the entire primitive (that is to say, the entire world) was also non-blocking and that it would be redundant to test it further. We’d opt out of any more tests against it (tests which would otherwise have collided against the adjacent wall), and the sweep would continue unhindered. The player character would pass through the wall unobstructed, and because we’d never get a collision result saying we had exited the body of water, the physical constants would remain altered.
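That failure can be reduced to a toy model. The `Hit`, `Primitive`, and `sweep_with_early_out` names are hypothetical, not the engine's actual code. Each primitive reports its earliest contact, a non-solid hit restarts the sweep, and any primitive that has already produced a non-solid result is skipped on the retest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    t: float      # fraction of the sweep at which contact occurs
    solid: bool   # blocker or not
    name: str


class Primitive:
    def __init__(self, hits):
        self.hits = hits  # every contact this primitive could report for the sweep

    def first_hit(self):
        return min(self.hits, key=lambda h: h.t, default=None)


def sweep_with_early_out(primitives):
    results = []  # (primitive, hit) pairs collected this tick
    while True:
        best = None
        for prim in primitives:
            # BUG: a previous non-solid result marks the whole primitive as
            # non-blocking, so it is never tested again this tick.
            if any(p is prim and not h.solid for p, h in results):
                continue
            hit = prim.first_hit()
            if hit is not None and (best is None or hit.t < best[1].t):
                best = (prim, hit)
        if best is None:
            return results, None     # nothing left to hit: full movement
        results.append(best)
        if best[1].solid:
            return results, best[1]  # a blocker interrupts the sweep
        # Non-solid: react (e.g. switch to swimming constants), then retest.


world = Primitive([Hit(0.25, False, "water surface"), Hit(0.75, True, "wall")])
results, blocker = sweep_with_early_out([world])
print([h.name for _, h in results], blocker)  # ['water surface'] None
```

For an ordinary box that is uniformly solid or non-solid, this skip is what stops the retest from reporting the same hit forever. For the world primitive, which mixes water and walls, it throws away the wall.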
If that seems like a complicated mess to digest in written form, I can assure you the code itself does little to elucidate the matter. My collision code has never been pretty, and though I’ve made some attempts to reduce its size by breaking common patterns out into their own functions, game by game it’s grown into a monster stretching across multiple files and projects. My concern as I moved from understanding the bug to fixing it was that I might inadvertently break other cases without realizing it.
In order to minimize the risk of introducing side effects, I tried to alter the program flow only in this one particular case. I added hooks to the core collision system to allow this case to be trapped and handled by other code elsewhere. When checking whether a primitive is redundant by virtue of having previously generated a non-solid collision result, I now give the primitive itself the chance to forgo this early-out if it believes itself to be important. In this way, the world’s dynamic collision primitive can be tested again even when a previous collision occurred against a water surface in the same tick. The other side of this coin, however, is that the primitive must then ensure that it does not actually produce redundant results, as the sweep test could enter an infinite loop if it did. Any time a collision result is generated, it’s compared against previous results if and only if the previous result was one that could have prompted this retest. If a match is found, the redundant result is discarded and the sweep continues as normal.
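Here is a minimal sketch of that fix, using a toy model in which each primitive reports its earliest contact and a non-solid hit restarts the sweep. The `wants_retest` flag is a hypothetical stand-in for the hook that lets a primitive veto the early-out, and results are compared by value; neither reflects the engine's actual names or matching rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    t: float      # fraction of the sweep at which contact occurs
    solid: bool   # blocker or not
    name: str


class Primitive:
    def __init__(self, hits, wants_retest=False):
        self.hits = hits
        # Hypothetical hook: lets a primitive veto the "already non-solid" early-out.
        self.wants_retest = wants_retest

    def first_hit(self, prior=()):
        # Discard anything matching a prior result that prompted this retest;
        # otherwise the retest would report it again and loop forever.
        fresh = [h for h in self.hits if h not in prior]
        return min(fresh, key=lambda h: h.t, default=None)


def sweep(primitives):
    results = []  # (primitive, hit) pairs collected this tick
    while True:
        best = None
        for prim in primitives:
            # Only this primitive's non-solid results could have prompted a retest.
            prior = [h for p, h in results if p is prim and not h.solid]
            if prior and not prim.wants_retest:
                continue  # ordinary primitives keep the old early-out
            hit = prim.first_hit(prior)
            if hit is not None and (best is None or hit.t < best[1].t):
                best = (prim, hit)
        if best is None:
            return results, None
        results.append(best)
        if best[1].solid:
            return results, best[1]


world = Primitive([Hit(0.25, False, "water surface"), Hit(0.75, True, "wall")],
                  wants_retest=True)
results, blocker = sweep([world])
print([h.name for _, h in results], blocker.name)  # ['water surface', 'wall'] wall
```

Only the primitive that opts in pays for the extra comparison; everything else still takes the early-out exactly as before, which keeps the change confined to this one case.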
All said, it took a few hours to find and fix this bug, which isn’t too bad considering its nature. Along the way, I also discovered another bug that I initially thought to be related (and probably was in some fashion, although its fix was simpler). This turned out to be a silly mistake in offsetting collision surfaces relative to tile boundaries, such that when the player stood directly over a body of water and jumped, they would react to exiting water they had never entered.
Fixed that one too. Game quality plus plus.