Sunday, 4 December 2016

A Case Of Non-Randomness - Part 2

The Problem

I was just starting to set up the procedures needed for rows of buildings when I hit a problem.
Bad randomness showing up in row of buildings
These are supposed to be random heights!  I've had a nagging feeling that my random number generation operators weren't working properly and had noticed a few artefacts before, but dismissed them as me being overly sensitive to patterns.  This horrible patterning however tipped me over the edge and I had to investigate.


A great way to sanity check random numbers is to visualise a sequence as a grid of intensity values (see Runes blog post again).  So I put together a couple of nested procedures to build a 2D grid of tiles, each coloured black-to-white according to the random function under scrutiny.
Random number sequence visualisation using geometry
This instantly revealed unpleasant patterns in the number sequences.  Trying different seeds showed that most of them seemed to have regular repeating pattern or clearly be restricted to lower or upper value ranges.
Bad patterning in the random number generation


My first avenue of investigation was to reproduce the algorithm explicitly in the test procedure using the same sequence of operations as the random generation library code.
Explicitly implemented random number generator
I hoped that this would allow me to play with the algorithm to find the bug.  However, this version produced very nice random values, with no discernible patterns.  Carefully picking through the implementation I couldn't see any functional difference.
Next I added numerical display of the seed and value for each tile and zooming in on the start of the first row I noticed that they did agree for a few values, before diverging.  This seemed to coincide with the seed (handled as a signed integer) going negative.  I wasn't sure how this would affect the algorithm though so I switched to debug build and stepped through the for this case.
After some debugging runs and breaking open the algorithms a bit I discovered what was going on.


My entropy extraction macro (see Hash part of Random Value Generation Process in the previous article) uses right shift operations >> to re-align the overlapping bits, but I had forgotten that this will perform a sign extended shift if the operand is a signed value.  That is, if the most significant bit is 1, indicating a negative number, then 1's are shifted in instead of 0's.  This situation caused the extraction algorithm to XOR chunks of 1's into the seed values.  XOR has the interesting (and normally useful) property that successive applications of 1's causes toggling.  This fact combined with shifts used set up a repeating oscillation in some chunks of bits.  It was completely believable that this could produce the heavily striped and range limited results I was seeing.


The fix here was relatively straight forward as the behaviour of >> on unsigned integers always shifts in zeros into the upper bits, as was actually required.  Interestingly, the reason we didn't see this problem on the explicit implementation was that I had made sure the shift operation always used the unsigned shift.
Casting the operands to unsigned integer prior to each shift fixed this and the two implementations now agreed, AND also produced the randomness they were designed to generate.
Much better random sequences

Slight Red Herring

Going back to the building height problem I discovered that this hadn't fixed it, the result was exactly the same!  So, I had managed to find and fix a real problem, just not the one that was causing my issue.  Oops!

The Real Culprit

Checking the building height calculation I realised that I was relying on the variant parameterisation of the Random Twist operator to generate random values from the building index along the row.  The reason it was designed to work like this is that I needed a way to generate unique seeds for each building from the single seed provided to the row as a whole.  The only distinguishing information the buildings have at this point where they are in the row, i.e. their index.  The twist operation seemed to be perfect here as the variant input was intended for exactly this sort of operation.
Using twist variant input to derive per plot variation in a row of houses
Unfortunately this didn't seem to produce very good results for a sequence of successive variant values.  Adding a visualisation of the twist operation with the tile index driving the variation input again showed up this lack of randomness.
Patterning from the Random Twist operator
In hindsight this is one of the issues Rune talks about in his article; close or sequential seeds don't necessarily produce sequences that are very random relative to each other.  He suggests a decent hash of the input to better seed the sequences.

A Better Twist

This time the patterning wasn't as strong, but it was certainly there.  Looking into the Twist implementation I realised the mixing method of the variant value was far too simplistic and I was relying on small changes in variant input to produce significant changes to the output seed and hence number sequence.  This was obviously not the case and I needed a better mixer.
// jump to another point in the
// random stream (parameterised)
int Random::Twist( int input, int variant ) 
  return Random::Algorithm::Next_LCG_ANSIC( input^variant ); 
After some research, I realised that the property I was after is referred to as 'avalanche', i.e. where small input changes cause large output changes, and was a common desirable feature of hashing algorithms.  Again, rather than investigate and implement my own solution I chose a publicly available integer hashing function that claimed very good avalanche characteristics.
Replacing the mixer generation with this in the Twist operator proved very successful and finally achieved a good random variants again.
Much improved twist variant output

Fixed Buildings

Going back to my row of buildings again, with the above changes produced much, much more satisfactory results.
That's much better
I expect to see improvements throughout my projects as a result of these fixes, some will be more subtle than others.  I'm glad I took the time to investigate this as it's resolved a niggle that I couldn't quite put my finger on.


Procedural generation is typically required to be completely deterministic so that, for example, every time you re-visit a game the world is familiar to you, or that every player of a game sees the same world.  This is an important feature of a game such as No Man's Sky, a vast, hugely varied procedural universe that is the same for every player.
Fixing the bugs in my random number generation operators means I have changed the behaviour of something fundamental to all the generation processes across my projects.  This in turn means that all the content they generate will now be different, the content they generated before has been lost.  Forever.
The deterministic nature of the processes involved in procedural generation, combined with their complexity means they are hugely susceptible to the butterfly effect, that small changes to any part (all parts are effectively inputs) can have huge effects on the output.
I have often wondered how many times Hello Games has made changes to No Man's Sky that have completely invalidated the current universe it is set in.  How many versions of universe have they gone through to get to the one we play?  I dread to think how much of a problem this is for ongoing work where they can't change or even fix bugs in so much of the game for fear of breaking the universe.

“There is a theory which states that if ever anyone discovers exactly what the universe is for and why it is here, it will instantly disappear and be replaced by something even more bizarre and inexplicable."
"There is another theory which states that this has already happened.”
- Douglas Adams, The Restaurant at the End of the Universe

No comments:

Post a Comment