Procedural Building Generation in Unreal Engine 3 (epicgames.com)
65 points by grinich on Dec 12, 2010 | 20 comments



The results look amazing! I've always wished SimCity had done this.

The semantics for describing buildings seem a bit clunky. CityEngine uses (or used) shape grammars implemented as stochastic context-free grammars for building generation.

Suppose we want to design some buildings:

  Building ::= [1.0 First Floor + Rest of Floors]
  First Floor ::= [0.4 Door | 0.6 Door + Windows]
  Rest of Floors ::= [0.6 Roof | 0.4 Windows + Rest of Floors]

With three rules we can churn out buildings, where 60% are 1 story, 24% are 2 story, 9.6% are 3 story, and so on.
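To make those probabilities concrete, here's a minimal Python sketch of sampling from the three rules above (the symbol names and representation are my own; CityEngine's actual CGA grammar language is much richer than this):

```python
import random

# The three rules above, as (probability, expansion) choices.
GRAMMAR = {
    "Building":     [(1.0, ["FirstFloor", "RestOfFloors"])],
    "FirstFloor":   [(0.4, ["Door"]), (0.6, ["Door", "Windows"])],
    "RestOfFloors": [(0.6, ["Roof"]), (0.4, ["Windows", "RestOfFloors"])],
}

def expand(symbol, rng):
    """Recursively expand a symbol into a flat list of terminals."""
    if symbol not in GRAMMAR:
        return [symbol]                      # terminal: Door, Windows, Roof
    r, acc = rng.random(), 0.0
    for prob, parts in GRAMMAR[symbol]:
        acc += prob
        if r < acc:
            break                            # `parts` holds the chosen expansion
    return [t for p in parts for t in expand(p, rng)]

rng = random.Random(12345)
building = expand("Building", rng)           # always starts with Door, ends with Roof

# Story count: each RestOfFloors recursion (probability 0.4) adds a floor.
counts = {}
for _ in range(100_000):
    stories = 1
    while rng.random() < 0.4:
        stories += 1
    counts[stories] = counts.get(stories, 0) + 1
# Roughly 60% one-story, 24% two-story, 9.6% three-story.
```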

Here's a video of Pompeii built from a shape grammar. [https://www.youtube.com/watch?v=iDsSrMkW1uc]


This is a really interesting insight into an aspect of a game I'd normally never give much thought to. Very good article indeed.

However, I noticed something in this that I HATE seeing in real life: variables (ok, "properties" here) prefixed with a character or characters indicating type (seemingly only booleans here, bBlah and such). I honestly can't say why it bothers me, but it makes code really tough to read. It really shouldn't; technically it should be useful, since it gives a bit more info to the person reading the code, but I find it infuriating. I wonder if anyone else has this possibly irrational hatred.


That's https://en.wikipedia.org/wiki/Hungarian_notation — it seems to be popular among Windows programmers.


If you like that, check out Subversion. Not only does it procedurally generate cities, but they look totally sci-fi. https://forums.introversion.co.uk/introversion/viewtopic.php?...


What do people think about the results of procedural generation in terms of information theory? If I get a CD that has code on it that lets me generate level upon level, how much information is that CD carrying?

For doing data compression, has anyone ever heard about using these techniques to generate other things besides 3D game levels?


"If I get a CD that has code on it that lets me generate level upon level, how much information is that CD carrying?"

The information content of something is rigidly bounded by the size of the code and the size of the input. Compress the CD's image with an "optimal" compressor (don't forget the decompression code in the total), add a couple of bytes for your random seed, done.

It doesn't matter how gloriously complicated the end result may be. The fact that you can precisely reconstruct the end result with the contents of the original CD, plus your randomization seed, plus your pseudorandom algorithm, means that that is all there is.

You can add more information by pulling from some random source that is irreducible, such as a quantum random number stream. Then, to reconstruct the result, you'd need to be fed the previously-random numbers as well.

It isn't actually possible for computation to add information. It is always bounded by the computation specification plus the input data. It can chew it and mangle it and present it and yea verily project entire universes from it, but the very definition of a computation's information content is basically the minimal seed necessary to recreate it.
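A trivial illustration of that bound (my own toy example, nothing from the article): however elaborate the generator, its output is a pure function of the program plus the seed, so the same seed reconstructs it exactly.

```python
import random

def generate_level(seed, size=10_000):
    """Stand-in for an arbitrarily elaborate procedural generator.
    Everything it emits is determined by the code plus the seed."""
    rng = random.Random(seed)
    return [rng.randrange(256) for _ in range(size)]

# Two runs with the same seed are identical, so the level carries no
# more information than len(generator code) + len(seed), no matter how
# large or intricate the generated output looks.
a = generate_level(seed=42)
b = generate_level(seed=42)
```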

See https://lesswrong.com/lw/vh/complexity_and_intelligence/ , which, incidentally, directly takes on your question about fractals in a subsequent post.


It sounds like you're describing fractal compression. It's true that fractals can generate apparently complex output from small input, but if you're trying to compress a specific piece of data (like, say, a map of New York City) they don't seem to be much help.


> For doing data compression has anyone ever heard about using these techniques to generate other things besides 3D game levels?

These techniques are routinely used in the demoscene for making 4K/64K intros. In this kind of intro almost everything is generated procedurally: not only meshes, but also textures and sound.

If you are interested you can watch one of these examples:

- Elevated (4K intro) Binary: https://www.scene.org/file.php?file=%2Fparties%2F2009%2Fbreak... Youtube video: https://www.youtube.com/watch?v=_YWMGuh15nE

- .kkrieger (96K game) Binary: https://kk.kema.at/files/kkrieger-beta.zip Gameplay video: https://www.youtube.com/watch?v=z5U4hEzZzaE
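For a flavor of what "procedurally generated textures" means here, a toy sketch (my own illustration; not how Elevated or .kkrieger actually build theirs): a couple of lines of code standing in for kilobytes of bitmap data.

```python
import math

def xor_texture(w, h):
    """Classic procedural test pattern: brightness = x XOR y."""
    return [[(x ^ y) & 0xFF for x in range(w)] for y in range(h)]

def plasma(w, h):
    """A sum-of-sines pattern: one expression yields a full-screen texture."""
    return [[int(127.5 * (1 + math.sin(x / 16.0) * math.sin(y / 16.0)))
             for x in range(w)] for y in range(h)]

tex = xor_texture(256, 256)      # 64 KB of pixels from ~1 line of code
```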


I'm not an information theory expert, but my hunch is that this has "poor" compressibility under the purely random scenario. Here's why.

The point of information is that you want to convey a particular piece of information. A particular sequence of bits, say. Compressibility is about how many 'physical' bits are needed to convey the original information.

When cities are random, your odds of getting a particular city you might have wanted are very low indeed. This means that the channel is noisy and you will need very many repeats to get the original information.

If on the other hand the randomness is from a PRNG, you could ship the engine and the seed value and go from there. The compression ratio would depend largely on how large map files are vs the engine + seed.

In one sense this is a classic time and space tradeoff. In the case of randomly generated maps it's about externalising the cost of map making onto the user.


You touch on a very interesting and general concept, and that is the correspondence between lossless compression algorithms and probability distributions, and the subsequent correspondence between probability distributions and procedural generation algorithms.

Consider the first correspondence. Suppose you knew a certain class of strings arose from a probability distribution. Then you can talk about the strings that are most likely to occur or, more usefully in general, about the individual characters that are most likely to occur given a history of previous characters. Using this, you can build a compression algorithm that maps (symbol, probability) or (symbol + history, conditional probability) pairs, starting from the highest probability, to the numbers 0, 1, 2, ...

This is arithmetic coding.

https://en.wikipedia.org/wiki/Arithmetic_coding

The result is optimal in terms of expected compressed string length assuming the strings you use this on really do come from that distribution.
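That optimality is easy to check numerically: an ideal arithmetic-style coder spends about -log2(p) bits on a symbol of probability p, so the expected bits per symbol equals the distribution's Shannon entropy (the toy distribution below is my own example):

```python
import math

# A toy symbol distribution.
dist = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Ideal codelength under arithmetic coding: -log2(p) bits per symbol.
codelen = {s: -math.log2(p) for s, p in dist.items()}

# Expected bits per symbol comes out exactly equal to the entropy.
expected_bits = sum(p * codelen[s] for s, p in dist.items())
entropy = -sum(p * math.log2(p) for p in dist.values())   # 1.75 bits here
```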

Now consider the other way around. Suppose you have a lossless compression algorithm. Find the (uncompressed symbol, compressed symbol) pair (possibly with context) that achieves the highest compression. Assign that a high probability. Then find the next pair and assign that a slightly lower probability (this can be done in a more principled manner than I'm alluding to here). Then you have yourself a probability distribution.
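That reverse direction can be sketched the same way: read code lengths off an existing compressor, turn each length l into weight 2^-l, and normalize (the lengths below are hypothetical numbers for illustration):

```python
# Hypothetical code lengths (in bits) assigned by some lossless compressor.
lengths = {"a": 1, "b": 2, "c": 3, "d": 4}

# Shorter code => higher implied probability: weight each symbol by 2^-l.
raw = {s: 2.0 ** -l for s, l in lengths.items()}
total = sum(raw.values())            # < 1 if the code isn't complete (Kraft)
dist = {s: w / total for s, w in raw.items()}
```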

More on data compression theory:

https://en.wikipedia.org/wiki/Data_compression

But once you have a probability distribution over some data type, you can cast the problem of automatically generating instances of that data type as sampling from the distribution. Many procedural generation algorithms that give nondeterministic results (and the useful ones do, otherwise the work the modeler has to do is fundamentally the same) can be re-cast as sampling from a probability distribution: look at what the algorithm is generating and learn the distribution.

Note that this is uncomputable in general for the same reason Kolmogorov complexity is. This is known as Solomonoff induction:

https://singinst.org/blog/2007/06/25/solomonoff-induction/

So to answer your original questions:

1. The information (as in information entropy) in the algorithm in the CD is the entropy of the true probability distribution from which your levels originate.

2. And yes, in principle you can use the correspondence between data compression and procedural generation to generate instances of any arbitrary data type, not just 3D game levels. It may be hard to design a probability distribution that will create well-formatted instances though :)


Thank you for this thoughtful and detailed reply. I have read about arithmetic coding and data compression. I haven't read about Solomonoff induction but I will definitely read up on it.

I think another way of thinking about my original comment would be to imagine that you had an image of the Mandelbrot Set up on your monitor (assume you have drilled down a few times). If you knew the parameters that had generated that view you could write a very concise description suitable for transmission over a network. If you had forgotten the params and had to make do with a screen capture saved to JPEG, it would appear to contain more 'information'. Not only would the algorithmic version be shorter (so long as you hadn't drilled down to the point at which the numbers representing the viewport needed millions of digits of precision), but you would be able to keep zooming in to view arbitrary detail. That is something you wouldn't be able to do with the JPEG. I guess what I'm driving at is that I don't understand how Information Theory is able to account for things like fractals without saying that either the Mandelbrot Set is random or that it contains infinite information. Am I looking at this problem correctly?


> I guess what I'm driving at is that I don't understand how Information Theory is able to account for things like fractals without saying that either the Mandelbrot Set is random or that it contains infinite information. Am I looking at this problem correctly?

This is (I think) the "coastline paradox": coastlines have different lengths depending on the scale at which you measure them. Measured length grows with resolution, so in a sense the length of a coastline can be infinite.

I think the escape hatch is that the information conveyed is not the points on the screen, but the rule on how to generate those points. Going back to my other post this means that the city builder conveys no information about a particular city map.


> I think the escape hatch is that the information conveyed is not the points on the screen, but the rule on how to generate those points. Going back to my other post this means that the city builder conveys no information about a particular city map.

Yep. The amount of information conveyed can be considered 'equivalent' to the length of the shortest algorithm used to get from one string to the other.

The field that is concerned about thinking about this problem in a disciplined manner is algorithmic information theory:

https://en.wikipedia.org/wiki/Algorithmic_information_theory

In particular, they show that this quantity (the Kolmogorov complexity) is incomputable.
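Incomputable exactly, but bounded from above by any real compressor: K(x) <= len(compress(x)) + O(1). A quick sketch of that bound (the toy inputs are my own):

```python
import os
import zlib

structured = b"ab" * 5_000        # 10,000 bytes generated by an obvious rule
random_ish = os.urandom(10_000)   # 10,000 bytes with (almost surely) no rule

# zlib gives an upper bound on Kolmogorov complexity: the structured
# string compresses to almost nothing, the random one barely at all.
small = len(zlib.compress(structured, 9))
large = len(zlib.compress(random_ish, 9))
```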


Here is another cool city generator: https://www.youtube.com/watch?v=VnWUp9JnLyc&feature=relat... It's for the Subversion game, which has been in development for a long time: https://www.introversion.co.uk/subversion/


Direct link to the GDC 2010 slides (.pptx):

https://unrealtechnology.com/Downloads/Slides/GDC2010_Golding...


Interesting read, I'd love to get the same insight in to the Assassin's Creed team's methods...


Will we be seeing live procedural generation of this sort in games in the near future?


I don't think that this is 'live' per se, at least it's not real-time. It's a tool to help level designers design city buildings quickly, as opposed to something like Subversion's procedural city.

Either the building mesh is generated procedurally when the level is loaded, or it is generated and 'baked' into the static level mesh at compile time (it's not entirely clear from the doc which method it uses).


If Epic's building it, you can expect it. They're one of the biggest middleware providers for games.

IIRC, many games use a procedural foliage generator for trees and shrubberies.


You do recall correctly. SpeedTree brings shrubbery to plenty of games. [https://www.speedtree.com/]



