THE INNER PRODUCT // NOEL LLOPIS
DATA-ORIENTED DESIGN
OR WHY YOU MIGHT BE SHOOTING YOURSELF IN THE FOOT WITH OBJECT-ORIENTED PROGRAMMING
PICTURE THIS: TOWARD THE END OF THE DEVELOPMENT CYCLE, YOUR GAME
crawls, but you don’t see any obvious hotspots in the profiler. The culprit?
Random memory access patterns and constant cache misses. In an attempt
to improve performance, you try to parallelize parts of the code, but it takes
heroic efforts, and, in the end, you barely get any speed-up due to all
the synchronization you had to add. To top it off, the code is so complex that
fixing bugs creates more problems, and the thought of adding new features
is discarded right away. Sound familiar?
That scenario pretty accurately describes almost every game I’ve been
involved with for the last 10 years. The reasons aren’t the programming
languages we’re using, nor the development tools, nor even a lack of
discipline. In my experience, it’s object-oriented programming (OOP) and the
culture that surrounds it that is in large part to blame for those problems.
OOP could be hindering your project rather than helping it!
IT’S ALL ABOUT DATA
» OOP is so ingrained in the current game development culture that it’s
hard to think beyond objects when thinking about a game. After all, we’ve
been creating classes representing vehicles, players, and state machines
for many years. What are the alternatives? Procedural programming?
Functional languages? Exotic programming languages?
Data-oriented design is a different way to approach program design
that addresses all these problems. Procedural programming focuses on
procedure calls as its main element, and OOP deals primarily with objects.
Notice that the main focus of both approaches is code: plain procedures
(or functions) in one case, and grouped code associated with some
internal state in the other. Data-oriented design shifts the perspective of
programming from objects to the data itself: The type of the data, how it is
laid out in memory, and how it will be read and processed in the game.
Programming, by definition, is about transforming data: It’s the act of
creating a sequence of machine instructions describing how to process the
input data and create some specific output data. A game is nothing more than
a program that works at interactive rates, so wouldn’t it make sense for us to
concentrate primarily on that data instead of on the code that manipulates it?
I’d like to clear up potential confusion and stress that data-oriented
design does not imply that something is data-driven. A data-driven game
is usually a game that exposes a large amount of functionality outside
of code and lets the data determine the behavior of the game. That is an
orthogonal concept to data-oriented design, and can be used with any
type of programming approach.
IDEAL DATA
» If we look at a program from the data point of view, what does the ideal
data look like? It depends on the data and how it’s used. In general, the
ideal data is in a format that we can use with the least amount of effort. In
the best case, the format will be the same one we expect as output, so the
processing is limited to just copying that data. Very often, our ideal data
layout will be large blocks of contiguous, homogeneous data that we can
process sequentially. In any case, the goal is to minimize the number of
transformations, and whenever possible, you should bake your data into
this ideal format offline, during your asset-building process.
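As a rough illustration of what baking can buy you, here is a minimal sketch in which the build step writes vertices in exactly the layout the runtime wants, so loading is nothing more than a copy. The names (BakedHeader, BakedVertex, BakeVertices, LoadVertices) and the blob layout are assumptions made up for this example, not part of any particular engine or asset pipeline, and the sketch assumes the build machine and the target share the same endianness and struct layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical baked layout: a tiny header followed by tightly packed vertices.
struct BakedHeader {
    std::uint32_t vertexCount;
};

struct BakedVertex {
    float x, y, z;
};

// Offline (asset-building) step: write the data in its final runtime layout.
std::vector<unsigned char> BakeVertices(const std::vector<BakedVertex>& verts) {
    BakedHeader header{static_cast<std::uint32_t>(verts.size())};
    std::vector<unsigned char> blob(sizeof(header) + verts.size() * sizeof(BakedVertex));
    std::memcpy(blob.data(), &header, sizeof(header));
    if (!verts.empty()) {
        std::memcpy(blob.data() + sizeof(header), verts.data(),
                    verts.size() * sizeof(BakedVertex));
    }
    return blob;
}

// Runtime step: no parsing and no per-element conversion, just a copy.
std::vector<BakedVertex> LoadVertices(const std::vector<unsigned char>& blob) {
    assert(blob.size() >= sizeof(BakedHeader));
    BakedHeader header{};
    std::memcpy(&header, blob.data(), sizeof(header));
    assert(blob.size() >= sizeof(header) + header.vertexCount * sizeof(BakedVertex));

    std::vector<BakedVertex> verts(header.vertexCount);
    if (header.vertexCount > 0) {
        std::memcpy(verts.data(), blob.data() + sizeof(header),
                    header.vertexCount * sizeof(BakedVertex));
    }
    return verts;
}
```

All the expensive work (parsing a source format, reordering, converting units) would live in the offline step; the runtime only ever sees the final layout.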
Because data-oriented design puts data first and foremost, we can
architect our whole program around the ideal data format. We won’t always
be able to make it exactly ideal (the same way that code is hardly ever by-the-book OOP), but it’s the primary goal to keep in mind. Once we achieve
that, most of the problems I mentioned at the beginning of the column tend
to melt away (more about that in the next section).
When we think about objects, we immediately think of trees—
inheritance trees, containment trees, or message-passing trees, and
our data is naturally arranged that way. As a result, when we perform an
operation on an object, it will usually result in that object in turn accessing
other objects further down in the tree. Iterating over a set of objects
performing the same operation generates cascading, totally different
operations at each object (see Figure 1a).
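To make that cascade concrete, here is a minimal sketch of the kind of object tree described above. The class names (Component, Physics, Animation, GameObject) are hypothetical and chosen only for illustration; real engines differ in the details, but the shape of the traversal is typical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical base class: each object in the tree knows how to update itself.
struct Component {
    virtual ~Component() = default;
    virtual void Update(float dt) = 0;
};

struct Physics : Component {
    float position = 0.0f;
    float velocity = 1.0f;
    void Update(float dt) override { position += velocity * dt; }
};

struct Animation : Component {
    float time = 0.0f;
    void Update(float dt) override { time += dt; }
};

// A game object owns a heterogeneous set of children, each allocated separately.
struct GameObject {
    std::vector<std::unique_ptr<Component>> children;

    void Update(float dt) {
        // Each call is a virtual dispatch through a pointer to memory that may
        // live anywhere on the heap, and each child runs different code.
        for (auto& child : children) {
            child->Update(dt);
        }
    }
};

// One "same" operation over all objects fans out into many different ones.
void UpdateAll(std::vector<std::unique_ptr<GameObject>>& objects, float dt) {
    for (auto& object : objects) {
        object->Update(dt);
    }
}
```

Nothing here is wrong in isolation, but the loop in UpdateAll alternates between physics code and animation code and chases a pointer for every child, which is the access pattern the profiler tends not to show as a single hotspot.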
To achieve the best possible data layout, it’s helpful to break down each
object into the different components, and group components of the same
type together in memory, regardless of what object they came from. This
organization results in large blocks of homogeneous data, which allow us to
process the data sequentially (see Figure 1b).
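A minimal sketch of that regrouping might look like the following. The names (PhysicsData, AnimationData, World) are assumptions invented for this example, and the sketch ignores how components are tied back to the object they came from.

```cpp
#include <cassert>
#include <vector>

// Hypothetical components, stored as plain data with no behavior attached.
struct PhysicsData {
    float position;
    float velocity;
};

struct AnimationData {
    float time;
};

// Components of the same type sit together in one contiguous array,
// regardless of which game object they originally belonged to.
struct World {
    std::vector<PhysicsData> physics;
    std::vector<AnimationData> animation;
};

// Each pass runs one operation over one homogeneous block, front to back.
void UpdatePhysics(std::vector<PhysicsData>& physics, float dt) {
    for (PhysicsData& p : physics) {
        p.position += p.velocity * dt;
    }
}

void UpdateAnimation(std::vector<AnimationData>& animation, float dt) {
    for (AnimationData& a : animation) {
        a.time += dt;
    }
}

void UpdateWorld(World& world, float dt) {
    UpdatePhysics(world.physics, dt);
    UpdateAnimation(world.animation, dt);
}
```

Each loop touches memory linearly and runs the same code for every element, so the two passes are also natural units of work to hand to separate threads if they share no data.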
A key reason data-oriented design is so powerful is that
it works very well on large groups of objects. OOP, by definition, works
on a single object. Step back for a minute and think of the last game
you worked on: How many places in the code did you have only one of
something? One enemy? One vehicle? One pathfinding node? One bullet?
One particle? Never! Where there’s one, there are many. OOP ignores that
and deals with each object in isolation. Instead, we can make things easy
WWW.GDMAG.COM 43