A Comprehensive Survey

AI in Games

From Behavior Trees to Evolved Creatures: Three Paradigms of Game Intelligence

01

Three Paradigms

Traditional

GameAI

Authored rules. Reactive behavior. Cannot plan. Knowledge in code.

Machine Learning

ML4Games

LLM-powered agents. Can plan. Individual learning. Knowledge in memory.

Artificial Life

AL4Games

Evolved behavior. Population learning. Knowledge in gene pool.

02

Key Distinctions

Planning vs Reacting

GameAI reacts to current state. ML4Games decomposes goals into subgoals, reasons about sequences, adjusts when circumstances change.

Learning Timescale

ML4Games: individual accumulates experience within lifetime. AL4Games: population accumulates fitness across generations. Individuals disposable.

03

Part I

Traditional Game AI

Authoring behavior explicitly. Designer anticipates situations, writes responses, agent executes deterministically. Given same inputs, same outputs.

Strength: debuggable, designable

Limitation: brittle, bounded, cannot plan

04

Decision Trees

The bedrock: a flowchart where each node tests an attribute, each branch represents an outcome, each leaf triggers an action.

if (player_visible) {
  if (health > 50%) → Attack
  else → Flee
} else → Patrol

Strength: Absolute predictability. Crucial for debugging.

Weakness: Exponential growth. "Spaghetti code" as complexity rises. No state memory without external variables.

05

Behavior Trees

Industry standard since Halo 2 (early 2000s). Now built into Unreal Engine and Unity.

Composites

Selector: Try children until one succeeds. Sequence: All must succeed.

Decorators

Conditionals that gate branches. Can abort running branches when conditions change.

Blackboard

Working memory: TargetActor, LastKnownLocation, AmmoCount. Separates logic from data.
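
A minimal sketch of these pieces in Python (illustrative only: real engine BTs also track a Running status, abort running branches via decorators, and tick across frames):

from dataclasses import dataclass, field

@dataclass
class Blackboard:
    data: dict = field(default_factory=dict)   # TargetActor, LastKnownLocation, AmmoCount...

class Sequence:
    def __init__(self, *children): self.children = children
    def tick(self, bb):
        return all(c.tick(bb) for c in self.children)   # all children must succeed, in order

class Selector:
    def __init__(self, *children): self.children = children
    def tick(self, bb):
        return any(c.tick(bb) for c in self.children)   # try children until one succeeds

class Condition:                                          # decorator-style gate on a branch
    def __init__(self, fn): self.fn = fn
    def tick(self, bb): return self.fn(bb)

class Action:
    def __init__(self, name): self.name = name
    def tick(self, bb):
        print(self.name)
        return True

# Fight if a target is on the blackboard, otherwise patrol
root = Selector(
    Sequence(Condition(lambda bb: bb.data.get("TargetActor") is not None),
             Action("Attack")),
    Action("Patrol"))

root.tick(Blackboard({"TargetActor": "player"}))   # prints "Attack"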

06

Case Study: Halo 2 Combat AI

Halo 2's Covenant Elites demonstrated BT reactivity:

  • Engage in combat normally
  • Shield drops below threshold → decorator triggers "Flee to Cover" branch
  • Attack sequence aborts mid-execution
  • Shields recharge → re-engage

The behavior felt dynamic but was entirely reactive—the Elite wasn't planning to retreat and re-engage. It was responding to a threshold being crossed.

07

State Trees (UE5)

Bridge between Finite State Machines and Behavior Trees. Hierarchical state machines with selector logic.

Key Difference

BTs constantly re-evaluate from root. State Trees rest in a state—near-zero CPU cost while in-state. Only transition when conditions met.

Best For

Long-duration behaviors: "Sleep for 8 hours," "Harvest Field." Quest logic, NPC schedules, high-level life simulation.

08

Part II

Utility AI

Traditional architectures are binary—True or False. Biological decision-making operates on gradients.

An agent isn't "Hungry (True/False)"—it's "73% Hungry".

09

The Philosophy

Instead of tree structures, maintain a flat list of actions (Eat, Sleep, Attack, Patrol, Flee).

Every tick: score each action (0.0 to 1.0) based on context, select the highest.

Designers define what the agent values, not what the agent does. Behavior emerges from competing gradients.

Still reactive (no planning), but more organic than binary trees.
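
In code, the core loop is simply score-everything-then-take-the-max. A minimal sketch with invented actions and scoring functions:

def score_eat(ctx):    return ctx["hunger"]                  # already in 0.0 to 1.0
def score_sleep(ctx):  return ctx["fatigue"]
def score_flee(ctx):   return min(1.0, ctx["enemies"] / 5.0)
def score_patrol(ctx): return 0.2                            # weak default so idling never wins outright

ACTIONS = {"Eat": score_eat, "Sleep": score_sleep, "Flee": score_flee, "Patrol": score_patrol}

def choose_action(ctx):
    # Every tick: score each action on the flat list, pick the highest
    return max(ACTIONS, key=lambda name: ACTIONS[name](ctx))

print(choose_action({"hunger": 0.73, "fatigue": 0.40, "enemies": 1}))   # "Eat": 73% hungry wins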

10

Response Curves

Raw inputs pass through curves that encode personality:

Linear

"Desire to flee" scales directly with "Number of Enemies."

Sigmoid

Stays low until a threshold, then spikes and plateaus. The agent ignores low health until it drops below 30%, then "Heal" skyrockets.

Exponential

Increasingly urgent as input grows. Perfect for starvation mechanics.

Scores combine via multiplication: Score = HealthFactor × DistanceFactor × AmmoFactor. A zero in any factor vetoes the action.
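
A sketch of the three curve shapes and the multiplicative veto (curve parameters are invented; real systems expose them as designer-tuned assets):

import math

def linear(x):                          # e.g. desire to flee vs. number of enemies
    return max(0.0, min(1.0, x))

def sigmoid_low_health(health, midpoint=0.3, k=20):
    # stays low until health drops under ~30%, then spikes and plateaus
    return 1.0 / (1.0 + math.exp(k * (health - midpoint)))

def exponential(x, power=3):            # increasingly urgent as input grows (starvation)
    return min(1.0, max(0.0, x) ** power)

def attack_score(health, distance, ammo):
    # Score = HealthFactor × DistanceFactor × AmmoFactor
    health_factor   = linear(health)            # healthier -> more willing to attack
    distance_factor = linear(1.0 - distance)    # closer -> higher
    ammo_factor     = linear(ammo)              # zero ammo vetoes the whole action
    return health_factor * distance_factor * ammo_factor

print(round(attack_score(0.9, 0.2, 0.5), 2))    # healthy, close, some ammo: decent score
print(attack_score(0.9, 0.2, 0.0))              # 0.0: no ammo, action vetoed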

11

Influence Maps

Extend Utility AI into spatial reasoning.

A grid overlay where each cell stores abstract values: danger level, resource density, territorial control, recent player activity.

  • Threat map: "Danger" propagates outward from enemies, decaying with distance
  • Resource map: Where supplies cluster
  • Exploration map: Areas not visited recently

Agent follows gradients—naturally avoids high-threat areas, gravitates toward resources—without complex pathfinding.
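
A threat map can be as simple as a grid where each enemy stamps a value that decays with distance, and the agent steps downhill. A sketch:

def build_threat_map(width, height, enemies, radius=4):
    # Each cell accumulates a danger value that decays with distance from enemies
    grid = [[0.0] * width for _ in range(height)]
    for ex, ey in enemies:
        for y in range(height):
            for x in range(width):
                dist = abs(x - ex) + abs(y - ey)          # Manhattan distance
                if dist <= radius:
                    grid[y][x] += 1.0 - dist / (radius + 1)
    return grid

def safest_neighbor(grid, x, y):
    # Follow the gradient: step to the adjacent cell with the lowest danger
    neighbors = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
    valid = [(nx, ny) for nx, ny in neighbors
             if 0 <= ny < len(grid) and 0 <= nx < len(grid[0])]
    return min(valid, key=lambda p: grid[p[1]][p[0]])

threat = build_threat_map(8, 8, enemies=[(2, 2), (6, 5)])
print(safest_neighbor(threat, 3, 3))   # picks the adjacent cell with the least accumulated danger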

12

Case Study: The Sims (Maxis)

The Sims (the 2000 game by Maxis) popularized Utility AI. Sims have needs (Hunger, Bladder, Social, Fun) that decay over time. Every object advertises which needs it satisfies.

Note: The Sims (game) is unrelated to Karl Sims (ALife researcher featured later in Part IV).

Emergent "morning routine":

  • Wake up with high Bladder, high Hunger, low Energy
  • Toilet scores highest (urgent Bladder)
  • Then kitchen (Hunger)
  • Then socializing if another Sim is around

No "morning routine" script exists. The routine emerges from need dynamics.
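
A sketch of the advertisement mechanic that drives this (object values and the 0-to-1 need scale are invented for illustration, not Maxis's data):

# Each object advertises how much of each need it can satisfy
OBJECTS = {
    "Fridge": {"Hunger": 0.8},
    "Toilet": {"Bladder": 0.9},
    "TV":     {"Fun": 0.5},
    "Bed":    {"Energy": 0.9},
}

def pick_object(needs):
    # Score = advertised satisfaction weighted by how depleted the need is (0.0 = empty, 1.0 = satisfied)
    def score(obj):
        return sum(value * (1.0 - needs.get(need, 1.0))
                   for need, value in OBJECTS[obj].items())
    return max(OBJECTS, key=score)

# Morning: full bladder, hungry, rested -> the Toilet wins, then the Fridge
print(pick_object({"Hunger": 0.3, "Bladder": 0.1, "Fun": 0.6, "Energy": 0.8}))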

13

The Fundamental Limitation

Traditional & Utility AI cannot plan.

They react to the current frame. They do not:

  • Simulate future states ("If I shoot, will he take cover?")
  • Reason backwards ("To get the key, I first need the rope")
  • Construct multi-step strategies

They are Execution Engines, not Reasoning Engines. To get planning, we need Machine Learning.

14

Part III

ML4Games

Machine Learning for Games. Unlike traditional AI where behavior is authored, these agents learn, reason, and generate.

PLANNING SPECTRUM · LEARNS IN LIFETIME · KNOWLEDGE IN MEMORY
15

The Planning Capability

ML4Games agents can plan to varying degrees:

  • Implicit Planning: RL agents "plan" by learning value functions that predict long-term reward.
  • Explicit Planning: LLM agents decompose goals into subgoals and reason about sequences directly.

This spectrum allows for behaviors ranging from "instinctive" mastery (RL) to "reasoned" strategy (LLMs)—far beyond simple reactive pattern-matching.

16

Reinforcement Learning

The "brute force" planner. Agents learn policies by maximizing reward functions through trial and error (millions of episodes).
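
Underneath, this is a value-update loop. A tabular Q-learning sketch, the simplest version of what Dota 2- and StarCraft-scale systems do with deep networks:

import random
from collections import defaultdict

Q = defaultdict(float)                    # (state, action) -> expected long-term reward
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
ACTIONS = ["attack", "flee", "patrol"]

def choose(state):
    # Explore occasionally, otherwise act greedily on learned values
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    # Nudge the estimate toward reward plus the discounted best next value
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])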

Pros

Superhuman micro-skills (Dota 2, StarCraft). Emergent strategies like tool use or exploiting physics glitches.

Cons

Sample inefficient (needs lifetimes of practice). Brittle (catastrophic forgetting if rules change). Black box.

State of the Art: Softmax is reviving RL at megascale, integrating theory of mind to achieve "organic alignment" rather than rigid control. RL may yet have the last laugh.

17

Dialogue Agents: Convai

Bridges game engines with LLMs. NPCs that converse dynamically, not static dialogue trees.

Perception

Speech recognition + visual context. Knows who's speaking and what's nearby.

Cognition

RAG queries knowledge base, injects facts + persona into prompt.

Action

LLM generates text → speech → lip-sync. "Open the gate!" triggers Unreal events.

The LLM becomes a natural language controller—players interact via voice, not buttons.
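
A rough sketch of the Cognition step (the helper names and keyword-overlap retrieval are illustrative stand-ins, not Convai's actual SDK, which uses embedding-based RAG):

def retrieve_facts(knowledge_base, query, k=3):
    # Naive retrieval: rank stored facts by keyword overlap with the player's line
    words = set(query.lower().split())
    return sorted(knowledge_base,
                  key=lambda fact: len(words & set(fact.lower().split())),
                  reverse=True)[:k]

def build_prompt(persona, knowledge_base, scene, player_utterance):
    facts = retrieve_facts(knowledge_base, player_utterance)
    return (f"You are {persona}.\n"
            f"Relevant facts: {'; '.join(facts)}\n"
            f"Nearby: {', '.join(scene)}\n"
            f"Player says: {player_utterance}\n"
            f"Reply in character. If an action is needed, end with [ACTION:<name>].")

prompt = build_prompt(
    persona="a gruff gatekeeper who owes the player a favor",
    knowledge_base=["The gate is locked at night.", "The captain holds the key."],
    scene=["gate", "torch", "guard dog"],
    player_utterance="Open the gate!")
# The LLM's reply is voiced and lip-synced; a tag like [ACTION:OpenGate]
# would be parsed out and forwarded to the engine as an event.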

18

Behavioral Agents: Voyager

Voyager (NVIDIA/Caltech, 2023) is the first LLM-powered embodied lifelong learning agent in Minecraft.

It solves the "Exploration Problem" in open worlds without Reinforcement Learning.

  • RL Struggle: In Minecraft, finding a Diamond requires thousands of steps. RL fails because the reward signal is too sparse.
  • Voyager Solution: Uses GPT-4 not just to "talk," but as a coding engine that interacts with the game via API (Mineflayer).
19

The Voyager Loop

1. Curriculum

GPT-4 analyzes inventory and state to propose the next logical task. "I have cobblestone, I should craft a stone pickaxe." Maximizes exploration.

2. Self-Correction

Writes JavaScript code. If execution fails, the error message and environment feedback (e.g., "inventory full") are fed back to GPT-4, which reasons about the failure and rewrites the code to fix the bug.

3. Skill Library

Successful programs are stored in a vector DB. Complex tasks query this library to retrieve and compose executable code (e.g., "Combat" reuses "Equip Weapon").
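
The loop, reduced to structural Python (the real agent prompts GPT-4 and executes generated JavaScript against Mineflayer; the llm, env, and skill_library objects here are placeholders):

MAX_RETRIES = 4

def voyager_step(agent_state, llm, env, skill_library):
    # Curriculum: ask the LLM for the next task given inventory and progress
    task = llm.propose_task(agent_state)

    # Skill library: retrieve previously stored programs relevant to this task
    prior_skills = skill_library.retrieve(task)

    # Self-correction: write code, run it, feed errors back until it works
    code = llm.write_code(task, prior_skills)
    for _ in range(MAX_RETRIES):
        result = env.execute(code)
        if result.success:
            skill_library.store(task, code)      # a new composable skill
            return result
        code = llm.revise_code(code, result.error, result.env_feedback)
    return None                                  # give up; the curriculum proposes something else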

20

Voyager: White-Box Mastery

Stats: 3.3x more unique items, 15.3x faster tech tree traversal than prior SOTA.

Zero-Shot Transfer: Because skills are stored as generic code (not neural weights), a new Voyager instance in a new world starts with a library of mastered skills (e.g., "craft_sword"), solving complex tasks immediately.

Interpretability: We can read the code to see how it thinks. RL is a black box; Voyager is a white box.

21

Visual-Language-Action: SIMA

DeepMind's SIMA (2024) is a Universal Interface for game AI.

The Breakthrough: It decouples the agent from the Game Engine API. It plays 9 different games using only:

  • Input: Screen Pixels + Natural Language Instructions.
  • Output: Keyboard & Mouse events.

This makes every game a potential training ground, enabling a "Foundation Model" for motor control.

22

SIMA 2: The Generalist Agent

Nov 2025. SIMA 2 embeds Gemini to create a true Generalist Agent.

Previous agents were savants: AlphaStar could play StarCraft but not Go. SIMA 2 can play any game it can see.

Visual Grounding

It links language ("chair") to pixels. It understands complex, open-ended instructions: "Find a spot with a good view and build a campfire."

Zero-Shot Transfer

Because it understands concepts (gravity, inventory, enemies) rather than just patterns, it can play a brand new game on Day 1 without training.

23

The Ultimate Test: Genie 3

DeepMind paired SIMA 2 with Genie 3 (a model that generates playable 3D worlds from prompts).

This creates a closed loop of AI-generated reality:

  • World Generation: Genie 3 dreams up a new environment (e.g., "A platformer with ice physics") from a text prompt.
  • Agent Adaptation: SIMA 2 plays it. It uses Gemini to "see" and "reason" about the new physics zero-shot.
  • Self-Improvement: The agent improves its policy purely by playing in these dreamed worlds, without human data.
24

Generative Agents: Smallville

Stanford 2023: populated a small town with 25 LLM-powered agents, each with unique personalities, memories, and relationships.

The architecture enables emergent social behavior without explicit programming.

25

Smallville: Architecture

1. Memory Stream

A comprehensive list of every observation and thought. Scored by Importance (LLM-rated poignancy).

2. Retrieval

Context is pulled via Score = Recency + Importance + Relevance. This prevents the "goldfish effect" by surfacing only pertinent memories (see the sketch after this list).

3. Hierarchical Reflection

Synthesizes memories into higher-level thoughts: Observations → Insights → Persona. Creates depth of character.

4. Planning

Recursively decomposes high-level goals ("Throw party") into hourly schedules and minute-by-minute actions.
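
A sketch of the retrieval scoring from step 2, following the paper's formulation of exponential recency decay, a 1-10 importance rating, and embedding similarity (weights and the decay constant are simplified here):

import math, time

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieval_score(memory, query_embedding, now, decay=0.995):
    # Score = Recency + Importance + Relevance, each roughly normalized to [0, 1]
    hours_since_access = (now - memory["last_accessed"]) / 3600
    recency    = decay ** hours_since_access
    importance = memory["importance"] / 10        # LLM-rated poignancy, 1-10
    relevance  = cosine(memory["embedding"], query_embedding)
    return recency + importance + relevance

def retrieve(memories, query_embedding, k=5):
    now = time.time()
    return sorted(memories,
                  key=lambda m: retrieval_score(m, query_embedding, now),
                  reverse=True)[:k]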

26

Smallville: Emergent Behavior

The core emergent phenomenon was Information Diffusion.

  • Origin: One agent (Isabella) was told "I'm throwing a Valentine's Party."
  • Diffusion: She mentioned it to friends → they checked their calendars → they invited others.
  • Result: Word of the party reached 12 agents; 5 showed up at the right place and time, while others declined based on their plans and relationships.

The "party" wasn't scripted; it was the result of information propagating through the social graph via the memory/retrieval loop.

27

Part IV

AL4Games

Artificial Life for Games. Behavior emerges through selection pressure across generations.

POPULATION LEARNING · KNOWLEDGE IN GENES · INDIVIDUALS DISPOSABLE
28

The Evolutionary Paradigm

Genetic Algorithms optimize through death.

A single Bibite doesn't get smarter over its lifetime; it is born with its neural network fixed. But the population gets smarter as successful variants reproduce and unsuccessful ones die.

Advantages

No catastrophic forgetting. Parallel exploration. No reward function needed—survival IS the reward.

Trade-offs

Slow adaptation (generational timescales). Can't react to novel situations within lifetime.
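
Whatever the genome encodes (network weights, body plans, or stat genes), the underlying loop looks the same. A generic sketch:

import random

def evolve(population, fitness_fn, generations=100, survival_rate=0.2, mutation_rate=0.05):
    for _ in range(generations):
        # Selection: rank by fitness, keep the top fraction (survival is the reward)
        ranked = sorted(population, key=fitness_fn, reverse=True)
        survivors = ranked[:max(2, int(len(ranked) * survival_rate))]

        # Reproduction: refill the population with mutated copies of survivors
        population = list(survivors)
        while len(population) < len(ranked):
            parent = random.choice(survivors)
            child = [gene + random.gauss(0, 1) if random.random() < mutation_rate else gene
                     for gene in parent]
            population.append(child)
    return max(population, key=fitness_fn)

# Toy fitness: genomes closer to the all-ones vector score higher
best = evolve([[random.uniform(-1, 1) for _ in range(8)] for _ in range(50)],
              fitness_fn=lambda g: -sum((x - 1.0) ** 2 for x in g))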

29

Karl Sims (1994): Body + Brain

"Evolving Virtual Creatures" demonstrated the first co-evolution of morphology and neural control. Both body shape and brain architecture emerge together.

The breakthrough: Distributed Control.

Unlike traditional approaches where one central brain controls the whole body, Sims gave each body part its own mini-brain. When evolution adds a new limb, that limb arrives with its own sensors, actuators, and local neural circuit already wired.

This enabled modular complexity—a leg gene with a recursion counter produces a centipede. No central planner needed.

30

Karl Sims: Emergent Strategies

Locomotion: In water—streamlined shapes, undulation, sculling. On land—legs, hopping, rolling. One creature evolved a horse-like gallop with coordinated four-limb gait.

Competition: Two creatures compete for a cube (fitness = proximity):

  • Rushing: Get there first
  • Blocking: Wide body between opponent and cube
  • Covering: Sprawl over cube
  • Pinning: Immobilize opponent
  • Keepaway: Grab cube and move it

None of these strategies were programmed; they emerged as solutions to a fitness function inside a physics simulation.

31

Neuroevolution (NEAT)

NeuroEvolution of Augmenting Topologies. Traditional Deep Learning trains weights on a fixed architecture. NEAT does something radical: it evolves the architecture itself.

The key insight: Start simple and let complexity emerge only when needed.

  • Starts Minimal: Direct input-to-output connections (no hidden layers).
  • Grows Complexity: Mutations add neurons and connections only when simpler structures fail.
  • Protects Innovation: New structures get time to optimize before competing with established solutions.

Brain complexity scales naturally with task difficulty—no designer tuning required.

Limitation: Sample inefficient (needs thousands of generations). Evolution is generate-and-test—no gradient signal means it's searching blind.
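
A sketch of NEAT's two structural mutations, which is where the growing complexity comes from (full NEAT also tags genes with innovation numbers to support crossover and speciation, omitted here, and this sketch does no feed-forward or cycle checking):

import random

def make_minimal_genome(n_inputs, n_outputs):
    # Start minimal: direct input-to-output connections, no hidden neurons
    nodes = list(range(n_inputs + n_outputs))
    conns = [{"src": i, "dst": n_inputs + o, "weight": random.uniform(-1, 1), "enabled": True}
             for i in range(n_inputs) for o in range(n_outputs)]
    return {"nodes": nodes, "conns": conns}

def mutate_add_connection(genome):
    # Grow complexity: wire up two previously unconnected nodes
    src, dst = random.sample(genome["nodes"], 2)
    if not any(c["src"] == src and c["dst"] == dst for c in genome["conns"]):
        genome["conns"].append({"src": src, "dst": dst,
                                "weight": random.uniform(-1, 1), "enabled": True})

def mutate_add_node(genome):
    # Split an existing connection: disable it, insert a new neuron in the middle
    conn = random.choice(genome["conns"])
    conn["enabled"] = False
    new_node = max(genome["nodes"]) + 1
    genome["nodes"].append(new_node)
    genome["conns"].append({"src": conn["src"], "dst": new_node, "weight": 1.0, "enabled": True})
    genome["conns"].append({"src": new_node, "dst": conn["dst"],
                            "weight": conn["weight"], "enabled": True})

genome = make_minimal_genome(n_inputs=3, n_outputs=1)
mutate_add_node(genome)        # complexity is added only by mutation, never by design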

32

The Bibites: Digital Metabolism

State-of-the-art behavioral ALife. The critical innovation isn't just the brain (NEAT), but the Metabolism.

Bibites don't just "eat to score points." They eat to maintain energy balance. Every action (moving, sensing, thinking) costs energy.

Energy In

Consumption of plants or other Bibites. Photosynthesis (if evolved).

Energy Out

Basal metabolic rate + Motion cost + Brain cost. Larger brains cost more energy!
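
That budget, as a sketch (constants are invented for illustration, not The Bibites' actual tuning):

def energy_tick(bibite, dt):
    # Energy out: costs scale with body size, motion, and brain size
    basal  = 0.05 * bibite["body_size"]
    motion = 0.10 * bibite["speed"] ** 2
    brain  = 0.01 * bibite["neurons"]            # larger brains cost more energy
    bibite["energy"] -= (basal + motion + brain) * dt

    # Energy in: whatever was eaten this tick
    bibite["energy"] += bibite.pop("food_eaten", 0.0)

    # Starvation is the selection pressure: no energy, no reproduction, no life
    return bibite["energy"] > 0.0

bibite = {"energy": 10.0, "body_size": 1.0, "speed": 2.0, "neurons": 40, "food_eaten": 0.3}
alive = energy_tick(bibite, dt=0.1)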

33

Bibites: Emergent Behaviors

Emergent Herding

Initially random pheromone emission. Mutation links "high energy" → "emit pheromone." Successful foragers signal. Others evolve to follow gradients. Groups become efficient.

Designer provided signaling capacity; evolution invented the protocol.

Predator-Prey Arms Race

Mutation connects "see Bibite" → "attack." Predation emerges. Prey evolve faster acceleration and evasion. With sine-wave (Sin) neuron nodes, prey evolve juking patterns that break pursuit.

34

Species ALRE: Macro-Evolution

Unlike Bibites (micro-behavior focus), Species: ALRE simulates population dynamics and speciation.

Real speciation: The engine tracks genetic distance between sub-populations. When clusters diverge beyond threshold—via geographic isolation or divergent selection—they're classified as distinct species. Real-time cladograms visualize the tree of life branching.
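
A sketch of threshold-based species assignment (the distance metric here is a simple per-gene average; the real engine's clustering is more involved):

def genetic_distance(a, b):
    # Average absolute difference across stat genes (Metabolism, Aggression, Speed, ...)
    return sum(abs(a[g] - b[g]) for g in a) / len(a)

def assign_species(population, threshold=0.3):
    # Greedy clustering: join the first species whose representative is close enough
    species = []   # list of (representative, members)
    for creature in population:
        for rep, members in species:
            if genetic_distance(creature, rep) < threshold:
                members.append(creature)
                break
        else:
            species.append((creature, [creature]))   # a new branch on the cladogram
    return species

pop = [{"Metabolism": 0.5, "Aggression": 0.1, "Speed": 0.7},
       {"Metabolism": 0.6, "Aggression": 0.2, "Speed": 0.6},
       {"Metabolism": 0.2, "Aggression": 0.9, "Speed": 0.9}]
print(len(assign_species(pop)))   # 2: the aggressive outlier splits off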

Transparent Design

Uses Utility AI, not neural networks. Genome defines stats (Metabolism, Aggression, Speed); behavior logic is fixed code. What evolves is propensity—peaceful lineages evolve low Aggression, predators evolve high.

The Rover

Player avatar for artificial selection. Feed creatures, cull others, relocate populations, terraform terrain. You become selection pressure—discovering how hard it is to engineer evolution toward specific outcomes.

Trade-off: Species is legible (you know exactly why creatures behave as they do) but can't surprise you. Bibites is opaque but invents behaviors the designer never anticipated.

35

Special Mention: Rain World

Not "Evolved" (it uses Behavior Trees/Utility), but represents the pinnacle of Ecosystem Simulation.

The Illusion of Life

Creatures have needs (eat, sleep, avoid rain) and persistent relationships. They hunt each other, not just the player.

Procedural Animation

Creatures physically navigate geometry using inverse kinematics, making them feel like heavy, living animals, not floating hitboxes.

36

Part V

Frontiers

The loop closes: Self-Modifying Games.

SELF-PATCHING · NEW GENRES
37

The Self-Patching Game

What happens when agents can rewrite their own source code?

The Loop

Player finds an exploit (e.g., standing on a rock). The AI analyzes the telemetry, diagnoses the failure, and patches the bug in real-time—either by rewriting navigation logic or altering the geometry.

The Meta

The game doesn't just get harder; it evolves to counter the player base. A living, breathing codebase that immunizes itself against cheese tactics.

Proof of Concept: Sakana AI's Darwin Gödel Machine improved its own coding performance (SWE-bench) from 20% → 50%. It didn't just tune weights; it rewrote its own prompt engineering and search strategy, validated every change empirically, and committed the improvements.

38

Agents, Populations, and Worlds

Evolved Architectures

Don't hand-design Behavior Trees—evolve them. Populations of decision structures adapt to level layouts.

The Living Lore

Myths become mechanics. A rumor started by an NPC becomes a quest, then a faction, then a war.

Nemesis System 2.0

Enemies that evolve tactics to exploit your specific playstyle. You snipe? They evolve carapace plating.

The A/B Testing God

Game runs 1,000 nightly sims to tune drop rates and difficulty curves automatically for max engagement.

Co-Evolving Civs

Brains, bodies, and culture evolving together. Mini-civilizations that rise, war, and fall based on geography.

Agent Economies

Inflation isn't a variable. It emerges from 10,000 agents with biological needs trading goods for calories.

The Bicameral Agent

System 1: Fast Utility AI handles combat (200ms). System 2: LLM handles strategy/dialogue (2s).

Permadeath Content

Evolution requires death. If a game zone is ignored by players, it should rot and be recycled into something new.

39

Failure Modes and Guardrails

Reward Hacking

Agents exploit scoring/bugs. Guardrails: adversarial test suites, holdout maps, objective unit tests, human veto.

Monoculture / Collapse

Diversity vanishes → brittle meta. Guardrails: speciation, novelty search, immigration, entropy constraints.

Grief / Policy Violations

Toxic tactics or content. Guardrails: content filters, policy checks, red-team arenas, "no-bleed" sandboxes.

Live-Ops Regression

Fun metrics drop post-update. Guardrails: canaries, kill-switch, rollback, telemetry dashboards.

Compute / Cost

Evolution too expensive. Guardrails: surrogate models, distillation, small-batch EAs, parallel sims.

40

The Trajectory

From Explicit Instruction to Implicit Intent.

From Machine-like to Lifelike.

  • Traditional GameAI: Tell the machine exactly how. Cannot plan.
  • Utility AI: Tell the machine what you value. Behavior emerges from curves.
  • ML4Games: Tell the machine who it is. It figures out what to say and do. Can plan.
  • AL4Games: Tell the machine the rules of existence. Behavior, morphology, strategy emerge.
  • The Frontier: Combine them all. Self-modifying architectures that improve their own improvement processes.

The role of the developer shifts from architect to gardener—planting seeds of intelligence and watching worlds grow that are wilder than anything hand-built could be.

41