A Comprehensive Survey
AI in Games
From Behavior Trees to Evolved Creatures: Three Paradigms of Game Intelligence
Three Paradigms
GameAI
Authored rules. Reactive behavior. Cannot plan. Knowledge in code.
ML4Games
LLM-powered agents. Can plan. Individual learning. Knowledge in memory.
AL4Games
Evolved behavior. Population learning. Knowledge in gene pool.
Key Distinctions
Planning vs Reacting
GameAI reacts to current state. ML4Games decomposes goals into subgoals, reasons about sequences, adjusts when circumstances change.
Learning Timescale
ML4Games: individual accumulates experience within lifetime. AL4Games: population accumulates fitness across generations. Individuals disposable.
Part I
Traditional Game AI
Authoring behavior explicitly. Designer anticipates situations, writes responses, agent executes deterministically. Given same inputs, same outputs.
Strength: debuggable, designable
Limitation: brittle, bounded, cannot plan
Decision Trees
The bedrock: a flowchart where each node tests an attribute, each branch represents an outcome, each leaf triggers an action.
if (enemy visible) {
    if (health > 50%) → Attack
    else → Flee
} else → Patrol
Strength: Absolute predictability. Crucial for debugging.
Weakness: Exponential growth. "Spaghetti code" as complexity rises. No state memory without external variables.
Behavior Trees
Industry standard since Halo 2 (2004). Now built into Unreal Engine, with official and third-party packages for Unity.
Composites
Selector: Try children until one succeeds. Sequence: All must succeed.
Decorators
Conditionals that gate branches. Can abort running branches when conditions change.
Blackboard
Working memory: TargetActor, LastKnownLocation, AmmoCount. Separates logic from data.
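A minimal behavior-tree sketch in Python (illustrative only; the class names and the tiny example tree are inventions for this survey, not Unreal's or Unity's API):

SUCCESS, FAILURE, RUNNING = "success", "failure", "running"

class Sequence:                      # all children must succeed, in order
    def __init__(self, *children): self.children = children
    def tick(self, bb):
        for child in self.children:
            status = child.tick(bb)
            if status != SUCCESS:    # stop at the first failing or running child
                return status
        return SUCCESS

class Selector:                      # try children until one succeeds
    def __init__(self, *children): self.children = children
    def tick(self, bb):
        for child in self.children:
            status = child.tick(bb)
            if status != FAILURE:    # first success (or running) wins
                return status
        return FAILURE

class Condition:                     # decorator-style gate on a blackboard value
    def __init__(self, predicate): self.predicate = predicate
    def tick(self, bb): return SUCCESS if self.predicate(bb) else FAILURE

class Action:
    def __init__(self, fn): self.fn = fn
    def tick(self, bb): return self.fn(bb)

# "Attack if the blackboard has a target, otherwise patrol."
tree = Selector(
    Sequence(Condition(lambda bb: bb.get("TargetActor") is not None),
             Action(lambda bb: SUCCESS)),     # stand-in for an Attack task
    Action(lambda bb: SUCCESS))               # stand-in for a Patrol task
tree.tick({"TargetActor": None, "LastKnownLocation": None, "AmmoCount": 12})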
Case Study: Halo 2 Combat AI
Halo 2's Covenant Elites demonstrated BT reactivity:
- Engage in combat normally
- Shield drops below threshold → decorator triggers "Flee to Cover" branch
- Attack sequence aborts mid-execution
- Shields recharge → re-engage
The behavior felt dynamic but was entirely reactive—the Elite wasn't planning to retreat and re-engage. It was responding to a threshold being crossed.
State Trees (UE5)
Bridge between Finite State Machines and Behavior Trees. Hierarchical state machines with selector logic.
Key Difference
BTs constantly re-evaluate from root. State Trees rest in a state—near-zero CPU cost while in-state. Only transition when conditions met.
Best For
Long-duration behaviors: "Sleep for 8 hours," "Harvest Field." Quest logic, NPC schedules, high-level life simulation.
Part II
Utility AI
Traditional architectures are binary—True or False. Biological decision-making operates on gradients.
An agent isn't "Hungry (True/False)"—it's "73% Hungry".
The Philosophy
Instead of tree structures, maintain a flat list of actions (Eat, Sleep, Attack, Patrol, Flee).
Every tick: score each action (0.0 to 1.0) based on context, select the highest.
Designers define what the agent values, not what the agent does. Behavior emerges from competing gradients.
Still reactive (no planning), but more organic than binary trees.
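A sketch of that loop, assuming made-up scoring functions and a context dict for illustration:

def score_eat(ctx):   return ctx["hunger"]          # 0.0 (sated) .. 1.0 (starving)
def score_sleep(ctx): return 1.0 - ctx["energy"]
def score_flee(ctx):  return ctx["threat"]

ACTIONS = {"Eat": score_eat, "Sleep": score_sleep, "Flee": score_flee}

def choose_action(ctx):
    # Every tick: score each action in the flat list, pick the highest.
    return max(ACTIONS, key=lambda name: ACTIONS[name](ctx))

choose_action({"hunger": 0.73, "energy": 0.4, "threat": 0.1})   # -> "Eat" (the 73%-hungry agent)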
Response Curves
Raw inputs pass through curves that encode personality:
Linear
"Desire to flee" scales directly with "Number of Enemies."
Sigmoid
Stays low until threshold, spikes, plateaus. Ignore low health until below 30%, then "Heal" skyrockets.
Exponential
Increasingly urgent as input grows. Perfect for starvation mechanics.
Scores combine via multiplication: Score = HealthFactor × DistanceFactor × AmmoFactor. Zero vetoes the action.
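The three curve shapes and the multiplicative veto, sketched in Python (curve parameters are invented; tune them to taste):

import math

def flee_desire(enemy_count_norm):            # Linear: scales directly with the input
    return enemy_count_norm

def heal_urgency(health, threshold=0.3, steepness=12):
    # Sigmoid: stays near 0 until health drops below the threshold, then spikes toward 1.
    return 1.0 / (1.0 + math.exp(steepness * (health - threshold)))

def starvation_drive(hunger, power=3):
    # Exponential-style curve: rises slowly at first, then accelerates as hunger nears 1.
    return hunger ** power

def attack_score(health_factor, distance_factor, ammo_factor):
    # Multiplicative combination: any single zero factor vetoes the whole action.
    return health_factor * distance_factor * ammo_factor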
Influence Maps
Extend Utility AI into spatial reasoning.
A grid overlay where each cell stores abstract values: danger level, resource density, territorial control, recent player activity.
- Threat map: "Danger" propagates outward from enemies, decaying with distance
- Resource map: Where supplies cluster
- Exploration map: Areas not visited recently
Agent follows gradients—naturally avoids high-threat areas, gravitates toward resources—without complex pathfinding.
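A toy threat-map sketch (grid size, decay constant, and the gradient-following step are all illustrative choices):

import math

W, H = 16, 16

def threat_map(enemy_positions, decay=0.35):
    grid = [[0.0] * W for _ in range(H)]
    for ex, ey in enemy_positions:
        for y in range(H):
            for x in range(W):
                dist = abs(x - ex) + abs(y - ey)        # Manhattan distance to the enemy
                grid[y][x] += math.exp(-decay * dist)   # danger decays with distance
    return grid

def safest_step(grid, x, y):
    # Follow the gradient downhill: move to the adjacent cell with the least danger.
    options = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
               if 0 <= x + dx < W and 0 <= y + dy < H]
    return min(options, key=lambda c: grid[c[1]][c[0]])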
Case Study: The Sims (Maxis)
The Sims (the 2000 game by Maxis) popularized Utility AI. Sims have needs (Hunger, Bladder, Social, Fun) that decay over time. Every object advertises which needs it satisfies.
Note: The Sims (game) is unrelated to Karl Sims (ALife researcher featured later in Part IV).
Emergent "morning routine":
- Wake up with high Bladder, high Hunger, low Energy
- Toilet scores highest (urgent Bladder)
- Then kitchen (Hunger)
- Then socializing if another Sim is around
No "morning routine" script exists. The routine emerges from need dynamics.
The Fundamental Limitation
Traditional & Utility AI cannot plan.
They react to the current frame. They do not:
- Simulate future states ("If I shoot, will he take cover?")
- Reason backwards ("To get the key, I first need the rope")
- Construct multi-step strategies
They are Execution Engines, not Reasoning Engines. To get planning, we need Machine Learning.
Part III
ML4Games
Machine Learning for Games. Unlike traditional AI where behavior is authored, these agents learn, reason, and generate.
The Planning Capability
ML4Games agents can plan to varying degrees:
- Implicit Planning: RL agents "plan" by learning value functions that predict long-term reward.
- Explicit Planning: LLM agents decompose goals into subgoals and reason about sequences directly.
This spectrum allows for behaviors ranging from "instinctive" mastery (RL) to "reasoned" strategy (LLMs)—far beyond simple reactive pattern-matching.
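What "implicit planning" looks like mechanically: a tabular Q-learning sketch, where the learned value folds expected long-term reward into a single number, so the greedy policy behaves as if it had looked ahead (toy code; states, actions, and hyperparameters are placeholders):

import random
from collections import defaultdict

Q = defaultdict(lambda: defaultdict(float))
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1      # learning rate, discount, exploration

def act(state, actions):
    if random.random() < EPSILON:
        return random.choice(actions)                    # explore
    return max(actions, key=lambda a: Q[state][a])       # exploit learned values

def update(state, action, reward, next_state, next_actions):
    best_next = max((Q[next_state][a] for a in next_actions), default=0.0)
    # Bellman backup: immediate reward plus discounted estimate of the future.
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])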
Reinforcement Learning
The "brute force" planner. Agents learn policies by maximizing reward functions through trial and error (millions of episodes).
Pros
Superhuman micro-skills (Dota 2, StarCraft). Emergent strategies like tool use or exploiting physics glitches.
Cons
Sample inefficient (needs lifetimes of practice). Brittle (catastrophic forgetting if rules change). Black box.
State of the Art: Softmax is reviving RL at megascale, integrating theory of mind to achieve "organic alignment" rather than rigid control. RL may yet have the last laugh.
Dialogue Agents: Convai
Bridges game engines with LLMs. NPCs that converse dynamically, not static dialogue trees.
Perception
Speech recognition + visual context. Knows who's speaking and what's nearby.
Cognition
RAG queries knowledge base, injects facts + persona into prompt.
Action
LLM generates text → speech → lip-sync. "Open the gate!" triggers Unreal events.
The LLM becomes a natural language controller—players interact via voice, not buttons.
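The overall perceive-retrieve-prompt-act pattern, sketched generically (this is not Convai's SDK; the callables passed in are hypothetical stand-ins):

def npc_turn(player_utterance, persona, world_state,
             retrieve_facts, llm_complete, speak, fire_event):
    # Cognition: RAG pulls relevant lore; persona + facts are injected into the prompt.
    facts = retrieve_facts(player_utterance, top_k=3)
    prompt = (f"You are {persona['name']}. {persona['backstory']}\n"
              f"Known facts: {facts}\n"
              f"Nearby: {world_state['nearby_objects']}\n"
              f"Player says: {player_utterance}\n"
              f"Reply in character.")
    reply = llm_complete(prompt)

    # Action: speak the line; imperative replies become engine events.
    speak(reply)
    if "open the gate" in reply.lower():
        fire_event("OpenGate")
    return reply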
Behavioral Agents: Voyager
Voyager (NVIDIA/Caltech, 2023) is the first LLM-powered embodied lifelong learning agent in Minecraft.
It solves the "Exploration Problem" in open worlds without Reinforcement Learning.
- RL Struggle: In Minecraft, finding a Diamond requires thousands of steps. RL fails because the reward signal is too sparse.
- Voyager Solution: Uses GPT-4 not just to "talk," but as a coding engine that interacts with the game via API (Mineflayer).
The Voyager Loop
1. Curriculum
GPT-4 analyzes inventory and state to propose the next logical task. "I have cobblestone, I should craft a stone pickaxe." Maximizes exploration.
2. Self-Correction
Writes JavaScript code. If execution fails, the error message & environment feedback (e.g., "inventory full") is fed back to GPT-4. It reasons and rewrites the code to fix the bug.
3. Skill Library
Successful programs are stored in a vector DB. Complex tasks query this library to retrieve and compose executable code (e.g., "Combat" reuses "Equip Weapon").
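The outer loop, paraphrased as pseudocode-style Python (the objects and method names are stand-ins for illustration, not the actual Voyager codebase):

def voyager_loop(llm, env, skills, max_iterations=100, max_retries=4):
    for _ in range(max_iterations):
        # 1. Automatic curriculum: propose the next task from current state + known skills.
        task = llm.propose_task(env.observe(), skills.list())
        context = skills.retrieve(task)                 # reuse relevant stored programs

        code = llm.write_program(task, context)
        for _ in range(max_retries):
            ok, feedback = env.run(code)                # 2. Execute in-game
            if ok:
                skills.add(task, code)                  # 3. Store the verified skill
                break
            code = llm.revise_program(code, feedback)   # self-correct from errors/feedback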
Voyager: White-Box Mastery
Stats: 3.3x more unique items, 15.3x faster tech tree traversal than prior SOTA.
Zero-Shot Transfer: Because skills are stored as generic code (not neural weights), a new Voyager instance in a new world starts with a library of mastered skills (e.g., "craft_sword"), solving complex tasks immediately.
Interpretability: We can read the code to see how it thinks. RL is a black box; Voyager is a white box.
Vision-Language-Action: SIMA
DeepMind's SIMA (2024) is a Universal Interface for game AI.
The Breakthrough: It decouples the agent from the Game Engine API. It plays 9 different games using only:
- Input: Screen Pixels + Natural Language Instructions.
- Output: Keyboard & Mouse events.
This makes every game a potential training ground, enabling a "Foundation Model" for motor control.
SIMA 2: The Generalist Agent
Nov 2025. SIMA 2 embeds Gemini to create a true Generalist Agent.
Previous agents were savants: AlphaStar could play StarCraft but not Go. SIMA 2 can play any game it can see.
Visual Grounding
It links language ("chair") to pixels. It understands complex, open-ended instructions: "Find a spot with a good view and build a campfire."
Zero-Shot Transfer
Because it understands concepts (gravity, inventory, enemies) rather than just patterns, it can play a brand new game on Day 1 without training.
The Ultimate Test: Genie 3
DeepMind paired SIMA 2 with Genie 3 (a model that generates playable 3D worlds from prompts).
This creates a closed loop of AI-generated reality:
- World Generation: Genie 3 dreams up a new environment (e.g., "A platformer with ice physics") from a text prompt.
- Agent Adaptation: SIMA 2 plays it. It uses Gemini to "see" and "reason" about the new physics zero-shot.
- Self-Improvement: The agent improves its policy purely by playing in these dreamed worlds, without human data.
Generative Agents: Smallville
Stanford 2023: populated a small town with 25 LLM-powered agents, each with unique personalities, memories, and relationships.
The architecture enables emergent social behavior without explicit programming.
Smallville: Architecture
1. Memory Stream
A comprehensive list of every observation and thought. Scored by Importance (LLM-rated poignancy).
2. Retrieval
Context is pulled via Score = Recency + Importance + Relevance. This prevents the "goldfish effect" by surfacing only pertinent memories.
3. Hierarchical Reflection
Synthesizes memories into higher-level thoughts. Observations → Insights → Persona. Creates depth of character.
4. Planning
Recursively decomposes high-level goals ("Throw party") into hourly schedules and minute-by-minute actions.
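A retrieval sketch in the spirit of that score (the decay constant, weights, and field names are assumptions, not the paper's exact values):

import math, time

def retrieve(memories, query_embedding, k=5, decay=0.995, now=None):
    now = now or time.time()
    def score(m):
        hours_old  = (now - m["last_access"]) / 3600.0
        recency    = decay ** hours_old                        # fades since last access
        importance = m["importance"] / 10.0                    # LLM-rated poignancy, 1..10
        relevance  = cosine(m["embedding"], query_embedding)   # similarity to current situation
        return recency + importance + relevance
    return sorted(memories, key=score, reverse=True)[:k]

def cosine(a, b):
    dot  = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0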
Smallville: Emergent Behavior
The core emergent phenomenon was Information Diffusion.
- Origin: One agent (Isabella) was told "I'm throwing a Valentine's Party."
- Diffusion: She mentioned it to friends → they checked their calendars → they invited others.
- Result: 12 agents showed up at the right place and time; 5 declined based on their relationships.
The "party" wasn't scripted; it was the result of information propagating through the social graph via the memory/retrieval loop.
Part IV
AL4Games
Artificial Life for Games. Behavior emerges through selection pressure across generations.
The Evolutionary Paradigm
Genetic Algorithms optimize through death.
A single Bibite doesn't get smarter over its lifetime; it is born with its neural network fixed. But the population gets smarter as successful variants reproduce and unsuccessful ones die.
Advantages
No catastrophic forgetting. Parallel exploration. No reward function needed—survival IS the reward.
Trade-offs
Slow adaptation (generational timescales). Can't react to novel situations within lifetime.
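The generational loop in miniature (genome encoding, mutation rate, and fitness are placeholders; in practice "fitness" is simply surviving long enough to reproduce):

import random

def evolve(population, fitness, generations=200, mutation_rate=0.05, elite_frac=0.2):
    for _ in range(generations):
        ranked  = sorted(population, key=fitness, reverse=True)
        parents = ranked[:max(2, int(elite_frac * len(ranked)))]   # survivors reproduce
        population = [mutate(random.choice(parents), mutation_rate)
                      for _ in range(len(population))]              # the rest simply die out
    return max(population, key=fitness)

def mutate(genome, rate):
    # Genome as a flat list of floats; each gene drifts with small probability.
    return [g + random.gauss(0, 1) if random.random() < rate else g for g in genome]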
Karl Sims (1994): Body + Brain
"Evolving Virtual Creatures" demonstrated the first co-evolution of morphology and neural control. Both body shape and brain architecture emerge together.
The breakthrough: Distributed Control.
Unlike traditional approaches where one central brain controls the whole body, Sims gave each body part its own mini-brain. When evolution adds a new limb, that limb arrives with its own sensors, actuators, and local neural circuit already wired.
This enabled modular complexity—a leg gene with a recursion counter produces a centipede. No central planner needed.
Karl Sims: Emergent Strategies
Locomotion: In water—streamlined shapes, undulation, sculling. On land—legs, hopping, rolling. One creature evolved a horse-like gallop with coordinated four-limb gait.
Competition: Two creatures compete for a cube (fitness = proximity):
- Rushing: Get there first
- Blocking: Wide body between opponent and cube
- Covering: Sprawl over cube
- Pinning: Immobilize opponent
- Keepaway: Grab cube and move it
None programmed—emergent solutions to fitness functions in physics.
Neuroevolution (NEAT)
NeuroEvolution of Augmenting Topologies. Traditional Deep Learning trains weights on a fixed architecture. NEAT does something radical: it evolves the architecture itself.
The key insight: Start simple and let complexity emerge only when needed.
- Starts Minimal: Direct input-to-output connections (no hidden layers).
- Grows Complexity: Mutations add neurons and connections only when simpler structures fail.
- Protects Innovation: New structures get time to optimize before competing with established solutions.
Brain complexity scales naturally with task difficulty—no designer tuning required.
Limitation: Sample inefficient (needs thousands of generations). Evolution is generate-and-test—no gradient signal means it's searching blind.
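The signature "add node" mutation, sketched below (simplified; real NEAT also assigns innovation numbers and speciates to protect new structure, and the genome layout here is invented):

import random

def mutate_add_node(genome, new_node_id):
    conn = random.choice([c for c in genome["connections"] if c["enabled"]])
    conn["enabled"] = False                     # split an existing connection...
    genome["nodes"].append(new_node_id)         # ...by inserting a neuron in the middle
    genome["connections"] += [
        {"src": conn["src"], "dst": new_node_id, "weight": 1.0,            "enabled": True},
        {"src": new_node_id, "dst": conn["dst"], "weight": conn["weight"], "enabled": True},
    ]
    return genome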
The Bibites: Digital Metabolism
State-of-the-art behavioral ALife. The critical innovation isn't just the brain (NEAT), but the Metabolism.
Bibites don't just "eat to score points." They eat to maintain energy balance. Every action (moving, sensing, thinking) costs energy.
Energy In
Consumption of plants or other Bibites. Photosynthesis (if evolved).
Energy Out
Basal metabolic rate + Motion cost + Brain cost. Larger brains cost more energy!
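A per-tick energy-budget sketch (the constants and fields are invented; the point is that upkeep is charged before behavior, so thinking and moving are never free):

def energy_tick(bibite, dt, food_energy=0.0):
    intake = food_energy + bibite.get("photosynthesis_rate", 0.0) * dt
    upkeep = (bibite["basal_rate"]
              + bibite["speed"] ** 2 * bibite["motion_cost"]
              + bibite["neuron_count"] * bibite["brain_cost"]) * dt   # bigger brains cost more
    bibite["energy"] += intake - upkeep
    return bibite["energy"] > 0.0          # False: the Bibite starves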
Bibites: Emergent Behaviors
Emergent Herding
Initially random pheromone emission. Mutation links "high energy" → "emit pheromone." Successful foragers signal. Others evolve to follow gradients. Groups become efficient.
Designer provided signaling capacity; evolution invented the protocol.
Predator-Prey Arms Race
Mutation connects "see Bibite" → "attack." Predation emerges. Prey evolve faster acceleration, evasion. With Sin nodes, prey evolve juking patterns that break pursuit.
Species ALRE: Macro-Evolution
Unlike Bibites (micro-behavior focus), Species: ALRE simulates population dynamics and speciation.
Real speciation: The engine tracks genetic distance between sub-populations. When clusters diverge beyond threshold—via geographic isolation or divergent selection—they're classified as distinct species. Real-time cladograms visualize the tree of life branching.
Transparent Design
Uses Utility AI, not neural networks. Genome defines stats (Metabolism, Aggression, Speed); behavior logic is fixed code. What evolves is propensity—peaceful lineages evolve low Aggression, predators evolve high.
The Rover
Player avatar for artificial selection. Feed creatures, cull others, relocate populations, terraform terrain. You become selection pressure—discovering how hard it is to engineer evolution toward specific outcomes.
Trade-off: Species is legible (you know exactly why creatures behave as they do) but can't surprise you. Bibites is opaque but invents behaviors the designer never anticipated.
Special Mention: Rain World
Not "Evolved" (it uses Behavior Trees/Utility), but represents the pinnacle of Ecosystem Simulation.
The Illusion of Life
Creatures have needs (eat, sleep, avoid rain) and persistent relationships. They hunt each other, not just the player.
Procedural Animation
Creatures physically navigate geometry using inverse kinematics, making them feel like heavy, living animals, not floating hitboxes.
Part V
Frontiers
The loop closes: Self-Modifying Games.
The Self-Patching Game
What happens when agents can rewrite their own source code?
The Loop
Player finds an exploit (e.g., standing on a rock). The AI analyzes the telemetry, diagnoses the failure, and patches the bug in real-time—either by rewriting navigation logic or altering the geometry.
The Meta
The game doesn't just get harder; it evolves to counter the player base. A living, breathing codebase that immunizes itself against cheese tactics.
Proof of Concept: Sakana AI's Darwin Gödel Machine improved its own coding performance (SWE-bench) from 20% → 50%. It didn't just tune weights; it rewrote its own prompt engineering and search strategy, validated every change empirically, and committed the improvements.
Agents, Populations, and Worlds
Evolved Architectures
Don't hand-design Behavior Trees—evolve them. Populations of decision structures adapt to level layouts.
The Living Lore
Myths become mechanics. A rumor started by an NPC becomes a quest, then a faction, then a war.
Nemesis System 2.0
Enemies that evolve tactics to exploit your specific playstyle. You snipe? They evolve carapace plating.
The A/B Testing God
Game runs 1,000 nightly sims to tune drop rates and difficulty curves automatically for max engagement.
Co-Evolving Civs
Brains, bodies, and culture evolving together. Mini-civilizations that rise, war, and fall based on geography.
Agent Economies
Inflation isn't a variable. It emerges from 10,000 agents with biological needs trading goods for calories.
The Bicameral Agent
System 1: Fast Utility AI handles combat (200ms). System 2: LLM handles strategy/dialogue (2s).
Permadeath Content
Evolution requires death. If a game zone is ignored by players, it should rot and be recycled into something new.
Failure Modes and Guardrails
Reward Hacking
Agents exploit scoring/bugs. Guardrails: adversarial test suites, holdout maps, objective unit tests, human veto.
Monoculture / Collapse
Diversity vanishes → brittle meta. Guardrails: speciation, novelty search, immigration, entropy constraints.
Grief / Policy Violations
Toxic tactics or content. Guardrails: content filters, policy checks, red‑team arenas, “no‑bleed” sandboxes.
Live‑Ops Regression
Fun metrics drop post‑update. Guardrails: canaries, kill‑switch, rollback, telemetry dashboards.
Compute / Cost
Evolution too expensive. Guardrails: surrogate models, distillation, small‑batch EAs, parallel sims.
The Trajectory
From Explicit Instruction to Implicit Intent.
From Machine-like to Lifelike.
- Traditional GameAI: Tell the machine exactly how. Cannot plan.
- Utility AI: Tell the machine what you value. Behavior emerges from curves.
- ML4Games: Tell the machine who it is. It figures out what to say and do. Can plan.
- AL4Games: Tell the machine the rules of existence. Behavior, morphology, strategy emerge.
- The Frontier: Combine them all. Self-modifying architectures that improve their own improvement processes.
The role of the developer shifts from architect to gardener—planting seeds of intelligence and watching worlds grow that are wilder than anything hand-built could be.