Delta_Ghost
Title: Transforming NPC Behavior in Dynamic Game Worlds Through Multi-Modal Machine Learning Systems (text2behavior)
Research Objective:
This research aims to revolutionize non-player character (NPC) behavior in video game environments by developing a machine learning-based behavior generation system. The goal is to enable NPCs to exhibit intelligent, adaptive, and emergent behaviors in real time. By moving beyond pre-scripted actions, this project will explore the creation of autonomous, creative agents capable of continuous learning and complex social interactions within game ecosystems.
Research Questions:
How can multi-modal machine learning (ML) systems dynamically drive NPC behaviors in video games?
What impact does real-time visual, state, and memory-based information processing have on the emergent behavior of NPCs?
How can NPCs autonomously expand their behavioral repertoire and adapt through continuous learning and social interaction (with other agents and people)?
Hypothesis:
NPCs driven by multi-modal machine learning behavior trees, capable of processing visual data, world states, and memory in real time, can demonstrate higher levels of adaptive intelligence, emergent behavior, and complex social interaction, leading to richer player experiences and greater replayability.
Methodology:
System Design:
The project will develop a text-to-behavior system in which NPCs interpret developer prompts and autonomously execute behaviors based on their current world state. The system integrates visual and state information through a multi-modal behavior tree, allowing NPCs to recombine fundamental actions to adaptively fulfill in-game objectives, and it can also be re-prompted dynamically during gameplay.
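As a minimal sketch of what this could look like in practice, the snippet below holds an NPC's objective as plain text that a developer sets at design time and can override mid-game. The names NPCObjective and active_prompt are hypothetical, chosen for illustration rather than taken from the project's actual interface.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NPCObjective:
    """Hypothetical container for the text that drives an NPC's behavior."""
    base_prompt: str                       # set by the developer at design time
    runtime_prompt: Optional[str] = None   # optional override issued mid-game

    def active_prompt(self) -> str:
        # A runtime prompt, if present, takes precedence over the design-time one.
        return self.runtime_prompt or self.base_prompt

# Design-time instruction:
builder = NPCObjective(base_prompt="Build an interesting structure out of blocks.")

# Dynamic re-prompting during gameplay:
builder.runtime_prompt = "Stop building and look for interesting things nearby."
print(builder.active_prompt())
```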
Machine Learning Integration:
The research focuses on building an ML-based behavior tree that enables NPCs to do the following (a sketch of how these inputs could be combined appears after the list):
Use vision-based inputs from in-game camera sensors (e.g., forward and bird’s-eye views) to understand the environment.
Apply state-based inputs (e.g., positions, game rules) to contextualize actions.
Utilize memory-driven reasoning to recall prior states and dynamically adjust behaviors based on short- and long-term goals.
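A hedged sketch of how these three input streams might be packaged into a single observation for the decision layer; every field name and file path here is an assumption made for illustration, not the project's actual schema.

```python
import base64
from typing import List, TypedDict

class Observation(TypedDict):
    """Illustrative multi-modal observation handed to the ML behavior layer."""
    forward_view_b64: str     # base64 PNG from the forward-facing camera
    birdseye_view_b64: str    # base64 PNG from the bird's-eye camera
    world_state: dict         # e.g., agent position, nearby objects, game rules
    recent_memory: List[str]  # short-term memory: recent actions and outcomes
    long_term_goal: str       # the standing objective from the developer prompt

def encode_view(image_path: str) -> str:
    """Encode a camera render that Unreal is assumed to have saved as a PNG."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Example payload (camera paths and state fields are assumptions):
observation: Observation = {
    "forward_view_b64": encode_view("Saved/forward_view.png"),
    "birdseye_view_b64": encode_view("Saved/birdseye_view.png"),
    "world_state": {"position": [12, 3, 40], "blocks_placed": 17},
    "recent_memory": ["moved forward", "placed block at (12, 3, 41)"],
    "long_term_goal": "Build an interesting structure out of blocks.",
}
```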
Behavior Execution:
The system provides NPCs with a modular "action bank," allowing them to generate complex chains of behavior (e.g., architectural construction, environmental manipulation). These behaviors will emerge through continuous feedback loops between visual recognition, memory, and state inputs.
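To make the action bank concrete, here is a minimal sketch of a modular registry of fundamental actions that a decision layer could recombine into longer chains; the ActionBank class and the specific action names are assumptions for illustration.

```python
from typing import Callable, Dict, List

class ActionBank:
    """Hypothetical registry of fundamental actions an NPC can chain together."""

    def __init__(self) -> None:
        self._actions: Dict[str, Callable[[], None]] = {}

    def register(self, name: str, fn: Callable[[], None]) -> None:
        self._actions[name] = fn

    def available(self) -> List[str]:
        # This list of names is also shared with the model so it knows its options.
        return sorted(self._actions)

    def execute(self, name: str) -> None:
        if name not in self._actions:
            raise ValueError(f"Unknown action: {name}")
        self._actions[name]()

bank = ActionBank()
for action in ["move_forward", "move_backward", "move_up", "move_down",
               "strafe_left", "strafe_right", "place_block"]:
    # In the real system each callable would drive the game-agent in Unreal;
    # a print statement stands in for that side effect here.
    bank.register(action, lambda a=action: print(f"executing {a}"))

bank.execute("place_block")
```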
Technical Overview:
Our technique integrates OpenAI agents (assistants) and UE5 into a multi-modal ML-based behavior tree. The ML-agent receives visual and state information captured from the game-agent's point of view in Unreal; equipped with memory, reasoning, and long-term goal orientation, it then decides on an appropriate course of action. The game-agent inside Unreal executes these actions in a continuous loop while working to fulfill its core objective (its prompt).
This approach avoids overcomplicating the system: it does not micromanage small details (e.g., finger movements), yet the action bank still allows for creative and emergent behaviors. The result is emergent gameplay, increased replayability, and dynamic, ever-changing scenarios.
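The sketch below approximates one iteration of that perceive-decide-act loop from the Python side, using the OpenAI Chat Completions API with image input as a simplified stand-in for the project's assistant setup; the model choice, prompt wording, and expected JSON reply shape are all assumptions.

```python
import json
from typing import List

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def decide_next_action(observation: dict, available_actions: List[str]) -> dict:
    """One decision step: send camera views, state, and memory; get one action back."""
    system_prompt = (
        "You control an NPC in Unreal Engine. Choose exactly one action from the "
        'available list and reply as JSON: {"action": "...", "reasoning": "..."}.'
    )
    user_content = [
        {"type": "text", "text": json.dumps({
            "long_term_goal": observation["long_term_goal"],
            "world_state": observation["world_state"],
            "recent_memory": observation["recent_memory"],
            "available_actions": available_actions,
        })},
        {"type": "image_url", "image_url": {
            "url": "data:image/png;base64," + observation["forward_view_b64"]}},
        {"type": "image_url", "image_url": {
            "url": "data:image/png;base64," + observation["birdseye_view_b64"]}},
    ]
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": system_prompt},
                  {"role": "user", "content": user_content}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

# Unreal would parse the returned {"action": ..., "reasoning": ...}, fire the
# matching entry in the action bank, capture fresh camera frames, and repeat
# the loop until the core objective (the prompt) is fulfilled.
```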
Demonstration Examples:
DEMO: GPT-4o building a structure in Unreal Engine
In the video below, Unreal saves images from an agent’s forward-facing and bird’s-eye view cameras and sends them to the OpenAI assistant. The assistant analyzes the images and decides what action to issue based on the visual data, along with its short- and long-term memory and world information (including the actions available). Unreal then parses the response and fires the appropriate action. The agent has been instructed to “build an interesting structure,” and it recombines fundamental actions (move up, down, left, right, forward, backward, place blocks, etc.) in order to do so. While it is doing that, it also prints to the screen (a) what it sees out of both cameras and (b) why it is deciding to take one action or another, helping the developer debug the system.
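The on-screen diagnostics suggest the assistant returns, alongside its chosen action, a description of each camera view and its reasoning; a hypothetical example of one such parsed reply (all field names assumed) might look like this:

```python
# Hypothetical shape of one reply as Unreal might parse it (field names assumed):
reply = {
    "action": "place_block",
    "reasoning": "The bird's-eye view shows a gap in the north wall; placing a "
                 "block here continues the pattern.",
    "forward_view_summary": "A partially built wall of grey blocks, two blocks high.",
    "birdseye_view_summary": "An L-shaped footprint with one corner block missing.",
}
```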
Fast Video: agent building process
In this example, the agent chooses whether to go up, down, left, or right, strafe left or right, move forward or backward, or place a block, based on visual information and world knowledge. It is aware of its body in space, of how its actions change the structure’s shape, and of the structure’s position relative to other objects in the world.
Slow Video: agent diagnostics
In this example, the agent prints to the screen what it sees from both of its cameras and why it made each decision, often demonstrating complex spatial reasoning skills.
Changing Behaviors with a Prompt: find interesting stuff
In this example, we change the behavior of the agent by slightly modifying the prompt. The agent is now looking for interesting things in the environment; it finds the trees interesting, so it heads towards them.
Navigating a More Complex World
In this example, the agent is navigating a more complex environment.
Research and Development Focus:
Sensor and Action Expansion:
Expand the NPC’s sensory inputs and action possibilities to increase behavioral complexity.
Continuous Learning:
Develop systems that allow NPCs to learn, adapt, and reframe their own behaviors over time.
Social Learning:
Enable agents to interact with and learn from each other within game environments, fostering cooperative behaviors and social structures.
Stretch Goal:
Investigate the feasibility of autonomous self-programming within NPCs, where agents dynamically generate new behaviors based on in-game scenarios, inspired by recent AI research (e.g., the Voyager paper).
Potential Impact:
This research bridges a significant gap in the game development industry by providing developers with a tool for implementing complex, adaptive NPC behaviors. By allowing NPCs to autonomously evolve and create, this project drives advancements in emergent gameplay, creating more immersive, replayable, and dynamic game worlds. Furthermore, it explores the boundaries of machine learning in game AI, with potential applications in broader fields of interactive and autonomous systems.
Conclusion:
This research project proposes a groundbreaking system for NPC behavior generation, underpinned by machine learning and real-time environmental feedback. By enabling intelligent, evolving NPCs, this project will fundamentally change how we perceive and interact with virtual worlds, paving the way for more immersive and adaptive gaming experiences.