In the World Engines Lab, we are currently building a video game system similar to PaLM-E, in which a frame is sent from Unreal to a multi-modal LLM with vision for analysis. The model analyzes the frame and sends goals to the game AI, which might, for example, implement the agent's pathing protocol. So the VGPT takes a frame, generates a perception, then instructions (e.g. explore novel areas), and finally writes action code that is sent back to Unreal (e.g. move forward 50 ft.). We are also exploring running the model locally rather than through an API endpoint. This could allow for a more open-ended agent, with more open-ended behavior than you would find in game AI alone. We're hoping to templatize this agent so that artists can build on top of it.
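The perceive → plan → act loop described above could be sketched roughly as follows. This is a minimal illustration, not the lab's implementation: all names (`Perception`, `Action`, `agent_step`, the goal strings) are hypothetical, and the vision model is passed in as a callable so the same loop works with an API-backed or local model.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Perception:
    """What the vision model reports about the current frame (hypothetical schema)."""
    summary: str
    novel_area_visible: bool

@dataclass
class Action:
    """A low-level command the engine-side game AI can execute (hypothetical)."""
    command: str      # e.g. "move_forward"
    magnitude: float  # e.g. distance in feet

def perceive(frame: bytes, vision_model: Callable[[bytes], Perception]) -> Perception:
    # In the real system this would be a multi-modal LLM call,
    # either via an API endpoint or a locally hosted model.
    return vision_model(frame)

def plan(p: Perception) -> str:
    # Map the perception to a high-level instruction,
    # e.g. "explore novel areas" when something new is visible.
    return "explore" if p.novel_area_visible else "wander"

def act(goal: str) -> Action:
    # Translate the goal into action code that the game AI's
    # pathing can execute back inside Unreal.
    if goal == "explore":
        return Action("move_forward", 50.0)
    return Action("turn", 90.0)

def agent_step(frame: bytes, vision_model: Callable[[bytes], Perception]) -> Action:
    # One full tick of the loop: frame in, engine command out.
    return act(plan(perceive(frame, vision_model)))
```

For example, with a stub standing in for the vision model, `agent_step(b"frame", lambda f: Perception("open field ahead", True))` yields `Action("move_forward", 50.0)`; swapping the stub for a real multi-modal model is the only change needed for the live system.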