OpenMPI ) - Win64
- CartPole Tutorial: Creating A Custom Reinforcement Learning Environment [Example Video]
- Automated Stock Trading Tutorial: Build a Bitcoin Bot In Unreal Engine 4
- Match To Sample: Solving A Memory Puzzle with a Non Player Character
Possible applications extend beyond game design to a variety of scientific and technical use cases. These include robotic simulation, autonomous driving, generative architecture, procedural graphics and much more. MindMaker provides a central platform from which advances in machine learning can reach many of these fields. For game developers, the use cases for self-optimizing agents include controlling NPC behavior (in a variety of settings such as multi-agent and adversarial), prototyping game design decisions, and automated testing of game builds.
Using the MindMaker DRL Engine, cutting-edge reinforcement learning algorithms (based on the Stable Baselines RL library) can quickly be deployed in UE4 without the user needing to implement them from scratch. Algorithms presently supported include: Actor Critic (A2C), Sample Efficient Actor-Critic with Experience Replay (ACER), Actor Critic using Kronecker-Factored Trust Region (ACKTR), Deep Q Network (DQN), Proximal Policy Optimization (PPO), Soft Actor Critic (SAC), Twin Delayed DDPG (TD3), Trust Region Policy Optimization (TRPO), Deep Deterministic Policy Gradient (DDPG).
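For orientation, here is a minimal sketch of how one of these algorithms is driven through the Stable Baselines library that the learning engine wraps. The CartPole environment, policy, and timestep count are illustrative placeholders, not part of MindMaker itself.

```python
# Minimal Stable Baselines sketch: the learning engine wraps calls like these.
# "CartPole-v1" stands in for your UE4 environment.
import gym
from stable_baselines import PPO2

env = gym.make("CartPole-v1")            # any Gym-compatible environment
model = PPO2("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)       # train the selected algorithm
```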
Summary of the MindMaker AI Process:
What follows is a summary of the AI process as it takes place using the MindMaker Deep Reinforcement Learning (DRL) Engine within the video game environment. There are two primary components to MindMaker – an executable learning engine and a set of blueprint nodes which interface with the learning engine and pass information back and forth to UE using the MindMaker AI Plugin. The Plugin is necessary for communication between the Learning Engine and UE4. The blueprints for MindMaker are located in the MindMakerStarterContent directory of the Blueprints project. Several example projects are also included in the /Examples directory.
The first step in creating your self-learning AI is to define the actions that will be available to it within blueprints. These can either be a discrete number of actions, or a continuous spectrum of actions. You will need to specify this in the Launch MindMaker blueprint node, as well as the low and high bounds of the observational variables that the agent has access to.
After MindMaker is launched at the start of gameplay, it will begin generating random actions based upon the action space you have defined for it (in the Launch MindMaker node). These actions take the form of numbers, which you must associate with different behaviors (to see how this is done, look at the DefineActionSpace function within the AI_Character_Controler_BP in the MatchToSample example provided). These random actions are communicated to Unreal Engine via a SocketIO connection and then stored in the RecieveAction function. Once an action is received, the UE environment must be updated to reflect the action generated by the learning engine. This happens in the DisplayAgentActions function (see Examples).
Next, two important things must occur – we must check if the reward conditions were met by the action the learning engine generated for the agent, and we must update the agent's observations about its environment. This occurs in the CheckReward function and the MakeObservations function respectively. Once this is finished, the reward and observation are passed back to the MindMaker Learning Engine via the SocketIO connection so that it can improve its action selection process. This selection process is optimized by whatever algorithm you have chosen to train with in the Launch MindMaker node. MindMaker then selects a new action, and the process repeats until it has discovered an ideal set of actions for the given environment.
Step By Step
1. MindMaker Learning Engine selects Action
2. Action Received by UE
3. The game environment within UE is updated with respect to the action just taken
4. Reward and observations variables are updated and passed back to MindMaker
5. MindMaker selects a new action, based upon the reward and obs it received
6. Return to step 1; the process repeats until optimal actions are discovered (a Gym-style sketch of this loop follows below)
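As a point of reference, the same exchange looks roughly like the following when written against a standard Gym environment in Python. In MindMaker, the work done here by env.step() (updating the game world, checking the reward, making observations) happens in your UE blueprints, and the values travel over the SocketIO connection instead of a function call; the environment and algorithm below are placeholders.

```python
# Gym-style illustration of the MindMaker loop described above.
import gym
from stable_baselines import DQN

env = gym.make("CartPole-v1")            # stand-in for the UE environment
model = DQN("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=5000)        # steps 1-6 repeated internally during training

# Evaluation episodes after training follow the same cycle:
obs = env.reset()
for _ in range(200):
    action, _states = model.predict(obs)         # 1. learning engine selects action
    obs, reward, done, info = env.step(action)   # 2-4. apply action, update reward/observations
    if done:                                     # 5-6. episode ends, repeat
        obs = env.reset()
```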
Saving and Loading Models
To save a trained model, set the “Save Model after Training” checkbox in the Launch MindMaker function to True. You will need to ensure your number of training episodes is a non-zero number. The model will save after training completes.
To load a trained model, uncheck the “Save Model after Training” checkbox and instead set the “Load Pre Trained Model” checkbox in the Launch MindMaker function to True. You will also need to set the number of training episodes to zero, since no training is to occur. Ensure the number of evaluation episodes is non-zero, since this is how the pre-trained model will demonstrate its learning.
Models are saved locally in the “AppData Roaming” folder of your computer, for instance C:\Users\username\AppData\Roaming
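Behind the blueprint checkboxes, saving and loading correspond roughly to the Stable Baselines calls sketched below; the model name, environment, and training length are illustrative assumptions, not MindMaker defaults.

```python
# Rough Stable Baselines equivalent of the Save/Load checkboxes.
import os
import gym
from stable_baselines import PPO2

env = gym.make("CartPole-v1")                    # stand-in for the UE environment
model = PPO2("MlpPolicy", env)
model.learn(total_timesteps=1000)                # training must be non-zero before saving

save_path = os.path.join(os.environ["APPDATA"], "MyMindMakerModel")  # AppData\Roaming
model.save(save_path)                            # "Save Model after Training" = True

loaded_model = PPO2.load(save_path, env=env)     # "Load Pre Trained Model" = True
```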
Anatomy of MindMaker AI:
As mentioned, there are two primary components to MindMaker – an executable learning engine and a set of blueprint nodes which interface with the learning engine and pass information back and forth to UE via the Plugin.
This next section covers the usage of the MindMaker blueprint assets. There are two default agent classes for use with MindMaker: one is for a 3rd person NPC and the other is for a generic actor object such as a sphere or a cube.
Assets for both use cases are located within the MindMakerStarterContent directory, in the MindMakerAIControlerBP and the MindMakerActorBP blueprints respectively.
MindMakerAIControlerBP is used for training NPCs, while MindMakerActorBP is used for endowing UE objects with intelligent, self-learning capabilities. In both cases, the MindMaker backend executable is accessed via the LaunchMindMaker node and connects to blueprints via a SocketIO connection (SocketIO plugin included). To use the MindMakerAIControlerBP for a given AI character, start with any third person character mesh, and then under the Pawn properties of the mesh, under AI Controller Class, specify MindMakerAIControlerBP as the controller class. This will make your mesh controllable by MindMaker.
Next we will cover the individual parameters of the LaunchMindMaker blueprint node, which is the main component of the AI studio.
RL Algorithm – This is where one can select the flavor of RL algorithm to train the agent with. There are ten options in the drop-down menu, each algorithm having its own pros and cons. A detailed discussion of the available algorithms and their use cases can be found here:
https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html
Num Train EP – This is an integer input representing the number of training episodes one wishes the agent to undertake. The larger the number of training episodes, the more exploration the agent does before transitioning to the strategic behavior it acquires during training. The complexity of the actions the agent is attempting to learn typically determines the number of training episodes required - more complex strategies and behaviors require more training episodes.
Num Eval EP – This is also an integer input and represents the number of evaluation episodes the agent will undergo after training. These are the episodes in which the agent demonstrates its learned behavior.
Continuous Action Space – This is a Boolean input which determines if the agent is using a continuous action space. A continuous action space is one in which there are an infinite number of actions the agent can take; for example, if it is learning to steer a car and the range of angles over which the steering column can turn is a decimal value between 0 and 180, then there is an infinite number of values within that range, such as .12 or 145.774454. You will want to identify at the outset whether your agent has an infinite or a finite number of actions it can take. The action space must either be continuous or discrete; it cannot be both.
Discrete Action Space – This is a Boolean input which determines if the agent is using a discrete action space. A discrete action space is one in which there are a finite number of actions the agent can take, such as if the AI can only move right one space or left one space. In that case it only has two actions available to it and the action space is discrete. The user determines which kind of action space the agent will be using before using MindMaker and sets these values accordingly.
Action Space Shape – This defines the lower and upper boundaries of the actions available to the agent. If you are using a discrete action space, then this is simply the total number of actions available to the agent, for instance 2 or 8. If you are using a continuous action space, things are more complicated and you must define the low and high boundaries of the action space separately. The format for doing so is as follows: low= lowboundary, high= highboundary, shape=(1,)
In this case, lowboundary is a value such as -100.4 and highboundary is a value such as 298.46. All decimal values between these bounds will then represent actions available to the agent. If you had an array of such actions, you could change the shape portion to reflect this.
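For illustration, here is how the two formats above map onto the OpenAI Gym spaces that Stable Baselines expects under the hood; the boundary numbers are the sample values from the text, not anything MindMaker requires.

```python
# Sketch of the discrete vs. continuous action-space formats as Gym spaces.
import numpy as np
from gym import spaces

# Discrete action space: just the total number of actions, e.g. 2 or 8
discrete_action_space = spaces.Discrete(8)

# Continuous action space: low= lowboundary, high= highboundary, shape=(1,)
continuous_action_space = spaces.Box(low=-100.4, high=298.46, shape=(1,), dtype=np.float32)
```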
Observation Space Shape – Properly speaking, this input is a Python derivative of the OpenAI custom environment class and defines the lower and upper boundaries of observations available to the agent after it takes an action. The format for doing so is as follows: low=np.array([lowboundary]), high=np.array([highboundary]), dtype=np.float32. Imagine an agent that needed to take three specific actions in a row to receive a reward; its observation space would need to include access to those three actions, each represented by a unique observation. The array of observations would therefore have to include three different values, each one with its own unique boundaries. For example, such an observation space might be defined as follows: low=np.array([0,0,0]), high=np.array([100,100,100]), dtype=np.float32 if each of the actions the agent needed to observe was a value between 0 and 100. A rule of thumb is that if a value is part of the reward function for the agent, i.e. its behavior is only rewarded if some condition is met, then the observation space must include a reference to that value. If five conditions must be met for the agent to be rewarded, then each of these five conditions must be part of the agent's observation space.
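As a companion to the format above, the three-value example would look something like this as a Gym Box space; the 0 to 100 bounds are just the sample values from the text.

```python
# The three-observation example above expressed as a Gym Box space.
import numpy as np
from gym import spaces

observation_space = spaces.Box(low=np.array([0, 0, 0]),
                               high=np.array([100, 100, 100]),
                               dtype=np.float32)
```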
Load Pre Trained Model – This is a Boolean value that determines if you want the agent to load some pre-trained behavior that was previously saved. If you set this to True, you will want to specify the name of the file in the Save/Load Model Name input box. All models are saved by default to the app data roaming directory of the computer, for instance C:\Users\username\AppData\Roaming
Save Model After Training – This is a Boolean value that determines if you want the agent to save the behavior it has learned after training. If you set this to True, you will want to specify the name of the file in the Save/Load Model Name input box. All models are saved by default to the app data roaming directory of the computer, for instance C:\Users\username\AppData\Roaming
Save/Load Model Name – This is a string representing the name of the model you wish to save or load. Files are saved to the app data roaming directory of the computer for instance C:\Users\username\AppData\Roaming
Use Custom Params – This is a Boolean value that determines if you want to use the stock version of the algorithm you have selected or wish to modify its parameters. If you wish to use custom parameters, these can be accessed via the custom parameter structure variables. If you click on one of them, for instance A2Cparams, you will see all the values that can be set within these structures. A detailed breakdown of the parameters for each algorithm can be found here: https://stable-baselines.readthedocs.io/en/master/
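For a sense of what those structures correspond to, the Stable Baselines constructors expose the same hyperparameters as keyword arguments; the sketch below uses A2C with arbitrary example values, not MindMaker's defaults.

```python
# Illustrative custom parameters for A2C on the Stable Baselines side.
import gym
from stable_baselines import A2C

env = gym.make("CartPole-v1")            # stand-in for the UE environment
model = A2C("MlpPolicy", env,
            gamma=0.99,                  # discount factor
            n_steps=5,                   # environment steps per update
            learning_rate=0.0007,        # optimizer learning rate
            ent_coef=0.01)               # entropy regularization coefficient
model.learn(total_timesteps=10000)
```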
Have questions? Reach out to us at the email below.
Email: info@autonomousduck.com
Or check out our other offerings: