# Multi-agent Path Finding
This Unity and ML-Agents project visualizes multi-agent pathfinding algorithms. Visualization is done in Unity, the algorithms are written in Python, and the communication between Unity and the Python code goes via ML-Agents.
There are two ways to work with the project:
- using a Unity build
- using the Unity editor
More details can be found in the Quick start section.
When using the Unity editor, note that the version used in the project is 2021.3.7f1, together with ML agents release 17. When using Unity Hub the setup of the Unity project and the corresponding packages is done automatically and you do not need to install or set up anything.
For running the Python algorithms it is important to use the correct versions of the Python packages. For the ML-Agents release used in the project, the required ml-agents packages are:

```
mlagents = "==0.26.0"
mlagents-envs = "==0.26.0"
```
For the existing Python algorithms we have added Pipfiles specifying the versions of all used Python packages.
NOTE: If you are using Python 3.6 and encounter problems with pipenv, try a pipenv version <= 2022.4.8.
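As a quick sanity check (a sketch, not part of the project), you can verify the installed versions from within your virtual environment:

```python
# Print the installed versions of the pinned packages; both should be 0.26.0.
# pkg_resources also works on Python 3.6.
import pkg_resources

for pkg in ("mlagents", "mlagents-envs"):
    print(pkg, pkg_resources.get_distribution(pkg).version)
```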
Extremely important: guidelines for incorporating your work into this project can be found in the section Adding your own algorithms. Do not alter the Unity code.
The repository has the following structure:
- Algorithms: a folder containing algorithms written in Python for the different settings of the pathfinding problem; you can also add algorithms yourself
- PathFinding: the Unity code; this code should not be altered
- VisualizeStats: code that you can use and/or adapt for statistics and visualizations
The Unity project consists of two scenes / problem settings which will be explained below.
- Centralized
- Decentralized
## Quick start
### Using a Unity build
The easiest way to start working with the project is to download the correct Unity build from the AI lab website. There are separate builds for the centralized and decentralized settings.
To run one of the examples in the `Algorithms` folder using the Unity build:

- Locate the `StreamingAssets` folder in the downloaded build.
- The file `InputOutput.txt` inside `StreamingAssets` specifies the file that sets up the environment and the file to which statistics should be written. These paths should be relative to the `StreamingAssets` folder. Do not change the name of `InputOutput.txt` or move it outside of `StreamingAssets`.
- Open the Python project you want to run.
- Make sure you have installed a Python virtual environment using the correct versions of the ml-agents packages (also specified in the Pipfile).
- In the Python project where you set up the Unity environment, specify the path to the build (a minimal launch sketch is shown below): `UnityEnvironment(file_name=<path to build>, side_channels=[envChannel])`
- Run the main Python file. This will automatically open the Unity build and run the algorithm.
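For reference, a minimal `main.py` along these lines might look as follows; the build path is a placeholder and `EnvChannel` refers to the side-channel module used in the toy examples:

```python
# Minimal launch sketch (mirrors the toy examples; adjust the build path).
from mlagents_envs.environment import UnityEnvironment

import EnvChannel  # project-specific side-channel module

envChannel = EnvChannel.EnvChannel()
env = UnityEnvironment(file_name="path/to/your/build", side_channels=[envChannel])
env.reset()  # launches the build and starts the communication
```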
### Using the Unity editor
Open the Unity project folder `PathFinding` in Unity Hub. This is the folder that contains the subfolder `Assets`. The setup of the Unity project and the packages it uses is done automatically after opening the project. If necessary, Unity Hub will ask you to install the matching Unity editor version. Do not change the Unity version or the versions of the packages that are being used.
After opening the project folder in Unity, you will see a 3D environment containing the two different scenes, which are described below.
To run one of the examples in the `Algorithms` folder using the Unity editor:

- Select the correct scene (centralized/decentralized) in the Unity editor.
- Set up the configuration files in `StreamingAssets`, which can be found in the `Assets` folder. The file `InputOutput.txt` specifies the file that sets up the environment and the file to which statistics should be written. These paths should be relative to the `StreamingAssets` folder. Do not change the name of `InputOutput.txt` or move it outside of `StreamingAssets`.
- Open the Python project you want to run.
- Make sure you have installed a Python virtual environment using the correct versions of the ml-agents packages (also specified in the Pipfile).
- In the Python project where you set up the Unity environment, do not specify a path for the build: `UnityEnvironment(file_name=None, side_channels=[envChannel])`
- Run the main Python file.
- Press the play button in Unity.
Of course, when using the editor, you can also create a build yourself and use the build instead of the editor.
## Overview of readme and next steps
More details about setting up the warehouse and running the simulation can be found in Initializing the warehouse and Running the environment.
Overview of this readme:
- Problem definition
- Initializing the warehouse
- Running the environment
- Adding your own algorithms
- Adding your own algorithms: example
## Problem definition
### Description
Given
- a grid-based world of walls and tiles,
- n robots,
- n initial nodes (position in grid + rotation) for the robots and
- n goal nodes (position in grid + rotation) for the robots (by default in the same order as the initial nodes)
plan a path for each robot from its initial to its goal node.
### Centralized vs decentralized
Since we use ML-Agents to connect Unity with algorithms written in Python, a distinction has to be made between centralized and decentralized control.
In the centralized setting, actions for all robots are created at once; in the decentralized setting, every robot requests its actions separately.
### Nodes
At every timestep, each robot occupies a node, which is a combination of a position and a rotation.
Positions in the warehouse are of the form `(i,j)` with `i` the row and `j` the column. The rotation is an integer in `[0,360)` where `0` corresponds to "north", `90` to "east", `180` to "south" and `270` to "west", and where the layout is of the form

```
(0,0) (0,1) ... (0,n)
(1,0) (1,1) ... (1,n)
...
(m,0) (m,1) ... (m,n)
```
Hence a robot at

- location `(1,1)` and rotation `0` is facing location `(0,1)`
- location `(1,1)` and rotation `90` is facing location `(1,2)`
- location `(1,1)` and rotation `180` is facing location `(2,1)`
- location `(1,1)` and rotation `270` is facing location `(1,0)`
A robot can rotate according to a degree of rotation; e.g. if this is `45` degrees, then for each location a robot has 8 possible rotations: `0, 45, 90, 135, 180, 225, 270, 315`. If it is for instance at location `(1,1)` and rotation `45`, it can move to

- location `(1,1)` and rotation `0`, and
- location `(1,1)` and rotation `90`
The idea behind using this unit of rotation is that the time it takes to move one tile forward without rotation (e.g. from `(0,0)`, rotation `90` to `(0,1)`, rotation `90`) is equal to the time it takes to rotate over this unit of rotation.
We add the restriction that the unit of rotation must be a divisor of `90`. This way a robot can move from one tile to another if its rotation is equal to `0`, `90`, `180` or `270`.
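As a small illustration (not project code), the set of headings a robot can take follows directly from the unit of rotation:

```python
# All rotations available to a robot given a unit of rotation u (divisor of 90).
def possible_rotations(u):
    assert 90 % u == 0, "unit of rotation must be a divisor of 90"
    return list(range(0, 360, u))

print(possible_rotations(45))  # [0, 45, 90, 135, 180, 225, 270, 315]
```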
Note: The Unity environment does not check which movements are "allowed". For instance, if you say that a robot should move from location `(1,1)` and rotation `0` to location `(2,1)` and rotation `0`, the robot will make this move (as long as there are no obstacles). It will even move to a node that is not nearby, such as location `(10,10)` and rotation `0`: in one timestep the robot will move between these two nodes. This also implies that you can use the environment even if you do not want to include rotations.
## Initializing the warehouse
In the Unity editor select the correct scene:

- Centralized
- Decentralized

and specify where Unity can find the necessary input to set up the warehouse and where it should save statistics. You specify these paths in the file `InputOutput.txt` under `/Pathfinding/Assets/StreamingAssets`. Do not alter the name or the location of `InputOutput.txt`. The file must be in JSON format and must correspond to an instance of the class `InputOutput.cs`, i.e. you must use the following naming:

- `ConfigFile` of type `string`: the name of the warehouse config file. This file must be placed under `/Pathfinding/Assets/StreamingAssets`.
- `StatsFile` of type `string`: the name of the file to output the statistics to. The file will be generated under `/Pathfinding/Assets/StreamingAssets`.
A typical `InputOutput.txt` file would look like this:

```json
{
  "ConfigFile": "/ConfigurationFiles/config.txt",
  "StatsFile": "/Stats/stat.txt"
}
```
More information on which metrics are stored by the Unity environment can be found in section Metrics.
The file corresponding to `ConfigFile` must be in JSON format and must correspond to an instance of `WarehouseParameters.cs`. Examples can be found in `ConfigurationFiles`. These are the parameters you have to set:

- `Matrix` of type `int[][]`: contains the grid-based layout of the warehouse. We have walls, tiles and tiles with an arrow drawn on top. The accepted values are as follows and can also be consulted in `Constants.cs`:
  - CodeWall: `1`
  - CodeTile: `0`
  - CodeTileArrowEast: `3`
  - CodeTileArrowSouth: `4`
  - CodeTileArrowWest: `5`
  - CodeTileArrowNorth: `6`
- `Rotation` of type `int`: the unit of rotation, i.e. the rotation in degrees that corresponds in time to moving forward (or backward) from one tile to an adjacent one
- `TimeRobotPerTile` of type `int`: the time in ms that corresponds to moving one tile forward (or backward), or equivalently to one minimal rotation
- `NumberOfRobots` of type `int`: the number of robots in the warehouse
- `InitialNodes` of type `Node[]`: an array of initial nodes; the length of this array is equal to `NumberOfRobots` and `InitialNodes[i]` contains the initial node of robot `i`
- `GoalNodes` of type `Node[]`: an array of goal nodes; the length of this array is equal to `NumberOfRobots` and, if not specified otherwise, `GoalNodes[i]` contains the goal node of robot `i`
Optionally you can also add the following field. If omitted, it will be set to `False`:

- `GoalNodesOrderNotFixed` of type `bool`: if you set this property to `True`, your algorithm has to assign the goal nodes to the different robots and you must send these assignments back to Unity. Important: these assignments must be sent back before the first action is sent to Unity!
Robots travel from node to node. These nodes are instances of the class `Node.cs`. Each node contains two properties:

- GridPosition: an instance of `Position.cs` corresponding to its position in the grid
- Degree: an `int` corresponding to its rotation
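For illustration, a minimal hypothetical config file for a 4x4 warehouse with two robots could look like this. The node field names below follow the goal-assignment example later in this readme (which uses `Degrees`), while the class description above lists `Degree`; check the examples in `ConfigurationFiles` for the exact schema.

```json
{
  "Matrix": [[1, 1, 1, 1],
             [1, 0, 0, 1],
             [1, 0, 0, 1],
             [1, 1, 1, 1]],
  "Rotation": 90,
  "TimeRobotPerTile": 1000,
  "NumberOfRobots": 2,
  "InitialNodes": [
    {"GridPosition": {"Row": 1, "Column": 1}, "Degrees": 0},
    {"GridPosition": {"Row": 2, "Column": 2}, "Degrees": 90}
  ],
  "GoalNodes": [
    {"GridPosition": {"Row": 2, "Column": 1}, "Degrees": 180},
    {"GridPosition": {"Row": 1, "Column": 2}, "Degrees": 90}
  ]
}
```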
The following checks are done:

- `Matrix` should have correct dimensions, i.e. every row has the same length, and its entries should be valid integers.
- `Rotation` should be a divisor of 90.
- The lengths of `InitialNodes` and `GoalNodes` should be equal to `NumberOfRobots`.
- The nodes in `InitialNodes` and `GoalNodes` should be valid nodes, i.e. the position may not correspond to a wall and the rotation should be a multiple of `Rotation`.
## Running the environment
### Using the Unity editor
See the quick start on how to run the environment.
You can change the camera view using the following keys:
- W/S : closer/further
- D/A : right/left
- E/Q : down/up
You can also use the mouse to rotate.
Each goal node gets a different, randomly chosen color, and each robot is colored in the same color as its assigned goal node.
If there is a collision between robots (or between a robot and a wall), a collision message is shown and the simulation stops.
### Metrics
In `InputOutput.txt` you have to specify to which file you want to write statistics when the robots have reached their goal nodes.
Saving statistics is done by `Statistics.cs`. In every timestep the history is updated: for every robot we save whether it was at its goal node, whether it was not at its goal node, or whether it was at its goal node but received an action to leave that goal node.
When all robots are at their goal locations, a JSON object containing the following elements will be saved in `statsPath`:
- `elapsedTimeMilliSecondsUntilFirstAction` of type `float`: the total time in milliseconds between the moment the initial information on the environment is sent to Python and the moment the first action is sent back to Unity
- `makespan` of type `int`: the length of the longest single-agent path, i.e. the maximum of the values in `lengthOfPaths`
- `sumOfCosts` of type `int`: the sum of the lengths of all single-agent paths, i.e. the sum of the values in `lengthOfPaths`
- `lengthOfPaths` of type `Dictionary<int, int>`: for every robot ID the number of actions received before arriving at its goal node
- `history` of type `Dictionary<int, Dictionary<string, List<int>>>`: contains for each robot (instance ID) a dictionary with the keys `"all timesteps not at goal"`, `"all timesteps at goal"` and `"timesteps not at goal node but previous node was goal node"`
Of course you're allowed to add more metrics in your own local version.
Note that one timestep corresponds to `TimeRobotPerTile` milliseconds.
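For illustration only, a stats object of this shape might look as follows; the values and IDs are made up, only one robot's history is shown, and the exact serialization is determined by `Statistics.cs`:

```json
{
  "elapsedTimeMilliSecondsUntilFirstAction": 152.4,
  "makespan": 4,
  "sumOfCosts": 7,
  "lengthOfPaths": {"1001": 4, "1002": 3},
  "history": {
    "1001": {
      "all timesteps not at goal": [0, 1, 2, 3],
      "all timesteps at goal": [4],
      "timesteps not at goal node but previous node was goal node": []
    }
  }
}
```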
In the Python project `VisualizeStats` you can find inspiration and examples on how to visualize some metrics using the above-mentioned JSON object.
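For instance, a minimal sketch (not part of `VisualizeStats`; the stats path is a placeholder and the field names are the ones listed above) that plots the number of actions per robot could look like this:

```python
# Load a saved stats file and plot the path length per robot.
import json

import matplotlib.pyplot as plt

with open("stat.txt") as f:
    stats = json.load(f)

robot_ids = list(stats["lengthOfPaths"].keys())
lengths = [stats["lengthOfPaths"][r] for r in robot_ids]

plt.bar(robot_ids, lengths)
plt.xlabel("robot instance ID")
plt.ylabel("path length (number of actions)")
plt.title("makespan = %d, sum of costs = %d"
          % (stats["makespan"], stats["sumOfCosts"]))
plt.show()
```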
## Adding your own algorithms
If you want to add your algorithm to this project, it is important that you do not alter the Unity code. You can submit changes to the Unity code through a merge request, but these have to be approved, and this will only happen if they do not change the environment and either address bugs or improve the overall flow.
You can add your own algorithms (written in Python) in the directory `Algorithms`. Two subdirectories corresponding to the two scenes have been added:

- Centralized
- Decentralized
In each directory toy examples can be found. You can add a new algorithm by creating a new folder called `/Algorithm_<description>`. Add a readme that specifies which file is your main file (see below for an example) and that also contains a description of your algorithm.
Recall that it is important to use the correct version of the Python ML-agents packages.
Before integrating your changes into the main branch, check the following:

- For each new algorithm create a folder named `/Algorithm_<description>` in the correct subdirectory of `Algorithms`.
- Add a readme in the same folder `/Algorithm_<description>` that clearly documents the problem you are solving, the algorithm you use and which file is the main file.
- Remember that you may not change anything in the Unity code.
- Before integrating your commits into the main branch, merge the main branch into your branch or rebase your branch onto the main branch.
- Test your algorithm using the configuration files in `ConfigurationFiles`.
- If nothing breaks, you can open a merge request.
### Observations and actions
Observations and actions have a fixed format. An observation contains the current node of a robot; an action contains information about the next node. It is important to note that Unity does not check whether next nodes are reachable: if your actions are incorrect, e.g. the next action is a node that is not reachable from the observed node, you will see weird things in Unity.
In the centralized versions, actions for all robots will be requested at the same time. In the decentralized versions, each robot requests individual actions, not necessarily at the same time.
An individual observation has size `4`: an ID and a node. The order is fixed:
- the instance ID (integer, uniquely generated by Unity)
- the last seen row (integer)
- the last seen column (integer)
- the last seen degrees/rotation (integer)
Your algorithm then has to specify the next actions. An individual action has size `4` and specifies the next node to which the robot should move. The order is fixed (a small unpacking sketch follows the list):
- the instance ID (integer)
- the next row (integer)
- the next column (integer)
- the next degrees/rotation (integer)
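As a small illustration (the variable names are hypothetical), unpacking an individual observation and building the action that keeps a robot at its current node looks like this:

```python
# observation is a length-4 array: [instance ID, row, column, degrees].
instance_id, row, column, degrees = (int(v) for v in observation)

# Sending the current node back as the next node keeps the robot in place.
action = [instance_id, row, column, degrees]
```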
In the centralized versions, observations sent to the Python code will have size `4*NumberOfRobots` since they are grouped. Actions that have to be sent back must have size `4*NumberOfRobots`.
In the decentralized versions, observations sent to the Python code will have size `4`. Actions that have to be sent back must have size `4`.
## Example
This section describes in detail how to add your own examples. A high-level overview is given of how visualization and communication through ML-Agents are done, but remember that you should not alter the Unity code.
You can also check toy examples that have been added for each scene. In these toy examples, if goal nodes have not been assigned yet in the configuration file, they are assigned in order. Next, a shortest path is computed for every robot but no check is done on collisions.
As a running example we use Toy example Centralized. However, in all toy examples the flow is the same, as follows.
The file `main.py` sets up the side channel and the Unity environment:

```python
envChannel = EnvChannel.EnvChannel()
env = UnityEnvironment(file_name=None, side_channels=[envChannel])
```
If you are not using the Unity editor but a build, then you have to specify the path in `file_name`:

```python
envChannel = EnvChannel.EnvChannel()
env = UnityEnvironment(file_name=<path to build>, side_channels=[envChannel])
```
It then sets up an instance of the class `EnvironmentManager`, which controls defining new actions and the communication with Unity:

```python
envManager = EnvironmentManager.EnvironmentManager(envChannel.envData, env, envChannel)
```
- `envChannel.envData` is an instance of `EnvironmentData`, which holds all the info on the environment
- `env` is the Unity environment
- `envChannel` is the side channel that is used for the initial communication of environment info and also for communicating goal assignments
The initialization method of `EnvironmentManager` takes care of setting all attributes in `envChannel.envData`. This is done by calling `self.reset()`, which triggers the `on_message_received` method in `EnvChannel`; see also the documentation on ML-Agents.
### Communication via side channels
A side channel `envChannel` of type `EnvSideChannel.cs` is set up at the Unity side for sending initial information to the Python side. It is also used for sending information on the assignment of goal nodes back to Unity. In your Python folder you should have a file such as `EnvChannel.py` which processes the sent JSON object.
Make sure that the UUID corresponds to `ENV_SIDE_CHANNEL_ID` in `Constants.cs`.
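For reference, a minimal Python side channel following the ML-Agents documentation might look like the sketch below; the UUID is a placeholder that must be replaced by `ENV_SIDE_CHANNEL_ID` from `Constants.cs`, and the project's own `EnvChannel.py` may differ in detail:

```python
import json
import uuid

from mlagents_envs.side_channel.side_channel import (
    SideChannel,
    IncomingMessage,
    OutgoingMessage,
)


class EnvChannel(SideChannel):
    def __init__(self):
        # Placeholder UUID: must match ENV_SIDE_CHANNEL_ID in Constants.cs.
        super().__init__(uuid.UUID("00000000-0000-0000-0000-000000000000"))
        self.envData = None  # filled when Unity sends the initial JSON

    def on_message_received(self, msg: IncomingMessage) -> None:
        # Unity sends the environment description as a JSON string.
        self.envData = json.loads(msg.read_string())

    def send_string(self, data: str) -> None:
        # Queue a string (e.g. a goal assignment) to be sent on the next step.
        msg = OutgoingMessage()
        msg.write_string(data)
        super().queue_message_to_send(msg)
```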
Initially the following information is sent from Unity to Python:

- `matrixLayout`: the matrix
- `rotationRobot`: the unit of rotation
- `sizeOfAction`: the size of an action for one robot; here this will always be 4
- `sizeOfObservation`: the size of an observation for one robot; here this will always be 4
- `numberOfRobotsInEnv`: the number of robots in the environment
- `initial`: the array of initial nodes
- `goal`: the array of goal nodes
- `orderGoalNodesNotFixed`: optional parameter that specifies whether goal nodes still have to be assigned (if `True`, the algorithm has to assign the goal nodes using the side channel `EnvChannel.py`)
- `instanceIDS`: the instance IDs of the robots
The arrays `initial`, `goal` and `instanceIDS` will have size `numberOfRobotsInEnv`, and for every index `i` the instance ID `instanceIDS[i]` is the Unity instance ID of the robot that starts in initial node `initial[i]`.
If `orderGoalNodesNotFixed` is `false` (the default), then its goal node is `goal[i]` and in Unity the robot will be colored in the same color as that goal node.
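As an illustrative sketch, and assuming the received JSON has been parsed into a plain dict (the project's `EnvironmentData` class wraps this differently), the initial information could be read like this:

```python
# data is the parsed JSON dict sent by Unity over the side channel.
data = envChannel.envData

matrix = data["matrixLayout"]            # grid layout of the warehouse
unit_of_rotation = data["rotationRobot"]

for i in range(data["numberOfRobotsInEnv"]):
    robot_id = data["instanceIDS"][i]    # Unity instance ID of robot i
    start_node = data["initial"][i]      # its initial node
    goal_node = data["goal"][i]          # its goal node if the order is fixed
```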
It is entirely up to you how you process this information (which is in JSON format). However, you cannot change which information is sent; if you need extra information to be sent, please create a new issue or a merge request.
### Assigning goal nodes to agents
When goal nodes are not fixed, the Python code has to assign a goal node to each of the robots. In this example we assign the goal nodes in the same order as the initial nodes.
In order to send the information back to Unity, we reuse the same side channel `EnvChannel`. Unity will then color the robots in the same color as their assigned goal node.
It is very important to use the correct structure as specified at the Unity side in `EnvSideChannel.cs`:

```csharp
private class GoalAssignment
{
    public int RobotID = -1;     // instance ID of robot
    public Node GoalNode = null; // goal node that will be assigned
}
```
In our example `envManager` takes care of this by calling `send_goal_nodes_to_unity()` in its initialization method:
```python
def send_goal_nodes_to_unity(self):
    for i in range(len(self.envData.goal_nodes)):
        goal_dict = {
            "RobotID": self.envData.instance_ids[i],
            "GoalNode": {"GridPosition":
                             {"Row": self.envData.goal_nodes[i]["GridPosition"]["Row"],
                              "Column": self.envData.goal_nodes[i]["GridPosition"]["Column"]},
                         "Degrees": self.envData.goal_nodes[i]["Degrees"]}
        }
        self.envChannel.send_string(json.dumps(goal_dict))
```
For each goal assignment it uses a dictionary that contains the keys and values that `GoalAssignment` in `EnvSideChannel.cs` expects. Note that `envChannel.send_string()` only accepts one such `goal_dict` per call: sending has to be done for each robot separately.
The queued assignments will be sent through the side channel the next time `envManager.step()` is called; see also the documentation on ML-Agents.
### Computing actions: the algorithm
In our `main.py` file we call:

```python
while simulation_going:
    envManager.step()
```
which checks whether there are new requests and then computes new actions. Currently the control variable `simulation_going` is never set to `False`. If there is a new action available for a robot, it will be sent back; otherwise the current observation will be sent back. You can change this in whatever way you want, e.g. by calling `envManager.close()`, which terminates the communication. In Unity the robots will move towards the action that was received.
In our example we use precomputed paths. A dedicated data structure `Node` has been defined to represent nodes, and in the initialization method of `EnvironmentManager` a graph is created:

```python
self.graph = Graph(envData.matrix, envData.rotation, envData.can_go_backwards)
```
See `Graph` for implementation details. A path is then computed for each of the robots by calling `self.compute_shortest_paths()`, and the paths are stored in the dictionary `self.shortest_paths`, whose keys are the Unity robot IDs and whose values are the next nodes in the path.
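For reference, a generic breadth-first search over such nodes could look like the sketch below; the actual implementation lives in the project's `Graph` class, and `graph.neighbors(node)` is an assumed interface returning the nodes reachable in one timestep:

```python
from collections import deque


def shortest_path(graph, start, goal):
    # BFS is enough here because every move costs one timestep.
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Reconstruct the path by walking back to the start node.
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for neighbour in graph.neighbors(node):
            if neighbour not in prev:
                prev[neighbour] = node
                queue.append(neighbour)
    return None  # goal not reachable
```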
The instance method `step` of `EnvironmentManager` then computes new actions and sends them to Unity. This is done as follows. Every time a request has been made, the information on the observation is stored in:

```python
decision_steps, terminal_steps = self.unityEnv.get_steps(self.behavior_name)
```
- `decision_steps.obs` contains a list of numpy arrays with the collected observations per group of agents.
- `decision_steps.agent_id` contains a list with info on all agents that have requested an action; in particular, its length corresponds to the number of agents that have requested an action.
The number of agents that have requested an action equals `len(decision_steps.agent_id)`, and for each agent `i` in `range(len(decision_steps.agent_id))` its observation can be found in:

```python
observation = decision_steps.obs[0][i]
```
#### Centralized
In the centralized versions, `len(decision_steps.agent_id)` will always be `1`, since there is only one agent that collects all individual observations, and thus the observation can be found in:

```python
observation = decision_steps.obs[0][0]
```
In our toy example, we then extract the individual observations from `observation` and store them in a dictionary `observations` where the key is the instance ID and the value is a list containing the observation, i.e. the row, column and rotation:

```python
observations = {}
for i in range(self.envData.number_of_robots):
    observation_i = observation[i * self.envData.size_of_observation:
                                i * self.envData.size_of_observation + self.envData.size_of_observation]
    observations[observation_i[0]] = observation_i[1:]
```
#### Decentralized
In the decentralized versions, we can just work directly with `decision_steps.obs[0][i]`.
### Create actions to send via ML agents
In the toy example, the next action is found in the precomputed path.
We store all actions that need to be sent back to Unity in a list `actions` which will contain all individual actions. This list is not a nested list! In the centralized version its length will be `self.envData.size_of_action * self.envData.number_of_robots`. In the decentralized version its length will be `self.envData.size_of_action * n`, where `n = len(decision_steps.agent_id)` is the number of agents that have requested a decision.
Suppose we need to add the action for the robot with ID `robot_id_unity`:

- If the next action is going to node `(next_action.x, next_action.y, next_action.g)`, then:

  ```python
  actions.extend([robot_id_unity, next_action.x, next_action.y, next_action.g])
  ```

- If there are no more actions and `obs` contains the current observation `[row, column, degrees]` of the robot, then:

  ```python
  actions.extend([robot_id_unity, obs[0], obs[1], obs[2]])
  ```
### Send actions back to Unity
In order to send these actions back to Unity some manipulations have to be done, see also ML agents documentation.
First we convert `actions` to a numpy array of the correct size: the first dimension is the number of agents that requested a decision and the second is the action size for each of those agents:

- The number of agents that have requested an action can be found in `len(decision_steps.agent_id)`.
- The action size is `self.envData.size_of_action * self.envData.number_of_robots` for centralized and `self.envData.size_of_action` for decentralized.
For centralized we have:

```python
actions = np.array(actions)
actions.resize(len(decision_steps.agent_id), self.envData.size_of_action * self.envData.number_of_robots)
```

For decentralized we have:

```python
actions = np.array(actions)
actions.resize(len(decision_steps.agent_id), self.envData.size_of_action)
```
Then we have to add `actions` to an `ActionTuple` object as discrete actions:

```python
action_tuple = ActionTuple()
action_tuple.add_discrete(actions)
```
Finally we set the actions and send them to Unity by calling the step method of `self.unityEnv`:

```python
self.unityEnv.set_actions(self.behavior_name, action_tuple)
self.unityEnv.step()
```
Recall that `self.unityEnv.step()` will also send queued information through the side channels.
### Note on requesting actions
In the centralized approach, the script corresponding to the simulation manager extends the ML-Agents class `Agent`; in the decentralized approach, the script corresponding to a robot extends `Agent`.
Such an agent is an entity that observes its environment and executes the actions it receives from the external algorithm. In the centralized case it collects the observations of all robots and requests decisions for all of them at the same time. The corresponding game object must have a `Behavior Parameters` component added to it. This has already been set up in the Unity editor, so if you haven't altered anything there it should work correctly.