A PD World (pickup and drop-off) environment was developed in which agents take turns picking up and dropping off blocks. SARSA and Q-learning, using a Q-table and several action-selection policies, were implemented so the agents learn efficient paths between pickup and drop-off cells. The learned behavior improved task completion efficiency by 65% compared with random movement.
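The difference between the two algorithms comes down to the value used for the next state in the update. The sketch below is illustrative only and is not taken from the repository: the function names, the `ALPHA` and `GAMMA` values, and the integer state/action encoding are all assumptions, and a plain list of lists stands in for the Q-table.

```python
import random

ALPHA = 0.3  # learning rate (assumed value, not from the project)
GAMMA = 0.5  # discount factor (assumed value, not from the project)


def make_q_table(n_states, n_actions):
    """Q-table with one row per state and one column per action, all zeros."""
    return [[0.0] * n_actions for _ in range(n_states)]


def epsilon_greedy(q, state, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best-known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return row.index(max(row))


def q_learning_update(q, s, a, r, s_next):
    """Off-policy: bootstrap from the best action available in the next state."""
    target = r + GAMMA * max(q[s_next])
    q[s][a] += ALPHA * (target - q[s][a])


def sarsa_update(q, s, a, r, s_next, a_next):
    """On-policy: bootstrap from the action the agent actually takes next."""
    target = r + GAMMA * q[s_next][a_next]
    q[s][a] += ALPHA * (target - q[s][a])
```

In a training loop, an agent would call `epsilon_greedy` to choose each move and then apply one of the two updates with the reward it received. SARSA additionally needs the next action chosen before it can update.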
SOURCE CODE
https://github.com/LePoisson104/3agentsSTACK
Python
LIBRARIES
numpy, matplotlib, cv2 (OpenCV)
DESCRIPTION
3 agents: red, blue, black | pickup blocks: green | dropoff blocks: brown
