
Treasure Hunt with SARSA: A Reinforcement Learning Adventure

Welcome to the Treasure Hunt with SARSA! This project is a fun and interactive way to explore reinforcement learning concepts using the SARSA algorithm. You’ll guide an agent through a maze-like environment, avoiding obstacles, collecting treasures, and finding the exit while managing energy consumption.

Table of Contents

  1. Introduction
  2. Features
  3. Installation
  4. Usage
  5. Training
  6. Testing
  7. Visualization
  8. TODO List
  9. Contributing
  10. License

Introduction

In this project, you’ll be tasked with training an agent to navigate a 20x20 grid world. The agent starts at a fixed initial position and must find the exit while avoiding obstacles and collecting treasures. The environment includes:

  • Obstacles: Randomly placed on the map.
  • Treasures: Randomly placed near obstacles.
  • Exit: Randomly placed on the map’s boundary.

The agent receives rewards and penalties based on its actions:

  • Treasure: +10 reward.
  • Obstacle: -5 penalty.
  • Exit: +50 reward.
  • Boundary: -1 penalty.
  • Repeated Path: -2 penalty.
  • Energy Depletion: -10 penalty.

The agent’s energy decreases with each move, and if it runs out of energy, the episode ends.
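As a rough illustration of how these rules could be wired into a grid-world step function, here is a minimal sketch. The class and method names (GridWorld, step), the obstacle/treasure counts, and the starting energy value are assumptions for illustration, not the project's actual code:

import random

# Illustrative sketch of the reward rules above; names and constants are assumptions.
class GridWorld:
    def __init__(self, size=20, start_energy=100):
        self.size = size
        self.start_energy = start_energy
        self.reset()

    def reset(self):
        cells = [(r, c) for r in range(self.size) for c in range(self.size)]
        self.obstacles = set(random.sample(cells, 30))     # random obstacles
        self.treasures = set(random.sample(cells, 5))      # random treasures (placement simplified)
        border = [p for p in cells if 0 in p or self.size - 1 in p]
        self.exit = random.choice(border)                   # exit on the map's boundary
        self.pos = (0, 0)                                   # fixed start position
        self.energy = self.start_energy
        self.visited = {self.pos}
        return self.pos

    def step(self, action):
        moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # up, down, left, right
        dr, dc = moves[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        reward, done = 0, False

        if not (0 <= r < self.size and 0 <= c < self.size):
            reward -= 1                                      # boundary penalty; stay in place
            r, c = self.pos
        elif (r, c) in self.obstacles:
            reward -= 5                                      # obstacle penalty
        elif (r, c) in self.treasures:
            reward += 10                                     # treasure reward
            self.treasures.discard((r, c))
        elif (r, c) == self.exit:
            reward += 50                                     # exit reward, episode ends
            done = True

        if (r, c) in self.visited:
            reward -= 2                                      # repeated-path penalty
        self.visited.add((r, c))
        self.pos = (r, c)

        self.energy -= 1                                     # every move costs energy
        if self.energy <= 0 and not done:
            reward -= 10                                     # energy depletion ends the episode
            done = True
        return self.pos, reward, done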

Features

  • SARSA Algorithm: Implements the SARSA algorithm for reinforcement learning (the core update rule is sketched after this list).
  • Dynamic Environment: Randomly generated obstacles, treasures, and exit.
  • Energy Management: The agent must manage its energy to avoid depletion.
  • Reward System: A comprehensive reward system to guide the agent’s learning.
  • Visualization: Real-time visualization of the agent’s progress during training and testing.
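For reference, the core of SARSA is the on-policy temporal-difference update Q(s, a) ← Q(s, a) + α (r + γ Q(s', a') − Q(s, a)). The snippet below shows this update; the Q-table shape and hyperparameter values are illustrative defaults, not necessarily the ones used in main.py:

import numpy as np

# On-policy SARSA update (illustrative hyperparameters, not the project's defaults):
# Q(s, a) <- Q(s, a) + alpha * (r + gamma * Q(s', a') - Q(s, a))
alpha, gamma = 0.1, 0.99
Q = np.zeros((20 * 20, 4))          # one row per grid cell, one column per action

def sarsa_update(s, a, r, s_next, a_next, done):
    target = r if done else r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])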

Installation

To get started, clone the repository and install the required dependencies:

git clone https://github.com/yourusername/treasure-hunt-sarsa.git
cd treasure-hunt-sarsa
pip install -r requirements.txt

Usage

To train the agent, run:

python main.py

You’ll be prompted to choose a model type. Currently, only sarsa is supported.

Training

The training process involves running multiple episodes where the agent learns to navigate the environment. The training progress is visualized every 10,000 episodes, and the average reward is printed every 1,000 episodes.
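A training loop of this shape, with ε-greedy action selection and periodic reporting, might look roughly like the sketch below. The helper names (GridWorld, Q, sarsa_update) are the assumed names from the sketches above, and the ε value is illustrative:

import random

# Rough shape of the training loop; episode counts follow the text above,
# but names and hyperparameters are illustrative assumptions.
epsilon, n_actions = 0.1, 4

def to_index(pos, size=20):
    return pos[0] * size + pos[1]                  # flatten (row, col) into a Q-table row

def choose_action(s):
    if random.random() < epsilon:                  # explore
        return random.randrange(n_actions)
    return int(Q[s].argmax())                      # exploit

def train(env, episodes=100_000):
    rewards = []
    for ep in range(1, episodes + 1):
        s = to_index(env.reset())
        a = choose_action(s)
        total, done = 0.0, False
        while not done:
            pos, r, done = env.step(a)
            s_next = to_index(pos)
            a_next = choose_action(s_next)
            sarsa_update(s, a, r, s_next, a_next, done)
            s, a, total = s_next, a_next, total + r
        rewards.append(total)
        if ep % 1_000 == 0:
            print(f"episode {ep}: avg reward {sum(rewards[-1_000:]) / 1_000:.2f}")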

Testing

After training, you can test the agent’s performance by running:

python main.py

The agent will navigate the environment using the learned Q-values, and the best path will be visualized.
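At test time the agent simply follows the greedy policy with respect to the learned Q-table. A minimal rollout along those lines, again reusing the assumed names from the earlier sketches, could look like:

# Greedy rollout using the learned Q-values (assumed names from the sketches above).
def test(env, max_steps=400):
    path = []
    pos = env.reset()
    done = False
    while not done and len(path) < max_steps:
        a = int(Q[to_index(pos)].argmax())      # always pick the best-known action
        pos, r, done = env.step(a)
        path.append(pos)
    return path                                  # best path, ready to visualize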

Visualization

The visualization shows the agent’s position, obstacles, treasures, and exit on the map. During training, the map is updated every 10,000 episodes. During testing, the best path taken by the agent is displayed.
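One way to render such a map with matplotlib is sketched below; the markers, colors, and path argument are illustrative choices rather than the project's actual plotting code:

import matplotlib.pyplot as plt

# Illustrative map rendering; styling choices are assumptions.
def draw(env, path=()):
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.set_xlim(-0.5, env.size - 0.5)
    ax.set_ylim(-0.5, env.size - 0.5)
    for r, c in env.obstacles:
        ax.plot(c, r, "ks")                      # obstacles as black squares
    for r, c in env.treasures:
        ax.plot(c, r, "y*", markersize=12)       # treasures as stars
    ax.plot(env.exit[1], env.exit[0], "g^")      # exit as a green triangle
    if path:
        rows, cols = zip(*path)
        ax.plot(cols, rows, "b.-", alpha=0.6)    # agent's path
    ax.invert_yaxis()
    plt.show()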

TODO List

Here are some exciting features and improvements planned for the future:

  • Multi-Agent Support: Allow multiple agents to explore the environment simultaneously.
  • Different Environments: Introduce different map sizes and layouts.
  • Advanced Algorithms: Implement other reinforcement learning algorithms like Q-Learning, DQN, and PPO.
  • Energy Regeneration: Introduce energy regeneration points on the map.
  • Dynamic Obstacles: Make obstacles move or change positions over time.
  • User Interaction: Allow users to manually guide the agent during testing.
  • Performance Optimization: Optimize the training process for faster convergence.
  • Documentation: Improve documentation and add more examples.

Contributing

Contributions are welcome! Feel free to open issues or submit pull requests. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License. See the LICENSE file for details.


Happy treasure hunting! 🗺️💎
