Happy MAR10 Day! I trained a PPO agent to play Super Mario Bros. and built a live browser viewer for it

Since it’s Mario Day, I figured this was the right time to post this.

I’ve been working on a small local project where I train a PPO reinforcement learning agent to play Super Mario Bros. The original goal was basically: stop reading about RL and actually build something with it.

So now I have a little AI Mario setup running locally with Stable-Baselines3, a Gym-compatible NES environment, and a browser viewer I built with FastAPI so I can watch the training live instead of only looking at charts.
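For anyone curious how a live viewer like this can get frames out of the training loop, here is a minimal sketch of one common approach: the trainer overwrites a single shared "latest frame" slot, and the web handler reads whatever is there. This is not the repo's actual code; `LatestFrame`, `publish`, and `read` are hypothetical names, and the frame is assumed to already be encoded as bytes (for example a PNG or JPEG).

```python
import threading
from typing import Optional, Tuple


class LatestFrame:
    """Hypothetical holder that keeps only the newest encoded frame.

    Assumption: the training loop and the web server run in different
    threads of the same process, so a lock is enough for safety.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._frame: Optional[bytes] = None
        self._version = 0  # increments on every publish

    def publish(self, frame: bytes) -> None:
        # Called from the training loop. Older frames are dropped on
        # purpose: the viewer only needs the most recent one.
        with self._lock:
            self._frame = frame
            self._version += 1

    def read(self) -> Tuple[int, Optional[bytes]]:
        # Called from the web handler. The version number lets a client
        # skip frames it has already seen.
        with self._lock:
            return self._version, self._frame
```

Dropping old frames instead of queueing them means a slow browser can never make training wait, which is probably what you want when the viewer is just for watching.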

Which means I now get to witness, in real time, a digital organism repeatedly make the worst possible decision near the same pipe for 20 minutes until it suddenly evolves and starts doing something almost competent.

Current status:

– it can reliably learn basic forward movement

– it gets better at handling common obstacles

– it still has moments where it looks like it has never seen Mario before in its life

– different seeds and hyperparameters can turn it into either a promising speedrunner or a complete clown

A lot of the work has been around:

– frame preprocessing

– limiting the action space

– reward shaping

– training stability

– checkpoint/resume support
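To make two of those items concrete, here is a rough sketch of what limiting the action space and shaping the reward can look like. It is an illustration under assumptions, not the code from the repo: `SIMPLE_ACTIONS`, `StepInfo`, `shaped_reward`, and every coefficient below are hypothetical, and the real environment may expose different fields.

```python
from dataclasses import dataclass

# Hypothetical reduced action set: a few useful button combos instead of
# every possible NES controller state. Fewer actions usually means less
# to explore.
SIMPLE_ACTIONS = [
    ["NOOP"],
    ["right"],
    ["right", "A"],       # jump while moving right
    ["right", "B"],       # run
    ["right", "A", "B"],  # running jump
]


@dataclass
class StepInfo:
    """Assumed per-step info; field names are illustrative only."""
    x_pos: int
    died: bool
    flag_reached: bool


def shaped_reward(
    prev: StepInfo,
    cur: StepInfo,
    progress_scale: float = 0.1,  # assumed value
    step_penalty: float = 0.01,   # assumed value
    death_penalty: float = 5.0,   # assumed value
    flag_bonus: float = 10.0,     # assumed value
    clip: float = 15.0,           # assumed value
) -> float:
    # Reward horizontal progress since the previous step.
    reward = progress_scale * (cur.x_pos - prev.x_pos)
    # Small constant cost per step so standing still is never optimal.
    reward -= step_penalty
    if cur.died:
        reward -= death_penalty
    if cur.flag_reached:
        reward += flag_bonus
    # Clip so one unusual step cannot dominate a PPO update.
    return max(-clip, min(clip, reward))
```

The clipping at the end is one simple lever for training stability: it keeps the reward scale bounded so a single big jump in x position does not swamp everything else in the batch.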

So if you’re into Mario, RL, game AI, or just enjoy watching a machine slowly develop confidence and then immediately run into danger, you might get a kick out of it.

Repo is here:

https://github.com/mgelsinger/mario-ai-trainer

Also, if anyone has good ideas for PPO tuning, curriculum learning, or just making the little guy stop throwing runs for no reason, I’d love to hear them.

by pleasestopbreaking
