Lessons Learned Reproducing a Deep Reinforcement Learning Paper (2018) (amid.fish)
49 points by tim_sw on April 27, 2023 | hide | past | favorite | 11 comments


Ah the siren call of reinforcement learning. I'm experiencing the pain of this right now.

I created a reinforcement learning agent to play Slay the Spire. It's training right now; I can hear my desktop computer running like a heater (why don't I do these things in the winter?). It's not working, though.

First, the code is complex and hard to get right. This isn't your typical corporate programming where you shove more and more if-statements into the pile until it mostly works and is good enough. It's super tricky, and if the code becomes a mess you have no idea where things went wrong. Numbers go in, numbers come out; where they went wrong, nobody knows.

Second, and worse, the entire time you're building the thing you imagine it's going to work and be awesome. And indeed, it might. But then when you actually think you're finished and try it, it doesn't work. Then, as described in the article, all you can do is start randomly fiddling with things, and the feedback loop is like 2 days. And ever in the back of your mind is the idea that if you just let whatever you're currently trying run for long enough, it will eventually work, so you hate to pull the plug, even after several days of it not working.

Third, there's no clear signal about how well a model is performing except real-world performance, and that is expensive to obtain and not smooth. My agent keeps dying to the first boss. Is it improving? I don't know, maybe? The first boss is a pretty big hurdle, so I expect it to be stuck there for some time. What about the loss values? Well, the loss is going up and I have no idea whether that's okay or not. The loss is going up while the real-world performance is also going up; explain that to me. RL is weird. I wish I could just minimize a loss function like in supervised learning.
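The least-bad progress signal I've found is a moving average of evaluation episode returns, tracked separately from the loss, since in RL the loss can legitimately rise while the policy improves. A minimal sketch (the helper name and numbers are my own, not from any particular library):

```python
from collections import deque

def smoothed_return(returns, window=100):
    """Moving average of the most recent episode returns -- a
    steadier progress signal than the (often rising) RL loss."""
    recent = list(returns)[-window:]
    return sum(recent) / len(recent)

# Append one entry per finished episode, then watch the smoothed curve.
history = deque(maxlen=10_000)
history.extend([2.0, 3.0, 1.0, 4.0])
print(smoothed_return(history, window=4))  # → 2.5 (mean of the last 4)
```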


This feels exactly like the loop I'm stuck in right now, which leaves me relying on sudden waves of motivation to finish such projects rather than on curiosity. Have you ever come out of this cycle? I.e., become more effective at planning a project so it doesn't lead straight into a death-loop like the above?

On a side note, I love your writing style: dry and sprinkled with the right amount of humour.


Thanks.

I'm about to give up on this project. Solving Slay the Spire on a single computer with model-free methods was always quite ambitious. I've seen enough learning to believe that it works, and I know I've implemented the algorithms correctly enough; that feels good, at least. I've learned a lot along the way.

I might try to get everything running in a container, then I can just ship the containers out to run on the cheapest hosting I can find and I won't have to babysit the experiments so much.

I was recently watching https://karpathy.ai/zero-to-hero.html and at one point he basically says "stop, it's not time to train yet, first we need an experimental harness", and I've been thinking a lot about that. That is to say, I haven't actually followed the advice yet, because who the hell can resist trying as soon as there's the slimmest chance it might work.
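For what it's worth, the harness doesn't have to be fancy. A minimal sketch of what I have in mind (the function name and directory layout are my own invention): fix the seed, write the config next to the metrics, then run, so any two runs are comparable after the fact.

```python
import json
import random
from pathlib import Path

def run_experiment(config: dict, seed: int, out_dir: str = "runs"):
    """Hypothetical harness: seed everything, record the config
    alongside the metrics, and only then train."""
    random.seed(seed)
    run_dir = Path(out_dir) / f"{config['name']}_seed{seed}"
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "config.json").write_text(json.dumps({**config, "seed": seed}))

    metrics = []
    for step in range(config["steps"]):
        # Stand-in for a real training step.
        metrics.append({"step": step, "return": random.gauss(step * 0.01, 1.0)})
    (run_dir / "metrics.json").write_text(json.dumps(metrics))
    return metrics

# Run the same config over several seeds before trusting any one curve.
for seed in (0, 1, 2):
    run_experiment({"name": "baseline", "steps": 100}, seed)
```

The seed loop at the end is the part I keep skipping and regretting: a single run tells you almost nothing about variance.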


> "stop, it's not time to train yet, first we need an experimental harness"

I was going to say that the model is probably stuck on the boss because it doesn't get sufficient success feedback for incremental improvement.
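That's the classic sparse-reward problem, and the usual fix is reward shaping: give partial credit for progress toward the win instead of only the win itself. A toy sketch (the function and numbers are made up for illustration, not how Slay the Spire scores anything):

```python
def shaped_reward(won: bool, boss_hp_start: int, boss_hp_end: int) -> float:
    """Toy reward shaping: reward damage dealt to the boss so the
    agent still gets a learning signal while it keeps losing."""
    damage_fraction = (boss_hp_start - boss_hp_end) / boss_hp_start
    return 10.0 if won else damage_fraction  # large bonus reserved for a win

# A losing run that still dealt 60% of the boss's HP gets some credit:
print(shaped_reward(False, boss_hp_start=500, boss_hp_end=200))  # → 0.6
```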

The Google Alpha team used simplified games that allowed the AI to get over humps like this.


So it's tricky. I believe the solution is to break the gameplay down into basic tasks - moving to a targeted point, deciding which point to move to, avoiding attacks, attacking, etc.

But the more "tasks" you train the RL AI on, the more guardrails you put on it, and the closer it gets to "supervised" learning -- it's only learning what you imagined for it. Which misses a lot of the fun and potential of RL: it should come up with novel techniques and strategies for games that humans can learn from.

So, yeah. Personally I feel like the answer is lots of mini-training runs across tons of different games -- but only on the sub-tasks of each game. Then, once it's trained on the "universal mechanics" of gameplay, let it start playing full games.

The trick is going to be super-generalization. IMHO. But in AI, opinions are a dime a dozen.


Much the same experience as you. Making RL work is more like a marathon than a sprint and requires patience and bouts of inspiration.

All this tells me is there’s more room for improvement in the algorithms.

BTW: I’m really glad more researchers are including code. I often see nuances in the code that aren’t mentioned clearly in the paper and would make blind reproduction difficult, to say the least.


What's described here is roughly true of any deeply quantitative scientific research that depends on optimization and statistics, although for a number of reasons, it's far worse in ML than other fields.

I worked on a lot of "reproductions" in my career: given a paper, implement the paper and get the same results as the authors. As I got better, I learned techniques for being more effective, for example running the optimization many times and seeing how sensitive it was to random seeds. Oftentimes I'm at the point of inspecting individual steps in the data load/augmentation pipeline (buggy 90-degree image rotations are very common, as are swapped-axis rectangle crops).
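For the rotation class of bug, the sanity check I mean looks like this (pure Python, hypothetical helper names): rotate an asymmetric grid and compare against the known answer, because a transpose masquerading as a rotation passes on symmetric inputs.

```python
def rot90_ccw(grid):
    """Rotate a 2-D list of lists 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]

def rot90_buggy(grid):
    """A common bug: a bare transpose, which mirrors rather than rotates."""
    return [list(row) for row in zip(*grid)]

# An asymmetric test image exposes the difference; a symmetric
# one would let the bug slip through undetected.
img = [[0, 1, 2],
       [3, 4, 5]]
assert rot90_ccw(img) == [[2, 5], [1, 4], [0, 3]]
assert rot90_buggy(img) != rot90_ccw(img)
```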

None of this should be surprising when we recognize that the people producing these papers are incentivized to move as quickly as possible, not produce stable systems for people with less ML skill. Any time a paper includes the exact source code, training data, and the shortcut to get training working, it saves me roughly 2-3 weeks at least.

And don't get me started on reproduction attempts that ultimately lead to papers being retracted. It's painfully common to fail to reproduce the author's work, then get access to their source, and find basic bugs that call the results into question.


"... a better project might be to read papers until you find something you’re really interested in that comes with clean code, and trying to implement an extension to it."

Doesn't have to be clean but starting from a reproducible codebase at least gives you the guarantees that you have everything. Too often papers omit details that are crucial for a successful reproduction.


This is also my experience trying to make genetic programming code do anything non-trivial. Very tricky. Not great when you have limited spare time.


Very informative post, great job!


I strongly recommend the book by Sutton and Barto https://web.stanford.edu/class/psych209/Readings/SuttonBarto...



