> divergence problems where the training explodes and engages in catastrophic forgetting
This is a new problem I haven't heard about. Thanks for adding it to the discussion.
> Er, this is exactly what OP is all about: the Monte Carlo tree search supervision.
And here, I don't know. I think I exclaimed confidently that this is a super old algorithm. What is new about it that you need to mention it again? Really, maybe I'm misinterpreting something, or forgetting something, but I'm quite sure I discussed the algorithm itself with Go AI scientists in 2005, who had it in all their bots, afaik. Please correct me if you believe my memory is cheating me here.
> but the algorithm itself I'm quite sure I discussed with Go AI scientists in 2005 who had them in all their bots, afaik
You did not, because first, MCTS was only published by Coulom in 2006 (so they couldn't 'all' have been using it in 2005), and second, because you are missing the crucial iteration between the heavy heuristic (the NN) used for playouts and the refined estimates those playouts produce: https://news.ycombinator.com/item?id=15627834 Any MCTS for Go in 2006 would have used either random light playouts or simple, unchanging hand-engineered heavy heuristics; hence, no recursive self-improvement in self-play was possible.
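To make the distinction concrete, here's a minimal sketch of that feedback loop. The game (a toy Nim variant standing in for Go) and all names are hypothetical, and a lookup table stands in for the neural network; the point is only the structure: 2006-era MCTS evaluated leaves with fixed random playouts, while the AlphaGo-style loop feeds the search's refined value estimates back into the evaluator it searches with.

```python
import math
import random

# Toy stand-in for Go: Nim where players remove 1 or 2 stones and taking
# the last stone wins. Small enough that the whole loop actually runs.
MOVES = (1, 2)

def legal(pile):
    return [m for m in MOVES if m <= pile]

class Node:
    def __init__(self, pile, parent=None):
        self.pile = pile
        self.parent = parent
        self.children = {}  # move -> Node
        self.wins = 0.0     # value for the player who moved INTO this node
        self.visits = 0

def uct_child(node, c=1.4):
    # Standard UCT: exploit mean value, explore rarely-visited children.
    return max(node.children.values(),
               key=lambda ch: ch.wins / ch.visits
                              + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout_random(pile):
    """'Light' playout, as in 2006-era MCTS: random moves to the end.
    Returns +1/-1 from the perspective of the player to move."""
    sign = 1
    while pile > 0:
        pile -= random.choice(legal(pile))
        sign = -sign
    return -sign  # whoever faces an empty pile has lost

def run_mcts(pile, sims, evaluate):
    root = Node(pile)
    for _ in range(sims):
        node = root
        # Selection: descend through fully expanded nodes.
        while node.children and len(node.children) == len(legal(node.pile)):
            node = uct_child(node)
        # Expansion: add one untried child, if any.
        untried = [m for m in legal(node.pile) if m not in node.children]
        if untried:
            m = random.choice(untried)
            node.children[m] = Node(node.pile - m, parent=node)
            node = node.children[m]
        # Evaluation, then negamax-style backup (flip sign each ply).
        value = -evaluate(node.pile)
        while node is not None:
            node.visits += 1
            node.wins += value
            value = -value
            node = node.parent
    best_move = max(root.children, key=lambda m: root.children[m].visits)
    root_value = -root.wins / root.visits  # value for the player to move
    return best_move, root_value

# The AlphaGo-style twist: leaf evaluation comes from a LEARNED estimator
# that is itself retrained on the search's refined output.
value_table = {}

def evaluate_learned(pile):
    # 'Heavy' evaluation: learned estimate if we have one, light playout otherwise.
    return value_table.get(pile, rollout_random(pile))

def self_play_iteration(max_pile=10, sims=400):
    for pile in range(1, max_pile + 1):
        _, v = run_mcts(pile, sims, evaluate_learned)
        value_table[pile] = v  # search output becomes the next training target

random.seed(0)
for _ in range(3):
    self_play_iteration()

# Perfect play: piles divisible by 3 are lost for the player to move,
# so from pile 4 the right move is to take 1, leaving 3.
best, v4 = run_mcts(4, 400, evaluate_learned)
```

With a fixed `rollout_random` (or a fixed hand-engineered heuristic) there is nothing to improve between iterations; replace `evaluate_learned` with it and `value_table` never matters. The self-improvement only appears once the evaluator and the search feed each other.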