> divergence problems where the training explodes and engages in catastrophic forgetting
This is a new problem I haven't heard about. Thanks for adding it to the discussion.
> Er, this is exactly what OP is all about: the Monte Carlo tree search supervision.
And here, I don't know. I think I exclaimed confidently that this is a super old algorithm. What is new about it that you need to mention it again? Really, maybe I'm misinterpreting something, or forgetting something, but I'm quite sure I discussed the algorithm itself with Go AI scientists in 2005, who had it in all their bots, afaik. Please correct me if you believe my memory is cheating me here.
> but the algorithm itself I'm quite sure I discussed with Go AI scientists in 2005 who had them in all their bots, afaik
You did not, because first, MCTS was only published by Coulom in 2006 (so they couldn't 'all' have been using it in 2005), and second, because you are missing the crucial iteration between the heavy heuristic (the NN) used for playouts and the refined estimates those playouts produce: https://news.ycombinator.com/item?id=15627834 Any MCTS for Go in 2006 would have used either random light playouts or simple, unchanging hand-engineered heavy heuristics; hence, no recursive self-improvement in self-play was possible.
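To make the distinction concrete, here's a minimal sketch of that feedback loop. The game (a toy Nim variant standing in for Go) and all names are hypothetical, and a lookup table stands in for the neural network; the point is only the structure: 2006-era MCTS evaluated leaves with fixed random playouts, while the AlphaGo-style loop feeds the search's refined value estimates back into the evaluator it searches with.

```python
import math
import random

# Toy stand-in for Go: Nim where players remove 1 or 2 stones and taking
# the last stone wins. Small enough that the whole loop actually runs.
MOVES = (1, 2)

def legal(pile):
    return [m for m in MOVES if m <= pile]

class Node:
    def __init__(self, pile, parent=None):
        self.pile = pile
        self.parent = parent
        self.children = {}  # move -> Node
        self.wins = 0.0     # value for the player who moved INTO this node
        self.visits = 0

def uct_child(node, c=1.4):
    # Standard UCT: exploit mean value, explore rarely-visited children.
    return max(node.children.values(),
               key=lambda ch: ch.wins / ch.visits
                              + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout_random(pile):
    """'Light' playout, as in 2006-era MCTS: random moves to the end.
    Returns +1/-1 from the perspective of the player to move."""
    sign = 1
    while pile > 0:
        pile -= random.choice(legal(pile))
        sign = -sign
    return -sign  # whoever faces an empty pile has lost

def run_mcts(pile, sims, evaluate):
    root = Node(pile)
    for _ in range(sims):
        node = root
        # Selection: descend through fully expanded nodes.
        while node.children and len(node.children) == len(legal(node.pile)):
            node = uct_child(node)
        # Expansion: add one untried child, if any.
        untried = [m for m in legal(node.pile) if m not in node.children]
        if untried:
            m = random.choice(untried)
            node.children[m] = Node(node.pile - m, parent=node)
            node = node.children[m]
        # Evaluation, then negamax-style backup (flip sign each ply).
        value = -evaluate(node.pile)
        while node is not None:
            node.visits += 1
            node.wins += value
            value = -value
            node = node.parent
    best_move = max(root.children, key=lambda m: root.children[m].visits)
    root_value = -root.wins / root.visits  # value for the player to move
    return best_move, root_value

# The AlphaGo-style twist: leaf evaluation comes from a LEARNED estimator
# that is itself retrained on the search's refined output.
value_table = {}

def evaluate_learned(pile):
    # 'Heavy' evaluation: learned estimate if we have one, light playout otherwise.
    return value_table.get(pile, rollout_random(pile))

def self_play_iteration(max_pile=10, sims=400):
    for pile in range(1, max_pile + 1):
        _, v = run_mcts(pile, sims, evaluate_learned)
        value_table[pile] = v  # search output becomes the next training target

random.seed(0)
for _ in range(3):
    self_play_iteration()

# Perfect play: piles divisible by 3 are lost for the player to move,
# so from pile 4 the right move is to take 1, leaving 3.
best, v4 = run_mcts(4, 400, evaluate_learned)
```

With a fixed `rollout_random` (or a fixed hand-engineered heuristic) there is nothing to improve between iterations; replace `evaluate_learned` with it and `value_table` never matters. The self-improvement only appears once the evaluator and the search feed each other.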