
> divergence problems where the training explodes and engages in catastrophic forgetting

This is a new problem I haven't heard about. Thanks for adding it to the discussion.

> Er, this is exactly what OP is all about: the Monte Carlo tree search supervision.

And here, I don't know. I think I stated confidently that this is a very old algorithm. What is new about it that you need to mention it again? Maybe I'm misinterpreting or forgetting something, but I'm quite sure I discussed the algorithm itself with Go AI researchers in 2005, who had it in all their bots, afaik. Please correct me if you believe my memory is deceiving me here.



> but the algorithm itself I'm quite sure I discussed with Go AI scientists in 2005 who had them in all their bots, afaik

You did not, because first, MCTS was only published by Coulom in 2006 (so they couldn't 'all' have been using it in 2005), and second, because you are missing the crucial iteration between the heavy heuristic (NN) guiding the playouts and the refined estimates those playouts feed back: https://news.ycombinator.com/item?id=15627834 Any MCTS for Go in 2006 would have used either random light playouts or simple, unchanging hand-engineered heavy heuristics; hence, no possibility of recursive self-improvement in self-play.
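To make the distinction concrete, here is a minimal sketch of classic UCT-style MCTS on a toy game (a tiny Nim variant; all names and the game itself are my own illustration, not from the thread). The "light playout" is a uniform-random rollout, which is the 2006-era setup being described; the `policy` hook marks the single place where a learned (NN) heuristic would plug in to close the self-improvement loop.

```python
import math, random

# Toy game (illustrative only): a pile of stones; each move removes 1 or 2;
# the player who takes the last stone wins. State = (stones_left, player_to_move).

def legal_moves(state):
    stones, _ = state
    return [m for m in (1, 2) if m <= stones]

def apply_move(state, move):
    stones, player = state
    return (stones - move, 1 - player)

def winner(state):
    stones, player = state
    # If the pile is empty, the previous player took the last stone and won.
    return (1 - player) if stones == 0 else None

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}            # move -> Node
        self.visits, self.wins = 0, 0.0

def uct_select(node, c=1.4):
    # Pick the child maximizing the UCB1 score (exploitation + exploration).
    return max(node.children.items(),
               key=lambda kv: kv[1].wins / kv[1].visits
               + c * math.sqrt(math.log(node.visits) / kv[1].visits))

def rollout(state, policy=None):
    # "Light" playout: uniform-random moves to the end of the game.
    # A "heavy" playout would replace random.choice with a learned policy --
    # the iterating NN component the parent comment says 2006-era MCTS lacked.
    while winner(state) is None:
        moves = legal_moves(state)
        state = apply_move(state, (policy or random.choice)(moves))
    return winner(state)

def mcts(root_state, iters=3000):
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # 1. Selection: descend fully expanded nodes by UCT.
        while node.children and len(node.children) == len(legal_moves(node.state)):
            _, node = uct_select(node)
        # 2. Expansion: add one untried child, if the node is not terminal.
        untried = [m for m in legal_moves(node.state) if m not in node.children]
        if untried:
            m = random.choice(untried)
            node.children[m] = Node(apply_move(node.state, m), parent=node)
            node = node.children[m]
        # 3. Simulation: light playout from the new node.
        result = rollout(node.state)
        # 4. Backpropagation: credit the win to the player who moved into each node.
        while node is not None:
            node.visits += 1
            if result == 1 - node.state[1]:
                node.wins += 1
            node = node.parent
    # Choose the most-visited move at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

With random playouts the statistics improve only per search; the recursive loop arrives when the search results are used to retrain the policy that `rollout` calls, which then makes the next search stronger.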





