
While Deep Q-learning is powerful, it relies on backprop; your use of CNE-style genetic evolution rather than backprop may provide a more global search than Mnih's Deep Q-network.

The methods seem complementary; I have an inkling that the combination may be greater than the sum of its parts.

Another innovation in the Atari paper is experience replay: Mnih's deep Q-agent stores past transitions in a replay memory and trains on samples from that memory rather than only on its most recent experience.

Of course, they are not training against a human player, but perhaps this memory training could be seeded with human games to exploit expert knowledge.
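A minimal sketch of the idea, assuming a DQN-style replay memory that is pre-seeded with recorded human transitions before the agent's own experience is mixed in (the `human_demos` source and the `ReplayMemory` class are illustrative assumptions, not the paper's code):

```python
import random
from collections import deque

class ReplayMemory:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def seed_with_demos(self, human_demos):
        # Insert recorded expert transitions before any self-play data.
        for transition in human_demos:
            self.buffer.append(transition)

    def add(self, state, action, reward, next_state, done):
        # Transition format follows the Atari paper: (s, a, r, s', done).
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling, as in Mnih et al.; batches now mix
        # human and agent transitions.
        return random.sample(self.buffer, batch_size)

# Seed with toy "expert" transitions, then add agent experience.
memory = ReplayMemory(capacity=100)
memory.seed_with_demos([(0, 1, 1.0, 1, False), (1, 0, 0.0, 2, True)])
memory.add(2, 1, -1.0, 3, False)
batch = memory.sample(2)
```

Because the demonstrations sit in the same buffer as self-play data, the standard DQN training loop would consume them with no other changes.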

The recent neural-net Go players by Edinburgh's Clark & Storkey group and DeepMind's Maddison & Huang both train on corpora of expert human play.

But the Go players are passive backprop learners.

Tesauro's TD-Gammon learned to expert level through self-play.
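The core of that self-play idea can be sketched with a tabular TD(0) update (the real TD-Gammon used TD(lambda) with a neural-network evaluator; the 3-state chain below is a toy assumption):

```python
def td0_update(V, s, s_next, reward, alpha=0.1, gamma=1.0):
    # Move V(s) toward the bootstrapped target: reward + gamma * V(s').
    V[s] = V[s] + alpha * (reward + gamma * V.get(s_next, 0.0) - V[s])
    return V

# Self-play loop over a toy chain: state 0 -> 1 -> 2 (a win, reward 1).
V = {0: 0.0, 1: 0.0, 2: 0.0}
for _ in range(100):
    V = td0_update(V, 1, 2, 1.0)   # transition into the winning state
    V = td0_update(V, 0, 1, 0.0)   # earlier move bootstraps from V(1)
```

Value flows backwards from the terminal reward: V(1) converges toward 1 first, and V(0) follows by bootstrapping, with no human data involved.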

I am speculating that a reinforcement learner could combine the benefits of human play via replay memory, expert corpora, and self-play.

This is a fascinating parallel area: two equal players rather than Mnih's single-player Atari. Perhaps there are good two-player Atari games that could be added to the ALE benchmark. Joust?



References:

Teaching Deep Convolutional Neural Networks to Play Go, by Clark & Storkey: http://arxiv.org/abs/1412.3409

Move Evaluation in Go Using Deep Convolutional Neural Networks, by Maddison, Huang, Silver & Sutskever: http://arxiv.org/abs/1412.6564

Tesauro's TD-Gammon: http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node108.htm...

Playing Atari with Deep Reinforcement Learning, by Mnih et al.: https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

Otoro's CNE bibliography: http://blog.otoro.net/2015/01/27/neuroevolution-algorithms/

Neuroevolution: http://en.wikipedia.org/wiki/Neuroevolution


Thanks for the references.

So you think that basically recording human actions (possibly in some feature-extracted form) as part of the 'replay memory' for DQN would work well?

I have also been reading about DeepMind's recent survey of combining deep learning methods with actor-critic models.

What I also want to explore is using evolution to evolve Q-functions, rather than evolving policies directly (as in this game), which shouldn't be too hard to do.
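A hedged sketch of what that could look like: each genome parameterises Q(s, a) directly (here just a small table), and fitness is the return earned by acting greedily with respect to that Q. The 3-state chain environment and the (mu + lambda)-style loop are toy assumptions for illustration:

```python
import random

N_STATES, N_ACTIONS = 3, 2

def rollout(q_table):
    # Greedy episode in a chain where action 1 moves right;
    # reaching the final state pays reward 1.
    s, total = 0, 0.0
    for _ in range(N_STATES):
        a = max(range(N_ACTIONS), key=lambda act: q_table[(s, act)])
        if a == 1:
            s += 1
        if s >= N_STATES - 1:
            total += 1.0
            break
    return total

def random_genome():
    return {(s, a): random.uniform(-1, 1)
            for s in range(N_STATES) for a in range(N_ACTIONS)}

def mutate(genome, sigma=0.3):
    return {k: v + random.gauss(0, sigma) for k, v in genome.items()}

# Simple truncation-selection evolution of the Q-table itself.
random.seed(0)
population = [random_genome() for _ in range(20)]
for gen in range(30):
    population.sort(key=rollout, reverse=True)
    parents = population[:5]
    population = parents + [mutate(random.choice(parents)) for _ in range(15)]

best = max(population, key=rollout)
```

The selection pressure is on the induced greedy policy, so evolution shapes the Q-values only up to their argmax, which is one reason evolving Q-functions rather than policies changes the search landscape.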

The possibility of evolving self-learning machines excites me, rather than just evolving machines with fixed behaviour.

I can also explore whether Darwinian evolution (weights are randomized at birth) is better or worse than Lamarckian evolution, where learned weights are passed on to offspring.
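The distinction can be sketched as follows, assuming a toy setup where the "genome" is just a learning rate and lifetime learning pulls weights toward a target (all names and the learning rule are illustrative, not the blog's implementation):

```python
import random

TARGET = 1.0

def lifetime_learning(weights, lr, steps=10):
    # Toy lifetime adaptation: pull each weight toward TARGET.
    for _ in range(steps):
        weights = [w + lr * (TARGET - w) for w in weights]
    return weights

def reproduce_darwinian(parent):
    # Offspring inherit the genome (lr) but are born with fresh random
    # weights; only the capacity to learn is passed on.
    return {"lr": parent["lr"], "weights": [random.uniform(-1, 1)]}

def reproduce_lamarckian(parent):
    # Offspring inherit the parent's learned weights directly.
    return {"lr": parent["lr"], "weights": list(parent["weights"])}

random.seed(1)
parent = {"lr": 0.5, "weights": [random.uniform(-1, 1)]}
parent["weights"] = lifetime_learning(parent["weights"], parent["lr"])

child_d = reproduce_darwinian(parent)   # starts from scratch
child_l = reproduce_lamarckian(parent)  # starts near the target already
```

Under Darwinian inheritance, selection can only reward genomes whose offspring learn well, which is exactly the "born with a better capacity to learn" outcome described below.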

I put some further thoughts and references in my post here:

http://blog.otoro.net/2015/04/08/evolutionary-function-appro...

The surviving agents will be born with a better capacity to learn rather than the capacity to do a predefined job. Stay tuned!


http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-i...

This slide from David Silver's ICLR talk hints at Google DeepMind's Gorila parallel, large-scale actor-critic deep-Q architecture.

There is some evidence that expert curricula can make learning much faster, although with game agents I don't know of anyone exploring this since Michie and Chambers's 1968 work on tic-tac-toe and pole-balancing, which compared expert training and self-play on those benchmarks.

http://aitopics.org/sites/default/files/classic/Machine_Inte...

Collobert, Weston and Bengio have explored evolving efficient curricula:

http://ronan.collobert.com/pub/matos/2009_curriculum_icml.pd...



