
While Deep Q-learning is powerful, it relies on backprop; your use of CNE-style genetic evolution rather than backprop may provide a more global search than Mnih's Deep Q-network.

The methods seem complementary; I have an inkling that the combination may be greater than the sum of its parts.

Another innovation in the Atari paper is experience replay: Mnih's deep Q-agent stores past transitions in a replay memory and trains on samples from that memory rather than only on its most recent experience.

Of course, they are not training against a human player, but perhaps this memory training could be seeded with human games to exploit expert knowledge.
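A minimal sketch of the idea, assuming a DQN-style replay memory that is pre-seeded with recorded human transitions before the agent's own experience is mixed in (the `human_demos` source and the `ReplayMemory` class are illustrative assumptions, not the paper's code):

```python
import random
from collections import deque

class ReplayMemory:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def seed_with_demos(self, human_demos):
        # Insert recorded expert transitions before any self-play data.
        for transition in human_demos:
            self.buffer.append(transition)

    def add(self, state, action, reward, next_state, done):
        # Transition format follows the Atari paper: (s, a, r, s', done).
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling, as in Mnih et al.; batches now mix
        # human and agent transitions.
        return random.sample(self.buffer, batch_size)

# Seed with toy "expert" transitions, then add agent experience.
memory = ReplayMemory(capacity=100)
memory.seed_with_demos([(0, 1, 1.0, 1, False), (1, 0, 0.0, 2, True)])
memory.add(2, 1, -1.0, 3, False)
batch = memory.sample(2)
```

Because the demonstrations sit in the same buffer as self-play data, the standard DQN training loop would consume them with no other changes.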

The recent neural-net Go players by Edinburgh's Clark & Storkey group and DeepMind's Maddison & Huang both train on corpora of expert human play.

But the Go players are passive backprop learners.

Tesauro's TD-Gammon learned to expert level through self-play.
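The core of that self-play idea can be sketched with a tabular TD(0) update (the real TD-Gammon used TD(lambda) with a neural-network evaluator; the 3-state chain below is a toy assumption):

```python
def td0_update(V, s, s_next, reward, alpha=0.1, gamma=1.0):
    # Move V(s) toward the bootstrapped target: reward + gamma * V(s').
    V[s] = V[s] + alpha * (reward + gamma * V.get(s_next, 0.0) - V[s])
    return V

# Self-play loop over a toy chain: state 0 -> 1 -> 2 (a win, reward 1).
V = {0: 0.0, 1: 0.0, 2: 0.0}
for _ in range(100):
    V = td0_update(V, 1, 2, 1.0)   # transition into the winning state
    V = td0_update(V, 0, 1, 0.0)   # earlier move bootstraps from V(1)
```

Value flows backwards from the terminal reward: V(1) converges toward 1 first, and V(0) follows by bootstrapping, with no human data involved.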

I am speculating that a reinforcement learner could combine the benefits of human play via replay memory, expert corpora, and self-play.

This is a fascinating parallel area: two equal players rather than Mnih's single-player Atari. Perhaps there are good two-player Atari games that could be added to the ALE benchmark. Joust?



References:

Teaching Deep Convolutional Neural Networks to Play Go, by Clark & Storkey: http://arxiv.org/abs/1412.3409

Move Evaluation in Go Using Deep Convolutional Neural Networks, by Maddison, Huang, Silver & Sutskever: http://arxiv.org/abs/1412.6564

Tesauro's TD-Gammon: http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node108.htm...

Playing Atari with Deep Reinforcement Learning, by Mnih et al.: https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

Otoro's CNE bibliography: http://blog.otoro.net/2015/01/27/neuroevolution-algorithms/

Neuroevolution: http://en.wikipedia.org/wiki/Neuroevolution


Thanks for the references.

So you think that basically recording human actions (possibly in some feature-extracted form) as part of the 'replay memory' for DQN would work well?

I have also been reading about DeepMind's recent survey of combining deep learning methods with actor-critic models.

What I also want to explore is using evolution to evolve Q-functions, rather than evolving policies directly (as in this game), which shouldn't be too hard to do.
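A hedged sketch of what that could look like: each genome parameterises Q(s, a) directly (here just a small table), and fitness is the return earned by acting greedily with respect to that Q. The 3-state chain environment and the (mu + lambda)-style loop are toy assumptions for illustration:

```python
import random

N_STATES, N_ACTIONS = 3, 2

def rollout(q_table):
    # Greedy episode in a chain where action 1 moves right;
    # reaching the final state pays reward 1.
    s, total = 0, 0.0
    for _ in range(N_STATES):
        a = max(range(N_ACTIONS), key=lambda act: q_table[(s, act)])
        if a == 1:
            s += 1
        if s >= N_STATES - 1:
            total += 1.0
            break
    return total

def random_genome():
    return {(s, a): random.uniform(-1, 1)
            for s in range(N_STATES) for a in range(N_ACTIONS)}

def mutate(genome, sigma=0.3):
    return {k: v + random.gauss(0, sigma) for k, v in genome.items()}

# Simple truncation-selection evolution of the Q-table itself.
random.seed(0)
population = [random_genome() for _ in range(20)]
for gen in range(30):
    population.sort(key=rollout, reverse=True)
    parents = population[:5]
    population = parents + [mutate(random.choice(parents)) for _ in range(15)]

best = max(population, key=rollout)
```

The selection pressure is on the induced greedy policy, so evolution shapes the Q-values only up to their argmax, which is one reason evolving Q-functions rather than policies changes the search landscape.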

The possibility of evolving self-learning machines excites me, rather than just evolving machines with fixed behaviour.

I can also explore whether Darwinian evolution (weights are randomized at birth) is better or worse than Lamarckian evolution, where learned weights are passed on to offspring.
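The distinction can be sketched as follows, assuming a toy setup where the "genome" is just a learning rate and lifetime learning pulls weights toward a target (all names and the learning rule are illustrative, not the blog's implementation):

```python
import random

TARGET = 1.0

def lifetime_learning(weights, lr, steps=10):
    # Toy lifetime adaptation: pull each weight toward TARGET.
    for _ in range(steps):
        weights = [w + lr * (TARGET - w) for w in weights]
    return weights

def reproduce_darwinian(parent):
    # Offspring inherit the genome (lr) but are born with fresh random
    # weights; only the capacity to learn is passed on.
    return {"lr": parent["lr"], "weights": [random.uniform(-1, 1)]}

def reproduce_lamarckian(parent):
    # Offspring inherit the parent's learned weights directly.
    return {"lr": parent["lr"], "weights": list(parent["weights"])}

random.seed(1)
parent = {"lr": 0.5, "weights": [random.uniform(-1, 1)]}
parent["weights"] = lifetime_learning(parent["weights"], parent["lr"])

child_d = reproduce_darwinian(parent)   # starts from scratch
child_l = reproduce_lamarckian(parent)  # starts near the target already
```

Under Darwinian inheritance, selection can only reward genomes whose offspring learn well, which is exactly the "born with a better capacity to learn" outcome described below.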

I put some further thoughts and references in my post here:

http://blog.otoro.net/2015/04/08/evolutionary-function-appro...

The surviving agents will be born with a better capacity to learn rather than the capacity to do a predefined job. Stay tuned!


http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-i...

This slide from David Silver's ICLR talk hints at Google DeepMind's Gorila parallel, large-scale actor-critic deep-Q architecture.

There is some evidence that expert curricula can make learning much faster, although with game agents I don't know of anyone exploring this since Michie and Chambers's 1968 work on tic-tac-toe and pole-balancing, which compared expert training and self-play on those benchmarks.

http://aitopics.org/sites/default/files/classic/Machine_Inte...

Collobert, Weston and Bengio have explored evolving efficient curricula:

http://ronan.collobert.com/pub/matos/2009_curriculum_icml.pd...



