
A lot of comments in this thread call this paper unsatisfactory because, of course, rewards are enough for _some_ agent, yet the paper doesn't venture to say anything about that agent (e.g., what differentiates humans from squirrels when both are trying to eat and reproduce?).

But although they talk about rewards incessantly, I think the interesting angle is the importance of a complex environment in which the agent learns to maximize a reward:

> we suggest the emergence of intelligence may be quite robust to the nature of the reward signal. This is because environments such as the natural world are so complex that intelligence and its associated abilities may be demanded by even a seemingly innocuous reward signal.

On the one hand, this does echo some old work on situated cognition, which one might actually believe. On the other hand, politically, if the claim behind the claim is that we can only develop powerful AGI that understands how to interact with our world by building agents that learn with unfettered access to it, then this may be the beginning of a strong push to tolerate spastic, ineffective robots in our physical environments, and to let error-prone agents have vast access to our virtual ones. We'll be asked to put up with their mistakes because that's supposedly the cost of progress; limiting their environment would limit their cognitive potential.

> For example, consider a signal that provides +1 reward to the agent each time a round-shaped pebble is collected. In order to maximise this reward signal effectively, an agent may need to classify pebbles, to manipulate pebbles, to navigate to pebble beaches, to store pebbles, to understand waves and tides and their effect on pebble distribution, to persuade people to help collect pebbles, to use tools and vehicles to collect greater quantities, to quarry and shape new pebbles, to discover and build new technologies for collecting pebbles, or to build a corporation that collects pebbles.

Aren't you comforted they chose round pebbles instead of paperclips? And though their example is meant to illustrate that the choice of reward function hardly matters, you'll notice there's no negative reward term for, e.g., smashing a retaining wall to dig for pebbles in the rubble, or dredging a beach where a protected bird species nests. "Allowing the agent to fully explore the complex environment is the only way it will learn complex representations and actions!"
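The missing penalty terms are easy to make concrete. Here's a toy sketch (all event names and penalty values are mine, not from the paper) contrasting the paper's +1-per-pebble signal with one that actually charges the agent for side effects:

```python
# Toy illustration: a reward signal that only counts pebbles is
# indifferent to how the pebbles were obtained. Event names and
# penalty values below are made up for illustration.

def naive_reward(event):
    """The paper's example: +1 each time a round pebble is collected."""
    return 1 if event["type"] == "pebble_collected" else 0

def constrained_reward(event):
    """Same +1 signal, plus hypothetical negative terms for the
    side effects the naive signal silently ignores."""
    penalties = {
        "wall_smashed": -100,      # smashing a retaining wall for rubble
        "habitat_dredged": -1000,  # dredging a protected nesting beach
    }
    if event["type"] == "pebble_collected":
        return 1
    return penalties.get(event["type"], 0)

episode = [
    {"type": "pebble_collected"},
    {"type": "wall_smashed"},
    {"type": "pebble_collected"},
    {"type": "pebble_collected"},
]

# The naive agent sees only upside from this episode; the constrained
# agent ends deep in the red for the same behavior.
print(sum(naive_reward(e) for e in episode))        # 3
print(sum(constrained_reward(e) for e in episode))  # -97
```

The point isn't that writing down the penalties is hard in a toy like this; it's that the paper's argument leans on the agent discovering rich behavior precisely because nothing in the signal rules any behavior out.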


