With deep learning, perception is largely solved. However, for an AI to have cognitive capacity, it needs to know how to link together multiple actions.
Most approaches to this require agency: systems that act in the world in pursuit of some goal. A sophisticated agent might be able to plan ahead; a simple agent is more likely to rely on reflexes.
We can set a goal and examine whether the agent is able to achieve it. If the goal is too complicated, we can break it down into subtasks. Crucially, we need to show the user what the agent is attempting to do so that the user can intervene.
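The loop described above can be sketched as follows. This is a minimal illustration, not a real agent framework: the `decompose` heuristic and the `confirm` callback are hypothetical stand-ins for goal decomposition and user intervention.

```python
from typing import Callable, List

def decompose(goal: str) -> List[str]:
    """Toy decomposition: split a compound goal into subtasks.
    A real agent would plan these with a model, not a string split."""
    return [step.strip() for step in goal.split(" and ")]

def run_agent(goal: str, confirm: Callable[[str], bool]) -> List[str]:
    """Attempt each subtask, surfacing it to the user first so the
    user can veto a step before the agent acts."""
    completed = []
    for subtask in decompose(goal):
        if not confirm(subtask):  # user intervenes: skip this step
            continue
        # ...a real agent would act in the world here...
        completed.append(subtask)
    return completed

# Usage: the user approves every step except the irreversible one.
done = run_agent(
    "find flights and compare prices and book the cheapest",
    confirm=lambda step: "book" not in step,
)
# done == ["find flights", "compare prices"]
```

The key design point is that the checkpoint sits before the action, not after: the user sees what the agent intends and can stop it, rather than auditing what it already did.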
Machine vision started to work in 2012. Language is starting to work now. Conversational AIs are about to get much, much better.
Voice as a UI; what does this enable: