Saturday, March 26, 2016

Google's AlphaGo Wins with Value and Policy

AlphaGo is a deep-learning bilateral neural network, which is a computer program that has two main personalities; Value and Policy. AlphaGo Value and Policy in effect talk with each other but have fundamentally different feelings about how to win the game of Go.

theverge AlphaGo

Value loves to win more than he hates to lose, but Policy hates to lose more than she loves to win. In other words, AlphaGo Value and Policy represent the basic nature of feeling and choice that people recognize as consciousness. The basic definition of consciousness is a recursion of action and sensation; a conscious person acts just like they see (or sense) other conscious people act.

AlphaGo Value corresponds to the human emotion of pleasure while Policy corresponds to the human emotion of anxiety. People make choices based on a singular feeling that involves the processing of many pairs of complementary emotions and not just pleasure and anxiety. Compassion and selfishness, for example, is how people bond or conflict with others, joy and misery, anger and serenity, and pride and shame complete a basic set of complementary emotions that approximate human feeling.

AlphaGo Value gets great pleasure in winning as many stones as possible and Value is willing to take risk while AlphaGo Policy is anxious about losing only one stone...the one stone that wins the game, and plays very cautiously and avoids risk. While Value takes risk and goes for as many stones as possible to win, Policy avoids risk and settles for the one stone that wins the game.

We journey in our lives desiring the same bilateral futures as AlphaGo; a part of us wants the pleasure of winning big and taking risks and a complementary part of us is forever anxious about simply getting by and surviving by avoiding risk. People have more complex emotions than just pleasure and anxiety and so we have more complex and cooperative relationships with other people and the environment.

What AlphaGo's two personalities represent is a fundamental part of the recursion of consciousness and therefore what it is that we mean when we say that someone is conscious. Lee Sedol is the Go world champion who played against Value and Policy, who had been playing each other for four months prior and had therefore both won and lost literally millions of games with each other prior to the match with Sedol.

Sedol has played many hundreds of thousands of games during his life but is simply not able to play the many millions of games that taught Value and Policy how to beat him. While Sedol will improve his skill by playing Value and Policy together, he might do much better playing Value and Policy separately.

Also, to be fair, AlphaGo should also have additional human personalities like anger and shame, for example. That way Sedol could gain advantage in ways that better represent the complexity of human consciousness. The question is...how do you make an AlphaGo angry at a opponent's board position or ashamed of a its own board position. And of course, AlphaGo must show some compassion in not crushing its opponent and allowing some victories while still being selfish enough to win the tournament.