Game Playing Opponents and the Minimax Algorithm
In many games, the goal of the game is to use logic to determine the best possible move at any given point in the game. You have probably played Tic-Tac-Toe enough times to know that some moves are better or worse for a player to make. When designing opponents for games like these you can use an algorithm called Minimax (Apple and some others call it MinMax, as in GkMinmaxStrategist).
The basic concept of the algorithm is to provide a way to measure the value of a game state while the game is in play. So for instance in Tic-Tac-Toe if you were playing X, a game state where you have two X’s in a row, and there is a blank square available for you to move in so that you will have three Xs in a row would be a very powerful and high ranking game state. Having a heuristic for the value of a game state is essential for this algorithm to work.
Minimax Algorithm
At any given game state, look at all of the moves that you can make and for each of those moves, assign a value to the resulting game state using the heuristic that you created. Then simply choose the move that has the highest value.
In a normal implementation, you determine the value of each of the possible moves by looking at all of the reactions your opponent can make after you have made the move. The algorithm also needs a way to generate a value for each move that the opponent can make. You should consider that your opponent will make the move that will result in the game state being the best for him or her.
Implementing Minimax
Below is pseudocode for the minimax algorithm:
def minimax(node, depth, maximizingPlayer): if depth == 0 or node is a terminal node: return the heuristic value of this node if maximizingPlayer: bestValue = −∞ for child in node: v = minimax(child, dept - 1, False) bestValue = max(bestValue, v) return bestValue else: # Minimizing player bestValue = +∞ for child in node: v = minimax(child, depth - 1, True) bestValue = min(bestValue, v) return bestValue
The algorithm above is a recursive algorithm. As such it has some drawbacks. Also, each call to the function will create a set of game states. The more complicated the game state and the more options that the player has at any particular game state the more memory the algorithm will require to function.
Visualizing Minimax
Minimax is often viewed as a tree where the top node is the current position, and it will be the best move available to the player. The immediate children of that node represent the second player’s options based on the first move. The tree then continues with alternating players options. In Tic-Tac-Toe, you usually have more than one possible move, the first player to move has 9 choices, and then each round, each player has one less choice.
The tree is created with a single player in mind, and triangles that face up are Maximizing (see algorithm above) and triangles that face down are minimizing. There will be one node for each possible move that can be made by a player. For maximizing nodes, it is the number of moves for the current player, for minimizing nodes, it is the number of moves for the opposing player.
Each triangle has a score, the score is written next to the triangle and represents the value of that child. Only terminal nodes have actual scores, each node above the terminal node will choose a node below it. Minimizing nodes will choose the lowest score and maximizing will choose the highest score.
In the graph above, the agent will choose A1 as it’s move. Because that is the best possible move to make. However, the opponent may choose any move at that point. It could pick A12 for instance, and that would lead to a better outcome for the player. But the worst outcome for the player (opponent chooses A11) will still lead to the best possible outcome for the agent at the time that the move was made.
Of course, this ‘best’ move is only the best move based on the quality of the score, and that is the quality of the heuristic.
[1] Image from http://cs-alb-pc3.massey.ac.nz/notes/59302/l05.html