Tic-Tac-Toe, Deep Learning and Problem Representations

Data Driven

6 min readFeb 18, 2021

The source of knowledge is either obtained through theory (rules) or from experimental observations (data) (Dubois, Hájek and Prade, 2000). This post explains how one can train a neural network to play tic-tac-toe by utilising experimental observations. The notebook used to create the AI Agent can be found here.

Approach

The approach followed was inspired by image classification. On a conceptual level, we would like an AI agent to look at a tic-tac-toe board, then observe what moves are being played and follow if those moves lead to a win. Much the same as a human watching someone else play. However, the AI agent does not know the rules of tic-tac-toe, nor will anyone explain it to the AI agent. The AI agent will have to learn by experimenting and observing.

The two main processes the notebook followed were: First to create training data through self-play and second to take the game-play data generated by the first process and train a neural network.

Board Representation

For the AI agent (the neural network) to see the board and understand the game state, the board and the moves data were reformatted using one-hot encoding. The ‘X’ columns represent the board state, and the ‘y’ columns represent the move played.

The eighteen fields in the ‘X’ columns represent the board-state. The first 9 represent the ‘X’ positions and the second 9 represent the ‘O’ positions. The moves in the ‘y’ columns are represented by nine fields, indicating which move was played. The board-state was the input into the neural network and the moves played were the labels that the neural network was trained on.

Generate training data by game-play

By playing the game of tic-tac-toe, game-play data was generated. The game-play data contained the board-state, the moves that were made and who won the game. In order to generate gameplay data, an untrained agent (Agent U) played against another untrained agent (Agent S). Moves were made by random:

move = random.randint(1, 9)move_array = get_y(move)

In terms of the number of wins, the games played were equal in the outcome, this was to be expected as they were both playing random moves.

The moves that lead to a win were used to train the neural network. The neural network was created as:

The trained agent used the following logic to generate a move while still keeping randomness:

gamma = 3ynew = loaded_modelS.predict_proba(iX)ynew[0] = ynew[0] ** gamma / np.sum(ynew[0] ** gamma)ynew[0] = np.around(ynew[0].astype(float), decimals=3)ynew[0] /= ynew[0].sum()ynew[0] = ynew[0] * irandomness     # allow some randomnessout_ynew = np.random.multinomial(1, ynew[0])

The trained agent (Agent S) and the untrained agent (Agent U) then played against each other, where the trained Agent S won more games than the untrained agent.

A final round of training included the trained agent (Agent S) to play against itself. This generated more training data and was used to train Agent L.

The final result:

The agents’ ability to play tic-tac-toe improved over the cycles:

You can play Tic-Tac-Toe against the AI Agent here. Question: Can the same AI agent play other games?

An isomorph of Tic-Tac-Toe

In the book “The science of the artificial”, Herbert Simon explained a card game called scrabble as an isomorph of tic-tac-toe (2019).

The rules of Scrabble

“The game is played by two people with nine cards — let us say the ace through to the nine of hearts. The cards are placed in a row, face-up, between the two players. The players draw alternately, one at a time, selecting any one of the cards remaining in the centre. The aim of the game is for a player to make up a ‘book’, that is, a set of exactly three cards whose spots add to 15 before the opponent can do so. The first player who makes a book wins; if all nine cards have been drawn without either player making a book, the game is drawn” (Simon, 2019).

Representation of the Design

“The magic here is made up of numerals from 1 through 9. Each row, column, or diagonal add to 15, and every triple of these numerals that add to 15 is a row, column, or diagonal or the magic square. From this, it is obvious that making a book in number scrabble is equivalent to getting three in a row in the game of tic-tac-toe” (Simon, 2019).

4|9|2— — —3|5|7— — —8|1|6

When trying to play the scrabble card game in comparison to tic-tac-toe, it is clear that the card game is more difficult for a human to play as the representation of the problem is much more complex, even though it is in essence the same game. The AI agent created by training on tic-tac-toe data, had no problem to play the game of scrabble. A small web app was built to allow you to play against the AI agent.

You can play the game of scrabble against the AI Agent here.

Summary

An AI agent was able to learn how to play tic-tac-toe by only self-play and this ability improved over time. When playing a more complex representation of the game, as with the isomorph example explained by Herbert Simon, it is understandable that by playing the game of scrabble compared to playing tic-tac-toe requires more cognitive effort. Because the AI agent was trained on data and not programmed according to the rules of tic-tac-toe, the AI agent had no issue being able to play the game of scrabble. This shows the ability of AI to pick up complex relationships in data, which is much more difficult for us to do.

References

Dubois, D., Hájek, P. and Prade, H. 2000. Knowledge-Driven versus Data-Driven Logics. Journal of Logic, Language, and Information, 9:65–89.

Simon, H. A. 2019. The science of the artificial. 3rd edn. Cambridge, MA: The MIT Press.