Sunday, December 24, 2017

Monte Carlo Tree Search / Upper Confidence Bounds Introduction

Monte Carlo Tree Search is the search tree representation I decided to go for in this Puyo AI project.

First: you must have an understanding of game trees. A game AI "thinks" by exploring the Game Tree. Each node represents a board state, with the root-node being a representation of the current board. The children of this node are all of the immediate actions the AI could take. For example, in Chess, a child node could be "Pawn to E5", or in Puyo it could be "Rotate Twice and Drop in Column 3". The specifics of how this game tree is built up depends on the algorithm.

The "value" of a node is a heuristic that the programmer makes up. In chess, this could be "Queens are 9 points and Rooks are 5 points". In Go, most AIs try to evaluate a probability of which player will win from a particular board position. (This move has a 40% chance of winning, or that move has a 60% chance of winning).

In the case of Monte Carlo Tree Search, there are four steps that are tightly looped.

  1. Selection -- A process where the most "interesting" node is chosen. In the case of UCT (Upper Confidence Bounds applied to Trees), selection searches the nodes with the highest value and the fewest visits.
  2. Expansion -- Once a node is selected, a new child-node is added to the tree at that location.
  3. Simulation -- This calculates the "value" of a particular node. In the case of UCT, you play a random game all the way to the "end". For my Puyo AI, I usually cap the simulation to 20-moves (subject to future research). 
  4. Backpropagation -- Once statistics are calculated from the simulation, you change the value of the tree to match the simulation results. From there, the process begins at #1: Selection and the loop continues.

In the animation above, I assume that backpropagation is a simple "maximum" function, but every step of this process has been customized by various researchers. In any case, the growth of this tree is the AI pondering about future steps. In the case of AlphaGo, a self-trained neural network was used as both the "Selection" and "Simulation" algorithms. In the case of UCT, simple-random searches serve as simulation, while selection is a simple function. So there are lots of different ways to implement this methodology.

The overall concept of MCTS is just the above four steps, and many Google Searches will get you this far. But in my opinion, the devil is in the details... details that don't seem to be talked about too much yet. I'll elaborate on these details in a later post.

Thursday, December 21, 2017

Inaugural run

With only 14kb of RAM... to fit inside of L1 cache... I have relatively low expectations of this initial version of the PuyoAI. With that said, this is the first time I got to see my AI in action. And these results please me.

You can see the first 200-moves of my AI on PasteBin: https://pastebin.com/tBycFJyH

My "Legal Moves" check is a bit simplified: Puyos drop from the top, I don't worry about ghosted columns, and death only happens if ALL columns fill up. With that said, the AI does seem to have a reasonable amount of intelligence. It was able to play all 200 moves without killing itself, and it also managed to execute a fancy 7-chain on move 62.

See the Chainsim here: https://puyonexus.com/chainsim/chain/XCJdA



Even for a weak AI that barely even sees 2 move ahead... I think the formations demonstrated in this "first chain7" show some promise. In particular, the last "clear with garbage" was on iteration 44. So the AI spends ~18 turns building up this formation. If we look at the "before and after" shots, from iteration 44 through 62, we'll see the following:




The reds are popped in column 3/4, which leads to an interesting Yellow and Red L-shape in column 2/3.

There are "bad shapes" everywhere still: there's a "Very Bad GTR" on the right-hand side, where the Red Puyo is blocked in by the Yellow Puyos. But the GTR formation looks like it has some promise: the Yellows on the 2nd row from the bottom could turn into a real chain, and the left side of the formation looks like it can be turned into a serious tail.

Maybe I'm just being too optimistic here with the abilities of a grossly memory-constrained Puyo AI. But again, considering that this AI only had 14kB to work with, I'm very impressed that it managed to make anything that even looks remotely decent.

My next goal is to extend this AI to use 8-thread / 4-cores of my processor, and simultaneously allow this AI to grow to MB or even GB of thinking-space. At that point, I expect it will be a much better player!

Wednesday, December 20, 2017

Simple Stats

My current benchmark: 1512519 simulations in 36.074 seconds.


This called the "Little_MCTS" function 1000 times. There was a printout that indicated the total of 1512519 "visits".

Little_MCTS is my "cache-friendly" implementation of MCTS, with 16-bit "pointers" on an address space of 14kb. My plan is to run Little_MCTS on 8-threads (across 4 cores with hyperthreading), on my (kind of old) Sandy Bridge i7-2600k processor.

Sandy Bridge has 32kb of L1 data-cache (with 32kb on instructions cache). With two threads on one core running, that would use 28kb of the L1 cache, 4kb left for stack-data and other calculations. Hopefully that's enough? There's a lot of pointer-chasing in the "select" method in MCTS

The #1 "inner" function is "pop", at 56% of the time. "Pop" is my implementation of the flood-fill problem, that looks for Puyo-sets of 4. It basically implements the Puyo scoring algorithm. I've racked my brain at how to improve the speed of the "Pop" algorithm, but I haven't come up with anything faster yet.

Assuming ~100 bytes per simulation in the "full" implementation (not yet done), it would require 8GB / 100 bytes == 80-million simulations to fill it up. Assuming a 4x multiplier from 4-cores / 8-threads hyperthreaded, that's 8-minutes or so per analysis session.

Which is... way slower than most humans play. But for a deep-analysis of Puyo positions, that actually seems reasonable. I think this project, at this speed tier, can actually be useful at that speed!

Puyo AI Project

With freely available programs like Stockfish and SCID vs PC, it is possible to constantly have a super-human level analysis applied to every Chess game you play. You can find missed opportunities on both sides, and hone your chess skills to the max.

For a game like PuyoPuyo however, no such super-human AI exists. And my "Open Source Itch" is to develop some self-learning tool for myself, and the greater PuyoPuyo community. True, there are lots of training tools and tutorials out there, such as the excellent PuyoNexus chain simulator, and the community is quite helpful when discussing strategy. But there's nothing quite like a super-human AI that automatically tells you mistakes as you're playing the game.

The first step to solving this problem is to somehow create a super-human AI. But how to do it?

With Google's success with "AlphaGo" in recent years, I've grown to trust the Monte-Carlo Tree Search algorithm, although I'm still suspicious of "Deep Learning" or other neural network methodologies, at least on consumer hardware. Clearly a legion of 100TFLOP or 200TFLOP "TPUs" in Google supercomputers change the game (as AlphaGo has demonstrated), but standard hardware is limited to 1000x less performance with ~100GFLOP AVX instructions. Even if I used a more difficult GPGPU methodology, the best $800 GPUs only get 10TFLOPs, and I'd expect training and inference to take too long under those conditions.

So my initial plans are to create this PuyoPuyo AI that plays under the classical MCTS with "light random rollouts" akin to the original MCTS "MoGo" AI.

In short, expect blogposts on the following:

  • Puyo Puyo, a great little niche puzzle game
  • Brief OpenCL / GPGPU discussion, and why I'm holding off on this tech for this project
  • Monte-Carlo Tree Search, and how it relates to game trees.
  • Optimizations I'm employing to shrink down the memory-space to better fit into L1 Cache. Faster speed means more nodes explored, which will always improve the AI's strategy
  • SIMD opportunities in the MCTS algorithm.

Of course, as I learn about new opportunities, I'll write down my thoughts.