My current benchmark: 1512519 simulations in 36.074 seconds.
This called the "Little_MCTS" function 1000 times. There was a printout that indicated the total of 1512519 "visits".
Little_MCTS is my "cache-friendly" implementation of MCTS, with 16-bit "pointers" on an address space of 14kb. My plan is to run Little_MCTS on 8-threads (across 4 cores with hyperthreading), on my (kind of old) Sandy Bridge i7-2600k processor.
Sandy Bridge has 32kb of L1 data-cache (with 32kb on instructions cache). With two threads on one core running, that would use 28kb of the L1 cache, 4kb left for stack-data and other calculations. Hopefully that's enough? There's a lot of pointer-chasing in the "select" method in MCTS
The #1 "inner" function is "pop", at 56% of the time. "Pop" is my implementation of the flood-fill problem, that looks for Puyo-sets of 4. It basically implements the Puyo scoring algorithm. I've racked my brain at how to improve the speed of the "Pop" algorithm, but I haven't come up with anything faster yet.
Assuming ~100 bytes per simulation in the "full" implementation (not yet done), it would require 8GB / 100 bytes == 80-million simulations to fill it up. Assuming a 4x multiplier from 4-cores / 8-threads hyperthreaded, that's 8-minutes or so per analysis session.
Which is... way slower than most humans play. But for a deep-analysis of Puyo positions, that actually seems reasonable. I think this project, at this speed tier, can actually be useful at that speed!

No comments:
Post a Comment