I worked on UCT-based Go players (UCT applies a multi-armed-bandit algorithm to tree search) for three semesters. It was truly a great, if often frustrating, experience.
It's a tough problem to get your head around. Sure, the premise makes sense: the Monte Carlo heuristic plus the UCT tree-exploration algorithm balances promising nodes against uncertain ones to expand the tree intelligently. But tweaking that algorithm usually hurt performance more than it helped, and the tweaks that did improve gameplay were non-intuitive.
One semester our goal was to improve the random-playout generator so that it played slightly more human-like games. It turns out that almost any play enhancement made in the Monte Carlo simulator crippled performance (for instance: instead of playing randomly, searching the whole board for an opportunity to capture the opponent). Our best result came from applying the capture-if-possible heuristic to only two positions per random move (e.g., at the beginning of the MC simulator's move, pick two positions at random; if moving in either one captures the opponent, do it). Applying that heuristic to any more than two positions per move made the random games play out more slowly than it was worth. Needless to say, we spent a lot of time testing our tweaks, and the testing itself was quite time-consuming.
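To make that concrete, here's roughly what the two-candidate version looked like in spirit. This is a minimal sketch, not Orego's or our actual code, and the Board interface here is made up purely for illustration:

    import java.util.Random;

    // Sketch of the "check only two random candidates for a capture" playout
    // heuristic. The Board interface is hypothetical, just enough to illustrate.
    interface Board {
        boolean isLegal(int point);           // is this vacant point a legal move?
        boolean capturesOpponent(int point);  // would playing here capture stones?
    }

    class CaptureBiasedPlayout {
        private final Random random = new Random(); // we actually used a Mersenne Twister

        int chooseMove(Board board, int[] vacantPoints) {
            // Examine just two randomly chosen vacant points; if playing on either
            // would capture the opponent, do it.
            for (int i = 0; i < 2; i++) {
                int candidate = vacantPoints[random.nextInt(vacantPoints.length)];
                if (board.isLegal(candidate) && board.capturesOpponent(candidate)) {
                    return candidate;
                }
            }
            // Otherwise fall back to a plain uniformly random legal move.
            int move;
            do {
                move = vacantPoints[random.nextInt(vacantPoints.length)];
            } while (!board.isLegal(move));
            return move;
        }
    }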
It was frustrating to see our initial seemingly sensible tweaks cripple performance, and it definitely took some time to calibrate ourselves to the problem at hand. But I probably learned more about programming while working on Go than in any other 3-semester span.
Peter Drake's Orego player is a great starting point for exploring the UCT algorithm (Drake wrote a popular Java textbook you may have used in undergrad). I highly encourage people interested in AI to check it out:
https://sites.google.com/site/drpeterdrake/research/orego
I'm confused as to why you would want to capture. My understanding is that pro players try to avoid actually capturing unless necessary. (I.e., if they are winning a capturing race, the right move is always to tenuki unless the other player does something that demands a response.)
Real-life tactics unfortunately don't make for good heuristics in the Monte Carlo simulator.
Our goal was to improve overall gameplay by making the Monte Carlo simulator play slightly more realistically than random. Keep in mind that even when the Monte Carlo simulator plays completely random games, it still does pretty well. I believe it's on par with GnuGo, a good pre-UCT Go player, on a 9x9 board on a modern quad-core.
Whether a heuristic applied within the MC-simulator generates sample games that better assess the strength of a move is a tricky question to ponder. But it seems reasonable that most non-random tactics, however simple, would accomplish that (such as proximity- or capture-based tactics).
The reason good real-life tactics don't correlate with improved gameplay when implemented in MC simulators is that the logic required for even simple tactics can reduce the number of semi-random games the computer can play by several orders of magnitude. Picking a random move is extremely quick (choosing a number with a Mersenne Twister is faster than adding two numbers; compare that to the operations required to determine how many pieces surround a given position). That's partly why very simple heuristics like capture-if-possible are only effective if they're applied to a few (two, in our case) open board positions per move in an MC-simulated game. So it's just slightly non-random.
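For a sense of scale (a hedged illustration, not our actual code): the fast path of a playout move is basically one PRNG call and one array read, and anything you add on top of that runs once per move of every simulated game:

    import java.util.Random;

    // Hypothetical illustration of the playout "fast path": a uniformly random
    // vacant point costs one PRNG call plus one array index. Any heuristic check
    // layered on top runs for every move of every simulated game, which is how
    // even cheap tactics can cut the playout count by orders of magnitude.
    class FastRandomPick {
        static int pick(int[] vacantPoints, Random rng) {
            return vacantPoints[rng.nextInt(vacantPoints.length)];
        }
    }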
Anyway I hope that made sense. It's a weird problem to think about at first, but it's actually really fun to play around with. I recommend checking out Drake's implementation.
Yeah that might have been an exaggeration. But as I recall, once you have the MT seeded and fired up, getting a new number requires little more than a bit shift. Anyway it was substantially faster than simple alternatives I tried (including other PRNGs). One early theory I had was that using a faster PRNG than MT could lead to a better result even if it wasn't as evenly distributed. I think MT proved to be the best solution though.
They were probably generating tons of them at once so the only operation was grabbing a number out of memory. But you're right that it takes a whole lot of operations to generate the numbers in the first place.
To further illustrate the point, I think I may have actually tried a no-capture heuristic at one point. I wouldn't be too surprised if it also worked better than random.
Correction: "most non-random tactics" wouldn't make it better, as someone else pointed out. Finding tactics that do is a tough question, and not just because of runtime; keep in mind that the MC player uses that strategy against itself over entire simulated games, which can lead to really un-humanlike games. Completely random games at least ensure that you're hitting all possibilities (and when a good move is found, that path gets explored more).
Can I ask you a question? Is the random-move approach simply brute force, or does it turn out that, weirdly, moves which have good outcomes under this approach also turn out to have good outcomes in general, even though the actual sequences of moves weren't really considered?
i.e. I guess I'm just asking if it really is a heuristic.
The Monte Carlo simulator is a very simple and weak heuristic by itself. In UCT Go players it's inseparable from the tree-traversal/expansion algorithm, which uses data propagated up the tree from the MC-tested leaf nodes to decide which move sequence to explore next.
When it's expanding the search tree, every node expansion is arrived at "intelligently". It starts at the root node and, at each branching, asks itself: which following move is the best mixture of good (based on the MC games I've played at the leaves below it) and unknown (i.e., I haven't explored that branch very much yet)? It does that at every node down to a leaf, where it expands a node with the MC simulator (and then propagates the result back up the tree). So unlike some other tree-search algorithms, it isn't hobbled by having to expand all paths to a certain depth before exploring a particular sequence deeper. The result literally looks like a tree: some branches are really tall, maybe 18 moves deep, while others are really short. Even though the "heuristic" is MC, I don't like calling it brute force, because it uses the results of those MC games so well.
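That "best mixture of good and unknown" is typically the standard UCB1 score. Here's a minimal sketch of the selection step using the textbook formulation rather than Orego's exact code (the Node fields and the exploration constant are assumptions):

    // Minimal sketch of UCT child selection using the standard UCB1 formula.
    // Not Orego's exact implementation; Node fields here are hypothetical.
    class Node {
        double wins;     // playout wins observed through this node
        int visits;      // playouts run through this node
        Node[] children;
    }

    class UctSelection {
        static final double C = 1.4; // exploration constant, usually tuned

        static Node selectChild(Node parent) {
            Node best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (Node child : parent.children) {
                double score = (child.visits == 0)
                    ? Double.POSITIVE_INFINITY            // unexplored: try it first
                    : child.wins / child.visits           // "good": average playout result
                      + C * Math.sqrt(Math.log(parent.visits) / child.visits); // "unknown"
                if (score > bestScore) {
                    bestScore = score;
                    best = child;
                }
            }
            return best;
        }
    }

The playout result from the leaf then gets folded into the wins/visits counts of every node back up to the root, which is the "propagates the result up the tree" part.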
Thanks, I think I see the answer now. Let me summarize to check:
There's a conceptual tree of all possible games, with a move at each node. You run some random games to completion, giving sequences of moves from the root (starting position) to a leaf (game over). Then, from the current move, you explore the best of the "next" moves more thoroughly - by doing the same thing again, starting from each of those "next" moves. The benefit of running random games to completion is it gives (some) long-range vision.
I wondered whether those (relatively) sparse "tracer bullets" actually find whole "good" sections of the tree - or whether it's just the specific leaf nodes it found that were good.
The answer is that, since it's being used to guide exploration, it must be good - there'd be no point using it as a heuristic otherwise. So, the existence of a win in a region of the tree is a partial predictor of other wins in that region.
I read up on the rules of Go after asking, and it makes sense: once you own some territory, you'll tend to keep it - so that if a sequence of moves leads to a win, all the prefixes of those moves will also be strong positions along the way.
So it's not brute force, but a predictive heuristic. Still kinda surprising that it works well. :-)
Completely agree about the frustration you can have working on this algorithm. I can't count the number of seemingly very good ideas we had that just hurt its performance.
The worst were ideas that made it better against its former self but worse against human players (it happened!).
That's funny, I had that experience too. At first I tested my implementations against the plain UCT-player, and was puzzled why my "improvements" didn't do as well against GnuGo.
You remind me of one of the early Google engineers who complained that they'd put together some really clever and insightful algorithms, but that brute force with masses of data soundly beat them. And that being surprised can be the beginning of discovery.
The same thing applies in fuzzing (throwing bad data at applications in search of vulnerabilities). The dumbest fuzzers -- consisting of random bit flips, duplicating blocks of data, etc -- will often find several orders of magnitude more bugs than complicated, purpose-built fuzzers. Embracing stupidity is one of the hardest things about becoming a good fuzzer developer, oddly enough.
GP here. Whoa, just had this insight: maybe part of the maturation of a profession is developing practices that work well despite these counterintuitive effects (even if they aren't recognized or formalized as such; see bevan's follow-up too: http://news.ycombinator.com/item?id=3860950). These are really hard to derive or infer because they depend on unknown probabilities (you have no analytical solution to the "game"), so you just do a lot of experiments and develop a rule of thumb. They are informal heuristics. I saw similar approaches in machine learning in grad school.
Change makes this harder in programming: not only do you have to keep relearning, but new technologies also shift the tradeoffs (e.g. plan vs. hack, performance vs. dev time). The rule of thumb drifts over time. Perhaps we'll develop a meta-rule of thumb...
Go is a vivid metaphor. Other disciplines have traditions/conventions/practices that are taken on faith, and aren't even noticed, let alone questioned or "proven", because they can't be without a sufficiently complete formal model of their world. Personally, I've found agile-like pragmatic conventions work well: focus on specific concrete things that are clearly needed (YAGNI); start before you are ready, and it will become clear as you go along.
Just out of curiosity, which school did you go to that provides a 3-semester long AI course? Seems like you'd be able to go into quite a bit of depth there.
I went to Middlebury. There was an AI class, but the 3 semesters were research with my late professor Tim Huang.
The department at Middlebury was small, but I doubt that I could have received a better CS education elsewhere. The opportunity to do one-on-one interesting research so early in my college career was invaluable. All three semesters were Go-related, but I didn't focus on the same thing during each.
The first summer was spent getting the lay of the land: reading the papers, implementing naive "improvements", and doing a lot of testing. Our best result came from implementing someone else's idea that we'd read in a paper. It was a humbling experience.
The second semester started with finding ways to visualize the algorithm in action. Since the improvements were so nonintuitive I felt I had to find a better way to understand how it worked and what my changes were doing. Easier said than done.
The third semester was more varied. I did more testing and playing with parameters. I also ported a version to the iPhone (we wanted to be the first and best on there; this was before iOS 2 was released). Then my professor and I started a company that was initially related to Go and iPhones but later pivoted substantially several times.
I'm very thankful for the opportunities I had at Middlebury and the chance to work with Tim Huang.
Sounds like a great education. I've been meaning to explore the research opportunities at my school; hopefully they will work out as well as yours did. Thanks for sharing, and sorry to hear about your loss.