The MCTSNodeState class
Its legal_actions() method currently returns the full set of player moves (CALL, RAISE, FOLD) unconditionally; we must refine it to alternate between “nature” actions (posting blinds, dealing the flop/turn/river) and the player options that are actually valid, respecting bet sizes and turn order.
It should also implement at least these three methods:
next_state(action), which returns the successor state resulting from a chosen action
is_terminal(), which recognizes when play has ended, either by an all-in showdown or by a fold
rollout(), which simulates a random continuation of the hand to produce a reward for backpropagation
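A minimal sketch of that interface might look like the following. The stage transitions, the placeholder "deal" nature action, and the random showdown reward are all simplifications for illustration, not the real betting or hand-evaluation logic:

```python
import random
from enum import Enum, auto

class Action(Enum):
    FOLD = auto()
    CALL = auto()
    RAISE = auto()

class MCTSNodeState:
    """Sketch of the state interface; fields and transitions are placeholders."""

    STAGES = {"preflop": "flop", "flop": "turn", "turn": "river",
              "river": "showdown"}

    def __init__(self, stage="preflop", is_nature_turn=False, folded=False):
        self.stage = stage                    # current betting round
        self.is_nature_turn = is_nature_turn  # True when cards/blinds come next
        self.folded = folded                  # True once a player has folded

    def legal_actions(self):
        # Nature nodes deal cards; player nodes choose among betting moves.
        if self.is_nature_turn:
            return ["deal"]                   # placeholder nature action
        return [Action.FOLD, Action.CALL, Action.RAISE]

    def next_state(self, action):
        if action == "deal":                  # nature reveals cards
            return MCTSNodeState(self.stage)
        if action is Action.FOLD:             # folding ends the hand at once
            return MCTSNodeState(self.stage, folded=True)
        # Simplification: any call/raise closes the current betting round.
        return MCTSNodeState(self.STAGES[self.stage], is_nature_turn=True)

    def is_terminal(self):
        return self.folded or self.stage == "showdown"

    def rollout(self):
        # Random playout to a terminal state; the showdown reward is a
        # placeholder until real hand evaluation exists.
        state = self
        while not state.is_terminal():
            state = state.next_state(random.choice(state.legal_actions()))
        return 0.0 if state.folded else random.uniform(-1.0, 1.0)
```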
Each MCTSNode wraps one of these states and tracks statistics for each action: a Q-value estimate and a visit count. When expanding, a node picks one untried legal action, calls the state’s transition function, and adds the resulting child to its children mapping. The best_child(c_param) method applies the classic UCT formula, balancing exploration and exploitation by combining each action’s Q-value with an exploration bonus proportional to the square root of the log of parent visits over action visits. During backpropagation, we update Q-values using a temporal-difference style rule—incrementing by α times the difference between observed reward plus discounted future value and the old Q-estimate—and propagate that update up to the root, incrementing visit counts along the way.
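As a rough illustration of that selection and update logic, here is a self-contained sketch. The attribute names (`q`, `n`, `visits`), the free-standing `backpropagate` helper, and the default hyperparameter values are assumptions for illustration, not settled API:

```python
import math

class MCTSNode:
    """Node sketch with per-action Q-values and visit counts."""

    def __init__(self, state=None, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action     # action that led here from the parent
        self.children = {}       # action -> child MCTSNode
        self.q = {}              # action -> Q-value estimate
        self.n = {}              # action -> visit count
        self.visits = 0

    def best_child(self, c_param=1.4):
        # UCT score: Q(a) plus an exploration bonus of
        # c * sqrt(ln(parent visits) / visits of a)
        def uct(a):
            return self.q[a] + c_param * math.sqrt(
                math.log(self.visits) / self.n[a])
        return self.children[max(self.children, key=uct)]

def backpropagate(leaf, reward, alpha=0.1, gamma=0.95):
    """TD-style update from leaf to root:
    Q(s, a) += alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))."""
    leaf.visits += 1
    node = leaf
    while node.parent is not None:
        parent, a = node.parent, node.action
        parent.visits += 1
        parent.n[a] = parent.n.get(a, 0) + 1
        future = gamma * max(node.q.values()) if node.q else 0.0
        old_q = parent.q.get(a, 0.0)
        parent.q[a] = old_q + alpha * (reward + future - old_q)
        node = parent
```

Note the exploration bonus: an action visited only once keeps a large bonus even if its Q-estimate is poor, so under-explored moves still get tried.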
At the top level, the MCTSTree class orchestrates repeated search iterations. It begins by seeding the root node with the forced nature actions for the small and big blinds. Each iteration proceeds by selecting the most promising leaf according to UCT, expanding it if it is not already terminal, running a random rollout to a terminal game outcome, and then backpropagating the resulting reward. After a suitable number of iterations, calling best_action() on the root returns the player decision that was explored most often, which in practice corresponds to the statistically strongest move.
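Those iterations can be sketched end to end as below. The `_Node` helper is a minimal stand-in for the MCTSNode described above so the block is self-contained, and the hyperparameter names (`c_param`, `alpha`, `gamma`) are assumptions:

```python
import math
import random

class _Node:
    # Minimal node: state, parent link, incoming action, per-action stats.
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.q, self.n, self.visits = {}, {}, {}, 0

    def best_child(self, c):
        # UCT: Q(a) + c * sqrt(ln(parent visits) / visits of a)
        score = lambda a: self.q[a] + c * math.sqrt(
            math.log(self.visits) / self.n[a])
        return self.children[max(self.children, key=score)]

class MCTSTree:
    def __init__(self, root_state):
        self.root = _Node(root_state)

    def search(self, n_iter=500, c_param=1.4, alpha=0.1, gamma=0.95):
        # Select -> expand -> rollout -> backpropagate, repeated n_iter times.
        for _ in range(n_iter):
            node = self._select(c_param)
            if not node.state.is_terminal():
                node = self._expand(node)
            self._backpropagate(node, node.state.rollout(), alpha, gamma)
        return self.best_action()

    def _select(self, c_param):
        # Descend via UCT until a node with untried actions or a terminal.
        node = self.root
        while not node.state.is_terminal() and \
                len(node.children) == len(node.state.legal_actions()):
            node = node.best_child(c_param)
        return node

    def _expand(self, node):
        untried = [a for a in node.state.legal_actions()
                   if a not in node.children]
        action = random.choice(untried)
        child = _Node(node.state.next_state(action), node, action)
        node.children[action] = child
        return child

    def _backpropagate(self, node, reward, alpha, gamma):
        # TD-style: Q += alpha * (reward + gamma * max child Q - Q).
        node.visits += 1
        while node.parent is not None:
            parent, a = node.parent, node.action
            parent.visits += 1
            parent.n[a] = parent.n.get(a, 0) + 1
            future = gamma * max(node.q.values()) if node.q else 0.0
            old = parent.q.get(a, 0.0)
            parent.q[a] = old + alpha * (reward + future - old)
            node = parent

    def best_action(self):
        # Most-visited root action is the recommended move.
        return max(self.root.n, key=self.root.n.get)
```

The tree would plug the real MCTSNodeState in as `root_state`; any object with the legal_actions / next_state / is_terminal / rollout interface works.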
To complete this agent, our next steps are clear. We must implement the placeholder methods in MCTSNodeState so that the tree can accurately reflect poker dynamics: dealing cards, managing bet rounds, and evaluating hand strength at showdown. We will also expose the exploration constant (c_param), learning rate (α), and discount factor (γ) as configurable hyperparameters to facilitate experimentation. Finally, we need a comprehensive suite of unit and integration tests to verify that the tree grows correctly, that simulated rollouts yield sensible rewards, and that the chosen actions align with expected strategies against baseline opponents. Once these elements are in place, the MCTS agent will provide our poker bot with a powerful, statistically grounded decision-making capability.