Red Mahjong
Usage
or load the environment class directly:
from mahjax.red_mahjong.env import RedMahjong
from mahjax.red_mahjong.state import GameConfig
env = RedMahjong(
    round_mode="half",
    observe_type="dict",
    game_config=GameConfig(),
)
Description
red_mahjong is the full red-five riichi mahjong environment in MahJax.
It is the environment intended to track modern online riichi mahjong more closely.
The implementation follows Tenhou as closely as practical, and we validate it against downloaded Tenhou game logs. The log-test details are out of scope here, but agreement with Tenhou is the main correctness target for the environment.
Rules
The environment implements 4-player riichi mahjong with red fives. By default, the rule configuration enables:
- red fives
- double ron
- special abortive draws
- pao
The detailed external rule reference is:
The rule behavior is controlled through GameConfig, so advanced users can change settings such as red-fives usage, double ron, pao, and special abortive draws.
State Representation
red_mahjong uses the newer nested state layout.
Mahjong-specific fields are grouped into:
- state.players
- state.round_state
This differs from no_red_mahjong, which still uses the older flat underscore-prefixed fields.
The two environments therefore do not currently share the same internal state representation, though we may align them in the future.
Specs
| Name | Value |
|---|---|
| Version | beta |
| Number of players | 4 |
| Number of actions | 87 |
| Observation types | dict, 2D |
| Dict action history shape | (3, 200) |
| Reward shape | (4,) |
| Reward semantics | score deltas in hundreds of points |
Dict Observation
The current training-oriented observation is dict.
It is the format we recommend if you need a structured observation today, though we still reserve the right to revise it in future releases.
The returned dictionary contains:
| Key | Shape | Meaning |
|---|---|---|
| hand | (14,) | Current player's hand as sorted 34-type tiles; unused slots are -1. |
| last_draw | () | Last drawn tile in [0, 36]; -1 means there is no drawn tile to expose. |
| action_history | (3, 200) | Relative-view action history. |
| shanten_count | () | Current player's shanten number. |
| furiten | () | Whether the current player is in furiten. |
| scores | (4,) | Scores ordered from the current player's perspective. |
| round | () | Round index. |
| honba | () | Honba count. |
| kyotaku | () | Riichi stick count. |
| prevalent_wind | () | Prevailing wind. |
| seat_wind | () | Current player's seat wind. |
| dora_indicators | (4,) | Dora indicator tile types in [0, 33]; missing entries are -1. |
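As a quick sanity check, the table above can be written down as a shape specification. The sketch below is plain Python and assumes only that each observation value exposes a `.shape` attribute, as NumPy and JAX arrays do. `EXPECTED_SHAPES` and `check_observation` are hypothetical names for illustration and are not part of MahJax.

```python
# Expected shapes copied from the dict observation table above.
EXPECTED_SHAPES = {
    "hand": (14,),
    "last_draw": (),
    "action_history": (3, 200),
    "shanten_count": (),
    "furiten": (),
    "scores": (4,),
    "round": (),
    "honba": (),
    "kyotaku": (),
    "prevalent_wind": (),
    "seat_wind": (),
    "dora_indicators": (4,),
}


def check_observation(obs):
    """Return a list of problems; an empty list means all shapes match."""
    problems = []
    for key, expected in EXPECTED_SHAPES.items():
        if key not in obs:
            problems.append(f"missing key: {key}")
            continue
        # Assumption: every value has a `.shape` attribute (array-like).
        actual = tuple(obs[key].shape)
        if actual != expected:
            problems.append(f"{key}: expected shape {expected}, got {actual}")
    return problems
```

Because the dict layout may still be revised, a check like this is most useful as a test that fails loudly after an upgrade.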
Action History
action_history is stored as:
- Row 0: acting player index, converted to the current player's relative view
- Row 1: action payload
- Row 2: tsumogiri flag
The semantics are:
- For discards, row 1 stores the actual discarded tile
- For non-discard actions, row 1 stores the raw action id
- Row 2 is 1 for tsumogiri, 0 for a non-tsumogiri discard, and -1 for non-discard actions
For red_mahjong, discard tiles are in [0, 36] because red fives have dedicated tile ids in the action space, while raw action ids are in [0, 86].
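The row semantics above can be sketched as a small decoder. This is an illustrative plain-Python example, not MahJax code: `decode_action_history` is a hypothetical helper, and it takes the number of valid columns as an argument because the padding convention for unused history slots is not described here.

```python
def decode_action_history(history, num_valid):
    """Turn a 3-row action history into a list of readable event dicts.

    history   -- three equal-length sequences (rows 0, 1, 2 of action_history)
    num_valid -- number of leading columns to decode (assumption: supplied by
                 the caller, since the padding convention is not specified)
    """
    players, payloads, flags = history
    events = []
    for i in range(num_valid):
        player, payload, flag = int(players[i]), int(payloads[i]), int(flags[i])
        if flag == -1:
            # Non-discard action: row 1 holds the raw action id in [0, 86].
            events.append({"player": player, "kind": "action", "action_id": payload})
        else:
            # Discard: row 1 holds the discarded tile id in [0, 36];
            # row 2 is 1 for tsumogiri and 0 otherwise.
            events.append(
                {"player": player, "kind": "discard", "tile": payload, "tsumogiri": flag == 1}
            )
    return events


# Hand-written toy history with three entries (not real game data).
history = [
    [0, 1, 2],    # row 0: relative player index
    [5, 75, 33],  # row 1: tile id for discards, raw action id otherwise
    [0, -1, 1],   # row 2: tsumogiri flag
]
events = decode_action_history(history, num_valid=3)
```

Here the second entry is decoded as raw action id 75 (PON in the action table), while the first and third are discards, the third being a tsumogiri.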
2D Observation
observe_type="2D" is available, but we do not consider its design finalized yet.
If you need a representation whose semantics are less likely to move, prefer dict.
Action
The action space is:
| Range | Meaning |
|---|---|
| 0-36 | Discard a tile, including red-five tile ids |
| 37-70 | Closed kan / added kan |
| 71 | TSUMOGIRI |
| 72 | RIICHI |
| 73 | TSUMO |
| 74 | RON |
| 75 | PON |
| 76 | PON_RED |
| 77 | OPEN_KAN |
| 78-83 | Chi variants including red-five-aware actions |
| 84 | PASS |
| 85 | KYUUSHU |
| 86 | DUMMY |
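For logging or debugging, the ranges in the table can be turned into a lookup from raw action id to a category name. The following is a minimal plain-Python sketch; `ACTION_RANGES` and `action_category` are hypothetical names, and the labels "DISCARD", "KAN", and "CHI" are chosen here for illustration rather than taken from MahJax.

```python
# (low, high, label) triples transcribed from the action table above.
ACTION_RANGES = [
    (0, 36, "DISCARD"),   # includes red-five tile ids
    (37, 70, "KAN"),      # closed kan / added kan
    (71, 71, "TSUMOGIRI"),
    (72, 72, "RIICHI"),
    (73, 73, "TSUMO"),
    (74, 74, "RON"),
    (75, 75, "PON"),
    (76, 76, "PON_RED"),
    (77, 77, "OPEN_KAN"),
    (78, 83, "CHI"),      # chi variants, including red-five-aware ones
    (84, 84, "PASS"),
    (85, 85, "KYUUSHU"),
    (86, 86, "DUMMY"),
]


def action_category(action_id):
    """Map a raw action id in [0, 86] to its category label."""
    for low, high, label in ACTION_RANGES:
        if low <= action_id <= high:
            return label
    raise ValueError(f"action id out of range: {action_id}")
```

The ranges cover 87 ids in total, which matches the number of actions listed in the specs.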
Rewards
Rewards are 4-player score deltas, represented in hundreds of points.
This includes:
- ron and tsumo score transfers
- honba and kyotaku handling
- exhaustive draw payments
- pao when enabled in GameConfig
- illegal-action termination penalties
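Since rewards are expressed in hundreds of points, converting them back to point deltas is a multiplication by 100. The sketch below uses plain Python lists for illustration; `rewards_to_points` is a hypothetical helper, and the example values are made up rather than taken from a real game.

```python
POINTS_PER_REWARD_UNIT = 100  # rewards are score deltas in hundreds of points


def rewards_to_points(rewards):
    """Convert a length-4 reward vector into integer point deltas."""
    return [int(round(float(r) * POINTS_PER_REWARD_UNIT)) for r in rewards]


# Made-up example: one player gains 8000 points from another by ron.
example_rewards = [80.0, -80.0, 0.0, 0.0]
example_points = rewards_to_points(example_rewards)
```

In this example `example_points` is `[8000, -8000, 0, 0]`.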
Termination
- round_mode="single" terminates after the first round ends.
- round_mode="east" runs East-only progression with round_limit=4.
- round_mode="half" runs East-South progression with round_limit=8.