No-Red Mahjong
Usage
or load the environment class directly:
from mahjax.no_red_mahjong.env import NoRedMahjong
env = NoRedMahjong(round_mode="half", observe_type="dict")
Description
no_red_mahjong is a fast 4-player riichi mahjong environment without red fives. It is the lightweight environment used by the current offline RL and PPO examples.
Compared with red_mahjong, this environment intentionally omits some rules in order to keep the implementation simpler and faster.
Rules
The environment is a simplified Japanese riichi mahjong variant without red fives.
- No red fives
- No special abortive draws (特殊流局)
- No pao
- No double ron
This environment exists primarily for speed and for simpler RL experiments. It is covered by hand-written tests, and we do not expect major rule-breaking bugs, but some corner cases may still remain. We expect this to improve over time.
For general riichi mahjong rules, see the European Mahjong Association rulebook.
State
no_red_mahjong uses the same nested state layout as red_mahjong:
- top-level RL handles (current_player, terminated, rewards, …)
- state.players: per-player arrays (PlayerStateArrays)
- state.round_state: round-level arrays (RoundState)
The full field list and round-transition style (auto / dummy_share) are documented once in API. no_red_mahjong uses exactly the common field set there; it does not add any extra fields on top.
Specs
| Name | Value |
|---|---|
| Version | beta |
| Number of players | 4 |
| Number of actions | 79 |
| Observation types | dict, 2D |
| Reward shape | (4,) |
| Reward semantics | score deltas in hundreds of points |
Action
The action space (79 actions):
| Range | Meaning |
|---|---|
| 0-33 | Discard a tile type |
| 34-67 | Closed kan / added kan |
| 68 | TSUMOGIRI |
| 69 | RIICHI |
| 70 | TSUMO |
| 71 | RON |
| 72 | PON |
| 73 | OPEN_KAN |
| 74-76 | CHI_L, CHI_M, CHI_R |
| 77 | PASS |
| 78 | DUMMY (only legal under next_round_style="dummy_share") |
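As an illustration of the action layout above, the following sketch maps a raw action id to a category and, where applicable, a tile type. The helper `describe_action` and the `NAMED_ACTIONS` dictionary are hypothetical names, not part of the library. The sketch also assumes that ids 34-67 encode the kan tile type as `action - 34`, which the table suggests but does not state.

```python
from typing import Optional, Tuple

# Names for the fixed single-id actions, copied from the action table.
NAMED_ACTIONS = {
    68: "TSUMOGIRI",
    69: "RIICHI",
    70: "TSUMO",
    71: "RON",
    72: "PON",
    73: "OPEN_KAN",
    74: "CHI_L",
    75: "CHI_M",
    76: "CHI_R",
    77: "PASS",
    78: "DUMMY",
}


def describe_action(action: int) -> Tuple[str, Optional[int]]:
    """Return (category, tile_type); tile_type is None for tile-less actions."""
    action = int(action)
    if not 0 <= action <= 78:
        raise ValueError(f"action id out of range: {action}")
    if action <= 33:
        # Ids 0-33 discard the tile type with the same index.
        return "DISCARD", action
    if action <= 67:
        # Assumption: the kan tile type is the id offset by 34.
        return "KAN", action - 34
    return NAMED_ACTIONS[action], None
```

For example, `describe_action(12)` yields a discard of tile type 12, and `describe_action(72)` yields the PON category with no tile attached.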
Dict Observation
The current training examples use the dict observation. It is the most stable observation format in this repository right now, but it may still change in future releases.
The returned dictionary contains:
| Key | Shape | Meaning |
|---|---|---|
| hand | (14,) | Current player's hand as sorted tile types in [0, 33]; unused slots are -1. |
| last_draw | () | Last drawn tile in [0, 33]; -1 means there is no drawn tile to expose. |
| action_history | (3, 200) | Action history. |
| shanten_count | () | Current player's shanten number. |
| furiten | () | Whether the current player is in furiten. |
| scores | (4,) | Scores ordered from the current player's perspective. |
| round | () | Round index used by the environment. |
| honba | () | Honba count. |
| kyotaku | () | Riichi stick count. |
| prevalent_wind | () | Current round wind information used by the environment. |
| seat_wind | () | Current player's seat wind information used by the environment. |
| dora_indicators | (5,) | Dora indicator tile types in [0, 33]; missing entries are -1. |
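Because hand and dora_indicators are padded with -1, downstream code usually has to skip those slots. A minimal sketch, assuming the field has already been converted to a plain sequence of integers, that turns the padded hand into a 34-entry count vector. The helper `hand_to_counts` is a hypothetical name and not part of the environment.

```python
from typing import List, Sequence

NUM_TILE_TYPES = 34  # tile types are indexed 0..33
EMPTY_SLOT = -1      # padding value for unused slots


def hand_to_counts(hand: Sequence[int]) -> List[int]:
    """Count how many copies of each tile type the padded hand holds."""
    counts = [0] * NUM_TILE_TYPES
    for tile in hand:
        tile = int(tile)
        if tile == EMPTY_SLOT:
            continue  # skip padding
        if not 0 <= tile < NUM_TILE_TYPES:
            raise ValueError(f"unexpected tile type: {tile}")
        counts[tile] += 1
    return counts
```

The same loop works for dora_indicators, since it uses the same -1 padding convention.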
Action History
action_history is stored as:
- Row 0: acting player index, converted to the current player's relative view
- Row 1: action payload
- Row 2: tsumogiri flag
The semantics match red_mahjong:
- For discards, row 1 stores the actual discarded tile
- For non-discard actions, row 1 stores the raw action id
- Row 2 is 1 for tsumogiri, 0 for a non-tsumogiri discard, and -1 for non-discard actions
For no_red_mahjong, discard tiles are in [0, 33] and raw action ids are in [0, 78].
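The row semantics described above can be sketched as a small decoder. This is an illustration only: `HistoryEvent` and `decode_history` are hypothetical names, and the `num_valid` argument is an assumption, since this page does not say how unused columns of the (3, 200) array are marked.

```python
from typing import List, NamedTuple, Optional, Sequence


class HistoryEvent(NamedTuple):
    relative_player: int       # row 0: player index in the observer's view
    is_discard: bool
    tile: Optional[int]        # set only for discards (tile type 0..33)
    action_id: Optional[int]   # set only for non-discard actions (0..78)
    tsumogiri: Optional[bool]  # set only for discards


def decode_history(
    history: Sequence[Sequence[int]], num_valid: int
) -> List[HistoryEvent]:
    """Decode the first num_valid columns of a 3-row action history."""
    players, payloads, flags = history
    events = []
    for i in range(num_valid):
        player, payload, flag = int(players[i]), int(payloads[i]), int(flags[i])
        if flag == -1:
            # Non-discard action: row 1 holds the raw action id.
            events.append(HistoryEvent(player, False, None, payload, None))
        else:
            # Discard: row 1 holds the tile, row 2 is 1 for tsumogiri.
            events.append(HistoryEvent(player, True, payload, None, flag == 1))
    return events
```

Row 2 is what separates the two cases: without it, a payload such as 5 could be read either as a discarded tile or as a raw action id.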
2D Observation
observe_type="2D" exists, but its design is not yet fixed. You can use it for experiments, but we do not recommend treating it as a stable interface yet.
Rewards
Rewards are 4-player score deltas, represented in hundreds of points.
Examples:
- winning a hand gives positive reward to the winner and negative reward to the payer(s)
- exhaustive draw can produce tenpai / noten payments
- illegal actions end the game immediately with the standard illegal-action penalty
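To make the unit concrete, here is a minimal sketch that converts a reward vector back to points. The helper `rewards_to_points` is a hypothetical name, and the example values are made up for illustration rather than taken from an actual game.

```python
from typing import List, Sequence


def rewards_to_points(rewards: Sequence[float]) -> List[int]:
    """Convert rewards in hundreds of points to plain point deltas."""
    return [int(round(float(r) * 100)) for r in rewards]


# Made-up example: one player gains 8,000 points and another pays them.
example_rewards = [80, -80, 0, 0]
example_points = rewards_to_points(example_rewards)
```

Under this scaling, a reward of 80 corresponds to a score change of 8,000 points.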
For how to consume these rewards in turn-based MARL training (per-player reward accumulator + GAE), see the API → Using auto rewards in RL section.
Termination
- round_mode="single" terminates after the first round ends.
- round_mode="east" runs East-only progression with round_limit=4.
- round_mode="half" runs East-South progression with round_limit=8.
In multi-round modes, the next-round transition behavior is controlled by next_round_style (see API).