No-Red Mahjong

Usage

import mahjax

env = mahjax.make("no_red_mahjong", observe_type="dict")

Alternatively, instantiate the environment class directly:

from mahjax.no_red_mahjong.env import NoRedMahjong

env = NoRedMahjong(round_mode="half", observe_type="dict")

Description

no_red_mahjong is a fast 4-player riichi mahjong environment without red fives. It is the lightweight environment used by the current offline RL and PPO examples.

Compared with red_mahjong, this environment intentionally omits several rules (see Rules) to keep the implementation simpler and faster.

Rules

The environment implements a simplified Japanese riichi mahjong variant. It differs from standard riichi rules as follows:

No red fives
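No special abortive draws (特殊流局), such as nine terminals and honors or four-wind discards
No pao (liability payments for certain yakuman)
No double ron (two players cannot both win on the same discard)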

This environment exists primarily for speed and for simpler RL experiments. It is covered by hand-written tests and we do not expect major rule-breaking bugs, but some corner cases may still be handled incorrectly. We expect this to improve over time.

For general riichi mahjong rules, see the European Mahjong Association rulebook.

State

no_red_mahjong uses the same nested state layout as red_mahjong:

top-level RL handles (current_player, terminated, rewards, …)
state.players — per-player arrays (PlayerStateArrays)
state.round_state — round-level arrays (RoundState)

The full field list and the round-transition styles (auto / dummy_share) are documented on the API page. no_red_mahjong uses exactly the common field set described there and adds no extra fields.

Specs

Name	Value
Version	`beta`
Number of players	`4`
Number of actions	`79`
Observation types	`dict`, `2D`
Reward shape	`(4,)`
Reward semantics	score deltas in hundreds of points

Action

The action space has 79 actions:

Range	Meaning
`0-33`	Discard a tile type
`34-67`	Closed kan / added kan
`68`	`TSUMOGIRI`
`69`	`RIICHI`
`70`	`TSUMO`
`71`	`RON`
`72`	`PON`
`73`	`OPEN_KAN`
`74-76`	`CHI_L`, `CHI_M`, `CHI_R`
`77`	`PASS`
`78`	`DUMMY` (only legal under `next_round_style="dummy_share"`)
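To make the table concrete, here is a small decoder that turns a raw action id into a readable label, which can help when inspecting `action_history` or a policy's outputs. `describe_action` and `NAMED_ACTIONS` are hypothetical names, not part of mahjax. The table above does not state how ids 34-67 map to tiles, so the sketch assumes the kan tile type is `action_id - 34`.

```python
# Hypothetical helper (not part of mahjax): maps a raw action id in [0, 78]
# to a readable label, following the action table.
# ASSUMPTION: for ids 34-67 the kan tile type is (action_id - 34).

NUM_ACTIONS = 79

NAMED_ACTIONS = {
    68: "TSUMOGIRI",
    69: "RIICHI",
    70: "TSUMO",
    71: "RON",
    72: "PON",
    73: "OPEN_KAN",
    74: "CHI_L",
    75: "CHI_M",
    76: "CHI_R",
    77: "PASS",
    78: "DUMMY",  # only legal under next_round_style="dummy_share"
}


def describe_action(action_id: int) -> str:
    """Return a human-readable label for a no_red_mahjong action id."""
    if not 0 <= action_id < NUM_ACTIONS:
        raise ValueError(f"action id must be in [0, {NUM_ACTIONS - 1}], got {action_id}")
    if action_id <= 33:
        # Discard of a tile type in [0, 33].
        return f"DISCARD(tile_type={action_id})"
    if action_id <= 67:
        # Closed kan or added kan; the tile offset is an assumption.
        return f"KAN(tile_type={action_id - 34})"
    return NAMED_ACTIONS[action_id]


if __name__ == "__main__":
    for a in (0, 33, 34, 67, 68, 71, 75, 78):
        print(a, describe_action(a))
```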

Dict Observation

The current training examples use the dict observation. It is the most stable observation format in this repository right now, but it may still change in future releases.

The returned dictionary contains:

Key	Shape	Meaning
`hand`	`(14,)`	Current player's hand as sorted tile types in `[0, 33]`; unused slots are `-1`.
`last_draw`	`()`	Last drawn tile in `[0, 33]`; `-1` means there is no drawn tile to expose.
`action_history`	`(3, 200)`	Action history; see Action History below.
`shanten_count`	`()`	Current player's shanten number.
`furiten`	`()`	Whether the current player is in furiten.
`scores`	`(4,)`	Scores ordered from the current player's perspective.
`round`	`()`	Round index used by the environment.
`honba`	`()`	Honba count.
`kyotaku`	`()`	Riichi stick count.
`prevalent_wind`	`()`	Current round wind information used by the environment.
`seat_wind`	`()`	Current player's seat wind information used by the environment.
`dora_indicators`	`(5,)`	Dora indicator tile types in `[0, 33]`; missing entries are `-1`.
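Several fields use `-1` as padding, so it helps to strip it before working with tiles. The sketch below converts `hand` into a 34-length count vector and maps `dora_indicators` to dora tiles. The function names are hypothetical, and the dora mapping assumes the common 34-type ordering (0-8 man, 9-17 pin, 18-26 sou, 27-30 winds E/S/W/N, 31-33 dragons white/green/red), which this page does not confirm.

```python
# Hypothetical helpers (not part of mahjax) for the -1-padded tile arrays in
# the dict observation ("hand", "dora_indicators").
import numpy as np

NUM_TILE_TYPES = 34


def hand_counts(hand) -> np.ndarray:
    """Turn a (14,) hand of tile types padded with -1 into a (34,) count vector."""
    hand = np.asarray(hand)
    tiles = hand[hand >= 0]  # drop unused slots (-1)
    return np.bincount(tiles, minlength=NUM_TILE_TYPES)


def dora_from_indicator(indicator: int) -> int:
    """Map a dora indicator tile type to the dora tile type.

    ASSUMPTION: 0-8 man, 9-17 pin, 18-26 sou, 27-30 winds (E, S, W, N),
    31-33 dragons (white, green, red).
    """
    if indicator < 27:  # suited tiles: 9 wraps around to 1 within the suit
        suit, rank = divmod(indicator, 9)
        return suit * 9 + (rank + 1) % 9
    if indicator < 31:  # winds: N wraps around to E
        return 27 + (indicator - 27 + 1) % 4
    return 31 + (indicator - 31 + 1) % 3  # dragons: red wraps around to white


def dora_tiles(dora_indicators) -> list:
    """Return dora tile types for the revealed (non -1) indicators."""
    return [dora_from_indicator(int(t)) for t in dora_indicators if t >= 0]
```

With a single revealed indicator, `dora_tiles(obs["dora_indicators"])` returns a one-element list; the padded `-1` entries are skipped.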

Action History

action_history is a (3, 200) array with one column per recorded action. Its rows are:

Row 0: acting player index, converted to the current player's relative view
Row 1: action payload
Row 2: tsumogiri flag

The semantics match red_mahjong:

For discards, row 1 stores the actual discarded tile
For non-discard actions, row 1 stores the raw action id
Row 2 is 1 for tsumogiri, 0 for a non-tsumogiri discard, and -1 for non-discard actions

For no_red_mahjong, discard tiles are in [0, 33] and raw action ids are in [0, 78].
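The row semantics above can be read off column by column, using row 2 to tell discards from other actions. This is a sketch only: `decode_history` is a hypothetical helper, and it assumes that unused columns carry `-1` in row 0 and that relative player 0 is the current player. Neither detail is stated on this page, so check the API docs before relying on them.

```python
# Hypothetical decoder (not part of mahjax) for the (3, 200) action_history array.
# ASSUMPTIONS: unused columns have -1 in row 0, and relative player 0 is the
# current player.
import numpy as np


def decode_history(action_history):
    """Yield (relative_player, kind, payload) for each filled column."""
    history = np.asarray(action_history)
    for rel_player, payload, tsumogiri in history.T:
        if rel_player < 0:  # assumed padding marker
            continue
        if tsumogiri == 1:
            kind = "discard (tsumogiri)"  # payload: discarded tile type in [0, 33]
        elif tsumogiri == 0:
            kind = "discard (from hand)"  # payload: discarded tile type in [0, 33]
        else:
            kind = "action"  # payload: raw action id in [0, 78]
        yield int(rel_player), kind, int(payload)


if __name__ == "__main__":
    hist = np.full((3, 200), -1)
    hist[:, 0] = [0, 5, 1]    # current player tsumogiri-discards tile type 5
    hist[:, 1] = [1, 72, -1]  # relative player 1 calls PON (raw action id 72)
    for entry in decode_history(hist):
        print(entry)
```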

2D Observation

observe_type="2D" exists, but its design is not yet fixed. You can use it for experiments, but we do not recommend treating it as a stable interface yet.

Rewards

Rewards are 4-player score deltas, represented in hundreds of points.

Examples:

winning a hand gives positive reward to the winner and negative reward to the payer(s)
exhaustive draw can produce tenpai / noten payments
illegal actions end the game immediately with the standard illegal-action penalty
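As a worked example of the reward unit, the standard riichi exhaustive-draw rule moves 3000 points in total from noten players to tenpai players, which is 30 in reward units. The sketch assumes no_red_mahjong follows this standard rule and ignores riichi sticks and honba; `ryukyoku_deltas` is a hypothetical helper, not part of mahjax.

```python
# Standard exhaustive-draw (noten) payments expressed in hundreds of points.
# ASSUMPTION: the environment uses the standard 3000-point rule; riichi sticks
# and honba are not included.


def ryukyoku_deltas(tenpai):
    """Return per-player score deltas (in hundreds) for a 4-player exhaustive draw."""
    n_tenpai = sum(bool(t) for t in tenpai)
    if n_tenpai in (0, 4):
        return [0, 0, 0, 0]  # nobody pays when all or none are tenpai
    gain = 30 // n_tenpai  # 3000 points shared by the tenpai players
    loss = 30 // (4 - n_tenpai)  # 3000 points paid by the noten players
    return [gain if t else -loss for t in tenpai]
```

For example, with only the first player tenpai the deltas are `[30, -10, -10, -10]`, i.e. +3000 and -1000 points.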

For how to consume these rewards in turn-based MARL training (per-player reward accumulator + GAE), see the API → Using auto rewards in RL section.
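The core idea of a per-player accumulator is that a player is credited with everything it received between two of its own turns, since other players act in between. The sketch below shows only that idea and is not the implementation described on the API page. `RewardAccumulator` is a hypothetical class, and it assumes you call `add` after every environment step with the `(4,)` reward vector and `on_turn` just before a player acts.

```python
# Minimal sketch of a per-player reward accumulator for turn-based training.
# ASSUMPTION: the caller knows which player is about to act and passes every
# (4,) reward vector to add(); this is not the mahjax implementation.
import numpy as np


class RewardAccumulator:
    """Credits each player with all rewards received since its previous action."""

    def __init__(self, num_players: int = 4):
        self.pending = np.zeros(num_players, dtype=np.float32)

    def add(self, rewards) -> None:
        """Call after every env step with that step's reward vector."""
        self.pending += np.asarray(rewards, dtype=np.float32)

    def on_turn(self, player: int) -> float:
        """Return and clear the reward `player` earned since its last action."""
        reward = float(self.pending[player])
        self.pending[player] = 0.0
        return reward

    def flush(self) -> np.ndarray:
        """At termination, return and clear the remaining rewards of all players."""
        remaining = self.pending.copy()
        self.pending[:] = 0.0
        return remaining
```

The value returned by `on_turn(p)` is the reward attached to player `p`'s previous transition, so each player's trajectory can then be fed to GAE separately.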

Termination

round_mode="single" terminates after the first round ends.
round_mode="east" runs East-only progression with round_limit=4.
round_mode="half" runs East-South progression with round_limit=8.

In multi-round modes, the next-round transition behavior is controlled by next_round_style (see API).