Soichiro Nishimori

About Me

I am Soichiro Nishimori, a PhD student in the Sugiyama-Yokoya-Ishida lab, supervised by Prof. Sugiyama. I also work as a research part-timer in the Imperfect-information Learning Team at RIKEN AIP.

Mahjong 🀄️ and Tennis 🎾 lover.

I am interested in scalable reinforcement learning (RL) from three perspectives:

Scale to Data: The amount and quality of data can be traded off. I have worked on offline RL with weak supervision, including domain-unlabeled data (RLC2025) and noisy preferences (TMLR2026).
Scale to Computation: I am especially interested in GPU-accelerated RL frameworks, both for simulators, including board games ♟️ (Pgx, NeurIPS2023, co-authored) and Riichi Mahjong 🀄️ (Mahjax, arXiv2026), and for algorithm codebases, such as JAX-CORL for offline RL.
Scale to Trial: Even if an agent can gain experience through massive data or computation, it is not truly scalable unless it can learn efficiently. From this perspective, I have worked on exploration in RL. I focus on a novel exploration objective called ReMax, based on the Retry idea (ICML2026, arXiv2026).

Jun 05, 2026	Preprints on Regret analysis of ReMax, ReMax in continuous actions, and Baseline for Max@K are out!
May 09, 2026	Our ReMax paper was accepted to ICML 2026!
Apr 27, 2026	1 paper was accepted to TMLR 2026.
Dec 26, 2025	I released a JAX-based Mahjong simulator, Mahjax 🀄️
May 09, 2025	2 papers were accepted to RLC 2025.

Apr 09, 2024	I created my personal page.