Soichiro Nishimori
CV / Google Scholar / GitHub / Twitter (X)
About Me
I am Soichiro Nishimori, a PhD student in the Sugiyama-Yokoya-Ishida lab, supervised by Prof. Sugiyama. I also work as a research part-timer in the Imperfect-information Learning Team at RIKEN AIP.
I previously interned at OMRON SINIC X Corporation, supervised by Dr. Yoshitaka Ushiku and Dr. Atsushi Hashimoto, and at the Matsuo-Iwasawa lab at The University of Tokyo, supervised by Prof. Paavo Parmas.
Mahjong 🀄️ and Tennis 🎾 lover.
Research Interests
I am interested in scalable reinforcement learning (RL) from three perspectives:
- Scale to Data: The amount and quality of data can be traded off. I have worked on offline RL with weak supervision, including domain-unlabeled data (RLC2025) and noisy preferences (TMLR2026).
- Scale to Computation: I am especially interested in GPU-accelerated RL frameworks, covering both simulators, including board games ♟️ (Pgx, NeurIPS2023, co-authored) and Riichi Mahjong 🀄️ (Mahjax, arXiv2026), and algorithm codebases, such as JAX-CORL for offline RL.
- Scale to Trial: Even if an agent can gain experience through massive data or computation, it is not truly scalable unless it can learn efficiently. From this perspective, I have worked on exploration in RL. I focus on a novel exploration objective called ReMax, based on the Retry idea (ICML2026, arXiv2026).
News
| Date | News |
|---|---|
| Jun 05, 2026 | Preprints on regret analysis of ReMax, ReMax in continuous actions, and a baseline for Max@K are out! |
| May 09, 2026 | Our ReMax paper was accepted to ICML 2026! |
| Apr 27, 2026 | One paper was accepted to TMLR 2026. |
| Dec 26, 2025 | I released Mahjax 🀄️, a JAX-based Mahjong simulator. |
| May 09, 2025 | Two papers were accepted to RLC 2025. |
Latest Posts
| Date | Post |
|---|---|
| Apr 09, 2024 | I made a personal page. |