Soichiro Nishimori

CV / Google Scholar / GitHub / Twitter (X)

portrait.png

About Me

I am Soichiro Nishimori, a PhD student at the Sugiyama-Yokoya-Ishida lab, supervised by Prof. Sugiyama. I also work as a research part-timer in the Imperfect-information Learning Team at RIKEN AIP.

I previously interned at OMRON SINIC X Corporation, supervised by Dr. Yoshitaka Ushiku and Dr. Atsushi Hashimoto, and at the Matsuo-Iwasawa lab at The University of Tokyo, supervised by Prof. Paavo Parmas.

Mahjong 🀄️ and Tennis 🎾 lover.

Research Interests

I am interested in scalable reinforcement learning (RL) from three perspectives:

  • Scale to Data: The amount and quality of data can be traded off. I have worked on offline RL with weak supervision, including domain-unlabeled data (RLC2025) and noisy preferences (TMLR2026).
  • Scale to Computation: I am especially interested in GPU-accelerated RL frameworks, both for simulators, including board games ♟️ (Pgx, NeurIPS2023, co-authored) and Riichi Mahjong 🀄️ (Mahjax, arXiv2026), and for algorithm codebases, such as JAX-CORL for offline RL.
  • Scale to Trial: Even if an agent can gain experience through massive data or computation, it is not truly scalable unless it can learn efficiently. From this perspective, I have worked on exploration in RL. I focus on a novel exploration objective called ReMax, based on the Retry idea (ICML2026, arXiv2026).

news

Jun 05, 2026 Preprints on regret analysis of ReMax, ReMax in continuous actions, and a baseline for Max@K are out!
May 09, 2026 Our ReMax paper was accepted to ICML 2026!
Apr 27, 2026 One paper was accepted to TMLR 2026.
Dec 26, 2025 I released a JAX-based Mahjong simulator, Mahjax 🀄️
May 09, 2025 Two papers were accepted to RLC 2025.

selected publications

  1. RLC
    PUORL.png
    Offline Reinforcement Learning with Domain-Unlabeled Data
    Soichiro Nishimori, Xin-Qiang Cai, Johannes Ackermann, and 1 more author
    Reinforcement Learning Conference, 2025
  2. arXiv
    red_mahjong_random_en.gif
    Mahjax: A GPU-Accelerated Mahjong Simulator for Reinforcement Learning in JAX
    Soichiro Nishimori, Shinri Okano, Keigo Habara, and 3 more authors
    arXiv preprint arXiv:2605.20577, 2026
  3. ICML
    RePPO.png
    Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying
    Soichiro Nishimori, Paavo Parmas, Sotetsu Koyamada, and 4 more authors
    International Conference on Machine Learning, 2026
  4. TMLR
    SymPO.png
    On Symmetric Losses for Policy Optimization with Noisy Preferences
    Soichiro Nishimori, Yu-Jie Zhang, Thanawat Lodkaew, and 1 more author
    Transactions on Machine Learning Research, 2026