publications

2026

  1. arXiv
    Mahjax: A GPU-Accelerated Mahjong Simulator for Reinforcement Learning in JAX
    Soichiro Nishimori, Shinri Okano, Keigo Habara, and 3 more authors
    arXiv preprint arXiv:2605.20577, 2026
  2. ICML
    Emergence of Exploration in Policy Gradient Reinforcement Learning via Retrying
    Soichiro Nishimori, Paavo Parmas, Sotetsu Koyamada, and 4 more authors
    International Conference on Machine Learning, 2026
  3. TMLR
    On Symmetric Losses for Policy Optimization with Noisy Preferences
    Soichiro Nishimori, Yu-Jie Zhang, Thanawat Lodkaew, and 1 more author
    Transactions on Machine Learning Research, 2026
  4. arXiv
    Mitigating Reward Hacking in RLHF via Advantage Sign Robustness
    Shinnosuke Ono, Johannes Ackermann, Soichiro Nishimori, and 2 more authors
    arXiv preprint arXiv:2604.02986, 2026
  5. arXiv
    Finite-Time Regret Analysis of Retry-Aware Bandits
    Bingkui Tong, Junpei Komiyama, Soichiro Nishimori, and 1 more author
    arXiv preprint arXiv:2605.20854, 2026
  6. arXiv
    On Advantage Estimates for Max@K Policy Gradients
    Shota Takashiro*, Soichiro Nishimori*, Paavo Parmas*, and 6 more authors
    arXiv preprint arXiv:2606.06080, 2026
    * Equal contribution
  7. arXiv
    OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation
    Paavo Parmas, Yongmin Kim, Kohsei Matsutani, and 5 more authors
    arXiv preprint arXiv:2606.06096, 2026
  8. arXiv
    Retry Policy Gradients in Continuous Action Spaces
    Soichiro Nishimori and Paavo Parmas
    arXiv preprint arXiv:2606.05888, 2026

2025

  1. RLC
    Recursive Reward Aggregation
    Yuting Tang, Yivan Zhang, Johannes Ackermann, and 4 more authors
    Reinforcement Learning Conference, 2025
  2. RLC
    Offline Reinforcement Learning with Domain-Unlabeled Data
    Soichiro Nishimori, Xin-Qiang Cai, Johannes Ackermann, and 1 more author
    Reinforcement Learning Conference, 2025

2024

  1. arXiv
    A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees
    Toshinori Kitamura, Tadashi Kozuno, Masahiro Kato, and 6 more authors
    Reinforcement Learning Conference Workshop, 2024
  2. RLC
    A Batch Sequential Halving Algorithm without Performance Degradation
    Sotetsu Koyamada, Soichiro Nishimori, and Shin Ishii
    Reinforcement Learning Conference, 2024
  3. github
    JAX-CORL: A single-file repository for offline reinforcement learning
    Soichiro Nishimori
    github, 2024

2023

  1. NeurIPS
    Pgx: Hardware-accelerated parallel game simulators for reinforcement learning
    Sotetsu Koyamada, Shinri Okano, Soichiro Nishimori, and 4 more authors
    Advances in Neural Information Processing Systems, 2023
  2. arXiv
    End-to-End Policy Gradient Method for POMDPs and Explainable Agents
    Soichiro Nishimori, Sotetsu Koyamada, and Shin Ishii
    arXiv preprint, 2023

2022

  1. IEEE
    Mjx: A framework for Mahjong AI research
    Sotetsu Koyamada, Keigo Habara, Nao Goto, and 3 more authors
    In , 2022