About
Hi! I’m speed.
I am a second-year graduate student at Kyoto University majoring in Computer Science.
I like to learn by building things.
Hobbies
I enjoy reading and traveling.
Interests
I enjoy developing multimodal models and evaluating foundation models. Above all, I love looking at data 👀.
Education
Newer ↑
- Master’s degree, Course of Communications and Computer Engineering, Graduate School of Informatics, Kyoto University, Japan. April 2024 to March 2026 (Expected)
- Bachelor’s degree, Software Science Course, Department of Information and Computer Sciences, School of Engineering Science, Osaka University, Japan. GPA: 3.57/4.00. April 2020 to March 2024
Internships / Employment
Newer ↑
| Date | Organization | Position | Description |
|---|---|---|---|
| 2024/12 ~ present | Sakana AI | Student Intern | Research on evaluating real-world task performance of large language models, with a focus on the financial domain. |
| 2024/4 ~ present | Research and Development Center for Large Language Models, National Institute of Informatics | Research Assistant | Research on training and evaluation of Japanese multimodal models. |
| 2024/2 ~ 2024/4 | LLM-jp, National Institute of Informatics | Student Intern | Research on memorization in large language models. |
| 2022/3 ~ 2024/4 | Center for Quantum Information and Quantum Biology at Osaka University | Technical Assistant (software development) | Research on quantum computation and quantum chemistry. |
Research
International Conference
- Developing Japanese CLIP Models Leveraging an Open-weight LLM for Large-scale Dataset Translation. NAACL Student Research Workshop 2025, April 2025. Issa Sugiura, Shuhei Kurita, Yusuke Oda, Daisuke Kawahara, Naoaki Okazaki.
[Paper] | [Code] | [Model] | [Dataset]
- Constructing Multimodal Datasets from Scratch for Rapid Development of a Japanese Visual Language Model. NAACL 2025 Demo Track, April 2025. Keito Sasagawa, Koki Maeda, Issa Sugiura, Shuhei Kurita, Naoaki Okazaki, Daisuke Kawahara.
[Paper]
- A Comprehensive Analysis of Memorization in Large Language Models. The 17th International Natural Language Generation Conference (INLG 2024), September 2024. Hirokazu Kiyomaru*, Issa Sugiura*, Daisuke Kawahara, Sadao Kurohashi.
[Paper] | [Code]
Journal
- Removing Mislabeled Data from Trained Models via Machine Unlearning. IEICE Transactions on Information and Systems, August 2025. Issa Sugiura, Shingo Okamura, Naoto Yanai.
[Paper]
Preprints
- WAON: Large-Scale and High-Quality Japanese Image-Text Pair Dataset for Vision-Language Models. arXiv, October 2025. Issa Sugiura, Shuhei Kurita, Yusuke Oda, Daisuke Kawahara, Yasuo Okabe, Naoaki Okazaki.
[Paper] | [Project Page]
- Llama-Mimi: Speech Language Models with Interleaved Semantic and Acoustic Tokens. arXiv, September 2025. Issa Sugiura, Shuhei Kurita, Yusuke Oda, Ryuichiro Higashinaka.
[Paper] | [Model] | [Code] | [Demo]
- EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements. arXiv, June 2025. Issa Sugiura, Takashi Ishida, Taro Makino, Chieko Tazuke, Takanori Nakagawa, Kosuke Nakago, David Ha.
[Paper] | [Dataset] | [Code]
- llm-jp-modernbert: A ModernBERT Model Trained on a Large-Scale Japanese Corpus with Long Context Length. arXiv, April 2025. Issa Sugiura, Kouta Nakayama, Yusuke Oda.
[Paper] | [Model] | [Code]
- LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs. July 2024. LLM-jp: Akiko Aizawa, Eiji Aramaki, …, Issa Sugiura, …, Koichiro Yoshino (80 authors, listed in alphabetical order).
[Paper]
Domestic Conference
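- Common Crawlを用いた大規模音声音響データセットの構築 [Construction of a Large-Scale Speech and Audio Dataset Using Common Crawl]. Acoustical Society of Japan 2025 Autumn Meeting, September 2025. Kohei Asai, Issa Sugiura, Wataru Nakata, Shuhei Kurita, Shinnosuke Takamichi, Tetsuji Ogawa, Ryuichiro Higashinaka.
[Code]
- オープンLLMによる翻訳を活用した日本語CLIPの開発 [Developing Japanese CLIP Using Translation by an Open LLM]. The 31st Annual Meeting of the Association for Natural Language Processing (NLP2025), March 2025. Issa Sugiura, Shuhei Kurita, Yusuke Oda, Daisuke Kawahara, Naoaki Okazaki.
[Paper] | [Model] | [Dataset] | [Code]
- ロススパイクの影響分析 [An Analysis of the Impact of Loss Spikes]. The 31st Annual Meeting of the Association for Natural Language Processing (NLP2025), March 2025. Issa Sugiura, Shuhei Kurita, Yusuke Oda.
[Paper]
- llm-jp-eval-mm: 日本語視覚言語モデルの自動評価基盤 [llm-jp-eval-mm: An Automatic Evaluation Framework for Japanese Vision-Language Models]. The 31st Annual Meeting of the Association for Natural Language Processing (NLP2025), March 2025. Koki Maeda*, Issa Sugiura*, Shuhei Kurita, Yusuke Oda, Naoaki Okazaki. (Young Researcher Encouragement Award)
[Paper] | [Code]
- LLM-jp-3 VILA: 日本語マルチモーダルデータセット及び強力な日本語マルチモーダルモデルの構築 [LLM-jp-3 VILA: Building Japanese Multimodal Datasets and a Strong Japanese Multimodal Model]. The 31st Annual Meeting of the Association for Natural Language Processing (NLP2025), March 2025. Keito Sasagawa, Koki Maeda, Issa Sugiura, Shuhei Kurita, Naoaki Okazaki, Daisuke Kawahara. (Committee Special Award)
[Paper]
- 大規模言語モデルの事前学習ツールjax-llmの開発とinput-methodへの応用 [Development of jax-llm, a Pre-training Tool for Large Language Models, and Its Application to input-method]. The 19th Symposium of the Young Researcher Association for NLP Studies (YANS2024), September 2024. Issa Sugiura.
[Draft Paper] | [Poster] | [Code (jax-llm)] | [Code (input-method)]
- ミスラベルデータの忘却による学習済みモデルの汎化性能の向上手法の提案 [A Method for Improving the Generalization Performance of Trained Models by Unlearning Mislabeled Data]. The 16th Forum on Data Engineering and Information Management (DEIM2024), February 2024. Issa Sugiura, Shingo Okamura, Kyosuke Yamashita, Naoto Yanai. (LINE Yahoo Sponsor Award)
[Paper] | [Slide] | [Code]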
Personal Projects
- pqcat
  - A fast command-line tool for inspecting Parquet files.
- jaccard
  - A simple web app to compute Jaccard similarity between two texts by tokenizing them using OpenAI’s tiktoken.
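
For readers unfamiliar with the metric, here is a rough sketch of the idea behind jaccard. It is not the app's actual code: the function names are hypothetical, and a plain whitespace tokenizer stands in for tiktoken so the snippet has no dependencies.

```python
def tokenize(text):
    # Stand-in tokenizer (assumption): lowercase, split on whitespace.
    # The app is described as using OpenAI's tiktoken for this step.
    return text.lower().split()


def jaccard_similarity(tokens_a, tokens_b):
    """Return |A ∩ B| / |A ∪ B| over the sets of distinct tokens."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch: two empty texts count as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


score = jaccard_similarity(tokenize("the cat sat"), tokenize("the cat ran"))
print(score)  # 2 shared tokens out of 4 distinct tokens -> 0.5
```

Swapping `tokenize` for a subword tokenizer changes which units are compared, but the set-based ratio stays the same.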
Certifications / Qualifications
Newer ↑
- TOEIC L&R: L: 420 + R: 425 = 845 (October 19, 2025)
- Security Camp organized by IPA (Information-technology Promotion Agency), Web Security Course. August 2022