Hi! I’m speed.

I am a second-year graduate student at Kyoto University, majoring in Computer Science.

I like to learn by building things.

Hobby

Newer ↑

Master’s degree, Course of Communications and Computer Engineering, Graduate School of Informatics, Kyoto University, Japan. April 2024 – March 2026 (expected)
Bachelor’s degree, Software Science Course, Department of Information and Computer Sciences, School of Engineering Science, Osaka University, Japan. GPA: 3.57/4.00. April 2020 – March 2024

Newer ↑

Date	Organization	Position	Description
2024/12 ~ present	Sakana AI	Student Intern	🐟🐠🐡
2024/4 ~ present	Research and Development Center for Large Language Models, National Institute of Informatics	Research Assistant	I am involved in research on the development and evaluation of multimodal models.
2024/2 ~ 2024/4	LLM-jp, National Institute of Informatics	Student Intern	I was engaged in research on memorization in LLMs.
2022/3 ~ 2024/4	Center for Quantum Information and Quantum Biology at Osaka University	Technical Assistant (software development)	I developed numerical software for quantum computation and quantum chemistry.

Developing Japanese CLIP Models Leveraging an Open-weight LLM for Large-scale Dataset Translation. NAACL Student Research Workshop 2025, April 2025. Issa Sugiura, Shuhei Kurita, Yusuke Oda, Daisuke Kawahara, Naoaki Okazaki.
[Paper]
Constructing Multimodal Datasets from Scratch for Rapid Development of a Japanese Visual Language Model. NAACL 2025 Demo Track, April 2025. Keito Sasagawa, Koki Maeda, Issa Sugiura, Shuhei Kurita, Naoaki Okazaki, Daisuke Kawahara.
[Paper to appear]
A Comprehensive Analysis of Memorization in Large Language Models. The 17th International Natural Language Generation Conference (INLG 2024), September 2024. Hirokazu Kiyomaru*, Issa Sugiura*, Daisuke Kawahara and Sadao Kurohashi.
[Paper] | [Code]

Removing Mislabeled Data from Trained Models via Machine Unlearning. IEICE Transactions on Information and Systems, August 2025. Issa Sugiura, Shingo Okamura, Naoto Yanai.
[Paper]

EDINET-Bench: Evaluating LLMs on Complex Financial Tasks using Japanese Financial Statements. arXiv, 2025. Issa Sugiura, Takashi Ishida, Taro Makino, Chieko Tazuke, Takanori Nakagawa, Kosuke Nakago, David Ha.
[Paper] | [Dataset] | [Code]
llm-jp-modernbert: A ModernBERT Model Trained on a Large-Scale Japanese Corpus with Long Context Length. arXiv, 2025. Issa Sugiura, Kouta Nakayama, Yusuke Oda.
[Paper] | [Model] | [Code]
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs. 2024. LLM-jp: Akiko Aizawa, Eiji Aramaki, …, Issa Sugiura, …, Koichiro Yoshino (80 authors, listed in alphabetical order).
[Paper]

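Developing a Japanese CLIP Using Translation by an Open LLM. The 31st Annual Meeting of the Association for Natural Language Processing (NLP2025), March 2025. Issa Sugiura, Shuhei Kurita, Yusuke Oda, Daisuke Kawahara, Naoaki Okazaki.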
[Paper] | [Model] | [Dataset] | [Code]
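An Analysis of the Impact of Loss Spikes. The 31st Annual Meeting of the Association for Natural Language Processing (NLP2025), March 2025. Issa Sugiura, Shuhei Kurita, Yusuke Oda.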
[Paper]
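llm-jp-eval-mm: An Automatic Evaluation Framework for Japanese Vision-Language Models. The 31st Annual Meeting of the Association for Natural Language Processing (NLP2025), March 2025. Koki Maeda*, Issa Sugiura*, Shuhei Kurita, Yusuke Oda, Naoaki Okazaki. (Young Researcher Encouragement Award)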
[Paper] | [Code]
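LLM-jp-3 VILA: Building Japanese Multimodal Datasets and a Strong Japanese Multimodal Model. The 31st Annual Meeting of the Association for Natural Language Processing (NLP2025), March 2025. Keito Sasagawa, Koki Maeda, Issa Sugiura, Shuhei Kurita, Naoaki Okazaki, Daisuke Kawahara. (Committee Special Award)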
[Paper]
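Development of jax-llm, a Pre-training Tool for Large Language Models, and Its Application to input-method. The 19th Symposium of the Young Researcher Association for NLP Studies (YANS2024), September 2024. Issa Sugiura.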
[Draft Paper] | [Poster] | [Code (jax-llm)] | [Code (input-method)]
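A Proposed Method for Improving the Generalization Performance of Trained Models by Unlearning Mislabeled Data. The 16th Forum on Data Engineering and Information Management (DEIM2024), February 2024. Issa Sugiura, Shingo Okamura, Kyosuke Yamashita, Naoto Yanai. (LINE Yahoo Sponsor Award)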
[Paper] | [Slide] | [Code]

pqcat
- A fast command-line tool for inspecting Parquet files.
jaccard
- A simple web app to compute Jaccard similarity between two texts by tokenizing them using OpenAI’s tiktoken.

Newer ↑

TOEIC L&R: 450 + 390 = 840 (October 2, 2022)
Security Camp organized by IPA (Information-technology Promotion Agency), Web Security Course. August 2022