About Me

Hello! I am a Ph.D. student in Computer Science at Texas A&M University (TAMU), advised by Prof. James Caverlee and Prof. Ali Mostafavi.

My research lies at the intersection of information retrieval (IR) and large language models (LLMs), with a focus on building robust retrieval systems for data-scarce domains and developing LLM-powered multi-agent systems for complex task solving, including task planning and tool use.

My recent work includes DMRetriever, a family of dense retrieval models achieving state-of-the-art performance across 48 retrieval tasks (ACL 2026 Main), and DisastIR, the first comprehensive IR benchmark for disaster management (EMNLP 2025 Findings).

I am open to research collaboration and internship opportunities. Feel free to reach me at kai_yin@tamu.edu.

🔥 News

  • 2026.04 🎉🎉 DMRetriever was accepted to ACL 2026 Main.  arXiv  Code  Data  Model
  • 2026.04 🎉🎉 DisastQA was accepted to ACL 2026 Findings.  arXiv
  • 2025.11 🚀 Released source code, training data, and model checkpoints for DMRetriever (33M–7.6B).  Code  Data  Model
  • 2025.08 🎉🎉 DisastIR was accepted to EMNLP 2025 Findings.  Paper
  • 2025.08 🚀 Released source code and dataset for DisastIR.  Code  Dataset

📝 Selected Publications and Projects

ACL 2026 Main
DMRetriever

DMRetriever: Difficulty-aware Progressive Fine-tuning LLM for Improved Textual Retrieval in Data-Scarce Domain

Kai Yin, Xiangjue Dong, Chengkai Liu, Allen Lin, Lingfeng Shi, Ali Mostafavi, & James Caverlee

  • Developed DMRetriever, a family of six dense retrieval models (33M–7.6B) built on encoder-only (BERT) and decoder-only (Qwen) backbones, achieving state-of-the-art results across 48 retrieval tasks at every scale with high parameter efficiency.
  • Proposed difficulty-aware progressive supervised instruction fine-tuning so that models at different scales learn effectively.
  • Introduced a data refinement pipeline that combines domain-specific data synthesis, mutual-agreement-based false-positive filtering, and difficulty-aware hard negative mining.
  • Proposed multi-teacher knowledge distillation to further improve small-model performance and parameter efficiency for on-device deployment.
  • Introduced a lightweight IR validation set, enabling over 30× faster model development while maintaining reliable performance rankings.

EMNLP 2025 Findings
DisastIR

DisastIR: Automatic Information Retrieval Benchmark Construction for Data-Scarce Domain

Kai Yin, Xiangjue Dong, Chengkai Liu, et al., Ali Mostafavi, & James Caverlee

  • Developed DisastIR, a comprehensive IR benchmark covering 48 distinct retrieval tasks with over 1.3M automatically labeled query-passage pairs for a data-scarce domain.
  • Developed a four-stage automatic query-passage relevance labeling framework that fully replaces human labeling, ensures a zero "hole" rate (no unjudged retrieved passages) during model evaluation, and produces model performance rankings in strong agreement with human annotations.
  • Benchmarked 30 retrieval models (33M–7B) under exact and ANN search, guiding model selection.

ACL 2026 Findings
DisastQA

DisastQA: A Comprehensive Benchmark for Question Answering Evaluation in Disaster Management

Zhitong Chen*, Kai Yin*, Xiangjue Dong, Chengkai Liu, Ali Mostafavi, & James Caverlee

ACL 2026 Findings  |  * Equal contribution, listed in alphabetical order

  • Developed DisastQA, a QA benchmark built on DisastIR with 3,000 QA pairs covering both multiple-choice and open-ended questions.
  • Proposed a human-LLM collaborative pipeline for efficient benchmark construction, with key-point extraction for open-ended questions to ensure verifiable evaluation.
  • Evaluated 18 LLMs under three settings that reflect different upstream retrieval performance: no relevant passage (base), only relevant passages (golden), and a mixture (mix).

Disaster-Agent

Disaster-Agent: LLM-based Multi-agent System for Complex Disaster Management Task Solving

  • Proposed E_MCTS_TP (Efficient Monte Carlo Tree Search for Task Planning), a test-time method that improves the task-planning ability and efficiency of small language models (SLMs) in multi-agent systems.
  • Developed DisasterTool through an LLM-in-the-loop, domain-specific agent discovery pipeline, reducing human workload by 98.9%.
  • Introduced the DisasterTask benchmark, whose user tasks span node-, chain-, and directed-acyclic-graph-level complexity and are generated by randomly sampling a tool-and-data graph and applying LLM self-instruct.

arXiv 2024
CrisisSense-LLM

CrisisSense-LLM: Instruction Fine-Tuning LLM for Multi-task Social Media Text Processing in Disaster Informatics

Kai Yin, Chengkai Liu, Ali Mostafavi, & Xia Hu

arXiv:2406.15477

  • Designed multi-turn conversational prompts for multi-task instruction fine-tuning (text classification and named entity recognition) of Llama3.1-8B under both LoRA and full-parameter settings.
  • Searched LoRA hyperparameter combinations, reaching 96.7% of full-parameter tuning performance; the best overall accuracy was 87.2%.
  • Fine-tuned with data-parallel, mixed-precision training using DeepSpeed ZeRO Stage 3, reducing GPU time by 35%.

📄 Other Publications

  • R2RAG-Flood: A Reasoning-Reinforced Training-Free Retrieval Augmentation Generation Framework for Flood Damage Nowcasting.
    Lipai Huang, Kai Yin, Chia-Fu Liu, Ali Mostafavi (2026). (Corresponding author, Under review, Computer-Aided Civil and Infrastructure Engineering)  arXiv
  • CrisiSense-RAG: Crisis Sensing Multimodal Retrieval-Augmented Generation for Rapid Disaster Impact Assessment.
    Yiming Xiao, Kai Yin, Ali Mostafavi (2026). (Under review, Computer-Aided Civil and Infrastructure Engineering)  arXiv
  • FloodSQL-Bench: A Retrieval-Augmented Benchmark for Geospatially-Grounded Text-to-SQL.
    Hanzhou Liu, Kai Yin, Zhitong Chen, Chenyue Liu, Ali Mostafavi (2025).  arXiv
  • Disaster Management in the Era of Agentic AI Systems: A Vision for Collective Human-Machine Intelligence for Augmented Resilience.
    Bo Li, Junwei Ma, Kai Yin, Yiming Xiao, Chia-Wei Hsu, Ali Mostafavi (2025).  arXiv
  • Automated Wildfire Damage Assessment from Multi-View Ground Level Imagery via Vision Language Models.
    Miguel Esparza, Archit Gupta, Ali Mostafavi, Kai Yin, Yiming Xiao (2025).  arXiv
  • Deep Learning-driven Community Resilience Rating based on Intertwined Socio-Technical Systems Features.
    Kai Yin, Bo Li, Ali Mostafavi (2023). npj Urban Sustainability.  arXiv
  • Unsupervised graph deep learning reveals emergent flood risk profile of urban areas.
    Kai Yin, Ali Mostafavi (2023). (Under review, npj Urban Sustainability)  arXiv
  • An integrated resilience assessment model of urban transportation network: A case study of 40 cities in China.
    Kai Yin, Wu, J., Wang, W., Lee, D. H., & Wei, Y. Transportation Research Part A.  Paper

📖 Education

  • 2022.07 – 2027.05 (exp.) Texas A&M University, College Station, TX, USA
    Ph.D. student in Computer Science

🛠 Skills

Python  ·  PyTorch  ·  AutoGen  ·  Hugging Face (Transformers, PEFT)  ·  DeepSpeed  ·  Verl

๐Ÿ† Honors & Awards

  • Passed the CSE Ph.D. Qualifying Exam with a 99th-percentile score.
  • National Second Prize, 13th National Undergraduate Transportation Science and Technology Competition (First Author).