GISA: A Benchmark for General Information-Seeking Assistant

Yutao Zhu1, Xingshuo Zhang1, Maosen Zhang1, Jiajie Jin1, Liancheng Zhang1, Xiaoshuai Song1, Kangzhi Zhao2, Wencong Zeng2, Ruiming Tang2, Han Li2, Ji-Rong Wen1, Zhicheng Dou1
Affiliations: 1 Gaoling School of Artificial Intelligence, Renmin University of China; 2 Kuaishou Technology
Contact: yutaozhu94@gmail.com

What is GISA?

GISA is a benchmark for General Information-Seeking Assistants, comprising 373 human-crafted queries that reflect real-world information needs. It includes both stable and live subsets, four structured answer formats (item, set, list, table), and complete human search trajectories for every query.

  • Diverse answer formats with deterministic evaluation. GISA uses four structured answer types (item, set, list, table) with strict matching metrics for reproducible evaluation, avoiding subjective LLM judging while preserving task diversity.
  • Unified deep + wide search capabilities. Tasks require both vertical reasoning and horizontal information aggregation across sources, evaluating long-horizon exploration and summarization in one benchmark.
  • Dynamic, anti-static evaluation. Queries are split into stable and live subsets; the live subset is periodically updated to reduce memorization and keep the benchmark challenging over time.
  • Process-level supervision via human trajectories. Full human search trajectories are provided for every query, serving as gold references for process reward modeling and imitation learning while validating task solvability (a schematic record is sketched below).
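
To make the answer formats and trajectories concrete, here is a minimal sketch of what a single GISA record might look like. The field names (query, answer_format, gold_answer, trajectory) and all values are illustrative assumptions, not the released data schema.

# Hypothetical GISA record; field names and values are illustrative
# assumptions, not the official data format.
example_record = {
    "query": "List the three most recent Winter Olympics host cities, newest first.",
    "answer_format": "list",  # one of: item, set, list, table
    "gold_answer": ["Beijing", "Pyeongchang", "Sochi"],
    "trajectory": [  # human search trajectory (assumed step structure)
        {"action": "search", "argument": "recent Winter Olympics host cities"},
        {"action": "browse", "argument": "https://example.org/winter-olympics"},
        {"action": "answer", "argument": ["Beijing", "Pyeongchang", "Sochi"]},
    ],
}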

Citation

@article{GISA,
  title     = {GISA: A Benchmark for General Information Seeking Assistant},
  author    = {Yutao Zhu and
               Xingshuo Zhang and
               Maosen Zhang and
               Jiajie Jin and
               Liancheng Zhang and
               Xiaoshuai Song and
               Kangzhi Zhao and
               Wencong Zeng and
               Ruiming Tang and
               Han Li and
               Ji-Rong Wen and
               Zhicheng Dou},
  journal   = {CoRR},
  volume    = {abs/2602.08543},
  year      = {2026},
  url       = {https://doi.org/10.48550/arXiv.2602.08543},
  doi       = {10.48550/ARXIV.2602.08543},
  eprinttype = {arXiv},
  eprint    = {2602.08543}
}

Leaderboard

Model rankings on the official test split. Click a column to sort. Use search to filter by model name.

Columns: Rank · Model / System · Framework · Date · Overall EM · Item (EM) · Set (EM, F1) · List (EM, F1, Order) · Table (EM, Row-F1, Item-F1)
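
For orientation, below is a minimal Python sketch of how strict matching metrics like those in the columns above could be computed. The normalization and the exact definitions of EM, F1, Order, and Row-F1 are assumptions; the official evaluation script is authoritative.

# Illustrative metric sketch; normalization and metric definitions are
# assumed, not taken from the official GISA evaluation code.
def exact_match(pred: str, gold: str) -> float:
    """EM: 1.0 iff the prediction equals the gold answer after light normalization (assumed)."""
    return float(pred.strip().lower() == gold.strip().lower())

def set_f1(pred: set, gold: set) -> float:
    """F1 between an unordered predicted set and the gold set."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def list_order(pred: list, gold: list) -> float:
    """Order score (assumed): fraction of gold positions whose item the prediction matches."""
    hits = sum(p == g for p, g in zip(pred, gold))
    return hits / len(gold) if gold else 0.0

def table_row_f1(pred_rows: list, gold_rows: list) -> float:
    """Row-F1 (assumed): F1 over whole rows, each row compared as an exact tuple."""
    return set_f1({tuple(r) for r in pred_rows}, {tuple(r) for r in gold_rows})

Because every check reduces to exact comparisons, scores computed this way are deterministic and reproducible without an LLM judge.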

Submit Results

Please follow our submission instructions (link coming soon) and open a pull request on the GitHub repository. We review PRs periodically and merge approved results.