LLMScholarBench is a benchmark for auditing large language models (LLMs) as scholar recommenders. It provides a framework to collect model outputs, standardize ground-truth data, and analyze recommendations along dimensions such as consistency, factuality, demographics, prestige, and similarity.
## Requirements

Use Python 3.11.11:

```bash
# Example with pyenv
pyenv install 3.11.11
pyenv local 3.11.11
```

## Repository Structure

The repository is organized into three main modules:
### LLMCaller

Code to query LLMs and collect scholar-recommendation responses. It supports multiple providers (e.g. OpenRouter, Gemini), configurable prompts and schemas, and structured storage of outputs for downstream auditing.

- Location: `LLMCaller/`
- Features: Async execution across many models, JSON schema validation, experiment categories (top-k experts, field/epoch/seniority, bias tests, twins), and configurable temperatures (a minimal query sketch follows this list).
- Setup: See `LLMCaller/README.md` for credentials, `config/paths.json`, and installation (e.g. `pip install -r LLMCaller/requirements.txt`).
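For orientation, here is a minimal sketch of the kind of structured query LLMCaller automates. This is not the LLMCaller API: the OpenRouter endpoint usage, the `OPENROUTER_API_KEY` environment variable, the model name, prompt, and schema below are illustrative assumptions; see `LLMCaller/README.md` for the actual configuration and credentials.

```python
# Illustrative sketch only -- not the LLMCaller API. It assumes an
# OpenRouter-style chat-completions endpoint, an OPENROUTER_API_KEY
# environment variable, and a hypothetical top-k experts schema.
import json
import os

import requests
from jsonschema import validate

SCHEMA = {  # hypothetical response schema for a top-k experts experiment
    "type": "object",
    "properties": {
        "scholars": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"name": {"type": "string"}},
                "required": ["name"],
            },
        }
    },
    "required": ["scholars"],
}

def recommend_scholars(model: str, field: str, k: int = 5) -> dict:
    """Query one model once and return a schema-validated JSON answer."""
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": model,
            "temperature": 0.7,  # temperature is configurable in LLMCaller
            "messages": [{
                "role": "user",
                "content": (
                    f"Recommend the top {k} experts in {field}. "
                    'Answer only with JSON like {"scholars": [{"name": "..."}]}.'
                ),
            }],
        },
        timeout=120,
    )
    resp.raise_for_status()
    # Assumes the model answered with raw JSON (no markdown fences).
    answer = json.loads(resp.json()["choices"][0]["message"]["content"])
    validate(instance=answer, schema=SCHEMA)  # reject malformed outputs
    return answer

# Example: recommend_scholars("openai/gpt-4o", "network science")
```

LLMCaller itself runs such calls asynchronously across many models and temperatures and stores the validated outputs for the Auditor; the sketch above shows only a single synchronous call.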
### Auditor

Code to analyze LLM outputs against ground truth. It runs preprocessing pipelines (valid answers, factuality, disciplines, similarities) and provides notebooks for consistency, factuality, demographics, prestige, and similarity analyses.

- Location: `Auditor/`
- Features: Batch preprocessing scripts, factuality checks (author, field, epoch, seniority), demographic and similarity metrics, and visualization notebooks (a toy consistency metric is sketched after this list).
- Setup: Set `PYTHONPATH` to include `Auditor/code/` and use the batch scripts; see `Auditor/README.md` for the full pipeline.
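As a concrete (if simplified) illustration of one audit dimension, the toy below scores consistency as the mean pairwise Jaccard overlap between repeated recommendation lists. It is not the Auditor's implementation or its exact metric definition; it only assumes that each run of a prompt yields a list of scholar names.

```python
# Toy illustration of one kind of consistency check -- not the Auditor's
# code. It assumes recommendations are available as lists of scholar
# names, one list per repeated run of the same prompt.
from itertools import combinations

def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap between two recommendation lists."""
    return len(a & b) / len(a | b) if a | b else 1.0

def consistency(runs: list[list[str]]) -> float:
    """Mean pairwise Jaccard similarity across repeated runs of one prompt."""
    sets = [set(names) for names in runs]
    pairs = list(combinations(sets, 2))
    if not pairs:  # fewer than two runs: trivially consistent
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Example: three runs of the same "top-3 experts" prompt (placeholder names)
runs = [
    ["A. Smith", "B. Jones", "C. Lee"],
    ["A. Smith", "B. Jones", "D. Khan"],
    ["A. Smith", "C. Lee", "B. Jones"],
]
print(f"consistency = {consistency(runs):.2f}")  # -> 0.67 for these toy lists
```

The Auditor's actual consistency, factuality, demographic, prestige, and similarity analyses are run through its batch scripts and notebooks; see `Auditor/README.md`.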
### GTBuilder

Module to standardize ground-truth data so analyses can be run with any compatible GT source. It defines pipelines and schemas to build comparable scholar benchmarks. Currently implemented for American Physical Society (APS) publications (physics domain), with OpenAlex used for author/institution metadata.

- Location: `GTBuilder/`
- Contents:
  - APS: Scripts and notebooks to build author stats, demographics, affiliations, coauthorship networks, disciplines, and publication classes from APS + OpenAlex.
  - OpenAlex API Pipeline: Scripts to fetch publications, authors, and institutions from the OpenAlex API (used by the APS pipeline); a minimal OpenAlex lookup is sketched after this list.
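To give a flavor of the OpenAlex side, the snippet below looks up basic author metadata via the public OpenAlex REST API (https://docs.openalex.org). It is not GTBuilder's pipeline code; the field selection and the polite-pool `mailto` value are illustrative, and the real scripts under `GTBuilder/` handle publications, authors, and institutions at scale.

```python
# Illustrative only -- not GTBuilder's pipeline. It queries the public
# OpenAlex REST API for a few basic author fields.
import requests

def lookup_author(name: str, mailto: str = "you@example.org") -> dict | None:
    """Return a few fields for the top OpenAlex match of an author name."""
    resp = requests.get(
        "https://api.openalex.org/authors",
        params={"search": name, "per-page": 1, "mailto": mailto},  # polite pool
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json()["results"]
    if not results:
        return None
    top = results[0]
    return {
        "openalex_id": top["id"],
        "display_name": top["display_name"],
        "works_count": top["works_count"],
        "cited_by_count": top["cited_by_count"],
    }

# Example: print(lookup_author("Fariba Karimi"))
```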
## Getting Started

- Clone the repository and use Python 3.11.11:

  ```bash
  git clone https://github.com/CSHVienna/LLMScholarBench.git
  cd LLMScholarBench
  # Use Python 3.11.11 (e.g. pyenv local 3.11.11)
  ```

- Install dependencies per module (each has its own `requirements.txt` if needed):

  ```bash
  pip install -r LLMCaller/requirements.txt
  # Install Auditor/GTBuilder deps as required by their scripts/notebooks
  ```

- Configure and run each module according to its README (`LLMCaller/`, `Auditor/`, `GTBuilder/*/`); a small config-inspection sketch follows this list.
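Each module's configuration lives in its own files (for LLMCaller, the `config/paths.json` mentioned above). As a purely illustrative aid, the snippet below pretty-prints a JSON config so you can verify your paths before running anything; the file location used here is an assumption, so check the module READMEs for the real one.

```python
# Hypothetical helper for inspecting a module's JSON configuration before a
# run. The default path is an assumption -- check LLMCaller/README.md for
# where config/paths.json actually lives and which keys it expects.
import json
from pathlib import Path

def show_config(path: str = "LLMCaller/config/paths.json") -> None:
    """Pretty-print a JSON config file to verify paths before running."""
    cfg = json.loads(Path(path).read_text(encoding="utf-8"))
    print(json.dumps(cfg, indent=2, ensure_ascii=False))

if __name__ == "__main__":
    show_config()
```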
## Versions

| Version | Description | Link |
|---|---|---|
| v2.0 | Whose Name Comes Up? Benchmarking and Intervention-Based Auditing of LLM-Based Scholar Recommendation | arXiv:2602.08873 (under review) |
| v1.0 | Whose Name Comes Up? Auditing LLM-Based Scholar Recommendations | arXiv:2506.00074 (under review) |
## Citation

If you use LLMScholarBench in your work, please cite the benchmark and/or the relevant paper(s) below.
BibTeX (v2.0):

```bibtex
@article{espinnoboa2026llmscholarbench,
  author = {Espín-Noboa, Lisette and Méndez, Gonzalo G.},
  title = {Whose Name Comes Up? Benchmarking and Intervention-Based Auditing of LLM-Based Scholar Recommendation},
  journal = {arXiv preprint},
  year = {2026},
  url = {https://arxiv.org/abs/2602.08873},
  note = {v2.0, under review}
}
```

BibTeX (v1.0):
```bibtex
@article{barolo2025llmscholaraudits,
  author = {Barolo, Daniele and Valentin, Chiara and Karimi, Fariba and Galárraga, Luis and Méndez, Gonzalo G. and Espín-Noboa, Lisette},
  title = {Whose Name Comes Up? Auditing LLM-Based Scholar Recommendations},
  journal = {arXiv preprint},
  year = {2025},
  url = {https://arxiv.org/abs/2506.00074},
  note = {v1.0, under review}
}
```

## License

See the repository for license information.