LLMScholarBench is a benchmark for auditing large language models (LLMs) as scholar recommenders. It provides a framework to collect model outputs, standardize ground-truth data, and analyze recommendations along dimensions such as consistency, factuality, demographics, prestige, and similarity.
## Requirements

Use Python 3.11.11:

```bash
# Example with pyenv
pyenv install 3.11.11
pyenv local 3.11.11
```

## Repository Structure

The repository is organized into three main modules:
### LLMCaller

Code to query LLMs and collect scholar-recommendation responses. It supports multiple providers (e.g. OpenRouter, Gemini), configurable prompts and schemas, and structured storage of outputs for downstream auditing.

- Location: `LLMCaller/`
- Features: Async execution across many models, JSON schema validation, experiment categories (top-k experts, field/epoch/seniority, bias tests, twins), and configurable temperatures (a minimal query sketch follows this list).
- Setup: See `LLMCaller/README.md` for credentials, `config/paths.json`, and installation (e.g. `pip install -r LLMCaller/requirements.txt`).
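For orientation, here is a minimal sketch of the kind of structured query LLMCaller automates. This is not the LLMCaller API: the OpenRouter endpoint usage, the `OPENROUTER_API_KEY` environment variable, the model name, prompt, and schema below are illustrative assumptions; see `LLMCaller/README.md` for the actual configuration and credentials.

```python
# Illustrative sketch only -- not the LLMCaller API. It assumes an
# OpenRouter-style chat-completions endpoint, an OPENROUTER_API_KEY
# environment variable, and a hypothetical top-k experts schema.
import json
import os

import requests
from jsonschema import validate

SCHEMA = {  # hypothetical response schema for a top-k experts experiment
    "type": "object",
    "properties": {
        "scholars": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"name": {"type": "string"}},
                "required": ["name"],
            },
        }
    },
    "required": ["scholars"],
}

def recommend_scholars(model: str, field: str, k: int = 5) -> dict:
    """Query one model once and return a schema-validated JSON answer."""
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": model,
            "temperature": 0.7,  # temperature is configurable in LLMCaller
            "messages": [{
                "role": "user",
                "content": (
                    f"Recommend the top {k} experts in {field}. "
                    'Answer only with JSON like {"scholars": [{"name": "..."}]}.'
                ),
            }],
        },
        timeout=120,
    )
    resp.raise_for_status()
    # Assumes the model answered with raw JSON (no markdown fences).
    answer = json.loads(resp.json()["choices"][0]["message"]["content"])
    validate(instance=answer, schema=SCHEMA)  # reject malformed outputs
    return answer

# Example: recommend_scholars("openai/gpt-4o", "network science")
```

LLMCaller itself runs such calls asynchronously across many models and temperatures and stores the validated outputs for the Auditor; the sketch above shows only a single synchronous call.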
### Auditor

Code to analyze LLM outputs against ground truth. It runs preprocessing pipelines (valid answers, factuality, disciplines, similarities) and provides notebooks for consistency, factuality, demographics, prestige, and similarity analyses.

- Location: `Auditor/`
- Features: Batch preprocessing scripts, factuality checks (author, field, epoch, seniority), demographic and similarity metrics, and visualization notebooks (a toy consistency metric is sketched after this list).
- Setup: Set `PYTHONPATH` to include `Auditor/code/` and use the batch scripts; see `Auditor/README.md` for the full pipeline.
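As a concrete (if simplified) illustration of one audit dimension, the toy below scores consistency as the mean pairwise Jaccard overlap between repeated recommendation lists. It is not the Auditor's implementation or its exact metric definition; it only assumes that each run of a prompt yields a list of scholar names.

```python
# Toy illustration of one kind of consistency check -- not the Auditor's
# code. It assumes recommendations are available as lists of scholar
# names, one list per repeated run of the same prompt.
from itertools import combinations

def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap between two recommendation lists."""
    return len(a & b) / len(a | b) if a | b else 1.0

def consistency(runs: list[list[str]]) -> float:
    """Mean pairwise Jaccard similarity across repeated runs of one prompt."""
    sets = [set(names) for names in runs]
    pairs = list(combinations(sets, 2))
    if not pairs:  # fewer than two runs: trivially consistent
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Example: three runs of the same "top-3 experts" prompt (placeholder names)
runs = [
    ["A. Smith", "B. Jones", "C. Lee"],
    ["A. Smith", "B. Jones", "D. Khan"],
    ["A. Smith", "C. Lee", "B. Jones"],
]
print(f"consistency = {consistency(runs):.2f}")  # -> 0.67 for these toy lists
```

The Auditor's actual consistency, factuality, demographic, prestige, and similarity analyses are run through its batch scripts and notebooks; see `Auditor/README.md`.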
### GTBuilder

Module to standardize ground-truth data so analyses can be run with any compatible GT source. It defines pipelines and schemas to build comparable scholar benchmarks. Currently implemented for American Physical Society (APS) publications (physics domain), with OpenAlex used for author/institution metadata.

- Location: `GTBuilder/`
- Contents:
  - APS: Scripts and notebooks to build author stats, demographics, affiliations, coauthorship networks, disciplines, and publication classes from APS + OpenAlex.
  - OpenAlex API Pipeline: Scripts to fetch publications, authors, and institutions from the OpenAlex API (used by the APS pipeline); a minimal OpenAlex lookup is sketched after this list.
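To give a flavor of the OpenAlex side, the snippet below looks up basic author metadata via the public OpenAlex REST API (https://docs.openalex.org). It is not GTBuilder's pipeline code; the field selection and the polite-pool `mailto` value are illustrative, and the real scripts under `GTBuilder/` handle publications, authors, and institutions at scale.

```python
# Illustrative only -- not GTBuilder's pipeline. It queries the public
# OpenAlex REST API for a few basic author fields.
import requests

def lookup_author(name: str, mailto: str = "you@example.org") -> dict | None:
    """Return a few fields for the top OpenAlex match of an author name."""
    resp = requests.get(
        "https://api.openalex.org/authors",
        params={"search": name, "per-page": 1, "mailto": mailto},  # polite pool
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json()["results"]
    if not results:
        return None
    top = results[0]
    return {
        "openalex_id": top["id"],
        "display_name": top["display_name"],
        "works_count": top["works_count"],
        "cited_by_count": top["cited_by_count"],
    }

# Example: print(lookup_author("Fariba Karimi"))
```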
## Getting Started

- Clone the repository and use Python 3.11.11:

  ```bash
  git clone https://github.com/CSHVienna/LLMScholarBench.git
  cd LLMScholarBench
  # Use Python 3.11.11 (e.g. pyenv local 3.11.11)
  ```

- Install dependencies per module (each has its own `requirements.txt` if needed):

  ```bash
  pip install -r LLMCaller/requirements.txt
  # Install Auditor/GTBuilder deps as required by their scripts/notebooks
  ```

- Configure and run each module according to its README (`LLMCaller/`, `Auditor/`, `GTBuilder/*/`); a small config-inspection sketch follows this list.
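Each module's configuration lives in its own files (for LLMCaller, the `config/paths.json` mentioned above). As a purely illustrative aid, the snippet below pretty-prints a JSON config so you can verify your paths before running anything; the file location used here is an assumption, so check the module READMEs for the real one.

```python
# Hypothetical helper for inspecting a module's JSON configuration before a
# run. The default path is an assumption -- check LLMCaller/README.md for
# where config/paths.json actually lives and which keys it expects.
import json
from pathlib import Path

def show_config(path: str = "LLMCaller/config/paths.json") -> None:
    """Pretty-print a JSON config file to verify paths before running."""
    cfg = json.loads(Path(path).read_text(encoding="utf-8"))
    print(json.dumps(cfg, indent=2, ensure_ascii=False))

if __name__ == "__main__":
    show_config()
```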
## Versions

| Version | Description | Link |
|---|---|---|
| v2.0 | Whose Name Comes Up? Benchmarking and Intervention-Based Auditing of LLM-Based Scholar Recommendation | arXiv:2602.08873 (under review) |
| v1.0 | Whose Name Comes Up? Auditing LLM-Based Scholar Recommendations | arXiv:2506.00074 (under review) |
## Citation

If you use LLMScholarBench in your work, please cite the benchmark and/or the relevant paper(s) below.
BibTeX (v2.0):

```bibtex
@article{espinnoboa2026llmscholarbench,
  author = {Espín-Noboa, Lisette and Méndez, Gonzalo G.},
  title = {Whose Name Comes Up? Benchmarking and Intervention-Based Auditing of LLM-Based Scholar Recommendation},
  journal = {arXiv preprint},
  year = {2026},
  url = {https://arxiv.org/abs/2602.08873},
  note = {v2.0, under review}
}
```

BibTeX (v1.0):
```bibtex
@article{barolo2025llmscholaraudits,
  author = {Barolo, Daniele and Valentin, Chiara and Karimi, Fariba and Galárraga, Luis and Méndez, Gonzalo G. and Espín-Noboa, Lisette},
  title = {Whose Name Comes Up? Auditing LLM-Based Scholar Recommendations},
  journal = {arXiv preprint},
  year = {2025},
  url = {https://arxiv.org/abs/2506.00074},
  note = {v1.0, under review}
}
```

## License

See the repository for license information.