slopefields/IVC-Research
Prompt Language Effects on LLM Reasoning

Research project presented at the IVC/Saddleback Symposium investigating how prompt language influences large language model (LLM) reasoning accuracy, performance, and token cost.


Overview

This project analyzes how the language used in prompts affects:

  • Reasoning accuracy
  • Reading comprehension performance
  • Token consumption
  • API cost efficiency

We evaluated multiple LLMs using SAT Math and SAT Reading questions to measure performance differences across prompt languages.


Research Question

Does the language of a prompt (e.g., English vs. Chinese) significantly impact:

  • Logical reasoning accuracy?
  • Reading comprehension performance?
  • Token usage?
  • Inference cost?

Experimental Design

Dataset

  • Official SAT Math questions
  • Official SAT Reading comprehension passages

Independent Variable

  • Prompt language (English vs. non-English)

Dependent Variables

  • Accuracy (correct vs. incorrect answers)
  • Token usage (input + output tokens)
  • Estimated API cost

Controlled Variables

  • Same question sets
  • Same models
  • Same temperature and generation parameters
  • Same evaluation criteria
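Holding generation parameters fixed across models and languages can be done with one shared settings dictionary passed to every API call. A minimal sketch — the values below are illustrative, not the study's actual settings:

```python
# Shared generation settings applied identically to every model,
# language, and question (illustrative values, not the study's).
GEN_PARAMS = {
    "temperature": 0.0,  # deterministic decoding for reproducibility
    "max_tokens": 512,   # cap response length
    "top_p": 1.0,        # disable nucleus-sampling truncation
}
```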

Metrics

  • Accuracy percentage
  • Average tokens per response
  • Cost per 100 questions
  • Cost vs. accuracy tradeoff analysis
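The metrics above reduce to a few arithmetic helpers. In this sketch, the per-1K-token prices are parameters rather than hard-coded, since published API pricing varies by model:

```python
def accuracy_pct(results):
    """Share of correct answers, as a percentage (results: booleans)."""
    return 100.0 * sum(results) / len(results)

def avg_tokens(token_counts):
    """Mean total tokens (input + output) per response."""
    return sum(token_counts) / len(token_counts)

def cost_per_100(prompt_tokens, completion_tokens,
                 in_price_per_1k, out_price_per_1k, n_questions):
    """Estimated API cost, scaled to a standard batch of 100 questions."""
    cost = (prompt_tokens / 1000) * in_price_per_1k \
         + (completion_tokens / 1000) * out_price_per_1k
    return cost * 100 / n_questions
```

For example, 50,000 input and 20,000 output tokens over 100 questions at $0.005 / $0.015 per 1K tokens comes to $0.55 per 100 questions.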

Models Tested

  • OpenAI GPT models
  • Google Gemini models
  • (Add additional models if applicable)

Methodology

  1. Selected SAT Math and Reading questions.
  2. Generated structured prompts in multiple languages.
  3. Queried each model using identical parameters.
  4. Logged:
    • Model responses
    • Correct/incorrect results
    • Input/output token counts
  5. Calculated cost using published API pricing.
  6. Compared accuracy and cost tradeoffs across languages.
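The query-and-log loop (steps 3–4) can be sketched as below. Here `ask` stands in for the actual API call (OpenAI or Gemini, invoked with the fixed generation parameters) and returns the answer plus token counts; the question fields are illustrative, not the repository's actual schema:

```python
def run_trial(ask, questions):
    """Query a model on each question and log the fields used in analysis.

    `ask(prompt)` returns (answer, prompt_tokens, completion_tokens);
    in the real experiment it wraps an API call with fixed parameters.
    Each question is a dict with hypothetical keys: id, prompt, gold.
    """
    log = []
    for q in questions:
        answer, p_tok, c_tok = ask(q["prompt"])
        log.append({
            "id": q["id"],
            # Grade by comparing the normalized answer letter to the key
            "correct": answer.strip().upper() == q["gold"],
            "prompt_tokens": p_tok,
            "completion_tokens": c_tok,
        })
    return log
```

Injecting `ask` as a function keeps the loop identical across providers and languages, so only the prompt text (the independent variable) changes between runs.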

Key Findings

(Add your actual results here)

Example format:

  • Chinese prompts improved math reasoning accuracy by X%.
  • English prompts used fewer tokens on reading tasks.
  • Certain models showed higher sensitivity to linguistic structure.
  • Cost-efficiency varied depending on prompt language.

Why This Matters

Prompt language is often overlooked in LLM evaluation.

This experiment provides insights into:

  • Prompt engineering optimization
  • Multilingual LLM behavior
  • Cost-efficient AI deployment
  • Model sensitivity to linguistic structure

The findings are relevant to AI researchers, engineers, and product builders.


Tech Stack

  • Python
  • OpenAI API
  • Google Gemini API
  • Pandas
  • Matplotlib
  • Jupyter Notebooks

Repository Structure

/data → SAT question datasets
/experiments → Prompt testing scripts
/analysis → Evaluation + cost analysis
/results → Accuracy + token metrics


Author

Lucas Trinh
Computer Science Student, Irvine Valley College
IVC/Saddleback Symposium Presenter


License

This project is for academic and research purposes.

Data (sat-en.jsonl and sat-math.jsonl) from: https://github.com/ruixiangcui/AGIEval
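The AGIEval files store one JSON object per line, so they can be loaded with a short helper (exact field names should be checked against the files themselves):

```python
import json

def load_jsonl(path):
    """Load a JSON-Lines file (e.g. sat-en.jsonl / sat-math.jsonl):
    one JSON object per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```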
