Research project presented at the IVC/Saddleback Symposium investigating how prompt language influences large language model (LLM) reasoning accuracy, performance, and token cost.
## Overview

This project analyzes how the language used in prompts affects:
- Reasoning accuracy
- Reading comprehension performance
- Token consumption
- API cost efficiency
We evaluated multiple LLMs using SAT Math and SAT Reading questions to measure performance differences across prompt languages.
## Research Question

Does the language of a prompt (e.g., English vs. Chinese) significantly impact:
- Logical reasoning accuracy?
- Reading comprehension performance?
- Token usage?
- Inference cost?
## Data

- Official SAT Math questions
- Official SAT Reading comprehension passages
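The SAT question files from AGIEval (`sat-en.jsonl` and `sat-math.jsonl`, linked at the bottom of this README) are JSON Lines files with one question object per line. A minimal loader sketch; the field names shown in the docstring (`question`, `options`, `label`) follow the AGIEval schema but should be verified against the actual files:

```python
import json

def load_questions(path):
    """Read an AGIEval-style JSONL file: one JSON question object per line.

    Each object typically carries fields like "question", "options", and
    "label" (the correct choice), but check the dataset before relying on them.
    """
    questions = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                questions.append(json.loads(line))
    return questions
```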
## Variables Measured

- Prompt language (English vs. non-English)
- Accuracy (correct vs. incorrect answers)
- Token usage (input + output tokens)
- Estimated API cost
## Experimental Controls

Held constant across all conditions:

- Same question sets
- Same models
- Same temperature and generation parameters
- Same evaluation criteria
## Metrics

- Accuracy percentage
- Average tokens per response
- Cost per 100 questions
- Cost vs. accuracy tradeoff analysis
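As a sketch of how these metrics can be computed from a per-question results log with pandas (the column names and numbers below are illustrative, not the project's actual schema or data):

```python
import pandas as pd

# Illustrative log: one row per (language, question) run — made-up values.
results = pd.DataFrame({
    "language": ["en", "en", "zh", "zh"],
    "correct":  [True, False, True, True],
    "tokens":   [320, 410, 450, 380],
    "cost_usd": [0.0016, 0.0021, 0.0023, 0.0019],
})

# Per-language summary: accuracy %, average tokens, cost per 100 questions.
summary = results.groupby("language").agg(
    accuracy_pct=("correct", lambda s: 100 * s.mean()),
    avg_tokens=("tokens", "mean"),
    cost_per_100=("cost_usd", lambda s: 100 * s.mean()),
)
print(summary)
```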
## Models Evaluated

- OpenAI GPT models
- Google Gemini models
- (Add additional models if applicable)
## Methodology

1. Selected SAT Math and Reading questions.
2. Generated structured prompts in multiple languages.
3. Queried each model using identical parameters.
4. Logged:
   - Model responses
   - Correct/incorrect results
   - Input/output token counts
5. Calculated cost using published API pricing.
6. Compared accuracy and cost tradeoffs across languages.
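The querying, logging, and cost steps above could be sketched as follows with the OpenAI Python SDK (the model name and per-token prices are placeholders, and a Gemini run would need the analogous Google client):

```python
# Placeholder per-token prices in USD; substitute the provider's published pricing.
PRICE_IN, PRICE_OUT = 1e-6, 2e-6

def estimate_cost(input_tokens, output_tokens,
                  price_in=PRICE_IN, price_out=PRICE_OUT):
    """Cost of one API call from its input/output token counts."""
    return input_tokens * price_in + output_tokens * price_out

def query_model(prompt, model="gpt-4o-mini"):
    """Send one prompt with fixed generation parameters; log answer, tokens, cost."""
    from openai import OpenAI  # deferred import: cost math works without the SDK
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # held constant across all languages and models
    )
    usage = resp.usage
    return {
        "answer": resp.choices[0].message.content,
        "input_tokens": usage.prompt_tokens,
        "output_tokens": usage.completion_tokens,
        "cost_usd": estimate_cost(usage.prompt_tokens, usage.completion_tokens),
    }
```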
## Results

(Add your actual results here)

Example format:

- Chinese prompts improved math reasoning accuracy by X%.
- English prompts used fewer tokens on reading tasks.
- Certain models showed higher sensitivity to linguistic structure.
- Cost-efficiency varied depending on prompt language.
## Why It Matters

Prompt language is often overlooked in LLM evaluation. This experiment provides insights into:

- Prompt engineering optimization
- Multilingual LLM behavior
- Cost-efficient AI deployment
- Model sensitivity to linguistic structure

The findings are relevant to AI researchers, engineers, and product builders.
## Tech Stack

- Python
- OpenAI API
- Google Gemini API
- pandas
- Matplotlib
- Jupyter Notebooks
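A minimal Matplotlib sketch of the cost vs. accuracy tradeoff chart listed under Metrics (the per-language numbers here are invented placeholders):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs in scripts/CI
import matplotlib.pyplot as plt

# Illustrative per-language summary (not real results).
langs = ["en", "zh", "es"]
accuracy = [72.0, 75.5, 70.1]      # percent correct
cost_per_100 = [0.18, 0.24, 0.20]  # USD per 100 questions

fig, ax = plt.subplots()
ax.scatter(cost_per_100, accuracy)
for lang, x, y in zip(langs, cost_per_100, accuracy):
    ax.annotate(lang, (x, y))  # label each point with its language
ax.set_xlabel("Cost per 100 questions (USD)")
ax.set_ylabel("Accuracy (%)")
ax.set_title("Cost vs. accuracy by prompt language")
fig.savefig("cost_vs_accuracy.png")
```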
## Repository Structure

```
/data        → SAT question datasets
/experiments → Prompt testing scripts
/analysis    → Evaluation + cost analysis
/results     → Accuracy + token metrics
```
## Author

**Lucas Trinh**  
Computer Science Student, Irvine Valley College  
IVC/Saddleback Symposium Presenter
## License & Data

This project is for academic and research purposes.

SAT question data (`sat-en.jsonl` and `sat-math.jsonl`) comes from the AGIEval benchmark: https://github.com/ruixiangcui/AGIEval