Add Question 189: Compute Direct Preference Optimization Loss by zhenhuan-yang · Pull Request #583 · Open-Deep-ML/DML-OpenProblem

zhenhuan-yang · 2026-02-06T07:35:40Z

Summary

This PR adds a new medium-difficulty Deep Learning question on computing Direct Preference Optimization (DPO) loss for language model alignment.

Question Details

ID: 189
Title: Compute Direct Preference Optimization Loss
Difficulty: Medium
Category: Deep Learning

Implementation

✅ Complete solution with proper numerical stability using np.log1p
✅ Comprehensive educational content covering DPO theory and Bradley-Terry model
✅ Mathematical formulation with LaTeX
✅ 4 diverse test cases with varying parameters
✅ Example with detailed reasoning
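
For reviewers who want a quick reference, here is a minimal sketch of the kind of computation the question asks for. It is not the solution file from this PR: the function name, argument names, and default `beta` are illustrative assumptions, and it uses `math.log1p` from the standard library in place of `np.log1p` so the sketch has no dependencies. It assumes each input is a list of summed sequence log-probabilities.

```python
import math


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Mean DPO loss over a batch of preference pairs (illustrative sketch)."""
    if not policy_chosen_logps:
        raise ValueError("need at least one preference pair")

    losses = []
    for pc, pr, rc, rr in zip(policy_chosen_logps, policy_rejected_logps,
                              ref_chosen_logps, ref_rejected_logps):
        # Scaled difference of policy-vs-reference log-ratios
        # between the chosen and the rejected response.
        logits = beta * ((pc - rc) - (pr - rr))
        # -log(sigmoid(x)) = log(1 + exp(-x)), evaluated stably as
        # max(-x, 0) + log1p(exp(-|x|)) so exp never overflows.
        losses.append(max(-logits, 0.0) + math.log1p(math.exp(-abs(logits))))
    return sum(losses) / len(losses)
```

When the policy matches the reference on both responses, the logit is zero and the loss reduces to `log(2)`, which makes a convenient sanity check.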

Validation

✅ Build successful
✅ Schema validation passed
✅ All test cases pass

Educational Value

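Covers DPO, a widely used technique for LLM alignment that is typically simpler to implement and more stable to train than traditional RLHF because it needs no separate reward model or reinforcement learning loop, which makes it directly relevant for current ML practitioners.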

compute dpo loss

8d7ccc0

zhenhuan-yang wants to merge 1 commit into Open-Deep-ML:main from zhenhuan-yang:zhy-dpo
