Shared References

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

2024 article dpo_paper
Authors
Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
arXiv