Data Leverage References

Direct preference optimization: Your language model is secretly a reward model

2023 | article | rafailov2023 | Not yet verified
Authors
Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D Manning, Chelsea Finn
Venue
arXiv preprint arXiv:2305.18290

BibTeX

Local Entry
@article{rafailov2023,
  title = {Direct preference optimization: Your language model is secretly a reward model},
  author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Stefano Ermon and Christopher D Manning and Chelsea Finn},
  year = {2023},
  journal = {arXiv preprint arXiv:2305.18290}
}
From AUTO:OPENALEX
@article{rafailov2023,
  title = {Direct Preference Optimization: Your Language Model is Secretly a Reward Model},
  author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Stefano Ermon and Christopher D. Manning and Chelsea Finn},
  year = {2023},
  journal = {arXiv (Cornell University)},
  doi = {10.48550/arxiv.2305.18290}
}