Direct preference optimization: Your language model is secretly a reward model

2023 article rafailov2023 ⚠ Needs review - 1 field differs

Fields with differences: venue (local: NeurIPS; external: arXiv (Cornell University)). Compare the local and external BibTeX entries below.

Authors

Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, Chelsea Finn

Venue

NeurIPS

Links

🔗 Source

BibTeX

Local Entry

@article{rafailov2023,
  title = {Direct preference optimization: Your language model is secretly a reward model},
  author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
  year = {2023},
  journal = {NeurIPS},
  url = {https://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html}
}

From AUTO:OPENALEX

@article{rafailov2023,
  title = {Direct Preference Optimization: Your Language Model is Secretly a Reward Model},
  author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Stefano Ermon and Christopher D. Manning and Chelsea Finn},
  year = {2023},
  journal = {arXiv (Cornell University)},
  doi = {10.48550/arxiv.2305.18290}
}