Shayne Longpre

Shayne Longpre is a PhD candidate at MIT focusing on data-centric AI, language models, and their societal impact. His current research explores the curation and augmentation of language training data, for both pretraining and instruction finetuning, as well as the legal challenges associated with popular training corpora. He leads the Data Provenance Initiative (dataprovenance.org) and the Foundation Model Development Cheatsheet (fmcheatsheet.org), which document best practices for developing open models. Previously, he worked on transparency and antitrust considerations in online technologies, and he has studied or worked at Google Brain, Stanford NLP, Apple AI/ML, and Salesforce AI Research.