Alyssa Unell

I'm Alyssa, a Computer Science PhD student at Stanford advised by Professor Sanmi Koyejo and Professor Nigam H Shah. My research generally focuses on trustworthy evaluations of AI for medical applications. I previously worked with Professor Serena Yeung on VLM generalization and biomedical dataset creation and with Professor Chris Ré on the long-context retrieval capabilities of language models. Additionally, I have worked with Microsoft Research's Real World Evidence Group to evaluate calibration methods for model capabilities in data-sparse settings.

Prior to beginning my PhD at Stanford, I graduated from MIT with a degree in Computation and Cognition. I was extremely fortunate to receive amazing mentorship throughout my undergraduate experience. I worked with Professor Pawan Sinha, Dr. Kyle Keane, and Dr. Xavier Boix Boisch within the MIT Quest for Intelligence.

I worked with Professor Martin Jaggi and Dr. Annie Hartley in the Machine Learning Optimization Lab, where we explored implementing a federated learning architecture for secure medical information sharing.

I have also had the privilege to work with Professor Polina Golland on projects relating to the use of generative AI for improving MRI acquisitions. Additionally, I have worked as a Machine Learning Intern at Intel, helping to improve their optimization software.

aunell@stanford.edu / CV / LinkedIn / GitHub / Twitter

Research

Holistic Evaluation of Large Language Models for Medical Tasks with MedHELM
(α-β) Suhana Bedi*, Hejie Cui*, Miguel Fuentes*, Alyssa Unell*,... Percy Liang, Mike Pfeffer, Nigam H Shah
Nature Medicine, 2025
CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation
Alyssa Unell ... Matthew Lungren, Hoifung Poon
ML4H Proceedings 2025. (Presented at NeurIPS 2025 Workshop on GenAI for Health.)
Smarter Sampling for LLM Judges: Reliable Evaluation on a Budget
Alyssa Unell*, Natalie Dullerud*, Nils Kasper, Nigam Shah, Sanmi Koyejo
NeurIPS 2025 Workshop LLM-Eval, 2025
TIMER: Temporal Instruction Modeling and Evaluation for Longitudinal Clinical Records
Hejie Cui*, Alyssa Unell*, Bowen Chen, Jason Alan Fries, Emily Alsentzer, Sanmi Koyejo, Nigam H Shah
NPJ Digital Medicine 2025. (Presented at ICLR 2025 Workshop on Synthetic Data.)
Real-World Usage Patterns of Large Language Models in Healthcare
Alyssa Unell*, Mehr Kashyap*, Michael Pfeffer, Nigam H Shah
MedRxiv, 2025
Why are Visually-Grounded Language Models Bad at Image Classification?
Yuhui Zhang, Alyssa Unell, Xiaohan Wang, Dhruba Ghosh, Yuchang Su, Ludwig Schmidt, Serena Yeung-Levy
Conference on Neural Information Processing Systems, 2024
µ-Bench: Vision-Language Benchmark for Microscopy Understanding
Alejandro Lozano, Jeffrey Nirschl, James Burgess, Sanket Rajan Gupte, Yuhui Zhang, Alyssa Unell, Serena Yeung-Levy
Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024
Feasibility of Automatically Detecting Practice of Race-Based Medicine by Large Language Models
Akshay Swaminathan, Sid Salvi, Philip Chung, Alison Callahan, Suhana Bedi, Alyssa Unell, Mehr Kashyap, Roxana Daneshjou, Nigam H Shah, Dev Dash
AAAI 2024 Spring Symposium on Clinical Foundation Models
From Clear to Noise: Investigating Neural Noise Progression in Visual System Robustness
Hojin Jang, Alyssa Unell, Suayb Arslan, Walt Dixon, Michael Fux, Matt Groth, Joydeep Munshi & Pawan Sinha
Vision Sciences Society Poster Session, 2024
Transformation Tolerance of Machine-based Face Recognition Systems
Ashika Verma, Kyle Keane, Alyssa Unell, Anna Musser & Pawan Sinha
ICLR Generalization Beyond the Training Distribution in Brains and Machines Workshop, 2021
Influence of Visual Feedback Persistence on Visuo-Motor Skill Improvement
Alyssa Unell, Zachary M. Eisenstat, Ainsley Braun, Abhinav Gandhi, Sharon Gilad-Gutnick, Shlomit Ben-Ami & Pawan Sinha
Nature Scientific Reports, 2021

Open-Source Contributions

Distributed Collaborative Learning (DisCo)
Added security guarantees to the DisCo platform, which allows clients to securely train models in a decentralized fashion.