Alyssa Unell

I'm Alyssa, a Computer Science PhD student at Stanford advised by Professor Sanmi Koyejo and Professor Nigam H Shah. My research focuses on trustworthy evaluation of AI for medical applications. I previously worked with Professor Serena Yeung on VLM generalization and biomedical dataset creation, and with Professor Chris Ré on the long-context retrieval capabilities of language models. Additionally, I have worked with Microsoft Research's Real World Evidence Group to evaluate calibration methods for model capabilities in data-sparse settings.

Prior to beginning my PhD at Stanford, I graduated from MIT with a degree in Computation and Cognition. I was extremely fortunate to receive amazing mentorship throughout my undergraduate experience, working with Professor Pawan Sinha, Dr. Kyle Keane, and Dr. Xavier Boix within the MIT Quest for Intelligence.

I also worked with Professor Martin Jaggi and Dr. Annie Hartley in the Machine Learning Optimization Lab, where we explored implementing a federated learning architecture for secure medical information sharing.

I have also had the privilege to work with Professor Polina Golland on projects using generative AI to improve MRI acquisition. Additionally, I worked as a Machine Learning Intern at Intel, where I helped improve their optimization software.

aunell@stanford.edu  /  CV  /  LinkedIn  /  Github  /  Twitter

Research
  1. Holistic Evaluation of Large Language Models for Medical Tasks with MedHELM
    (α-β) Suhana Bedi*, Hejie Cui*, Miguel Fuentes*, Alyssa Unell*,... Percy Liang, Mike Pfeffer, Nigam H Shah
    Nature Medicine, 2025
  2. CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation
    Alyssa Unell ... Matthew Lungren, Hoifung Poon
    ML4H Proceedings 2025. (Presented at NeurIPS 2025 Workshop on GenAI for Health.)
  3. Smarter Sampling for LLM Judges: Reliable Evaluation on a Budget
    Alyssa Unell*, Natalie Dullerud*, Nils Kasper, Nigam Shah, Sanmi Koyejo
    NeurIPS 2025 Workshop LLM-Eval, 2025
  4. TIMER: Temporal Instruction Modeling and Evaluation for Longitudinal Clinical Records
    Hejie Cui*, Alyssa Unell*, Bowen Chen, Jason Alan Fries, Emily Alsentzer, Sanmi Koyejo, Nigam H Shah
    NPJ Digital Medicine 2025. (Presented at ICLR 2025 Workshop on Synthetic Data.)
  5. Real-World Usage Patterns of Large Language Models in Healthcare
    Alyssa Unell*, Mehr Kashyap*, Michael Pfeffer, Nigam H Shah
    MedRxiv, 2025
  6. Why are Visually-Grounded Language Models Bad at Image Classification?
    Yuhui Zhang, Alyssa Unell, Xiaohan Wang, Dhruba Ghosh, Yuchang Su, Ludwig Schmidt, Serena Yeung-Levy
    Conference on Neural Information Processing Systems, 2024
  7. µ-BENCH: VISION-LANGUAGE BENCHMARK FOR MICROSCOPY UNDERSTANDING
    Alejandro Lozano, Jeffrey Nirschl, James Burgess, Sanket Rajan Gupte, Yuhui Zhang, Alyssa Unell, Serena Yeung-Levy
    Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024
  8. Feasibility of Automatically Detecting Practice of Race-Based Medicine by Large Language Models
    Akshay Swaminathan, Sid Salvi, Philip Chung, Alison Callahan, Suhana Bedi, Alyssa Unell, Mehr Kashyap, Roxana Daneshjou, Nigam H Shah, Dev Dash
    AAAI 2024 Spring Symposium on Clinical Foundation Models
  9. From Clear to Noise: Investigating Neural Noise Progression in Visual System Robustness
    Hojin Jang, Alyssa Unell, Suayb Arslan, Walt Dixon, Michael Fux, Matt Groth, Joydeep Munshi & Pawan Sinha
    Vision Sciences Society Poster Session, 2024
  10. Transformation Tolerance of Machine-based Face Recognition Systems
    Ashika Verma, Kyle Keane, Alyssa Unell, Anna Musser & Pawan Sinha
    ICLR Generalization Beyond the Training Distribution in Brains and Machines Workshop, 2021
  11. Influence of Visual Feedback Persistence on Visuo-Motor Skill Improvement
    Alyssa Unell, Zachary M. Eisenstat, Ainsley Braun, Abhinav Gandhi, Sharon Gilad-Gutnick, Shlomit Ben-Ami & Pawan Sinha
    Nature Scientific Reports, 2021
Open-Source Contributions
  1. Distributed Collaborative Learning (DisCo)
    Added security guarantees to the DisCo platform, which allows clients to securely train models in a decentralized fashion.