Current Research Interests

I am broadly interested in problems at the intersection of biology and machine learning. My current interests include:

  • Generative models and pretraining for proteins and chemistry
  • Machine learning for protein engineering
  • Uncertainty quantification in neural networks


Protein structure generation via folding diffusion. Kevin E. Wu, Kevin K. Yang, Rianne van den Berg, Sarah Alamdari, James Y. Zou, Alex X. Lu, Ava P. Amini. Nature Communications, 2024. 10.1038/s41467-024-45051-2

Masked inverse folding with sequence transfer for protein representation learning. Kevin K. Yang, Niccolò Zanichelli, Hugh Yeh. Protein Engineering, Design and Selection, 2024. 10.1101/2022.05.25.493516

Convolutions are competitive with transformers for protein sequence pretraining. Kevin K. Yang, Nicolo Fusi, Alex X. Lu. Cell Systems, 2024. 10.1101/2022.05.19.492714

Randomized gates eliminate bias in sort-seq assays. Brian L. Trippe, Buwei Huang, Erika A. DeBenedictis, Brian Coventry, Nicholas Bhattacharya, Kevin K. Yang, David Baker, Lorin Crawford. Protein Science, 2022. biorxiv

Deep self-supervised learning for biosynthetic gene cluster detection and product classification. Carolina Rios-Martinez, Nicholas Bhattacharya, Ava P Amini, Lorin Crawford, Kevin K. Yang. PLoS Computational Biology, 2023. 10.1371/journal.pcbi.1011162

Exploring evolution-aware & -free protein language models as protein function predictors. Mingyang Hu, Fajie Yuan, Kevin K. Yang, Fusong Ju, Jin Su, Hui Wang, Fei Yang, Qiuyang Ding. NeurIPS 2022

Evolutionary velocity with protein language models. Brian L. Hie, Kevin K. Yang, and Peter S. Kim. Cell Systems, 2022. 10.1016/j.cels.2022.01.003

Machine learning modeling of family wide enzyme-substrate specificity screens. Samuel Goldman, Ria Das, Kevin K Yang, Connor W Coley. PLoS Computational Biology, 2022. 10.1371/journal.pcbi.1009853

A topological data analytic approach for discovering biophysical signatures in protein dynamics. Wai Shing Tang, Gabriel Monteiro da Silva, Henry Kirveslahti, Erin Skeens, Bibo Feng, Timothy Sudijono, Kevin K. Yang, Sayan Mukherjee, Brenda Rubenstein, Lorin Crawford. PLoS Computational Biology, 2022. 10.1371/journal.pcbi.1010045

Adaptive machine learning for protein engineering. Brian L. Hie and Kevin K. Yang. Current Opinion in Structural Biology, 2022. 10.1016/

FLIP: Benchmark tasks in fitness landscape inference for proteins. Christian Dallago, Jody Mou, Kadina E. Johnston, Bruce J. Wittmann, Nicholas Bhattacharya, Samuel Goldman, Ali Madani, Kevin K. Yang. NeurIPS 2021 Datasets and Benchmarks Track. 10.1101/2021.11.09.467890

Protein sequence design with deep generative models. Zachary Wu, Kadina E. Johnston, Frances H. Arnold, and Kevin K. Yang. Current Opinion in Chemical Biology, 2021. 10.1016/j.cbpa.2021.04.004

Learned embeddings from deep learning to visualize and predict protein sets. Christian Dallago, Konstantin Schütze, Michael Heinzinger, Tobias Olenyi, Maria Littmann, Amy X. Lu, Kevin K. Yang, Seonwoo Min, Sungroh Yoon, James T. Morton, Burkhard Rost. Current Protocols, May 2021. 10.1002/cpz1.113

Signal Peptides Generated by Attention-Based Neural Networks. Zachary Wu, Kevin K. Yang, Michael J. Liszka, Alycia Lee, Alina Batzilla, David Wernick, David P. Weiner, and Frances H. Arnold. ACS Synthetic Biology, 10 July 2020. 10.1021/acssynbio.0c00219

Machine learning-guided channelrhodopsin engineering enables minimally-invasive optogenetics. Bedbrook CN, Yang KK, Robinson JE, Gradinaru V, Arnold FH. Nature Methods, October 14, 2019. 10.1038/s41592-019-0583-8.

Machine-learning-guided directed evolution for protein engineering. Yang KK, Wu Z, Arnold FH. Nature Methods, July 15, 2019. 10.1038/s41592-019-0496-6.

Batched stochastic Bayesian optimization via combinatorial constraints design. Yang KK, Chen Y, Lee A, Yue Y. AISTATS 2019. arxiv.

The Generation of Thermostable Fungal Laccase Chimeras by SCHEMA-RASPP Structure-Guided Recombination in Vivo. Mateljak I, Rice A, Yang KK, Tron T, Alcalde M. ACS Synthetic Biology, March 21, 2019. 10.1021/acssynbio.8b00509

Learned protein embeddings for machine learning. Yang KK, Wu Z, Bedbrook CN, Arnold FH. Bioinformatics. 23 March 2018. 10.1093/bioinformatics/bty178.

Machine learning to predict eukaryotic expression and plasma membrane localization of engineered integral membrane proteins. Bedbrook CN, Yang KK, Rice AJ, Gradinaru V, Arnold FH. PLOS Computational Biology 13(10): e1005786 (2017). 10.1371/journal.pcbi.1005786.

Structure-Guided SCHEMA Recombination Generates Diverse Chimeric Channelrhodopsins. C. N. Bedbrook, A. J. Rice, K. K. Yang, X. Ding, S. Chen, E. M. LeProust, V. Gradinaru, F. H. Arnold. Proceedings of the National Academy of Sciences 114, E2624-E2633 (2017). 10.1073/pnas.170026911.


Feature Reuse and Scaling: Understanding Transfer Learning with Protein Language Models. Francesca-Zhoufan Li, Ava P. Amini, Yisong Yue, Kevin K. Yang, Alex X. Lu. 10.1101/2024.02.05.578959

Protein generation with evolutionary diffusion: sequence is all you need. Sarah Alamdari, Nitya Thakkar, Rianne van den Berg, Alex Xijie Lu, Nicolo Fusi, Ava Pardis Amini, Kevin K. Yang. 10.1101/2023.09.11.556673

Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks. Sean R Johnson, Xiaozhi Fu, Sandra Viknander, Clara Goldin, Sarah Monaco, Aleksej Zelezniak, Kevin K. Yang. 10.1101/2023.03.04.531015

Benchmarking uncertainty quantification for protein engineering. Kevin P. Greenman, Ava P. Amini, Kevin K. Yang. 10.1101/2023.04.17.536962