Research Topics

  • Biological Language Modeling

  • We treat biological sequences (DNA, RNA, and amino acid sequences) as a special kind of language, and study biomolecular structure and function through sequence analysis with large language models and other deep learning methods.

  • Biomedical Image Understanding

  • We design machine learning algorithms for the annotation, clustering, and segmentation of biomedical images, including the recognition of complex molecular localization patterns in biological microscopy images and the identification of biomarkers in medical imaging.

  • Representation Learning of Molecules and Materials

  • We extract features from compound molecules and chemical reactions, propose novel descriptors for elements and materials, and use them to predict chemical reaction outcomes, perform retrosynthesis, and design new materials with enhanced properties.

  • Machine Learning/Deep Learning Model Research

  • We study uncertainty in machine learning models, including the modeling, measurement, and mitigation of epistemic and aleatoric uncertainty.

Research Grants

  • “In situ intelligent perception techniques for deep-sea organism investigation”, National Key Research and Development Program of China (2023YFC2811502), PI, 2023-2026
  • “RNA Molecular Representation Learning Based on Self-Supervised Learning”, National Natural Science Foundation of China (62272300), PI, 2023-2026
  • “Prediction of ncRNA subcellular localization based on multi-modal machine learning”, National Natural Science Foundation of China (61972251), PI, 2020-2023
  • “Study on the approaches for reliably screening microRNA biomarkers”, Shanghai Municipal Natural Science Foundation (16ZR1448700), PI, 2016-2019
  • “A Study of the transcript and post-transcript co-regulatory network in plant defense response”, Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, PI, 2015-2016