AI Models

We develop frontier AI models, trained on large-scale biological datasets, to understand and model life from the level of molecules to cells and tissues.

ESM Cambrian

A next-generation language model trained on protein sequences at the scale of life on Earth. ESM Cambrian (ESMC) defines a new state of the art for protein representation learning.

ESM3

A generative, multi-modal model that reasons over protein sequence, structure, and function. ESM3 enables programmable generation of proteins.

Data

We generate large-scale biological data that spans model systems and organisms, experimental and observational methods, and diverse cellular states. We make these data openly available to help scientists accelerate discovery.

CELL×GENE

An interactive data explorer for single-cell datasets that uses modern web development techniques to render fast visualizations of at least 1 million cells.

CryoET Data Portal

A cloud-based, open-source portal that aims to drive the development of automated annotation for cryoET datasets and to shorten data processing time from months or years to weeks.

A coordinated global effort for scaling biological data to build a predictive model of life

Biohub’s Virtual Biology Initiative is a shared global effort to generate the data needed to build artificial intelligence models for cellular biology and to unlock new scientific insights. The initiative is the next step in Biohub’s decade-long effort to advance technologies that measure cells across scales and contexts, and to accelerate the scientific understanding of cellular biology in order to cure or prevent disease. That effort includes support for large-scale data generation projects such as the Human Cell Atlas, the Billion Cells Project, and the Tabula Sapiens multi-organ cell atlas, as well as a range of integrated grant programs in imaging and instrumentation, spatial molecular biology, and synthetic biology.