This document contains a guide for installing and running cryoDRGN 🐉❄️. In particular, we follow the processing steps for particle filtering and heterogeneous reconstruction of the assembling ribosome dataset (EMPIAR-10076) used in Zhong et al. This is meant as a general guide; submission commands may need to be updated depending on your workstation or cluster setup.

Quick links:

cryoDRGN installation with anaconda

cryoDRGN EMPIAR-10076 tutorial

Please send any feedback, issues, or typos to Ellen Zhong (zhonge@princeton.edu) or file a GitHub issue.


Background

CryoDRGN is a neural network-based method for heterogeneous reconstruction. Unlike discrete methods such as 3D classification, which produce an ensemble of K density maps, cryoDRGN performs heterogeneous reconstruction by learning a continuous distribution of density maps parameterized by a coordinate-based neural network.

Principal component trajectories and graph traversal trajectories of the pre-catalytic spliceosome. SI Video 4 from Zhong et al. 2021


The inputs to a cryoDRGN training run are 1) extracted particle images, 2) the CTF parameters associated with each particle, and 3) poses for each particle from a 3D refinement. Note that cryoDRGN treats the reconstruction as C1 (asymmetric). For a few thoughts on (pseudo-)symmetric complexes, see this note.
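Because all three inputs are indexed per particle, a mismatch in their lengths is an easy setup error to make. The sketch below is only an illustration of that bookkeeping: `TrainingInputs` and `check_consistent` are hypothetical names that are not part of cryoDRGN, and any sequence type can stand in for the real image stack, CTF table, and pose arrays.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TrainingInputs:
    """Hypothetical container for the three per-particle inputs."""
    images: Sequence  # one extracted particle image per entry
    ctf: Sequence     # one set of CTF parameters per particle
    poses: Sequence   # one pose per particle, taken from a 3D refinement


def check_consistent(inputs: TrainingInputs) -> int:
    """Return the particle count, or raise if the inputs disagree."""
    counts = {
        "images": len(inputs.images),
        "ctf": len(inputs.ctf),
        "poses": len(inputs.poses),
    }
    if len(set(counts.values())) != 1:
        raise ValueError(f"per-particle inputs differ in length: {counts}")
    return counts["images"]
```

Running a check like this before submitting a long training job can catch, for example, a particle stack that was filtered without filtering the matching poses.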

A cryoDRGN training run produces 1) a latent embedding for each particle image in the form of a real-valued vector (usually denoted z, and output as a z.pkl file by the software), and 2) neural network weights modeling the distribution of density maps (parameterizing the function z → V). Once trained, the network can reconstruct a 3D density map given a value of z.
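As a rough illustration of the first output, the sketch below loads a latent-embedding file and reports its dimensions. It assumes the `.pkl` file is a standard Python pickle holding one row of z per particle; the helper name `load_latents` is hypothetical, and the exact file name and location written by a given run may differ from the placeholder in the usage note.

```python
import pickle


def load_latents(path):
    """Load per-particle latent embeddings from a pickle file.

    Assumption: the file unpickles to a 2D array-like with one row per
    particle and one column per latent dimension.
    """
    with open(path, "rb") as f:
        z = pickle.load(f)
    n_particles, z_dim = len(z), len(z[0])
    return z, n_particles, z_dim
```

A call such as `z, n, d = load_latents("z.pkl")` would then give the embeddings together with the particle count and latent dimension. If the file stores a NumPy array, NumPy must be installed for the unpickling to succeed, and as with any pickle, only files from a trusted source should be loaded.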

How do you interpret the resulting distribution of structures? Since different datasets have diverse sources of heterogeneity (e.g. discrete vs. continuous), cryoDRGN contains a variety of automated and interactive tools to analyze the reconstructed distribution of structures. The starting point for analysis is the cryodrgn analyze pipeline, which generates a sample of 3D density maps and visualizations of the latent space. Specifically, the cryodrgn analyze pipeline will produce 1) N density maps sampled from different regions of the latent space (N=20, by default), 2) continuous trajectories along the principal component axes of the latent space embeddings, and 3) visualizations of the latent space with PCA and UMAP.
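To make the idea of a principal component trajectory more concrete, here is a minimal, dependency-free sketch that picks evenly spaced latent vectors along the first principal axis of a set of embeddings. It is not the code used by cryodrgn analyze: the function names, the 5th/95th percentile endpoints, the default of 10 points, and the use of power iteration are all assumptions made for illustration. In practice a library PCA implementation would be the more usual choice.

```python
import math
import random


def first_principal_axis(centered, n_iter=100, seed=0):
    """Estimate the leading principal axis of mean-centered rows by power iteration."""
    n, d = len(centered), len(centered[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(n_iter):
        # Apply the unnormalized covariance matrix: w = X^T (X v).
        proj = [sum(row[j] * v[j] for j in range(d)) for row in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("latent embeddings have no variance")
        v = [x / norm for x in w]
    return v


def pc_trajectory(z, n_points=10, lo=0.05, hi=0.95):
    """Return n_points (>= 2) latent vectors evenly spaced along the first PC."""
    n, d = len(z), len(z[0])
    mean = [sum(row[j] for row in z) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in z]
    axis = first_principal_axis(centered)
    # Project every particle onto the axis; use two percentiles as endpoints.
    scores = sorted(sum(row[j] * axis[j] for j in range(d)) for row in centered)
    start = scores[round(lo * (n - 1))]
    end = scores[round(hi * (n - 1))]
    steps = [start + (end - start) * k / (n_points - 1) for k in range(n_points)]
    # Map each position along the axis back into the latent space.
    return [[mean[j] + s * axis[j] for j in range(d)] for s in steps]
```

Each vector returned by `pc_trajectory` is a value of z that the trained network could turn into a density map, so the list as a whole corresponds to one continuous trajectory of structures. The sign of a principal axis is arbitrary, so the trajectory may run in either direction.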