This document contains a guide for installing and running **cryoDRGN**. In particular, we follow the processing steps for **particle filtering** and **heterogeneous reconstruction** of the **assembling ribosome dataset (EMPIAR-10076)** used in Zhong et al. This is meant as a general guide; submission commands may need to be updated depending on your workstation or cluster setup.

**Quick links:**

cryoDRGN installation with anaconda

cryoDRGN EMPIAR-10076 tutorial

Please send any feedback, issues, or typos to Ellen Zhong (`zhonge@princeton.edu`) or file a GitHub issue.

### Background

CryoDRGN is a neural network-based method for heterogeneous reconstruction. Unlike *discrete* methods such as 3D classification, which produce an ensemble of K density maps, cryoDRGN performs heterogeneous reconstruction by learning a *continuous distribution* of density maps parameterized by a coordinate-based neural network.

Principal component trajectories and graph traversal trajectories of the pre-catalytic spliceosome. SI Video 4 from Zhong et al. 2021.
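To make the idea of a coordinate-based network more concrete, here is a toy sketch in NumPy. It is not cryoDRGN's architecture or API: the weights are random and untrained, the function names (`make_grid`, `init_toy_decoder`, `decode_volume`) are hypothetical, and the network is evaluated directly on real-space voxel coordinates purely for illustration. The only point is that a single function taking a 3D coordinate and a latent vector z as input can represent a whole family of density maps, one per value of z.

```python
import numpy as np


def make_grid(D):
    """Return a (D**3, 3) array of voxel coordinates in [-0.5, 0.5]."""
    axis = np.linspace(-0.5, 0.5, D)
    xx, yy, zz = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([xx, yy, zz], axis=-1).reshape(-1, 3)


def init_toy_decoder(zdim, hidden=32, seed=0):
    """Random weights for a two-layer MLP (hypothetical and untrained)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(size=(3 + zdim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(size=(hidden, 1)),
        "b2": np.zeros(1),
    }


def decode_volume(params, z, D=16):
    """Evaluate the toy decoder at every voxel to get a (D, D, D) map."""
    coords = make_grid(D)  # (D**3, 3)
    # Every voxel sees the same latent vector z.
    z_tiled = np.broadcast_to(z, (coords.shape[0], z.shape[0]))
    x = np.concatenate([coords, z_tiled], axis=1)  # (D**3, 3 + zdim)
    h = np.maximum(x @ params["W1"] + params["b1"], 0.0)  # ReLU layer
    density = h @ params["W2"] + params["b2"]  # (D**3, 1)
    return density.reshape(D, D, D)


params = init_toy_decoder(zdim=8)
vol_a = decode_volume(params, np.zeros(8))
vol_b = decode_volume(params, np.ones(8))
print(vol_a.shape)  # same network, two different maps
```

Changing z changes the output map without changing the weights, which is what allows a continuous range of structures to be represented instead of a fixed set of K classes.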

The inputs to a cryoDRGN training run are **1) extracted particle images**, **2) the CTF parameters** associated with each particle, and **3) poses** for each particle from a 3D refinement. Note that cryoDRGN treats the reconstruction as C1 (asymmetric). For a few thoughts on (pseudo-)symmetric complexes, see this note.
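As a rough mental model of these three inputs, the sketch below builds placeholder arrays and checks that they describe the same number of particles. Everything here is an assumption made for illustration: the array shapes, the choice of CTF columns, and the helper `check_inputs` are hypothetical and do not reflect cryoDRGN's actual file formats.

```python
import numpy as np

n, d = 100, 64  # hypothetical particle count and box size in pixels
rng = np.random.default_rng(0)

# 1) Extracted particle images: one D x D image per particle.
particles = rng.normal(size=(n, d, d)).astype(np.float32)

# 2) Per-particle CTF parameters (illustrative columns only:
#    defocus U, defocus V, astigmatism angle).
ctf = np.tile([15000.0, 15500.0, 30.0], (n, 1))

# 3) Poses from a 3D refinement: a rotation matrix and an
#    in-plane shift for each particle.
rotations = np.tile(np.eye(3), (n, 1, 1))
translations = np.zeros((n, 2))


def check_inputs(particles, ctf, rotations, translations):
    """Check that all inputs are consistent; return the particle count."""
    count = particles.shape[0]
    if not (ctf.shape[0] == rotations.shape[0] == translations.shape[0] == count):
        raise ValueError("inputs describe different numbers of particles")
    if particles.shape[1] != particles.shape[2]:
        raise ValueError("particle images must be square")
    if rotations.shape[1:] != (3, 3):
        raise ValueError("rotations must have shape (N, 3, 3)")
    # Each rotation matrix should be orthonormal: R @ R^T = I.
    identity_check = rotations @ rotations.transpose(0, 2, 1)
    if not np.allclose(identity_check, np.eye(3), atol=1e-5):
        raise ValueError("rotation matrices are not orthonormal")
    return count


n_checked = check_inputs(particles, ctf, rotations, translations)
print(n_checked)
```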

The final result of the software will be **1) latent embeddings** for each particle image in the form of a real-valued vector (usually denoted with z, and output as a `z.pkl` file by the software), and **2) neural network weights** modeling the distribution of density maps (parameterizing the function z → V). Once trained, the software can reconstruct a 3D density map given a value of z.
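A minimal sketch of inspecting the latent embeddings in Python follows. It assumes that the `z.pkl` file holds a pickled NumPy array of shape (N, zdim), with one row per particle. So that the snippet runs on its own, it first writes a synthetic stand-in file to a temporary directory; the path and the array sizes are hypothetical, and with real results you would point `z_path` at the file in your own output directory.

```python
import os
import pickle
import tempfile

import numpy as np

# Synthetic stand-in for a training output (assumed layout:
# one zdim-dimensional latent vector per particle).
n_particles, zdim = 1000, 8
rng = np.random.default_rng(0)
fake_z = rng.normal(size=(n_particles, zdim)).astype(np.float32)

workdir = tempfile.mkdtemp()
z_path = os.path.join(workdir, "z.pkl")  # hypothetical location
with open(z_path, "wb") as f:
    pickle.dump(fake_z, f)

# Load the embeddings back, as you would for a real z.pkl.
with open(z_path, "rb") as f:
    z = pickle.load(f)

print(z.shape)  # (number of particles, latent dimension)
```

Row i of `z` is the latent vector for particle i, so the array can be indexed, filtered, or plotted with ordinary NumPy tools.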

How do you interpret the resulting distribution of structures? Since different datasets have diverse sources of heterogeneity (e.g. discrete vs. continuous), cryoDRGN contains a variety of automated and interactive tools to analyze the reconstructed distribution of structures. The starting point for analysis is the `cryodrgn analyze` pipeline, which generates a sample of 3D density maps and visualizations of the latent space. Specifically, the `cryodrgn analyze` pipeline will produce **1) N density maps** sampled from different regions of the latent space (N=20, by default), **2) continuous trajectories** along the principal component axes of the latent space embeddings, and **3) visualizations of the latent space** with PCA and UMAP.
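The idea behind a trajectory along a principal component axis can be sketched with plain NumPy. This is an illustration of the concept on synthetic embeddings, not the code that `cryodrgn analyze` runs; the number of trajectory points and the 5th to 95th percentile range are assumptions chosen for the example.

```python
import numpy as np

# Synthetic latent embeddings (N x zdim) with more spread in the
# first two dimensions, standing in for a real z array.
rng = np.random.default_rng(0)
scales = np.array([3.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
z = rng.normal(size=(500, 8)) * scales

# PCA via SVD of the mean-centered embeddings.
z_mean = z.mean(axis=0)
U, S, Vt = np.linalg.svd(z - z_mean, full_matrices=False)
pcs = (z - z_mean) @ Vt.T  # coordinates of each particle on each PC
explained = S**2 / np.sum(S**2)  # fraction of variance per PC

# Trajectory along PC1: evenly spaced points between the 5th and
# 95th percentile of the data, mapped back into latent space.
lo, hi = np.percentile(pcs[:, 0], [5, 95])
steps = np.linspace(lo, hi, 10)
traj = z_mean + steps[:, None] * Vt[0]  # (10, zdim) latent values

print(traj.shape, explained[:2])
```

Each row of `traj` is a value of z; passing these values one after another to the trained network yields a series of density maps that moves along the dominant axis of variation in the latent space.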