Protein Design Assignment

For the rest of the assignment, we will use the complex with PDB ID 7CR5: complex structure of a human monoclonal antibody with the SARS-CoV-2 nucleocapsid protein NTD (N-terminal domain).

Section 1 - Databases, File Formats, Visualization

RCSB PDB is a database of experimentally determined 3D structures of proteins and other biological macromolecules. UniProt is a comprehensive, high-quality, and freely accessible resource of protein sequence and functional information, and it is linked to almost every other major biological database. More recently, both databases have also started providing computationally predicted structures for many of their entries. In this section, we will learn how to download protein structure and sequence files, read their formats, and visualize protein structures.
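Both databases can also be accessed from code, which is convenient in a notebook. The sketch below only builds download URLs and fetches them with Python's standard library. It assumes the public endpoints `https://files.rcsb.org/download/<ID>.pdb`, `https://www.rcsb.org/fasta/entry/<ID>` and `https://rest.uniprot.org/uniprotkb/<accession>.<format>`, which may change over time; the helper names `rcsb_urls`, `uniprot_url` and `download` are hypothetical.

```python
import urllib.request
from pathlib import Path

# Assumed public endpoints; check the RCSB and UniProt documentation if a download fails.
RCSB_FILES = "https://files.rcsb.org/download"
RCSB_FASTA = "https://www.rcsb.org/fasta/entry"
UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"


def rcsb_urls(pdb_id: str) -> dict[str, str]:
    """Build download URLs for an RCSB PDB entry (PDB format, mmCIF and FASTA)."""
    pdb_id = pdb_id.upper()
    return {
        "pdb": f"{RCSB_FILES}/{pdb_id}.pdb",
        "cif": f"{RCSB_FILES}/{pdb_id}.cif",
        "fasta": f"{RCSB_FASTA}/{pdb_id}",
    }


def uniprot_url(accession: str, fmt: str = "fasta") -> str:
    """Build a UniProtKB REST URL; fmt is a format suffix such as 'fasta' or 'json'."""
    return f"{UNIPROT_REST}/{accession.upper()}.{fmt}"


def download(url: str, dest: str) -> Path:
    """Fetch url and save the raw response bytes to dest (needs network access)."""
    path = Path(dest)
    with urllib.request.urlopen(url, timeout=30) as response:
        path.write_bytes(response.read())
    return path
```

For example, `download(rcsb_urls("7CR5")["pdb"], "7CR5.pdb")` and `download(uniprot_url("P0DTC9"), "P0DTC9.fasta")` would save the structure and the nucleoprotein sequence. Downloading the files through the websites works just as well for this assignment.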

1.1 Databases & File Formats

The structure of a protein can be downloaded as a PDB file, and its sequence can be stored as a FASTA file. In this section, we will learn how information about a protein is organized in these two file formats.
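To make the two formats concrete, here is a minimal Python sketch of reading them. A FASTA record is a `>` header line followed by one or more sequence lines; RCSB headers separate fields with `|` (entity, chains, molecule name, organism). PDB `ATOM`/`HETATM` records use fixed column positions, so fields are read by slicing rather than by splitting on whitespace. The function and class names are hypothetical, and only a subset of the PDB columns is read.

```python
from dataclasses import dataclass


def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without '>') to its sequence, joining wrapped lines."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


@dataclass
class Atom:
    record: str    # "ATOM" or "HETATM"
    serial: int    # atom serial number
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "ARG"
    chain: str     # chain identifier, e.g. "A"
    res_seq: int   # residue sequence number
    x: float
    y: float
    z: float
    b_factor: float
    element: str


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record; comments give the 1-based PDB columns."""
    return Atom(
        record=line[0:6].strip(),     # cols 1-6
        serial=int(line[6:11]),       # cols 7-11
        name=line[12:16].strip(),     # cols 13-16
        res_name=line[17:20].strip(), # cols 18-20
        chain=line[21],               # col 22
        res_seq=int(line[22:26]),     # cols 23-26
        x=float(line[30:38]),         # cols 31-38
        y=float(line[38:46]),         # cols 39-46
        z=float(line[46:54]),         # cols 47-54
        b_factor=float(line[60:66]),  # cols 61-66
        element=line[76:78].strip(),  # cols 77-78 (may be blank)
    )


def read_atoms(path: str) -> list[Atom]:
    """Return all ATOM/HETATM records from a PDB-format file."""
    with open(path) as handle:
        return [parse_atom_line(line) for line in handle
                if line.startswith(("ATOM", "HETATM"))]
```

With the 7CR5 files downloaded, `parse_fasta(open("7CR5.fasta").read())` gives one entry per entity, `header.split("|")` separates the chain and molecule names, and `{a.chain for a in read_atoms("7CR5.pdb")}` lists the chain identifiers used in the structure file.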

  1. Download the PDB file for PDB ID 7CR5 from https://www.rcsb.org/ (7CR5 is a protein complex with multiple chains) and answer the following questions:

    1. How many unique protein chains are in this PDB entry? (3)
    2. What experimental method was used to determine this structure? (X-ray diffraction)
    3. What is the resolution of this PDB structure? Why is the resolution of a structure important for inferring the function of a protein? See https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/resolution (2.08 Å. Resolution is a measure of the level of detail present in the diffraction pattern, and therefore of the level of detail visible in the calculated electron density map. In high-resolution structures (around 1 Å) individual atoms are clearly visible, whereas low-resolution structures (3 Å or worse) show only the basic contours of the protein chain, and the atomic structure must be inferred. Because function depends on atomic details such as side-chain positions at binding sites, a higher-resolution structure supports more reliable conclusions about function.)
    4. What is the UniProt ID of the SARS-CoV-2 nucleoprotein? (P0DTC9)
    5. [OPTIONAL] Open the PDB file in a regular text editor and familiarize yourself with the various columns of the PDB format. Learn more here: https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/dealing-with-coordinates
  2. Download the FASTA file for PDB ID 7CR5 and open it as a plain text file in your text editor.

    1. Using the FASTA file, can you list the sequence of each protein chain along with its name?

    >7CR5_1|Chain A|Nucleoprotein|Severe acute respiratory syndrome coronavirus 2 (2697049)
    RPQGLPNNTASWFTALTQHGKEDLKFPRGQGVPINTNSSPDDQIGYYRRATRRIRGGDGKMKDLSPRWYFYYLGTGPEAGLPYGANKDGIIWVATEGALNTPKDHIGTRNPANNAAIVLQLPQGTTLPKGFYAE

    >7CR5_2|Chain B[auth H]|monoclonal antibody chain H|Homo sapiens (9606)
    QVQLVESGGGVVQPGRSLRLSCAASGFTFSSYIMHWVRQAPGKGLEWVAVISYDGSNEAYADSVKGRFTISRDNSKNTLYLQMSSLRAEDTGVYYCARETGDYSSSWYDSWGRGTLVTVSSASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKRVEPKSCDK

    >7CR5_3|Chain C[auth L]|monoclonal antibody chain L|Homo sapiens (9606)
    QLVLTQSPSASASLGASVKLTCTLSSGHSNYAIAWHQQQPEKGPRYLMKVNSDGSHTKGDGIPDRFSGSSSGAERYLTISSLQSEDEADYYCQTWGTGIQVFGGGTKLTVLGQPKAAPSVTLFPPSSEELQANKATLVCLISDFYPGAVTVAWKADSSPVKAGVETTTPSKQSNNKYAASSYLSLTPEQWKSHRSYSCQVTHEGSTVEKTVAPTECS

  3. Go to UniProt (https://www.uniprot.org/) and search for the UniProt ID of the nucleoprotein that you found earlier.

    1. Using the information available on UniProt, can you identify amino acid sites that may be involved in RNA binding?

    Amino acids 92, 107, and 149 (numbered along the full-length UniProt sequence)
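The same lookup can be scripted against UniProt's REST API. A minimal sketch, assuming the UniProtKB JSON layout in which an entry has a `features` list whose items carry a `type` (such as `"Binding site"`), a `location` with `start`/`end` `value` fields, and, for binding sites, a `ligand` with a `name`. These field names are assumptions about the current JSON output, so inspect the downloaded entry if the result comes back empty; `fetch_uniprot_entry` and `binding_sites` are hypothetical helpers.

```python
import json
import urllib.request


def fetch_uniprot_entry(accession: str) -> dict:
    """Download a UniProtKB entry as JSON (needs network access; endpoint assumed)."""
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.json"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def binding_sites(entry: dict, ligand_name: str = "RNA") -> list[int]:
    """Return sorted residue positions of 'Binding site' features for the given ligand."""
    positions: set[int] = set()
    for feature in entry.get("features", []):
        if feature.get("type") != "Binding site":
            continue
        if feature.get("ligand", {}).get("name") != ligand_name:
            continue
        start = feature["location"]["start"]["value"]
        end = feature["location"]["end"]["value"]
        positions.update(range(start, end + 1))  # a site may span several residues
    return sorted(positions)
```

With network access, `binding_sites(fetch_uniprot_entry("P0DTC9"))` should list the annotated RNA-binding positions. These positions follow the full-length UniProt sequence, so they do not count from the first residue of the chain A sequence in the 7CR5 FASTA file.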

1.2 3D Molecule Visualization software

Once you have downloaded a PDB file from RCSB PDB, the AlphaFold database, or another source, it is valuable to interact with the protein visually and inspect features such as binding sites and side-chain networks. Several molecular visualization tools are available for this. In the recitation we will cover PyMOL, but you can choose any one of the following tools for this assignment.