AI Tools for Protein Engineering
RCSB PDB is a database of experimentally determined 3D structures of proteins. UniProt is a comprehensive, high-quality, and freely accessible resource of protein sequence and functional information, and it is cross-referenced with almost every other database. More recently, both databases have also started providing computationally predicted structures for their entries. In this section, we will learn how to download, pdb
For the rest of the assignment, we will use the complex with PDB ID 7CR5: the complex structure of a human monoclonal antibody with the SARS-CoV-2 nucleocapsid protein NTD.
The structure of a protein can be downloaded as a PDB file, and its sequence information can be stored as a FASTA file. In this section, we will learn how information about a protein is organized in these file formats.
Download the PDB file for PDB ID 7CR5 from https://www.rcsb.org/. 7CR5 is a protein complex with multiple chains. Answer the following questions.
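To see how chains are recorded in the PDB format, the following minimal sketch counts the distinct residues per chain from the ATOM records of a PDB file. It relies on the fixed-column layout of the PDB format (chain ID in column 22, residue number in columns 23-26, insertion code in column 27). The function names are hypothetical, and the download URL pattern on files.rcsb.org is an assumption that you should verify against the RCSB website.

```python
import urllib.request

# Assumed URL pattern for PDB-format downloads from RCSB.
RCSB_DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_download_url(pdb_id):
    """Build the (assumed) download URL for a PDB ID such as 7CR5."""
    return RCSB_DOWNLOAD_URL.format(pdb_id=pdb_id.strip().upper())


def download_pdb_text(pdb_id):
    """Fetch a PDB file as text. Requires network access."""
    with urllib.request.urlopen(pdb_download_url(pdb_id)) as response:
        return response.read().decode("utf-8")


def residues_per_chain(pdb_text):
    """Count distinct residues per chain using only ATOM records."""
    chains = {}
    for line in pdb_text.splitlines():
        # ATOM records hold protein atoms; skip headers, HETATM, etc.
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        chain_id = line[21]                     # column 22
        residue_key = (line[22:26], line[26])   # columns 23-26 plus insertion code
        chains.setdefault(chain_id, set()).add(residue_key)
    return {chain: len(residues) for chain, residues in chains.items()}


# Example usage (commented out because it needs network access):
# print(residues_per_chain(download_pdb_text("7CR5")))
```

Note that ATOM records only cover residues that were resolved in the structure, so these counts can be smaller than the sequence lengths in the FASTA file.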
Download the FASTA file for PDB ID 7CR5. Open the FASTA file as a regular text file with your text editor.
>7CR5_1|Chain A|Nucleoprotein|Severe acute respiratory syndrome coronavirus 2 (2697049)
RPQGLPNNTASWFTALTQHGKEDLKFPRGQGVPINTNSSPDDQIGYYRRATRRIRGGDGKMKDLSPRWYFYYLGTGPEAGLPYGANKDGIIWVATEGALNTPKDHIGTRNPANNAAIVLQLPQGTTLPKGFYAE
>7CR5_2|Chain B[auth H]|monoclonal antibody chain H|Homo sapiens (9606)
QVQLVESGGGVVQPGRSLRLSCAASGFTFSSYIMHWVRQAPGKGLEWVAVISYDGSNEAYADSVKGRFTISRDNSKNTLYLQMSSLRAEDTGVYYCARETGDYSSSWYDSWGRGTLVTVSSASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKRVEPKSCDK
>7CR5_3|Chain C[auth L]|monoclonal antibody chain L|Homo sapiens (9606)
QLVLTQSPSASASLGASVKLTCTLSSGHSNYAIAWHQQQPEKGPRYLMKVNSDGSHTKGDGIPDRFSGSSSGAERYLTISSLQSEDEADYYCQTWGTGIQVFGGGTKLTVLGQPKAAPSVTLFPPSSEELQANKATLVCLISDFYPGAVTVAWKADSSPVKAGVETTTPSKQSNNKYAASSYLSLTPEQWKSHRSYSCQVTHEGSTVEKTVAPTECS
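The records above can also be read programmatically. The sketch below assumes the standard FASTA layout, in which each header line starts with ">" and the sequence follows on one or more lines, and it assumes that the header fields are separated by "|" in the order entity ID, chain(s), molecule name, organism. The function names and the file name in the usage comment are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def describe_header(header):
    """Split a '|'-separated header into named fields (assumed field order)."""
    names = ["entity", "chains", "molecule", "organism"]
    fields = [field.strip() for field in header.split("|")]
    # Pad with empty strings if the header has fewer fields than expected.
    fields += [""] * (len(names) - len(fields))
    return dict(zip(names, fields))


# Example usage (the file name is a placeholder for your downloaded file):
# with open("7CR5.fasta") as handle:
#     for header, sequence in parse_fasta(handle.read()):
#         info = describe_header(header)
#         print(info["chains"], info["molecule"], len(sequence))
```

Printing the sequence length next to each chain makes it easy to compare the FASTA records with the chains you see in the PDB file.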
Go to UniProt (https://www.uniprot.org/) and enter the UniProt ID of the nucleoprotein you found in the previous question.
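The same lookup can be done from a script. The sketch below assumes that UniProt serves FASTA text at rest.uniprot.org/uniprotkb/<accession>.fasta; check the UniProt help pages if the request fails. The function names are hypothetical, and the accession in the usage comment is only a placeholder for the ID you found. UniProt positions are 1-based, so the helper converts them to Python's 0-based indexing.

```python
import urllib.request

# Assumed URL pattern for FASTA retrieval from the UniProt REST service.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def uniprot_fasta_url(accession):
    """Build the (assumed) FASTA URL for a UniProt accession."""
    return UNIPROT_FASTA_URL.format(accession=accession.strip().upper())


def fetch_uniprot_sequence(accession):
    """Fetch the sequence for an accession. Requires network access."""
    with urllib.request.urlopen(uniprot_fasta_url(accession)) as response:
        text = response.read().decode("utf-8")
    # Drop the header line(s) and join the wrapped sequence lines.
    return "".join(
        line.strip() for line in text.splitlines() if not line.startswith(">")
    )


def residues_at(sequence, positions):
    """Map 1-based positions to the one-letter residue at each position."""
    result = {}
    for position in positions:
        if not 1 <= position <= len(sequence):
            raise IndexError(f"position {position} is outside the sequence")
        result[position] = sequence[position - 1]
    return result


# Example usage (replace the placeholder with your UniProt ID):
# sequence = fetch_uniprot_sequence("YOUR_UNIPROT_ID")
# print(len(sequence), residues_at(sequence, [1, 2, 3]))
```

Keep in mind that residue numbering in a PDB file does not always match the numbering of the full-length UniProt sequence.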
Amino acids 92, 107, and 149
Once you have downloaded a PDB file from RCSB, AlphaFold, or a similar source, it is useful to inspect the protein visually, for example to examine binding sites and side-chain networks. Several molecular visualization tools are available for this. In the recitation, we will cover PyMOL. You may choose any one of the following tools for this assignment.