https://en.wikipedia.org/wiki/List_of_file_formats#Biology
http://www.nationalarchives.gov.uk/pronom/
https://oclc-research.github.io/infoURI-Frozen/info-uri.info/ListRecords.html
https://curlie.org/Computers/Data_Formats/
| Data type | Type | File patterns | Description |
|---|---|---|---|
| Protein | “File” | ["*.pdb"] | 3D protein structure. |
| Small molecule | “File” | ["*.sdf"] or ["*.sdf", "*.mol2"] | 3D small molecule structures. We generally recommend using .sdf files. |
| Small molecule SMILES | “File” | ["*.smi"] | SMILES (Simplified Molecular Input Line Entry Specification) describes the structure of molecules using short ASCII strings. |
| Peptide sequence (e.g. amino acid chains such as proteins) | “File” | ["*.fasta"] | FASTA is a plain-text format widely used for biological sequence data. |
| Nucleotide sequence (e.g. DNA, RNA) | “File” | ["*.fasta"] | As above. |
| Sequencing raw data | “File” | ["*.fastq"] | FASTQ is an extension of FASTA that stores each biological sequence together with its corresponding quality scores. This data often comes from second-generation sequencing machines from Illumina. |
| Nanopore sequencing raw data | “File” | ["*.fast5"] | The standard sequencing output for Oxford Nanopore sequencers such as the MinION. Based on the HDF5 standard. Unlike .fasta and .fastq, .fast5 is binary. |
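
The patterns in the table are ordinary glob patterns, so a filename can be checked against them with Python's standard `fnmatch` module. The following is a minimal sketch, not a documented API: the dictionary `ACCEPTED_PATTERNS`, the function `matching_data_types`, and the choice to compare extensions case-insensitively are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping copied from the table above; the name and keys are
# illustrative only.
ACCEPTED_PATTERNS = {
    "Protein": ["*.pdb"],
    "Small molecule": ["*.sdf", "*.mol2"],
    "Small molecule SMILES": ["*.smi"],
    "Peptide sequence": ["*.fasta"],
    "Nucleotide sequence": ["*.fasta"],
    "Sequencing raw data": ["*.fastq"],
    "Nanopore sequencing raw data": ["*.fast5"],
}


def matching_data_types(filename):
    """Return every data type whose glob patterns match the filename."""
    # Assumption: extensions are compared case-insensitively, so the
    # filename is lowercased before matching against lowercase patterns.
    name = filename.lower()
    return [
        data_type
        for data_type, patterns in ACCEPTED_PATTERNS.items()
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    for example in ["ligand.sdf", "reads.fasta", "notes.txt"]:
        print(example, matching_data_types(example))
```

Because peptide and nucleotide sequences share the `*.fasta` pattern, the extension alone cannot distinguish them, which is why the sketch returns a list of candidates rather than a single data type.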