Data downloads
Introduction
Users can download the full database, or filter and select subsets of the data through different pages on our website. The file structure of the downloads is explained below and in the ReadMe files accompanying the downloaded data.
Full database
The full database can be downloaded here. The folder is split into two parts. The simple version, found under ohnologs/, contains ohnolog pairs for each species. The full database dump is found under database/.
Simple ohnologs list
The simple version of the database can be found under ohnologs/. Each file lists all of a species’ ohnologs in a tab-separated format, e.g., all human ohnologs are contained in homo_sapiens.tsv. Each row contains a query gene, a subject gene, and their relationship.
List of files:
acipenser_ruthenus.tsv
amia_calva.tsv
anolis_carolinensis.tsv
callorhinchus_milii.tsv
canis_lupus_familiaris.tsv
danio_rerio.tsv
gallus_gallus.tsv
gasterosteus_aculeatus.tsv
homo_sapiens.tsv
latimeria_chalumnae.tsv
lepisosteus_oculatus.tsv
leucoraja_erinacea.tsv
meleagris_gallopavo.tsv
monodelphis_domestica.tsv
mus_musculus.tsv
oryzias_latipes.tsv
polypterus_senegalus.tsv
rhincodon_typus.tsv
stegostoma_tigrinum.tsv
taeniopygia_guttata.tsv
takifugu_rubripes.tsv
Relationships key:
r1 - Ohnologs are 1R-only (only 1R ohnologs have been retained in this gene family)
r2 - Ohnologs are 2R-only (only 2R ohnologs have been retained in this gene family)
both - Ohnologs in this gene family have been retained after both 1R and 2R
unk - Ohnologs are either 1R-only (r1) or 2R-only (r2), but it is unclear which
syn - Ohnologs were identified using a micro-synteny analysis
htf - Ohnologs are part of the ‘hard-to-find’ set, described in our paper
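As a rough illustration of working with a per-species file, the Python sketch below tallies how many ohnolog pairs fall under each relationship code. It assumes three tab-separated columns in the order query, subject, relationship. Whether the files carry a header row is not stated here, so the hypothetical has_header flag should be set only after inspecting the file. The identifiers in the sample are made up.

```python
import csv
import io
from collections import Counter

# Relationship codes taken from the key above.
KNOWN_RELATIONS = {"r1", "r2", "both", "unk", "syn", "htf"}


def count_relations(handle, has_header=False):
    """Tally ohnolog pairs per relationship code.

    Assumes three tab-separated columns: query, subject, relationship.
    `has_header` is a hypothetical switch; check the file before setting it.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row
    counts = Counter()
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or malformed lines
        relation = row[2].strip()
        if relation not in KNOWN_RELATIONS:
            raise ValueError(f"unexpected relationship code: {relation!r}")
        counts[relation] += 1
    return counts


# Made-up identifiers for demonstration only, not real database rows.
sample = io.StringIO("geneA\tgeneB\tr1\ngeneC\tgeneD\tboth\ngeneE\tgeneF\tr1\n")
demo_counts = count_relations(sample)
```

For a downloaded file, one would pass an open file handle instead of the in-memory sample, for example open("ohnologs/homo_sapiens.tsv", newline="").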
Whole database dump
This section contains the entire database dump, located at database/. Ohnolog data from all species are aggregated and split into multiple tab-separated tables for easy use with SQL databases. Columns are annotated in the file headers and explained below.
List of files:
sources.tsv
species.tsv
scaffolds.tsv
segments.tsv
families.tsv
genes.tsv
labels.tsv
gene_labels.tsv
gene_ohnology.tsv
trees.tsv
tree_species.tsv
tree_genes.tsv
synteny_blocks.tsv
synteny_tracks.tsv
synteny_groups.tsv
synteny_genes.tsv
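Since the tables are meant for use with SQL databases, one possible way to load them is sketched below with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: it supposes the header row of each file uses names of the form table:column (such as species:speciesId), replaces the colon with an underscore to get plain SQL identifiers, and stores every value as TEXT. The helper name load_tsv and the sample rows are illustrative only, and the species header is abbreviated to three columns for brevity.

```python
import csv
import io
import sqlite3


def load_tsv(conn, table, handle):
    """Create `table` from a TSV handle whose first row is a header.

    Assumes headers look like "species:speciesId"; the colon becomes an
    underscore. All values are stored as TEXT for simplicity.
    Returns the number of rows inserted.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    columns = [name.replace(":", "_") for name in header]
    column_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_sql})')
    placeholders = ", ".join("?" for _ in columns)
    # Keep only rows whose width matches the header.
    rows = [row for row in reader if len(row) == len(columns)]
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)


# Made-up sample content standing in for sources.tsv and species.tsv.
sources_tsv = io.StringIO("source:sourceId\tsource:name\n1\tEnsembl\n")
species_tsv = io.StringIO(
    "source:sourceId\tspecies:speciesId\tspecies:name\n"
    "1\thomo_sapiens\tHuman\n"
)

conn = sqlite3.connect(":memory:")
n_sources = load_tsv(conn, "sources", sources_tsv)
n_species = load_tsv(conn, "species", species_tsv)

# Resolve each species to the name of its genome source.
result = conn.execute(
    "SELECT sp.species_speciesId, so.source_name "
    "FROM species AS sp JOIN sources AS so "
    "ON sp.source_sourceId = so.source_sourceId"
).fetchall()
```

Storing everything as TEXT keeps the loader generic; coordinate columns such as gene:start would need casting to integers before numeric comparisons.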
Sources
File: sources.tsv
Columns:
source:sourceId - Internal database key
source:name - The name of a genome database or publication, e.g., Ensembl
Species
File: species.tsv
Columns:
source:sourceId - Internal database key, refers to a genome source in sources.tsv
species:speciesId - Latin name of the species in snake_case
species:name - Species name
species:version - Genome version, e.g., Ensembl version 100
species:assembly - Whether this genome is a scaffold or chromosome-level assembly
species:outgroup - Whether this species is an outgroup to vertebrates
species:reconstruction - Whether this genome is an ancestral reconstruction
Scaffolds
File: scaffolds.tsv
Columns:
species:speciesId - Latin name of the species in snake_case
scaffold:scaffoldId - Chromosome or scaffold name as per the annotation used
scaffold:start - Chromosome or scaffold start coordinate (first feature as per the annotation used)
scaffold:end - Chromosome or scaffold end coordinate (last feature as per the annotation used)
Segments
File: segments.tsv
Columns:
species:speciesId - Latin name of the species in snake_case
scaffold:scaffoldId - Chromosome or scaffold name as per the annotation used
segment:segmentId - Zero-indexed macro-synteny segment identifier
segment:start - Segment start coordinate on chromosome or scaffold
segment:end - Segment end coordinate on chromosome or scaffold
Families
File: families.tsv
Columns:
family:familyId - Internal database key
Genes
File: genes.tsv
Columns:
species:speciesId - Latin name of the species in snake_case
scaffold:scaffoldId - Chromosome or scaffold name as per the annotation used
segment:segmentId - Zero-indexed macro-synteny segment identifier
family:familyId - Internal database key, refers to a gene family in families.tsv
gene:geneId - Unique gene identifier
gene:proteinId - Unique protein identifier
gene:start - Gene start coordinate on chromosome or scaffold
gene:end - Gene end coordinate on chromosome or scaffold
gene:pvc - Proto-vertebrate chromosome that the gene has been assigned to
gene:pgc - Proto-gnathostome chromosome that the gene has been assigned to
Labels
File: labels.tsv
Columns:
label:labelId - Internal database key
label:name - Label describing the evidence used to determine that a gene is an ohnolog
Gene Labels
File: gene_labels.tsv
Columns:
gene:proteinId - Unique protein identifier
label:labelId - Internal database key, refers to a label in labels.tsv
Gene Ohnology
File: gene_ohnology.tsv
Columns:
gene:queryId - Unique protein identifier of the first ohnolog in this pair
gene:subjectId - Unique protein identifier of the second ohnolog in this pair
ohnology:relation - Relationship between the ohnologs in this pair
Relationships:
r1 - Ohnologs are 1R-only (only 1R ohnologs have been retained in this gene family)
r2 - Ohnologs are 2R-only (only 2R ohnologs have been retained in this gene family)
both - Ohnologs in this gene family have been retained after both 1R and 2R
unk - Ohnologs are either 1R-only (r1) or 2R-only (r2), but it is unclear which
syn - Ohnologs were identified using a micro-synteny analysis
htf - Ohnologs are part of the ‘hard-to-find’ set
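Because gene_ohnology.tsv identifies ohnologs by protein identifier, a common step is to translate each pair to gene identifiers via genes.tsv. The Python sketch below shows one way this might be done. It assumes the header names match the column names documented here (gene:proteinId, gene:geneId, gene:queryId, gene:subjectId, ohnology:relation); the function names and sample rows are hypothetical, and the genes header is abbreviated to two columns.

```python
import csv
import io


def protein_to_gene(genes_handle):
    """Map each protein identifier to its gene identifier (from genes.tsv)."""
    reader = csv.DictReader(genes_handle, delimiter="\t")
    return {row["gene:proteinId"]: row["gene:geneId"] for row in reader}


def gene_level_pairs(ohnology_handle, prot2gene):
    """Yield (query gene, subject gene, relation) for each ohnolog pair.

    Pairs whose proteins are missing from the mapping are skipped.
    """
    reader = csv.DictReader(ohnology_handle, delimiter="\t")
    for row in reader:
        query = prot2gene.get(row["gene:queryId"])
        subject = prot2gene.get(row["gene:subjectId"])
        if query is None or subject is None:
            continue
        yield query, subject, row["ohnology:relation"]


# Made-up identifiers for demonstration only.
genes_tsv = io.StringIO(
    "gene:geneId\tgene:proteinId\n"
    "geneA\tprotA\n"
    "geneB\tprotB\n"
)
ohnology_tsv = io.StringIO(
    "gene:queryId\tgene:subjectId\tohnology:relation\n"
    "protA\tprotB\tboth\n"
    "protA\tprotX\tr1\n"  # protX is not in genes_tsv, so this pair is skipped
)

pairs = list(gene_level_pairs(ohnology_tsv, protein_to_gene(genes_tsv)))
```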
Trees
File: trees.tsv
Columns:
tree:treeId - Internal database key
tree:newick - Newick representation of the gene tree
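To give a sense of how the tree:newick column could be inspected without extra dependencies, the Python sketch below collects leaf labels from a Newick string. It is a deliberately minimal reader that assumes unquoted labels and no bracketed comments; a full phylogenetics library is the safer choice for anything beyond a quick look. The function name and the example tree are illustrative and do not come from the database.

```python
def newick_leaves(newick):
    """Return leaf labels of a Newick string, in order of appearance.

    Minimal sketch: assumes unquoted labels and no [comments].
    A label directly after ")" is an internal node label and is skipped.
    """
    leaves = []
    prev = ""  # last structural character seen: "(", ")", "," or ";"
    i, n = 0, len(newick)
    while i < n:
        ch = newick[i]
        if ch in "(),;":
            prev = ch
            i += 1
        elif ch == ":":
            # Skip the branch length that follows the colon.
            i += 1
            while i < n and newick[i] not in "(),;":
                i += 1
        elif ch.isspace():
            i += 1
        else:
            # Read a label up to the next structural character.
            j = i
            while j < n and newick[j] not in "(),:;":
                j += 1
            label = newick[i:j].strip()
            if label and prev != ")":
                leaves.append(label)
            i = j
    return leaves


# Made-up tree: two leaves under an internal node labelled 90, plus a third leaf.
demo_leaves = newick_leaves("((P1:0.1,P2:0.2)90:0.3,P3:0.4);")
```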
Tree Species
File: tree_species.tsv
Columns:
tree:treeId - Internal database key, refers to a gene tree in trees.tsv
species:speciesId - Latin name of the species in snake_case
Tree Genes
File: tree_genes.tsv
Columns:
tree:treeId - Internal database key, refers to a gene tree in trees.tsv
gene:proteinId - Unique protein identifier
Synteny Blocks
File: synteny_blocks.tsv
Columns:
block:blockId - Internal database key
Synteny Tracks
File: synteny_tracks.tsv
Columns:
block:blockId - Internal database key, refers to a synteny block in synteny_blocks.tsv
species:speciesId - Latin name of the species in snake_case
scaffold:scaffoldId - Chromosome or scaffold name as per the annotation used
track:start - Synteny track start coordinate on chromosome or scaffold
track:end - Synteny track end coordinate on chromosome or scaffold
Synteny Groups
File: synteny_groups.tsv
Columns:
block:blockId - Internal database key, refers to a synteny block in synteny_blocks.tsv
group:groupId - Internal database key
Synteny Genes
File: synteny_genes.tsv
Columns:
block:blockId - Internal database key, refers to a synteny block in synteny_blocks.tsv
species:speciesId - Latin name of the species in snake_case
scaffold:scaffoldId - Chromosome or scaffold name as per the annotation used
group:groupId - Internal database key, refers to a group of homologs in synteny_groups.tsv
gene:proteinId - Unique protein identifier
Database subset
When a subset of genes is selected and downloaded, the file selection.tsv is provided in a zip archive. In this file, each row contains a gene in the current selection, its metadata, and all the other ohnologs from the gene family that it belongs to.
Columns:
gene:geneId - Unique gene identifier of the selected gene
gene:proteinId - Unique protein identifier of the selected gene
species:speciesId - Latin name of the species of the selected gene in snake_case
species:name - Species name of the selected gene
acipenser_ruthenus - All sturgeon ohnologs in this family
amia_calva - All bowfin ohnologs in this family
anolis_carolinensis - All green anole ohnologs in this family
callorhinchus_milii - All elephant shark ohnologs in this family
canis_lupus_familiaris - All dog ohnologs in this family
danio_rerio - All zebrafish ohnologs in this family
gallus_gallus - All chicken ohnologs in this family
gasterosteus_aculeatus - All stickleback ohnologs in this family
homo_sapiens - All human ohnologs in this family
latimeria_chalumnae - All coelacanth ohnologs in this family
lepisosteus_oculatus - All spotted gar ohnologs in this family
leucoraja_erinacea - All little skate ohnologs in this family
meleagris_gallopavo - All turkey ohnologs in this family
monodelphis_domestica - All opossum ohnologs in this family
mus_musculus - All mouse ohnologs in this family
oryzias_latipes - All medaka ohnologs in this family
polypterus_senegalus - All bichir ohnologs in this family
rhincodon_typus - All whale shark ohnologs in this family
stegostoma_tigrinum - All zebra shark ohnologs in this family
taeniopygia_guttata - All zebra finch ohnologs in this family
takifugu_rubripes - All pufferfish ohnologs in this family
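The Python sketch below shows how selection.tsv might be read so that each selected gene is paired with its family's ohnologs per species. Two things are assumptions rather than documented facts: that the header names match the column names listed above, and that multiple identifiers within one species cell are separated by a comma. The separator is exposed as the hypothetical sep parameter so it can be changed after checking a real file. The sample row uses made-up identifiers and only two species columns.

```python
import csv
import io

# Metadata columns describing the selected gene itself.
META_COLUMNS = ("gene:geneId", "gene:proteinId", "species:speciesId", "species:name")


def family_members(handle, sep=","):
    """Yield (selected gene id, {species column: [ohnolog ids]}) per row.

    `sep` is an assumed in-cell separator; verify it against a real file.
    Species columns with empty cells are left out of the mapping.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        members = {}
        for column, cell in row.items():
            if column is None or column in META_COLUMNS or not cell:
                continue
            ids = [part.strip() for part in cell.split(sep) if part.strip()]
            if ids:
                members[column] = ids
        yield row["gene:geneId"], members


# Made-up row for demonstration only.
selection_tsv = io.StringIO(
    "gene:geneId\tgene:proteinId\tspecies:speciesId\tspecies:name\t"
    "homo_sapiens\tmus_musculus\n"
    "geneA\tprotA\thomo_sapiens\tHuman\tprotA,protB\t\n"
)

selection = list(family_members(selection_tsv))
```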