To choose the appropriate database for your analysis, start by selecting the species and ranking type (i.e. what do you want to analyze: genes or regions?).
Note that the download size is typically over 1GB (100GB for mammal region databases). We recommend downloading the files with zsync_curl (see Help with downloads).
- sha256sum.txt: To confirm whether the file was successfully downloaded
- TF annotation: Annotation linking the motifs or ChIP-seq tracks in each collection to transcription factors (30-100 Mb)
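The checksum check can be scripted. The following is a minimal sketch, assuming that sha256sum.txt uses the usual `sha256sum` layout (one `<hex digest>  <file name>` pair per line); the function names `sha256_of_file`, `parse_checksums` and `verify_download` are hypothetical helpers, not part of any cisTarget tool.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksums(text):
    """Parse '<digest>  <file name>' lines (assumed sha256sum layout) into a dict."""
    checksums = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        expected, name = parts
        # sha256sum marks binary mode with a leading '*' before the file name
        checksums[name.strip().lstrip("*")] = expected.lower()
    return checksums


def verify_download(path, checksum_file="sha256sum.txt"):
    """Return True if the file's digest matches the entry listed for its name."""
    checksums = parse_checksums(Path(checksum_file).read_text())
    name = Path(path).name
    if name not in checksums:
        raise KeyError(f"{name} is not listed in {checksum_file}")
    return sha256_of_file(path) == checksums[name]
```

Reading in chunks matters here because the databases can be tens of gigabytes; a result of `False` suggests the download was incomplete or corrupted and should be repeated.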
Region: The ranking contains regions (i.e. for analyses of region-sets from ATAC-seq, ChIP-seq, …)
Genes: The ranking contains genes.
Distance: For gene rankings only. Indicates the search space around the TSS of each gene in which the motif is scored:
500bpUp: 500bp upstream of TSS
TSS+/-10kb: 10kb around the TSS (total: 20kb)
TSS+/-5kb: 5kb around the TSS (total: 10kb)
5kbUp,FullTx: 5kb upstream of TSS, plus the transcript introns
500bpUp100Dw: 500bp upstream of TSS, and 100bp downstream.
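To make the search-space labels concrete, here is a small illustrative sketch that turns a label into genomic coordinates around a TSS. It assumes that "upstream" is interpreted relative to the gene's strand (the usual convention, but not stated above), and the names `SEARCH_SPACES` and `search_window` are hypothetical. It is not how the databases themselves were built. The `5kbUp,FullTx` option is left out because it also needs the transcript structure.

```python
# (upstream bp, downstream bp) for each label, taken from the list above
SEARCH_SPACES = {
    "500bpUp": (500, 0),
    "TSS+/-10kb": (10_000, 10_000),
    "TSS+/-5kb": (5_000, 5_000),
    "500bpUp100Dw": (500, 100),
}


def search_window(tss, strand, search_space):
    """Return (start, end) of the search space around a TSS position.

    Assumption: upstream means lower coordinates on the '+' strand and
    higher coordinates on the '-' strand. The start is clamped at 0.
    """
    upstream, downstream = SEARCH_SPACES[search_space]
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    return max(0, start), end


if __name__ == "__main__":
    # A gene on the minus strand with its TSS at position 10,000
    print(search_window(10_000, "-", "500bpUp100Dw"))  # (9900, 10500)
```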
Motif or track collection:
- Motifs – Version 8 (mc8nr): 20003 motifs
- Motifs – Version 9 (mc9nr): 24453 motifs
- TF ChIP-seq – Version 1 (
dm6: 1503 tracks
hg19: 3040 tracks
hg38: 2993 tracks
nOrt: Number of orthologous species used to select the regions based on conservation. If in doubt about which version to use, 7 species is normally appropriate for most analyses.
Genome: Genome version used to construct the ranking. For region-based analyses it is important that this version matches your data! The gene annotation version is shown in parentheses.
Database name: Database name (add the extensions to obtain specific file names, e.g.
Download URL: Link to the database (.feather file, and its size).
List of databases: