SCENIC: cisTarget databases download

List of databases for the cisTarget family of tools (e.g. RcisTarget, SCENIC/pySCENIC, and cisTopic).

To choose the appropriate database for your analysis, start by selecting the species and the ranking type (i.e. what do you want to analyze: genes or regions?).

Note that downloads are typically larger than 1 GB (about 100 GB for the mammalian region databases), so we recommend downloading the files with zsync_curl (see Help with downloads).


Related files:

  • sha256sum.txt: checksums to confirm whether each file was downloaded successfully
  • TF annotation: annotation of the motifs or ChIP-seq tracks in each collection to transcription factors (30-100 MB)
    • Human TFs (motif collection v8 / v9); (ChIP-seq v1 hg19 / hg38)
    • Mouse TFs (motif collection v8 / v9)
    • Fly TFs (motif collection v8 / v9); (ChIP-seq v1 dm6)
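The checksum check can be scripted. Below is a minimal Python sketch that hashes a downloaded file in chunks and compares the result with its entry in sha256sum.txt. It assumes sha256sum.txt follows the standard `sha256sum` output format (`<hex digest>  <file name>` per line); the function names are illustrative and not part of any cisTarget tool.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks so that
    multi-GB .feather databases never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sha256sum(text):
    """Parse sha256sum-style lines into {file name: digest}.
    Assumption: standard '<hex digest>  <file name>' format."""
    checksums = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        # sha256sum marks binary-mode entries with a leading '*'
        checksums[name.lstrip("*")] = digest.lower()
    return checksums


def verify_download(path, sha256sum_text):
    """True if the file's digest matches its entry in sha256sum.txt."""
    expected = parse_sha256sum(sha256sum_text)
    name = Path(path).name
    if name not in expected:
        raise KeyError(f"{name} is not listed in sha256sum.txt")
    return sha256_of_file(path) == expected[name]
```

Usage: `verify_download("my_db.feather", Path("sha256sum.txt").read_text())`, where `my_db.feather` stands for whichever database you downloaded. A `False` result means the file should be downloaded again (or resumed with zsync_curl).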

Column info:

Species:

  • Human (Homo sapiens)
  • Mouse (Mus musculus)
  • Fly (Drosophila melanogaster)

Ranking type:

  • Region: The ranking contains regions (e.g. for analyses of region sets from ATAC-seq, ChIP-seq, …)
  • Genes: The ranking contains genes.
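Conceptually, a ranking orders every gene (or region) by how strongly one motif or ChIP-seq track scores in it, with one ranking per motif or track. The toy Python sketch below illustrates that idea on made-up scores; the motif and gene names are hypothetical, and the 0-based ranks and tie-breaking by name are choices made here for determinism, not necessarily what the cisTarget databases use.

```python
def rank_features(scores):
    """Turn {feature: motif score} into {feature: rank}, rank 0 = best score.
    Features are genes or regions; ties are broken by feature name
    (an assumption made here so the output is deterministic)."""
    ordered = sorted(scores, key=lambda feature: (-scores[feature], feature))
    return {feature: rank for rank, feature in enumerate(ordered)}


def build_ranking_table(motif_scores):
    """{motif: {feature: score}} -> {motif: {feature: rank}}, one ranking per motif."""
    return {motif: rank_features(scores) for motif, scores in motif_scores.items()}


# Hypothetical example: one motif scored in three genes.
table = build_ranking_table({"motif_X": {"geneA": 2.5, "geneB": 9.0, "geneC": 2.5}})
print(table)  # {'motif_X': {'geneB': 0, 'geneA': 1, 'geneC': 2}}
```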

Distance: For gene rankings only. Indicates the search space around each gene's TSS in which the motif is scored:

  • 500bpUp: 500bp upstream of the TSS
  • TSS+/-10kb: 10kb around the TSS (total: 20kb)
  • TSS+/-5kb: 5kb around the TSS (total: 10kb)
  • 5kbUp,FullTx: 5kb upstream of the TSS plus the transcript introns
  • 500bpUp100Dw: 500bp upstream and 100bp downstream of the TSS
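To make the search-space definitions concrete, this Python sketch returns the genomic interval scored for one gene under each Distance setting, with upstream and downstream following the gene's strand. It assumes half-open, 0-based coordinates with `tss` given as a boundary (the BED start for + strand genes, the BED end for - strand genes), and it treats 5kbUp,FullTx as spanning the whole transcript body, which contains the introns. The exact interval construction used to build the databases may differ.

```python
from typing import Optional, Tuple

# Half-open [start, end) interval in 0-based genome coordinates (assumption).
Interval = Tuple[int, int]

# (upstream bp, downstream bp) for the fixed-size search spaces listed above.
FLANKS = {
    "500bpUp": (500, 0),
    "500bpUp100Dw": (500, 100),
    "TSS+/-5kb": (5_000, 5_000),
    "TSS+/-10kb": (10_000, 10_000),
}


def search_space(kind: str, tss: int, strand: str, tx_end: Optional[int] = None) -> Interval:
    """Interval scored for one gene. `tss` is the BED start (+ strand) or
    BED end (- strand); `tx_end` is the opposite transcript boundary."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if kind == "5kbUp,FullTx":
        if tx_end is None:
            raise ValueError("5kbUp,FullTx needs the transcript end (tx_end)")
        # Assumption: 5kb upstream of the TSS through the end of the transcript.
        if strand == "+":
            return (max(0, tss - 5_000), tx_end)
        return (tx_end, tss + 5_000)
    if kind not in FLANKS:
        raise ValueError(f"unknown search space: {kind}")
    up, down = FLANKS[kind]
    if strand == "+":
        return (max(0, tss - up), tss + down)  # upstream = lower coordinates
    return (max(0, tss - down), tss + up)      # upstream = higher coordinates


print(search_space("500bpUp100Dw", 10_000, "-"))  # (9900, 10500)
```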

Motif or track collection:

  • Motifs – Version 8 (mc8nr): 20003 motifs
  • Motifs – Version 9 (mc9nr): 24453 motifs
  • TF ChIP-seq – Version 1 (tc_v1):
    • dm6: 1503 tracks
    • hg19: 3040 tracks
    • hg38: 2993 tracks

nOrt: Number of orthologous species used to select the regions based on conservation. If in doubt about which version to use, 7 species is normally appropriate for most analyses.

Genome: Genome version used to construct the ranking. For region-based analyses it is important that this version matches your data! The gene annotation version is shown in parentheses.

Database name: The base name of the database; add an extension to obtain a specific file name (e.g. .feather or .feather.zsync).

Download URL: Link to the database (.feather file), with its size.
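Because the Database name column gives a stem rather than a file name, the file names and a zsync_curl command (the download method recommended above) can be derived mechanically. A minimal Python sketch, assuming zsync_curl, like zsync, takes the URL of the .zsync file as its argument; `base_url` and `example_db.mc9nr` are hypothetical placeholders for the download directory and database name shown in the table.

```python
def database_files(db_name):
    """File names for one database: the ranking itself and its zsync control file."""
    return {"feather": f"{db_name}.feather", "zsync": f"{db_name}.feather.zsync"}


def zsync_command(base_url, db_name):
    """Build (but do not run) a zsync_curl command line for one database.

    Assumption: zsync_curl takes the URL of the .zsync file as its argument.
    `base_url` is a placeholder for the directory part of the Download URL.
    """
    zsync_url = f"{base_url.rstrip('/')}/{database_files(db_name)['zsync']}"
    return ["zsync_curl", zsync_url]


if __name__ == "__main__":
    # Hypothetical values: substitute a real database name and its download directory.
    cmd = zsync_command("https://example.org/cistarget/databases", "example_db.mc9nr")
    print(" ".join(cmd))
    # To actually download: import subprocess; subprocess.run(cmd, check=True)
```

Running the command in the directory that already holds a partial download lets zsync reuse the data it already has instead of starting over.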


List of databases:

If reposting, please credit the source: https://www.ouq.net/1460.html