Introduction to NCBI SRA Toolkit

SRA Toolkit

The Sequence Read Archive (SRA) stores raw sequence data from “next-generation” sequencing technologies, including 454, IonTorrent, Illumina, SOLiD, Helicos, and Complete Genomics. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. Use the SRA Toolkit tools to operate directly on SRA runs.

Availability and Restrictions

The following versions of SRA Toolkit are available on OSC clusters:

Version Owens Pitzer Cardinal Note
2.6.3 X These versions no longer support downloading SRA data** but can still be used to process local data.
2.9.0 X
2.9.1 X
2.9.6 X* X*
2.10.7 X X
2.11.2 X X
3.0.2 X X X*
* Current default version
** NCBI now uses cloud-style object stores. To access SRA cloud data, use version 2.10 or later and provide your AWS or GCP access credentials (recommended) to vdb-config. For more information, see https://github.com/ncbi/sra-tools/wiki/04.-Cloud-Credentials.

You can use module spider sratoolkit to view the modules available on a given machine. Contact OSC Help if you need other versions for your work.

Access

SRA Toolkit is available to all OSC users. If you have any questions, please contact OSC Help.

Publisher/Vendor/Repository and License Type

National Center for Biotechnology Information, Freeware

Usage

Usage on Pitzer and Owens

Download SRA Data

NCBI now uses cloud-style object stores. To access SRA cloud data, use version 2.10 or later and provide your AWS or GCP access credentials (recommended) to vdb-config. For more information, see https://github.com/ncbi/sra-tools/wiki/04.-Cloud-Credentials.

Set up the credentials (recommended)

Once you have obtained an AWS or GCP credential file, you can set the credentials by following these steps:

module load sratoolkit/2.11.2
vdb-config --report-cloud-identity yes

# For GCP credentials
vdb-config --set-gcp-credentials /path/to/gcp/credential/file

# For AWS credentials
vdb-config --set-aws-credentials /path/to/aws/credential/file

Each version of the toolkit comes with its own set of configuration options. To modify the defaults, run vdb-config -i to open the interactive configuration. For more information, see https://github.com/ncbi/sra-tools/wiki/03.-Quick-Toolkit-Configuration.

You can now download SRA data using prefetch:

prefetch SRR390728

The default download path is ~/ncbi in your home directory. For example, the SRA file SRR390728.sra is stored in ~/ncbi/sra, and the resource files are stored in ~/ncbi/refseq. Use srapath to verify that the SRA accession is accessible in the download path:

$ srapath SRR390728
/users/PAS1234/johndoe/ncbi/sra/sra/SRR390728.sra
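
To fetch several runs and confirm that each one resolves, the two commands above can be wrapped in a loop. The following is a minimal sketch, not an official OSC procedure: the helper name is_run_accession, the ACCESSIONS list, and the accepted pattern (SRR, ERR, or DRR followed by digits) are assumptions made for illustration, and the download step is skipped when prefetch is not on your PATH.

```shell
# Hypothetical helper: accept only run accessions such as SRR390728
# (an SRR/ERR/DRR prefix followed by digits only). The pattern is an assumption.
is_run_accession() {
    case "$1" in
        [SED]RR*[!0-9]*) return 1 ;;  # a non-digit appears after the prefix
        [SED]RR[0-9]*)   return 0 ;;  # prefix followed by digits only
        *)               return 1 ;;
    esac
}

# Example list (space-separated); replace with your own accessions.
ACCESSIONS="SRR390728"

for acc in $ACCESSIONS; do
    if ! is_run_accession "$acc"; then
        echo "Skipping malformed accession: $acc" >&2
        continue
    fi
    if ! command -v prefetch >/dev/null 2>&1; then
        echo "prefetch not found; load sratoolkit first (would fetch $acc)"
        continue
    fi
    # Download the run, then print the path the toolkit resolves it to.
    prefetch "$acc" && srapath "$acc"
done
```

Validating the accession first avoids sending an obviously mistyped identifier to NCBI.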

You can now run other SRA tools, such as fastq-dump, on compute nodes. Here is an example job script:

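#!/bin/bash
#SBATCH --job-name=use_fastq_dump
#SBATCH --time=00:10:00
#SBATCH --ntasks-per-node=1

# Load the toolkit and record the loaded modules in the job log
module load sratoolkit/2.11.2
module list

# Print the first 5 spots (-X 5) of the run as FASTQ to standard output (-Z)
fastq-dump -X 5 -Z SRR390728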
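
The example above streams only the first five spots to standard output. For a full conversion you would more likely write FASTQ files to a directory. The sketch below assumes a paired-end run and uses a hypothetical scratch output directory (OUTDIR). The run helper and the DRY_RUN switch are also illustrative: by default the command is only printed, and setting DRY_RUN=0 inside a job executes it.

```shell
# Sketch: convert a run to gzip-compressed FASTQ files.
# ACCESSION and OUTDIR are illustrative values, not OSC defaults.
ACCESSION="${ACCESSION:-SRR390728}"
OUTDIR="${OUTDIR:-/fs/scratch/PAS1234/johndoe/fastq}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "[dry-run] $*"
    else
        "$@"
    fi
}

# --split-files writes each read of a pair to its own file,
# --gzip compresses the output, --outdir sets the destination directory.
run fastq-dump --split-files --gzip --outdir "$OUTDIR" "$ACCESSION"
```

Printing the command first makes it easy to check the output location before spending job time on a large run.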

The home directory file system is not optimized for heavy computation. If the SRA file is particularly large, you can change the default download path for SRA data to the scratch file system using either of the following two approaches. Both use the /fs/scratch/PAS1234/johndoe/ncbi directory as an example.

Change the prefetch directory using vdb-config

module load sratoolkit/2.11.2
vdb-config -s /repository/user/main/public/root=/fs/scratch/PAS1234/johndoe/ncbi
prefetch SRR390728
srapath SRR390728

You should find the SRR390728 accession at /fs/scratch/PAS1234/johndoe/ncbi/sra/SRR390728.sra
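
One way to confirm that the change took effect is to compare the path reported by srapath with the one you expect. The sketch below assumes the root/sra/accession.sra layout shown above; expected_sra_path is a hypothetical helper, not part of the toolkit, and the comparison is skipped when srapath is not on your PATH.

```shell
# Hypothetical helper: build the path where a run is expected to land when
# the repository root is ROOT (layout assumed: ROOT/sra/ACCESSION.sra).
expected_sra_path() {
    # ${1%/} drops one trailing slash so the result has no "//".
    printf '%s/sra/%s.sra\n' "${1%/}" "$2"
}

ROOT=/fs/scratch/PAS1234/johndoe/ncbi
ACC=SRR390728
want=$(expected_sra_path "$ROOT" "$ACC")
echo "Expected location: $want"

# Compare with what the toolkit reports, if it is loaded.
if command -v srapath >/dev/null 2>&1; then
    got=$(srapath "$ACC")
    if [ "$got" = "$want" ]; then
        echo "OK: $ACC resolves to the scratch directory"
    else
        echo "Note: srapath reports $got" >&2
    fi
fi
```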

Download to the current directory (available for version 2.10 or later)

module load sratoolkit/2.11.2
vdb-config --prefetch-to-cwd
mkdir -p /fs/scratch/PAS1234/johndoe/ncbi
cd /fs/scratch/PAS1234/johndoe/ncbi
prefetch SRR390728
srapath SRR390728

You should find the SRR390728 accession at /fs/scratch/PAS1234/johndoe/ncbi/SRR390728/SRR390728.sra
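
When you download many runs this way, it helps to know which ones are still missing. The sketch below assumes the accession/accession.sra layout shown above; missing_runs and DOWNLOAD_DIR are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: print each accession whose ACC/ACC.sra file is
# missing under DIR (layout assumed from the example above).
missing_runs() {
    dir=$1
    shift
    for acc in "$@"; do
        [ -f "$dir/$acc/$acc.sra" ] || echo "$acc"
    done
}

# Example: report which runs still need to be fetched into the scratch folder.
DOWNLOAD_DIR="${DOWNLOAD_DIR:-/fs/scratch/PAS1234/johndoe/ncbi}"
missing_runs "$DOWNLOAD_DIR" SRR390728
```

Any accession it prints can then be passed to prefetch from within that directory.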

Known Issues

Error when downloading SRA data

NCBI now uses cloud-style object stores. To access SRA cloud data, use version 2.10 or later and provide your AWS or GCP access credentials to vdb-config. For more information, see https://github.com/ncbi/sra-tools/wiki/04.-Cloud-Credentials. You can still use older versions to process local SRA data.
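
The version rule can be checked mechanically before a download is attempted. The following sketch encodes only the 2.10 threshold stated above; supports_download is a hypothetical helper, and the version strings are passed in by hand rather than read from the toolkit.

```shell
# Hypothetical helper: succeed when VERSION (e.g. 2.11.2) is 2.10 or later,
# the minimum stated above for downloading from the cloud object stores.
supports_download() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # second component
    [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 10 ]; }
}

for v in 2.6.3 2.9.6 2.10.7 3.0.2; do
    if supports_download "$v"; then
        echo "$v: can download and process local data"
    else
        echo "$v: local data only"
    fi
done
```

The comparison is numeric per component, so 2.10.7 is correctly treated as newer than 2.9.6, which a plain string comparison would get wrong.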

 

Original content from this site. If you repost it, please credit the source: https://www.ouq.net/3373.html
