Introduction to the NCBI SRA Toolkit

SRA Toolkit

The Sequence Read Archive (SRA) stores raw sequence data from “next-generation” sequencing technologies including 454, IonTorrent, Illumina, SOLiD, Helicos and Complete Genomics. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. Use the SRA Toolkit tools to operate directly on SRA runs.

Availability and Restrictions

The following versions of SRA Toolkit are available on OSC clusters:

Version Owens Pitzer Cardinal Note
2.6.3 X Versions earlier than 2.10 no longer support downloading SRA data** but can still be used to process local data.
2.9.0 X
2.9.1 X
2.9.6 X* X*
2.10.7 X X
2.11.2 X X
3.0.2 X X X*
* Current default version
** NCBI now uses cloud-style object stores. To access SRA cloud data, use version 2.10 or later and provide your AWS or GCP access credentials (recommended) to vdb-config. For more information, see https://github.com/ncbi/sra-tools/wiki/04.-Cloud-Credentials.

You can use module spider sratoolkit to view the modules available on a given machine. Feel free to contact OSC Help if you need other versions for your work.

Access

SRA Toolkit is available to all OSC users. If you have any questions, please contact OSC Help.

Publisher/Vendor/Repository and License Type

National Center for Biotechnology Information, Freeware

Usage

Usage on Pitzer and Owens

Download SRA Data

NCBI now uses cloud-style object stores. To access SRA cloud data, use version 2.10 or later and provide your AWS or GCP access credentials (recommended) to vdb-config. For more information, see https://github.com/ncbi/sra-tools/wiki/04.-Cloud-Credentials.

Set up the credentials (recommended)

Once you have obtained an AWS or GCP credential file, you can set the credentials by following these steps:

module load sratoolkit/2.11.2
vdb-config --report-cloud-identity yes 

# For GCP credentials
vdb-config --set-gcp-credentials /path/to/gcp/credential/file

# For AWS credentials
vdb-config --set-aws-credentials /path/to/aws/credential/file

Each version of the toolkit comes with its own set of configuration options. To modify the defaults, run vdb-config -i to open the interactive configuration. For additional information, see https://github.com/ncbi/sra-tools/wiki/03.-Quick-Toolkit-Configuration.

You can now download SRA data using prefetch:

prefetch SRR390728

By default, prefetch downloads to ~/ncbi in your home directory. For instance, the SRA file SRR390728.sra is saved under ~/ncbi/sra, and the resource files under ~/ncbi/refseq. Use srapath to verify that the SRA accession is accessible in the download path:

$ srapath SRR390728
/users/PAS1234/johndoe/ncbi/sra/SRR390728.sra
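The check above can be scripted. The following is a minimal sketch, assuming that srapath prints a filesystem path when a local copy exists and a URL otherwise (worth verifying on your toolkit version). The function names is_local_path and check_accession are hypothetical helpers, not part of the toolkit.

```shell
#!/bin/bash
# Hypothetical helper: true only if the argument is an absolute path that exists.
is_local_path() {
    case $1 in
        /*) [ -e "$1" ] ;;   # absolute path: succeed only if it exists on disk
        *)  return 1 ;;      # URL or anything else: not a local copy
    esac
}

# Hypothetical wrapper: ask srapath where an accession resolves, then classify it.
# Assumes the sratoolkit module is already loaded so that srapath is on PATH.
check_accession() {
    local loc
    loc=$(srapath "$1") || return 2
    if is_local_path "$loc"; then
        echo "$1 is available locally at $loc"
    else
        echo "$1 is not in the download path (srapath returned: $loc)"
        return 1
    fi
}
```

After loading the module, check_accession SRR390728 reports whether the accession still needs to be fetched with prefetch.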

You can now run other SRA tools, such as fastq-dump, on computing nodes. Here is an example job script:

#!/bin/bash
#SBATCH --job-name=use_fastq_dump
#SBATCH --time=0:10:0
#SBATCH --ntasks-per-node=1

module load sratoolkit/2.11.2
module list
fastq-dump -X 5 -Z SRR390728
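
The script above only prints the first five spots to standard output. As a sketch of a fuller conversion, the variant below writes compressed FASTQ files to a directory, using the fastq-dump options --split-files (one file per read of a pair), --gzip and --outdir. The variables ACCESSION and OUTDIR, the helper build_dump_cmd, the output directory and the one-hour time limit are assumptions for illustration; adjust them to your project.

```shell
#!/bin/bash
#SBATCH --job-name=fastq_dump_to_scratch
#SBATCH --time=1:00:00
#SBATCH --ntasks-per-node=1

# Assumptions: the accession was already downloaded with prefetch, and OUTDIR
# (a hypothetical variable) points at a scratch directory you own.
ACCESSION=${ACCESSION:-SRR390728}
OUTDIR=${OUTDIR:-/fs/scratch/PAS1234/johndoe/fastq}

# Load the toolkit when the module system is present (it is on OSC nodes).
if type module >/dev/null 2>&1; then
    module load sratoolkit/2.11.2
fi

# Hypothetical helper: build the command as an array so paths with spaces survive.
build_dump_cmd() {
    local acc=$1 outdir=$2
    DUMP_CMD=(fastq-dump --split-files --gzip --outdir "$outdir" "$acc")
}

# Run the conversion only if fastq-dump is actually available.
if command -v fastq-dump >/dev/null 2>&1; then
    mkdir -p "$OUTDIR"
    build_dump_cmd "$ACCESSION" "$OUTDIR"
    "${DUMP_CMD[@]}"
fi
```

Keeping the command in an array makes it easy to echo for inspection before submitting the job.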

The home directory file system is not optimized for heavy computation. If the SRA file is particularly large, change the default download path for SRA data to the scratch file system using one of the following two approaches. Both examples use the /fs/scratch/PAS1234/johndoe/ncbi directory.

Change the prefetch directory using vdb-config

module load sratoolkit/2.11.2
vdb-config -s /repository/user/main/public/root=/fs/scratch/PAS1234/johndoe/ncbi
prefetch SRR390728
srapath SRR390728

You should find the SRR390728 accession at /fs/scratch/PAS1234/johndoe/ncbi/sra/SRR390728.sra

Download to the current directory (available for version 2.10 or later)

module load sratoolkit/2.11.2
vdb-config --prefetch-to-cwd
mkdir -p /fs/scratch/PAS1234/johndoe/ncbi
cd /fs/scratch/PAS1234/johndoe/ncbi
prefetch SRR390728
srapath SRR390728

You should find the SRR390728 accession at /fs/scratch/PAS1234/johndoe/ncbi/SRR390728/SRR390728.sra
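
When several runs are needed, the same approach can be looped over a list of accessions. This is a minimal sketch, assuming vdb-config --prefetch-to-cwd has already been run, the toolkit module is loaded, and you are in the scratch directory. The function prefetch_list, the list file and the PREFETCH_CMD override are hypothetical names introduced here.

```shell
#!/bin/bash
# PREFETCH_CMD is a hypothetical override; set it to "echo prefetch" for a dry run.
PREFETCH_CMD=${PREFETCH_CMD:-prefetch}

# Download every accession listed in a text file (one per line) into the
# current directory, skipping runs that are already present.
prefetch_list() {
    local list_file=$1 acc
    while IFS= read -r acc || [ -n "$acc" ]; do
        acc=${acc%%#*}                # drop comments
        acc=${acc//[[:space:]]/}      # drop whitespace
        [ -z "$acc" ] && continue     # skip blank lines
        # With --prefetch-to-cwd, a run is stored as ./ACCESSION/ACCESSION.sra
        if [ -f "$acc/$acc.sra" ]; then
            echo "skip $acc (already downloaded)"
            continue
        fi
        $PREFETCH_CMD "$acc" </dev/null || echo "failed: $acc" >&2
    done < "$list_file"
}
```

For example, with a file accessions.txt containing one accession per line, run cd /fs/scratch/PAS1234/johndoe/ncbi followed by prefetch_list accessions.txt.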

Known Issues

Error when downloading SRA data

NCBI now uses cloud-style object stores. To access SRA cloud data, use version 2.10 or later and provide your AWS or GCP access credentials to vdb-config. For more information, see https://github.com/ncbi/sra-tools/wiki/04.-Cloud-Credentials. You can continue to use older versions to process local SRA data.
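
To decide in a script whether a given module version is new enough to download data, the version strings can be compared. This is a minimal sketch, assuming GNU sort with the -V (version sort) option is available. The function version_ge is a hypothetical helper, and the version numbers are taken from the table of installed modules.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if version $1 is greater than or equal to version $2.
version_ge() {
    # sort -V orders version strings component by component, so 2.9.6 sorts before 2.10
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Classify a few installed versions against the 2.10 download requirement.
for v in 2.6.3 2.9.6 2.10.7 3.0.2; do
    if version_ge "$v" 2.10; then
        echo "$v: can download SRA data"
    else
        echo "$v: local processing only"
    fi
done
```

A plain string comparison would rank 2.9.6 above 2.10.7, which is why the version sort is used here.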

If you repost this article, please credit the source: https://www.ouq.net/3373.html