Example Dataset

Drosophila melanogaster

In this example, we run the entire RetroDetector pipeline on a Drosophila melanogaster sample with Illumina paired-end and PacBio HiFi sequencing data. First make sure you have installed RetroDetector and all its depencies according to the Installation and Dependencies section. Then, make a working directory and change into that directory. Please make sure to run all of the following code while you are in your working directory. The first step is to download the necessary reference files from NCBI:

mkdir Downloads
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/215/GCF_000001215.4_Release_6_plus_ISO1_MT/GCF_000001215.4_Release_6_plus_ISO1_MT_genomic.fna.gz -O Downloads/ncbi_dmel.genomic.fna.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/215/GCF_000001215.4_Release_6_plus_ISO1_MT/GCF_000001215.4_Release_6_plus_ISO1_MT_genomic.gtf.gz -O Downloads/ncbi_dmel.genomic.gtf.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/215/GCF_000001215.4_Release_6_plus_ISO1_MT/GCF_000001215.4_Release_6_plus_ISO1_MT_rna_from_genomic.fna.gz -O Downloads/ncbi_dmel.rna_from_genomic.fna.gz

The next step is to download and gzip the sequencing data. Make sure you have sra-tools installed. If you need more assistance please follow the instructions for downloading, installing, and using fasterq-dump.

prefetch SRR10238607 --max-size 21G
fasterq-dump SRR10238607 --outdir Downloads
gzip Downloads/SRR10238607.fastq
prefetch SRR10728584
fasterq-dump SRR10728584 --outdir Downloads
gzip Downloads/SRR10728584_1.fastq
gzip Downloads/SRR10728584_2.fastq

The next step is to set up the configuration file config.yaml which tells RetroDetector where to find the necessary input files. If RetroDetector is installed at /user/software/RetroDetector, you would copy config.yaml to your working directory with the following command:

cp /user/software/RetroDetector/example/config.yaml .

Now, you need to edit config.yaml to add the scripts directory location from your RetroDetector installation. For example, if RetroDetector is installed at /user/software/RetroDetector, you would edit the scripts directory line of config.yaml as follows:

scripts_directory: /user/software/RetroDetector/scripts

Next, copy Snakefile from your RetroDetector installation to your working directory. For example, if RetroDetector is installed at /user/software/RetroDetector and you are in your working directory, you would run:

cp /user/software/RetroDetector/Snakefile .

Now you are ready to run RetroDetector. Activate your snakemake environment, and perform a dry run.

conda activate snakemake
snakemake -n

If everything looks good, specify the number of available cores and run RetroDetector. For example, if your machine has 32 available cores, you would run:

snakemake --cores 32

Once the pipeline finishes running, the following files should be in the RESULTS folder:

RESULTS
├── genome.fa
├── genome.fa.amb
├── genome.fa.ann
├── genome.fa.bwt
├── genome.fa.fai
├── genome.fa.pac
├── genome.fa.sa
└── samples
    ├── A.fastq
    ├── B.fastq
    └── C.fastq