DataPrep

Author: Gil Poiares-Oliveira | PI: Margarida Gama-Carvalho

RNA Systems Biology Lab
BioISI - Biosystems and Integrative Sciences Institute, Department of Chemistry and Biochemistry, Faculty of Sciences, University of Lisbon, Campo Grande, 1749-016 Lisboa, Lisbon, Portugal

Introduction

DataPrep automates the first preparation steps of the RNA-Seq pipeline, getting FASTQ files ready for quality filtering by calling established bioinformatics programs.

This is what it does:

  1. Creates a working directory for your new project.
  2. Expands the original TAR.GZ files into your project directory.
  3. Runs FastQC to generate reports on your datasets.
  4. Uses Cutadapt to trim the adapters from your sequences, then generates another FastQC report on the trimmed sequences.

The result should be a new set of trimmed FASTQ sequences ready for quality filtering.
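
The adapter-trimming step can be pictured as building one Cutadapt command line per FASTQ file. The sketch below is a minimal illustration in Python, not DataPrep's actual code: the function name `cutadapt_command` and the `_trimmed` output naming are assumptions made for this example. Only the `-a` (3' adapter sequence) and `-o` (output file) options are taken from Cutadapt's command line.

```python
from pathlib import Path


def cutadapt_command(fastq: Path, out_dir: Path, adapter: str) -> list[str]:
    """Build a Cutadapt command that trims a 3' adapter from one FASTQ file.

    Hypothetical helper: the "<name>_trimmed.<ext>" output naming is an
    assumption for illustration, not necessarily what DataPrep uses.
    """
    trimmed = out_dir / f"{fastq.stem}_trimmed{fastq.suffix}"
    # -a: adapter ligated to the 3' end of the reads; -o: where to write trimmed reads
    return ["cutadapt", "-a", adapter, "-o", str(trimmed), str(fastq)]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. The adapter sequence depends on your library preparation kit, so it is left as a parameter here.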

You can skip any step at your discretion, which lets you repeat a step with different parameters if the results are not adequate.

Running

Assuming you followed the installation steps outlined in README.md, you should be able to run:

dataprep

Runtime parameters

-h, --help
List all runtime parameters.

-i, --input <path>
Specify the location of the raw FASTQ.TAR.GZ files by replacing <path>.
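
For readers curious how the two flags above could be declared, here is a minimal sketch using Python's standard `argparse` module. It is an assumption about the shape of the interface, not DataPrep's real source: the function name `build_parser` and the help strings are invented for this example. `argparse` adds `-h`/`--help` automatically.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Return a parser exposing the -i/--input flag (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="dataprep",
        description="Prepare raw FASTQ files for quality filtering.",
    )
    # When the flag is omitted, input stays None and the interactive
    # folder selection described in the walkthrough would take over.
    parser.add_argument(
        "-i",
        "--input",
        type=Path,
        metavar="<path>",
        default=None,
        help="location of the raw FASTQ TAR.GZ files",
    )
    return parser
```

With this sketch, `build_parser().parse_args(["-i", "/data/run1"]).input` is a `Path` pointing at `/data/run1`, and omitting the flag leaves `input` set to `None`.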

Walkthrough

Step 1: Decompression

The script will list the subfolders of ORIGINAL_FILES_DIR. Use the up/down arrow keys to highlight the folder that contains the FASTQ TAR.GZ files and press Enter to select it. Alternatively, skip this selection by specifying the path to the folder with the -i or --input flag.

The decompressed FASTQ files will be saved under WORKING_DIR, in a folder with the same name as the original.
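
The decompression step can be approximated with Python's standard `tarfile` module. This is a minimal sketch under stated assumptions, not the tool's actual implementation: the function name `extract_archives` and its two parameters (standing in for the selected folder and WORKING_DIR) are hypothetical.

```python
import tarfile
from pathlib import Path


def extract_archives(source_dir: Path, working_dir: Path) -> Path:
    """Extract every *.tar.gz in source_dir into working_dir/<source_dir name>.

    Hypothetical helper: mirrors the behaviour described above, where the
    output folder keeps the name of the original folder.
    """
    target = working_dir / source_dir.name
    target.mkdir(parents=True, exist_ok=True)
    for archive in sorted(source_dir.glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            # Prefer the safer "data" extraction filter where this Python has it.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(target, filter="data")
            else:
                tar.extractall(target)
    return target
```

The function returns the folder that now holds the decompressed FASTQ files, so a later step can pick them up from there.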

Step 2: Quality control on raw data

FastQC will be run on the raw data for quality control. Reports will be placed in WORKING_DIR/PROJECT_NAME/faw_data_fastq/fastqc. You can pull the HTML files from the server and open them in a web browser.
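
As a rough illustration of this step, the sketch below builds and runs a FastQC command from Python. The helper names `fastqc_command` and `run_fastqc` are assumptions made for this example and may not match how DataPrep invokes FastQC. The `--outdir` option is FastQC's own, and FastQC expects that directory to exist already, so the sketch creates it first.

```python
import shutil
import subprocess
from pathlib import Path


def fastqc_command(fastq_files: list[Path], report_dir: Path) -> list[str]:
    """Build the FastQC command line for a batch of FASTQ files."""
    return ["fastqc", "--outdir", str(report_dir), *(str(f) for f in fastq_files)]


def run_fastqc(fastq_files: list[Path], report_dir: Path) -> None:
    """Run FastQC, writing its reports into report_dir (hypothetical helper)."""
    if shutil.which("fastqc") is None:
        raise FileNotFoundError("fastqc was not found on PATH")
    # FastQC does not create the output directory itself.
    report_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(fastqc_command(fastq_files, report_dir), check=True)
```

Keeping the command construction separate from the call to `subprocess.run` makes it possible to inspect or log the exact command before anything is executed.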