Author: Gil Poiares-Oliveira | PI: Margarida Gama-Carvalho
RNA Systems Biology Lab
BioISI - Biosystems and Integrative Sciences Institute, Department of Chemistry
and Biochemistry, Faculty of Sciences, University of Lisbon, Campo Grande,
1749-016 Lisboa, Lisbon, Portugal
DataPrep automates the first preparation steps of the RNA-Seq pipeline by calling established bioinformatics programs, getting FASTQ files ready for quality filtering.
This is what it does:
- Creates a working directory for your new project
- Expands the original TAR.GZ files into your project directory
- Runs FastQC to generate reports on your datasets
- Uses Cutadapt to trim the adapters from your sequences, then generates another FastQC report on the trimmed sequences
The result should be a new set of trimmed FASTQ sequences ready for quality filtering.
You can skip any step at your discretion. This lets you repeat a single step with different parameters if its results are not adequate.
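The step list and the option to skip steps can be pictured as a small planner. This is only a sketch, not DataPrep's actual code: the step names and the `plan_steps` helper are hypothetical and chosen to mirror the list above.

```python
# Hypothetical step names mirroring the list above; DataPrep's real
# internals may name or group these steps differently.
STEPS = [
    "create_project_dir",
    "extract_archives",
    "fastqc_raw",
    "cutadapt_trim",
    "fastqc_trimmed",
]


def plan_steps(skip=()):
    """Return the steps to run, in pipeline order, leaving out skipped ones."""
    unknown = set(skip) - set(STEPS)
    if unknown:
        raise ValueError(f"unknown step(s): {sorted(unknown)}")
    return [step for step in STEPS if step not in skip]
```

For instance, `plan_steps(skip={"extract_archives", "fastqc_raw"})` would keep only the project directory, trimming, and second report steps, which matches the case of repeating Cutadapt with other parameters.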
Assuming you followed the installation steps outlined in the README.md,
you should be able to run:
dataprep
-h, --help
List all runtime parameters.
-i, --input <path>
Specify the folder that contains the raw FASTQ TAR.GZ files, replacing <path> with its location.
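As an illustration of how these two options could be declared, here is a minimal sketch using Python's standard `argparse` module. It is an assumption that DataPrep parses its arguments this way; the `build_parser` name and the help strings are made up for the example.

```python
import argparse


def build_parser():
    """Build a parser exposing the options documented above.

    argparse adds -h/--help automatically, so only -i/--input is declared.
    """
    parser = argparse.ArgumentParser(
        prog="dataprep",
        description="Prepare FASTQ files for quality filtering.",
    )
    parser.add_argument(
        "-i",
        "--input",
        metavar="<path>",
        default=None,  # None means: fall back to the interactive folder list
        help="folder containing the raw FASTQ TAR.GZ files",
    )
    return parser
```

With this sketch, `build_parser().parse_args(["-i", "/data/run1"]).input` holds the given path, and it stays `None` when the flag is omitted.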
The script lists the subfolders of ORIGINAL_FILES_DIR. Use the up/down arrow
keys to highlight the folder that contains the FASTQ TAR.GZ files and press
Enter to select it. Alternatively, skip this step by specifying the path to
the folder with the -i or --input flag.
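The folder selection described above can be sketched roughly as follows. The interactive arrow-key menu is left out; the choice is passed in as an index instead. The helper names `list_subfolders` and `resolve_input_dir` are hypothetical and not taken from DataPrep itself.

```python
from pathlib import Path


def list_subfolders(original_files_dir):
    """Return the names of the immediate subfolders, sorted alphabetically."""
    root = Path(original_files_dir)
    return sorted(entry.name for entry in root.iterdir() if entry.is_dir())


def resolve_input_dir(original_files_dir, input_path=None, choice_index=0):
    """Pick the folder holding the TAR.GZ files.

    An explicit input_path (the -i/--input value) takes precedence;
    otherwise the subfolder at choice_index is used, standing in for the
    entry the user would highlight in the menu.
    """
    if input_path is not None:
        return Path(input_path)
    names = list_subfolders(original_files_dir)
    return Path(original_files_dir) / names[choice_index]
```

Passing `input_path` bypasses the listing entirely, which corresponds to skipping the menu with the -i flag.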
The decompressed FASTQ files will be saved in a folder with the same name as the
original under WORKING_DIR.
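A minimal sketch of this extraction step, using Python's standard `tarfile` module, could look like the following. It assumes the archives end in `.tar.gz` and sit directly inside the source folder; the `extract_archives` name is hypothetical, and DataPrep may well call an external `tar` program instead.

```python
import tarfile
from pathlib import Path


def extract_archives(source_dir, working_dir):
    """Extract every *.tar.gz in source_dir into working_dir/<source folder name>."""
    source = Path(source_dir)
    # The target folder reuses the name of the original folder.
    target = Path(working_dir) / source.name
    target.mkdir(parents=True, exist_ok=True)
    for archive in sorted(source.glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target)
    return target
```

The function returns the target folder so that later steps can locate the decompressed files.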
FastQC will be run on the raw data for quality control. Reports will be placed
in WORKING_DIR/PROJECT_NAME/faw_data_fastq/fastqc. You can pull the HTML files
from the server and open them in a web browser.
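For orientation, the FastQC call could be assembled along these lines. This is a sketch under assumptions: the helper names are hypothetical, the FASTQ files are assumed to end in `.fastq`, and the report folder is simply a `fastqc` subfolder of whichever folder you pass in. The only FastQC option used is `--outdir`, which expects the output directory to exist already.

```python
import subprocess
from pathlib import Path


def fastqc_command(fastq_files, report_dir):
    """Build the FastQC command line as a list of arguments."""
    return ["fastqc", "--outdir", str(report_dir), *(str(f) for f in fastq_files)]


def run_fastqc(fastq_dir):
    """Run FastQC on every *.fastq file in fastq_dir (requires FastQC on PATH)."""
    fastq_dir = Path(fastq_dir)
    report_dir = fastq_dir / "fastqc"
    # FastQC does not create the output directory itself.
    report_dir.mkdir(parents=True, exist_ok=True)
    files = sorted(fastq_dir.glob("*.fastq"))
    subprocess.run(fastqc_command(files, report_dir), check=True)
    return report_dir
```

Keeping the command construction separate from the call makes it easy to print the command and inspect it before running anything on the server.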