**Tutorial: using the pipeline for basic RNA-seq analysis**

A) Tutorial

Let's walk through the pipeline with real data.

Go to the test directory:

cd /home/Share/ftessier/jeuTest/

There you will find a “config.txt” file (already filled in) and a “data” directory containing 12 fastq.gz files (paired-end, so one R1 file and one R2 file for each of the 6 samples).
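
Before launching the pipeline, it can be worth confirming that every sample really has both mates. The sketch below is not part of PipelineNGS. It assumes, hypothetically, that file names contain `_R1` or `_R2` before the `.fastq.gz` extension, which may not match the actual naming in the “data” directory; the function names are illustrative only.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed (hypothetical) naming: <sample>_R1[...].fastq.gz and <sample>_R2[...].fastq.gz
MATE_PATTERN = re.compile(r"^(?P<sample>.+?)_(?P<mate>R[12])(?P<rest>.*)\.fastq\.gz$")


def group_mates(filenames):
    """Group FASTQ file names by sample: {sample: {"R1": name, "R2": name}}."""
    samples = defaultdict(dict)
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match:
            samples[match.group("sample")][match.group("mate")] = name
    return dict(samples)


def unpaired_samples(filenames):
    """Return the sorted names of samples that lack an R1 or an R2 file."""
    grouped = group_mates(filenames)
    return sorted(s for s, mates in grouped.items() if set(mates) != {"R1", "R2"})


if __name__ == "__main__":
    # Run from the directory that contains "data".
    names = [p.name for p in Path("data").glob("*.fastq.gz")]
    print(len(group_mates(names)), "samples found")
    print("samples missing a mate:", unpaired_samples(names))
```

If the naming assumption holds for the test data set, the script should report 6 samples and an empty list of samples missing a mate.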

1) Run the pipeline

Run the command below:

python /home/Share/ftessier/PipelineNGS/PipelineNGS.py -i data/ -o resultsPipeline -s all -c config.txt

This starts the pipeline. STAR runs first and maps the reads of each sample to the genome.

featureCounts then counts the reads assigned to each gene.

The pipeline then builds a count matrix, runs DESeq2, and tests the differential expression comparisons defined in the config.txt file. Here, the comparison is “control vs asxl2”.

Because the run may take a long time, you can stop the command and check the results directly.

2) Check the results

a) Results from the mapping

Once the pipeline has finished, you can check the results in the output directory.

There is one directory per sample, containing the result files from STAR. A summary of the mapping is also available in the “overallstats.txt” file.

The output directory also contains a table of counts for each sample, allcounts-gene.txt. You can open it with LibreOffice by running this command:

libreoffice --calc allcounts-gene.txt &
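
The count table can also be inspected from a script, for example to compare the total number of counted reads per sample. This is a minimal sketch, assuming the table is tab-separated with a header row, the gene identifier in the first column and one column per sample. That layout is an assumption and should be checked against the real allcounts-gene.txt; `library_sizes` is a hypothetical helper, not a pipeline function.

```python
import csv
import io


def library_sizes(handle):
    """Sum the counts of each sample column in a tab-separated count table.

    Assumes a header row: gene column first, then one column per sample.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    totals = {sample: 0 for sample in header[1:]}
    for row in reader:
        if not row:
            continue  # skip blank lines
        for sample, value in zip(header[1:], row[1:]):
            totals[sample] += int(float(value))
    return totals


if __name__ == "__main__":
    # Tiny made-up table, used only to show the assumed layout.
    demo = io.StringIO("gene\tsampleA\tsampleB\ng1\t10\t0\ng2\t5\t7\n")
    print(library_sizes(demo))
```

To apply it to the real table, open the file with `open("allcounts-gene.txt", newline="")` and pass the handle to `library_sizes`. Comment lines or extra annotation columns, if present, would need to be skipped first.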

b) Results from the differential expression

Now, to see the results of the differential expression analysis, go to the DiffExpress directory:

cd DiffExpress

You will find a table with the normalized counts (Normalized_Counts.xls), which you can also open with LibreOffice,

and a directory for each test you ran. Here, we tested control against asxl2, so we have the directory “control_vs_asxl2”.

In this directory you will find several files.

First, you can check the volcano plot and the heatmap:

okular control_vs_asxl2_FoldchangeRplots.pdf

ALL_control_vs_asxl2.xls contains all the differentially expressed (D.E.) genes, along with GO terms, Entrez IDs, etc.

Open it with LibreOffice in the same way as the count tables.

You can also visualize the pathways in which the most D.E. genes are involved. The file keggres.txt summarizes all the pathways and the number of genes involved in each.

Open the pathway in which the most genes are upregulated:

display mmu00190.control_vs_asxl2_upregulate_n1.png &

B) Run your data

First, you need to connect to the IRCAN server: http://bioinfomed.fr/doku.php?id=tutos:ircan_server

Make sure all the data you want to analyze are in a single directory, in FASTQ format (the files can be compressed).

If you used the genomics platform for your run and completed the first step of cleaning and quality control of your data, you will find them in the directory:

/home/NEXTSEQ/clean_data/

You need to provide a config file to run the pipeline.

For that, copy the config file from the pipeline's directory:

cp ~/Documents/RNA-SEQ/PipelineNGS/config.txt .

Open the file with a text editor (such as gedit):

gedit config.txt &

Specify whether you have paired-end or single-end reads, and which genome you want to use.

If you want to run a differential expression analysis, you also need to edit the second part of the config.txt file. List the names of your samples and their conditions in the “Sample description” section, and the tests you want to run in the “tests” section. Use tab characters to separate the different fields.
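
Text editors sometimes insert spaces where a tab was intended, which is hard to see by eye. The sketch below is a rough check and not part of the pipeline: it reports lines that contain several whitespace-separated words but no tab character. It makes no assumption about the config.txt syntax beyond the tab separation mentioned above, so lines that legitimately contain spaces will be reported too; treat the output as hints only. The helper name and the sample names in the demo are made up.

```python
def space_separated_lines(lines):
    """Return (line_number, text) pairs for lines that have several
    whitespace-separated words but no tab character."""
    suspects = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if "\t" not in text and len(text.split()) > 1:
            suspects.append((number, text))
    return suspects


if __name__ == "__main__":
    # Made-up lines for illustration; the second one uses a space instead of a tab.
    demo = ["sample1\tcontrol\n", "sample2 asxl2\n", "\n"]
    for number, text in space_separated_lines(demo):
        print(f"line {number}: no tab found in {text!r}")
```

To check a real file, pass an open file object instead of the demo list, for example `space_separated_lines(open("config.txt"))`.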

To start the pipeline and run the analysis on the genome, enter the command below in your terminal:

python PipelineNGS.py -i /home/NEXTSEQ/clean_data/Directory_Test -o Output_Directory -s all -c config.txt &

If you prefer to run your analysis on the transcriptome, use this command:

python PipelineNGS.py -i /home/NEXTSEQ/clean_data/Directory_Test -o Output_Directory -s allSalmon -c config.txt &
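
The two commands above differ only in the value passed to -s. When several data sets have to be processed, the command line can be assembled from a script. The following is a sketch under that assumption: it uses only the options shown in this tutorial (-i, -o, -s, -c), and `pipeline_command` with its `reference` argument is a hypothetical helper, not something provided by PipelineNGS.

```python
import shlex

# Values of -s shown in this tutorial: "all" (genome) and "allSalmon" (transcriptome).
STEPS = {"genome": "all", "transcriptome": "allSalmon"}


def pipeline_command(input_dir, output_dir, config="config.txt",
                     reference="genome", script="PipelineNGS.py"):
    """Return the argument list for one PipelineNGS run."""
    if reference not in STEPS:
        raise ValueError(f"reference must be one of {sorted(STEPS)}")
    return ["python", script, "-i", input_dir, "-o", output_dir,
            "-s", STEPS[reference], "-c", config]


if __name__ == "__main__":
    cmd = pipeline_command("/home/NEXTSEQ/clean_data/Directory_Test",
                           "Output_Directory", reference="transcriptome")
    # Only print the command here; pass cmd to subprocess.run() to execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when directory names contain spaces.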

tutos/ircan_rnaseq/tuto_pipeline_rnaseq_1.txt · Last modified: 2017/09/18 15:36 by ftessier