In this tutorial, we will cover how to trim adapters such as the Nextera Transposase Sequence and the Illumina Universal Adapter with cutadapt.
What are adapter sequences?
After library preparation, adapter sequences are attached to both the 5′ and 3′ ends of each DNA fragment and are not meant to be sequenced. However, if the insert is shorter than the read length, the sequencer reads through the insert into the adapter, and adapter sequence appears at the 3′ end of the read.
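As a rough illustration of the read-through described above, the following sketch computes how many bases of a read lie past the end of the insert. The function name `adapter_bases` and the example lengths are hypothetical and were chosen only for illustration.

```shell
# Hypothetical helper: number of read bases that lie past the end of the insert.
# $1 = read length, $2 = insert size (both in nucleotides).
adapter_bases() {
    extra=$(( $1 - $2 ))
    # A negative difference means the read ends inside the insert.
    if [ "$extra" -lt 0 ]; then
        extra=0
    fi
    echo "$extra"
}

adapter_bases 150 120   # insert shorter than the read: prints 30
adapter_bases 150 300   # insert longer than the read: prints 0
```

With a 150-nucleotide read and a 120-nucleotide insert, the last 30 bases of the read would come from the adapter (or from whatever follows it) rather than from the sample.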
Adapter sequences used by FastQC
By default, FastQC searches for several adapter sequences in each library, including these two:
Adapter name | Adapter sequence | Example of library preparation kit |
Nextera Transposase Sequence | CTGTCTCTTATA | Nextera XT |
Illumina Universal Adapter | AGATCGGAAGAG | TruSeq |
Adapter sequences provided by Illumina
However, the adapter sequences recommended by Illumina for Nextera XT and TruSeq are longer than the ones used by FastQC:
Library preparation kit | Read | Adapter sequence |
Nextera XT | R1 & R2 | CTGTCTCTTATACACATCT |
TruSeq | R1 | AGATCGGAAGAGCACACGTCTGAACTCCAGTCA |
TruSeq | R2 | AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT |
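To make the relationship between the two tables concrete, here is a small sketch that checks whether each short FastQC sequence is a prefix of the corresponding longer Illumina sequence. The helper name `is_prefix` is hypothetical; the sequences are copied from the tables above.

```shell
# Hypothetical helper: succeeds when the short sequence ($1) is a prefix
# of the long sequence ($2).
is_prefix() {
    case "$2" in
        "$1"*) return 0 ;;
        *)     return 1 ;;
    esac
}

is_prefix CTGTCTCTTATA CTGTCTCTTATACACATCT \
    && echo "Nextera XT: FastQC sequence is a prefix"
is_prefix AGATCGGAAGAG AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
    && echo "TruSeq R1: FastQC sequence is a prefix"
is_prefix AGATCGGAAGAG AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
    && echo "TruSeq R2: FastQC sequence is a prefix"
```

All three checks succeed, so the FastQC sequences are the first 12 nucleotides of the adapters listed by Illumina.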
Adapter sequences trimmed by cutadapt
Tool used
In this post, we will use the following version of cutadapt:
cutadapt --version
2.3
Scripts used
Script to trim Nextera Transposase Sequence (length of 19 nucleotides):
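# ${input} and ${output} are path prefixes assumed to be set beforehand.
adapter="CTGTCTCTTATACACATCT"
# -a trims the adapter from read 1, -A trims it from read 2.
cutadapt -a "${adapter}" -A "${adapter}" --overlap 7 \
    --output "${output}_R1.fastq.gz" \
    --paired-output "${output}_R2.fastq.gz" \
    "${input}_R1.fastq.gz" "${input}_R2.fastq.gz"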
Script to trim the Illumina Universal Adapter, using the TruSeq sequences for R1 and R2 (length of 33 nucleotides each):
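# ${input} and ${output} are path prefixes assumed to be set beforehand.
R1_adapter="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
R2_adapter="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"
# -a trims the R1 adapter from read 1, -A trims the R2 adapter from read 2.
cutadapt -a "${R1_adapter}" -A "${R2_adapter}" --overlap 7 \
    --output "${output}_R1.fastq.gz" \
    --paired-output "${output}_R2.fastq.gz" \
    "${input}_R1.fastq.gz" "${input}_R2.fastq.gz"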
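The difficult part of this procedure is choosing a value for the --overlap parameter, which sets the minimum number of bases that must overlap between a read and an adapter before the adapter is trimmed. A low value removes more short adapter remnants but also trims more reads whose ends match the adapter by chance; cutadapt's default is 3.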
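One way to reason about the --overlap value is to estimate how many reads would match the start of the adapter purely by chance. The sketch below assumes that the four bases are equally likely and independent, and that no mismatches are allowed; the function name `expected_random_matches` is hypothetical. Under these assumptions the estimate is the number of reads multiplied by 0.25 raised to the overlap length, which agrees with the values in the "expect" column of the report in the Results section for lengths 7 to 10.

```shell
# Hypothetical helper: expected number of reads whose 3' end matches the
# first $2 bases of an adapter by chance, among $1 reads.
# Assumes uniform, independent bases and exact matching.
expected_random_matches() {
    awk -v reads="$1" -v len="$2" 'BEGIN { printf "%.1f\n", reads * 0.25 ^ len }'
}

# Print the estimate for a few candidate --overlap values,
# using the 5,441,659 read pairs processed in this post.
for overlap in 3 5 7 9; do
    printf '%s\t%s\n' "$overlap" "$(expected_random_matches 5441659 "$overlap")"
done
```

Each additional base of required overlap divides the expected number of chance matches by four, which is why a moderately larger --overlap sharply reduces spurious trimming.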
Results
Here is an excerpt of the cutadapt report for the Nextera run, showing the summary for both reads and the detailed statistics for read 1:
Total read pairs processed: 5,441,659
Read 1 with adapter: 1,459 (0.0%)
Read 2 with adapter: 1,736 (0.0%)
Pairs written (passing filters): 5,441,659 (100.0%)
Sequence: CTGTCTCTTATACACATCT; Type: regular 3'; Length: 19; Trimmed: 1459 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1
Bases preceding removed adapters:
A: 13.4%
C: 35.6%
G: 23.0%
T: 28.0%
none/other: 0.0%
Overview of removed sequences
length count expect max.err error counts
7 522 332.1 0 522
8 186 83.0 0 186
9 111 20.8 0 6 105
10 175 5.2 1 15 160
11 71 1.3 1 0 71
12 84 0.3 1 0 84
13 8 0.1 1 0 8
14 12 0.0 1 0 12
15 3 0.0 1 1 2
16 30 0.0 1 1 29
17 19 0.0 1 0 19
18 24 0.0 1 0 24
19 3 0.0 1 0 3
21 5 0.0 1 0 5
23 4 0.0 1 0 4
24 3 0.0 1 0 3
25 4 0.0 1 0 4
26 2 0.0 1 0 2
28 1 0.0 1 0 1
29 7 0.0 1 0 7
31 7 0.0 1 0 7
32 3 0.0 1 0 3
33 4 0.0 1 0 4
34 1 0.0 1 0 1
35 1 0.0 1 0 1
36 7 0.0 1 0 7
37 2 0.0 1 0 2
38 2 0.0 1 0 2
39 5 0.0 1 0 5
40 2 0.0 1 0 2
41 3 0.0 1 0 3
43 6 0.0 1 0 6
45 9 0.0 1 0 9
46 1 0.0 1 0 1
47 3 0.0 1 0 3
48 3 0.0 1 0 3
49 2 0.0 1 0 2
50 6 0.0 1 0 6
51 3 0.0 1 0 3
52 3 0.0 1 0 3
53 1 0.0 1 0 1
54 2 0.0 1 0 2
55 5 0.0 1 0 5
57 2 0.0 1 0 2
58 4 0.0 1 0 4
59 2 0.0 1 0 2
60 3 0.0 1 0 3
61 3 0.0 1 0 3
63 4 0.0 1 0 4
65 1 0.0 1 0 1
67 2 0.0 1 0 2
68 3 0.0 1 0 3
69 1 0.0 1 0 1
70 3 0.0 1 0 3
72 6 0.0 1 0 6
73 2 0.0 1 0 2
74 2 0.0 1 0 2
75 7 0.0 1 0 7
76 2 0.0 1 0 2
77 3 0.0 1 0 3
78 1 0.0 1 0 1
79 2 0.0 1 0 2
80 1 0.0 1 0 1
81 3 0.0 1 0 3
82 1 0.0 1 0 1
83 2 0.0 1 0 2
84 4 0.0 1 0 4
85 1 0.0 1 0 1
86 2 0.0 1 0 2
87 3 0.0 1 0 3
88 4 0.0 1 0 4
89 1 0.0 1 0 1
90 1 0.0 1 0 1
92 4 0.0 1 0 4
93 3 0.0 1 0 3
94 1 0.0 1 0 1
97 1 0.0 1 0 1
99 3 0.0 1 0 3
100 2 0.0 1 0 2
101 2 0.0 1 0 2
102 1 0.0 1 0 1
104 1 0.0 1 0 1
105 1 0.0 1 0 1
106 2 0.0 1 0 2
107 1 0.0 1 0 1
108 2 0.0 1 0 2
110 1 0.0 1 0 1
113 2 0.0 1 0 2
116 1 0.0 1 0 1
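To help read the overview table, the sketch below divides the observed count by the expected count for the first four rows. The function name `ratios` is hypothetical, and the input is limited to the length, count and expect columns copied from the table above.

```shell
# Hypothetical helper: reads "length count expect" rows on standard input
# and prints the length followed by the observed/expected ratio.
ratios() {
    awk '{ printf "%d\t%.1f\n", $1, $2 / $3 }'
}

# First four rows of the overview table (length, count, expect).
ratios <<'EOF'
7 522 332.1
8 186 83.0
9 111 20.8
10 175 5.2
EOF
```

This prints ratios of 1.6, 2.2, 5.3 and 33.7 for lengths 7 to 10. A ratio close to 1 suggests that a large share of the trims at that length could be chance matches, whereas a much larger ratio points to genuine adapter sequence. This is only a rough guide, since the expected values themselves rest on simplifying assumptions.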
Conclusion
To sum up, it is really straightforward to trim adapters with cutadapt! Which overlap between read and adapter have you chosen to use?