How to trim adapters with cutadapt

In this tutorial, we will cover how to trim adapters such as the Nextera Transposase Sequence and the Illumina Universal Adapter with cutadapt.

What are adapter sequences?

After library preparation, adapter sequences flank both the 5′ and 3′ ends of each DNA fragment. The sequencing primers anneal within these adapters, so the adapters themselves are normally not sequenced. However, if the insert is shorter than the read length, the sequencer reads through the insert into the adapter, and adapter sequence appears at the 3′ end of the read.
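The read-through effect can be illustrated with a toy model: a read is the insert followed by the 3′ adapter, cut at the read length. This is a sketch only; simulate_read is a hypothetical helper, and sequencing errors and the library sequence beyond the adapter are ignored.

```shell
#!/bin/sh
# Toy model of adapter read-through (hypothetical helper, not part of cutadapt):
# the sequencer reads the insert and, if the insert is shorter than the read
# length, continues into the 3' adapter. Sequencing errors and the library
# sequence beyond the adapter are ignored.
simulate_read() {
  local insert=$1 adapter=$2 read_len=$3
  printf '%s\n' "${insert}${adapter}" | cut -c "1-${read_len}"
}

# 10-nt insert, 15-nt read: the last 5 bases are the start of the adapter
simulate_read ACGTACGTAC CTGTCTCTTATACACATCT 15   # ACGTACGTACCTGTC

# 20-nt insert, 15-nt read: no adapter in the read
simulate_read ACGTACGTACGGTTCCAAGG CTGTCTCTTATACACATCT 15   # ACGTACGTACGGTTC
```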

Adapter sequences used by FastQC

By default, FastQC searches each library for several adapter sequences, including these two:

Adapter name                   Adapter sequence   Example of library preparation kit
Nextera Transposase Sequence   CTGTCTCTTATA       Nextera XT
Illumina Universal Adapter     AGATCGGAAGAG       TruSeq
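FastQC's Adapter Content module looks for these 12-nt sequences and plots, by position, the cumulative percentage of reads in which they are found. A much cruder check from the command line is to count reads that contain an exact copy of one of them. A minimal sketch, assuming a FASTQ stream on stdin; count_adapter_reads and the file name sample_R1.fastq.gz are hypothetical, and the result is a read count, not FastQC's per-position percentage:

```shell
#!/bin/sh
# Count reads whose sequence contains an exact copy of a short adapter k-mer.
# FASTQ is read from stdin; line 2 of every 4-line record is the sequence.
# Rough approximation of an adapter-content check, not FastQC's own code.
count_adapter_reads() {
  awk 'NR % 4 == 2' | grep -c -F -e "$1"
}

# Example with a hypothetical file name:
#   gzip -cd sample_R1.fastq.gz | count_adapter_reads CTGTCTCTTATA
#   gzip -cd sample_R1.fastq.gz | count_adapter_reads AGATCGGAAGAG
```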

Adapter sequences provided by Illumina

However, the adapter sequences recommended by Illumina for Nextera XT and TruSeq are longer than the ones used by FastQC:

Library preparation kit   Read      Adapter sequence
Nextera XT                R1 & R2   CTGTCTCTTATACACATCT
TruSeq                    R1        AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
TruSeq                    R2        AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
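A quick way to see how the two tables relate is to check that each FastQC sequence is the first 12 nt of the corresponding Illumina sequence, and how much the TruSeq R1 and R2 adapters have in common. A small POSIX sh sketch; the helper names is_prefix and common_prefix_len are made up for this example:

```shell
#!/bin/sh
# Sequences from the two tables above.
fastqc_nextera=CTGTCTCTTATA
fastqc_universal=AGATCGGAAGAG
nextera=CTGTCTCTTATACACATCT
truseq_r1=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
truseq_r2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

# Succeeds if $1 is a prefix of $2 ("$1" is quoted, so it is matched literally).
is_prefix() {
  case $2 in
    "$1"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Length of the longest common prefix of two sequences.
common_prefix_len() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    i = 0
    while (i < length(a) && i < length(b) && substr(a, i + 1, 1) == substr(b, i + 1, 1)) i++
    print i
  }'
}

is_prefix "$fastqc_nextera" "$nextera" && echo "Nextera: FastQC sequence is a prefix"
is_prefix "$fastqc_universal" "$truseq_r1" && echo "TruSeq R1: FastQC sequence is a prefix"
is_prefix "$fastqc_universal" "$truseq_r2" && echo "TruSeq R2: FastQC sequence is a prefix"
echo "TruSeq R1 and R2 share their first $(common_prefix_len "$truseq_r1" "$truseq_r2") nt"
```

The TruSeq R1 and R2 adapters share their first 13 nt, which is why the single 12-nt Illumina Universal Adapter sequence used by FastQC detects both.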

Adapter sequences trimmed by cutadapt

Tool used

In this post, we will use the following version of cutadapt:

cutadapt --version
2.3

Scripts used

Script to trim the Nextera Transposase Sequence (19 nucleotides) from both reads:

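input="sample"; output="sample_trimmed"   # placeholder file prefixes (hypothetical)

# Nextera adapter, trimmed from the 3' end of both R1 (-a) and R2 (-A)
adapter="CTGTCTCTTATACACATCT"

cutadapt -a "${adapter}" -A "${adapter}" --overlap 7 \
	--output "${output}_R1.fastq.gz" \
	--paired-output "${output}_R2.fastq.gz" \
	"${input}_R1.fastq.gz" "${input}_R2.fastq.gz"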

Script to trim the TruSeq adapters, which both start with the Illumina Universal Adapter (33 nucleotides each, different for R1 and R2):

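input="sample"; output="sample_trimmed"   # placeholder file prefixes (hypothetical)

# TruSeq adapters differ between R1 (-a) and R2 (-A)
R1_adapter="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
R2_adapter="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

cutadapt -a "${R1_adapter}" -A "${R2_adapter}" --overlap 7 \
	--output "${output}_R1.fastq.gz" \
	--paired-output "${output}_R2.fastq.gz" \
	"${input}_R1.fastq.gz" "${input}_R2.fastq.gz"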
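The two scripts above differ only in their adapter sequences, so they can be folded into one helper that picks the adapters by kit. A minimal sketch; adapters_for_kit, trim_sample and the sample prefixes are hypothetical names. It only prints the cutadapt command, with the same options as above, so the command can be checked before running:

```shell
#!/bin/sh
# Hypothetical helper: print the R1 and R2 3' adapters for a kit name.
adapters_for_kit() {
  case $1 in
    nextera) echo "CTGTCTCTTATACACATCT CTGTCTCTTATACACATCT" ;;
    truseq)  echo "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" ;;
    *) echo "unknown kit: $1" >&2; return 1 ;;
  esac
}

# Print the cutadapt command for one sample (dry run); remove "echo" to execute it.
# input/output are file-name prefixes, as in the scripts above.
trim_sample() {
  local kit=$1 input=$2 output=$3 adapters r1 r2
  adapters=$(adapters_for_kit "$kit") || return 1
  r1=${adapters% *}   # first word: R1 adapter
  r2=${adapters#* }   # second word: R2 adapter
  echo cutadapt -a "$r1" -A "$r2" --overlap 7 \
    --output "${output}_R1.fastq.gz" \
    --paired-output "${output}_R2.fastq.gz" \
    "${input}_R1.fastq.gz" "${input}_R2.fastq.gz"
}

trim_sample nextera sample sample_trimmed
```

Printing the command first is a cheap dry run; once it looks right, remove the echo in trim_sample to execute it.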

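The difficult part of this procedure is choosing a value for the --overlap parameter, which sets the minimum overlap between the end of a read and the start of the adapter required for trimming. cutadapt's default is 3: a low value removes short adapter fragments at the end of reads, but also trims bases that match the start of the adapter purely by chance.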
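One way to reason about --overlap is to estimate how many reads would end with the first L adapter bases purely by chance. The sketch below assumes a uniform base composition (each base matches with probability 1/4) and no mismatches, so the expected count is n_reads × 0.25^L; the function name is hypothetical and cutadapt's own calculation may differ in detail.

```shell
#!/bin/sh
# Expected number of reads ending with the first L adapter bases purely by
# chance, assuming every base matches with probability 1/4 (equal base
# composition, no errors allowed): n_reads * 0.25^L, for L = 3..max.
expected_random_matches() {
  awk -v n="$1" -v max="${2:-12}" 'BEGIN {
    for (l = 3; l <= max; l++) printf "%d\t%.1f\n", l, n * 0.25 ^ l
  }'
}

# 5,441,659 reads, as in the run reported below
expected_random_matches 5441659
```

For this run the model gives about 85,025.9 chance matches at the default overlap of 3, but only 332.1 at an overlap of 7 and 83.0 at 8; the last two are the values listed in the expect column of the cutadapt report below.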

Results

Here is the cutadapt report for the Nextera adapter run; the detailed section below the summary covers the Read 1 adapter:

Total read pairs processed:          5,441,659
  Read 1 with adapter:                   1,459 (0.0%)
  Read 2 with adapter:                   1,736 (0.0%)
Pairs written (passing filters):     5,441,659 (100.0%)

Sequence: CTGTCTCTTATACACATCT; Type: regular 3'; Length: 19; Trimmed: 1459 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1

Bases preceding removed adapters:
  A: 13.4%
  C: 35.6%
  G: 23.0%
  T: 28.0%
  none/other: 0.0%

Overview of removed sequences
length	count	expect	max.err	error counts
7	522	332.1	0	522
8	186	83.0	0	186
9	111	20.8	0	6 105
10	175	5.2	1	15 160
11	71	1.3	1	0 71
12	84	0.3	1	0 84
13	8	0.1	1	0 8
14	12	0.0	1	0 12
15	3	0.0	1	1 2
16	30	0.0	1	1 29
17	19	0.0	1	0 19
18	24	0.0	1	0 24
19	3	0.0	1	0 3
21	5	0.0	1	0 5
23	4	0.0	1	0 4
24	3	0.0	1	0 3
25	4	0.0	1	0 4
26	2	0.0	1	0 2
28	1	0.0	1	0 1
29	7	0.0	1	0 7
31	7	0.0	1	0 7
32	3	0.0	1	0 3
33	4	0.0	1	0 4
34	1	0.0	1	0 1
35	1	0.0	1	0 1
36	7	0.0	1	0 7
37	2	0.0	1	0 2
38	2	0.0	1	0 2
39	5	0.0	1	0 5
40	2	0.0	1	0 2
41	3	0.0	1	0 3
43	6	0.0	1	0 6
45	9	0.0	1	0 9
46	1	0.0	1	0 1
47	3	0.0	1	0 3
48	3	0.0	1	0 3
49	2	0.0	1	0 2
50	6	0.0	1	0 6
51	3	0.0	1	0 3
52	3	0.0	1	0 3
53	1	0.0	1	0 1
54	2	0.0	1	0 2
55	5	0.0	1	0 5
57	2	0.0	1	0 2
58	4	0.0	1	0 4
59	2	0.0	1	0 2
60	3	0.0	1	0 3
61	3	0.0	1	0 3
63	4	0.0	1	0 4
65	1	0.0	1	0 1
67	2	0.0	1	0 2
68	3	0.0	1	0 3
69	1	0.0	1	0 1
70	3	0.0	1	0 3
72	6	0.0	1	0 6
73	2	0.0	1	0 2
74	2	0.0	1	0 2
75	7	0.0	1	0 7
76	2	0.0	1	0 2
77	3	0.0	1	0 3
78	1	0.0	1	0 1
79	2	0.0	1	0 2
80	1	0.0	1	0 1
81	3	0.0	1	0 3
82	1	0.0	1	0 1
83	2	0.0	1	0 2
84	4	0.0	1	0 4
85	1	0.0	1	0 1
86	2	0.0	1	0 2
87	3	0.0	1	0 3
88	4	0.0	1	0 4
89	1	0.0	1	0 1
90	1	0.0	1	0 1
92	4	0.0	1	0 4
93	3	0.0	1	0 3
94	1	0.0	1	0 1
97	1	0.0	1	0 1
99	3	0.0	1	0 3
100	2	0.0	1	0 2
101	2	0.0	1	0 2
102	1	0.0	1	0 1
104	1	0.0	1	0 1
105	1	0.0	1	0 1
106	2	0.0	1	0 2
107	1	0.0	1	0 1
108	2	0.0	1	0 2
110	1	0.0	1	0 1
113	2	0.0	1	0 2
116	1	0.0	1	0 1
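To put the report in perspective, the sketch below sums the count and expect columns of one "Overview of removed sequences" table read from stdin, and recomputes the adapter percentages that the summary rounds to 0.0%. Both helper names are hypothetical, and the parsing assumes the whitespace-separated layout shown above.

```shell
#!/bin/sh
# Sum the "count" and "expect" columns of one cutadapt "Overview of removed
# sequences" table read from stdin. Table rows start with an integer length;
# the header and other report lines are skipped.
summarise_removed() {
  awk '$1 ~ /^[0-9]+$/ { count += $2; expect += $3 }
       END { printf "observed=%d expected=%.1f\n", count, expect }'
}

# Percentage of reads with adapter, with more decimals than the report's 0.0%.
percent() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", 100 * a / b }'
}

percent 1459 5441659   # Read 1: 0.027
percent 1736 5441659   # Read 2: 0.032
```

On the complete Read 1 table above, summarise_removed reports observed=1459 expected=442.8: under this simple model, roughly 443 of the 1,459 trims could be chance matches, almost all at lengths 7 to 9. The expect column also ignores the one mismatch allowed from 10 bp on, so it probably underestimates chance matches at those lengths.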

Conclusion

To sum up, trimming adapters with cutadapt is straightforward; the main decision is the value of --overlap. Which overlap between read and adapter have you chosen to use?
