The Complete Chloroplast Genome of Coptis teeta (Ranunculaceae), An Endangered Plant Species Endemic to the Eastern Himalaya

Coptis teeta is an endangered medicinal plant endemic to the Eastern Himalaya. It has been categorized as Endangered (EN) by the International Union for Conservation of Nature (IUCN). In the present study, the complete chloroplast genome of C. teeta was sequenced using next-generation sequencing (NGS). The circular chloroplast genome is 154,280 bp in size and exhibits the typical quadripartite structure, comprising two inverted repeat regions (IRs, 24,583 bp each), one large single-copy region (LSC, 87,519 bp) and one small single-copy region (SSC, 17,595 bp). The genome contains 125 genes, including 81 protein-coding genes (PCGs), 36 tRNA genes and 8 rRNA genes. The total GC content of the C. teeta genome is 38.3%, and the GC content of the IR regions (43.3%) is higher than that of the LSC (36.7%) and SSC (32.2%) regions. Forty-two forward and twenty-three reverse repeats were detected in the cp genome of C. teeta. The genome was rich in SSRs, with 62 SSRs identified in total. The phylogenetic tree showed that species of Ranunculaceae formed a monophyletic clade and that the intra-family topology was consistent with previous studies. The results strongly supported C. teeta and its congener C. chinensis as sister groups, with a bootstrap value of 100%.


Introduction
Coptis teeta Wallich, a perennial herb of Ranunculaceae, is endemic to the Eastern Himalaya and has a narrow distribution range. It is a shade-tolerant species distributed mainly in the moist, temperate, evergreen broad-leaved forests of northwest Yunnan, China, and northeast India, where it occupies highly specialized niches in temperate oak-rhododendron forests and is restricted to elevations between 2350 and 3100 m [1,2]. The rhizome of this species, known as Yunnan goldthread (Yunlian in Chinese), has been important in Chinese herbal medicine since the period of Sheng-Nong (3000 B.C.) [1]. It has excellent pharmacological activity and has been used to treat various conditions such as diarrhea, disorders of glucose metabolism, hypertension, and cardiovascular and cerebrovascular diseases [3]. A previous study revealed that the species has highly specific microsite requirements that cannot be met in other habitats. Owing to over-exploitation, several anthropogenic factors and environmental disruption, the wild population of C. teeta has decreased rapidly in recent years [4]. C. teeta is listed in the IUCN Red List of Threatened Species (http://www.iucnredlist.org/) as an endangered species with the status "A2cd". It is also included in Appendix II of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) [5]. Therefore, it is necessary to protect this endangered plant for its high economic and ecological value and for the conservation of biodiversity.
To date, few studies of this species have been performed because of the lack of genomic data for C. teeta. Previous studies mainly focused on the phylogeny and biogeographic pattern of Coptis, using two plastid markers and one nuclear marker (psbA-trnH, trnL-trnF and ITS) and six markers (five plastid and one nuclear), respectively [6,7]. In this study, as part of the genome sequencing project of C. teeta, we assembled and annotated its complete plastid genome and described its characteristics.

Plant material and DNA extraction
Fresh leaves of C. teeta were collected from Gongshan County (27.73°N, 98.66°E), Yunnan Province, and voucher specimens were deposited at Yunnan University of Traditional Chinese Medicine. Total genomic DNA was extracted using a modified plant genome kit (Bioteke, Beijing, China). DNA quality was assessed by electrophoresis on a 1% agarose gel (Figure 1), and the concentration of a 1 μL DNA sample was measured with a NanoDrop spectrophotometer (ThermoFisher Scientific, Wilmington, Delaware, USA). The measured concentration was 62.6 ng/μL, which exceeded 50 ng/μL.

Repeats and simple sequence repeats (SSRs) analysis
REPuter [12] was used to find forward and reverse tandem repeats ≥15 bp, with the minimum alignment score and maximum period size set to 100 and 500, respectively. IMEx [13] was used to identify SSRs, with the minimum repeat numbers set to 10, 5, 4, 3, 3 and 3 for mono-, di-, tri-, tetra-, penta- and hexanucleotides, respectively.
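The SSR thresholds above can be illustrated with a minimal Python sketch. This is not IMEx and does not reproduce its algorithm or options; it only scans for perfect (uninterrupted) repeats using the stated minimum repeat numbers. The names MIN_REPEATS, is_primitive and find_ssrs are hypothetical and introduced here for illustration only.

```python
# Minimum repeat numbers per motif length (1-6 nt), as stated in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Assumption: only exact tandem copies are counted; imperfect SSRs,
    which dedicated tools can also report, are ignored in this sketch.
    """
    seq = seq.upper()
    hits = []
    for k, min_rep in sorted(MIN_REPEATS.items()):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # Skip ambiguous bases and motifs reducible to a shorter unit.
            if set(motif) - set("ACGT") or not is_primitive(motif):
                i += 1
                continue
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == motif:
                n += 1
            if n >= min_rep:
                hits.append((i, motif, n))
                i += n * k  # jump past the reported repeat
            else:
                i += 1
    return sorted(hits)


if __name__ == "__main__":
    toy = "CG" + "A" * 12 + "CG" + "TA" * 5 + "C" + "AAG" * 4
    for start, motif, count in find_ssrs(toy):
        print(start, motif, count)
```

On the toy sequence, the sketch reports one mononucleotide, one dinucleotide and one trinucleotide SSR; a run of only nine A residues would fall below the mononucleotide threshold and would not be reported.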

Phylogenetic analysis
The phylogenetic analysis was based on 31 published chloroplast genomes and was conducted to infer the phylogenetic position of C. teeta within the family Ranunculaceae. The cp genome of Nandina domestica (GenBank: DQ923117) was included as the outgroup. The LSC, SSC and one IR region of each of the 32 chloroplast genomes in total were aligned using MAFFT 7.308 [14]. The maximum likelihood (ML) tree was reconstructed with RAxML 8.2.11 [15] under the GTR+G nucleotide substitution model, and node support was estimated by bootstrap analysis with 1000 replicates.
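Keeping only one IR copy before alignment can be sketched as follows. This is an illustrative Python sketch, not the procedure used in the study: it assumes the plastome sequence is written in the order LSC, IRb, SSC, IRa, and the function names reverse_complement and drop_second_ir are hypothetical. For C. teeta, removing one 24,583 bp IR copy from the 154,280 bp genome would leave 129,697 bp.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def drop_second_ir(genome, lsc_len, ir_len):
    """Return LSC + IRb + SSC, i.e. the genome without the trailing IR copy.

    Assumption: the sequence is ordered LSC-IRb-SSC-IRa. The two IR copies
    are checked to be reverse complements of each other before trimming.
    """
    genome = genome.upper()
    if lsc_len <= 0 or ir_len <= 0 or lsc_len + 2 * ir_len > len(genome):
        raise ValueError("region lengths do not fit the genome")
    ir_b = genome[lsc_len:lsc_len + ir_len]
    ir_a = genome[-ir_len:]
    if ir_a != reverse_complement(ir_b):
        raise ValueError("IR copies are not reverse complements; check layout")
    return genome[:-ir_len]


if __name__ == "__main__":
    lsc, ir, ssc = "ATGGCCTTAA", "GATTACA", "CCCGT"
    toy_genome = lsc + ir + ssc + reverse_complement(ir)
    print(drop_second_ir(toy_genome, len(lsc), len(ir)))
```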

Characteristics of chloroplast genome of C. teeta
The complete chloroplast genome of C. teeta is a circular DNA molecule of 154,280 bp in length, comprising four regions: one large single-copy region (LSC, 87,519 bp), one small single-copy region (SSC, 17,595 bp) and two inverted repeat regions (IRs, 24,583 bp each) (Figure 2). The overall GC content was 38.3%. The IR regions had a higher GC content (43.3%) than the LSC (36.7%) and SSC (32.2%) regions. This is attributable to the high GC content (55.5%) of the four ribosomal RNA (rRNA) genes located in the IR regions, similar to the situation in C. chinensis Franchet [16].
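The reported region sizes and GC contents can be cross-checked with a short Python sketch. The values are taken from the text; the names REGIONS, gc_percent, total_length and weighted_gc are hypothetical. The length-weighted mean of the regional GC contents should agree with the overall value to within rounding.

```python
# Region name -> (length in bp, GC content in %), as reported in the text.
REGIONS = {
    "LSC": (87519, 36.7),
    "SSC": (17595, 32.2),
    "IRa": (24583, 43.3),
    "IRb": (24583, 43.3),
}


def gc_percent(seq):
    """GC content (%) of a DNA string; an empty string gives 0.0."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def total_length(regions):
    """Sum of the region lengths in bp."""
    return sum(length for length, _ in regions.values())


def weighted_gc(regions):
    """Length-weighted mean GC content (%) across regions."""
    weighted = sum(length * gc for length, gc in regions.values())
    return weighted / total_length(regions)


if __name__ == "__main__":
    print(total_length(REGIONS))          # 154280
    print(round(weighted_gc(REGIONS), 1)) # 38.3
```

The four regions sum to the reported genome size of 154,280 bp, and the weighted mean rounds to the reported overall GC content of 38.3%.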

Figure 2
Plastome map of Coptis teeta. The darker gray in the inner circle corresponds to GC content, while the lighter gray corresponds to AT content.

Repeat and SSR analysis
In the repeat structure analysis, 42 forward and 23 reverse repeats with a minimum repeat size of 15 bp were detected in the cp genome of C. teeta (Table 1). Most of these repeats were between 15 and 20 bp long. The longest forward repeat was 39 bp; one copy was located in the intergenic region between trnV-GAC and rps7 in the inverted repeat (IR) regions, and the other was located in ycf3 in the LSC. For 31 repeats, both copies started in the same region: 21 were located in the LSC region, 7 in the IR regions and 3 in the SSC region. For the other 34 repeats, the two copies started in different regions.
cpSSR markers are widely used to study the population genetics and evolutionary processes of wild plants [17,18]. In total, 62 SSRs were found in the cp genome of C. teeta, most of which were in the LSC (Table 2). Among them, 31 (50.0%) were mononucleotide SSRs, 15 (24.2%) were dinucleotide SSRs, 6 (9.7%) were trinucleotide SSRs, 8 (12.9%) were tetranucleotide SSRs, 1 (1.6%) was a pentanucleotide SSR and 1 (1.6%) was a hexanucleotide SSR. Only 12 SSRs were located in genes; the others were in intergenic regions. Thirty (96.8%) of the mononucleotide SSRs belonged to the A/T type, which is consistent with the hypothesis that cpSSRs are generally composed of short polyadenine (poly A) or polythymine (poly T) repeats and rarely contain tandem guanine (G) or cytosine (C) repeats. These cpSSR markers can be used in the conservation genetics of C. teeta.
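The SSR class proportions follow directly from the reported counts, as the following Python sketch shows. The counts are taken from the text; the names SSR_COUNTS and class_percentages are hypothetical.

```python
# SSR counts per motif class, as reported in the text (62 in total).
SSR_COUNTS = {
    "mono": 31,
    "di": 15,
    "tri": 6,
    "tetra": 8,
    "penta": 1,
    "hexa": 1,
}


def class_percentages(counts):
    """Percentage of each class relative to the total, rounded to 0.1."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    for name, pct in class_percentages(SSR_COUNTS).items():
        print(f"{name}: {pct}%")
    # Share of A/T-type repeats among the 31 mononucleotide SSRs.
    print(round(100 * 30 / SSR_COUNTS["mono"], 1))
```

With 62 SSRs in total, a single SSR corresponds to 1.6% of the set, and 30 of the 31 mononucleotide SSRs corresponds to 96.8%.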