JOURNAL OF CLINICAL MICROBIOLOGY, Oct. 2010, p. 3614–3623 Vol. 48, No. 10 0095-1137/10/$12.00 doi:10.1128/JCM.00157-10 Copyright © 2010, American Society for Microbiology. All Rights Reserved.
Genomic Signatures of the Haarlem Lineage of Mycobacterium tuberculosis:
Implications of Strain Genetic Variation in Drug and
Andrés Cubillos-Ruiz,1† Andrea Sandoval,1† Viviana Ritacco,2 Beatriz López,2 Jaime Robledo,3,4 Nidia Correa,3,4 Iván Hernandez-Neuta,1 Maria Mercedes Zambrano,1 and Patricia Del Portillo1,4*
Corporación Corpogen, Carrera 5 no. 66A-34, Bogotá, Colombia1; Instituto Nacional de Enfermedades Infecciosas ANLIS “Carlos G. Malbrán,” Velez Sarsfield 563, Buenos Aires, Argentina2; Corporación para Investigaciones Biológicas, CIB, Universidad Pontificia Bolivariana, Carrera 72a no. 78B-141, Medellín, Colombia3; and Centro Colombiano de Investigación en Tuberculosis, Medellín, Colombia4
Received 25 January 2010/Returned for modification 5 April 2010/Accepted 7 July 2010
Tuberculosis is the world’s leading cause of death due to a single infectious agent, and efforts aimed at its control require a better understanding of host, environmental, and bacterial factors that govern disease outcome. Growing evidence indicates that certain Mycobacterium tuberculosis strains of distinct phylogeographic lineages elicit unique immunopathological events. However, identifying the genetic basis of these phenotypic peculiarities has proven difficult. Here we report the presence of six large sequence polymorphisms which, together with two single-nucleotide changes previously described by our group, consistently differentiate Haarlem strains from the remaining M. tuberculosis lineages. The six newly found Haarlem-specific genetic events are four deletions, which altogether involve more than 13 kb, and two intragenic insertions of the element IS6110. The absence of the genes involved in these polymorphisms could have an important physiological impact on Haarlem strains, i.e., by affecting key genes, such as Rv1354c and cyp121, which have been recently proposed as plausible drug targets. These lineage-specific polymorphisms can serve as genetic markers for the rapid PCR identification of Haarlem strains, providing a useful tool for strain surveillance and molecular epidemiology studies. Strain variability such as that described here underscores the need for the definition of a core set of essential genes in M. tuberculosis that are ubiquitously present in all circulating lineages, as a requirement in the development of effective antituberculosis drugs and vaccines.
* Corresponding author. Mailing address: Corporación Corpogen, Carrera 5 no. 66a-34, Bogotá, D.C., Colombia. Phone: 57-1-8050106. Fax: 57-1-3484607. E-mail: firstname.lastname@example.org.
† These authors contributed equally to this work.
Published ahead of print on 14 July 2010.

Mycobacterium tuberculosis is the causative agent of tuberculosis, the leading cause of death by a single bacterial agent in the world (36). Infection with M. tuberculosis has historically been shown to result in a variety of clinical outcomes that are usually associated with host inherited susceptibility and environmental risk factors (2, 31, 32). Moreover, increasing evidence suggests that genetic variation in the tubercle bacilli also plays an important role in the outcome of the disease (4, 19, 33). Due to the absence of exchange of genetic material with a global microbial gene pool, M. tuberculosis had long been considered to have a clonal population structure. However, significant strain-to-strain genetic variation within M. tuberculosis has recently been unveiled (11, 19).

Changes in neutral regions of the chromosome, such as the direct repeat (DR) locus, and in the mycobacterial interspersed repetitive units (MIRUs) are useful in epidemiological and phylogenetic analyses and in describing the most conspicuous M. tuberculosis lineages (3, 21). In addition to the variation in neutral regions, genetic polymorphisms involving coding regions have been described to occur through single-nucleotide changes and through deletion and insertion events, the latter mediated mainly by the IS6110 element (23, 30). Although these genomic alterations are thought to be among the principal sources of phenotypic variation in M. tuberculosis, the specific genomic changes that define each lineage have not yet been fully defined.

There are currently six phylogeographic lineages that make up the M. tuberculosis global population (10). One is the Euro-American group, which includes all the spoligotype families predominating in the Western world, such as Haarlem, LAM, and the ill-defined T group (3). In particular, the Haarlem genotype is ubiquitous worldwide (15) and represents about 25% of the isolates in Europe, Central America, and the Caribbean, suggesting a link with the post-Columbus European colonization (8). Haarlem strains are actively transmitted in urban settings in Colombia, causing major public health problems (N. E. Correa, E. Zapata, V. Gómez, G. E. Mejia, A. Restrepo, J. Robledo, and CCITB, presented at the 107th General Meeting of the American Society for Microbiology, Toronto, Canada, 2007) and have also been responsible for a prolonged outbreak of multidrug-resistant tuberculosis in Argentina (26, 29).

An intriguing question is whether M. tuberculosis strains differ in terms of pathogenic characteristics as a consequence of long-standing interactions of particular lineages with specific human populations. Animal models that take advantage of an identical genetic background, and therefore a uniform host immune response, have given insight regarding the contribution of strain genetic diversity to the outcome of the infectious process (7, 20). It is currently accepted that genetically different M. tuberculosis strains produce markedly different immunopathological events in isogenic mice (4, 18). Thus, understanding genotypic differences and mechanisms underlying infection variability and identifying specific changes or genes associated with both virulence and immunopathogenicity of the different M. tuberculosis lineages have important implications for the future effective control of tuberculosis (7, 33).

In a recent bioinformatic study using multiple genome alignments of six fully sequenced M. tuberculosis strains belonging to different lineages, we showed a trend toward accumulation of a limited number of genome-specific polymorphisms preferentially associated with circulating strains and underrepresented in laboratory strains. This suggests that such polymorphisms arise as active mechanisms of adaptation to the human host (5). We speculated that some of these genome-specific polymorphisms might be common to strains of a particular lineage rather than being an exclusive property of the isolate examined. To test this, in the present study we examined whether genome-specific polymorphisms previously identified in fully sequenced strains were present in a broader group of strains and could thus represent a lineage-wide condition. In particular, we explored whether polymorphisms identified as specific to the sequenced M. tuberculosis Haarlem strain (5) were prevalent in additional members of the Haarlem lineage and absent from other lineages. In the present paper, we report the presence of eight genomic signatures highly exclusive to the M. tuberculosis Haarlem lineage that can prove important for the rapid identification of these strains and also contribute to our understanding of the genetic variations underlying phenotypic differences among the various lineages of the tubercle bacilli.

MATERIALS AND METHODS

M. tuberculosis isolates. A set of 40 M. tuberculosis clinical isolates belonging to the Haarlem lineage and 62 non-Haarlem isolates, including LAM, S, T, X, EAI, and Beijing, were selected from the collection of the Instituto Nacional de Enfermedades Infecciosas ANLIS “Carlos G. Malbrán” in Buenos Aires, Argentina, and from the collection of the Centro Colombiano de Investigación en Tuberculosis (CCITB) held at the Corporación para Investigaciones Biológicas (CIB) in Medellín, Colombia. Isolates were selected based on different IS6110 restriction fragment length polymorphism (RFLP) patterns to ensure that they represented the most conspicuous patterns of strains circulating in both settings between 1997 and 2005. Laboratory strain H37Rv was also included in the non-Haarlem group (see Fig. 2). DNA was obtained from culture lysates as described previously (25).

IS6110 RFLP typing and phylogenetic analysis. IS6110 RFLP and spoligotype patterns (14, 35) were available at genotype databases in Buenos Aires and Medellín laboratories. Computer-assisted analysis of IS6110 RFLP patterns was performed with the software BioNumerics 5.1 (Applied Maths, Sint-Martens-Latem, Belgium) as described previously (12). Similarity between patterns was calculated using the Dice coefficient with 1% band position tolerance and 1% optimization. Cluster analysis was performed using the unweighted pair group method with arithmetic averages (UPGMA). Phylogenetic lineages and spoligo-international shared types (SITs) were assigned according to SpolDB4, available at www.pasteur-guadeloupe.fr/tb/bd_myco.htlm (3).

Primers and PCR assays. Two sets of primers were designed for each polymorphic region to determine the presence or absence of a specific deletion in each M. tuberculosis isolate. IS6110 insertions were detected using primers that annealed with the IS6110 flanking regions, generating bands that differed in size (1,362 bp) depending on whether the IS6110 element was present or absent. Table 1 summarizes the sequences of the primers and the expected amplification products for each region. All PCRs were performed in an iCycler DNA thermal cycler (Bio-Rad) in a final volume of 50 μl containing 2.5 units of TucanTaq DNA polymerase (Corpogen, Bogotá, Colombia), 1× TucanTaq amplification buffer, 1.5 mM MgCl2, 0.5 μM each primer, 0.3 mM deoxynucleoside triphosphates (dNTPs), and 2 μl of DNA from culture lysate extracts. For detection of deletions, PCRs were carried out for 35 cycles consisting of 45 s of denaturation at 94°C, 45 s of annealing at 64°C, and 120 s of extension at 72°C. For detection of insertions, PCRs were carried out for 35 cycles consisting of 45 s of denaturation at 94°C, 45 s of annealing at 66°C for HSI1 and 71°C for HSI2, and 120 s of extension at 72°C. PCR products were verified by 1.5% agarose gel electrophoresis for the presence of a single amplification band. Five randomly chosen products for each region were sequenced using the BigDye terminator cycling conditions (Macrogen, South Korea) in order to confirm that the target region was amplified. For the detection of single-nucleotide polymorphisms (SNPs) in the ogt and ung genes, the primers and conditions reported previously for allelic discriminatory PCR were used (25).

Statistical analysis. The Fisher exact test was applied to determine significant associations between polymorphisms and M. tuberculosis lineages.

RESULTS

Specific polymorphisms in strains of the Haarlem lineage of M. tuberculosis. Of 12 deletions and 6 insertions identified in a previous bioinformatic study as unique to the sequenced Haarlem strain (5) (www.broadinstitute.org/), we selected the most conspicuous to investigate if they were lineage-wide mutations. Specifically, the IS6110 insertions and the largest deletion polymorphisms were chosen for a preliminary analysis using PCR with four Haarlem strains. Polymorphisms spanning repetitive regions, such as Pro-Pro-Glu (PPE) family genes, were excluded from the present analysis in order to avoid possible misinterpretation. Likewise, polymorphisms of <200 bp were excluded because they cannot be unequivocally differentiated from intrinsic errors that occurred during sequencing and finishing of the Haarlem strain genome. Results from this preliminary analysis indicated that only six polymorphisms were in fact present in the four analyzed Haarlem strains (Table 2). The occurrence of these mutations was therefore further inspected using a larger panel of epidemiologically unrelated isolates from Argentina and Colombia. For this analysis we used these six large-sequence polymorphisms and two additional SNPs located in the ogt and ung DNA repair genes previously reported by our group as specific to the Haarlem lineage (25). The analysis of the 102 strains indicated that the presence of all eight studied polymorphisms correlated highly with the Haarlem lineage (Table 3). For this reason, the regions displaying deletions were designated Haarlem-specific deletions (HSD1 to HSD4), and the two IS6110 element insertions were named Haarlem-specific insertions (HSI1 and HSI2). Likewise, SNPs present in genes ogt and ung were named Haarlem-specific SNPs (HSSNP1 and HSSNP2, respectively). When analyzed individually, each of these genetic events showed a highly significant association with the Haarlem lineage: HSD1 was found in 37/40 Haarlem versus 1/62 non-Haarlem strains (P < 0.00001), HSD2 and HSD3 were found in 38/40 Haarlem versus 1/62 non-Haarlem isolates (P < 0.00001), and HSD4 was found in 38/40 Haarlem versus 2/62 non-Haarlem isolates (P < 0.00001). The two insertions of the IS6110 element also correlated highly with the Haarlem lineage: HSI1 was present in 38/40 Haarlem versus 4/62 non-Haarlem isolates (P < 0.00001), and HSI2 was present in 38/40 Haarlem versus 0/62 non-Haarlem strains (P < 0.00001). Lastly, and consistent with previous reports, HSSNP1 in ogt and HSSNP2 in ung both were present in 38/40 Haarlem versus 1/62 non-Haarlem isolates (P < 0.00001).

Genes involved in the deletions and insertions. The genes involved in large Haarlem-specific polymorphisms are depicted in Fig. 1. HSD1 is a 1,774-bp deletion that removes most of genes helZ and Rv2102, HSD2 is a 6,480-bp deletion that
3616 CUBILLOS-RUIZ ET AL. J. CLIN. MICROBIOL.
TABLE 1. Primers used in this study for the identification of the Haarlem-specific polymorphisms

Region or locus  Lineage      Product size (bp)  Primer          Sequence                        Reference
HSD1             Haarlem      2,230              hsd1 A(f)       5′ CGCTCCGTCGACAAGAGAG          This study
                                                 hsd1 B(r)       5′ TATCCTGGCGAGAATGCTGA
                 Non-Haarlem  1,786              hsd1 C(f)       5′ ACGCGGCCCTACATCCT            This study
                                                 hsd1 B(r)       5′ TATCCTGGCGAGAATGCTGA
HSD2             Haarlem      1,840              hsd2 A(f)       5′ TTGCGCGAATGTGCTTTCTC         This study
                                                 hsd2 B(r)       5′ CCGGCCGGCTCTTGTC
                 Non-Haarlem  3,507              hsd2 A(f)       5′ TTGCGCGAATGTGCTTTCTC         This study
                                                 hsd2 C(r)       5′ CTTCGGGCCGTCTTCTTGC
HSD3             Haarlem      831                hsd3 A(f)       5′ TAAGCCCTCAACGCGCCACC         This study
                                                 hsd3 B(r)       5′ GCGCTCGATCCCACGTTGT
                 Non-Haarlem  1,737              hsd3 A(f)       5′ TAAGCCCTCAACGCGCCACC         This study
                                                 hsd3 C(r)       5′ CACACCGTCGGACCTCCTGC
HSD4             Haarlem      849                hsd4 A(f)       5′ AACACGCCGATACCTATTTGGTC      This study
                                                 hsd4 B(r)       5′ CGTGAGGGCATCGAGGTGGC
                 Non-Haarlem  1,287              hsd4 A(f)       5′ AACACGCCGATACCTATTTGGTC      This study
                                                 hsd4 B(r)       5′ CGTGAGGGCATCGAGGTGGC
HSI1             Haarlem      2,381              hsi1 A(f)       5′ AATGCCGTCGTGGTCAA            This study
                                                 hsi1 B(r)       5′ CGGTTTCTCGGGTGCTAC
                 Non-Haarlem  1,019              hsi1 A(f)       5′ AATGCCGTCGTGGTCAA            This study
                                                 hsi1 B(r)       5′ CGGTTTCTCGGGTGCTAC
HSI2             Haarlem      2,280              hsi2 A(f)       5′ GGTCAGGCTGCGGGATGTT          This study
                                                 hsi2 B(r)       5′ AGCGTTGCGGGATACTCTGG
                 Non-Haarlem  918                hsi2 A(f)       5′ GGTCAGGCTGCGGGATGTT          This study
                                                 hsi2 B(r)       5′ AGCGTTGCGGGATACTCTGG
HSSNP1           Haarlem      545                ogt F-M(f)      5′ CCCCATCGGGCCATTAAG           25
                                                 ogt R(r)        5′ ACTCAGCCGCTCGCGAGC
                 Non-Haarlem  545                ogt F-W(f)      5′ CCCCATCGGGCCATTAAC           25
                                                 ogt R(r)        5′ ACTCAGCCGCTCGCGAGC
HSSNP2           Haarlem      287                ung F-M(f)      5′ GCTGGTGGCGATCCTA             25
                                                 ung R(r)        5′ GGCAACAAGAAGCGACTC
                 Non-Haarlem  287                ung F-W(f)      5′ GCTGGTGGCGATCCTG             25
                                                 ung R(r)        5′ GGCAACAAGAAGCGACTC
DR               Haarlem                         hsd4 IS7(f)     5′ AACACGCCGATACCTATTTGGTC      This study
                                                 hsd4 INS1(r)    5′ CGTGAGGGCATCGAGGTGGC
                                                 hsd4 DR30(r)    5′ GAAACTCTTGACGATGCGGTTG
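
The HSD4, HSI1, and HSI2 assays in Table 1 use one primer pair for both genotypes, so the two outcomes differ only in band size. As a minimal sketch of how an observed band might be scored against those expected sizes, the Python snippet below takes the product sizes from Table 1; the function name `call_genotype` and the 5% size tolerance are illustrative assumptions and are not part of the published protocol.

```python
# Expected amplicon sizes (bp) from Table 1 for the assays that use one
# primer pair for both genotypes. Tuple order: (Haarlem, non-Haarlem).
EXPECTED_BP = {
    "HSD4": (849, 1287),
    "HSI1": (2381, 1019),
    "HSI2": (2280, 918),
}


def call_genotype(region, observed_bp, tolerance=0.05):
    """Return 'Haarlem', 'non-Haarlem', or 'undetermined' for one band.

    `tolerance` is a hypothetical relative size window (5% by default)
    standing in for the sizing resolution of an agarose gel.
    """
    haarlem_bp, other_bp = EXPECTED_BP[region]
    if abs(observed_bp - haarlem_bp) <= tolerance * haarlem_bp:
        return "Haarlem"
    if abs(observed_bp - other_bp) <= tolerance * other_bp:
        return "non-Haarlem"
    return "undetermined"


if __name__ == "__main__":
    # Made-up band sizes, used only to exercise the three outcomes.
    for region, band in [("HSI1", 2400), ("HSI2", 900), ("HSD4", 1100)]:
        print(region, band, call_genotype(region, band))
```

For both insertion assays the two expected products in Table 1 differ by 1,362 bp, which matches the size difference quoted in Materials and Methods. The deletion assays HSD1 to HSD3 use a different primer pair for each genotype and would need to be scored per reaction rather than by one band size.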
removes genes Rv2271 through Rv2278 and partially truncates lppN, HSD3 is a 4,753-bp deletion that affects genes Rv1353c through Rv1356c, and HSD4 is a 439-bp deletion within the DR locus between genes Rv2813 and Rv2814c. Insertions HSI1 and HSI2 interrupt genes Rv2336 and Rv0963c, respectively.

Detailed examination of the Haarlem isolates in the set showed that every strain classified within the H1 and the H2 subfamilies consistently displayed all polymorphisms, as did 10 out of 12 strains classified within the H3 subfamily (Table 3). Another H3 strain (isolate no. 1511) displayed seven of the eight analyzed polymorphisms. In contrast, two isolates (no. 1633 and 1089, of the H3 and H4 subfamilies, respectively), did not have any of these polymorphisms. Consistently, these two isolates did not fit within the Haarlem branch in the IS6110 RFLP dendrogram constructed with the whole set of M. tuberculosis strains used in this study (Fig. 2). On the other hand, the large majority of isolates belonging to non-Haarlem lineages did not contain these polymorphic regions (Table 3). More importantly, only 6 out of 62 non-Haarlem strains displayed some of these polymorphisms, and none of these six strains harbored all of them. Five of them belonged to the LAM lineage and carried only one Haarlem-specific polymorphism each, suggesting a certain relationship with the Haarlem lineage or, alternatively, an evolutionary process involving similar selection pressures (Table 3 and Fig. 2, DNA numbers UT89, UT272, 1632, 1506, and 1516). The sixth strain, which had an undefined (U) lineage, appeared to be fairly close to the Haarlem family by both IS6110 pattern and spoligotype, and its relatedness to this lineage was confirmed by its displaying six out of the eight Haarlem-specific polymorphisms (Fig. 2, DNA UT148).

Genomic organization of the DR locus in Haarlem strains. The HSD4 deletion mapped to the DR region and eliminated spacers 26 to 31. PCR was positive for this deletion in all strains of the Haarlem 1 and 2 subfamilies and in 11 out of 12 strains belonging to the Haarlem 3 subfamily (Fig. 2). This deletion explains
TABLE 2. Identification of indels in four M. tuberculosis Haarlem strains
[Table body not recoverable from the extracted text. Columns: deletion or insertion (12 deletions and 6 insertions); position in the Haarlem strain; size (bp); gene product; presence in Haarlem strains 1 to 4; primers. Abbreviations: HSD, Haarlem-specific deletion; HSI, Haarlem-specific insertion; ND, not done.]
TABLE 3. Presence of Haarlem-specific polymorphisms in a set of 102 M. tuberculosis strains (40 belonging to the Haarlem lineage and 62 non-Haarlem)(a)

                                 No. (%) of isolates with:
                                 Deletions                                   Insertions            SNPs
Lineage             Isolates     HSD1       HSD2       HSD3       HSD4       HSI1       HSI2       HSSNP1     HSSNP2
Haarlem
  H1                17           17         17         17         17         17         17         17         17
  H2                10           10         10         10         10         10         10         10         10
  H3                12           10         11         11         11         11         11         11         11
  H4                1            0          0          0          0          0          0          0          0
  Total             40 (100)     37 (92.5)  38 (95.0)  38 (95.0)  38 (95.0)  38 (95.0)  38 (95.0)  38 (95.0)  38 (95.0)
Non-Haarlem
  LAM               28           0          0          1          0          3          0          0          0
  LAM3/S convergent 1            0          0          0          1          0          0          0          0
  S                 3            0          0          0          0          0          0          0          0
  T                 20           0          0          0          0          0          0          0          0
  Undetermined      4            1          1          0          1          1          0          1          1
  X                 3            0          0          0          0          0          0          0          0
  EAI               1            0          0          0          0          0          0          0          0
  Beijing           1            0          0          0          0          0          0          0          0
  H37Rv             1            0          0          0          0          0          0          0          0
  Total             62 (100)     1 (1.6)    1 (1.6)    1 (1.6)    2 (3.2)    4 (6.5)    0 (0)      1 (1.6)    1 (1.6)

(a) Abbreviations: H, Haarlem; LAM, Latin America and Mediterranean; EAI, East African and Indian; HSD, Haarlem-specific deletion; HSI, Haarlem-specific insertion; SNP, single-nucleotide polymorphism; HSSNP, Haarlem-specific single-nucleotide polymorphism.
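
The per-marker associations reported in Results can be recomputed from the totals in Table 3. The paper states only that the Fisher exact test was applied; the sketch below assumes a two-sided test and implements it with the Python standard library rather than whatever software the authors used. The sensitivity and specificity figures are an illustrative addition that the paper does not report under those names, and `fisher_exact_two_sided` and `summarize` are hypothetical helper names.

```python
from math import comb

N_HAARLEM, N_NON_HAARLEM = 40, 62

# Marker-positive isolates from the Total rows of Table 3:
# (Haarlem, non-Haarlem).
POSITIVES = {
    "HSD1": (37, 1), "HSD2": (38, 1), "HSD3": (38, 1), "HSD4": (38, 2),
    "HSI1": (38, 4), "HSI2": (38, 0), "HSSNP1": (38, 1), "HSSNP2": (38, 1),
}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x marker-positive isolates in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


def summarize():
    """Return P value, sensitivity, and specificity for each marker."""
    out = {}
    for marker, (h_pos, n_pos) in POSITIVES.items():
        p = fisher_exact_two_sided(h_pos, N_HAARLEM - h_pos,
                                   n_pos, N_NON_HAARLEM - n_pos)
        out[marker] = {
            "p": p,
            "sensitivity": h_pos / N_HAARLEM,
            "specificity": 1 - n_pos / N_NON_HAARLEM,
        }
    return out


if __name__ == "__main__":
    for marker, row in summarize().items():
        print(f"{marker}: P = {row['p']:.2e}, "
              f"sens = {row['sensitivity']:.3f}, spec = {row['specificity']:.3f}")
```

Under these assumptions, every marker falls below the P < 0.00001 threshold quoted in the text.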
clearly the spoligotype observed in M. tuberculosis strains of the Haarlem 1 and 2 subfamilies, which are characterized by the absence of those six spacers. However, most strains of the H3 subfamily lack only spacer 31 in the spoligotyping despite the presence of spacers 26 through 30. In order to understand the DR organization in H3, we designed primers to amplify the DR locus between spacers 30 and 32 and between the IS6110 and these spacers in strains belonging to Haarlem subfamilies 1, 2, and 3. Figure 3 shows a schematic representation of the DR locus organization in the Haarlem lineage in the H1, H2, and H3 subfamilies. A deletion of spacers 26 to 31 was confirmed by sequencing in all six analyzed strains belonging to H1 and H2. The sequence of H3 showed a different DR locus organization. We identified the presence of spacers 26 to 31 downstream of the ancestral IS6110 insertion, followed by an extra copy of the insertion element together with spacers 25 and 32. This organization was further confirmed by comparison with the Haarlem strain sequenced by the Broad Institute (www.broadinstitute.org/). The identity was 100%, indicating that the sequenced strain belongs to the Haarlem 3 subfamily.

FIG. 1. Haarlem-specific polymorphisms. Genes involved in four Haarlem-specific deletions (HSD1 to HSD4) and two Haarlem-specific insertions (HSI1 and HSI2) are shown. The upper part of each horizontal panel represents the wild-type sequence, and the lower part represents the Haarlem genotype.

DISCUSSION

The Haarlem family was described in the Netherlands in 1999 (15). The family is highly diverse and has been amply studied to better understand its evolutionary history. In previous work, we identified two SNPs in DNA repair genes, ung and ogt, present in all analyzed Haarlem strains (25). In the present study we report additional markers that can constitute a genomic signature of the Haarlem family. These genomic markers include six specific large polymorphisms (four deletions and two IS6110 insertions) along with the two previously described SNPs in ung and ogt. Three of the deletions involved in these specific polymorphisms have been previously reported in 5 out of 100 clinical isolates analyzed by microarray technology from a collection of epidemiologically well-characterized isolates from San Francisco (34). Four of these strains belong to the Haarlem family and the other one to the U lineage (M. Kato-Maeda, personal communication), reinforcing the idea that these are Haarlem-specific polymorphisms. The fact that strains from different geographical origins such as Argentina, Colombia, and the United States share the same specific polymorphisms prompts us to propose that these mutations are widely distributed.

The high frequency of these polymorphisms within one of the most widespread and successful genotypes can have key biological significance. In particular, these Haarlem-specific polymorphisms may have important functional consequences for the tubercle bacilli and be relevant in terms of strategies for disease control. This is especially evident with respect to the validity of some recently proposed drug targets. Gene Rv1354c, for example, which codes for the only identified putative diguanylate cyclase in the genome, is associated with the inner membrane (22) and thought to be involved in the turnover of cyclic-di-GMP, a multifunctional second-messenger molecule exclusive to the bacterial domain. Based on this, Rv1354c has been recently proposed as an ideal target for the design of new drugs (6). However, gene Rv1354c is completely deleted in our Haarlem strains as part of HSD3, indicating that strains with this genotype can dispense with this protein and signaling pathway without losing their capacity to infect and cause disease. Thus, gene Rv1354c cannot be considered a suitable antituberculosis drug target (6). Similarly, the cytochrome P450 gene cyp121 was shown to be essential for M. tuberculosis H37Rv viability and was proposed as a novel target for azole drugs (24). This gene, however, is also deleted in Haarlem strains as part of HSD2, making necessary a reevaluation of the antimicrobial activity in circulating strains. The results obtained here underscore the importance of strain diversity and the need to identify a core set of genes common to all M. tuberculosis lineages as a crucial step in the development of new antituberculosis drugs.

Likewise, genetic variation can reflect differences in antigenic repertoire composition among the different lineages of M. tuberculosis, as pointed out previously (34) and exemplified here by deletion of the transmembrane proteins Rv2272 and Rv2273 as part of HSD2. Consequently, vaccine candidates should be effective against challenge not only with laboratory strains but also with strains representative of the major lineages of the global population of M. tuberculosis.

In addition to these, other genes affected in Haarlem strains could result in important phenotypic changes. Genes Rv2274c and Rv2274A, absent due to HSD2, have been annotated among the 38 toxin-antitoxin (TA) operons present in the M. tuberculosis genome (1, 27). It has been proposed that these systems can fulfill a variety of roles associated with retardation of cell growth and persistence in stressful environments (1). The IS6110 element insertion in HSI1 interrupts Rv2336, a gene that has been implicated in virulence because it is downregulated in the attenuated strain H37Ra (28). No obvious phenotypic effects can be inferred from the other specific polymorphisms identified here, i.e., HSD1, HSI2, HSSNP1, and HSSNP2, and additional functional analysis of these mutations would be required.

A striking result of our analysis is that the lineage classification given by spoligotyping matches almost perfectly with the one resulting from the presence of these Haarlem-specific
FIG. 3. Proposed evolution of the Haarlem DR locus. A schematic representation of hypothetical changes in the direct repeat (DR) locus of Mycobacterium tuberculosis Haarlem strains is shown. H3 probably arose due to a duplication of IS6110, together with spacer 25. A deletion, probably mediated by IS6110 recombination, including spacers 25 to 31 generated H1. H2 could be the result of a subsequent large deletion of all spacers upstream of IS6110. Insertion of IS6110 is shown as a triangle. Partial deletion of IS6110 is shown as an arrow. Numbered bars represent consecutive spacers between DRs.
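
The spoligotype observations in Results (spacers 26 to 31 absent in H1 and H2, only spacer 31 absent in most H3 strains) can be expressed as a small check on the set of spacers detected. The sketch below is illustrative only: it assumes the standard 43-spacer spoligotyping numbering, looks at nothing outside the 26 to 31 window, and does not reproduce the SpolDB4 assignment rules. The function name and the returned labels are hypothetical.

```python
# Spacers removed by HSD4 according to the Results section (26 to 31 inclusive).
HSD4_SPACERS = set(range(26, 32))


def dr_window_pattern(present_spacers):
    """Describe the 26-31 window of a spoligotype.

    `present_spacers` is an iterable of spacer numbers that gave a positive
    hybridization signal. The labels are hypothetical shorthand for the two
    patterns described in the text, not a lineage assignment.
    """
    missing = HSD4_SPACERS - set(present_spacers)
    if missing == HSD4_SPACERS:
        return "H1/H2-like"   # all six spacers absent
    if missing == {31}:
        return "H3-like"      # only spacer 31 absent
    return "other"


if __name__ == "__main__":
    all_43 = set(range(1, 44))  # assumed standard 43-spacer format
    print(dr_window_pattern(all_43 - HSD4_SPACERS))
    print(dr_window_pattern(all_43 - {31}))
    print(dr_window_pattern(all_43))
```

A check like this covers only the part of the pattern that HSD4 explains; full lineage assignment in the study relied on SpolDB4 together with the IS6110 RFLP dendrogram.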
polymorphisms and also with the Haarlem branch in the IS6110 RFLP dendrogram (Fig. 2). HSI1 and HSI2 result from the integration of the IS6110 element and are proposed to be the categorical markers of the IS6110 RFLP pattern observed in the Haarlem lineage; likewise, HSD4 encompasses the deletion of the DR spacers that give Haarlem strains their unique spoligotyping pattern. This observation demonstrates that the classification given by the polymorphisms reported here is linked with the most commonly used genotyping methods, showing the usefulness of these new markers in phylogenetic studies. The results presented here reinforce the idea that Haarlem is indeed a distinct phylogenetic group.

FIG. 2. Dendrogram of clinical isolates. IS6110 RFLP and spoligotype patterns of 101 clinical isolates from Colombia (CO) and Argentina (AR), obtained between 1997 and 2005, and of laboratory strain H37Rv are shown. The IS6110 RFLP dendrogram was constructed using arithmetic average linkages and the Dice coefficient with the software BioNumerics v 5.1 (Applied Maths, St-Martens-Latem, Belgium). Spoligo-shared types (SITs) and lineages were assigned according to the SITVIT database (http://www.pasteur-guadeloupe.fr:8081/SITVITDemo/). The 38 isolates sharing all eight polymorphisms here postulated as Haarlem specific are indicated with arrows, two isolates displaying six or seven of these polymorphisms are marked with stars, two isolates classified as Haarlem according to SITVIT and lacking these polymorphisms are indicated with crossed squares, and the five isolates marked with dots displayed only one of the eight Haarlem-specific polymorphisms.

Another interesting result comes from the analysis of the DR region in Haarlem family strains. Based on our findings and the spoligotyping patterns, we propose the following scenario for the evolution of the Haarlem DR region (Fig. 3). The organization seen in H3 most probably arose due to a duplication of IS6110, together with spacer 25. The insertion of an extra copy of IS6110 was previously described (9, 16). A deletion, possibly mediated by IS6110 recombination, could then generate the observed distribution of the locus in H1, and finally, the H2 DR organization could have arisen as the result of a spacer-mediated recombination that eliminated the 5′ region of the DR encompassing the first spacers up to the IS6110. An alternative explanation for the H3 DR locus organization could be that it resulted from a recombination event between different strains, as was previously suggested for M. tuberculosis (17). In addition to contributing to understanding the organization of the DR locus, our results also indicate that changes in this region, which is the basis for spoligotyping lineage assignment, correlate with changes in other regions of the genome, some of which may affect the physiology of the tubercle bacilli and contribute to the establishment and worldwide spread of successful lineages.

Increased resolution of the phylogeny of Euro-American lineages is needed to provide more accurate data for evolutionary, epidemiological, and public health applications. In this respect, our findings fully support results of a recent SNP study (Christophe Sola, personal communication) indicating that some spoligotypes classified as Haarlem in SpolDB4 (3), especially those defined as H4, are not related to this family and that a more stringent definition is needed for this group. The Haarlem-specific mutations described here may be used to optimize a single-target PCR and/or to include the best-fitted target in multiplex assays aimed at classifying strains into the main strain families. Indeed, studies associating distinct lineages with patient clinical and epidemiologic traits will improve our understanding of disease pathogenesis and improve current control measures, thus preventing further spread of epidemic strains.

A recent analysis of M. tuberculosis complex strains indicated that much of the observed genetic diversity has phenotypic consequences and that purifying selection is severely reduced in this highly clonal population, which suffers constant bottlenecks, produced when a single cell is enough to establish an infection (13). In this respect, the identification of genomic changes unique to the Haarlem lineage can provide a basis from which to begin to unravel some of the specific phenotypic characteristics that distinguish this particular genotype from the rest of the M. tuberculosis lineages.

ACKNOWLEDGMENTS

This work was supported by Colciencias grant 431-2004, the Colom-

2004. Phylogenetic reconstruction of Mycobacterium tuberculosis within four settings of the Caribbean region: tree comparative analyse and first appraisal on their phylogeography. Infect. Genet. Evol. 4:5–14.
9. Filliol, I., C. Sola, and N. Rastogi. 2000. Detection of a previously unamplified spacer within the DR locus of Mycobacterium tuberculosis: epidemiological implications. J. Clin. Microbiol. 38:1231–1234.
10. Gagneux, S., K. DeRiemer, T. Van, M. Kato-Maeda, B. C. de Jong, S. Narayanan, M. Nicol, S. Niemann, K. Kremer, M. C. Gutierrez, M. Hilty, P. C. Hopewell, and P. M. Small. 2006. Variable host-pathogen compatibility in Mycobacterium tuberculosis. Proc. Natl. Acad. Sci. U. S. A. 103:2869–2873.
11. Gagneux, S., and P. M. Small. 2007. Global phylogeography of Mycobacterium tuberculosis and implications for tuberculosis product development. Lancet Infect. Dis. 7:328–337.
12. Heersma, H. F., K. Kremer, and J. D. van Embden. 1998. Computer analysis of IS6110 RFLP patterns of Mycobacterium tuberculosis. Methods Mol. Biol. 101:395–422.
13. Hershberg, R., M. Lipatov, P. M. Small, H. Sheffer, S. Niemann, S. Homolka, J. C. Roach, K. Kremer, D. A. Petrov, M. W. Feldman, and S. Gagneux. 2008. High functional diversity in Mycobacterium tuberculosis driven by genetic drift and human demography. PLoS Biol. 6:e311.
14. Kamerbeek, J., L. Schouls, A. Kolk, M. van Agterveld, D. van Soolingen, S. Kuijper, A. Bunschoten, H. Molhuizen, R. Shaw, M. Goyal, and J. van Embden. 1997. Simultaneous detection and strain differentiation of Mycobacterium tuberculosis for diagnosis and epidemiology. J. Clin. Microbiol. 35:907–914.
15. Kremer, K., D. van Soolingen, R. Frothingham, W. H. Haas, P. W. Hermans, C. Martin, P. Palittapongarnpim, B. B. Plikaytis, L. W. Riley, M. A. Yakrus, J. M. Musser, and J. D. van Embden. 1999. Comparison of methods based on different molecular epidemiological markers for typing of Mycobacterium tuberculosis complex strains: interlaboratory study of discriminatory power and reproducibility. J. Clin. Microbiol. 37:2607–2618.
16. Legrand, E., I. Filliol, C. Sola, and N. Rastogi. 2001. Use of spoligotyping to study the evolution of the direct repeat locus by IS6110 transposition in Mycobacterium tuberculosis. J. Clin. Microbiol. 39:1595–1599.
17. Liu, X., M. M. Gutacker, J. M. Musser, and Y. X. Fu. 2006. Evidence for recombination in Mycobacterium tuberculosis. J. Bacteriol. 188:8169–8177.
18. Lopez, B., D. Aguilar, H. Orozco, M. Burger, C.
Espitia, V. Ritacco, L. Barrera, K. Kremer, R. Hernandez-Pando, K. Huygen, and D. van Soolin- bian Center for Excellence in Tuberculosis Research (CCITB), gen. 2003. A marked difference in pathogenesis and immune response in- CYTED grant 207RT0311, and grant FP7-HEALTH-2007-A-201690 duced by different Mycobacterium tuberculosis genotypes. Clin. Exp. Immu- from the EC. nol. 133:30–37. 19. Malik, A. N., and P. Godfrey-Faussett. 2005. Effects of genetic variability of REFERENCES Mycobacterium tuberculosis strains on the presentation of disease. Lancet 1. Arcus, V. L., P. B. Rainey, and S. J. Turner. 2005. The PIN-domain toxin- Infect. Dis. 5:174–183. 20. Marquina-Castillo, B., L. Garcia-Garcia, A. Ponce-de-Leon, M. E. Jimenez- antitoxin array in mycobacteria. Trends Microbiol. 13:360–365. Corona, M. Bobadilla-Del Valle, B. Cano-Arellano, S. Canizales-Quintero, 2. Berrington, W. R., and T. R. Hawn. 2007. Mycobacterium tuberculosis, mac- A. Martinez-Gamboa, M. Kato-Maeda, B. Robertson, D. Young, P. Small, G. rophages, and the innate immune response: does common variation matter? Schoolnik, J. Sifuentes-Osornio, and R. Hernandez-Pando. 2009. Virulence, Immunol. Rev. 219:167–186. immunopathology and transmissibility of selected strains of Mycobacterium 3. Brudey, K., J. R. Driscoll, L. Rigouts, W. M. Prodinger, A. Gori, S. A. Al-Hajoj, C. Allix, L. Aristimuno, J. Arora, V. Baumanis, L. Binder, P. tuberculosis in a murine model. Immunology 128:123–133. 21. Mathema, B., N. E. Kurepina, P. J. Bifani, and B. N. Kreiswirth. 2006. Cafrune, A. Cataldi, S. Cheong, R. Diel, C. Ellermeier, J. T. Evans, M. Fauville-Dufaux, S. Ferdinand, D. Garcia de Viedma, C. Garzelli, L. Gaz- Molecular epidemiology of tuberculosis: current insights. Clin. Microbiol. zola, H. M. Gomes, M. C. Guttierez, P. M. Hawkey, P. D. van Helden, G. V. Rev. 19:658–685. Kadival, B. N. Kreiswirth, K. Kremer, M. Kubin, S. P. Kulkarni, B. Liens, 22. Mawuenyega, K. G., C. V. Forst, K. M. Dobos, J. T. Belisle, J. 
Chen, E. M. T. Lillebaek, M. L. Ho, C. Martin, I. Mokrousov, O. Narvskaia, Y. F. Ngeow, Bradbury, A. R. Bradbury, and X. Chen. 2005. Mycobacterium tuberculosis L. Naumann, S. Niemann, I. Parwati, Z. Rahim, V. Rasolofo-Razanam- functional network analysis by global subcellular protein pro，ling. Mol. Biol. parany, T. Rasolonavalona, M. L. Rossetti, S. Rusch-Gerdes, A. Sajduda, S. Cell 16:396–404. Samper, I. G. Shemyakin, U. B. Singh, A. Somoskovi, R. A. Skuce, D. van 23. McEvoy, C. R., A. A. Falmer, N. C. Gey van Pittius, T. C. Victor, P. D. van Soolingen, E. M. Streicher, P. N. Suffys, E. Tortoli, T. Tracevska, V. Vincent, Helden, and R. M. Warren. 2007. The role of IS6110 in the evolution of T. C. Victor, R. M. Warren, S. F. Yap, K. Zaman, F. Portaels, N. Rastogi, and Mycobacterium tuberculosis. Tuberculosis (Edinb.) 87:393–404. C. Sola. 2006. Mycobacterium tuberculosis complex genetic diversity: mining 24. McLean, K. J., P. Carroll, D. G. Lewis, A. J. Dunford, H. E. Seward, R. Neeli, M. R. Cheesman, L. Marsollier, P. Douglas, W. E. Smith, I. Rosenkrands, the fourth international spoligotyping database (SpolDB4) for classi，cation, S. T. Cole, D. Leys, T. Parish, and A. W. Munro. 2008. Characterization of population genetics and epidemiology. BMC Microbiol. 6:23. active site structure in CYP121. A cytochrome P450 essential for viability of 4. Caws, M., G. Thwaites, S. Dunstan, T. R. Hawn, N. T. Lan, N. T. Thuong, K. Stepniewska, M. N. Huyen, N. D. Bang, T. H. Loc, S. Gagneux, D. van Mycobacterium tuberculosis H37Rv. J. Biol. Chem. 283:33406–33416. Soolingen, K. Kremer, M. van der Sande, P. Small, P. T. Anh, N. T. Chinh, 25. Olano, J., B. Lopez, A. Reyes, M. P. Lemos, N. Correa, P. Del Portillo, L. H. T. Quy, N. T. Duyen, D. Q. Tho, N. T. Hieu, E. Torok, T. T. Hien, N. H. Barrera, J. Robledo, V. Ritacco, and M. M. Zambrano. 2007. Mutations in Dung, N. T. Nhu, P. M. Duy, N. van Vinh Chau, and J. Farrar. 2008. 
The DNA repair genes are associated with the Haarlem lineage of Mycobacterium in！uence of host and bacterial genotype on the development of disseminated tuberculosis independently of their antibiotic resistance. Tuberculosis (Edinb.) 87:502–508. disease with Mycobacterium tuberculosis. PLoS Pathog. 4:e1000034. 5. Cubillos-Ruiz, A., J. Morales, and M. M. Zambrano. 2008. Analysis of the 26. Palmero, D., V. Ritacco, M. Ambroggi, N. Marcela, L. Barrera, L. Capone, genetic variation in Mycobacterium tuberculosis strains by multiple genome A. Dambrosi, M. di Lonardo, N. Isola, S. Poggi, M. Vescovo, and E. Abbate. 2003. Multidrug-resistant tuberculosis in HIV-negative patients, Buenos alignments. BMC Res. Notes 1:110. 6. Cui, T., L. Zhang, X. Wang, and Z. G. He. 2009. Uncovering new signaling Aires, Argentina. Emerg. Infect. Dis. 9:965–969. proteins and potential drug targets through the interactome analysis of 27. Pandey, D. P., and K. Gerdes. 2005. Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes. Nucleic Acids Res. Mycobacterium tuberculosis. BMC Genomics 10:118. 7. Dormans, J., M. Burger, D. Aguilar, R. Hernandez-Pando, K. Kremer, P. 33:966–976. Roholl, S. M. Arend, and D. van Soolingen. 2004. Correlation of virulence, 28. Rindi, L., N. Lari, and C. Garzelli. 1999. Search for genes potentially in- volved in Mycobacterium tuberculosis virulence by mRNA differential display. lung pathology, bacterial load and delayed type hypersensitivity responses after infection with different Mycobacterium tuberculosis genotypes in a Biochem. Biophys. Res. Commun. 258:94–101. BALB/c mouse model. Clin. Exp. Immunol. 137:460–468. 29. Ritacco, V., M. Di Lonardo, A. Reniero, M. Ambroggi, L. Barrera, A. Dam- 8. Duchene, V., S. Ferdinand, I. Filliol, J. F. Guegan, N. Rastogi, and C. Sola. brosi, B. Lopez, N. Isola, and I. N. de Kantor. 1997. Nosocomial spread of
VOL. 48, 2010 GENOMIC SIGNATURES OF M. TUBERCULOSIS HAARLEM STRAINS 3623
human immunode，ciency virus-related multidrug-resistant tuberculosis in between Mycobacterium tuberculosis genotype and the clinical phenotype of Buenos Aires. J. Infect. Dis. 176:637–642. pulmonary and meningeal tuberculosis. J. Clin. Microbiol. 46:1363–1368. 34. Tsolaki, A. G., A. E. Hirsh, K. DeRiemer, J. A. Enciso, M. Z. Wong, M. 30. Sampson, S. L., R. M. Warren, M. Richardson, G. D. van der Spuy, and P. D. Hannan, Y. O. Goguet de la Salmoniere, K. Aman, M. Kato-Maeda, and van Helden. 1999. Disruption of coding regions by IS6110 insertion in P. M. Small. 2004. Functional and evolutionary genomics of Mycobacterium Mycobacterium tuberculosis. Tuber. Lung Dis. 79:349–359. tuberculosis: insights from genomic deletions in 100 strains. Proc. Natl. Acad. 31. Schmidt, C. W. 2008. Linking TB and the environment: an overlooked Sci. U. S. A. 101:4865–4870. mitigation strategy. Environ. Health Perspect. 116:A478–A485. 35. van Embden, J. D., M. D. Cave, J. T. Crawford, J. W. Dale, K. D. Eisenach, 32. Suchindran, S., E. S. Brouwer, and A. Van Rie. 2009. Is HIV infection a risk B. Gicquel, P. Hermans, C. Martin, R. McAdam, T. M. Shinnick, et al. 1993. factor for multi-drug resistant tuberculosis? A systematic review. PLoS One Strain identi，cation of Mycobacterium tuberculosis by DNA ，ngerprinting: 4:e5561. recommendations for a standardized methodology. J. Clin. Microbiol. 31: 33. Thwaites, G., M. Caws, T. T. Chau, A. D’Sa, N. T. Lan, M. N. Huyen, S. 406–409. Gagneux, P. T. Anh, D. Q. Tho, E. Torok, N. T. Nhu, N. T. Duyen, P. M. Duy, 36. WHO. 2008. Global tuberculosis control. Surveillance, planning, ，nancing. J. Richenberg, C. Simmons, T. T. Hien, and J. Farrar. 2008. Relationship WHO report 2008. WHO, Geneva, Switzerland.