TY - JOUR
T1 - COMET
T2 - Adaptive context-based modeling for ultrafast HIV-1 subtype identification
AU - Struck, Daniel
AU - Lawyer, Glenn
AU - Ternes, Anne Marie
AU - Schmit, Jean Claude
AU - Bercoff, Danielle Perez
N1 - Publisher Copyright: © 2014 The Author(s).
PY - 2014/10/13
Y1 - 2014/10/13
N2 - Viral sequence classification has wide applications in clinical, epidemiological, structural and functional categorization studies. Most existing approaches rely on an initial alignment step followed by classification based on phylogenetic or statistical algorithms. Here we present an ultrafast alignment-free subtyping tool for human immunodeficiency virus type one (HIV-1) adapted from Prediction by Partial Matching compression. This tool, named COMET, was compared to the widely used phylogeny-based REGA and SCUEAL tools using synthetic and clinical HIV data sets (1 090 698 and 10 625 sequences, respectively). COMET's sensitivity and specificity were comparable to or higher than the two other subtyping tools on both data sets for known subtypes. COMET also excelled in detecting and identifying new recombinant forms, a frequent feature of the HIV epidemic. Runtime comparisons showed that COMET was almost as fast as USEARCH. This study demonstrates the advantages of alignment-free classification of viral sequences, which feature high rates of variation, recombination and insertions/deletions. COMET is free to use via an online interface at https://comet.lih.lu/
AB - Viral sequence classification has wide applications in clinical, epidemiological, structural and functional categorization studies. Most existing approaches rely on an initial alignment step followed by classification based on phylogenetic or statistical algorithms. Here we present an ultrafast alignment-free subtyping tool for human immunodeficiency virus type one (HIV-1) adapted from Prediction by Partial Matching compression. This tool, named COMET, was compared to the widely used phylogeny-based REGA and SCUEAL tools using synthetic and clinical HIV data sets (1 090 698 and 10 625 sequences, respectively). COMET's sensitivity and specificity were comparable to or higher than the two other subtyping tools on both data sets for known subtypes. COMET also excelled in detecting and identifying new recombinant forms, a frequent feature of the HIV epidemic. Runtime comparisons showed that COMET was almost as fast as USEARCH. This study demonstrates the advantages of alignment-free classification of viral sequences, which feature high rates of variation, recombination and insertions/deletions. COMET is free to use via an online interface at https://comet.lih.lu/
UR - http://www.scopus.com/inward/record.url?scp=84922394990&partnerID=8YFLogxK
UR - https://pubmed.ncbi.nlm.nih.gov/25120265
U2 - 10.1093/nar/gku739
DO - 10.1093/nar/gku739
M3 - Article
C2 - 25120265
AN - SCOPUS:84922394990
SN - 0305-1048
VL - 42
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 18
ER -