Conference Abstract

Using validated algorithms to estimate annual Lyme disease incidence in the United States, 2016-2023

May 29, 2025
Authors:

Kluberg SA, Cocoros NM, Oneill J, Shapiro K, Rosen E, Jin R, Aucott J, Daniels K, Love S, Djibo DA, Selvan M, DeVries A, Ma Q, Stark JH, Moïsi JC, Willis SJ

Poster presented at the 17th International Conference on Lyme Borreliosis and Tick-Borne Diseases

Introduction: Lyme disease (LD) is the most common vector-borne disease in the US. Although LD is a nationally notifiable condition, traditional surveillance underestimates its true burden. We previously validated algorithms to identify LD cases in administrative claims data, calculating positive predictive values (PPVs) from medical record review separately for states with high LD incidence and for states neighboring high-incidence states. The primary algorithm identified individuals with an LD diagnosis code and an indicated antibiotic (PPV 90.7% in high-incidence states and 81.3% in neighboring states). Algorithms to identify disseminated disease required a diagnosis code for an LD-related symptom (but no LD-specific code), an indicated antibiotic, and an LD diagnostic test procedure code (PPVs 1.8% to 12.9%, varying by setting and specific algorithm). The current study estimates the annual burden of LD from 2016 through 2023 in multiple large data sources, using PPV adjustment and standardization to the US census population.
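The two kinds of validated algorithms can be expressed as simple rules over a person's claims. The sketch below is illustrative only: the `Episode` record, the placeholder code sets, the example antibiotic list, and the assumption that all qualifying claims fall within a single episode window are hypothetical, because the abstract does not give the actual code lists or timing requirements.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder code sets; the validated algorithms use specific
# code lists that are not reproduced in the abstract.
LD_DX_CODES = {"LD_DX"}                  # LD-specific diagnosis codes
LD_SYMPTOM_DX_CODES = {"SYMPTOM_DX"}     # LD-related symptom diagnosis codes
INDICATED_ANTIBIOTICS = {"doxycycline", "amoxicillin", "ceftriaxone"}  # illustrative
LD_TEST_PROCEDURE_CODES = {"LD_TEST"}    # LD diagnostic test procedure codes


@dataclass
class Episode:
    """Claims for one person within one (assumed) episode window."""
    dx_codes: set = field(default_factory=set)
    antibiotics: set = field(default_factory=set)
    procedure_codes: set = field(default_factory=set)


def meets_primary_algorithm(ep: Episode) -> bool:
    # An LD diagnosis code AND an indicated antibiotic.
    return bool(ep.dx_codes & LD_DX_CODES) and bool(ep.antibiotics & INDICATED_ANTIBIOTICS)


def meets_disseminated_algorithm(ep: Episode) -> bool:
    # An LD-related symptom code, NO LD-specific code, an indicated antibiotic,
    # and an LD diagnostic test procedure code.
    return (
        bool(ep.dx_codes & LD_SYMPTOM_DX_CODES)
        and not (ep.dx_codes & LD_DX_CODES)
        and bool(ep.antibiotics & INDICATED_ANTIBIOTICS)
        and bool(ep.procedure_codes & LD_TEST_PROCEDURE_CODES)
    )
```

Because the disseminated-disease rule explicitly excludes episodes with an LD-specific code, an episode cannot satisfy both rules at once in this sketch.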

Methods: We identified LD cases and estimated population denominators for January 2016 through December 2023 using data from four large national health plans (Anthem, CVS Health, Humana Inc., and Optum) and US Medicare (fee-for-service and Medicare Advantage). Potential LD cases were those that met a validated algorithm or had evidence of hospitalization for LD; each individual could contribute at most one LD event per calendar year. We classified each potential case as localized or disseminated LD based on the specific diagnosis code used, the duration and route of antibiotic treatment, and hospitalization status.
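The rule that each individual contributes at most one LD event per calendar year can be sketched as follows. This is a hypothetical illustration: the `CandidateEvent` record and its `source` labels are invented for this example, and keeping the earliest event in each year is an assumption, since the abstract does not say which event is retained.

```python
from datetime import date
from typing import List, NamedTuple


class CandidateEvent(NamedTuple):
    person_id: str
    event_date: date
    source: str  # illustrative labels, e.g. "primary", "disseminated", "hospitalization"


def one_event_per_person_year(events: List[CandidateEvent]) -> List[CandidateEvent]:
    """Keep one candidate LD event per person per calendar year.

    Assumption: the earliest event in the year is the one retained.
    """
    kept = {}
    for ev in sorted(events, key=lambda e: e.event_date):
        key = (ev.person_id, ev.event_date.year)
        # setdefault only stores the first (earliest) event seen for this key.
        kept.setdefault(key, ev)
    return list(kept.values())
```

A person with events in May and July 2018 and another in June 2019 would contribute two events: the May 2018 event and the June 2019 event.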

To account for the imperfect specificity of the claims-based algorithms, we adjusted observed case counts by the PPV for each algorithm and region (high-incidence, neighboring, and low-incidence) and calculated incidence rates as adjusted case counts divided by the number of enrolled members in each population stratum (defined by year, state, sex, and 5-year age group). We applied these incidence rates to US census population counts to estimate the standardized LD case count per stratum. From these counts, we calculated annual incidence nationally, by region and age group, and separately for localized and disseminated LD.
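The PPV adjustment and census standardization reduce to simple stratum-level arithmetic: adjusted cases = sum over algorithms of (observed count × PPV); rate = adjusted cases / enrolled members; standardized cases = rate × census population. The sketch below assumes a hypothetical dictionary layout for each stratum and sums all strata into one national rate per 100,000; regional, age-specific, or localized/disseminated estimates would sum over the corresponding subset of strata. It is an illustration of the described steps, not the study's code.

```python
def adjusted_cases(observed_by_algorithm, ppv_by_algorithm):
    """PPV-adjusted case count for one stratum: sum of observed count x PPV."""
    return sum(n * ppv_by_algorithm[alg] for alg, n in observed_by_algorithm.items())


def standardized_incidence(strata, per=100_000):
    """Standardize claims-based rates to census population counts.

    Each stratum (e.g. year x state x sex x 5-year age group) is a dict with
    hypothetical keys:
      'observed' -> {algorithm: observed case count}
      'ppv'      -> {algorithm: PPV for the stratum's region}
      'enrolled' -> enrolled members in the claims data
      'census'   -> US census population for the stratum
    Returns (total standardized cases, incidence per `per` persons).
    """
    total_cases = 0.0
    total_pop = 0
    for s in strata:
        # Stratum-specific incidence among enrollees, after PPV adjustment.
        rate = adjusted_cases(s["observed"], s["ppv"]) / s["enrolled"]
        # Apply the rate to the census population to get standardized cases.
        total_cases += rate * s["census"]
        total_pop += s["census"]
    return total_cases, per * total_cases / total_pop
```

For example, with two made-up strata (50 primary and 20 disseminated cases among 10,000 enrollees with PPVs 0.9 and 0.1, census population 100,000; and 10 primary cases among 20,000 enrollees with PPV 0.8, census population 300,000), the function returns about 590 standardized cases, or about 147.5 per 100,000.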

Results: The data sources in this study cover approximately 72 million individuals per year, representing all ages and all US states. Data are currently being analyzed; results will be available by September 2025.

Conclusion: This study will provide an estimate of LD incidence across the US using several large, nationally representative data sources and multiple validated algorithms, shedding light on the burden of both localized and disseminated LD. No prior study of US LD incidence has used a study population of this size and diversity. Our findings will support public health activities aimed at improving LD prevention strategies.