Most of the epidemiological community agrees that near real-time assessment of transmission dynamics during an outbreak is incredibly important. Among many other uses, it helps us establish a baseline before interventions are put in place, which in turn allows us to evaluate how those interventions perform in the weeks and months ahead. The utility of this kind of work is manifold, especially for an outbreak like Zika, which has grown to affect more than 40 countries: if we can understand which interventions work well in countries affected earlier, we get a better sense of what to try first when new countries start reporting cases. However, we can’t do any of this without data.
Unfortunately, one of the issues we’ve faced during the ongoing Central and South American Zika outbreak has been a dearth of official case count data. In recent weeks, this has been partially mitigated by the interactive case count visualization that PAHO (the Pan American Health Organization) released a little while ago. It’s worth noting, however, that these data aren’t available in a machine-readable format, and scraping data from interactive visualizations is often more challenging than scraping it from PDFs. This adds an unnecessary step for researchers and curious minds alike… but at least the data exist!
During the early days of the outbreak, and even while we were working on the analyses for our JMIR paper, this wasn’t the case. As we describe at length in the paper, we instead used media-reported case counts over time (i.e., digital disease detection [DDD]!) to get a sense of what was happening on the ground in Colombia. The raw data, based on HealthMap media reports, are shown in gray in the figure below. To convert this unrealistic, L-shaped curve into something more representative of how outbreaks tend to grow, we used Google search fractions to smooth the curve (shown in purple). We then used this smoothed curve to estimate a few key transmission parameters associated with the outbreak (e.g., the basic and observed reproductive numbers and the final outbreak size).
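The exact smoothing procedure isn’t spelled out here, so the following is only a minimal sketch of one way search fractions *could* be used to smooth a step-like media curve. It assumes that each media report states a cumulative case count on a known week, and that new cases between two consecutive reports occur in proportion to each week’s search fraction. The function name `smooth_between_reports` and the toy numbers are hypothetical, not taken from the paper or its data.

```python
def smooth_between_reports(report_weeks, report_cumulative, search_fractions):
    """Hypothetical smoother: redistribute case increments between media
    reports across weeks in proportion to weekly search fractions.

    report_weeks      -- strictly increasing week indices of media reports
    report_cumulative -- cumulative case counts stated in those reports
    search_fractions  -- weekly search fraction for every week in the series
    Returns smoothed weekly incident cases (weeks after the last report stay 0).
    """
    weekly = [0.0] * len(search_fractions)
    prev_week, prev_total = -1, 0.0
    for week, total in zip(report_weeks, report_cumulative):
        span = range(prev_week + 1, week + 1)  # weeks since the previous report
        weights = [search_fractions[w] for w in span]
        weight_sum = sum(weights)
        increment = total - prev_total  # new cases implied by this report
        for w, f in zip(span, weights):
            if weight_sum > 0:
                weekly[w] = increment * f / weight_sum
            else:
                # No search signal in this interval: spread the increment evenly
                weekly[w] = increment / len(span)
        prev_week, prev_total = week, total
    return weekly


# Toy example (NOT data from the paper): reports at weeks 2 and 5
weekly = smooth_between_reports([2, 5], [100, 400], [1, 1, 2, 1, 2, 3])
print(weekly)  # [25.0, 25.0, 50.0, 50.0, 100.0, 150.0]
```

Because each interval is scaled to the case increment between two reports, the smoothed curve still passes through every media-reported cumulative count; only the timing of cases within each interval is borrowed from search behavior.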
However, I think one of the strongest components of our paper is that we were able to get our hands on a validation data set from the INS (Instituto Nacional de Salud) in Colombia. These INS data were essentially gold-standard governmental surveillance data (shown in green). The figure below shows that agreement between the official INS data and the smoothed HealthMap data is pretty decent. To cement this further, we used the INS data to model the same transmission parameters we had estimated from the smoothed HealthMap data. Although the estimates for these parameters varied between data sets, we found them similar enough for planning and preparedness purposes.
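As a rough illustration of how agreement between two surveillance streams might be checked, the sketch below computes the Pearson correlation between two weekly series and compares their exponential growth rates from a log-linear least-squares fit. These are generic consistency checks chosen for illustration, not the transmission model used in the paper, and the input series are made-up toy numbers.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def growth_rate(weekly):
    """Exponential growth rate per week: least-squares slope of log(cases)
    against week index, using only weeks with positive counts."""
    pts = [(t, math.log(c)) for t, c in enumerate(weekly) if c > 0]
    mt = sum(t for t, _ in pts) / len(pts)
    ml = sum(v for _, v in pts) / len(pts)
    num = sum((t - mt) * (v - ml) for t, v in pts)
    den = sum((t - mt) ** 2 for t, _ in pts)
    return num / den


# Made-up weekly counts for illustration only (not HealthMap or INS data)
smoothed_media = [25, 25, 50, 50, 100, 150]
official = [20, 30, 45, 60, 95, 160]
print(round(pearson(smoothed_media, official), 3))
print(round(growth_rate(smoothed_media), 3), round(growth_rate(official), 3))
```

A high correlation only says that the two curves rise and fall together; comparing growth rates (or, in a fuller analysis, the fitted transmission parameters themselves) is closer to asking whether the two data sources lead to similar planning conclusions.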
So, what does this all mean? In short, we hope our work will lend more credibility to the use of digital disease surveillance data (e.g., HealthMap media reports and Google search fractions) for near real-time estimation of outbreak transmission dynamics.
There is little doubt that outbreaks of infectious diseases, Zika and otherwise, are on the rise. A variety of enabling conditions are to blame, including globalization and ease of travel, as well as the displacement of both human and animal populations due to climate change and conflict. For (re-)emerging pathogens like Zika virus, which have been either long neglected or never truly known, traditional healthcare-based surveillance is particularly weak. Thankfully, it’s precisely under these circumstances that digital disease surveillance can provide a truly valuable service by shedding light on critical information gaps.