Large-scale Epidemiological modeling: Scanning for Mosquito-Borne Diseases Spatio-temporal Patterns in Brazil

Eduardo C. Araujo1, Claudia T. Codeço2, Sandro Loch1,
Luã B. Vacaro1, Laís P. Freitas3,4, Raquel M. Lana5,
Leonardo S. Bastos2, Iasmim F. de Almeida6, Fernanda Valente7,
Luiz M. Carvalho1 and Flávio C. Coelho1
1EMAp - FGV, Brazil, 2PROCC-FIOCRUZ, Brazil, 3ESPUM, Canada,
4CReSP, Canada, 5BSC, Spain, 6ENSP-FIOCRUZ, Brazil, 7Observatório de Bioeconomia, FGV, Brazil
Abstract

The influence of climate on mosquito-borne diseases like dengue and chikungunya is well-established, but comprehensively tracking long-term spatial and temporal trends across large areas has been hindered by fragmented data and limited analysis tools. This study presents an unprecedented analysis, in terms of breadth, estimating the SIR transmission parameters from incidence data in all 5,570 municipalities in Brazil over 14 years (2010-2023) for both dengue and chikungunya. We describe the Episcanner computational pipeline, developed to estimate these parameters, producing a reusable dataset describing all dengue and chikungunya epidemics that have taken place in this period, in Brazil. The analysis reveals new insights into the climate-epidemic nexus: We identify distinct geographical and temporal patterns of arbovirus disease incidence across Brazil, highlighting how climatic factors like temperature and precipitation influence the timing and intensity of dengue and chikungunya epidemics. The innovative Episcanner tool empowers researchers and public health officials to explore these patterns in detail, facilitating targeted interventions and risk assessments. This research offers a new perspective on the long-term dynamics of climate-driven mosquito-borne diseases and their geographical specificities linked to the effects of global temperature fluctuations such as those captured by the ENSO index.

Keywords: episcanner, mosquito-borne diseases, climate-epidemic nexus, geographical and temporal patterns

1 Introduction

Dengue and chikungunya are among the most significant mosquito-borne diseases in terms of disease burden and potential for global expansion. Worldwide, the estimated number of dengue cases has increased from 30 million, in 1990, to 56 million in 2019 (Yang et al.,, 2021), reaching more than 100 countries. Chikungunya, primarily transmitted by the same mosquitoes Aedes aegypti and Aedes albopictus, has been spreading worldwide since the early 2000s and is now present in all continents with tropical zones (Manzoor et al.,, 2022).

The temporal dynamics of these vector-borne diseases is characterized by seasonal and multi-year cycles that can vary from place to place. Within seasons, they can further differ in their reproductive number, peak size, timing and total attack rate (de Almeida et al.,, 2022). This spatial-temporal heterogeneity is strongly associated with the climate and meteorological conditions that affect the mosquito vectorial capacity and viral transmission, in combination with the history of previous population exposure that affects the level of collective immunity. The urban environment, characterized by crowding, inadequate garbage and water services, can further increase the population vulnerability and exposure, while effective vector control measures can potentially flatten the curve. As a result, all of these local factors contribute to generate very diverse spatially-temporal epidemic dynamics locally.

Dengue has maintained a continuous presence in the tropical regions of Brazil since 1986 (Rodriguez-Barraquer et al.,, 2011), but with patterns varying from transient transmission to epidemic and endemic persistence (de Almeida et al.,, 2022). Chikungunya fever was introduced in Brazil in 2014 and persisted, with outbreaks increasing in frequency (de Souza et al.,, 2023). In the last years, probably due to the gradual warming of the southern states, outbreaks of both diseases have begun to occur in areas with subtropical and temperate climates (Codeco et al.,, 2022; de Almeida et al.,, 2023). Understanding how climate can affect the reproductive number and timing of dengue and chikungunya epidemics locally is useful for guiding decisions at the national level.

This study is centred on estimating and examining the characteristics of all the dengue and chikungunya epidemics that occurred in Brazilian municipalities from 2010 to 2023. The key descriptors are the reproductive number, size and timing, estimated by the Episcanner pipeline, a computational pipeline developed for this end.

2 Methods

2.1 Data Sources

Brazil comprises 5570 municipalities, divided into 26 states, plus the Federal District, the country’s capital. The disease data for all the municipalities was obtained from the Infodengue project (Codeco et al.,, 2018), available at info.dengue.mat.br, for the period between 2010 and 2023. The Infodengue project gathers notification data for dengue, chikungunya, and Zika from the Brazilian Ministry of Health, sanitizes and re-publishes it, along with epidemiological analyses, such as estimates for the effective reproduction number, tsubscript𝑡{\cal R}_{t}caligraphic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, by week, for all municipalities. The code for obtaining the Infodengue data is available on the GitHub repository associated with this paper. The climate data were obtained from the Copernicus ERA5 reanalysis dataset (Muñoz Sabater and Service,, 2019) and daily averaged by the municipality. The climate variables included land-surface temperature, relative humidity, precipitation and atmospheric pressure.

2.2 The Episcanner Pipeline

If all municipalities experienced a dengue epidemic per year, in 10 years, there would be 55,700 epidemics, varying in velocity, timing, and magnitude. This is a large number of models to be fitted individually. Episcanner is a computational pipeline that efficiently scans all these potential epidemics and estimates the epidemiological descriptors whenever an epidemic is detected. The epidemic descriptors are the basic reproduction number (0subscript0{\cal R}_{0}caligraphic_R start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT), peak week, the total outbreak size, as well as timing and duration of the epidemic.

The core of the Episcanner pipeline is the Richards logistic growth function, a sigmoid curve that has a one-to-one relationship with the SIR epidemiological model (Wang et al.,, 2012). Based on this relationship, the epidemiologically meaningful parameters of the SIR (Susceptible-Infectious-Recovered) transmission model can be efficiently derived from the fitted Richard model, which serves as a curve template for identifying an epidemic. This simplification is pivotal for curtailing the computational cost associated with estimating the epidemic descriptors of interest for all municipalities and epidemic years considered.

The Episcanner pipeline is depicted in Figure 1, and can be summarized in the following steps:

  1. 1.

    Infodengue’s incidence time series are obtained from their API.

  2. 2.

    Filter years with a potential epidemic, defined as an annual time series having a minimum of 3 weeks with at least 0,9 probability of t>1subscript𝑡1{\cal R}_{t}>1caligraphic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT > 1 and more than fifty cumulative cases. Any year with less than that is discarded as a short outbreak. This is done for all municipalities.

  3. 3.

    Split all-time series at week 45 to obtain a collection of 52 week-long time series, one for each year and each municipality. The cut point defines the typical beginning of the dengue season in Brazil.

  4. 4.

    Richards model is fitted to the observed curves, using an optimization algorithm, generating the SIR parameter estimates (see below for details).

  5. 5.

    Epidemiological parameters dataset is created and made available through the Mosqlimate API.

Refer to caption
Figure 1: The computational pipeline of Episcanner inferential engine is the dark blue block. Data from Infodengue, after being filtered to identify candidates for epidemic years, is then fitted to the Richards model. The resulting estimated parameter dataset is made available through an API as a JSON object (https://api.mosqlimate.org/datastore/). This data can be visualized in the Episcanner dashboard, which is part of the Infodengue website (https://info.dengue.mat.br/epi-scanner).

The complete set of parameters produced from all Brazilian municipalities can be browsed through the online dashboard (Coelho,, 2024) called Episcanner, within the Infodengue website. This dashboard is updated weekly and allows anyone to browse the estimated epidemic characteristics of dengue and chikungunya, as described in this paper.

2.3 Estimation of the Epidemiological Parameters

The Richards model is defined by the following equation:

J(t)=LL[1+αeb(ttj)]1/α,𝐽𝑡𝐿𝐿superscriptdelimited-[]1𝛼superscript𝑒𝑏𝑡subscript𝑡𝑗1𝛼\displaystyle\begin{split}J(t)=L-L[1+\alpha e^{b(t-t_{j})}]^{-1/\alpha},\end{split}start_ROW start_CELL italic_J ( italic_t ) = italic_L - italic_L [ 1 + italic_α italic_e start_POSTSUPERSCRIPT italic_b ( italic_t - italic_t start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT ] start_POSTSUPERSCRIPT - 1 / italic_α end_POSTSUPERSCRIPT , end_CELL end_ROW (1)

where J(t)𝐽𝑡J(t)italic_J ( italic_t ) is interpreted as the accumulated number of cases at week t𝑡titalic_t, L>0𝐿0L>0italic_L > 0 is the estimated total number of cases at the end of the epidemic, tj[1,52]subscript𝑡𝑗152t_{j}\in[1,52]italic_t start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ [ 1 , 52 ] is the week of the inflexion point of the sigmoid curve, which corresponds to the peak of the epidemic. The remaining parameters α𝛼\alphaitalic_α and b𝑏bitalic_b are additional parameters to be estimated.

When mapping to the SIR model, the function J(t)𝐽𝑡J(t)italic_J ( italic_t ) corresponds to the number of recovered individuals at time t𝑡titalic_t, which in the SIR model is denoted R(t)𝑅𝑡R(t)italic_R ( italic_t ). The equivalent SIR model’s parameters can be derived from the Richards equation’s parameters (1) through the equations given below (refer to Wang et al., (2012) for derivation):

β=bα,𝛽continued-fraction𝑏𝛼\displaystyle\begin{split}\beta=\cfrac{b}{\alpha},\end{split}start_ROW start_CELL italic_β = continued-fraction start_ARG italic_b end_ARG start_ARG italic_α end_ARG , end_CELL end_ROW (2)
γ=b(1α1),𝛾𝑏continued-fraction1𝛼1\displaystyle\begin{split}\gamma=b\left(\cfrac{1}{\alpha}-1\right),\end{split}start_ROW start_CELL italic_γ = italic_b ( continued-fraction start_ARG 1 end_ARG start_ARG italic_α end_ARG - 1 ) , end_CELL end_ROW (3)
R0=βγ=11α.subscript𝑅0continued-fraction𝛽𝛾continued-fraction11𝛼\displaystyle\begin{split}R_{0}=\cfrac{\beta}{\gamma}=\cfrac{1}{1-\alpha}.\end% {split}start_ROW start_CELL italic_R start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = continued-fraction start_ARG italic_β end_ARG start_ARG italic_γ end_ARG = continued-fraction start_ARG 1 end_ARG start_ARG 1 - italic_α end_ARG . end_CELL end_ROW (4)

Besides the reproductive number (R0subscript𝑅0R_{0}italic_R start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT), epidemic size (L𝐿Litalic_L) and peak week (tjsubscript𝑡𝑗t_{j}italic_t start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT), the Episcanner dataset also includes the onset of the epidemic (wssubscript𝑤𝑠w_{s}italic_w start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT) that is defined as the week in which the new cases crossed the 5% percentile, and the final week of the epidemic (wesubscript𝑤𝑒w_{e}italic_w start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT) defined as the week when the cases dropped below this threshold (See Figure 2). From these points, the epidemic duration (wewssubscript𝑤𝑒subscript𝑤𝑠w_{e}-w_{s}italic_w start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT - italic_w start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT) in weeks is computed and also added to the dataset.

From now on, we will refer to these derived parameters, collectively, as epidemic parameters or epidemic descriptors.

Refer to caption
Figure 2: Example of the outcome of estimating the start and end weeks of an epidemic for Rio de Janeiro municipality in all epidemic years between 2010 and 2023. The red area represents the cumulative number of cases. Vertical dashed lines indicate the start and end of the epidemic. The blue curve is the fitted Richards curve, and the orange one at the bottom is the absolute error of the fit every week.

2.4 Fitting the model to data

For every selected city and epidemic year, as defined in section 2.1, we fitted the Richards model (eq. 1) to the data by solving the following optimization problem: determining the values of the parameters set ξ:={L,α,b,tjL,α,b,tj}+assign𝜉conditional-set𝐿𝛼𝑏subscript𝑡𝑗𝐿𝛼𝑏subscript𝑡𝑗superscript\xi:=\{L,\alpha,b,t_{j}\mid L,\alpha,b,t_{j}\}\in\mathbb{R}^{+}italic_ξ := { italic_L , italic_α , italic_b , italic_t start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∣ italic_L , italic_α , italic_b , italic_t start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT } ∈ blackboard_R start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT that minimizes the error function below:

argminξt=152(C(t)J(t,ξ))252,subscriptargmin𝜉superscriptsubscript𝑡152superscript𝐶𝑡𝐽𝑡𝜉252\displaystyle\begin{split}\operatorname{argmin}_{\xi}\sum_{t=1}^{52}\frac{(C(t% )-J(t,\xi))^{2}}{52},\end{split}start_ROW start_CELL roman_argmin start_POSTSUBSCRIPT italic_ξ end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 52 end_POSTSUPERSCRIPT divide start_ARG ( italic_C ( italic_t ) - italic_J ( italic_t , italic_ξ ) ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG 52 end_ARG , end_CELL end_ROW (5)

where C(t)𝐶𝑡C(t)italic_C ( italic_t ) is the observed cumulative number of cases at week t𝑡titalic_t. The following ranges restricted the possible values of the parameters of ξ𝜉\xiitalic_ξ: γ[0.95,1.05]𝛾0.951.05\gamma\in[0.95,1.05]italic_γ ∈ [ 0.95 , 1.05 ] (1-week infectious period), α[0.001,1]𝛼0.0011\alpha\in[0.001,1]italic_α ∈ [ 0.001 , 1 ], b[106,1]𝑏superscript1061b\in[10^{-6},1]italic_b ∈ [ 10 start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT , 1 ], tj[5,35]subscript𝑡𝑗535t_{j}\in[5,35]italic_t start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ [ 5 , 35 ] (based on the overall distribution of peak weeks). Additionally, we restricted the possible values of α𝛼\alphaitalic_α to always satisfy α=bγ+b𝛼𝑏𝛾𝑏\alpha=\frac{b}{\gamma+b}italic_α = divide start_ARG italic_b end_ARG start_ARG italic_γ + italic_b end_ARG (Wang et al.,, 2012).

This fitting process is computationally more efficient than to fit the SIR ordinary differential equations, since it avoids numerical integration steps. The optimization was implemented with the lmfit Python package111https://lmfit.github.io/lmfit-py/intro.html, using a global optimization approach based on the differential evolution algorithm (Storn and Price,, 1997). Using the estimated epidemiological parameters, we explored their association with climate variables across time and space.

2.5 Predicting the week of epidemic peak

One question of interest is how early the epidemic peak will occur in a given year, in a given place. To test if the dengue epidemic descriptors, estimated by Episcanner, show a correlation to epidemiological, demographic and climate variables, we built a Histogram Gradient Boosting Regression (HGBR) model. The outcome of the model is the peak week of epidemics, Wi,ypeaksubscriptsuperscript𝑊𝑝𝑒𝑎𝑘𝑖𝑦W^{peak}_{i,y}italic_W start_POSTSUPERSCRIPT italic_p italic_e italic_a italic_k end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i , italic_y end_POSTSUBSCRIPTfor every epidemic year in every location. Predictors included lagged climate variables, case counts and population sizes, as well as lagged epidemic descriptors. The regression model proposed can be written as below

Wi,ypeakCi,y1+Di,y1+Ei,y1+ε,similar-tosubscriptsuperscript𝑊𝑝𝑒𝑎𝑘𝑖𝑦subscript𝐶𝑖𝑦1subscript𝐷𝑖𝑦1subscript𝐸𝑖𝑦1𝜀\displaystyle\begin{split}W^{peak}_{i,y}\sim C_{i,y-1}+D_{i,y-1}+E_{i,y-1}+% \varepsilon,\end{split}start_ROW start_CELL italic_W start_POSTSUPERSCRIPT italic_p italic_e italic_a italic_k end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i , italic_y end_POSTSUBSCRIPT ∼ italic_C start_POSTSUBSCRIPT italic_i , italic_y - 1 end_POSTSUBSCRIPT + italic_D start_POSTSUBSCRIPT italic_i , italic_y - 1 end_POSTSUBSCRIPT + italic_E start_POSTSUBSCRIPT italic_i , italic_y - 1 end_POSTSUBSCRIPT + italic_ε , end_CELL end_ROW (6)

where Wi,ypeaksubscriptsuperscript𝑊𝑝𝑒𝑎𝑘𝑖𝑦W^{peak}_{i,y}italic_W start_POSTSUPERSCRIPT italic_p italic_e italic_a italic_k end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i , italic_y end_POSTSUBSCRIPT is the peak week of the epidemic at municipality i𝑖iitalic_i on year y𝑦yitalic_y. Ci,y1subscript𝐶𝑖𝑦1C_{i,y-1}italic_C start_POSTSUBSCRIPT italic_i , italic_y - 1 end_POSTSUBSCRIPT, Di,y1subscript𝐷𝑖𝑦1D_{i,y-1}italic_D start_POSTSUBSCRIPT italic_i , italic_y - 1 end_POSTSUBSCRIPT and Ei,y1subscript𝐸𝑖𝑦1E_{i,y-1}italic_E start_POSTSUBSCRIPT italic_i , italic_y - 1 end_POSTSUBSCRIPT are respective sets of climate, demographic, and epidemiological predictors. A comprehensive description of all model features is provided in Table S1 of the supplementary material. We used the Scikit-learn Python package to fit the model (https://scikit-learn.org).

The model was fitted using data encompassing all Brazilian cities for the entire period, filtered as described above for epidemic parameter estimation. A single model was fitted to each geographical region of Brazil. Brazil has 5 geographical regions, namely: north (N), northeast (NE), midwest (MW), southeast (SE) and south (S). This model was fitted only for dengue due to its broader geographic coverage. We used Shapley Additive Explanations (SHAP values) framework (Lundberg and Lee,, 2017) to analyze the importance of each feature (predictor) of the model.

3 Results

3.1 Spatio-temporal patterns of dengue and chikungunya epidemics

Out of the 5570 Brazilian cities, 4096 (73%) had at least one epidemic year for dengue and 1263 cities (22%) for chikungunya between 2010 and 2023. In total, there were 18567 dengue epidemic events and 2199 chikungunya epidemic events. Table 1 contains summary statistics for some of the parameters estimated. The full set of parameters also includes the SIR model’s β𝛽\betaitalic_β and γ𝛾\gammaitalic_γ parameters, the α𝛼\alphaitalic_α parameter from the Richards equation, and the starting and ending weeks of the epidemic.

Table 1: Summary statistics for selected parameters estimated for dengue and chikungunya across all epidemics.
dengue 0subscript0{\cal R}_{0}caligraphic_R start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT duration peak week total cases incidence (per 100k)
mean 1.97 26.62 22.04 992.09 2146.22
std 0.52 10.38 5.80 4262.09 2841.39
25% 1.60 19.00 18.40 113.75 523.45
50% 1.86 25.00 22.11 240.40 1203.41
75% 2.20 33.00 25.24 624.70 2626.13
chikungunya
mean 2.12 25.47 24.69 624.43 1156.76
std 0.70 12.17 7.29 2341.62 1620.65
25% 1.60 16.0 19.78 98.65 213.80
50% 1.94 24.0 24.68 195.33 594.58
75% 2.45 34.0 29.51 461.99 1436.29

The spatial distribution of epidemic parameters (0subscript0{\cal R}_{0}caligraphic_R start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, peak week, and epidemic duration) exhibit discernible patterns across different regions of Brazil. Notably, the average duration of dengue epidemics exhibits an ascending gradient from the southeast (SE) to the northwest (NW) regions, see Figure 3. The reproductive numbers of both dengue and chikungunya exhibited spatial and temporal variability across the country (see Figure S1 of the supplementary material). This variability is evident not only between different geographic regions but also over time.

Refer to caption
Figure 3: Median Epidemic duration (in weeks) estimated for Dengue and Chikungunya cases. Note the increase in the duration of dengue epidemics as one moves from the southern to the northern states. This trend is less marked but also visible for chikungunya epidemics. In the smaller map in the middle, we have the geographic regions of Brazil: North(N), Northeast (NE), Midwest (MW), Southeast (SE), and South (S).

The estimated duration of dengue epidemics varied from short (<10absent10<10< 10 weeks) to year-long. Longer epidemics were seen mostly in the North region (Figure 3). The peak week tended to occur earlier for dengue than chikungunya and in the west earlier than the east side of the country (see Figure 4). Looking at the country-wide time series for dengue, we can see that higher incidence peaks are associated with shorter epidemics with higher 0subscript0{\cal R}_{0}caligraphic_R start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT (Figure 5).

Refer to caption
Figure 4: Median peak weeks estimated for Dengue and Chikungunya epidemics between 2010 and 2023 by municipality. The inset map shows Brazil’s geographic regions: North(N), Northeast (NE), Midwest (MW), Southeast (SE), and South (S).
Refer to caption
Figure 5: The boxplots, from top to bottom, show the estimated peak week, the estimated basic reproduction number, and the epidemic duration in weeks, for dengue epidemics by year. Above, we find the multivariate ENSO index v2 series, and the last panel is the time series of dengue cases in Brazil between November 2010 and November 2023.

A nonlinear relationship was seen between epidemics duration and R0subscript𝑅0R_{0}italic_R start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT (see Figure S2). Epidemics with high R0subscript𝑅0R_{0}italic_R start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT tended to be shorter, while outbreaks with low R0subscript𝑅0R_{0}italic_R start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT tended to last longer. This trend was also discernible, albeit less prominently, in the context of chikungunya epidemics, see Figure S2 of the supplementary material.

Refer to caption
Figure 6: Estimated duration of the dengue epidemic as a function of estimated 0subscript0{\cal R}_{0}caligraphic_R start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT.Longer epidemics are associated with lower estimates for R0. Dots are individual cities.

3.2 Earlier epidemics and climate

The Figure 7 depicts the HGBR model’s performance in predicting the dengue epidemic peak week as a function of climate and epidemiological covariates measured one year before. The model provided more accurate predictions in the South, Southeast and Midwest, where seasonality is stronger.

Refer to caption
Figure 7: Performance of the Histogram Gradient Boosting regression model trained with the entire dataset. Each point represents the predicted vs. observed weeks for a city. The dashed black line represents the identity.

When we look at the importance of the model’s features (Figure 8), in all regions, the features related to the peak week, population, and cases in the previous year were always among the top ten most important features. Also, all regions had at least two climate predictors in the top ten, with the average temperature in January of the current year being the most common among them.

Refer to caption
Figure 8: The top 10 most important predictors in each region, encoded as SHAP values. The SHAP scale represents how much each feature affects the model’s output (peak week), with positive values associated with an increase in peak week and negative with a decrease. Each dot represents one epidemic. Their colour is mapped to the value of each feature, with blue standing for lower values and pink for higher. The 0 in the SHAP scale is aligned with the expected value of each feature above.

Ocean surface temperature oscillations, as measured by the El niño/La niña index (ENSO), also stood out as an important feature to explain variations in the timing of epidemics. We looked specifically at the value of the ENSO index in the last quarter of the previous year. Positive average ENSO values (El Niño) in that quarter were associated with earlier epidemics in the following year (Figure 9), while negative 4th quarter ENSO index led to later epidemics.

Refer to caption
Figure 9: Estimated median peak week of dengue epidemics by region for El Niño and La Niña years. La Niña years, in this analysis, are years when the mean ENSO index in the last quarter of the previous year was below 0, and El niño years are the ones where the mean ENSO index in the last quarter of the year was above 0.

4 Discussion

This study introduces a computational pipeline (Figure 1) capable of processing incidence data from all municipalities and estimating key epidemic statistics in a standardized format. To achieve the required computational scalability, we employed optimization techniques to fit the Richards model, which has a one-to-one relationship with the archetypical SIR mathematical model. Even though the SIR model is an approximation to the transmission mechanisms taking place in the field, we believe that this simplified description of the epidemics is a small price to pay in exchange for developing a consistent set of epidemic parameters over a large spatio-temporal range. Using more detailed transmission models would be undesirable, not only because of the added computational cost to fit but also for the added uncertainty associated with the estimation of a larger number of parameters subject to identifiability issues (Gallo et al.,, 2022; Kao and Eisenberg,, 2018).

Following parameter estimation, a regression analysis was conducted to elucidate the influence of local and global climate patterns on epidemic parameters. Although not exhaustive, these exploratory analyses revealed a discernible association between the El Niño Southern Oscillation (ENSO) and epidemic timing (Figure 9), represented by the peak week of the epidemic. The regression analysis results show that various epidemiological, demographic, and climatic factors determine the shape of future epidemics. This association has been reported before for Singapore (Earnest et al.,, 2012), Thailand (Tipayamongkholgul et al.,, 2009), and other regions of the world (Johansson et al.,, 2009; Moraes et al.,, 2019; Banu et al.,, 2015; Do et al.,, 2014). However, our study differentiates itself by the larger spatio-temporal scale of the analysis and for generating an open dataset that enables further explorations and will be continuously updated.

The observed association between 0subscript0{\cal R}_{0}caligraphic_R start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and epidemic duration, shown in figure S2, exhibits remarkable consistency, partly reflecting the interdependence between these parameters within the transmission model. Given the environmental influences driving epidemics, for example, the well-described correlation between transmission and temperature, it is noteworthy that these two parameters remain highly correlated, as per theoretical expectations, that assume constant transmission rates and endogenous determination of duration (by depletion of susceptibles). The small size of the epidemics (relative to population size) suggests that such depletion is not to be expected, except in the hypothesis of a massive underreporting of cases in most cities, which could mask larger attack ratios.

The extensive dataset of epidemiological parameters generated by Episcanner has demonstrated utility in developing predictive models, particularly when integrated with complementary datasets such as climate and demographic series. Although predictive modelling was only superficially explored in this study, its potential for future research is considerable. To our knowledge, this dataset represents a unique large-scale compilation of epidemic parameters estimated using a consistent methodology, enabling robust comparisons. We anticipate that statistical modellers will leverage this open dataset for future investigations.

5 Conclusion

The methodological framework introduced in this paper facilitates the comprehensive characterization of epidemic dynamics across both temporal and spatial dimensions. This approach offers invaluable insights into the transmission patterns of arbovirus diseases, enabling public health authorities to devise timely interventions to mitigate disease spread effectively. Furthermore, the identification of spatial variations in epidemic parameters underscores the imperative for region-specific strategies in disease surveillance and control initiatives.

The predictive potential for this parameter set has been demonstrated and can be further elaborated to reveal more specific associations at smaller spatial scales. One limitation of such models is the lack of data on the immunological status of the population, i.e., the proportion of the population previously exposed to each dengue serotype. Data about the proportion of cases infected with each DENV type per week or at least per year would contribute significantly to understanding these epidemics’ long-term dynamics. We believe that making such data available would be the most important governmental investment in controlling arbovirus diseases.

The epidemiological parameters used in this paper are updated weekly and are made available through Episcanner, a publicly available web dashboard (Coelho,, 2024).

Continuous monitoring of the epidemiological characteristics (described in section 2.3) of dengue and chikungunya can be a key tool for efficient control of their incidence and morbidity. When it comes to accurate epidemiological assessments, methodological consistency over time and geographical space is indispensable. In particular, for robust comparisons across time and space, employing the same analytical methods is crucial to ensure that methodological differences are not confounded with data variations.

This study elucidates longer-term association patterns between dengue epidemics’ intensity and global climate variations, as exemplified by the multivariate ENSO index. These findings reinforce the longstanding hypothesis (Reiter,, 2001) that the ongoing escalation in global average temperatures will inevitably exacerbate the burden of mosquito-borne diseases.

Acknowledgment

This work was funded by a grant from the Wellcome Trust (226088/Z/22/Z).

References

  • Banu et al., (2015) Banu, S., Guo, Y., Hu, W., Dale, P., Mackenzie, J. S., Mengersen, K., and Tong, S. (2015). Impacts of el niño southern oscillation and indian ocean dipole on dengue incidence in bangladesh. Scientific reports, 5(1):16105.
  • Codeco et al., (2018) Codeco, C., Coelho, F., Cruz, O., Oliveira, S., Castro, T., and Bastos, L. (2018). Infodengue: A nowcasting system for the surveillance of arboviruses in Brazil. Revue d’Épidémiologie et de Santé Publique, 66:S386.
  • Codeco et al., (2022) Codeco, C. T., Oliveira, S. S., Ferreira, D. A., Riback, T. I., Bastos, L. S., Lana, R. M., Almeida, I., Godinho, V. B., Cruz, O. G., and Coelho, F. C. (2022). Fast expansion of dengue in Brazil. The Lancet Regional Health–Americas, 12.
  • Coelho, (2024) Coelho, F. C. (2024).
  • de Almeida et al., (2023) de Almeida, I. F., Codeço, C. T., Lana, R. M., Bastos, L. S., de Souza Oliveira, S., da Cruz Ferreira, D. A., Godinho, V. B., Riback, T. I. S., Cruz, O. G., and Coelho, F. C. (2023). The expansion of chikungunya in brazil. The Lancet Regional Health–Americas, 25.
  • de Almeida et al., (2022) de Almeida, I. F., Lana, R. M., and Codeço, C. T. (2022). How heterogeneous is the dengue transmission profile in brazil? a study in six brazilian states. PLOS Neglected Tropical Diseases, 16(9):1–20.
  • de Souza et al., (2023) de Souza, W. M., de Lima, S. T., Mello, L. M. S., Candido, D. S., Buss, L., Whittaker, C., Claro, I. M., Chandradeva, N., Granja, F., de Jesus, R., et al. (2023). Spatiotemporal dynamics and recurrence of chikungunya virus in brazil: an epidemiological study. The Lancet Microbe, 4(5):e319–e329.
  • Do et al., (2014) Do, T. T. T., Martens, P., Luu, N. H., Wright, P., and Choisy, M. (2014). Climatic-driven seasonality of emerging dengue fever in Hanoi, vietnam. BMC public health, 14:1–10.
  • Earnest et al., (2012) Earnest, A., Tan, S., and Wilder-Smith, A. (2012). Meteorological factors and el nino southern oscillation are independently associated with dengue infections. Epidemiology & Infection, 140(7):1244–1251.
  • Gallo et al., (2022) Gallo, L., Frasca, M., Latora, V., and Russo, G. (2022). Lack of practical identifiability may hamper reliable predictions in covid-19 epidemic models. Science advances, 8(3):eabg5234.
  • Johansson et al., (2009) Johansson, M. A., Cummings, D. A., and Glass, G. E. (2009). Multiyear climate variability and dengue—el nino southern oscillation, weather, and dengue incidence in puerto rico, mexico, and thailand: a longitudinal data analysis. PLoS medicine, 6(11):e1000168.
  • Kao and Eisenberg, (2018) Kao, Y.-H. and Eisenberg, M. C. (2018). Practical unidentifiability of a simple vector-borne disease model: Implications for parameter estimation and intervention assessment. Epidemics, 25:89–100.
  • Lundberg and Lee, (2017) Lundberg, S. M. and Lee, S.-I. (2017). A unified approach to interpreting model predictions. Advances in neural information processing systems, 30.
  • Manzoor et al., (2022) Manzoor, K. N., Javed, F., Ejaz, M., Ali, M., Mujaddadi, N., Khan, A. A., Khattak, A. A., Zaib, A., Ahmad, I., Saeed, W. K., and Manzoor, S. (2022). The global emergence of chikungunya infection: An integrated view. Reviews in Medical Virology, 32(3):e2287.
  • Moraes et al., (2019) Moraes, B. C. d., Souza, E. B. d., Sodré, G. R. C., Ferreira, D. B. d. S., and Ribeiro, J. B. M. (2019). Sazonalidade nas notificações de dengue das capitais da amazônia e os impactos do el niño/la niña. Cadernos de Saúde Pública, 35:e00123417.
  • Muñoz Sabater and Service, (2019) Muñoz Sabater, J. and Service, C. C. C. (2019). Era5-land hourly data from 1950 to present.
  • Reiter, (2001) Reiter, P. (2001). Climate change and mosquito-borne disease. Environmental health perspectives, 109(suppl 1):141–161.
  • Rodriguez-Barraquer et al., (2011) Rodriguez-Barraquer, I., Cordeiro, M. T., Braga, C., de Souza, W. V., Marques, E. T., and Cummings, D. A. (2011). From re-emergence to hyperendemicity: the natural history of the dengue epidemic in brazil. PLoS neglected tropical diseases, 5(1):e935.
  • Storn and Price, (1997) Storn, R. and Price, K. (1997). Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. Journal of global optimization, 11(4):341–359.
  • Tipayamongkholgul et al., (2009) Tipayamongkholgul, M., Fang, C.-T., Klinchan, S., Liu, C.-M., and King, C.-C. (2009). Effects of the el niño-southern oscillation on dengue epidemics in thailand, 1996-2005. BMC public health, 9:1–15.
  • Wang et al., (2012) Wang, X.-S., Wu, J., and Yang, Y. (2012). Richards model revisited: Validation by and application to infection dynamics. Journal of theoretical biology, 313:12–19.
  • Yang et al., (2021) Yang, X., Quam, M. B. M., Zhang, T., and Sang, S. (2021). Global burden for dengue and the evolving pattern in the past 30 years. Journal of Travel Medicine, 28(8):taab146.
Table S1: Description of the features used in the Histogram Gradient Boosting Regressor model. The climatic data are obtained daily, and the number of cases is obtained weekly. In the features identified with long tails as cases and rainy days, a log transformation was applied to improve the model performance.
Feature description Type
year: year of the peak week that are being predicted. temporal
casos_01: sum of cases in the January of the year whose peak week are being predicted. epidemiological
casos_1_3: sum of cases on the third quarter of the previous year. epidemiological
casos_1_4: sum of cases on the fourth quarter of the previous year. epidemiological
populacao_1: population in the previous year. demographic
peak_week_1: peak week estimated on the previous year. epidemiological
R0_1: reproduction number estimated on the previous year. epidemiological
ep_dur_1: epidemic duration estimated on the previous year. epidemiological
dummy_ep: 1 if the previous year were an epidemic identified by Episcanner and 0 otherwise. epidemiological
temp_med_4: average of the average temperature over the fourth quarter of the previous year. climatic
temp_amp_4: average of the temperature amplitude over the fourth quarter of the previous year. climatic
temp_max_4: average of the maximum temperature over the fourth quarter of the previous year. climatic
temp_min_4: average of the minimum temperature over the fourth quarter of the previous year. climatic
umid_min_4: average of the minimum humidity over the fourth quarter of the previous year. climatic
umid_max_4: average of the maximum humidity over the fourth quarter of the previous year. climatic
umid_amp_4:average of the humidity amplitude over the fourth quarter of the previous year. climatic
enso_4: average of the multivariate ENSO (El Niño-Southern Oscillation) in the fourth quarter of the previous year. climatic
precip_tot_4: sum of the total precipitation over the fourth quarter of the previous year. climatic
rainy_day_4: sum of days with rain (precipitation above zero) over the fourth quarter of the previous year. climatic
thr_temp_min_4: sum of days with minimum temperature below 15-celsius degrees climatic
thr_temp_amp_4: sum of days with temperature amplitude above celsius degrees. climatic
thr_umid_med_4: sum of days with average humidity above 0.8. climatic
temp_med_1_current: average of the average temperature over the january of the year whose peak week is being predicted. climatic
temp_amp_1_current: average of the temperature amplitude over the january of the year whose peak week is being predicted. climatic
temp_max_1_current: average of the maximum temperature over the january of the year whose peak week is being predicted. climatic
temp_min_1_current: average of the minimum temperature over the january of the year whose peak week is being predicted. climatic
precip_tot_1_current: sum of total precipitation over the january of the year whose peak week is being predicted. climatic
rainy_day_1_current: sum of days with precipitation over the january of the year whose peak week is being predicted. climatic
enso_1_current: average of the ENSO (El Niño-Southern Oscillation) over the January of the year whose peak week is being predicted. climatic
latitude: latitude of the city center. spatial
longitude: longitude of the city center. spatial
Refer to caption
Figure S1: Median of the reproduction number (R0subscript𝑅0R_{0}italic_R start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT) estimated for Dengue and Chikungunya cases. In the middle map, N refers to the North, NE to the North East, MW to the Midwest, SE to the Southeast, and S to the South.
Refer to caption
Figure S2: Duration of the chikungunya epidemic as a function of R0. Longer epidemics are associated with lower estimates for R0.