Estimation of COVID-19 under-reporting in Brazilian States through SARI


Because of its impact, COVID-19 has pushed the academic community to search for ways to cure, mitigate, or control it. However, when it comes to control, there are still few studies focused on estimating under-reporting. Under-reporting is believed to be a relevant factor in determining the actual mortality rate and, if not considered, can cause significant misinformation.

Therefore, we conducted a study to estimate the under-reporting of COVID-19 cases and deaths in Brazilian states using Infogripe data on notifications of Severe Acute Respiratory Infection (SARI). The methodology is based on the concept of inertia and on event detection techniques applied to the time series of hospitalized SARI cases.

The methodology combines data analytics (event detection methods) with time series modeling (inertia and novelty concepts) over hospitalized SARI cases. The estimate of real cases of the disease, called novelty, is calculated as the difference between the SARI cases observed in 2020 (after COVID-19) and the cases expected from recent years (2016 to 2019). The expected cases are derived from a seasonal exponential moving average.
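The novelty computation can be illustrated with a minimal Python sketch. It is not the code from the repository (which is written in R). The function names, the smoothing factor `alpha`, the week-by-week form of the seasonal average, and the clipping of negative differences to zero are all assumptions made for illustration.

```python
def seasonal_ema(history, alpha=0.5):
    """Expected weekly counts from past years (hypothetical helper).

    history: list of yearly series, oldest first; each series holds one
    count per epidemiological week. For every week, an exponential moving
    average is taken across the years, so recent years weigh more.
    alpha is an assumed smoothing factor, not a value from the study.
    """
    expected = [float(x) for x in history[0]]
    for year in history[1:]:
        expected = [alpha * x + (1 - alpha) * e for x, e in zip(year, expected)]
    return expected


def novelty(observed, expected):
    """Excess of observed over expected counts, week by week.

    Negative differences are clipped to zero (an assumption of this sketch).
    """
    return [max(o - e, 0.0) for o, e in zip(observed, expected)]


# Toy data: two weeks of counts for four past years, then the current year.
past_years = [[10, 20], [20, 20], [10, 40], [30, 20]]
expected = seasonal_ema(past_years)
print(expected)                      # [21.25, 25.0]
print(novelty([50, 20], expected))   # [28.75, 0.0]
```

In this toy run, the first week shows 28.75 cases above the seasonal expectation, while the second week stays below it and contributes no novelty.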

Under-reporting rates of cases of COVID-19 for the states of Brazil

Under-reporting rates of deaths by COVID-19 for the states of Brazil

Data and Code

The code description and Jupyter notebook (implemented in R) complement this work. This material can be found in the repository: Covid19_BR_underreport

There, the entire process for calculating the under-reporting rates can be checked, along with all numerical and graphical results.

Event Detection

Event detection methods include the discovery of anomalies and change points. Anomalies are observations that stand out because they do not appear to have been generated by the same process as the other observations in the time series. Change points characterize a transition between different states in the process that generates the time series data.

There are several methods to address the detection of anomalies and change points. Among them, there are methods that consider the effects of inertia on time series data. As this work is based on inertial concepts, we use two methods of this group.

Change Finder

Change Finder is a technique that detects change points in univariate time series data. Given a time series, the event detection process consists of two phases. In the first phase, outliers are detected. In the second phase, change points are detected.

For more information see:

 Takeuchi, J.-I., and K. Yamanishi. 2006. “A Unifying Framework for Detecting Outliers and Change Points from Time Series.” IEEE Transactions on Knowledge and Data Engineering 18 (4): 482–92.
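The two-phase idea can be sketched in Python as follows. This is a simplified illustration and not the Change Finder algorithm itself: the autoregressive model of the original method is replaced here by a discounted Gaussian with no autoregressive terms, and the function names, the discount rate `r`, and the window `w` are assumptions.

```python
import math


def discounted_scores(xs, r=0.1):
    """Score each point by its negative log-likelihood under a Gaussian
    whose mean and variance are updated online with discount rate r.
    Each point is scored before the model sees it."""
    mu, var = float(xs[0]), 1.0
    scores = []
    for x in xs:
        scores.append(0.5 * math.log(2 * math.pi * var) + (x - mu) ** 2 / (2 * var))
        mu = (1 - r) * mu + r * x
        var = max((1 - r) * var + r * (x - mu) ** 2, 1e-6)  # keep variance positive
    return scores


def moving_average(xs, w):
    """Trailing moving average with a shorter window at the start."""
    out = []
    for i in range(len(xs)):
        window = xs[max(0, i - w + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def change_scores(xs, r=0.1, w=5):
    """Phase 1 scores outliers and smooths them; phase 2 scores the
    smoothed series again, so sustained shifts stand out."""
    phase1 = moving_average(discounted_scores(xs, r), w)
    return moving_average(discounted_scores(phase1, r), w)


# Toy series with a level shift at index 50 and small deterministic noise.
series = [(10 if i < 50 else 30) + ((i * 7) % 5 - 2) * 0.5 for i in range(100)]
scores = change_scores(series)
peak = max(range(len(scores)), key=scores.__getitem__)
print(peak)  # index of the highest change score, shortly after the shift
```

Smoothing the first-phase scores before scoring them again is what separates an isolated outlier from a persistent change in level.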

Adaptive Normalization

Adaptive Normalization is used to detect anomalies. This technique uses inertia to address heteroscedastic non-stationary series. Given a time series, the outlier removal process consists of three stages: (i) inertia calculation, (ii) noise calculation, and (iii) anomaly identification.

For more information see:

Ogasawara, E., L.C. Martinez, D. De Oliveira, G. Zimbrão, G.L. Pappa, and M. Mattoso. 2010. “Adaptive Normalization: A Novel Data Normalization Approach for Non-Stationary Time Series.” In Proceedings of the International Joint Conference on Neural Networks.

Events detected in the SARI cases (left) and deaths (right) curves in Brazilian States. The yellow dots mark anomalies (Adaptive Normalization), and the red dotted lines mark the change points (Change Finder).



Evolution of the under-reporting rates

To better characterize the behavior of the under-reporting rates, we analyze them week by week.

The lack of tests for the population results in a higher under-reporting rate at the beginning. Over time, testing is expected to become more widespread, and the rates start to decrease.

As can be observed, under-reporting rates tend to stabilize over time. This convergence gives more confidence in the computed under-reporting rates.


Data analytics ensures transparency and consistency in the choice of the adopted parameters. In turn, novelty and inertia enable a comprehensible approach to estimating under-reporting.

COVID-19 causes a rupture in the inertial behavior of the SARI series, changing the statistical properties of the time series. Event detection techniques identify this rupture. Assuming that the change is due to COVID-19, the computed novelty corresponds to an estimate of the cases and deaths caused by the disease. From this, under-reporting rates were computed for both cases and deaths.

The under-reporting rates for cases were estimated for all states except Mato Grosso do Sul. The values vary between 0.124 (Espírito Santo) and 1.811 (Minas Gerais), thus reaching almost two under-reported cases for each notified case. In most states, the novelty observed by our SARI analysis is lower than the number of cases reported by the Ministry of Health. This is expected, since many diagnosed cases of COVID-19 are asymptomatic, while the SARI series covers hospitalized cases.

Under-reporting rates for deaths were estimated for 25 of the 27 states in Brazil. For the states of Acre and Mato Grosso do Sul, under-reporting was not verified and, therefore, rates for deaths were not calculated for these states. Rates vary between 0.072 (Espírito Santo) and 0.983 (Rio Grande do Sul), thus indicating that there may be nearly twice as many deaths as reported. The novelties for deaths obtained from the SARI analysis in the states are commonly higher than the deaths notified by the Ministry of Health. This supports the argument that under-reporting rates for deaths are better estimated, since SARI covers most of the individuals who die.

No pattern of behavior was observed for the events detected or for the evolution and values of under-reporting rates between states in the same Brazilian region. The states therefore appear to behave in different and independent ways concerning the occurrence and notification of COVID-19.

The methodology developed in this paper can be adapted to estimate under-reporting rates for other diseases, as long as there is a proxy variable that presents an inertial behavior. In addition, the methodology can support the detection of outbreaks, as it combines event detection and inertia concepts.

Balthazar Paixão

Marcel Pedroso

Rebecca Salles

Luciana Escobar

Carlos de Sousa