Orthographic Educational Game for Portuguese Language Countries

eogasawara September 21, 2020

Authors: Paula Chaves, Luan Paschoal, Tauan Velasco, Tiago Bento Sampaio,
Julliany Brandão, Carlos Otávio Schocair, João Quadros, Talita Oliveira, Eduardo Ogasawara

Abstract

The new orthographic agreement introduces some changes in the vocabulary of the Portuguese language. Although these changes have modified only a small percentage of the vocabulary, people are struggling to adapt to some of the new orthographic rules. Aiming to mitigate this problem using a ludic approach, we developed the Orthographic Educational Game (JOE). JOE focuses predominantly on the rules of accents and hyphens. The game is divided into two modes: training and playing. In the playing mode, the player's current level of knowledge of orthography is checked and measured. In the training mode, each word comes with a hint related to the rule that is being practiced at the moment. The game was evaluated through an experiment with both undergraduate and high school students. The results indicated that more than 80% of students enjoyed learning orthography through the game-based approach of JOE.

JOE at Google Play Store

TSPred Package for R: Functions for Benchmarking Time Series Prediction

eogasawara September 21, 2020

Student: Rebecca Pontes Salles (rebeccapsalles@acm.org)

Advisor: Eduardo Ogasawara (eogasawara@ieee.org)

Description: Functions for time series prediction and accuracy assessment using automatic linear modeling. The generated linear models and their prediction errors can be used for benchmarking other time series prediction methods and for creating a demand for the refinement of such methods. For this purpose, benchmark data from prediction competitions may be used.
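
As a rough illustration of the benchmarking idea, the sketch below fits a linear (ARIMA) model to a training window, forecasts a held-out horizon, and computes the error that another prediction method would need to beat. It uses only base R (`stats::arima` and the built-in `AirPassengers` series) rather than TSPred's own functions, and the fixed model order is an assumption made for brevity, whereas the package description refers to automatic modeling.

```r
# Baseline sketch in base R (not the TSPred API).
series <- log(as.numeric(AirPassengers))   # built-in monthly series, log scale
n <- length(series)
horizon <- 12                              # hold out the last 12 months

train <- ts(series[1:(n - horizon)], frequency = 12)
test <- series[(n - horizon + 1):n]

# Assumed model order (seasonal "airline" ARIMA); chosen by hand for this sketch.
fit <- arima(train, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast the held-out horizon and measure the baseline error.
pred <- as.numeric(predict(fit, n.ahead = horizon)$pred)
baseline_mse <- mean((test - pred)^2)
print(baseline_mse)
```

A competing method evaluated on the same train/test split can then be compared against `baseline_mse`.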

Available at CRAN: https://CRAN.R-project.org/package=TSPred

Code repository at GitHub: https://github.com/RebeccaSalles/TSPred

Reference manual: TSPred.pdf

Acknowledgments: The authors thank CNPq for partially sponsoring this work.

Amê: An Environment to Learn and Analyze Adversarial Search Algorithms Using Stochastic Card Games

eogasawara September 21, 2020

Students: Ana Beatriz Cruz, Sabrina Seriques, and Leonardo Preuss

Advisor: Eduardo Ogasawara

Abstract:

Computer Science students are usually enthusiastic about learning Artificial Intelligence (AI) due to the possibility of developing computer games that incorporate AI behaviors. In this scenario, Search Algorithms (SA) are a fundamental AI subject for various games. Implementing deterministic games, ranging from tic-tac-toe to chess, is commonly used to teach AI. From the perspective of game playing, however, stochastic games are usually more fun to play and are not much explored during the AI learning process. Other approaches in AI learning include developing search algorithms to compete against each other. These approaches are relevant and engaging but lack an environment that features both algorithm design and benchmarking capabilities. To address this issue, we present Amê, an environment to support the learning process and analysis of adversarial search algorithms using a stochastic card game. We have conducted a pilot experiment with Computer Science students who developed different adversarial search algorithms for Hanafuda (a traditional Japanese card game).

Data Privacy

While you play and use Amê, no personal user data is sent to or stored by Amê. Only the scores associated with your gameplay are sent, so that you can access your ranking data.

Published paper

Amê at Google Play Store

Manual

Supporting the Learning of Evolution Theory Using an Educational Simulator

eogasawara September 21, 2020

Students: Diego Vaz Caetano, Josué Dias Cardoso and Luana Guimarães Piani Ferreira
Collaborators: Raphael Abreu, João Quadros, Joel Santos
Advisors: Leonardo Lignani, Eduardo Ogasawara

Abstract

The use of simulators as educational tools aims to increase students' engagement in classes and seems to help them understand difficult concepts. Simulators can be used in natural science classes as an alternative to practical or experimental approaches when the time and space scales required are not compatible with the school environment. Through interactive simulators, students can explore the topic under study. Discoveries are made and predictions are confirmed or refuted by subsequent simulations, enhancing the comprehension of the phenomenon. This article presents Sim-Evolution, an educational simulator to help teachers present Charles Darwin's Theory of Evolution by Natural Selection (TENS). Our intention with Sim-Evolution is to enable students to practice and comprehend TENS as a process that occurs at the population level. Since it targets the high school level, its interface was designed to be playful, helping to engage students. We developed Sim-Evolution focusing on three basic biological principles that structure TENS: (i) variation, (ii) heredity, and (iii) selection. The simulation design is also based on Mendelian genetics and population genetics. However, we intended that knowledge of these areas should not be a prerequisite for using Sim-Evolution. Therefore, students can observe the laws of evolution and the genetic properties (genotypes) by analyzing species phenotypes and surviving populations. Sim-Evolution was evaluated by high school students in a Biology class. Experiments indicate that students could observe TENS as a population process and were able to identify the principles of variation, heredity, and selection by indirect analysis of living species phenotypes.

App

Sim-Evolution at Google Play Store
User manual

Experimental Evaluation

Evaluation Procedure

Evaluation Form

Source code at GitHub

Privacy policy

We understand that user privacy must be protected. Although our application does not collect any personal information from the user or device, we are committed to transparency and have therefore written this Privacy Policy to help the user understand what data we collect, why we collect it, and what we do with it.

Information we collect and how we use it

Our application has educational purposes and was developed under the coordination of the Computer Science Department of CEFET/RJ (http://eic.cefet-rj.br). We collect the following application information for statistical research purposes:

Application Name
Application version
Date / Time of application usage
Simulation duration time (reported in simulation time counter)
Type of environment used for the simulation (Forest, Savannah or Custom)
Types of birds selected for the beginning of the simulation
Types of birds existing at the close of the simulation, with their respective quantities

The information is collected and transmitted automatically at the end of each simulation when the user exits the simulation screen.

We use this information to research how the application is used and to guide possible feature enhancements.

When this Privacy Policy applies

Our Privacy Policy applies to Sim-Evolution and its built-in features but excludes services that Google offers on Android devices.

Our Privacy Policy does not apply to services offered by other companies or individuals, including products or websites that may be displayed to the user in search results, sites that may include Sim-Evolution services or other sites with links to our services. Our Privacy Policy does not control the information practices of other companies and organizations that advertise our services and may use cookies, pixel tags, and other technologies to deliver relevant ads.

If you have any questions or concerns about our privacy policy or our practices, please contact us at gpcacefetrj@gmail.com.

Brazilian Flight Dataset

eogasawara September 19, 2020

Abstract

This webpage provides information about the Brazilian Flight Dataset (BFD). The description includes the data sources for regular flights (VRA) and weather data. The VRA is provided by ANAC (Agência Nacional de Aviação Civil) and contains all Brazilian flights. The weather data comes from ASOS (Automated Surface Observing Systems) and is compiled and distributed by Iowa State University (USA). It contains weather data collected by sensors installed at airports around the world.

Data Sources

The VRA dataset contains departure and arrival data for Brazilian domestic flights. Improvements in the loading process were carried out in order to:

1. Improve data quality;
2. Make this dataset easier to use; and
3. Cover the period from 2000 to 2019 in a single dataset.

Through the URL http://www.anac.gov.br/assuntos/dados-e-estatisticas/historico-de-voos it is possible to consult and obtain data for the period from Jan 1st, 2000 to Dec 31st, 2019.

ASOS (Automated Surface Observing Systems) is a program that involves several American government agencies. It was created to form an official network of meteorological information, primarily to support aviation entities. It also supports meteorological, climatological, and hydrological research.

The Department of Agronomy at Iowa State University, in the United States, compiles daily information not only from the US ASOS system but also from various entities linked to civil and military aviation all over the planet.

This university makes this data freely available for download through the website
https://mesonet.agron.iastate.edu/request/download.phtml

BFD Extraction Transformation and Load (ETL) process

The ETL process can be accessed at https://github.com/cefet-rj-dal/BFD/blob/master/VRA_ASOS_Datasets_Integration_Report.ipynb

BFD Exploratory Data Analysis (EDA) graphs

The EDA graphs can be accessed at https://github.com/cefet-rj-dal/BFD/blob/master/graphsDataPaper.ipynb

BFD dataset repository

The Brazilian Flight Dataset (BFD) and its documentation are available at IEEE DataPort at http://dx.doi.org/10.21227/k10b-qn21

An Analysis of Brazilian Flight Delays Based on Frequent Patterns

eogasawara September 19, 2020

Authors: Alice Sternberg, Diego Carvalho, Leonardo Murta, Jorge Soares and Eduardo Ogasawara

Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ)

Abstract: In this paper, we applied data indexing techniques combined with association rules to unveil hidden patterns of flight delays. Considering Brazilian flight data and guided by six research questions related to causes, moments, differences, and relationships between airports and airlines, we evaluated and quantified all attributes that may lead to delays, showing not only the main patterns but also their chances of occurrence in the entire network, in each airport, and in each airline. We observed that the Brazilian flight system has difficulty recovering from previous delays and that, when operating under adverse meteorological conditions, delay occurrences may increase by up to 216%.

Acknowledgments: The authors thank CNPq and FAPERJ for partially sponsoring this research.

Arules Package for R: Functions for mining association rules and frequent itemsets. Here, the Apriori algorithm is used to generate association rules for flight delays. Restricting the right-hand side of a rule to a flight delay exposes the conditions associated with that delay on the left-hand side. To understand domestic delays in Brazil, a data set containing flight and meteorological data was built and evaluated through the association rules generated by Apriori.

Available at CRAN: https://cran.r-project.org/web/packages/arules/index.html
Reference manual: https://cran.r-project.org/web/packages/arules/arules.pdf

 
# Install the arules package
install.packages("arules")

# Load the arules package
library("arules")

# Load the flightBR data set
data(flightBR)

# Mine rules whose right-hand side is restricted to a departure delay
rules_delay <- apriori(flightBR,
  parameter = list(supp = 0.00077, conf = 0.2276, minlen = 2, maxlen = 4, target = "rules"),
  appearance = list(rhs = c("delay_dep=1"), default = "lhs"), control = NULL)
# delay_dep=1 means a departure delay or a cancellation

The arules R package enables the generation of association rules using the Apriori algorithm. By restricting the right-hand side of the rule to delays and setting the thresholds for support, confidence, and minimum and maximum lengths, we obtain on the left-hand side of the rules the conditions that may explain flight delays. For this purpose, we built the flightBR data set after several preprocessing stages: integration of multiple sources, cleaning of discrepancies and outliers, selection of the main airports and airlines, and transformation, in which we created 12 derived attributes using concept hierarchies, binning, and temporal aggregation. The resulting flightBR data set contains Brazilian domestic and commercial flight data between January 2009 and February 2015.

flightBR data set: flightBR.RData
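
The binning step mentioned above can be sketched with base R's `cut`, which maps a numeric attribute onto the kind of categorical item that association rule mining needs. The break points and labels below are hypothetical and are not taken from the flightBR preprocessing.

```r
# Hypothetical departure hours (0-23) for a handful of flights.
dep_hour <- c(0, 5, 6, 11, 12, 17, 18, 23)

# Bin hours into four periods; right = FALSE gives intervals [0,6), [6,12), ...
# The breaks and labels are illustrative assumptions.
time_of_day <- cut(dep_hour,
                   breaks = c(0, 6, 12, 18, 24),
                   labels = c("dawn", "morning", "afternoon", "evening"),
                   right = FALSE)

print(data.frame(dep_hour, time_of_day))
```

Each resulting factor level can then act as an item (for example, a time-of-day attribute equal to "morning") on the left-hand side of a rule.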

First, the apriori function was applied to this dataset with a support of 0.00077 (approximately equivalent to once per day), a confidence of 0.2276 (the overall percentage of delays in the dataset), a minimum length of 2, and a maximum length varying from 2 to 4, generating the following three sets of rules.

Rules of maximum length = 2: rules2.csv
Rules of maximum length = 3: rules3.csv
Rules of maximum length = 4: rules4.csv

Then, the rules were evaluated based on their lifts. The lift is a correlation measure between the conditions on the left-hand side and the consequent on the right-hand side, which in our case is a flight delay. A lift greater than 1 means that a delay is more likely under those conditions than in the dataset as a whole, and the higher the lift, the greater the chance of experiencing a delay.
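
To make the measures concrete, the following sketch computes support, confidence, and lift for a single rule from raw counts. All counts are hypothetical and chosen only so that the overall delay rate matches the 0.2276 quoted above.

```r
# Hypothetical counts for one rule "conditions => delay".
n_flights <- 10000   # all flights
n_lhs <- 500         # flights matching the left-hand-side conditions
n_rhs <- 2276        # delayed flights (right-hand side)
n_both <- 200        # flights matching the conditions AND delayed

support <- n_both / n_flights                 # share of all flights covered by the rule
confidence <- n_both / n_lhs                  # delay rate when the conditions hold
lift <- confidence / (n_rhs / n_flights)      # ratio to the overall delay rate

print(c(support = support, confidence = confidence, lift = lift))
```

Here the confidence is 0.4 against a baseline delay rate of 0.2276, so the lift is about 1.76: under these hypothetical conditions a delay is roughly 76% more likely than average.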

We also generated some specific sets of rules for important attributes identified in the first analysis, such as the year of departure, the time of day and its relationship with airports, and the relationship between airlines and airports. For this purpose, support and confidence were set very low in order to consider all the situations experienced by the flightBR flights.

Year of departure: year.csv
Time of departure: time_of_day.csv
Time of departure and airport: time_airport.csv
Airline and airport: airline_airport.csv

Finally, we added arrival attributes to the flightBR dataset, creating the flightBR_arr dataset, in order to compare departure and arrival delays. Using very low support and confidence, we investigated when a late departure can be recovered and turned into a punctual arrival and when a punctual departure leads to a delayed arrival.

flightBR_arr data set: flightBR_arr.RData
Late departures and punctual arrivals: late_dep_punctual_arr.csv
Punctual departures and late arrivals: punctual_dep_late_arr.csv

Flight delay review

eogasawara July 3, 2020

Systematic Review Data

survey-data

Reproducibility

The ability of readers to reproduce all the results presented in a paper is important for the scientific method. Initiatives that publish methods and experimental evaluations using active documents (such as Jupyter notebooks) are relevant for supporting reproducibility. We have provided an example (analytics-example.ipynb) of reproducible code that helps in understanding some of the data analytics methods presented in the paper.

Appendix

Flight delay appendix

Estimation of COVID-19 under-reporting in Brazilian States through SARI

eogasawara June 14, 2020

Overview

Due to its impact, COVID-19 has pushed academia to search for ways to cure, mitigate, or control it. However, when it comes to control, there are still few studies focused on under-reporting estimates. Under-reporting is believed to be a relevant factor in determining the actual mortality rate and, if not considered, can cause significant misinformation.

Therefore, we conducted a study to estimate the under-reporting of cases and deaths of COVID-19 in Brazilian states using data from Infogripe on notifications of Severe Acute Respiratory Infection (SARI).

The methodology combines data analytics (event detection methods) and time series modeling (inertia and novelty concepts) applied to the time series of hospitalized SARI cases. The estimate of real cases of the disease, called novelty, is calculated as the difference between the SARI cases in 2020 (after COVID-19) and the total cases expected based on recent years (2016 to 2019). The expected cases are derived from a seasonal exponential moving average.
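
The novelty computation can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the study: the weekly counts are invented, the "seasonal" average is taken here as an exponentially weighted average of the same week across 2016 to 2019 with a hypothetical smoothing factor `alpha`, and the under-reporting rate is assumed to be the number of unreported cases per notified case.

```r
# Hypothetical weekly SARI hospitalizations (rows = years, columns = weeks).
history <- rbind(
  c(10, 12, 15, 14),   # 2016
  c(11, 13, 14, 15),   # 2017
  c(12, 12, 16, 16),   # 2018
  c(13, 14, 17, 15)    # 2019
)
observed_2020 <- c(14, 30, 55, 70)

# Exponentially weighted average over years; recent years weigh more.
# alpha = 0.5 is an assumed smoothing factor.
ema <- function(x, alpha = 0.5) {
  Reduce(function(prev, cur) alpha * cur + (1 - alpha) * prev, x)
}

expected <- apply(history, 2, ema)              # expected cases per week
novelty <- pmax(observed_2020 - expected, 0)    # excess over the expected level

# Assumed rate definition: unreported cases per notified COVID-19 case.
notified_covid <- 60                            # hypothetical notified cases
under_report_rate <- (sum(novelty) - notified_covid) / notified_covid
print(under_report_rate)
```

With these invented numbers the total novelty is 112.5, so the assumed rate is (112.5 - 60) / 60 = 0.875 unreported cases per notified case.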

Under-reporting rates of cases of COVID-19 for the states of Brazil

Under-reporting rates of deaths by COVID-19 for the states of Brazil

Data and Code

The code description and Jupyter notebook (implemented in R) complement this work. This material can be found in the repository: Covid19_BR_underreport

In it, it is possible to check the entire process on the calculation of the under-reporting rates and all numerical and graphical results.

Event Detection

Event detection methods include the discovery of anomalies and change points. Anomalies are observations that stand out because they do not appear to have been generated by the same process as the other observations in the time series. Change points characterize a transition between different states in the process that generates the time series data.

There are several methods to address the detection of anomalies and change points. Among them, there are methods that consider the effects of inertia on time series data. As this work is based on inertial concepts, we use two methods of this group.

Change Finder

Change Finder is a technique that detects change points in univariate time series data. Given a time series, the event detection process consists of two phases. In the first phase, outliers are detected. In the second phase, change points are detected.

For more information see:

Takeuchi, J.-I., and K. Yamanishi. 2006. “A Unifying Framework for Detecting Outliers and Change Points from Time Series.” IEEE Transactions on Knowledge and Data Engineering 18 (4): 482–92.
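
The two-phase structure can be illustrated with a deliberately simplified stand-in. The sketch below is not the Change Finder algorithm from the reference above: it replaces the paper's model-based scoring with the squared one-step error against an exponentially weighted mean, and the helper names, window size, and `alpha` are all assumptions. It only shows how scoring outliers first and then scoring the smoothed outlier scores highlights a sustained shift.

```r
# Squared one-step-ahead error against an exponentially weighted mean.
ema_error <- function(x, alpha = 0.3) {
  level <- x[1]
  err <- numeric(length(x))
  for (i in seq_along(x)[-1]) {
    err[i] <- (x[i] - level)^2
    level <- alpha * x[i] + (1 - alpha) * level
  }
  err
}

# Trailing moving average; the first w - 1 positions are set to 0.
trailing_mean <- function(s, w = 3) {
  out <- as.numeric(stats::filter(s, rep(1 / w, w), sides = 1))
  out[is.na(out)] <- 0
  out
}

# Synthetic series with a level shift at position 16.
x <- c(rep(10, 15), rep(20, 15))

outlier_score <- ema_error(x)                            # phase 1: score each point
change_score <- trailing_mean(ema_error(trailing_mean(outlier_score)))  # phase 2
change_point <- which.max(change_score)
print(change_point)
```

Because of the smoothing windows, the highest change score appears a few observations after the actual shift rather than exactly at it.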

Adaptive Normalization

Adaptive Normalization is used to detect anomalies. This technique uses inertia to address heteroscedastic non-stationary series. Given a time series, the outlier removal process consists of three stages: (i) inertia calculation, (ii) noise calculation, and (iii) anomaly identification.

For more information see:

Ogasawara, E., L.C. Martinez, D. De Oliveira, G. Zimbrão, G.L. Pappa, and M. Mattoso. 2010. “Adaptive Normalization: A Novel Data Normalization Approach for Non-Stationary Time Series.” In Proceedings of the International Joint Conference on Neural Networks.
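
The three stages listed above can be sketched in base R. This is an illustrative approximation and not the published method: the inertia is approximated here by a running median (`stats::runmed`), the noise is the residual around it, and anomalies are flagged with the usual 1.5 x IQR fences. The window size and the fence rule are assumptions of this sketch.

```r
# Hypothetical series with one spike at position 6.
x <- c(10, 11, 10, 12, 11, 30, 12, 11, 13, 12, 11, 12)

# (i) Inertia: slow-moving level of the series (assumed: running median, window 3).
inertia <- as.numeric(runmed(x, k = 3, endrule = "keep"))

# (ii) Noise: what remains after removing the inertia.
noise <- x - inertia

# (iii) Anomalies: noise values outside the 1.5 * IQR fences.
q <- quantile(noise, c(0.25, 0.75))
fence <- 1.5 * (q[2] - q[1])
anomalies <- which(noise < q[1] - fence | noise > q[2] + fence)
print(anomalies)
```

Only the spike at position 6 falls outside the fences in this example; a plain moving average would also work as the inertia term but would let the spike leak into its neighbours' residuals.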

Events detected in the SARI cases (left) and deaths (right) curves in Brazilian States. The yellow dots mark anomalies (Adaptive Normalization), and the red dotted lines mark the change points (Change Finder).

Cases

Deaths

Evolution of the under-reporting rates

To better characterize the behavior of the under-reporting rates, we analyze them week by week.

The lack of tests for the population results in higher under-reporting rates at the beginning. Over time, testing is expected to become more widespread, and the rates start to decrease.

As can be observed, under-reporting rates tend to stabilize over time. This convergence gives more confidence in the computed under-reporting rates.

Conclusions

Data analytics ensures transparency and consistency in the choice of the adopted parameters. In turn, novelty and inertia enable a comprehensible approach to estimating under-reporting.

COVID-19 causes a rupture in the inertial behavior of the SARI series, changing the statistical properties of the time series. Event detection techniques identify this rupture. Assuming that the observed change is due to COVID-19, the computed novelty corresponds to estimates of the cases and deaths from the disease. From this, under-reporting rates were computed for both cases and deaths.

The rates of under-reporting of cases were estimated for all states except Mato Grosso do Sul. The values vary between 0.124 (Espírito Santo) and 1.811 (Minas Gerais), thus reaching almost two under-reported cases for each notified case. In most states, the novelty observed in our SARI analysis is lower than the number of cases reported by the Ministry of Health. This is expected, since many diagnosed cases of COVID-19 are asymptomatic.

Under-reporting rates for deaths were estimated for 25 of the 27 states in Brazil. For the states of Acre and Mato Grosso do Sul, no under-reporting was observed and, therefore, death rates were not calculated for these states. Rates vary between 0.072 (Espírito Santo) and 0.983 (Rio Grande do Sul), thus indicating that there may be almost twice as many deaths as reported. The novelties for deaths from the SARI analysis are commonly higher than the deaths notified by the Ministry of Health. This supports the view that the death rates are better estimated, since SARI covers most of the individuals who die.

No pattern of behavior was observed for the events detected or for the evolution and values of under-reporting rates between states in the same Brazilian region. Therefore, it is observed that the states behave in different and independent ways concerning the occurrence/notification of COVID-19.

The methodology developed in this paper can be adapted to estimate the under-reporting rate of other diseases, as long as there is a proxy variable that presents inertial behavior. In addition, the methodology can support the detection of outbreaks, as it combines event detection and inertia concepts.