DAL Toolbox is a data analytics framework inspired by the Experiment Lines model. The package organizes, within an integrated environment, preprocessing, classification, regression, clustering, graphical analysis, and the construction of reproducible analytical pipelines. In the current package version, 1.3.727, the documentation was reorganized to support a guided learning track and more didactic thematic collections.
Didactic organization
The daltoolbox material is now organized around two complementary entry points. The first is a guided track, recommended for readers who want to learn the flow of an analytical experiment step by step. The second is composed of thematic collections, aimed at readers who want to study specific families of transformations, models, and visualizations.
This organization reinforces the central idea behind the framework: data analytics should not be treated as a loose sequence of isolated functions, but as a coherent workflow that integrates data preparation, modeling, evaluation, model comparison, visualization, and framework extension.
Available stages and methods
- Transformations: sampling, data cleaning, outlier handling, scaling, categorical encoding, discretization, balancing, feature selection, dimensionality reduction, and curvature-based heuristics.
- Classification: baselines, decision trees, instance-based methods, probabilistic models, ensembles, support vector machines, neural networks, and hyperparameter selection.
- Regression: interpretable models, neighborhood-based methods, ensembles, margin-based regression, neural networks, and hyperparameter tuning.
- Clustering: partitional methods, medoid-based methods, density-based approaches, and model selection in unsupervised settings.
- Graphics: visualizations for category comparison, distribution analysis, relationships between variables, time series, and figure export for reports.
- Customization: integration of new transformations, classifiers, regressors, and clustering methods while preserving the framework contract.
- Integration and extensibility: support for integration with external libraries and complementary use of ecosystems such as Python when needed.
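The Customization item relies on the idea of a shared contract: a new method plugs in by providing a constructor plus fit and predict behavior. The following is a minimal base R sketch of that idea using S3 dispatch. It is not daltoolbox code: the `fit` generic defined here, the `cla_centroid` constructor, and its fields are hypothetical names chosen for illustration, and the package's own base classes may differ.

```r
# Hypothetical mini-contract (NOT daltoolbox internals): a method is a
# constructor plus fit() and predict() S3 methods.
fit <- function(obj, ...) UseMethod("fit")

# Constructor for a hypothetical nearest-centroid classifier.
cla_centroid <- function(attribute) {
  structure(list(attribute = attribute, centroids = NULL),
            class = "cla_centroid")
}

# fit(): store the per-class mean of every predictor column.
fit.cla_centroid <- function(obj, data, ...) {
  x <- data[, setdiff(names(data), obj$attribute), drop = FALSE]
  obj$centroids <- aggregate(x, by = list(label = data[[obj$attribute]]),
                             FUN = mean)
  obj
}

# predict(): assign each row to the class with the closest centroid
# (squared Euclidean distance).
predict.cla_centroid <- function(object, newdata, ...) {
  feats <- setdiff(names(object$centroids), "label")
  cen <- as.matrix(object$centroids[, feats, drop = FALSE])
  x <- as.matrix(newdata[, feats, drop = FALSE])
  nearest <- apply(x, 1, function(row) which.min(colSums((t(cen) - row)^2)))
  as.character(object$centroids$label[nearest])
}

# Same calling pattern regardless of the method behind the constructor.
model <- fit(cla_centroid("Species"), iris)
pred <- predict(model, iris)
accuracy <- mean(pred == as.character(iris$Species))
print(accuracy)
```

Because the calling code only touches the constructor, `fit`, and `predict`, swapping in a different method means changing one line. Note that defining `fit` this way would mask a generic of the same name from any loaded package, so the sketch is meant to be run in a fresh session.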
Architecture
The daltoolbox architecture was built to keep the experimental cycle of split, fit, predict, evaluate, and compare stable regardless of the method family being used. With a uniform data model and a consistent API, the framework supports reproducibility, extensibility, and integration across the different stages of the analytical process.
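The cycle described above can be sketched as a short script on the built-in `iris` data. The function names used here (`sample_random`, `train_test`, `cla_majority`, `cla_dtree`, `adjust_class_label`, `evaluate`) and the `metrics` field are assumptions based on the package examples, not confirmed by this text, so they should be checked against the installed version.

```r
# Sketch of split / fit / predict / evaluate / compare.
# All daltoolbox function names below are assumptions; verify them
# against the documentation of the installed version.
library(daltoolbox)

iris <- datasets::iris
slevels <- levels(iris$Species)

# Split: random train/test partition.
set.seed(1)
sr <- train_test(sample_random(), iris)
train <- sr$train
test <- sr$test

# Two method families behind the same interface.
models <- list(
  majority = cla_majority("Species", slevels),
  dtree = cla_dtree("Species", slevels)
)

# Fit, predict, and evaluate each model with identical calls.
predictand <- adjust_class_label(test[, "Species"])
results <- lapply(models, function(model) {
  model <- fit(model, train)
  prediction <- predict(model, test)
  evaluate(model, predictand, prediction)
})

# Compare: print the metrics reported for each model.
for (name in names(results)) {
  cat(name, "\n")
  print(results[[name]]$metrics)
}
```

Only the constructors in `models` differ between the two runs; the rest of the script is untouched, which is the stability the architecture aims for.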
Installation
The stable version of DAL Toolbox on CRAN is available at: https://CRAN.R-project.org/package=daltoolbox
To install the stable CRAN version:
install.packages("daltoolbox")
To install the development version directly from GitHub:
# Requires devtools; if it is missing, run install.packages("devtools") first
library(devtools)
devtools::install_github("cefet-rj-dal/daltoolbox", force = TRUE, dependencies = FALSE, upgrade = "never")
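After either installation route, a quick check such as the one below can confirm that the package is available and report which version was installed. It is a minimal sketch that uses only base R utilities; the variable name `installed_version` is illustrative.

```r
# Report the installed daltoolbox version, or say that it is missing.
if (requireNamespace("daltoolbox", quietly = TRUE)) {
  installed_version <- as.character(utils::packageVersion("daltoolbox"))
  message("daltoolbox ", installed_version, " is installed")
} else {
  installed_version <- NA_character_
  message("daltoolbox is not installed")
}
```

The reported version will differ between the CRAN release and the GitHub development version.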
Documentation and examples
The daltoolbox examples are organized into a guided track and thematic collections covering transformations, classification, regression, clustering, graphics, and customization:
https://github.com/cefet-rj-dal/daltoolbox/tree/main/examples
Guided track
The current guided track covers the full logic of an analytical experiment: first experiment, sampling strategies, data quality and cleaning, preprocessing, baselines, metrics, model comparison, tuning, end-to-end pipelines, regression, clustering, visual analysis, and custom framework extension.
Additional material
Beyond the thematic examples, daltoolbox serves as the conceptual and architectural foundation for other frameworks in the DAL ecosystem, such as tspredit and harbinger, providing the common infrastructure for organizing reproducible analytical workflows.
https://cefet-rj-dal.github.io/daltoolbox/