Evaluation of a process for the Experimental Development of Data Mining, AI and Data Science applications aligned with the Strategic Planning

Methanias Colaço Júnior, Rodrigo Cruz, Luciano Araújo, Ana Bliacheriene, Fátima Nunes


Context: The Big Data phenomenon has pushed companies to mature in how they explore their data, as a prerequisite for obtaining valuable insights into their clients and the analytical power to guide decision-making. A general approach that describes how to extract knowledge for executing the business strategy therefore needs to be established. Purpose: This paper introduces and evaluates the implementation of a process for the experimental development of Data Mining (DM), AI and Data Science (DS) applications aligned with strategic planning. Method: A case study of the proposed process was conducted in a federal educational institution. Results: The results provide evidence that it is possible to integrate a strategic alignment approach, an experimental method, and a methodology for the development of DM applications. Conclusion: DM and DS applications are subject to the same risks as other Information Systems, and the adoption of strategy-driven processes based on the scientific method is a critical success factor. Moreover, the application of the scientific method was facilitated, and it is an important tool for ensuring the quality, reproducibility and transparency of intelligent applications. Finally, the process needs to be mapped in order to foster and guide strategic alignment.


Keywords: Big Data, Strategic Alignment, Experimentation, Small Data, Reproducibility




DOI: http://dx.doi.org/10.4301/S1807-1775202219018
