Data Mining Techniques for Knowledge Discovery in Large-Scale Information Systems
DOI: 10.29303/jaie.v1i2.1542
Published: 2025-10-31
Abstract
This study systematically examines data mining techniques and their role in knowledge discovery within large-scale information systems. A total of 32 peer-reviewed studies published between 2013 and 2026 were reviewed, covering diverse domains such as industry, education, healthcare, and cloud-based environments. The selected studies utilized a variety of datasets, including KDD Cup datasets, UCI Machine Learning Repository datasets, IoT sensor data, industrial production logs, healthcare records, educational datasets, and cloud system data, to demonstrate the applicability of data mining methods. The analysis reveals that classification techniques, such as decision trees, support vector machines, and neural networks, are widely applied for predictive analytics and anomaly detection. Clustering methods enable pattern recognition in high-dimensional and unstructured datasets, while association rule mining identifies relationships and correlations to support industrial optimization, recommendation systems, and decision-making. Hybrid and evolutionary algorithms enhance scalability, accuracy, and interpretability, particularly in distributed and cloud-based environments. Key challenges identified include high dimensionality, data heterogeneity, scalability limitations, model interpretability, and data quality issues, which can affect the efficiency and reliability of knowledge discovery. Overall, this study provides a conceptual framework linking data sources, preprocessing, mining techniques, and knowledge discovery outcomes, highlighting the transformative potential of data mining for actionable insights, operational optimization, and informed decision-making in complex, large-scale information systems.
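As an illustration of the association rule mining mentioned above, the following minimal sketch computes the two standard rule measures, support and confidence, over a handful of transactions. The toy transactions and the function names `support` and `confidence` are assumptions introduced here for illustration; they are not drawn from the reviewed studies or from any particular library.

```python
# Hypothetical toy transactions; not data from the reviewed studies.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never fires, so its confidence is undefined; report 0
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / base


if __name__ == "__main__":
    print(support({"bread", "milk"}, transactions))
    print(confidence({"bread"}, {"milk"}, transactions))
```

Rules whose support and confidence both exceed chosen thresholds are the ones typically reported as candidate relationships. Algorithms designed for large-scale data avoid the full scan per itemset that this sketch performs.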


