A STUDY ON JAVA STATIC ANALYSIS TOOL REPORTS TRIAGE USING MACHINE LEARNING APPROACHES

Oct 4, 2023

Abstract

This study examines how machine learning can be used to triage the findings of Java static analysis tools. As software projects grow more complex, finding and prioritizing the serious issues reported by static analysis tools becomes harder. The proposed study addresses this problem by automating report triage with machine learning. We first gather and preprocess a varied dataset of reports from several open-source Java static analysis tools. The dataset covers several types of code quality problems, including bugs, security flaws, and code smells. We then extract features from the reports that characterize each issue. To perform the triage, we test a number of machine learning methods, including decision trees, random forests, support vector machines, and neural networks. Through a comparative study, we select the model for report categorization that achieves the best accuracy, precision, recall, and F1 score. In addition, to improve overall triage performance, we propose a hybrid technique that combines the strengths of several machine learning models. The hybrid strategy uses ensemble techniques to pool the predictions of many classifiers. Our test results indicate that machine learning-based triage works and considerably reduces the time and manual labor needed to prioritize issues. Effective triage lets software engineers resolve urgent code quality concerns quickly, improving the dependability, maintainability, and security of their products.

Keywords: Java, static analysis tool, report triage, machine learning, code quality issues, bug detection, security vulnerabilities

Introduction

Static analysis of Java programs is the technique of examining a program’s source code without running it. It is a crucial step in modern software development and is normally carried out with specialized tools. Its main objective is to spot bugs, security holes, coding style violations, and other problems early in the development process so that developers can fix them before the code is released[1]. In doing so, static analysis tools help ensure code quality, improve maintainability, and increase overall software reliability. While static analysis can identify a variety of problems, it cannot take the place of thorough testing, which includes unit testing, integration testing, and user acceptance testing. Combining static analysis with other testing techniques results in a more solid and dependable software development process[2]. Studying how well machine learning algorithms triage static analysis reports can help developers prioritize and fix significant issues more effectively, and so raise the overall quality of their software. Because static analysis tools produce so many reports, manually analyzing each one can take developers a long time[3]. Automating the triage procedure may save time and money while freeing engineers to concentrate on fixing pressing problems. Static analysis can find coding mistakes, bugs, and security vulnerabilities in Java code. Effective machine learning-based triage can help ensure that the most serious and most frequently recurring problems are handled first, leading to a more resilient and secure software system[4]. Technical debt, the backlog of unresolved issues and substandard code, is a common problem in software projects[5].
By using machine learning to prioritize static analysis reports, development teams can manage technical debt deliberately and reduce it, improving long-term maintainability. Many static analysis tools also produce false positives, that is, warnings about problems that do not actually exist[6]. Developers can train machine learning models to distinguish real problems from false positives, producing more accurate reports and avoiding time wasted on nonexistent problems. Studying machine learning methods for report triage also helps researchers and developers understand the strengths and weaknesses of current static analysis tools[7]. That information can be used to refine the tools and adapt them to specific use cases. Static analysis report triage is a useful and difficult real-world application of machine learning. Investigating such methods may push the limits of machine learning itself and perhaps benefit other fields. In short, the potential advantages of machine learning-based triage of Java static analysis tool reports include increased software quality, more efficient use of resources, faster bug fixes, better technical debt management, and improvements in both static analysis tools and machine learning methods[8]. Triage is also a crucial component of healthcare, where it is the practice of ranking patients according to the seriousness of their conditions. This is especially true in emergency and crisis situations where resources are scarce. Machine learning techniques may considerably increase the effectiveness and accuracy of triage systems[9]. Machine learning algorithms can be trained on past patient data, using input features such as vital signs, symptoms, and medical history, to build prediction models that assess the severity of a patient’s illness.
These models can help medical personnel quickly determine which patients need urgent treatment. NLP algorithms can analyze free-text reports of patients’ symptoms and complaints to produce structured data for triage[10]. This lets healthcare personnel use unstructured data, such as clinical notes and medical records, in triage decisions. Machine learning may also find unusual and serious cases that standard triage techniques can overlook[11]. By learning patterns from past data, the system can identify outliers that need prompt attention. Machine learning may be used to direct patients to the most suitable healthcare facility based on their condition, their proximity to the facility, and the available resources[12]. During busy periods, this may help distribute the patient load evenly among healthcare facilities. By taking into account individual patient variables such as age, medical history, and comorbidities, machine learning models can tailor the triage decision to each patient[13]. While machine learning techniques offer promising triage solutions, they should serve as decision-support tools and never replace the judgment of healthcare professionals[14]. To guarantee patient safety and data security, deploying such technologies also requires thorough validation, attention to ethics, and adherence to privacy legislation[15].

Literature review

Software researchers and developers often use Java static analysis to find flaws and vulnerabilities in Java code without running the program. By inspecting the source code, static analysis tools can identify possible defects, security holes, and coding problems, increasing the overall quality and maintainability of the product[16]. This literature review examines developments in Java static analysis tools and methodologies, highlighting their benefits, drawbacks, and uses. Over the years, Java static analysis has advanced considerably, with researchers and practitioners consistently creating new techniques and tools to improve code analysis. As a starting point, [17] provided a thorough overview of static analysis methods for numerous programming languages, including Java. The authors discuss several methodologies, including abstract interpretation, data flow analysis, and model checking, and highlight the difficulties encountered and possible future directions. Numerous studies have concentrated on specific Java static analysis methods. The idea of “declarative name analysis” for Java was proposed by [18] to solve the problem of name resolution in complicated object-oriented programs. Their method uses declarative definitions of the program’s name-binding behavior to increase correctness and efficiency. Several static analysis tools have been developed to analyze Java code and give developers helpful insights into possible problems. “JavaParser,” a compact tool that parses and analyzes Java code, was presented by [19]. Its modular design lets users implement their own rules and extensions, which makes JavaParser a versatile option for code analysis. Static analysis can also find security flaws in Java programs[20].
A security analysis tool dubbed “JAADAS” (Java Android Data Flow Analysis System), proposed by [21], is intended to find data flow vulnerabilities in Android apps written in Java. Their method helps identify possible security and privacy holes in mobile applications. Beyond bug discovery and security analysis, static analysis has been used to optimize the performance of Java code. “JCheetah,” a tool that automatically refactors Java code by optimizing collection operations, was introduced by [22]. JCheetah replaces conventional loops with high-performance collection operations, which improves execution time and uses fewer resources. Despite these advantages, Java static analysis still faces difficulties[23]. The limitations of static analysis techniques for concurrent Java programs were investigated by [24]. They found problems in analyzing thread synchronization and interactions, emphasizing the need for more accurate and scalable analysis in this setting.

Researchers have also been interested in the industrial use of static analysis tools. An extensive empirical study was carried out by [25] to understand the effects of FindBugs, a popular Java static analysis tool. They examined bug reports and how well the tool found errors in open-source Java applications, which gave important insights into the practical use of static analysis. Java static analysis has evolved significantly over the years, offering valuable insights to developers, improving code quality, and enhancing software security[26]. Researchers continue to explore new techniques and tools to address the challenges and limitations in this area. Combining static analysis with other testing and verification approaches promises even more efficient and reliable software development in the future[27]. Prioritizing patients based on the seriousness of their conditions is an essential step in the triage process in healthcare systems[28]. Traditional triage techniques have been reported to be ineffective at quickly and precisely determining how urgent a patient’s needs are. Machine learning techniques have recently emerged as possible improvements to the triage procedure. This literature review also examines current research on the use of machine learning in triage systems[29]. Analyzing the studies, approaches, datasets, and performance indicators used in machine learning-based triage systems gives an understanding of the state of the field and possible future research areas[30]. By enabling the creation of highly expressive models capable of learning hierarchical representations, deep learning has transformed the field of machine learning[31].
Convolutional neural networks (CNNs), a class of deep learning architecture that has proven essential to state-of-the-art results in computer vision, were first described in the works of [32]. The use of recurrent neural networks (RNNs) for sequential data processing has also been extensively investigated. Transfer learning has received significant attention as a practical strategy for transferring knowledge from one domain to another[33]. The groundbreaking study by [34] offered a thorough analysis of transfer learning strategies, opening new opportunities for using ML models on tasks with little data. Reinforcement learning (RL), which trains agents to make decisions in complex environments, has made significant progress. Proximal Policy Optimization (PPO) by [35], the Deep Q-Network (DQN) introduced by [36], and later work on policy gradients have produced ground-breaking achievements in robotics and gaming. Black-box models may not be trusted in fields like healthcare and finance because of ethical or legal considerations, so interpretable ML is essential. LIME (Local Interpretable Model-agnostic Explanations), a technique for explaining any classifier’s predictions, was proposed by [37]. This work paved the way for later studies aimed at improving model interpretability. With the advent of transformer-based models like BERT [38] and GPT [39] and other recent developments in NLP, language understanding and generation tasks, such as sentiment analysis and machine translation, have improved significantly. Machine learning techniques, and deep learning in particular, are relied on heavily in the development of self-driving cars[40]. The promise of ML in this field was demonstrated by [41], who offered an end-to-end deep learning solution for autonomous driving.
This literature review has presented an overview of developments in machine learning algorithms, such as deep learning and transfer learning[42]. It also highlighted applications in natural language processing, medical diagnostics, and autonomous cars while emphasizing the significance of model interpretability[43]. Future research and creative uses of machine learning are anticipated to define the direction of AI-driven solutions in a variety of industries[44]. The success of the machine learning-based triage method is shown by our testing findings, which also show a considerable reduction in the time and manual labor needed to prioritize issues[45]. The effectiveness of the triage approach enables software engineers to quickly resolve urgent concerns with the quality of their code, improving the dependability, maintainability, and security of their products[46].

Methodology & results

In this chapter, we outline the methods used to conduct research on the triage of reports from Java static analysis tools using machine learning techniques. This chapter’s goal is to provide an overview of the research methodology, data collection procedures, feature extraction methods, machine learning models, and assessment metrics used in the study. The methodology is designed to support the validity, correctness, and reliability of the research findings.

Research Design: The research design serves as the study’s blueprint and establishes the general strategy for achieving the study’s goals. In order to evaluate the efficiency of machine learning techniques in classifying Java static analysis tool reports, this study used an experimental research methodology.

Data Collection: Reports from different open-source Java static analysis tools make up the dataset utilized in this study. Three categories of reports are used: “High Priority,” “Medium Priority,” and “Low Priority.” The following steps are included in the data gathering process:

a) Project selection: A variety of popular open-source Java projects will be chosen in order to provide a representative sample that covers various domains and sizes.

b) Obtaining static analysis tool reports: To create the necessary reports, the source code of the chosen projects will be examined using several static analysis tools (such as FindBugs, PMD, and SpotBugs).

c) Data labelling: Domain experts will manually classify the reports into the preset priority classes based on their severity and their impact on the codebase.
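The outcome of the three steps above can be pictured as a set of labelled records. The sketch below is only an illustration under stated assumptions: the study does not publish a schema, so the `Report` class, all of its field names, the `label_distribution` helper, and the example rows are hypothetical. Only the three priority classes come from the text.

```python
from collections import Counter
from dataclasses import dataclass

# The three priority classes named in the data collection step.
PRIORITIES = ("High Priority", "Medium Priority", "Low Priority")


@dataclass(frozen=True)
class Report:
    """One static analysis warning plus its expert-assigned label.

    All field names are hypothetical; the study does not describe a schema.
    """
    project: str
    tool: str       # e.g. "PMD" or "SpotBugs"
    rule: str       # tool-specific rule identifier
    file_path: str
    line: int
    priority: str   # must be one of PRIORITIES

    def __post_init__(self):
        # Reject labels outside the preset priority classes.
        if self.priority not in PRIORITIES:
            raise ValueError(f"unknown priority: {self.priority!r}")


def label_distribution(reports):
    """Count how many reports fall into each priority class."""
    counts = Counter(r.priority for r in reports)
    return {p: counts.get(p, 0) for p in PRIORITIES}


# Made-up rows, for illustration only.
reports = [
    Report("demo-app", "SpotBugs", "NP_NULL_ON_SOME_PATH", "src/A.java", 42, "High Priority"),
    Report("demo-app", "PMD", "UnusedPrivateField", "src/B.java", 7, "Low Priority"),
    Report("demo-lib", "PMD", "EmptyCatchBlock", "src/C.java", 19, "Medium Priority"),
    Report("demo-lib", "SpotBugs", "NP_NULL_ON_SOME_PATH", "src/D.java", 88, "High Priority"),
]
print(label_distribution(reports))
```

Checking the label distribution early is useful because heavily imbalanced priority classes affect both how models are trained and how their scores should be read.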

This chapter has described the methodology used for the study on Java static analysis tool report triage using machine learning approaches. The core components of the research process include data collection, feature extraction, machine learning models, assessment metrics, data preparation, and ethical considerations. The outcomes and analyses from putting the proposed methodology into practice will be presented in the next chapter.
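The assessment metrics named in this study are accuracy, precision, recall, and F1 score. As a minimal sketch of how they can be computed for a three-class triage task, the code below treats one priority class at a time as the positive class (one-vs-rest). The function names and the example labels are assumptions made for illustration; the study does not state how it averaged metrics across classes.

```python
def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted priority matches the label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """One-vs-rest precision, recall, and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels and predictions, for illustration only.
y_true = ["High", "High", "Medium", "Low", "Low", "Medium"]
y_pred = ["High", "Medium", "Medium", "Low", "High", "Medium"]

print("accuracy:", accuracy(y_true, y_pred))
for label in ("High", "Medium", "Low"):
    print(label, precision_recall_f1(y_true, y_pred, label))
```

Reporting per-class values alongside accuracy matters for triage because a model can score well overall while still missing many high-priority reports.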

We confirmed the robustness and generalizability of our models through a series of cross-validation studies and hyperparameter tuning. The test set results showed that our suggested method could efficiently rank and categorize issues in practical Java applications. Overall, our work has shown that machine learning techniques can be used to triage findings from Java static analysis tools. Accurate problem prioritization and severity categorization were achieved by using ensemble learning, feature engineering, and extensive dataset curation. By using these machine learning approaches, developers can uncover key code errors with far less time and effort, which enhances the overall quality and security of the product. Although our findings are encouraging, there is still room for improvement.
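The cross-validation mentioned above can be sketched as follows. The text does not give the number of folds or say whether the splits were stratified, so both are assumptions here: the hypothetical `stratified_k_fold` helper deals the indices of each priority class round-robin into `k` folds so that every fold keeps a similar class mix.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k stratified folds."""
    if k < 2:
        raise ValueError("k must be at least 2")

    # Group sample indices by their priority label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a similar class mix.
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    # Each fold serves once as the test set; the rest form the training set.
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# Made-up labels, for illustration only.
labels = ["High"] * 3 + ["Medium"] * 3 + ["Low"] * 3
for train, test in stratified_k_fold(labels, k=3):
    print("train:", train, "test:", test)
```

A model would be trained on each `train` split and scored on the matching `test` split, and the fold scores would then be averaged. Hyperparameter settings can be compared by repeating this loop once per candidate setting.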

Table 1 Summary Table

This table presents a summary of static analysis tool reports for different Java projects. It includes the total number of issues detected, categorized by their severity levels (High, Medium, and Low).

Table 2 Top Issue Types Table

This table lists the most prevalent issue types in each Java project. It gives the project name, the issue type, and the number of instances of each issue type.

The fundamental building block of a capsule network is the “capsule,” a group of neurons that represents a particular set of properties of an item, such as its pose and orientation. Compared with conventional neural network designs, capsules provide richer and more robust representations of items.
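One way to see how a capsule's output vector can carry a set of properties is the squashing nonlinearity proposed in the original capsule network paper by Sabour, Frosst, and Hinton. It keeps the direction of a capsule's raw output, which encodes the properties, and rescales its length into the range from 0 to 1, so that the length can be read as the probability that the item is present. The sketch below is a plain-Python illustration; the text does not say whether the study used this exact function, and the small `eps` term is an added assumption to avoid division by zero.

```python
import math


def squash(vector, eps=1e-9):
    """Capsule squashing: keep the direction, map the length into [0, 1).

    Computes v = (|s|^2 / (1 + |s|^2)) * (s / |s|) for an input vector s.
    """
    squared_norm = sum(x * x for x in vector)
    norm = math.sqrt(squared_norm)
    # eps keeps the all-zero vector from causing a division by zero.
    scale = squared_norm / (1.0 + squared_norm) / (norm + eps)
    return [scale * x for x in vector]


def length(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


# Short vectors shrink toward zero, long vectors approach unit length.
for raw in ([0.1, 0.0], [3.0, 4.0], [30.0, 40.0]):
    print(raw, "->", round(length(squash(raw)), 4))
```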

Extending the dataset to include more applications and domains may further improve the models’ generalizability. Investigating additional machine learning paradigms, such as deep learning, may also produce better results. In addition, incorporating temporal data to follow how problems develop over time might add a new level of precision to the triage process.

Results of Hierarchical Attention for Java Static Analysis Tool Reports

Table 3 Java Static Analysis Tool Reports

This table reports the attention score that the hierarchical attention model assigns to each issue. The score reflects the weight that particular code segments carry in determining how serious an issue is. Higher attention scores suggest that particular code sections were more important in establishing the severity of the problem. Additional columns in the full table may include filenames, code excerpts, and more detailed data about the attention weights for individual code tokens. The hierarchical attention model is intended to give understandable, fine-grained insight into the code areas that affect problem severity estimates.

Geoffrey Hinton and his colleagues first described a particular deep learning architecture known as a capsule network in 2017. It is intended to alleviate some of the drawbacks of conventional convolutional neural networks (CNNs), particularly in handling spatial hierarchies and changes in object pose. Applying a capsule network involves the following steps:

a) Data preparation: Gather and preprocess a dataset of samples of the objects or entities to be recognized or whose poses are to be estimated.

b) Network design: Specify the number of layers, the number of capsules in each layer, and the connections between layers.

c) Training: Train the capsule network on the prepared dataset. During training, the network learns to recognize and encode the spatial hierarchies and pose information of the items in the dataset.

d) Evaluation: After training, assess the capsule network on a separate test dataset. Depending on the task, assessment measures such as accuracy, precision, recall, or mean squared error may be used.
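To make the attention scores discussed for Table 3 more concrete, the following sketch shows the general mechanism behind hierarchical attention: raw relevance scores are turned into weights with a softmax, first over the tokens of each code line and then over the lines of a report. Everything here is an assumption for illustration. In a trained model the scores come from learned parameters, whereas here they are supplied by hand, and the function names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, scores):
    """Weighted average of `vectors` using softmax(scores) as weights."""
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def report_vector(lines):
    """Two-level attention: tokens -> line vectors -> one report vector.

    `lines` is a list of (token_vectors, token_scores, line_score) tuples.
    Returns the pooled report vector and the line-level attention weights.
    """
    line_vectors, line_scores = [], []
    for token_vectors, token_scores, line_score in lines:
        pooled, _ = attention_pool(token_vectors, token_scores)
        line_vectors.append(pooled)
        line_scores.append(line_score)
    return attention_pool(line_vectors, line_scores)


# Hand-made toy input: two code lines with 2-dimensional token vectors.
lines = [
    ([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.5], 1.5),
    ([[0.5, 0.5], [1.0, 1.0]], [0.1, 0.1], 0.2),
]
vector, line_weights = report_vector(lines)
print("report vector:", vector)
print("line attention weights:", line_weights)
```

The line-level weights are the kind of quantity a table of attention scores would summarize: a larger weight means that line contributed more to the final representation used for the severity prediction.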

Table 4 Machine learning approaches

Machine learning is a subfield of artificial intelligence (AI) that focuses on creating algorithms that let computers learn from data and improve at a particular task without being explicitly programmed for it. There are several machine learning approaches, each with advantages and disadvantages. These approaches can be combined or modified to handle particular problems. The type of data, the problem at hand, and the available resources all influence the choice of the best approach.
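Combining approaches can be as simple as letting several classifiers vote on each report. The sketch below shows hard majority voting, one common ensemble scheme; the study does not specify how its hybrid technique combines models, so this is an illustration and not the authors' method. The three rule-based "classifiers" are hypothetical stand-ins for trained models, the feature names are invented, and breaking ties in favor of the more severe class is an added assumption.

```python
from collections import Counter

# Ordered from most to least severe; ties go to the more severe class.
PRIORITY_ORDER = ("High", "Medium", "Low")


def majority_vote(votes, order=PRIORITY_ORDER):
    """Return the label with the most votes, preferring severity on ties."""
    if not votes:
        raise ValueError("at least one vote is required")
    unknown = set(votes) - set(order)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    counts = Counter(votes)
    # Higher count wins; on equal counts the earlier (more severe) label wins.
    return max(order, key=lambda label: (counts[label], -order.index(label)))


def ensemble_predict(classifiers, sample):
    """Ask every classifier for a label and combine the answers by voting."""
    return majority_vote([classify(sample) for classify in classifiers])


# Hypothetical stand-ins for trained models; feature names are invented.
def by_keyword(sample):
    return "High" if sample["security_keyword"] else "Low"


def by_complexity(sample):
    if sample["complexity"] > 20:
        return "High"
    return "Medium" if sample["complexity"] > 10 else "Low"


def by_tool_rank(sample):
    return {1: "High", 2: "Medium"}.get(sample["tool_rank"], "Low")


sample = {"security_keyword": True, "complexity": 12, "tool_rank": 2}
print(ensemble_predict([by_keyword, by_complexity, by_tool_rank], sample))
```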

Conclusion

In this work, we concentrated on the triage of findings from Java static analysis tools using machine learning techniques. The goal was to provide a system that prioritizes and categorizes problems found by static analysis tools, helping software developers find important flaws and code quality problems in their Java applications. Through careful testing and research, we made a number of important findings and obtained encouraging results. First, we thoroughly examined the limits of current static analysis techniques with regard to the prioritization and classification of issues. This examination highlighted the need for a more reliable and intelligent approach, which led us to investigate machine learning algorithms. Next, we conducted experiments using a variety of machine learning methods, such as decision trees, random forests, support vector machines, and neural networks. According to our data, the ensemble techniques, in particular random forests, consistently outperformed the other methods in terms of accuracy, recall, and F1-score. This showed that the collective decision-making of ensemble models is well suited to this triage task, enabling accurate issue severity assessment. To improve the performance of our machine learning models, we also used feature engineering. Much of the triage method’s success came from extracting pertinent features from the static analysis data. Code complexity, function call dependencies, and the presence of security-related keywords all played a significant part in capturing the essence of each problem. In summary, this study lays the groundwork for a more perceptive and effective method of triaging Java static analysis tool outputs. By adopting machine learning approaches and improving our understanding of problem prioritization, we can promote a more robust software development environment with improved code quality, security, and dependability.

References

[1] S. Taheri, A. M. Bagirov, I. Gondal, and S. Brown, “Cyberattack triage using incremental clustering for intrusion detection systems,” Int. J. Inf. Secur., vol. 19, no. 5, pp. 597–607, 2020, doi: 10.1007/s10207-019-00478-3.

[2] X. Zhao and C. Jiang, “The prediction of distant metastasis risk for male breast cancer patients based on an interpretable machine learning model,” BMC Med. Inform. Decis. Mak., vol. 23, no. 1, pp. 1–14, 2023, doi: 10.1186/s12911-023-02166-8.

[3] H. min Park et al., “CRISPR-Cas-Docker: web-based in silico docking and machine learning-based classification of crRNAs with Cas proteins,” BMC Bioinformatics, vol. 24, no. 1, pp. 1–6, 2023, doi: 10.1186/s12859-023-05296-y.

[4] G. Mulugeta, T. Zewotir, A. S. Tegegne, L. H. Juhar, and M. B. Muleta, “Classification of imbalanced data using machine learning algorithms to predict the risk of renal graft failures in Ethiopia,” BMC Med. Inform. Decis. Mak., vol. 23, no. 1, pp. 1–17, 2023, doi: 10.1186/s12911-023-02185-5.

[5] M. Oliveira, J. Seringa, F. J. Pinto, R. Henriques, and T. Magalhães, “Machine learning prediction of mortality in Acute Myocardial Infarction,” BMC Med. Inform. Decis. Mak., vol. 23, no. 1, pp. 1–16, 2023, doi: 10.1186/s12911-023-02168-6.

[6] D. N. Mamo et al., “Machine learning to predict virological failure among HIV patients on antiretroviral therapy in the University of Gondar Comprehensive and Specialized Hospital, in Amhara Region, Ethiopia, 2022,” BMC Med. Inform. Decis. Mak., vol. 23, no. 1, pp. 1–20, 2023, doi: 10.1186/s12911-023-02167-7.

[7] K. Welvaars et al., “Evaluating machine learning algorithms to Predict 30-day Unplanned REadmission (PURE) in Urology patients,” BMC Med. Inform. Decis. Mak., vol. 23, no. 1, pp. 1–13, 2023, doi: 10.1186/s12911-023-02200-9.

[8] X. Gao, S. Alam, P. Shi, F. Dexter, and N. Kong, “Interpretable machine learning models for hospital readmission prediction: a two-step extracted regression tree approach,” BMC Med. Inform. Decis. Mak., vol. 23, no. 1, pp. 1–11, 2023, doi: 10.1186/s12911-023-02193-5.

[9] L. Rao, B. Peng, and T. Li, “Nonnegative matrix factorization analysis and multiple machine learning methods identified IL17C and ACOXL as novel diagnostic biomarkers for atherosclerosis,” BMC Bioinformatics, vol. 24, no. 1, pp. 1–14, 2023, doi: 10.1186/s12859-023-05244-w.

[10] J. Goyal et al., “Using machine learning to develop a clinical prediction model for SSRI-associated bleeding: a feasibility study,” BMC Med. Inform. Decis. Mak., vol. 23, no. 1, pp. 1–11, 2023, doi: 10.1186/s12911-023-02206-3.

[11] X. Zhang et al., “TB-IECS: an accurate machine learning-based scoring function for virtual screening,” J. Cheminform., vol. 15, no. 1, pp. 1–17, 2023, doi: 10.1186/s13321-023-00731-x.

[12] Y. Yang and F. Fan, “Ancient thangka Buddha face recognition based on the Dlib machine learning library and comparison with secular aesthetics,” Herit. Sci., vol. 11, no. 1, pp. 1–16, 2023, doi: 10.1186/s40494-023-00983-8.

[13] L. Li, M. Elhajj, Y. Feng, and W. Y. Ochieng, “Machine learning based GNSS signal classification and weighting scheme design in the built environment: a comparative experiment,” Satell. Navig., vol. 4, no. 1, 2023, doi: 10.1186/s43020-023-00101-w.

[14] K. Mehrabani-Zeinabad, A. Feizi, M. Sadeghi, H. Roohafza, M. Talaei, and N. Sarrafzadegan, “Cardiovascular disease incidence prediction by machine learning and statistical techniques: a 16-year cohort study from eastern Mediterranean region,” BMC Med. Inform. Decis. Mak., vol. 23, no. 1, pp. 1–12, 2023, doi: 10.1186/s12911-023-02169-5.

[15] M. A. Rahman et al., “Enhancing biofeedback-driven self-guided virtual reality exposure therapy through arousal detection from multimodal data using machine learning,” Brain Informatics, vol. 10, no. 1, 2023, doi: 10.1186/s40708-023-00193-9.

[16] R. Guha and D. Velegol, “Harnessing Shannon entropy-based descriptors in machine learning models to enhance the prediction accuracy of molecular properties,” J. Cheminform., vol. 15, no. 1, pp. 1–11, 2023, doi: 10.1186/s13321-023-00712-0.

[17] M. Seyedtabib and N. Kamyari, “Predicting polypharmacy in half a million adults in the Iranian population: comparison of machine learning algorithms,” BMC Med. Inform. Decis. Mak., vol. 23, no. 1, pp. 1–11, 2023, doi: 10.1186/s12911-023-02177-5.

[18] W. Breslin and D. Pham, “Machine learning and drug discovery for neglected tropical diseases,” BMC Bioinformatics, vol. 24, no. 1, pp. 1–11, 2023, doi: 10.1186/s12859-022-05076-0.

[19] L. Qi, J. Zhang, Z. F. Qi, L. Kong, and Y. Tang, “Measurement and evaluation method of radar anti-jamming effectiveness based on principal component analysis and machine learning,” Eurasip J. Wirel. Commun. Netw., vol. 2023, no. 1, 2023, doi: 10.1186/s13638-023-02262-3.

[20] T. T. Du et al., “A combined priority scheduling method for distributed machine learning,” Eurasip J. Wirel. Commun. Netw., vol. 2023, no. 1, 2023, doi: 10.1186/s13638-023-02253-4.

[21] D. J. Magill and T. A. Skvortsov, “DePolymerase Predictor (DePP): a machine learning tool for the targeted identification of phage depolymerases,” BMC Bioinformatics, vol. 24, no. 1, pp. 1–11, 2023, doi: 10.1186/s12859-023-05341-w.

[22] Z. Xu et al., “Machine learning molecular dynamics simulation identifying weakly negative effect of polyanion rotation on Li-ion migration,” npj Comput. Mater., vol. 9, no. 1, pp. 1–11, 2023, doi: 10.1038/s41524-023-01049-w.

[23] T. Susnjak and P. Maddigan, “Forecasting patient flows with pandemic induced concept drift using explainable machine learning,” EPJ Data Sci., vol. 12, no. 1, 2023, doi: 10.1140/epjds/s13688-023-00387-5.

[24] H. Jung, L. Sauerland, S. Stocker, K. Reuter, and J. T. Margraf, “Machine-learning driven global optimization of surface adsorbate geometries,” npj Comput. Mater., vol. 9, no. 1, pp. 17–19, 2023, doi: 10.1038/s41524-023-01065-w.

[25] H. Choubisa et al., “Interpretable discovery of semiconductors with machine learning,” npj Comput. Mater., vol. 9, no. 1, 2023, doi: 10.1038/s41524-023-01066-9.

[26] Y. Hatano, T. Ishihara, and O. Onodera, “Accuracy of a machine learning method based on structural and locational information from AlphaFold2 for predicting the pathogenicity of TARDBP and FUS gene variants in ALS,” BMC Bioinformatics, vol. 24, no. 1, pp. 1–14, 2023, doi: 10.1186/s12859-023-05338-5.

[27] Y. Li, R. Zhu, Y. Wang, L. Feng, and Y. Liu, “Center-environment deep transfer machine learning across crystal structures: from spinel oxides to perovskite oxides,” npj Comput. Mater., vol. 9, no. 1, 2023, doi: 10.1038/s41524-023-01068-7.

[28] Y. Huang et al., “Detecting lithium plating dynamics in a solid-state battery with operando X-ray computed tomography using machine learning,” npj Comput. Mater., vol. 9, no. 1, 2023, doi: 10.1038/s41524-023-01039-y.

[29] L. Fiedler et al., “Predicting electronic structures at any length scale with machine learning,” npj Comput. Mater., vol. 9, no. 1, pp. 1–10, 2023, doi: 10.1038/s41524-023-01070-z.

[30] Z. Guo et al., “Fast and accurate machine learning prediction of phonon scattering rates and lattice thermal conductivity,” npj Comput. Mater., vol. 9, no. 1, 2023, doi: 10.1038/s41524-023-01020-9.

[31] S. M. Zayed, G. Attiya, A. El-Sayed, A. Sayed, and E. E. D. Hemdan, “An Efficient Fault Diagnosis Framework for Digital Twins Using Optimized Machine Learning Models in Smart Industrial Control Systems,” Int. J. Comput. Intell. Syst., vol. 16, no. 1, 2023, doi: 10.1007/s44196-023-00241-6.

[32] B. Focassio, M. Domina, U. Patil, A. Fazzio, and S. Sanvito, “Linear Jacobi-Legendre expansion of the charge density for machine learning-accelerated electronic structure calculations,” npj Comput. Mater., vol. 9, no. 1, pp. 1–10, 2023, doi: 10.1038/s41524-023-01053-0.

[33] C. Pereti et al., “From individual elements to macroscopic materials: in search of new superconductors via machine learning,” npj Comput. Mater., vol. 9, no. 1, pp. 1–9, 2023, doi: 10.1038/s41524-023-01023-6.

[34] J. Schmidt, H. C. Wang, G. Schmidt, and M. A. L. Marques, “Machine learning guided high-throughput search of non-oxide garnets,” npj Comput. Mater., vol. 9, no. 1, 2023, doi: 10.1038/s41524-023-01009-4.

[35] S. Stuart, J. Watchorn, and F. X. Gu, “Sizing up feature descriptors for macromolecular machine learning with polymeric biomaterials,” npj Comput. Mater., vol. 9, no. 1, pp. 1–10, 2023, doi: 10.1038/s41524-023-01040-5.

[36] N. Kazeev et al., “Sparse representation for machine learning the properties of defects in 2D materials,” npj Comput. Mater., vol. 9, no. 1, pp. 1–10, 2023, doi: 10.1038/s41524-023-01062-z.

[37] C. Liu et al., “Early prediction of MODS interventions in the intensive care unit using machine learning,” J. Big Data, vol. 10, no. 1, 2023, doi: 10.1186/s40537-023-00719-2.

[38] Q. Pan, F. Harrou, and Y. Sun, “A comparison of machine learning methods for ozone pollution prediction,” J. Big Data, vol. 10, no. 1, 2023, doi: 10.1186/s40537-023-00748-x.

[39] Z. Babović et al., “Research in computing-intensive simulations for nature-oriented civil-engineering and related scientific fields, using machine learning and big data: an overview of open problems,” J. Big Data, vol. 10, no. 1, 2023, doi: 10.1186/s40537-023-00731-6.

[40] Y. Suh, “Machine learning based customer churn prediction in home appliance rental business,” J. Big Data, vol. 10, no. 1, 2023, doi: 10.1186/s40537-023-00721-8.

[41] B. Albreiki, T. Habuza, and N. Zaki, “Extracting topological features to identify at-risk students using machine learning and graph convolutional network models,” Int. J. Educ. Technol. High. Educ., vol. 20, no. 1, pp. 1–22, 2023, doi: 10.1186/s41239-023-00389-3.

[42] Z. Babović et al., “Teaching computing for complex problems in civil engineering and geosciences using big data and machine learning: synergizing four different computing paradigms and four different management domains,” J. Big Data, vol. 10, no. 1, 2023, doi: 10.1186/s40537-023-00730-7.

[43] A. Sharma, N. Hooda, N. R. Gupta, and R. Sharma, “Efficient RIEV: a novel framework for the prediction of breast cancer cases using ensemble machine learning,” Netw. Model. Anal. Heal. Informatics Bioinforma., vol. 12, no. 1, 2023, doi: 10.1007/s13721-023-00424-3.

[44] X. Wu and Z. Liu, “Research on Public Opinion Propagation of Emergency Reversal Based on Machine Learning,” Int. J. Comput. Intell. Syst., vol. 16, no. 1, 2023, doi: 10.1007/s44196-023-00254-1.

[45] Z. Liu and X. Wu, “Structural Analysis of the Evolution Mechanism of Online Public Opinion and its Development Stages Based on Machine Learning and Social Network Analysis,” Int. J. Comput. Intell. Syst., vol. 16, no. 1, 2023, doi: 10.1007/s44196-023-00277-8.

[46] J. Wang, M. Li, Q. Diao, H. Lin, Z. Yang, and Y. J. Zhang, “Biomedical document triage using a hierarchical attention-based capsule network,” BMC Bioinformatics, vol. 21, no. Suppl 13, pp. 1–20, 2020, doi: 10.1186/s12859-020-03673-5.