Technology Group | Research Article | Article ID: igmin197

Enhancing Material Property Predictions through Optimized KNN Imputation and Deep Neural Network Modeling

Materials Science, Machine Learning, Data Science

Affiliation

    Department of Computer Engineering, Jeju National University, Jeju 63243, Republic of Korea

Abstract

In materials science, the integrity and completeness of datasets are critical for robust predictive modeling. Material datasets, however, frequently contain missing values caused by measurement errors, unavailable data, or experimental limitations, and these gaps can significantly undermine the accuracy of property predictions. To address this challenge, we introduce an optimized K-Nearest Neighbors (KNN) imputation method combined with Deep Neural Network (DNN) modeling to improve the accuracy of material property predictions. We compare our Enhanced KNN method against three baseline imputation techniques: mean imputation, Multiple Imputation by Chained Equations (MICE), and standard KNN imputation. The Enhanced KNN method achieves an R² score of 0.973, an improvement of 0.227 over mean imputation, 0.141 over MICE, and 0.044 over standard KNN imputation (implying baseline R² scores of about 0.746, 0.832, and 0.929, respectively). The enhancement improves data integrity while preserving the statistical characteristics essential for reliable predictions in materials science.
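The abstract does not say how the KNN step is optimized, so the following is only a minimal sketch under one common reading: fill each missing value from the k most similar rows, and choose k by hiding some known values and checking how well they are recovered. The function names (`nan_euclidean`, `knn_impute`, `select_k`), the candidate k values, and the toy data are all hypothetical and are not taken from the article. The DNN stage is omitted.

```python
import math
import random


def nan_euclidean(a, b):
    """Euclidean distance over coordinates observed in both rows,
    rescaled to compensate for the coordinates that were skipped."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common features: the rows cannot be compared
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def column_means(rows):
    """Mean of the observed values in each column (used as a fallback)."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(observed) / len(observed))
    return means


def knn_impute(rows, k):
    """Replace each None with the mean of that column over the k nearest
    rows that observe it; fall back to the column mean if none qualify."""
    means = column_means(rows)
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (nan_euclidean(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors if d < math.inf][:k]
            filled[i][j] = sum(nearest) / len(nearest) if nearest else means[j]
    return filled


def select_k(rows, candidates=(1, 3, 5, 7), holdout=0.1, seed=0):
    """Assumed tuning scheme: hide a fraction of known entries, impute them
    for each candidate k, and keep the k with the lowest RMSE."""
    rng = random.Random(seed)
    observed = [(i, j) for i, r in enumerate(rows)
                for j, v in enumerate(r) if v is not None]
    hidden = rng.sample(observed, max(1, int(holdout * len(observed))))
    masked = [list(r) for r in rows]
    for i, j in hidden:
        masked[i][j] = None
    scores = {}
    for k in candidates:
        imputed = knn_impute(masked, k)
        errors = [(imputed[i][j] - rows[i][j]) ** 2 for i, j in hidden]
        scores[k] = math.sqrt(sum(errors) / len(errors))
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical toy data: three correlated "properties" with a few gaps.
    rng = random.Random(42)
    data = [[x, 2 * x, x + 1] for x in (rng.uniform(0, 10) for _ in range(30))]
    for i in range(0, 30, 6):
        data[i][i % 3] = None
    best_k, rmse_by_k = select_k(data)
    print("selected k:", best_k)
    print("first imputed row:", knn_impute(data, best_k)[0])
```

The selected k would then be used to impute the full dataset before training a downstream regressor. Ranking k by the downstream model's R² instead of by reconstruction error is an equally plausible design and may be closer to what the authors did.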


