Revolutionizing Duplicate Question Detection: A Deep Learning Approach for Stack Overflow

Muhammad Faseeh; Harun Jamil

doi:10.61927/igmin135


Engineering Group, Mini Review, Article ID: igmin135

Subject: Machine Learning

Affiliation

Harun Jamil, Department of Electronic Engineering, Jeju National University, Jeju-si, Jeju-do, 63243, Republic of Korea, Email: [email protected]


Abstract

This study presents a method for detecting duplicate questions in the Stack Overflow community, a challenging problem in natural language processing. The proposed method combines Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks to capture both local features and long-range dependencies in textual input. Word embeddings, specifically Google’s Word2Vec and GloVe, provide the text representation. Experiments on the Stack Overflow dataset indicate that the approach is effective. Combining the CNN and LSTM models improves performance while simplifying preprocessing, which makes the method a practical option for duplicate question detection. Beyond Stack Overflow, the technique is promising for other question-and-answer platforms as a robust way of finding similar questions.
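The abstract does not give layer sizes, the training procedure, or how the two question encodings are compared, so the following is only a minimal sketch of the general idea: a convolution over word vectors picks up local n-gram features, an LSTM reads those features in order, and two questions are compared by the cosine similarity of their encodings. Everything here is an assumption made for illustration: the dimensions, the random untrained weights, the toy `embed` lookup that stands in for Word2Vec or GloVe vectors, the tanh nonlinearity in the convolution, and the cosine comparison.

```python
import math
import random

# Illustrative sizes only; the abstract does not state the real ones.
EMBED_DIM, CONV_FILTERS, KERNEL, HIDDEN = 8, 6, 3, 4
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Hypothetical toy embedding table: stands in for Word2Vec/GloVe vectors.
_vocab = {}

def embed(token):
    if token not in _vocab:
        _vocab[token] = [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    return _vocab[token]

# Random, untrained weights (assumption: a real model would learn these).
CONV_W = rand_matrix(CONV_FILTERS, KERNEL * EMBED_DIM)
# One weight matrix per LSTM gate (input, forget, output, candidate),
# each applied to the concatenation [x_t ; h_{t-1}].
LSTM_W = {gate: rand_matrix(HIDDEN, CONV_FILTERS + HIDDEN) for gate in "ifoc"}

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv1d(seq):
    """Slide a KERNEL-token window over the word vectors (local features)."""
    features = []
    for start in range(len(seq) - KERNEL + 1):
        window = [x for vec in seq[start:start + KERNEL] for x in vec]
        features.append([math.tanh(a) for a in matvec(CONV_W, window)])
    return features

def lstm_last_hidden(seq):
    """Run a single-layer LSTM and return its final hidden state."""
    h = [0.0] * HIDDEN
    c = [0.0] * HIDDEN
    for x in seq:
        z = x + h  # list concatenation: [x_t ; h_{t-1}]
        i = [sigmoid(a) for a in matvec(LSTM_W["i"], z)]
        f = [sigmoid(a) for a in matvec(LSTM_W["f"], z)]
        o = [sigmoid(a) for a in matvec(LSTM_W["o"], z)]
        g = [math.tanh(a) for a in matvec(LSTM_W["c"], z)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h

def encode(question):
    tokens = question.lower().split()
    # Pad short questions so that at least one convolution window exists.
    tokens += ["<pad>"] * max(0, KERNEL - len(tokens))
    return lstm_last_hidden(conv1d([embed(t) for t in tokens]))

def cosine(u, v):
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)

def duplicate_score(q1, q2):
    """Similarity in [-1, 1]; a threshold on it would flag duplicates."""
    return cosine(encode(q1), encode(q2))

if __name__ == "__main__":
    print(duplicate_score("how to reverse a list in python",
                          "reverse a python list in place"))
```

Because the weights are untrained, the printed score carries no meaning beyond showing the data flow. In practice the layers would be built and trained with a deep learning framework, and the comparison step could equally be a learned classifier on the two encodings.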




