Comparing machine learning algorithms with large language models in detecting fake news in social media


Authors: Алишер Тимурулы Кажымуханов, Ж. Ж. Кожамкулова

Section: Information technology

Published in Молодой учёный No. 44 (543), November 2024

Publication date: 01.11.2024


Bibliographic description:

Кажымуханов, Алишер Тимурулы. Comparing machine learning algorithms with large language models in detecting fake news in social media / Алишер Тимурулы Кажымуханов, Ж. Ж. Кожамкулова. Text: print. Молодой ученый. 2024. No. 44 (543). pp. 15-17. URL: https://moluch.ru/archive/543/118863/ (accessed: 22.12.2024).



This study investigates the application of machine learning algorithms to detecting fake news on social media platforms and compares them with large language models (LLMs) such as GPT and BERT. Fake news is a critical issue that affects public opinion and decision-making, and identifying it effectively is essential to maintaining the integrity of information ecosystems. The research examines the effectiveness of machine learning models, including Support Vector Machines (SVM), Random Forests (RF), and Neural Networks (NN), and evaluates their performance against LLMs, which excel at understanding context thanks to vast training datasets. The study uses publicly available datasets to measure accuracy, precision, recall, and F1 score. The results indicate that while LLMs achieve higher accuracy due to their advanced context comprehension, traditional machine learning models are faster and more resource-efficient in certain applications.

Keywords: fake news detection, machine learning, social networks, large language models, neural networks, text classification, support vector machines.

The proliferation of fake news on social media poses a significant challenge to the global information environment. The rapid spread of disinformation can manipulate public perception, influence elections, and create societal divisions. To counter this threat, advanced computational methods such as machine learning (ML) and large language models (LLMs) have been proposed for detecting and classifying misleading content.

The relevance of this study stems from the need to compare the effectiveness of traditional ML models, which rely on structured input features, with modern LLMs, which draw on deep contextual understanding. While LLMs have shown superior performance in natural language processing (NLP) tasks, their computational costs remain high, making them less viable for real-time applications [1]. This paper examines whether machine learning algorithms can provide a viable, efficient alternative or complement to LLMs in the task of fake news identification.

The primary objective of this study is to analyze and compare the performance of machine learning algorithms and LLMs in identifying fake news on social media platforms. The research seeks to answer the following questions:

  1. Which machine learning algorithms are most effective for fake news detection?
  2. How do these models compare with LLMs such as GPT and BERT in terms of accuracy, speed, and resource efficiency?
  3. What are the potential benefits of hybrid models that combine elements of both approaches?

Research Question. How do machine learning algorithms and large language models compare in identifying fake news on social networks in terms of accuracy, computational cost, and real-time application?

The hypothesis posits that large language models will outperform traditional machine learning algorithms in accuracy and contextual understanding but will require greater computational resources. Machine learning algorithms, while less accurate in some contexts, may offer faster processing speeds and lower resource demands.

This study follows a comparative experimental approach, evaluating both machine learning models and LLMs on the same dataset. The dataset includes verified fake news and authentic news articles sourced from publicly available repositories such as the FakeNewsNet dataset.

Machine Learning Models. Support Vector Machines (SVM): a classifier that maximizes the margin between classes. Random Forest (RF): an ensemble learning method that operates by constructing multiple decision trees. Neural Networks (NN): basic feed-forward neural networks trained on structured input features [3].

Large Language Models (LLMs). GPT (Generative Pretrained Transformer): an autoregressive language model that uses deep learning to produce human-like text. BERT (Bidirectional Encoder Representations from Transformers): a model that excels at context-dependent word predictions [4].

Data Preprocessing. Text data was preprocessed through tokenization, stemming, and stop-word removal. For the machine learning models, features were extracted using term frequency-inverse document frequency (TF-IDF). The LLMs relied on the word embeddings inherent to their pre-training process.
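The preprocessing and feature-extraction steps described above can be sketched in plain Python. This is a minimal illustration rather than the authors' pipeline: the stop-word list, the suffix-stripping `stem` function (a crude stand-in for a real stemmer), the smoothed IDF formula, and the two sample headlines are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption, not from the study).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and"}


def stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize on letters, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} mapping per document.

    Uses tf = count / document length and a smoothed
    idf = ln((1 + N) / (1 + df)) + 1, one common variant among several.
    """
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append({
            term: (cnt / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, cnt in counts.items()
        })
    return vectors


# Made-up sample headlines, for illustration only.
docs = [
    "Scientists confirmed the vaccine is safe",
    "Shocking secret the scientists are hiding",
]
vectors = tfidf(docs)
```

Terms that occur in every document, such as the stem "scientist" here, receive a lower weight than terms unique to one document, which is the property that makes TF-IDF features useful to classifiers such as SVM and RF.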

The study evaluated the performance of each model using the following metrics:

  1. Accuracy: The proportion of all predictions that are correct.
  2. Precision: The proportion of items flagged as fake news that are actually fake.
  3. Recall: The proportion of actual fake news correctly identified.
  4. F1 Score: The harmonic mean of precision and recall.
  5. Computational Time: The time taken to process and classify the text.
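The metrics listed above can be computed directly from the counts of true and false positives and negatives. The sketch below treats "fake" as the positive class; the function names, the label strings, and the sample predictions are illustrative assumptions, not data from the study.

```python
import time


def classification_metrics(y_true, y_pred, positive="fake"):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def timed_predict(predict_fn, texts):
    """Run predict_fn on every text and return (predictions, elapsed seconds)."""
    start = time.perf_counter()
    predictions = [predict_fn(text) for text in texts]
    return predictions, time.perf_counter() - start


# Made-up labels for illustration: one fake item is missed, nothing is
# wrongly flagged.
y_true = ["fake", "fake", "fake", "fake", "real", "real", "real", "real"]
y_pred = ["fake", "fake", "fake", "real", "real", "real", "real", "real"]
metrics = classification_metrics(y_true, y_pred)
```

In this made-up example precision is perfect because nothing is wrongly flagged, while recall drops because one fake item slips through. F1 falls between the two, which is why it is reported alongside accuracy.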

Results. The comparison between machine learning algorithms and LLMs revealed significant differences in performance.

Table 1

| Model | Accuracy | Precision | Recall | F1 Score | Computational Time (s) |
|---|---|---|---|---|---|
| SVM | 83 % | 81 % | 78 % | 80 % | 1.2 |
| Random Forest | 85 % | 84 % | 83 % | 83.5 % | 1.4 |
| Neural Networks (NN) | 88 % | 85 % | 87 % | 86 % | 2.0 |
| GPT (LLM) | 94 % | 92 % | 93 % | 92.5 % | 6.5 |
| BERT (LLM) | 96 % | 94 % | 95 % | 94.5 % | 7.9 |

Table 1 shows that the LLMs, GPT and BERT, outperform the traditional machine learning algorithms on every metric except computational time. The gains in accuracy and F1 score are attributed to the LLMs' ability to process deeper linguistic nuances. However, they come at the cost of much longer processing times: 6.5 to 7.9 s, compared with 1.2 to 2.0 s for the traditional models.

Discussion. The results suggest that LLMs are more effective for fake news detection due to their superior contextual understanding. However, they demand more computational resources, which limits their scalability in real-time applications. Traditional machine learning models like SVM and RF, while less accurate, offer quicker classification times, making them more practical for immediate fake news detection. Hybrid approaches, combining the efficiency of traditional algorithms with the context-aware abilities of LLMs, may present a viable solution. For instance, a system could use an SVM for initial filtering and pass more ambiguous cases to an LLM for deeper analysis.
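The hybrid idea described above, a fast first-stage filter that escalates only ambiguous cases, can be sketched as a confidence-threshold cascade. Everything in this example is hypothetical: `fast_stub` and `slow_stub` are keyword-based placeholders standing in for a trained SVM and an LLM classifier, and the 0.8 threshold is an arbitrary assumption that would need tuning on validation data.

```python
def cascade_classify(texts, fast_model, slow_model, threshold=0.8):
    """Label each text with the fast model; escalate low-confidence cases.

    Both models are callables returning (label, confidence in [0, 1]).
    Returns the list of labels and the number of escalated texts.
    """
    labels = []
    escalated = 0
    for text in texts:
        label, confidence = fast_model(text)
        if confidence < threshold:
            # The fast model is unsure, so defer to the slower model.
            label, confidence = slow_model(text)
            escalated += 1
        labels.append(label)
    return labels, escalated


def fast_stub(text):
    """Hypothetical stand-in for an SVM: counts sensational keywords."""
    hits = sum(word in text.lower()
               for word in ("shocking", "secret", "miracle"))
    if hits >= 2:
        return "fake", 0.95
    if hits == 0:
        return "real", 0.90
    return "fake", 0.55  # a single weak cue is treated as ambiguous


def slow_stub(text):
    """Hypothetical stand-in for an LLM-based classifier."""
    return ("real", 0.97) if "study" in text.lower() else ("fake", 0.97)


# Made-up headlines, for illustration only.
texts = [
    "Shocking secret miracle cure revealed",
    "City council approves new budget",
    "Shocking results from a peer-reviewed study",
]
labels, escalated = cascade_classify(texts, fast_stub, slow_stub)
```

Only the third headline falls below the threshold and reaches the slower model, so the expensive classifier runs on one of three inputs. Raising the threshold sends more cases to the slower model, and lowering it sends fewer.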

Conclusions. This study concludes that while large language models demonstrate higher accuracy and better contextual understanding in detecting fake news on social networks, machine learning algorithms remain competitive in terms of speed and computational efficiency. The best solution for real-time fake news detection may lie in hybrid models that balance these trade-offs. Future research should focus on optimizing the integration of these models to enhance both accuracy and performance. Additionally, further studies on the ethical implications and societal impact of automated fake news detection systems are recommended.

References:

  1. Shu, K., Sliva, A., Wang, S., Tang, J., & Liu, H. (2017). Fake news detection on social media: A data mining perspective. IEEE Intelligent Systems, 32(5), 70–76.
  2. Thota, A., Tilak, P., Ahluwalia, S., & Lohia, N. (2018). Fake news detection: A deep learning approach. IEEE International Conference on Intelligent Systems and Green Technology (ICISGT), Visakhapatnam, India, pp. 1–5.
  3. Zhou, X., & Zafarani, R. (2019). Network-based fake news detection: A pattern-driven approach. IEEE Transactions on Computational Social Systems, 6(3), 830–846.
  4. Shu, K., Wang, S., & Liu, H. (2019). Beyond news contents: The role of social context for fake news detection. IEEE Transactions on Knowledge and Data Engineering, 31(6), 987–1001.
  5. Khattar, D., Goud, J. S., Gupta, M., & Varma, V. (2019). MVAE: Multimodal variational autoencoder for fake news detection. IEEE International World Wide Web Conference (WWW), pp. 2915–2921.
  6. Kim, H., Oh, D., Choi, M., & Lee, H. (2018). Detecting fake news with machine learning models: A study on the performance and comparison with manual detection. IEEE Access, 6, 13464–13475.