Comparing machine learning algorithms with large language models in detecting fake news in social media | Article in the journal «Молодой ученый» (Young Scientist)



Authors: Алишер Тимурулы Кажымуханов, Ж. Ж. Кожамкулова

Section: Information Technology

Published in Молодой учёный (Young Scientist) No. 44 (543), November 2024

Publication date: 01.11.2024


Bibliographic reference:

Кажымуханов, Алишер Тимурулы. Comparing machine learning algorithms with large language models in detecting fake news in social media / Алишер Тимурулы Кажымуханов, Ж. Ж. Кожамкулова. Text: unmediated. Молодой ученый (Young Scientist), 2024, No. 44 (543). URL: https://moluch.ru/archive/543/118863/ (accessed: 07.11.2024).



This study investigates the application of machine learning algorithms to fake news detection on social media platforms and compares them with large language models (LLMs) such as GPT and BERT. Fake news distorts public opinion and decision-making, so identifying it effectively is essential to maintaining the integrity of information ecosystems. The research examines the effectiveness of machine learning models, including Support Vector Machines (SVM), Random Forests (RF), and Neural Networks (NN), and evaluates their performance against LLMs, which excel at capturing context thanks to pre-training on vast text corpora. The study uses publicly available datasets to measure accuracy, precision, recall, and F1 score. The results indicate that while LLMs achieve higher accuracy owing to their advanced context comprehension, traditional machine learning models are faster and more resource-efficient in certain applications.

Keywords: fake news detection, machine learning, social networks, large language models, neural networks, text classification, support vector machines.

The proliferation of fake news on social media poses a significant challenge to the global information environment. Rapidly spreading disinformation can manipulate public perception, influence elections, and deepen societal divisions. To counter this threat, computational methods such as machine learning (ML) and large language models (LLMs) have been proposed for detecting and classifying misleading content.

This study is motivated by the need to compare traditional ML models, which rely on structured input features, with modern LLMs, which draw on deep contextual understanding. Although LLMs have shown superior performance in natural language processing (NLP) tasks, their high computational cost makes them less viable for real-time applications [1]. This paper therefore examines whether machine learning algorithms can serve as an efficient alternative or complement to LLMs for fake news identification.

The primary objective of this study is to analyze and compare the performance of machine learning algorithms and LLMs in identifying fake news on social media platforms. The research seeks to answer the following questions:

  1. Which machine learning algorithms are most effective for fake news detection?
  2. How do these models compare with LLMs such as GPT and BERT in terms of accuracy, speed, and resource efficiency?
  3. What are the potential benefits of hybrid models that combine elements of both approaches?

Research question. How do machine learning algorithms and large language models compare in identifying fake news on social networks in terms of accuracy, computational cost, and suitability for real-time use?

The hypothesis posits that large language models will outperform traditional machine learning algorithms in accuracy and contextual understanding but will require greater computational resources. Machine learning algorithms, while less accurate in some contexts, may offer faster processing speeds and lower resource demands.

This study follows a comparative experimental approach in which the machine learning models and the LLMs are evaluated on the same dataset. The dataset consists of verified fake news and authentic news articles drawn from publicly available repositories such as FakeNewsNet.

Machine learning models:

  1. Support Vector Machines (SVM): a classifier that finds the decision boundary with the largest margin between the classes.
  2. Random Forest (RF): an ensemble method that trains many decision trees on bootstrap samples of the data and combines their votes.
  3. Neural Networks (NN): basic feed-forward neural networks trained on structured input features [3].
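To make "maximizing the margin" concrete, here is a minimal sketch of a linear SVM trained with the Pegasos stochastic subgradient method on the regularized hinge loss. The paper does not say which SVM implementation or kernel it used; Pegasos, the function names `train_linear_svm` and `predict`, the hyperparameters, and the toy data are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200,
                     seed: int = 0) -> List[float]:
    """Train a linear SVM with Pegasos (stochastic subgradient descent on
    the L2-regularized hinge loss). Labels must be -1 or +1.

    A constant 1.0 feature is appended to every vector so that the last
    weight acts as a bias term (for simplicity it is regularized too).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):  # shuffled pass
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Shrink the weights: subgradient of the L2 regularizer.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:  # point inside the margin: hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Sign of the decision function w . [x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy, linearly separable data (hypothetical feature values).
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

In practice a library solver such as scikit-learn's `LinearSVC` would be used on TF-IDF vectors; the sketch only shows how the hinge loss penalizes points that fall inside the margin.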

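Large language models (LLMs):

  1. GPT (Generative Pre-trained Transformer): an autoregressive language model that generates human-like text one token at a time, each prediction conditioned only on the preceding tokens.
  2. BERT (Bidirectional Encoder Representations from Transformers): a Transformer encoder pre-trained to predict masked words from both their left and right context, which makes it well suited to context-dependent word predictions [4].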
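The key architectural difference between the two LLM families is which tokens each position may attend to. The sketch below builds both attention patterns as boolean masks; it illustrates the idea only (padding masks and the models' actual implementations are ignored), and the function names are hypothetical.

```python
from typing import List


def causal_mask(n: int) -> List[List[bool]]:
    """GPT-style (autoregressive) mask: position i sees positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> List[List[bool]]:
    """BERT-style (encoder) mask: every position sees every position."""
    return [[True] * n for _ in range(n)]


def visible_context(mask: List[List[bool]], tokens: List[str], i: int) -> List[str]:
    """Tokens that position i is allowed to attend to under `mask`."""
    return [tok for tok, allowed in zip(tokens, mask[i]) if allowed]


tokens = ["the", "claim", "is", "false"]
print(visible_context(causal_mask(4), tokens, 1))         # ['the', 'claim']
print(visible_context(bidirectional_mask(4), tokens, 1))  # ['the', 'claim', 'is', 'false']
```

When encoding the word "claim", a BERT-style encoder can already use the later word "false", whereas a GPT-style decoder cannot; this is one reason encoder models are often preferred for classification.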

Data preprocessing. The text data were preprocessed through tokenization, stemming, and stop-word removal. For the machine learning models, features were extracted using term frequency-inverse document frequency (TF-IDF) weighting. The LLMs instead relied on the word embeddings learned during their pre-training.
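The preprocessing steps above can be sketched in plain Python. The stop-word list and the suffix-stripping rule are deliberately tiny, hypothetical stand-ins for a full stop-word lexicon and a real stemmer such as Porter's, and the TF-IDF variant shown (raw term count times log(N / df)) is one common textbook form; the paper does not state which variant it used.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hypothetical, minimal stop-word list; real systems use a much larger one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crude_stem(token: str) -> str:
    """Toy suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> List[str]:
    """Tokenize, drop stop words, then stem."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by its count in a document times log(N / df)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n_docs / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]


corpus = [
    "Scientists confirmed the report",
    "Shocking report: scientists hide the truth",
]
docs = [preprocess(d) for d in corpus]
vectors = tf_idf(docs)
print(docs[0])  # ['scientist', 'confirm', 'report']
```

Terms that occur in every document (here "report" and "scientist") receive zero weight, so the classifier focuses on distinguishing vocabulary. scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2-normalizes each row by default, so its values differ from this sketch.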

The study evaluated the performance of each model using the following metrics:

  1. Accuracy: the proportion of all predictions that are correct.
  2. Precision: the proportion of articles flagged as fake that are actually fake.
  3. Recall: the proportion of actual fake news articles that the model correctly identifies.
  4. F1 score: the harmonic mean of precision and recall.
  5. Computational time: the time taken to process and classify the text.
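The first four metrics follow from confusion-matrix counts with fake news as the positive class, and the fifth can be measured with a wall-clock timer. The sketch below shows one typical way to compute them; the paper does not publish its evaluation code, and `classify` is a hypothetical placeholder for any of the compared models.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Metrics with fake news as the positive class (1) and real news as 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def timed_predictions(classify: Callable[[str], int],
                      texts: List[str]) -> Tuple[List[int], float]:
    """Return the predictions and the total wall-clock seconds spent."""
    start = time.perf_counter()
    preds = [classify(t) for t in texts]
    return preds, time.perf_counter() - start


metrics = evaluate([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 1, 0, 1, 0])
print(metrics)  # {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}

# Hypothetical keyword rule standing in for a real model.
preds, seconds = timed_predictions(lambda s: int("shocking" in s.lower()),
                                   ["Shocking claim", "Budget passed"])
```

Whether computational time is reported per article or per full test set should be stated alongside the numbers, since the two differ by orders of magnitude.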

Results. The comparison between the machine learning algorithms and the LLMs revealed substantial differences in performance (Table 1).

Table 1

Model                  Accuracy   Precision   Recall   F1 score   Computational time (s)
SVM                    83 %       81 %        78 %     80 %       1.2
Random Forest          85 %       84 %        83 %     83.5 %     1.4
Neural Networks (NN)   88 %       85 %        87 %     86 %       2.0
GPT (LLM)              94 %       92 %        93 %     92.5 %     6.5
BERT (LLM)             96 %       94 %        95 %     94.5 %     7.9
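Because the F1 score is the harmonic mean of precision and recall, the F1 column of Table 1 can be cross-checked from the precision and recall columns. Recomputing gives 83.5 %, 86.0 %, 92.5 % and 94.5 % for Random Forest, NN, GPT and BERT, matching the table, and about 79.5 % for SVM, slightly below the reported 80 %.

```python
# Precision and recall (in %) as reported in Table 1.
TABLE_1 = {
    "SVM": (81, 78),
    "Random Forest": (84, 83),
    "Neural Networks (NN)": (85, 87),
    "GPT (LLM)": (92, 93),
    "BERT (LLM)": (94, 95),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for model, (p, r) in TABLE_1.items():
    print(f"{model:22s} F1 = {f1_score(p, r):.1f} %")
```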

As Table 1 shows, the LLMs (GPT and BERT) outperform the traditional machine learning algorithms on every metric except computational time. BERT, the strongest model, reaches 96 % accuracy and a 94.5 % F1 score, against 88 % and 86 % for the best classical model (NN). This gain reflects the LLMs' ability to capture deeper linguistic nuances, but it comes at the cost of much longer processing times: 6.5 s for GPT and 7.9 s for BERT, versus 1.2 to 2.0 s for the classical models.

Discussion. The results suggest that LLMs are more effective for fake news detection because of their superior contextual understanding. However, they demand more computational resources, which limits their scalability in real-time applications. Traditional machine learning models such as SVM and RF, while less accurate, classify text faster, which makes them more practical for immediate fake news detection. Hybrid approaches that combine the efficiency of traditional algorithms with the context awareness of LLMs may offer a viable solution. For instance, a system could use an SVM for initial filtering and pass the more ambiguous cases to an LLM for deeper analysis.
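The SVM-then-LLM cascade suggested above can be sketched as a confidence-gated router. Everything here is hypothetical: `fast_score` stands for any cheap classifier that returns a signed decision value (such as an SVM margin, positive meaning fake), `slow_classify` for an LLM-based classifier, and the threshold of 1.0 is an arbitrary placeholder that would need tuning on validation data.

```python
from typing import Callable, List, Tuple


def cascade_classify(
    texts: List[str],
    fast_score: Callable[[str], float],
    slow_classify: Callable[[str], int],
    threshold: float = 1.0,
) -> Tuple[List[int], int]:
    """Label each text 1 (fake) or 0 (real).

    Confident fast-model decisions (|score| >= threshold) are accepted as-is;
    ambiguous ones are escalated to the slower, more accurate model.
    Returns the labels and the number of escalated texts.
    """
    labels, escalated = [], 0
    for text in texts:
        score = fast_score(text)
        if abs(score) >= threshold:
            labels.append(1 if score > 0 else 0)
        else:
            labels.append(slow_classify(text))
            escalated += 1
    return labels, escalated


# Toy stand-ins for the two models (hypothetical keyword rules).
def toy_fast_score(text: str) -> float:
    lowered = text.lower()
    if "shocking" in lowered:
        return 2.0   # confidently fake
    if "official" in lowered:
        return -2.0  # confidently real
    return 0.1       # near the boundary: ambiguous


def toy_llm(text: str) -> int:
    return 1 if "secret" in text.lower() else 0


labels, n_llm = cascade_classify(
    ["Shocking cure found", "Official figures released", "Secret plan revealed"],
    toy_fast_score, toy_llm)
print(labels, n_llm)  # [1, 0, 1] 1
```

The threshold controls the trade-off discussed above: raising it sends more texts to the LLM (higher cost, potentially higher accuracy), while lowering it keeps more decisions on the fast path.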

Conclusions. This study concludes that while large language models achieve higher accuracy and better contextual understanding in detecting fake news on social networks, machine learning algorithms remain competitive in speed and computational efficiency. The best solution for real-time fake news detection may lie in hybrid models that balance these trade-offs. Future research should focus on optimizing the integration of these models to improve both accuracy and throughput. Further studies on the ethical implications and societal impact of automated fake news detection systems are also recommended.

References:

  1. Shu, K., Sliva, A., Wang, S., Tang, J., & Liu, H. (2017). Fake news detection on social media: A data mining perspective. IEEE Intelligent Systems, 32(5), 70–76.
  2. Thota, A., Tilak, P., Ahluwalia, S., & Lohia, N. (2018). Fake news detection: A deep learning approach. IEEE International Conference on Intelligent Systems and Green Technology (ICISGT), Visakhapatnam, India, pp. 1–5.
  3. Zhou, X., & Zafarani, R. (2019). Network-based fake news detection: A pattern-driven approach. IEEE Transactions on Computational Social Systems, 6(3), 830–846.
  4. Shu, K., Wang, S., & Liu, H. (2019). Beyond news contents: The role of social context for fake news detection. IEEE Transactions on Knowledge and Data Engineering, 31(6), 987–1001.
  5. Khattar, D., Goud, J. S., Gupta, M., & Varma, V. (2019). MVAE: Multimodal variational autoencoder for fake news detection. IEEE International World Wide Web Conference (WWW), pp. 2915–2921.
  6. Kim, H., Oh, D., Choi, M., & Lee, H. (2018). Detecting fake news with machine learning models: A study on the performance and comparison with manual detection. IEEE Access, 6, 13464–13475.