A machine learning approach for detecting customs fraud through unstructured data analysis in social media
Bundidth Dangsawang, Siranee Nuchitprasitchai
Individuals who are not authorized dealers sell goods and services through social media, costing governments tax and customs revenue. This study proposes a model called SHIELD for detecting these violations from unstructured social media data. The process involves collecting 2,373,570 records of commercial goods from social media platforms such as Twitter and Facebook and is carried out in three phases. In Phase 1, keywords are collected to label the text for classification. Three result categories are defined: Red Line for smuggled goods, goods with unpaid duty, prohibited goods, and restricted goods; Green Line for non-commercial goods; and Inspect for goods that cannot be identified from the text and require further investigation. Phases 2 and 3 use the keywords to label the unstructured social media data and detect smugglers, and three algorithms, Logistic Regression (LR), Gated Recurrent Unit (GRU), and Long Short-Term Memory (LSTM), are employed to classify illegally imported products. Across all tests, LSTM achieved the best accuracy (99.44%) and the best average F1 score (90.55%). These results demonstrate the potential of machine learning and natural language processing for detecting illegal activities and promoting economic security.
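The keyword-based labeling into three categories can be illustrated with a minimal sketch. This is not the SHIELD implementation: the keyword lists, the function name `label_post`, and the rule that Red Line keywords take priority over Green Line keywords are all assumptions made for illustration, since this record does not list the study's actual keywords or rules.

```python
# Hypothetical keyword lists; the study's real keywords are not given in this record.
RED_LINE_KEYWORDS = ("duty free", "no tax", "smuggled")
GREEN_LINE_KEYWORDS = ("gift", "personal use", "not for sale")


def label_post(text: str) -> str:
    """Assign one of the three categories named in the abstract.

    Assumption: Red Line keywords are checked first, so a post that
    matches both lists is labeled Red Line.
    """
    lowered = text.lower()
    if any(keyword in lowered for keyword in RED_LINE_KEYWORDS):
        return "Red Line"
    if any(keyword in lowered for keyword in GREEN_LINE_KEYWORDS):
        return "Green Line"
    # Nothing recognizable in the text: flag for further investigation.
    return "Inspect"


if __name__ == "__main__":
    for post in (
        "Imported cigarettes, no tax, DM to order",
        "Birthday gift from my aunt",
        "New arrivals this week",
    ):
        print(label_post(post), "<-", post)
```

Labels produced this way could then serve as training targets for classifiers such as LR, GRU, or LSTM, which is the role the abstract assigns to the later phases.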
Year of publication: | 2024 |
---|---|
Authors: | Dangsawang, Bundidth ; Nuchitprasitchai, Siranee |
Subject: | Commercial goods | Customs duties | Gated Recurrent Unit | Logistic Regression | Long short-term memory | Unstructured data | Social Web | Social web | Künstliche Intelligenz | Artificial intelligence | Konsumentenverhalten | Consumer behaviour | Data Mining | Data mining | Regressionsanalyse | Regression analysis |
freely available
Similar items by subject
- Machine learning approaches to sentiment analysis in online social networks
  Mallick, Chandrakant, (2023)
- Online newspaper subscriptions : using machine learning to reduce and understand customer churn
  Belchior, Lúcia Madeira, (2024)
- Application of the information Bottleneck method to discover user profiles in a web store
  Iwański, Jacek, (2018)