Toxic detection based on RoBERTa and TF-IDF
10 November 2022
Xinmin Liu, Feiyu Zhao
Proceedings Volume 12348, 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2022); 123481C (2022) https://doi.org/10.1117/12.2641437
Event: 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2022), 2022, Zhuhai, China
Abstract
In the information age, toxic content has become a major problem for many online communities, and existing methods are not robust enough to detect it. The demand for a more accurate and efficient system for detecting toxic messages is therefore high. In this paper, we apply machine learning and deep learning models to this task. Following the intuition that a model should capture both the word itself and its relationship with other words, we construct a stacking model as the optimal strategy: term frequency-inverse document frequency (TF-IDF) and RoBERTa, a robustly optimized Bidirectional Encoder Representations from Transformers (BERT) pretraining approach, serve as the base models, and a neural network serves as the meta-model. The experiments show that the stacking method and K-fold cross-validation are advantageous, and our model achieves a detection accuracy of 0.9023.
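The stacking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no architecture details, so everything here is an assumption made for illustration. The TF-IDF base model is a simple centroid scorer, `bigram_base` is a hypothetical placeholder for a fine-tuned RoBERTa classifier, the meta-model is a single logistic unit rather than a full neural network, and the toy sentences are invented. What the sketch does show is the mechanism: K-fold cross-validation produces out-of-fold scores from each base model, and those scores become the input features of the meta-model.

```python
import math
from collections import Counter

def tfidf_fit(docs):
    """Learn smoothed IDF weights from tokenized training documents."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def tfidf_vec(doc, idf):
    """Sparse TF-IDF vector of one document; tokens unseen in training are dropped."""
    tf = Counter(tok for tok in doc if tok in idf)
    total = sum(tf.values()) or 1
    return {tok: (c / total) * idf[tok] for tok, c in tf.items()}

def tfidf_base(train_docs, train_y, test_docs):
    """Base model 1 (assumed form): dot product with toxic minus clean TF-IDF sums."""
    idf = tfidf_fit(train_docs)
    cent = {0: Counter(), 1: Counter()}
    for doc, y in zip(train_docs, train_y):
        cent[y].update(tfidf_vec(doc, idf))  # adds the weights per token
    return [sum(w * (cent[1][t] - cent[0][t]) for t, w in tfidf_vec(d, idf).items())
            for d in test_docs]

def bigram_base(train_docs, train_y, test_docs):
    """Base model 2: hypothetical stand-in for RoBERTa, using label-weighted bigrams."""
    weight = Counter()
    for doc, y in zip(train_docs, train_y):
        for bg in zip(doc, doc[1:]):
            weight[bg] += 1 if y == 1 else -1
    return [sum(weight[bg] for bg in zip(d, d[1:])) for d in test_docs]

def out_of_fold(base, docs, labels, k):
    """K-fold CV: every document is scored by a model that was not fitted on it."""
    oof = [0.0] * len(docs)
    for fold in range(k):
        test_idx = [i for i in range(len(docs)) if i % k == fold]
        train_idx = [i for i in range(len(docs)) if i % k != fold]
        preds = base([docs[i] for i in train_idx],
                     [labels[i] for i in train_idx],
                     [docs[i] for i in test_idx])
        for i, p in zip(test_idx, preds):
            oof[i] = p
    return oof

def sigmoid(z):
    """Logistic function with clamping to avoid overflow in math.exp."""
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def fit_meta(features, labels, epochs=200, lr=0.1):
    """Meta-model (assumed form): one logistic unit trained by stochastic gradient descent."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            g = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

# Invented toy data: 1 = toxic, 0 = not toxic.
texts = ["you are an idiot", "shut up you idiot", "thanks for the help",
         "have a nice day", "you are so stupid", "what an idiot you are",
         "thanks have a good day", "nice work thanks for sharing"]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
docs = [t.split() for t in texts]

# One out-of-fold score per base model becomes one meta-feature column.
columns = [out_of_fold(base, docs, labels, k=4) for base in (tfidf_base, bigram_base)]
meta_X = [list(row) for row in zip(*columns)]
w, b = fit_meta(meta_X, labels)
probs = [sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) for x in meta_X]
print([round(p, 3) for p in probs])
```

Fitting the meta-model on out-of-fold scores rather than in-sample scores keeps it from learning to trust base models that have simply memorized their training data, which is the usual reason stacking and K-fold cross-validation are paired.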
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Xinmin Liu and Feiyu Zhao "Toxic detection based on RoBERTa and TF-IDF", Proc. SPIE 12348, 2nd International Conference on Artificial Intelligence, Automation, and High-Performance Computing (AIAHPC 2022), 123481C (10 November 2022); https://doi.org/10.1117/12.2641437
KEYWORDS
Neural networks, Machine learning, Computer programming, Transformers, Data modeling, Performance modeling, Toxicity