1 March 2023 Weakly supervised text classification method based on transformer
Ling Gan, Aijun Yi
Proceedings Volume 12596, International Conference on Mechatronics Engineering and Artificial Intelligence (MEAI 2022); 125962F (2023) https://doi.org/10.1117/12.2672391
Event: International Conference on Mechatronics Engineering and Artificial Intelligence (MEAI 2022), 2022, Changsha, China
Abstract
The seed word-driven approach is the dominant approach to weakly supervised text classification (WTC). Existing seed word-driven methods update the seed words using metrics such as term frequency (TF), inverse document frequency (IDF), and their combinations. These methods assign the same weight to all metrics, which leads to common or poorly discriminative words being selected as seed words. In addition, most of the text classifiers used in these studies have difficulty capturing the correlations within the text and its global information. To address these problems, a Transformer is first used as the text classifier: its multi-head self-attention mechanism captures long-range dependencies while computing in parallel and fully learns the global semantic information of the input text. An improved TF-IDF method is then proposed that increases the weight of IDF, so that common words that harm classification are filtered out. Experimental results on the 20News and NYT datasets show improved performance.
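The abstract does not state the exact form of the improved TF-IDF score, so the following is only a minimal sketch of the general idea of giving IDF more weight than TF when ranking candidate seed words. The function name `weighted_tf_idf`, the exponent `alpha`, and the scoring rule `tf * idf ** alpha` are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def weighted_tf_idf(class_docs, all_docs, alpha=2.0):
    """Score candidate seed words for one class as tf * idf ** alpha.

    Hypothetical scoring rule: alpha > 1 gives IDF more weight than in
    plain TF-IDF (alpha = 1). class_docs must be a subset of all_docs.
    """
    n_docs = len(all_docs)

    # Document frequency of each word over the whole corpus.
    df = Counter()
    for doc in all_docs:
        df.update(set(doc))

    # Term frequency within the documents currently assigned to the class.
    tf = Counter()
    for doc in class_docs:
        tf.update(doc)
    total = sum(tf.values())

    scores = {}
    for word, count in tf.items():
        idf = math.log(n_docs / df[word])
        scores[word] = (count / total) * idf ** alpha
    return scores


# Toy corpus of tokenized documents (illustrative data, not from the paper).
all_docs = [
    ["the", "game", "team"],
    ["the", "team", "score"],
    ["the", "market", "stock"],
    ["the", "stock", "price"],
]
sports_docs = all_docs[:2]

scores = weighted_tf_idf(sports_docs, all_docs, alpha=2.0)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy corpus the word "the" occurs in every document, so its IDF is zero and it drops out of the ranking regardless of how often it appears. Raising `alpha` above 1 further favours words that are concentrated in few documents over words that are merely frequent.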
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Ling Gan and Aijun Yi "Weakly supervised text classification method based on transformer", Proc. SPIE 12596, International Conference on Mechatronics Engineering and Artificial Intelligence (MEAI 2022), 125962F (1 March 2023); https://doi.org/10.1117/12.2672391
KEYWORDS
Classification systems

Transformers
