1. INTRODUCTION

Machine reading comprehension [1-3] and intelligent question answering [4], such as artificial customer service, are inseparable from the knowledge base, and common sense knowledge [5] occupies an important place in such knowledge bases. The 1956 Dartmouth Conference raised the problem of making machines able to use human language, on the assumption that a machine can simulate a large part of the human mind and can use language on the basis of rules for reasoning and conjecture. In 1958, McCarthy argued that programming a computer to solve a problem requires a high degree of intelligence in the program, and that a higher degree of intelligence requires the machine to possess common sense. Common sense is the common, everyday consensus that society holds about the same things, for example: people are born and die, the sun rises in the east and sets in the west, lemons are sour, and leaves fall from trees. These are pieces of common sense that people know. Common sense reasoning is logical reasoning carried out on the basis of such acquired knowledge, and it has been applied in the field of question answering. Many common sense reasoning benchmarks [6] have appeared so far, but making machines reason with human-like understanding still needs further refinement and remains a bottleneck. At present, common sense is mostly generated from text. Common sense reasoning can be divided into whole-part reasoning, taxonomic reasoning, temporal reasoning, story reasoning, action reasoning, spatial reasoning, physical reasoning, and folk-psychology reasoning, as shown in Figure 1. For common sense reasoning models, how to improve accuracy and reduce the loss is still a difficulty in question answering applications. A typical task is to generate everyday sentences from given concepts and let the machine judge whether they conform to common sense. Based on a brief introduction to the relevant concepts of text-based knowledge reasoning [7], this paper summarizes common sense knowledge bases and the relevant models of common sense reasoning together with their research progress, points out some existing problems, and proposes future development directions.

2. SOCIAL COMMON SENSE APPROACH BASED ON PRE-TRAINING MODELS

2.1 Knowledge base of common sense

Computers have now built various knowledge bases for understanding natural language [8], such as lexical knowledge bases, syntactic annotation bases, semantic relation knowledge bases and annotation bases, common sense knowledge bases, knowledge bases combining common sense with vocabulary, and knowledge bases combining vocabulary, common sense, and ontology, as shown in Table 1.

Table 1. Knowledge bases.
Common sense knowledge is implicit and diverse. Its acquisition methods can be divided into two categories: manual compilation of common sense knowledge and automatic acquisition. Manual acquisition is the most reliable way to acquire common sense knowledge, but it is complicated and labor-intensive, so the corresponding techniques for automatic acquisition need to be developed.

2.2 Common sense knowledge acquisition methods

Common sense knowledge can be obtained in various forms, such as word vectors, knowledge bases, text, pictures, speech, and knowledge graphs [9]. Three methods are explained below.

2.2.1 Word vector acquisition. Common sense knowledge can be obtained through word embeddings, with keywords extracted from context-independent or context-dependent representations [10]. Through keywords, the gist of an article can be obtained as quickly as possible. Generally, word2vec, TF-IDF, TextRank and other methods are used for keyword extraction: keywords are ranked by the weights they obtain in the document, and the relevant common sense content is then selected (a minimal keyword extraction sketch is given after this subsection).

2.2.2 Knowledge base acquisition. Common sense knowledge can be obtained from existing knowledge bases, including Cyc, ConceptNet, ATOMIC, Wikipedia, COMET, COGBASE, and Cosmos QA. If the required knowledge is incomplete, it can be completed through dimensionality reduction, extractive completion, automatic completion, and so on. Three commonly used repositories are: Cyc, a manually curated knowledge base of common sense facts and rules; ConceptNet, a multilingual semantic network of common sense relations between words and phrases; and ATOMIC, an atlas of inferential if-then knowledge about everyday events (a sketch of querying ConceptNet also follows this subsection).
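To make the weight-based keyword selection of Section 2.2.1 concrete, here is a minimal sketch using scikit-learn's TfidfVectorizer; the two example sentences are illustrative, not from the paper's dataset:

```python
# Minimal TF-IDF keyword extraction sketch (illustrative documents).
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "lemons are sour and leaves fall from the trees in autumn",
    "the sun rises in the east and sets in the west every day",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)

# Rank the terms of the first document by TF-IDF weight and keep the top 3.
terms = vectorizer.get_feature_names_out()
weights = tfidf.toarray()[0]
top_keywords = sorted(zip(terms, weights), key=lambda tw: -tw[1])[:3]
print(top_keywords)
```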
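As a sketch of knowledge base acquisition (Section 2.2.2), the snippet below queries ConceptNet's public REST API for common sense relations about a concept; the endpoint and JSON field names follow the publicly documented ConceptNet 5 web API, and network access is assumed:

```python
# Sketch: fetching common sense edges for a concept from ConceptNet.
import requests

def concept_edges(concept: str, lang: str = "en", limit: int = 5) -> None:
    """Print up to `limit` (start, relation, end) edges for a concept."""
    url = f"http://api.conceptnet.io/c/{lang}/{concept}"
    data = requests.get(url, params={"limit": limit}).json()
    for edge in data.get("edges", []):
        start = edge["start"]["label"]
        rel = edge["rel"]["label"]
        end = edge["end"]["label"]
        print(f"({start}) --[{rel}]--> ({end})")

concept_edges("lemon")  # e.g., (lemon) --[HasProperty]--> (sour)
```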
2.2.3 Knowledge graph acquisition. The knowledge graph [12] contains massive world knowledge stored in a structured form: each node represents an entity in the real world, and the connecting edges mark the relationships between entities [13]. The knowledge graph is generated through relation prediction, entity linking, and knowledge base construction, and its knowledge is organized as (head entity, relation, tail entity) triples. The corresponding information is created, and the required knowledge is obtained through sequence labeling.

2.4 Social knowledge based on pre-training

BERT (Bidirectional Encoder Representations from Transformers) is a language representation model that uses the encoder part of the Transformer, with a [CLS] token marking the beginning of the input and [SEP] marking the end. The BERT [14] model uses two pre-training tasks [15]: a bidirectional language model, and judging whether one text segment follows another. BERT performs unsupervised learning in the deep learning sense [16].

2.4.1 Bidirectional language model. A bidirectional language model uses a language model to obtain context-dependent pre-trained representations, as in ELMo, which uses a two-layer bidirectional LSTM language model. Bi-LSTM-CRF (Bidirectional Long Short-Term Memory network with a Conditional Random Field layer) improves on the recurrent neural network (RNN) [17]: memory cells are used to capture long-distance dependencies, which effectively solves the RNN's long-distance dependency problem. The activation unit of each cell is replaced by an LSTM cell, and the whole network is a bidirectional RNN; the final layer of the Bi-LSTM-CRF structure uses the CRF to learn an optimal label path. In BERT's masked language model, 15% of the words in a text are randomly selected and replaced by a masking symbol. However, always masking all of the selected 15% creates a gap with the target task, in which no mask symbol appears. BERT therefore improved the bidirectional language model: of the selected words, 80% are replaced by the mask symbol, 10% are replaced by random words, and 10% remain unchanged. This not only improves the accuracy of the results but also reduces the gap with the target task (a minimal sketch of this masking scheme is given at the end of this section).

2.4.2 Judgment of the next paragraph. Judging the next paragraph of text simply means deciding, from the text, whether one segment is the continuation of the previous one. Training pairs are sampled with a 50/50 positive-to-negative ratio: in positive examples the two texts are consecutive and related, and in negative examples they are unrelated. The pre-trained BERT model concatenates the two segments with the [CLS] and [SEP] tokens and is trained so that the [CLS] representation yields the corresponding probability p = σ(W·h_CLS), where h_CLS is the hidden state of the [CLS] token (a sketch of this computation also follows this section).

2.4.3 BERT parameter adjustment method. For the pre-trained BERT model, in order to improve accuracy and reduce the loss, the model is adjusted accordingly; typical adjustments include tuning the learning rate, batch size, number of training epochs, and dropout rate.
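To make the 80%/10%/10% masking scheme of Section 2.4.1 concrete, here is a minimal sketch over token IDs; the mask ID, vocabulary size, and the -100 "ignore" sentinel are illustrative conventions, not the paper's implementation:

```python
# Sketch of BERT-style masked language model input corruption.
import random

def mask_tokens(token_ids, mask_id, vocab_size, select_prob=0.15):
    """Select ~15% of tokens; of those, 80% become the mask symbol,
    10% become a random token, and 10% are left unchanged."""
    inputs, labels = [], []
    for tid in token_ids:
        if random.random() < select_prob:
            labels.append(tid)  # the model must predict the original token
            r = random.random()
            if r < 0.8:
                inputs.append(mask_id)                       # 80%: mask symbol
            elif r < 0.9:
                inputs.append(random.randrange(vocab_size))  # 10%: random word
            else:
                inputs.append(tid)                           # 10%: unchanged
        else:
            inputs.append(tid)
            labels.append(-100)  # sentinel: no prediction at this position
    return inputs, labels

inputs, labels = mask_tokens([12, 845, 77, 3021, 5], mask_id=103, vocab_size=30522)
```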
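Similarly, the next-paragraph probability p = σ(W·h_CLS) of Section 2.4.2 is just a linear map of the [CLS] hidden state followed by a sigmoid; a toy NumPy sketch with an assumed 4-dimensional hidden state (real BERT-base uses 768 dimensions):

```python
import numpy as np

def next_paragraph_prob(h_cls: np.ndarray, W: np.ndarray, b: float = 0.0) -> float:
    """p = sigmoid(W . h_cls + b): probability that segment B follows segment A."""
    z = float(np.dot(W, h_cls)) + b
    return 1.0 / (1.0 + np.exp(-z))

h_cls = np.array([0.2, -0.1, 0.5, 0.3])  # toy [CLS] representation
W = np.array([0.4, 0.1, -0.2, 0.3])      # toy classifier weights
print(next_paragraph_prob(h_cls, W))      # ~0.51 for these toy values
```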
3. EXPERIMENTS AND RESULTS

3.1 Dataset and experimental environment

The dataset used in this paper is a public dataset: specific things are given, related questions are posed, and the correct results are inferred, with the aim of evaluating the model's ability to reason about everyday common sense. Each question in the dataset has an associated description and a correct answer. The common sense dataset is divided into a training set of 64,532 examples and a test set of 6,276 examples. The experimental platform is a laptop with a 1.50 GHz i5 processor and 16 GB of memory, running a 64-bit Windows 10 operating system, with code compiled under PyCharm Professional Edition 2020.2.

3.2 Experimental settings

The experiments are carried out under unsupervised conditions, using the TensorFlow framework and the bidirectional language model Bi-LSTM-CRF. The training data and test data are read in; together they contain about 20 million groups of dialogues. Each dialogue is parsed, and the tokenized dialogue, questions, and answers are returned; the three word sequences are combined to construct the vocabulary. Since the dataset is in English, word segmentation can be done directly on spaces. All conversations are parsed, ID mapping is performed on the training and test sets, and a network with embedding and dropout is built. After modeling the dialogue set and the question set, an LSTM + Dropout + Dense structure completes the preliminary common sense Q&A model (a minimal sketch of this structure is given at the end of this section).

3.3 Experimental results and analysis

The accuracy and loss of the algorithm on the training and test sets are examined with respect to the dimension of the mapping vectors and the number of iterations. The experimental structure is shown in Figure 2. As the number of iterations increases and gradient descent proceeds, the accuracy on the training and test sets rises and the loss falls; the experimental results are shown in Figure 3. As can be seen from Figures 2 and 3, with different numbers of iterations and differently tuned optimization algorithms, the accuracy on the dataset improves accordingly. When epoch ∈ [20, 35], the accuracy approaches 80% while the loss approaches 45%, and both gradually level off. Through tests on the dataset and parameter adjustment, regular patterns emerge in the training loss and the test loss: whether the losses are falling, stable, or rising, corresponding adjustments are needed. Only when both the training loss and the test loss are trending downward is the training network normal and in its best state; when the training loss levels off, further fitting and the addition of pooling layers are needed.
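A minimal sketch of the LSTM + Dropout + Dense structure described in Section 3.2, using TensorFlow's Keras API; the vocabulary size, dimensions, sequence length, and binary answer-scoring output are illustrative assumptions, not the paper's exact configuration:

```python
# Sketch of the embedding -> LSTM -> Dropout -> Dense Q&A model.
import tensorflow as tf

VOCAB_SIZE = 20000   # assumed vocabulary size after ID mapping
EMBED_DIM = 128      # assumed embedding dimension
SEQ_LEN = 64         # assumed padded token sequence length

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # scores a candidate answer
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```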
4. SUMMARY AND PROSPECT

Overall, common sense reasoning is a small branch of knowledge reasoning, but an important one, and it still faces many difficulties. First, knowledge acquisition is difficult, and acquiring common sense knowledge is even more so: given a sentence, how can one tell whether common sense is present in it, especially common sense about special events? Sometimes even humans cannot make accurate judgments, and machines cannot yet meet such a high standard. Second, when selecting a model for common sense reasoning, evaluation depends on the loss computed on the evaluation dataset and on the resulting accuracy, so a sound assessment of common sense reasoning requires high performance scores; moreover, common sense reasoning problems vary in difficulty and accuracy, so direct comparison is hard. Finally, when multiple pieces of common sense knowledge appear in one sentence, getting the machine to reach the correct answer accurately is also a difficulty. These factors will certainly affect the research process of common sense reasoning, but the outlook remains optimistic: common sense reasoning has a great impact on question answering systems, medical applications, and other fields. Therefore, as knowledge bases continue to improve, the research significance of common sense reasoning remains considerable. The development of BERT-based pre-training models is of great significance, and their influence in the NLP field has been long-lasting; when the dataset is large, accuracy can be further improved by adjusting BERT's parameters. In this project, the deep learning model was first used for a single round of training, and a test was then incorporated to further refine the model and dataset for question answering based on social common sense reasoning.

ACKNOWLEDGMENT

The work is supported by the Fundamental Research Funds for the Central Universities (No. 31920210017), the Gansu Province Archives Science and Technology Project (GS-2020-X-07), the Gansu Province Youth Science and Technology Fund Project (21JR1RA211), and Major National R&D Projects (No. 2017YFB1002103).

REFERENCES

[1] Zhu, C., Pre-trained Model, Machine Reading Comprehension Algorithm and Practice, China Machine Press (2020).
[2] Zhang, C., Qiu, H., Sun, Y., et al., "A review of machine reading comprehension based on pre-training model," Computer Engineering and Applications, 56(11), 17-25 (2020).
[3] Zhang, Y., Jiang, Y., Mao, T., et al., "MCA-Reader: An attentional reading comprehension model based on multiple connections," Journal of Computational Linguistics, 33(10), 73-80 (2019).
[4] Wang, H., Li, Z., Lin, X., et al., Basic Structure, Intelligent Question Answering and Deep Learning, Publishing House of Electronics Industry (2019).
[5] Liu, Y., Wan, Y., He, L., et al., "KG-BART: Knowledge graph-augmented BART for generative commonsense reasoning," arXiv:2009.12677v1 (2020).
[6] Lim, J., Oh, D., Jang, Y., et al., "I know what you asked: Graph path learning using AMR for commonsense reasoning," arXiv:2011.00766v2 (2020).
[7] Jiang, T., Qin, B. and Liu, T., "Open domain Chinese knowledge reasoning based on representation learning," Journal of Chinese Information Science, 32(3), 34-41 (2018).
[8] Mihindukulasooriya, N., Rossiello, G., Kapanipathi, P., et al., "Leveraging semantic parsing for relation linking over knowledge bases," The Semantic Web - ISWC 2020, 402-419 (2020). https://doi.org/10.1007/978-3-030-62419-4
[9] Fang, Y., Zhao, X., Tan, Z., Yang, S. and Xiao, W., "An improved translation based knowledge graph representation," Journal of Computer Research and Development, 55(1), 139-150 (2018).
[10] Yan, J., Raman, M., Chan, A., et al., "Learning contextualized knowledge structures for commonsense reasoning," Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (2021).
[11] Moghimifar, F., Qu, L., Zhuo, Y., et al., "COSMO: Conditional SEQ2SEQ-based mixture model for zero-shot commonsense question answering," Proc. of the 28th Inter. Conf. on Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.coling-main
[12] Guan, S., Jin, X., Jia, Y., Wang, Y. and Cheng, X., "Research progress of knowledge reasoning for knowledge graph," Journal of Software, 29(10), 2966-2994 (2018).
[13] Zhang, F. R. and Yang, Q., "Research on entity relation extraction method in knowledge-based question answering," Computer Engineering and Applications, 56(11), 219-224 (2020).
[14] Devlin, J., Chang, M. W., Lee, K., et al., "BERT: Pre-training of deep bidirectional transformers for language understanding," Proc. of the 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1, 4171-4186 (2019).
[15] Cui, Y. M., Che, W. X., Liu, T., et al., "Revisiting pre-trained models for Chinese natural language processing," arXiv:2004.13922 (2020).
[16] Wang, N. Y., Ye, Y. X., Liu, L., Feng, L. Z., Bao, T. and Peng, T., "Language models based on deep learning: A review," Ruan Jian Xue Bao/Journal of Software, 32(4), 1082-1115 (2021).
[17] Zhang, Z., Cao, L. and Chen, X., "Neural network based on neural network," Journal of Computational and Applied Mathematics, 32(1), 67-72 (2013).