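Sentiment Orientation Analysis Based on an Improved Text Mining Algorithm (Task Statement, Thesis Proposal Report, and Thesis, 12,000 Characters)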
Abstract (Chinese version)
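Sentiment orientation analysis holds a very important place in text mining research. In the era of big data, research on the sentiment orientation of text data has attracted many researchers. The main goals of this research are to learn the information contained in text in depth, to represent word features accurately, and to improve the accuracy of sentiment orientation analysis.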
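In the past, researchers mostly built models by combining manual sampling and annotation with machine learning algorithms, which usually required a large amount of time and manual labor. In recent years, researchers have begun to use deep learning to extract features automatically. The most prominent result of deep learning in text mining is the word vector. Traditional word vectors are learned from contextual information and lack sentiment orientation information, so they do not handle sentiment orientation analysis well. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are relatively effective deep learning models: a CNN is suited to extracting local features from data, whereas an RNN can effectively analyze sequential data.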
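The claim that context-learned word vectors carry no sentiment information can be illustrated with the (center, context) training pairs that word2vec's Skip-gram model learns from. The following is a minimal sketch under stated assumptions. Input is whitespace-tokenized, and the window is fixed at 2 words per side. word2vec itself samples a smaller window per position and adds subsampling and negative sampling, all omitted here. `skipgram_pairs` and `contexts_of` are hypothetical helper names and are not taken from the thesis or any library.

```python
def skipgram_pairs(tokens, window=2):
    """Return the (center, context) pairs a Skip-gram model trains on.

    Simplified sketch: fixed window on each side, no subsampling,
    no negative sampling.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


def contexts_of(word, pairs):
    """Sorted list of the context words paired with `word`."""
    return sorted(ctx for center, ctx in pairs if center == word)


pos = "the movie is good".split()
neg = "the movie is bad".split()
print(contexts_of("good", skipgram_pairs(pos)))  # ['is', 'movie']
print(contexts_of("bad", skipgram_pairs(neg)))   # ['is', 'movie']
```

In these two sentences, "good" and "bad" occur in identical contexts and therefore receive identical training pairs. A purely context-driven embedding has no signal to separate them, and this is the gap that incorporating sentiment information into the word vectors is meant to close.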
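To incorporate sentiment orientation information into word vectors, this thesis proposes an architecture based on the Skip-gram model of word2vec and an architecture for an improved model based on a convolutional recurrent neural network. To address shortcomings of the CNN-RNN model, this thesis makes the following improvements. First, an improved word2vec technique is used to optimize the sequence of input dictionary word vectors, converting the original text into input word-vector sequences of equal length. Second, the activation function of the deep network is optimized, which effectively avoids the vanishing-gradient problem and improves the model's performance. Third, max pooling is used to obtain the maximum value of each local feature.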
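The step of turning each text into an equal-length input word-vector sequence can be sketched as follows. This is an illustrative sketch only, and several details are assumptions rather than the thesis's choices:

* the reserved indices for padding and unknown words,
* truncation at the end of long texts,
* the vector dimension of 4,
* the randomly initialized lookup table.

In the thesis, the table rows would instead hold the improved word2vec vectors.

```python
import random

PAD, UNK = 0, 1  # hypothetical reserved indices for padding and unknown words


def build_vocab(corpus):
    """Map every token in the (already segmented) corpus to an integer index."""
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for tokens in corpus:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def to_fixed_length(tokens, vocab, max_len):
    """Convert tokens to indices, truncating long texts and right-padding short ones."""
    ids = [vocab.get(tok, UNK) for tok in tokens][:max_len]
    return ids + [PAD] * (max_len - len(ids))


def embed(ids, table):
    """Look up one word vector per index, giving a max_len x dim input matrix."""
    return [table[i] for i in ids]


corpus = [["the", "movie", "is", "good"], ["the", "plot", "is", "bad"]]
vocab = build_vocab(corpus)

# Assumed lookup table: a zero vector for <pad>, random vectors elsewhere.
# In the thesis these rows would come from the improved word2vec model.
dim, rng = 4, random.Random(0)
table = [[0.0] * dim] + [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                         for _ in range(len(vocab) - 1)]

ids = to_fixed_length(["the", "movie", "is", "great"], vocab, max_len=6)
print(ids)                # [2, 3, 4, 1, 0, 0]
x = embed(ids, table)
print(len(x), len(x[0]))  # 6 4
```

Every text, whatever its original length, ends up as a 6 x 4 matrix. This fixed shape is what allows a batch of texts to be fed to the convolutional layer together.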
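Finally, this thesis analyzes the COAE2014 evaluation dataset and conducts two groups of comparative experiments on the  and  values in the model to study how these parameters affect the improved model. The experimental results show that the improvements proposed in this thesis are effective.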
Keywords: sentiment orientation analysis, convolutional neural network, recurrent neural network, word vectors
Abstract
Sentiment orientation analysis plays an important role in text mining research. In the era of big data, the sentiment orientation of text data has attracted many researchers. The main objectives of this research are to learn more deeply the information contained in text, to represent the features of words accurately, and to improve the accuracy of sentiment orientation analysis.
In the past, most researchers built models by combining manual sampling and annotation with machine learning algorithms, which often required a great deal of time and effort. In recent years, researchers have begun to use deep learning to extract features automatically. One of the most prominent results of deep learning in text mining is the word vector. Traditional word vectors are learned from contextual information and lack sentiment orientation information, so they are not a good solution for sentiment orientation analysis. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are among the more effective deep learning models: a CNN is suited to extracting local features from data, while an RNN can effectively analyze time-series data.
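The sequence-modelling role of the RNN can be illustrated with a gated recurrent unit (GRU), which is the recurrent layer named in the table of contents. The sketch below is a plain-Python forward pass under several assumptions:

* small random, untrained weights,
* a zero initial state,
* the update convention h_t = (1 - z_t) * h_{t-1} + z_t * h~_t. Some libraries swap the roles of z_t and 1 - z_t.

It shows how each hidden state depends on all earlier inputs. It is not the thesis's trained model, and `GRUCell` is a hypothetical class written for this example.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class GRUCell:
    """Plain-Python GRU cell (sketch with random, untrained weights)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate: update "z", reset "r", candidate "h".
        self.params = {g: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim),
                           [0.0] * hidden_dim)
                       for g in ("z", "r", "h")}

    def _affine(self, gate, x, h):
        W, U, b = self.params[gate]  # computes W x + U h + b
        return [wx + uh + bi for wx, uh, bi in zip(matvec(W, x), matvec(U, h), b)]

    def step(self, x, h):
        z = [sigmoid(v) for v in self._affine("z", x, h)]  # update gate
        r = [sigmoid(v) for v in self._affine("r", x, h)]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]              # reset-scaled old state
        h_cand = [math.tanh(v) for v in self._affine("h", x, rh)]  # candidate state
        # Interpolate between the previous state and the candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_cand)]

    def run(self, xs):
        """Feed the word vectors in order and collect every hidden state."""
        h = [0.0] * self.hidden_dim
        states = []
        for x in xs:
            h = self.step(x, h)
            states.append(h)
        return states


# Five 4-dimensional word vectors in, one 3-dimensional hidden state per step out.
cell = GRUCell(input_dim=4, hidden_dim=3)
sentence = [[0.1, 0.2, 0.3, 0.4]] * 5
states = cell.run(sentence)
print(len(states), len(states[-1]))  # 5 3
```

The last hidden state summarizes the whole sequence, which is why a recurrent layer is placed after feature extraction when word order matters.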
To incorporate sentiment orientation information into the word vectors, this paper proposes an architecture based on the Skip-gram model of word2vec and an architecture for an improved model based on a convolutional recurrent neural network. The following improvements are made to the CNN-RNN model. First, an improved word2vec technique is used to optimize the input dictionary word-vector sequence, transforming the original text into input word-vector sequences of equal length. Second, the activation function of the deep network is optimized, which avoids the vanishing-gradient problem and improves the model's performance. Third, max pooling is used to obtain the maximum value of each local feature.
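The convolution, activation, and max-pooling stages can be sketched on a toy input as follows. The source does not name the optimized activation function. ReLU is used here only as a common example of an activation whose gradient does not shrink for positive inputs, unlike the sigmoid, whose gradient never exceeds 0.25. The kernel values, the single filter, and the "valid" (no padding) convolution are also assumptions made for illustration.

```python
import math


def relu(v):
    # Assumed activation for illustration; the thesis does not name its choice.
    return max(0.0, v)


def conv1d(seq, kernel, bias=0.0, act=relu):
    """'Valid' 1-D convolution (cross-correlation, as in most DL frameworks)
    of one filter over a sequence of word vectors.

    seq:    list of word vectors, each of length d
    kernel: list of k weight vectors, each of length d (one per window offset)
    Returns one activated feature per window start position.
    """
    k = len(kernel)
    features = []
    for i in range(len(seq) - k + 1):
        s = bias
        for j in range(k):
            s += sum(w * x for w, x in zip(kernel[j], seq[i + j]))
        features.append(act(s))
    return features


def max_pool(feature_map):
    # Max-over-time pooling: keep the filter's strongest local response.
    return max(feature_map)


def sigmoid_grad(v):
    s = 1.0 / (1.0 + math.exp(-v))
    return s * (1.0 - s)  # at most 0.25, reached at v = 0


def relu_grad(v):
    return 1.0 if v > 0 else 0.0


# Toy sentence of 5 two-dimensional word vectors and one filter of width 3.
seq = [[1, 0], [0, 1], [1, 1], [0, 0], [2, 1]]
kernel = [[1, 0], [0, 1], [1, -1]]

fmap = conv1d(seq, kernel)
print(fmap)            # [2.0, 1.0, 2.0]
print(max_pool(fmap))  # 2.0

# Gradient factor after backpropagating through 10 stacked activations,
# at the sigmoid's best case (v = 0) and at a positive ReLU input.
depth = 10
print(sigmoid_grad(0.0) ** depth)  # 0.25 ** 10, about 9.5e-07
print(relu_grad(1.0) ** depth)     # 1.0
```

The last two lines show why a saturating activation tends to make gradients vanish in deep stacks, while max pooling reduces each feature map to one value regardless of sentence length.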
Finally, this paper analyzes the COAE2014 evaluation dataset and conducts two sets of comparative experiments on the  and  values of the model to study the influence of these parameters on the improved model. The experimental results show that the proposed improvements are effective.
Keywords: Sentiment Analysis, Convolutional Neural Network, Recurrent Neural Network, Word Vectors
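Contents
Abstract (Chinese version)
Abstract
Chapter 1 Introduction
1.1 Research Background and Significance
1.2 State of Research in China and Abroad
Chapter 2 Fundamental Theory
2.1 Word Vector Learning
2.2 Artificial Neural Network Algorithms
2.3 Deep Learning Algorithms
2.3.1 Convolutional Neural Networks
2.3.2 LSTM Recurrent Neural Networks
Chapter 3 Sentiment Orientation Analysis Based on an Improved Text Mining Algorithm
3.1 Improved Convolutional Recurrent Neural Network
3.1.1 Network Architecture Diagram
3.1.2 Text Embedding Representation
3.1.3 Convolutional Layer
3.1.4  Layer
3.1.5 Pooling Layer
3.1.6 GRU Layer
3.1.7 Post-processing Hidden Layers
3.2  Algorithm
Chapter 4 Experimental Results and Analysis
4.1 Classification Criteria
4.2 Experimental Dataset
4.2.1 Data Preprocessing
4.3 Experimental Results
4.3.1 Experimental Procedure
4.3.2 Results
4.4 Comparative Experiments
4.4.1  Value Selection Experiment
4.4.2  Selection Experiment
Chapter 5 Conclusion and Outlook
5.1 Conclusion
5.2 Outlook
References
Acknowledgements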