Neural nets used for determining the text sentiment assessment T
Moscow State University, Philological Faculty, Moscow, Leninskie Gory 1, build. 51
When developing a program for sentiment analysis, the program must be trained on a sample of texts on a specific topic. The lexical units in a review that carry an emotionally evaluative connotation can be identified with the appropriate dictionaries once the words of the text have been lemmatized. These data, encoded numerically, serve as the input of the neural network (NN), and the overall rating given in the review serves as its target output. Given input consisting of the evaluative values of individual words and phrases, the NN is expected to determine whether the review as a whole is positive or negative.
To this end, the sequence of positive (+1) and negative (-1) tokens found in the review is fed to the neural network. Once the network has been set up and run on these data, the results can be interpreted. Our focus is on the sentiment of individual sentences: the source text is divided into sentences, and each sentence is analyzed separately. The level of knowledge required to apply a neural network successfully is much lower than that required for, for example, regression analysis, expert systems, support vector machines (SVM), separating hyperplanes, etc.
The overall estimate of the text produced at the output of the NN is compared with the actual rating contained in the review. This comparison provides the material for training the neural network. Data for classification tasks may contain textual or other non-numeric information, which has to be encoded as numbers. In our case, this is evaluative vocabulary of the good / bad type, encoded as +1 for positive words (good, nice, wonderful) and -1 for negative words (bad, crappy, disgusting). A feedforward neural network is constructed by calling the Matlab® function newff:
net = newff(P, T, N);
where P holds the evaluative-vocabulary data (the inputs), T holds the target values (the ratings of the entire reviews), and N is the number of neurons in the hidden layer. The network is then ready to be trained on the training dataset. The Matlab® network object net automatically divides the input data into training, validation and test subsets. Training continues as long as the network keeps improving the correspondence between the set of evaluative tokens and the result, i.e. between the outputs of the network and the actual ratings of the reviews.
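The procedure described above can be sketched in Python rather than Matlab®. This is an illustration under stated assumptions, not the author's implementation and not a reimplementation of newff. The lexicon, the fixed input length with zero padding, the 0/1 coding of the target, the toy reviews and the use of plain gradient descent on the squared error are all assumptions made for the example. The toolbox's own training algorithm, data division and stopping rule are not reproduced.

```python
import math
import random

# Hypothetical evaluative lexicon: lemma -> +1 (positive) or -1 (negative).
LEXICON = {"good": 1, "nice": 1, "wonderful": 1,
           "bad": -1, "crappy": -1, "disgusting": -1}

def encode(lemmas, length=4):
    """Keep the polarity of each evaluative lemma, in order, zero-padded to `length`."""
    values = [LEXICON[w] for w in lemmas if w in LEXICON][:length]
    return values + [0] * (length - len(values))

def logsig(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))

def init_network(n_inputs, n_hidden, seed=0):
    """Small random weights; the last weight of every neuron is its bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return hidden, output

def forward(net, x):
    """Return the hidden activations and the network output in (0, 1)."""
    hidden, output = net
    h = [logsig(w[-1] + sum(wi * xi for wi, xi in zip(w, x))) for w in hidden]
    y = logsig(output[-1] + sum(wi * hi for wi, hi in zip(output, h)))
    return h, y

def train(net, samples, epochs=2000, lr=0.5):
    """Plain stochastic gradient descent on the squared error (an assumption)."""
    hidden, output = net
    for _ in range(epochs):
        for x, t in samples:
            h, y = forward(net, x)
            d_out = (y - t) * y * (1 - y)
            # Hidden-layer error terms use the output weights before they are updated.
            d_hid = [d_out * output[j] * h[j] * (1 - h[j]) for j in range(len(h))]
            for j, hj in enumerate(h):
                output[j] -= lr * d_out * hj
            output[-1] -= lr * d_out
            for w, dj in zip(hidden, d_hid):
                for i, xi in enumerate(x):
                    w[i] -= lr * dj * xi
                w[-1] -= lr * dj

def classify(net, lemmas):
    """+1 for a positive review, -1 for a negative one (threshold 0.5)."""
    return 1 if forward(net, encode(lemmas))[1] >= 0.5 else -1

# Toy lemmatized reviews with their overall rating (1 = positive, 0 = negative).
REVIEWS = [(["good", "nice", "film"], 1), (["bad", "crappy"], 0),
           (["wonderful", "good", "bad"], 1), (["disgusting", "bad", "nice"], 0),
           (["nice"], 1), (["disgusting"], 0)]

net = init_network(n_inputs=4, n_hidden=4)
train(net, [(encode(lemmas), target) for lemmas, target in REVIEWS])
for lemmas, target in REVIEWS:
    print(lemmas, "->", classify(net, lemmas), "(target:", target, ")")
```

Words that are absent from the lexicon (here "film") contribute nothing to the input, so only the evaluative vocabulary reaches the network, as in the text.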
Classification on the raw data was obtained in 6 epochs; validation and testing gave acceptable results: the percentage of correct classification was 72.6% and the percentage of misclassification was 28.4%. The network contains 1 layer of 20 neurons with the logsig (logistic) activation function.
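Percentages of this kind can be computed from the decisions of the network and the actual ratings of the reviews, as in the following Python sketch. The function name and the example labels are hypothetical, and the figures reported in the text are not reproduced here.

```python
def classification_rates(predicted, actual):
    """Return (percent correct, percent misclassified) for two label lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    pct_correct = 100.0 * correct / len(actual)
    # Every review is either classified correctly or not,
    # so the misclassification rate is the complement.
    return pct_correct, 100.0 - pct_correct

# Hypothetical decisions (+1 / -1) for four reviews and their actual ratings.
predicted = [1, -1, 1, 1]
actual = [1, -1, -1, 1]
print(classification_rates(predicted, actual))
```

Computed this way on a single set of reviews, the two rates are complementary.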
The research shows that a neural network can be used for the practical problem of classifying reviews as positive or negative on the basis of the given evaluative vocabulary. On the test example, the neural network gives correct results. The results do not depend on the type of saturating activation function but are unacceptable for a linear activation function. Further research is needed on new datasets containing several hundred or even thousands of reviews. It would also be of interest to obtain results for graded values of the lexeme evaluation function; this is planned as part of further research.
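The remark on activation functions can be illustrated with a small Python sketch. logsig is the function named in the text. tansig (hyperbolic tangent) and purelin (identity) follow Matlab® naming but are assumptions, since the source does not say which functions were compared. The sketch shows only a general property, offered as one possible explanation rather than the author's: saturating functions keep the output bounded, whereas with a linear activation two layers collapse into a single linear map, so the hidden layer adds nothing.

```python
import math

def logsig(x):
    """Logistic function: saturates towards 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def tansig(x):
    """Hyperbolic tangent: saturates towards -1 and +1."""
    return math.tanh(x)

def purelin(x):
    """Linear (identity) activation: no saturation."""
    return x

def two_layer(x, w1, b1, w2, b2, activation):
    """Scalar network: one hidden neuron followed by a linear output neuron."""
    return w2 * activation(w1 * x + b1) + b2

# Hypothetical weights chosen only for the demonstration.
W1, B1, W2, B2 = 3, 1, 2, -4

for x in (-10, 0, 10):
    collapsed = (W2 * W1) * x + (W2 * B1 + B2)  # single equivalent linear map
    print(x, logsig(x), tansig(x), purelin(x),
          two_layer(x, W1, B1, W2, B2, purelin), collapsed)
```

The last two printed columns coincide for every input, which is the collapse of the two linear layers into one.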