What is Sentiment Analysis?
Sentiment Analysis identifies customer opinions, feelings, and intent expressed through text using natural language processing and a sentiment dictionary or lexicon. A sentiment dictionary is a long list of words and phrases with assigned sentiment scores or polarities, along with a list of negation words. We compute the average sentiment based on the words and phrases within the unstructured textual data that you provide. You can use sentiment analysis to determine positive, negative, or neutral attitudes expressed by users.
Sentiment can be difficult to detect due to:
- idioms (expressions whose figurative meaning is totally different from the literal meaning of their components)
- vernacular or jargon (e.g. "interest" has different meanings and different sentiment in each industry)
Stratifyd includes an extensive sentiment dictionary to account for these in all of the industries that we serve, but you can also customize sentiment dictionaries to account for terms that are specific to your organization.
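The lexicon-based approach described above can be sketched as follows. This is a minimal illustration, not Stratifyd's implementation: the word scores, the negation list, the single-word negation window, and the score_text function are all assumptions made for the example.

```python
import re

# Hypothetical mini-lexicon; a real sentiment dictionary is far larger.
BASE_LEXICON = {"great": 3.0, "love": 3.0, "slow": -2.0, "terrible": -4.0, "interest": 1.0}
# Hypothetical negation words.
NEGATIONS = {"not", "never", "no"}

def score_text(text, custom_lexicon=None):
    """Average the scores of lexicon words found in the text.

    A negation word flips the sign of the word immediately after it.
    Entries in custom_lexicon override the base lexicon.
    """
    lexicon = {**BASE_LEXICON, **(custom_lexicon or {})}
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = []
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        if token in lexicon:
            value = lexicon[token]
            scores.append(-value if negate else value)
        negate = False  # the negation only reaches the next word
    return sum(scores) / len(scores) if scores else 0.0

print(score_text("The service was not great"))            # -3.0
print(score_text("I love it but it is slow"))             # 0.5
print(score_text("high interest", {"interest": -2.0}))    # -2.0
```

The last call shows why a custom dictionary matters: the same word, "interest", scores differently once an organization-specific entry overrides the default.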
The overall sentiment returned by the Unsupervised NLU model is generated by the same Sentiment Analysis, but the field looks different when visualized in the Unsupervised NLU model than it does as raw Sentiment Analysis data because:
- that model applies a weighted average by default (a normalized superscore of summed positive and negative sentiment on a grouped basis)
- it uses a Gauge visualization rather than a list.
To visualize the sentiment.overall field from Sentiment Analysis as a Gauge, click the field in the Dimensions shelf and set Aggregation to Average. This returns a true average, which differs from the weighted average returned by the Unsupervised NLU model because that model takes a normalized superscore of summed positive and negative sentiment on a grouped basis.
To visualize it as a Radar Chart, add the Number of Records calculated field to the widget.
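To illustrate how a true average can differ from a grouped, normalized score, here is a small sketch. The exact formula behind the weighted average is not given here, so normalized_superscore is an assumed reading (net sentiment divided by total sentiment magnitude) and should not be taken as the product's actual calculation.

```python
from statistics import mean

def true_average(scores):
    """Plain arithmetic mean of per-record sentiment scores."""
    return mean(scores) if scores else 0.0

def normalized_superscore(scores):
    """ASSUMED formula, for illustration only.

    Sums the positive and negative scores of a group separately, then
    divides the net sentiment by the total magnitude, giving a value
    in the range [-1, 1].
    """
    positive = sum(s for s in scores if s > 0)
    negative = sum(s for s in scores if s < 0)
    magnitude = positive - negative  # negative is <= 0, so this adds its absolute value
    return (positive + negative) / magnitude if magnitude else 0.0

group = [3.0, 3.0, -2.0, 0.0]
print(true_average(group))           # 1.0
print(normalized_superscore(group))  # 0.5
```

Under this assumed formula, neutral records (score 0) pull the true average toward zero but leave the normalized score unchanged, which is one way the two gauges could show different values for the same data.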
Why use Sentiment Analysis?
While the Unsupervised NLU model provides an overall sentiment feature, that model may fail if your volume of data is too low. The machine learning (ML) portion of the model requires a certain amount of data in order to converge. Sentiment Analysis lets you find the overall sentiment even with limited data.
The amount of data that the Unsupervised NLU model requires depends on the data that you have. For example, if your documents are long, like news articles, you need fewer documents. However, if the documents are very short, the model might still fail even with a lot of documents.
One Text field is required. Any field that you choose is mapped as a text field. If you choose multiple fields, the analysis concatenates all of the text and treats it as a single block of text for each record.
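The multi-field behavior can be pictured with a short sketch. The build_text helper, the sample record, and the single-space separator are hypothetical; how the fields are actually joined is not specified here.

```python
def build_text(record, text_fields, separator=" "):
    """Join the chosen fields of one record into a single text value.

    Empty or missing fields are skipped. The separator is an assumption.
    """
    parts = [str(record[name]).strip() for name in text_fields if record.get(name)]
    return separator.join(parts)

record = {"title": "Late delivery", "body": "Package arrived damaged.", "rating": 2}
print(build_text(record, ["title", "body"]))  # Late delivery Package arrived damaged.
```

Because the text is combined per record, the resulting sentiment reflects all chosen fields together rather than each field separately.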
The analysis returns the following fields for use in widget visualizations.
- language.code: The two-letter language code of the detected language, e.g. en, fr, jp. See Language Detection for more information.
- language.name: The full name of the detected language, e.g. English, French, Japanese.
- sentiment.overall: The overall sentiment detected within the corpus, with values ranging from -5 to 5.
- tokenized: A list of every word detected in the corpus. This is trivial for languages that use spaces between words, but each language that has no spaces between words and allows multi-character words requires a custom tokenizer.
- translated.language.code*: The two-letter language code of the translated (Translate To) language, e.g. en, fr, jp.
- translated.language.name*: The full name of the translated (Translate To) language, e.g. English, French, Japanese.
- translated.ngrams*: Sets of translated words (two by default) that appear together frequently in the corpus. Note that this is untranslated if you use Translate Model Output.
- translated.text*: The full verbatim text as translated.
- translated.unigrams*: A textual array of single translated words within the data stream. Stratifyd calculates the total number of words and the number of unique values. Useful in a word cloud viewed with filters on average sentiment.
- unigrams: A textual array of single words within the data stream. Stratifyd calculates the total number of words and the number of unique values. Useful in a word cloud viewed with filters on average sentiment.
- Data: A table containing all of the original data from the data stream, plus all of the analyzed data from the analysis.
*In order to return the translated fields, you must subscribe to the Translate Text feature. Text translation involves an up-charge, as it uses a third-party translation service. If you want to use translation, please speak with your Stratifyd representative.
Once enabled, translate options appear in the Advanced section of the Deploy a New Model wizard. See Translate Text and Languages for more information.
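As a rough picture of what the tokenized, unigrams, and ngrams style fields contain, the following sketch tokenizes text, counts unigrams, and finds frequent two-word sets. The function names and sample documents are hypothetical, and the simple regular-expression tokenizer only suits languages that separate words with spaces.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def ngrams(tokens, n=2):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def unigram_stats(documents):
    """Return (total number of words, number of unique words)."""
    words = [token for doc in documents for token in tokenize(doc)]
    return len(words), len(set(words))

def top_ngrams(documents, n=2, k=3):
    """Return the k most frequent n-grams across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(ngrams(tokenize(doc), n))
    return counts.most_common(k)

docs = ["Battery life is great", "Poor battery life"]
print(tokenize(docs[0]))        # ['battery', 'life', 'is', 'great']
print(unigram_stats(docs))      # (7, 5)
print(top_ngrams(docs, k=1))    # [('battery life', 2)]
```

A language written without spaces between words would come back from this tokenizer as long unsplit runs, which is why such languages need a dedicated tokenizer.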
To run a Sentiment Analysis
You can run a Sentiment Analysis from within a dashboard.
1. Open the dashboard to which you want to add the analysis.
2. Click the Data icon, and in the Data Ingestion panel, next to the data stream you want to analyze, click the vertical ellipsis button and select Edit.
3. In the Edit Data Streams dialog that appears, above the Deployed Models list, click Deploy Model.
4. In the Deploy Model dialog that appears, scroll down to Basic Analyses and click Sentiment Analysis.
5. In the Deploy a New Model wizard that appears, in the Unassigned Fields column, click the plus sign of the text field to use and click text to add it to the Assigned fields column, then click Next.
6. On the Complete and Submit page of the dialog, optionally change the Name, add Tags, and specify a Description for the analysis.
7. Optionally expand the Advanced section to set any of the options described below.
You can set the following properties in the Advanced section of the Create a New Model or Deploy a New Model dialog.
- Default Language: Set the default language to assume if the language is not detectable when applying a language-specific stopword list to clean the text.
  Default value: en (English); Valid values: the two-letter language code for any supported language
- Run Language Detection: Set to true to run language detection on the source text.
  Default value: true
- Sentiment: Apply custom sentiment word lists based on your own domain knowledge or data properties. Stratifyd applies your sentiment to your analysis in addition to the built-in sentiment dictionary that contains around 80,000 words across all languages.
  See Customize sentiment dictionaries for more information.
- Chinese Dictionary: Customize Chinese tokens that our engine uses when creating key n-grams in your analysis.
- Custom Filters: Define a custom data training filter to refine the data returned.