The Sentiment Predictive Model (Neural Network) is an AutoLearn model trained on a dataset of 44 million English customer product reviews using a Label-Embedding Attentive Model (LEAM) approach. It was trained to determine sentiment, with the star ratings from the reviews as its ground truth. It returns the same fields as an AutoLearn model, including the label field, which holds the sentiment score (polarity) on a scale of 0 to 1.

Because it is a pre-trained model, the Sentiment Predictive Model (Neural Network) can sometimes get lost among your other pre-trained models. However, you can always find it by searching.

Why use it

Because few clients have enough quality data to build out powerful AutoLearn models, Stratifyd provides one that is pre-trained. Since the model is already trained, you need only select a textual field similar to a customer review to use it. You might also use a textual field from survey responses or Glassdoor reviews, as the language used is similar to the language people use when leaving a product review.

If you apply the Sentiment Predictive Model to call center data, you can generally expect poor results. Call center transcripts are longer than product reviews, and the emotional and descriptive terms that callers use to express their concerns are fundamentally different from the language of reviews. A call center transcript is typically so long that its sentiment averages out to neutral, unless the caller says only something like "I hate you guys, you are terrible."
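
The averaging effect described above can be illustrated with a toy sketch. It does not reproduce how the model scores text. It assumes a 0-to-1 scale with 0.5 as the neutral midpoint (on a scale centered on zero, the neutral point would be 0), uses made-up per-sentence scores, and treats the overall score as a plain average, which is a simplification chosen only for illustration.

```python
from statistics import mean

# Hypothetical per-sentence sentiment scores on a 0-to-1 scale
# (assumed here: 0 = most negative, 0.5 = neutral, 1 = most positive).
# These values are invented for illustration, not model output.
short_review = [0.05]  # e.g. "I hate you guys, you are terrible."
long_transcript = [0.05, 0.5, 0.6, 0.45, 0.55, 0.5, 0.7, 0.5, 0.6, 0.55]


def overall_score(sentence_scores):
    """Average the sentence scores (a simplifying assumption for this sketch)."""
    return mean(sentence_scores)


print(round(overall_score(short_review), 2))     # strongly negative
print(round(overall_score(long_transcript), 2))  # close to the neutral midpoint
```

In this sketch, the single harsh sentence dominates the short review, while the same sentence is diluted by the many neutral sentences around it in the long transcript.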

If you require more flexibility, and want to specify your own ground truth, you can opt to use the Auto-Topic Predictive Model (Unsupervised NLU) instead. With that lexicon-based model, in addition to getting sentiment scores, you can also specify a date field for timelines, and location fields for maps.

Input fields

The model requires one Unstructured Text field, that is, a field containing free-form feedback that users type rather than select from a list.

Output fields 

The model returns the following fields for use in widget visualizations.

  • keywords: A textual array of the most important words within the data stream. In supervised models, keywords are the terms that the model finds and uses to predict each label.
  • label: The predicted sentiment value between 0 and 1 based on your input data, also known as the Neural Sentiment Score.
  • language.code: The two-letter language code of the detected language, e.g. en, fr, ja. See Language detection for more information.
  • language.name: The full name of the detected language, e.g. English, French, Japanese.
  • tokenized: A list of every word detected in the corpus. Tokenizing is trivial for languages that put spaces between words. Languages that do not, and in which a word can span multiple characters, each require a custom tokenizer.
  • unigrams: A textual array of single words within the data stream. Stratifyd calculates the total number of words and the number of unique values. Useful in a word cloud viewed with filters on average sentiment.
  • Data: A table containing all of the original data from the data stream, plus all of the analyzed data from the model.
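
As a rough sketch of how these fields can be combined, the following Python example averages the label value for each unigram, which is the kind of aggregation a word cloud filtered on average sentiment would rely on. The field names come from the list above. The record shape, the sample values, and the average_label_by_unigram helper are hypothetical and are not part of the Stratifyd product.

```python
from collections import defaultdict

# Hypothetical analyzed records. Field names follow the list above;
# the values and the exact record shape are assumptions for illustration.
records = [
    {"label": 0.9, "language.code": "en", "unigrams": ["great", "battery"]},
    {"label": 0.2, "language.code": "en", "unigrams": ["battery", "died"]},
    {"label": 0.8, "language.code": "en", "unigrams": ["great", "screen"]},
]


def average_label_by_unigram(rows):
    """Return {unigram: mean label} over all rows that contain the unigram."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        for word in set(row["unigrams"]):  # count each word once per record
            totals[word] += row["label"]
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


averages = average_label_by_unigram(records)
for word, score in sorted(averages.items()):
    print(f"{word}: {score:.2f}")
```

Words that appear mostly in low-scoring records, such as "died" here, end up with a low average, which is what makes them stand out when a visualization is filtered by sentiment.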

To use the Sentiment Predictive Model

You can create a model from within a dashboard, or you can add one to the Models page. Here we create it from within a dashboard.

To open the data settings panel, click the Data settings panel button in the top right corner of your dashboard.

In the data settings panel that appears, make sure that the data stream you want to work with is connected in the Connect column.

In the middle column, Analyze, click to deploy new under Stratifyd Models.

In the next box that appears, select the data stream you want to run the model on, select Sentiment Predictive Model, map the appropriate fields (such as text), and then click start Analysis.

You return to the data settings panel, where the analysis now appears under deployed models in the Analyze column.

Advanced settings

To access advanced settings, click into the deployed model in the Analyze column, then click switch to advanced settings. You can ignore any settings having to do with training, as this model is pre-trained. 

  • Run Language Detection: Run language detection on the source text.
    Default value: true
  • Default Language: Assume this language if language detection fails. The default language determines which language-specific stopword list is applied when cleaning the text.
    Default value: English
  • Stopwords: Apply custom stopword lists to remove non-informative words from categorization.
    See Customize stopwords dictionaries for more information.
  • Chinese Dictionary: Customize Chinese tokens that our engine uses when creating key n-grams in your analysis.
    See Customize Chinese token dictionaries for more information.
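
The interaction between language detection, the default language, and stopword lists can be sketched as follows. This is a minimal illustration of the idea and not Stratifyd's implementation. The STOPWORDS table and the clean_text function are hypothetical, and the stopword lists are shortened to a few words each.

```python
# Hypothetical stopword lists keyed by language name; real lists are far longer.
STOPWORDS = {
    "English": {"the", "a", "is", "and", "it"},
    "French": {"le", "la", "et", "est"},
}


def clean_text(tokens, detected_language=None, default_language="English"):
    """Drop stopwords, using the default language when detection failed (None)."""
    language = detected_language or default_language
    stopwords = STOPWORDS.get(language, set())  # unknown language: filter nothing
    return [token for token in tokens if token.lower() not in stopwords]


# Detection failed, so the English list is used.
print(clean_text(["The", "battery", "is", "great"]))  # prints ['battery', 'great']
```

A custom stopword list would correspond to adding words to the relevant set in this sketch, so that they are removed before categorization.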
