The Theme Summarization model is an unsupervised natural language processing (NLP) model used to analyze unstructured text data. It produces field and widget outputs similar to those of Theme Detection, but it uses a different methodology and can therefore produce different results.
See below for a summary of when to use this model, its advantages, and best practices.
When should I use Theme Summarization?
Theme Summarization is similar to Theme Detection in that both aim to uncover unknown topics in unstructured data. However, where Theme Detection aims to present the most common topics in the data, Theme Summarization aims to surface the most unique topics. As a result, Theme Summarization gives a more well-rounded view of the dataset, including topics that Theme Detection may miss because they are "muted out" by high-volume topics.
This model works best on datasets with a limited scope, where fresh topics can emerge. Hourly, daily, and weekly time frames work well. Like Theme Detection outputs, Theme Summarization outputs can be used to create topic and sentiment widgets. The following Theme Summarization outputs are available:
Sentiment scores per verbatim
Sentiment aggregations (sentiment average gauge)
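As a rough illustration of what a sentiment average gauge summarizes, the sketch below averages per-verbatim sentiment scores. It assumes scores on a -1 to 1 scale and uses a hypothetical `average_sentiment` helper; the product's actual scoring scale and aggregation may differ.

```python
from statistics import mean


def average_sentiment(scores: list[float]) -> float:
    """Average per-verbatim sentiment scores (assumed -1 to 1 scale)."""
    if not scores:
        raise ValueError("need at least one sentiment score")
    return mean(scores)


# Hypothetical per-verbatim scores for four customer comments.
scores = [0.8, -0.4, 0.1, 0.5]
print(round(average_sentiment(scores), 2))  # 0.25
```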
The following additional outputs will also be available eventually:
Summary & Sentence Highlight = “which sentence(s) highlight or summarize the verbatim?”
Expanded topic chunks = “which phrases (2-5 words) highlight the dataset?”
Summary of important docs = “which verbatims provide the most impactful feedback?”
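To make the idea of 2-5 word topic chunks more concrete, here is a naive sketch that counts every 2- to 5-word phrase (n-gram) across a set of verbatims and returns the most frequent ones. This is a simple frequency heuristic chosen only for illustration, not the model's actual method, and `top_phrases` is a hypothetical name.

```python
import re
from collections import Counter


def top_phrases(texts, min_n=2, max_n=5, k=3):
    """Count all phrases of min_n..max_n words across texts; return the k most common."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep only alphabetic tokens (plus apostrophes).
        words = re.findall(r"[a-z']+", text.lower())
        for n in range(min_n, max_n + 1):
            # Slide an n-word window over the verbatim.
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(k)


verbatims = [
    "The delivery was late again",
    "Late delivery and poor packaging",
    "Delivery was late but support was helpful",
]
print(top_phrases(verbatims, k=5))
```

Raw counts like these favor short, overlapping phrases (here "delivery was" and "delivery was late" each occur twice), so a real phrase extractor would typically need additional filtering.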
Advantages and limitations of Theme Summarization
The Theme Summarization method uses upgraded neural network architectures that provide the following improvements:
Higher quality topic discovery
Rich topic outputs
Semi-supervised customization available
The model also has the following limitations:
Slower processing than Theme Detection
Only available in English
To use the Theme Summarization model
1. You can apply a model from within a workspace, or you can add one to the Models tab. Here, we apply it from within a workspace.
2. To open the Data Settings panel, click the Data settings button (gear icon).
In the Data Settings panel, make sure the data stream you want to work with is connected in the Connected column.
3. In the Analyze tab, expand the section labelled What are our customers saying? to see available models. Choose Theme Summarization by clicking the + icon.
4. For the model to run successfully, you'll need to select a date dimension and a text dimension. Make your selections and click Start Analysis.
5. Depending on the size and complexity of your data, the analysis may take some time to finish. When you return to the Data Settings panel, you'll see the analysis in the Deployed section at the top of the Analyze tab.
Best practices
Choosing the right minimum frequency can greatly affect your topic discovery results. The min_count setting in Advanced Setup is the minimum number of times a word must occur to be included in the topic vocabulary.
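There is a trade-off between diversity and quality: a lower min_count admits rarer words into the vocabulary, which can surface more diverse topics but also more noise, while a higher min_count keeps only frequent words, which tends to yield cleaner but fewer topics.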
The default, min_count=5, is suitable for most datasets.
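The effect of min_count can be sketched as a simple vocabulary filter: count how often each word occurs across all verbatims and keep only the words that meet the threshold. This is a generic illustration that assumes lowercase whitespace tokenization; the model's actual tokenization and preprocessing are not documented here, and `build_vocabulary` is a hypothetical helper.

```python
from collections import Counter


def build_vocabulary(texts, min_count=5):
    """Return the set of words that occur at least min_count times across all texts."""
    # Naive tokenization: lowercase, then split on whitespace.
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word for word, count in counts.items() if count >= min_count}


verbatims = [
    "app crashes on login",
    "login is slow",
    "app crashes after update",
    "love the new app",
]
print(sorted(build_vocabulary(verbatims, min_count=2)))  # ['app', 'crashes', 'login']
print(sorted(build_vocabulary(verbatims, min_count=3)))  # ['app']
```

Raising the threshold from 2 to 3 drops "crashes" and "login", leaving a smaller vocabulary of more frequent words. With the default of 5, this tiny sample would yield an empty vocabulary, which is why very small datasets may need a lower setting.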
Recent Updates to Theme Summarization
We have added a parameter to adjust the number of summary sentences. You can now return up to 10 summary sentences.
We have also added parameters for longer summaries, so sentences are no longer cut in half. You can split summaries at commas or at sentence boundaries.
To apply either of these changes, choose Advanced Setup and select the Parameters tab.
We're here to help! Don't hesitate to contact us via chat or submit a ticket for further assistance.