Stratifyd uses labels and logic blocks (controlled vocabularies) to create faceted hierarchical taxonomies. Labels create the branches of the hierarchical categorization tree. Each label uses a logic block, that is, a list of restricted words and Boolean logic, to index documents. The Taxonomy Analysis applies your taxonomy to a data stream.
Alternatively, you can use the Semi-Automatic Taxonomy model to apply machine learning trained on your taxonomy. However, if you are accustomed to working with taxonomies, you may find that the Taxonomy Analysis is more straightforward.
In either case, you must first get your taxonomy label structure into Stratifyd. You can do this in two ways: by Uploading a taxonomy or by Creating a taxonomy.
For more information, see Working with taxonomies.
What is a Taxonomy Analysis?
The Taxonomy Analysis is a straightforward mechanism that allows you to analyze your data stream against your taxonomy, filtering data into each label in the hierarchy based on the logic in your taxonomy. It relies entirely on the logic provided for each of your taxonomy labels. It makes no predictions or associations, and if you want to re-categorize items, you must change the logic within the related label.
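To make the "matched or not matched" behavior concrete, the following is a minimal Python sketch of rule-based labeling. It is not Stratifyd's logic block syntax or implementation: the TAXONOMY structure, the any/all/none keys, and the function names are all hypothetical, chosen only to illustrate how Boolean logic on restricted words can file a document under one or more labels.

```python
import re

# Hypothetical taxonomy: each label path maps to a simple logic block.
# "any"  = at least one of these words must appear
# "all"  = every one of these words must appear
# "none" = none of these words may appear
# This structure is an assumption for illustration, not Stratifyd's format.
TAXONOMY = {
    "Billing/Refunds": {"any": {"refund", "reimburse"}, "all": set(), "none": {"policy"}},
    "Billing/Fees": {"any": {"fee", "charge"}, "all": set(), "none": set()},
    "Shipping/Delays": {"any": {"late", "delayed"}, "all": {"order"}, "none": set()},
}


def tokenize(text):
    """Lowercase the text and return the set of words it contains."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def matches(block, words):
    """Return True only if the words satisfy every part of the logic block."""
    if block["any"] and not (block["any"] & words):
        return False
    if not block["all"] <= words:
        return False
    return not (block["none"] & words)


def label_document(text, taxonomy=TAXONOMY):
    """Return every label whose logic block the text satisfies."""
    words = tokenize(text)
    return [label for label, block in taxonomy.items() if matches(block, words)]


if __name__ == "__main__":
    print(label_document("My order was delayed and I want a refund"))
```

In this sketch nothing is predicted or inferred. A document moves to a different label only when the word lists in a logic block change, which mirrors the way you re-categorize items in the Taxonomy Analysis.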
Why use the Taxonomy Analysis?
While the Semi-Automatic Taxonomy model adds machine learning and features a feedback loop for further refinement, the Taxonomy Analysis may prove more advantageous in the long run. Improving this analysis depends on refining the logic blocks in your taxonomy. While it takes more time and effort to create a robust taxonomy for use with the Taxonomy Analysis, you may find that it is more reliable and accurate because the logic is either matched or not matched.
The Semi-Automatic Taxonomy is a good choice for short-term solutions when time is of the essence, but for long-term reliability and accuracy, we recommend investing the effort to create a complex taxonomy for use with the Taxonomy Analysis.
One Text field is required. Any field that you choose is mapped as a text field. If you choose multiple fields, the analysis concatenates all of the text and treats it as a single block of text for each record.
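As a rough illustration of the multiple-field behavior, the sketch below joins several fields of one record into a single string. The function name, the space separator, and the handling of empty values are assumptions made for this example. The text above does not say how Stratifyd joins the fields.

```python
def combine_text_fields(record, text_fields, separator=" "):
    """Join the chosen fields of one record into a single text value.

    Missing or empty fields are skipped. The separator is an assumption.
    """
    parts = []
    for field in text_fields:
        value = record.get(field)
        if value is None:
            continue
        value = str(value).strip()
        if value:
            parts.append(value)
    return separator.join(parts)


if __name__ == "__main__":
    record = {"title": "Late delivery", "body": "Package arrived a week late."}
    print(combine_text_fields(record, ["title", "body"]))
```

One practical consequence: a logic block is evaluated against the combined text, so a label can match on words that come from different fields of the same record.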
The model returns the following fields for use in widget visualizations.
- language.code*: The two-letter language code of the detected language, e.g. en, fr, jp. See Language Detection for more information.
- language.name*: The full name of the detected language, e.g. English, French, Japanese.
- taxonomy.labels: A textual array of the labels found in the taxonomy label structure.
- tokenized*: A list of every word detected in the corpus. Tokenizing is trivial for languages that separate words with spaces, but each language that does not use spaces and allows multi-character words requires its own custom tokenizer.
- translated.language.code*: The two-letter language code of the translated (Translate To) language, e.g. en, fr, jp.
- translated.language.name*: The full name of the translated (Translate To) language, e.g. English, French, Japanese.
- translated.ngrams*: Sets of translated words (two by default) that appear together frequently in the corpus. Note that this is untranslated if you use Translate Model Output.
- translated.text*: The full verbatim text as translated.
- translated.unigrams*: A textual array of single translated words within the data stream. Stratifyd calculates the total number of words and the number of unique values. Useful in a word cloud viewed with filters on average sentiment.
- unigrams: A textual array of single words within the data stream. Stratifyd calculates the total number of words and the number of unique values. Useful in a word cloud viewed with filters on average sentiment.
- Data: A table containing all of the original data from the data stream, plus all of the analyzed data from the model.
*In order to return the translated fields, you must subscribe to the Translate Text feature. Text translation involves an up-charge, as it uses a third-party translation service. If you want to use translation, please speak with your Stratifyd representative.
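The unigram and n-gram fields described above can be pictured with a short Python sketch. It assumes a simple regular-expression tokenizer for space-delimited text and counts adjacent word pairs. The function names and the returned dictionary keys are hypothetical, and Stratifyd's own tokenization and n-gram selection may differ.

```python
import re
from collections import Counter


def unigrams(text):
    """Split space-delimited text into lowercase single words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n=2):
    """Return every run of n adjacent words (two by default)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def summarize(records, n=2):
    """Count words and n-grams across all records in a data stream."""
    unigram_counts = Counter()
    ngram_counts = Counter()
    for text in records:
        tokens = unigrams(text)
        unigram_counts.update(tokens)
        ngram_counts.update(ngrams(tokens, n))
    return {
        "total_words": sum(unigram_counts.values()),
        "unique_words": len(unigram_counts),
        "top_ngrams": ngram_counts.most_common(3),
    }


if __name__ == "__main__":
    print(summarize(["great customer service", "customer service was slow"]))
```

The total and unique counts correspond to the two numbers mentioned for the unigrams field. The most frequent pairs are the kind of values an n-grams field would surface in a widget.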
To run a Taxonomy Analysis
You can run a Taxonomy Analysis from within a dashboard.
1. Open the dashboard to which you want to add the analysis.
2. Click the Data Settings button to open the Data Settings panel.
3. In the Data Settings panel, click the plus button next to Taxonomy to deploy a new taxonomy model.
4. In the dialog that appears, select the data stream you want to run the taxonomy on.
5. Select the taxonomy model you want to deploy.
6. Configure the model by naming the analysis and selecting the text field to use in the analysis. On this page you can also enable Taxonomy AI Enhancement to automatically improve your taxonomy label coverage.
You can also switch to advanced settings to apply translation or tuning (see below for advanced settings).
7. Click Start Analysis.
You can set the following properties in the Advanced section of the Create a new model or Deploy a new model dialog.
- Run Language Detection: Set to true to run language detection on the source text.
- Default Language: Set the default language to assume if the language is not detectable when applying a language-specific stopword list to clean the text.
Default value: en (English). Valid values: the two-letter language code for any supported language.
- Chinese Dictionary: Customize Chinese tokens that our engine uses when creating key n-grams in your analysis.
See Customize Chinese token dictionaries for more information.
- Custom Filters: Define a custom data training filter to refine the data returned.
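The role of the Default Language property can be sketched in Python as a fallback used when choosing a stopword list. The stopword lists, the function name, and the rule of falling back whenever the detected language has no list are all assumptions for illustration, not a description of how Stratifyd cleans text.

```python
# Tiny illustrative stopword lists keyed by two-letter language code.
# Real lists are much longer; these are assumptions for the example.
STOPWORDS = {
    "en": {"the", "a", "is", "and"},
    "fr": {"le", "la", "est", "et"},
}


def clean_text(tokens, detected_language=None, default_language="en"):
    """Remove stopwords, using the default language when detection fails.

    Returns the language that was applied and the remaining tokens.
    """
    if detected_language in STOPWORDS:
        language = detected_language
    else:
        language = default_language
    stopwords = STOPWORDS.get(language, set())
    return language, [t for t in tokens if t.lower() not in stopwords]


if __name__ == "__main__":
    # No language was detected, so the default ("en") list is applied.
    print(clean_text(["The", "order", "is", "late"]))
```

If most of your undetectable records are in a language other than English, setting the default to that language means the matching stopword list is applied instead.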