How does Translate Text work?
This analysis uses a third-party translation service, Google Cloud Translation or Microsoft Translator, to translate text into the language of your choice during data ingestion. Although Translate Text appears in the list of Basic Analyses in the Deploy Model dialog, it is not supported as a stand-alone analysis. Instead, you enable it through the advanced settings of the following models and basic analyses:
- Auto-Topic Predictive Model (Unsupervised NLU)
- AI Enhanced Taxonomy (Semi-automatic Taxonomy)
- N-Gram Generator
- Taxonomy Analysis
- Sentiment Analysis
Text translation involves an up-charge, as it uses a third-party translation service. If you want to use translation, please speak with your Stratifyd representative.
Note that translation is charged per million characters. With very large datasets, this can add up very quickly, so it is prudent to limit the number of users who have translation enabled.
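To get a feel for how per-character billing scales, here is a minimal Python sketch that estimates a charge from the length of each record. The function name and the rate are hypothetical placeholders, not Stratifyd or vendor pricing; your Stratifyd representative can provide actual rates.

```python
def estimate_translation_cost(records, rate_per_million_chars):
    """Estimate a translation charge billed per million characters.

    rate_per_million_chars is a placeholder value; real pricing comes
    from your Stratifyd representative.
    """
    total_chars = sum(len(text) for text in records)
    return total_chars / 1_000_000 * rate_per_million_chars


# Hypothetical dataset: 500,000 records of 400 characters each,
# which is 200 million characters in total.
records = ["x" * 400] * 500_000
print(estimate_translation_cost(records, rate_per_million_chars=20.0))  # 4000.0
```

Because the charge grows linearly with character count, doubling either the number of records or their average length doubles the estimate.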
Translate Text uses your settings to determine the Translate From language. You can optionally specify a single language from which to translate. If you leave that setting blank, Language Detection is used to find the Translate From language. Here are some things to note about the way Language Detection works:
- Language Detection works only on records with more than eight words.
- If Language Detection cannot detect the language for a record, it marks the record as the language specified in the Default Language setting. If that matches your Translate To setting, it does not translate that record.
- If there are mixed languages within a single record, Language Detection marks the record as the predominant language in that record (e.g. if there are eight English words, three Chinese words, and two Spanish words, it marks the record English).
- If Language Detection finds that the language for a record matches the Translate To language setting, it does not translate that record.
This means that if you have mixed languages in a single record, and the predominant language matches your Translate To setting, that record is not translated.
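The Language Detection rules above can be sketched in Python as follows. This is an illustration of the documented behavior, not Stratifyd's implementation: the function names are hypothetical, the per-word language codes stand in for the output of a real detector, and the sketch covers only the case where Translate From is left blank.

```python
from collections import Counter

MIN_WORDS_FOR_DETECTION = 9  # "more than eight words"


def resolve_source_language(word_languages, default_language):
    """Return the language a record is marked as.

    word_languages holds one language code per word in the record; it is
    a stand-in for a real detector (an assumption of this sketch).
    """
    if len(word_languages) < MIN_WORDS_FOR_DETECTION:
        # Too short to detect: fall back to the Default Language setting.
        return default_language
    # Mixed-language records are marked with the predominant language.
    return Counter(word_languages).most_common(1)[0][0]


def should_translate(word_languages, translate_to, default_language):
    """A record is skipped when its language matches Translate To."""
    source = resolve_source_language(word_languages, default_language)
    return source != translate_to


# The mixed-language example from the list above.
record = ["en"] * 8 + ["zh"] * 3 + ["es"] * 2
print(resolve_source_language(record, default_language="en"))  # en
print(should_translate(record, translate_to="en", default_language="en"))  # False
```

Note that in this sketch the three Chinese and two Spanish words in the example record stay untranslated when Translate To is English, which matches the behavior described above.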
Stratifyd includes an extensive sentiment dictionary for each of the 27 supported languages, but you can also customize sentiment dictionaries to account for terms that are specific to your organization.
Enable translation by changing the Translate Options setting. Doing so enables other translation settings.
- None (default): No translation is performed.
- Translate Text Input: Translates the input text before running analysis.
- Translate Model Output: Translates the model output text after running analysis.
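The difference between the options is the order in which translation and analysis run. The following Python sketch shows that ordering only; the function name is hypothetical, and translate and analyze are caller-supplied stand-ins for the translation engine and the model, not Stratifyd APIs.

```python
def run_with_translation(text, option, translate, analyze):
    """Apply a Translate Options value in the order described above.

    translate and analyze are placeholder callables supplied by the
    caller; both are assumptions of this sketch.
    """
    if option == "None":
        return analyze(text)              # default: no translation
    if option == "Translate Text Input":
        return analyze(translate(text))   # translate first, then analyze
    if option == "Translate Model Output":
        return translate(analyze(text))   # analyze first, then translate
    raise ValueError(f"unknown Translate Options value: {option!r}")


# Toy stand-ins that just label what happened to the text.
def toy_translate(s):
    return f"translated({s})"


def toy_analyze(s):
    return f"analyzed({s})"


print(run_with_translation("hola", "Translate Text Input", toy_translate, toy_analyze))
# analyzed(translated(hola))
print(run_with_translation("hola", "Translate Model Output", toy_translate, toy_analyze))
# translated(analyzed(hola))
```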
The following third-party translation services are supported.
- Microsoft SMT: Uses Microsoft's statistical machine translation (deprecated on April 30, 2018).
- Microsoft NMT: Uses Microsoft's neural machine translation for higher quality translations (introduced in 2016).
- Google Translate: Uses the Google Cloud Translation API.
One text field is required. Any field that you choose is mapped as a text field. If you choose multiple fields, the analysis concatenates their text and treats it as a single block of text for each record.
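As a rough picture of what concatenating multiple fields means for a single record, consider this Python sketch. The function name, the space separator, and the skipping of empty fields are assumptions made for illustration; the actual joining behavior is not specified here.

```python
def build_text_input(record, text_fields, separator=" "):
    """Join the chosen text fields of one record into a single string.

    The separator and the skipping of empty fields are assumptions of
    this sketch.
    """
    parts = [str(record[name]) for name in text_fields if record.get(name)]
    return separator.join(parts)


# Hypothetical record with two fields mapped as text.
record = {
    "title": "Late delivery",
    "body": "The package arrived a week late.",
    "rating": 2,
}
print(build_text_input(record, ["title", "body"]))
# Late delivery The package arrived a week late.
```

Since every mapped field contributes to the text sent for translation, mapping extra fields also increases the number of characters translated.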
The analysis adds the following fields, along with related language fields, to those returned by the model or analysis it runs with: the Auto-Topic Predictive Model (Unsupervised NLU), the AI Enhanced Taxonomy (Semi-automatic Taxonomy), the N-Gram Generator, Taxonomy Analysis, or Sentiment Analysis.
- language.code: The two-letter language code of the detected language, e.g. en, fr, ja. See Language Detection for more information.
- language.name: The full name of the detected language, e.g. English, French, Japanese.
- translated.language.code: The two-letter language code of the translated (Translate To) language, e.g. en, fr, ja.
- translated.language.name: The full name of the translated (Translate To) language, e.g. English, French, Japanese.
- translated.ngrams: Sets of translated words (two by default) that appear together frequently in the corpus. Note that this is untranslated if you use Translate Model Output.
- translated.text: The full verbatim text as translated.
- translated.unigrams: A textual array of single translated words within the data stream. Stratifyd calculates the total number of words and the number of unique values. Useful in a word cloud viewed with filters on average sentiment.
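To make the field list concrete, here is an illustrative record written as a Python dictionary. All values are invented, the flat dotted-key layout is an assumption about how the fields are exposed, and the actual tokenization of unigrams and n-grams may differ. The helper function is hypothetical.

```python
# Illustrative record only: values are invented for this sketch.
record = {
    "language.code": "fr",
    "language.name": "French",
    "translated.language.code": "en",
    "translated.language.name": "English",
    "translated.text": "The delivery was very fast",
    # Real tokenization and stop-word handling may differ.
    "translated.unigrams": ["delivery", "fast"],
    "translated.ngrams": ["delivery fast"],
}


def language_pair(rec):
    """Return (detected language, Translate To language) for a record."""
    return rec["language.name"], rec["translated.language.name"]


print(language_pair(record))  # ('French', 'English')
```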
To translate text
You can translate text when creating an Auto-Topic Predictive Model (Unsupervised NLU) or an AI Enhanced Taxonomy (Semi-automatic Taxonomy) model, or when running an N-Gram Generator, Taxonomy Analysis, or Sentiment Analysis. Here are the steps for the Auto-Topic Predictive Model (Unsupervised NLU).
1. Open the dashboard to which you want to add the analysis and open the data settings panel.
2. Select the deployed model to which you want to apply translation, or deploy a new model.
3. In the model dialog box that appears, switch to advanced settings and click the Translation tab.
4. Set the translation options:
- Translate Options: Select Translate Text Input (or Translate Model Output)
- Translate Engine: Select Google Translate or Microsoft NMT
- Translate To: Select the language into which you want to translate text
- Translate From: Optionally select a single language from which to translate, or leave blank to use Language Detection to translate from multiple languages
Note that some of the languages in the drop-down list may not be supported for every Translate Engine. Please refer to the list of languages supported for each Translate Engine.