How does Translate Text work?
This analysis uses a third-party translation service, either Google Cloud Translation or Microsoft Translator, to translate text into the language of your choice during data ingestion. Although Translate Text appears in the list of Basic Analyses in the Deploy Model dialog, it is not supported as a stand-alone analysis. Instead, you enable it through the advanced settings of the following models and basic analyses:
- Topic Models (Theme Detection, Theme Summarization, Emerging Theme Analysis)
- Emotion Recognition
Text translation involves an up-charge, as it uses a third-party translation service. If you want to use translation, please speak with your Stratifyd representative.
Note that translation is charged per million characters. With very large datasets, this can add up very quickly, so it is prudent to limit the number of users who have translation enabled.
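As a rough way to gauge how per-million-character pricing scales with dataset size, here is a minimal sketch. The function name and the rate are hypothetical; the actual rate comes from your Stratifyd agreement, and the translation provider may count characters slightly differently than Python's len().

```python
def estimate_translation_cost(records, price_per_million_chars):
    """Return (total characters, estimated cost) for an iterable of text records.

    price_per_million_chars is a placeholder rate, not a published price.
    """
    total_chars = sum(len(text) for text in records)
    cost = total_chars / 1_000_000 * price_per_million_chars
    return total_chars, cost


# Hypothetical dataset: 200,000 short records of 25 characters each.
records = ["Great service, thank you!"] * 200_000
chars, cost = estimate_translation_cost(records, price_per_million_chars=20.0)
print(f"{chars:,} characters, estimated cost {cost:.2f}")
```

Running the same estimate on a sample of your own data before enabling translation gives a quick sense of the expected volume.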
Translate Text uses your settings to determine the Translate From language. You can optionally specify a single language from which to translate, or if you leave that setting blank, we use Language Detection to find the Translate From language. Here are some things to note about the way Language Detection works:
Language Detection only works on records with more than eight words.
If Language Detection cannot detect the language for a record, it marks the record as the language specified in the Default Language setting. If that matches your Translate To setting, it does not translate that record.
If there are mixed languages within a single record, Language Detection marks the record as the predominant language in that record. For example, if there are eight English words, three Chinese words, and two Spanish words, it marks the record as English.
If Language Detection finds that the language for a record matches the Translate To language setting, it does not translate that record.
This means that if you have mixed languages in a single record, and the predominant language matches your Translate To setting, that record is not translated.
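The notes above can be sketched as a small decision function for the case where Translate From is left blank. This is an illustration only, not Stratifyd's implementation: the function names are hypothetical, the per-word language labels stand in for a real detector that works on raw text, and treating records of eight words or fewer as undetectable is an assumption drawn from the first note.

```python
from collections import Counter


def detect_language(word_langs, default_language):
    """Return the predominant language, or the default if detection is not possible.

    word_langs is a hypothetical list with one language code per word.
    """
    # Assumption: records with eight words or fewer cannot be detected.
    if len(word_langs) <= 8:
        return default_language
    # The most frequent language in the record wins.
    return Counter(word_langs).most_common(1)[0][0]


def should_translate(word_langs, translate_to, default_language):
    """A record is skipped when its language matches Translate To."""
    return detect_language(word_langs, default_language) != translate_to


# Mixed record from the example above: 8 English, 3 Chinese, 2 Spanish words.
mixed = ["en"] * 8 + ["zh"] * 3 + ["es"] * 2
print(should_translate(mixed, translate_to="en", default_language="en"))  # False
print(should_translate(mixed, translate_to="fr", default_language="en"))  # True
```

In the first call the Chinese and Spanish words in the record stay untranslated, because the record as a whole is marked English.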
Stratifyd includes an extensive sentiment dictionary for each of the 27 supported languages, but you can also customize sentiment dictionaries to account for terms that are specific to your organization.
Enable translation by changing the Translate Options setting. Doing so enables other translation settings.
None (default): No translation is performed.
Translate Text Input: Translates the input text before running analysis.
Translate Model Output: Translates the model output text after running analysis.
One text field is required. Any field that you choose is mapped as a text field. If you choose multiple fields, it concatenates all of the text and treats it as a single mass of text for each record.
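To illustrate how multiple text fields become one block of text per record, here is a minimal sketch. The field names, the function name, and the single-space separator are assumptions for illustration; the separator Stratifyd actually uses is not documented here.

```python
def build_text_input(record, text_fields, separator=" "):
    """Concatenate the chosen fields of one record into a single string.

    Empty or missing fields are skipped.
    """
    parts = [str(record[field]).strip() for field in text_fields if record.get(field)]
    return separator.join(parts)


# Hypothetical record with two text fields and one numeric field.
record = {
    "title": "Late delivery",
    "comment": "Package arrived two days late.",
    "rating": 2,
}
combined = build_text_input(record, ["title", "comment"])
print(combined)
```

Because the combined text is what gets translated, every extra field you map also adds to the character count that is charged.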
The analysis adds the following fields, along with related fields, to those returned by the models mentioned above.
language.code: The two-letter language code of the detected language, e.g. en, fr, ja. See Language Detection for more information.
language.name: The full name of the detected language, e.g. English, French, Japanese.
translated.language.code: The two-letter language code of the translated (Translate To) language, e.g. en, fr, ja.
translated.language.name: The full name of the translated (Translate To) language, e.g. English, French, Japanese.
translated.ngrams: Sets of translated words (two by default) that appear together frequently in the corpus. Note that this is untranslated if you use Translate Model Output.
translated.text: The full verbatim text as translated.
translated.unigrams: A textual array of single translated words within the data stream. Stratifyd calculates the total number of words and the number of unique values. Useful in a word cloud viewed with filters on average sentiment.
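To make the unigram and n-gram fields more concrete, the sketch below counts single words and two-word sets in a couple of hypothetical translated records. The tokenization (lowercased, split on non-word characters) and the function names are assumptions; Stratifyd's own processing is likely more sophisticated.

```python
import re
from collections import Counter


def unigrams(text):
    """Split text into lowercase single words (simplified tokenization)."""
    return re.findall(r"\w+", text.lower())


def bigram_counts(texts):
    """Count adjacent word pairs across all records."""
    counts = Counter()
    for text in texts:
        words = unigrams(text)
        counts.update(zip(words, words[1:]))
    return counts


# Hypothetical translated.text values.
texts = ["The battery life is great", "Battery life could be better"]
all_words = [word for text in texts for word in unigrams(text)]
print(len(all_words), len(set(all_words)))        # total and unique word counts
print(bigram_counts(texts).most_common(1))        # most frequent two-word set
```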
To translate text
You can translate text when deploying one of the models mentioned above. Here are the steps with Emotion Recognition.
1. Open the workspace to which you want to add the analysis and open the data settings panel.
2. Select the deployed model you want to apply translation to, or deploy a new model.
3. In the model dialog box that opens, select Switch to advanced setup, and go to the Translation tab.
4. Set the translation options:
Translate Options: Select Translate Text Input (or Translate Model Output)
Translate Engine: Select Google Translate or Microsoft NMT
Translate To: Select the language into which you want to translate text
Translate From: Optionally select a single language from which to translate, or leave blank to use Language Detection to translate from multiple languages
Note that some of the languages in the drop-down list may not be supported for every Translate Engine. Please refer to the list of languages supported for each Translate Engine.
Additional Settings: Detect Threshold
Topic Models have an additional setting, Detect Threshold, that gives you more control over which records are translated. When a record passes through Language Detection as part of the translation process, it is assigned a confidence value between 0 and 1. This value indicates how confident the system is in the language it detected.
Sometimes, verbatims that are short or contain mixed languages are assigned the wrong language, or too high a confidence value, and may not be translated. Raising the Detect Threshold forces such records to be translated regardless.
Records with confidence values above the Detect Threshold will not be translated. For instance, if a record has a 0.95 confidence value, but the Detect Threshold is 0.9, the record will not be translated. However, if the Detect Threshold is raised to 0.96, the record will be translated.
The default value for Detect Threshold is 0.9. Raising the threshold is a useful troubleshooting step when too few records are being translated, for example because their verbatims are short.
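The threshold rule described above can be sketched as a single comparison. This is an illustration under stated assumptions: the function and constant names are hypothetical, and the behavior when the confidence value exactly equals the threshold is not documented, so it is treated here as "translated".

```python
DEFAULT_DETECT_THRESHOLD = 0.9  # default value stated above


def is_translated(confidence, detect_threshold=DEFAULT_DETECT_THRESHOLD):
    """Records with confidence above the threshold are not translated.

    Assumption: a confidence exactly equal to the threshold is translated.
    """
    return not confidence > detect_threshold


# The example from the text: a record with a 0.95 confidence value.
print(is_translated(0.95))                          # False with the default 0.9
print(is_translated(0.95, detect_threshold=0.96))   # True once the threshold is raised
```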