AutoLearn builds custom machine learning models based on your data and chooses the most suitable model or ensemble of models based on the F1 score. After training the model with your data, you can apply the model to a new data stream that has the same input dimensions and similar data.
Why use it
If you are unsure which supervised algorithm is best suited to your data, choose AutoLearn: it compares the results of each algorithm and determines which performs best. This takes more time, so if you already know which algorithm you need, use the corresponding supervised model for faster results.
How it works
We train a model to predict the categories or Key Performance Indicators (KPIs) of unlabeled or unscored records. For training, we use a sample from a large dataset. Once the model is trained, you can apply its findings either to the large dataset itself, or to other datasets with text similar to the training dataset.
After pre-processing to extract up to three types of feature vectors, AutoLearn trains a classification model. It holds out 20% of the training data for validation and trains on the other 80%, running all of the following machine learning (ML) models:
- Random Forest
- Feedforward Neural Network
- Logistic Regression
- Semi-Automatic Taxonomy
- Label-Embedding Attentive Model (LEAM)
- Support Vector Machine
- ZSL model
AutoLearn evaluates the performance of each model, as well as an ensemble of the models, using a weighted average of F1 scores for each, and then selects the model or ensemble of models that has the highest F1 score in predicting the KPI or Category field.
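The selection step can be illustrated with a minimal sketch. It assumes that "weighted average" means each class's F1 score is weighted by the number of records in that class (the article does not specify the weighting), and the labels, model names, and predictions below are hypothetical examples, not output from the platform.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (assumed weighting)."""
    support = Counter(y_true)  # number of true records per class
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += f1 * count / total
    return score


def select_best(y_true, predictions_by_model):
    """Return (name, score) of the candidate with the highest weighted F1."""
    scores = {name: weighted_f1(y_true, preds)
              for name, preds in predictions_by_model.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical ground-truth labels for the held-out validation records,
# and hypothetical predictions from three candidates.
y_val = ["high", "high", "low", "low", "low"]
candidates = {
    "random_forest": ["high", "low", "low", "low", "low"],
    "logistic_regression": ["high", "high", "low", "low", "high"],
    "ensemble": ["high", "high", "low", "low", "low"],
}
best_name, best_score = select_best(y_val, candidates)
print(best_name, round(best_score, 3))
```

In this toy data the ensemble matches every validation label, so it is the candidate that would be selected.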
An F1 score is an Inference Score, a proxy for accuracy. For example, an F1 score of 75% does not guarantee that the model is 75% accurate at predicting labels or KPIs on every data stream, only that it scored 75% on the training data. (To find the actual accuracy of an AutoLearn model, you would need to hand-score every record, so we implicitly trust the F1 score.)
Accuracy depends on the size of the training dataset provided to the AutoLearn model. If you use only 300 records for training and then apply the model to a data stream with 300,000 records, the results may be disappointing, even if the F1 score is high for the 300 records. It is best not to apply an AutoLearn model trained on a small number of records to a huge amount of data unless you are quite confident that the textual data in the large dataset is very similar in nature to the training dataset.
Stratifyd supplies an out-of-the-box Neural Sentiment model, an AutoLearn model that is pre-trained on 40 million customer reviews, that you can use for data that is similar to a customer review.
When you deploy the AutoLearn model to a data stream that is similar to the training data stream, it generates results that you can then reincorporate into your AutoLearn model. Adding these records to the training dataset makes the model larger and more accurate over time.
Model Input
The model requires you to choose the Ground Truth and then choose at least one of either an Unstructured Text field or a Training Feature field.
Field | Description |
Ground Truth | A field that objectively measures how the customer feels (e.g., Star Rating, NPS, or CSAT score). |
Unstructured Text | A field that collects free-form customer feedback (i.e., text the customer types rather than selects from a list). |
Training Feature | A field that collects structured customer data (i.e., data that is numerical or selected from a set of options). |
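The selection rule above (a Ground Truth plus at least one Unstructured Text or Training Feature field) can be expressed as a small check. This is only an illustrative sketch: the function name, its parameters, and the example field names are hypothetical and are not part of the Stratifyd platform.

```python
def validate_inputs(ground_truth, unstructured_text=None, training_features=()):
    """Return a list of problems with a field selection (empty if valid)."""
    problems = []
    if not ground_truth:
        problems.append("Ground Truth is required.")
    if not unstructured_text and not training_features:
        problems.append(
            "Select an Unstructured Text field or at least one Training Feature."
        )
    return problems


# Hypothetical field names from a chat dataset with a CSAT score.
print(validate_inputs("CSAT", unstructured_text="Chat Transcript"))
print(validate_inputs("CSAT"))
```

The first call passes because a text field accompanies the Ground Truth; the second reports a problem because neither kind of input field was chosen.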
Model Output
The AutoLearn Model returns the following fields for you to use in widget visualizations in your dashboard analysis.
Field | Description |
Confidence | The confidence level about the accuracy of the prediction, on a scale of 0 to 100. |
Keywords | A textual array of the most important words within the data stream. In supervised models, keywords are the terms that the model finds and uses to predict each label. Keywords are generated only if you select an Unstructured Text field. If you select only a Training Feature (that is, selectable values, dates, or numbers), no keywords are generated. |
Label | The predicted output based on what the model learned from your input data. |
Language Code | The two-letter language code of the detected language (e.g., en, fr, jp). See Language detection for more information. |
Language Name | The full name of the detected language (e.g., English, French, Japanese). |
Tokenized | A list of every word detected in the corpus. Tokenizing is trivial for languages that use spaces between words. Languages that have no spaces between words and allow multi-character words each require a custom tokenizer. |
Unigrams | A textual array of single words within the data stream. Stratifyd calculates the total number of words and the number of unique values. Useful in a word cloud viewed with filters on average sentiment. |
Data | A table containing all of the original data from the data stream, plus all of the analyzed data from the model. |
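As an illustration of how the Confidence and Label fields might be used together, the sketch below separates predictions into confident ones and ones that may deserve a manual check. The record shape (a dictionary keyed by the output field names), the threshold of 80, and the sample values are all assumptions made for this example; they do not describe the platform's export format.

```python
def split_by_confidence(records, threshold=80):
    """Separate predictions into confident ones and ones to review by hand."""
    confident, review = [], []
    for record in records:
        if record["Confidence"] >= threshold:
            confident.append(record)
        else:
            review.append(record)
    return confident, review


# Hypothetical model output rows using the field names from the table above.
rows = [
    {"Label": "5", "Confidence": 93, "Keywords": ["helpful", "quick"]},
    {"Label": "1", "Confidence": 41, "Keywords": ["refund"]},
    {"Label": "4", "Confidence": 80, "Keywords": []},
]
confident, review = split_by_confidence(rows)
print(len(confident), "confident,", len(review), "to review")
```

Rows at or above the threshold are treated as confident, so the row with a Confidence of exactly 80 lands in the first group.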
How to create and deploy an AutoLearn model
You can create a model from within a workspace, or you can add one to the Models page.
Creating an AutoLearn Model from the Models Page
1. On the Models page, ensure you are on the Predictive Models tab. Then, click the blue plus sign in the bottom right.
2. In the dialogue that opens, select AutoLearn.
3. In the next window, select the data stream you wish to use for training your model. In this example, we'll be using Chat Data that includes a CSAT score, in order to predict a CSAT score for chats from the same source that lack a score.
4. In the next window, select your Training Feature(s), Unstructured Text field, and Ground Truth. Then, click Next.
5. Lastly, enter a name for your model and click Submit. Your model will begin to train.
Creating an AutoLearn Model from a Workspace
1. In your workspace, navigate to the Settings tab.
2. In the center of the page, click the dropdown reading "I'm a data expert, show me what else you have." Select "Train a New Model" to open the dialogue mentioned in Step 2 of the process for creating a model from the Models page, and follow the same process.
Deploying an AutoLearn Model
1. In your workspace, navigate to the Settings tab. Select the data stream on which you wish to deploy your AutoLearn model.
2. In the center of the screen, under "I'm a data expert, show me what else you have," select "Use Custom Model."
3. In the dialogue that opens, select the AutoLearn model you wish to use.
4. On the next page, select the fields you wish to use that correspond to the fields used in training your model. In this case, we are using a similar dataset with empty values for CSAT, so the platform will automatically fill in corresponding fields. When finished, click "Start Analysis."
5. To start processing, click Save.
Advanced Settings
You can set the following properties in the Advanced section of the dialog when creating a new model or deploying a model.
Field | Description |
Ratio of training to validation set | The ratio that determines how much of the dataset is used as a training set and how much is used to validate the results. |
Minimum number of records for a field | Set the minimum number of records for a field to be considered for analysis. This prunes any classes with fewer records than the value you set here. |
Resampling | If your data is skewed towards one class or another, you can resample your data and adjust the class distribution for better results. |
Apply training filter to analysis | Apply your custom data training filter to your analysis results. (See Add Filter below.) |
Run Language Detection | Run language detection on the source text. |
Default Language | Assume this language if language detection fails. This is used to select a language-specific stopword list to apply to clean text. |
Stopwords | Apply custom stopword lists to remove non-informative words from categorization. |
Chinese Dictionary | Customize Chinese tokens that our engine uses when creating key n-grams in your analysis. |
Schedule Model Retrain | Specify the number of days, weeks, months, or years after which to retrain your model. |
Add Filter | Select a field on which to filter training data. If Apply training filter to analysis is selected above, the filter also applies to your analysis results. |
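To make the first three settings more concrete, here is a rough sketch of pruning rare classes, splitting by a training ratio, and resampling. It is not the platform's implementation: the article does not say how resampling is performed, so random oversampling is used here as one common approach, and the function names, the `csat` field, and the sample data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def prune_rare_classes(records, label_key, min_records):
    """Drop records whose class has fewer than min_records examples."""
    counts = Counter(r[label_key] for r in records)
    return [r for r in records if counts[r[label_key]] >= min_records]


def split(records, train_ratio, rng):
    """Shuffle a copy of the records and split it into training and validation sets."""
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def oversample(records, label_key, rng):
    """Randomly duplicate minority-class records until all classes match the largest."""
    by_class = defaultdict(list)
    for r in records:
        by_class[r[label_key]].append(r)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


rng = random.Random(0)  # fixed seed so the example is repeatable
data = [{"csat": "high"}] * 8 + [{"csat": "low"}] * 4 + [{"csat": "n/a"}]
kept = prune_rare_classes(data, "csat", min_records=2)  # drops the lone "n/a"
train, validation = split(kept, train_ratio=0.8, rng=rng)
balanced_train = oversample(train, "csat", rng)
print(len(kept), len(train), len(validation))
```

In this sketch only the training portion is resampled, so that duplicated records cannot appear in both the training and validation sets.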
Deploying an AutoLearn model - Best Practice
To deploy an existing model, use the ALL tab in the AI model panel. This is different from deploying a new model: clicking the Predictive Model + button on the AI panel prompts you to create an entirely new model.
NOTE: If you intend to use an existing AI model, deploy it from the ALL tab rather than creating a new one.

Further questions?
We're here to help! Don't hesitate to contact us for further assistance via chat or submit a ticket!