How accurately a model makes predictions depends on the distribution of the data on which it trains. Because the distribution of data drifts over time, it is good practice to monitor incoming data and, once it has deviated substantially, to retrain your model on newer data. You should also retrain your model after manually setting sentiment values and negation words, or after adding labels to semi-automatic taxonomies. Once you have a feel for how often this is appropriate, you can schedule time intervals at which to retrain your models automatically and filter the data on which to retrain.
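Stratifyd handles this monitoring and scheduling for you, but for intuition, a drift check over a categorical field might look like the following minimal sketch. The Population Stability Index (PSI) metric and the 0.2 threshold are common rules of thumb, not Stratifyd internals, and the sample data is hypothetical:

```python
import math
from collections import Counter

def psi(baseline, recent, eps=1e-6):
    """Population Stability Index between two categorical samples.
    Scores above ~0.2 are often read as substantial drift."""
    b, r = Counter(baseline), Counter(recent)
    score = 0.0
    for category in set(b) | set(r):
        p = b[category] / len(baseline) + eps
        q = r[category] / len(recent) + eps
        score += (q - p) * math.log(q / p)
    return score

# Hypothetical usage: compare last month's topics with this week's.
baseline = ["billing", "billing", "outage", "praise"]
recent = ["outage", "outage", "outage", "billing"]
if psi(baseline, recent) > 0.2:
    print("Distribution has drifted substantially; consider retraining.")
```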
Set sentiment and negation words
You can customize Unsupervised NLU models by setting the sentiment value of specific words and setting negation words that may be specific to your industry.
To set sentiment and negation words
1. Open a dashboard with a model that you want to train and click the Data tab.
2. Click the tab for the model that uses text to calculate sentiment (for example, an Unsupervised NLU Model), and click a tile that contains text for which you want to provide feedback.

3. In the Document Detail dialog that appears, you can see all of the information for the selected record. In the Sentiment Explanation box, select a word or phrase on which to provide feedback, then right-click and select Set sentiment value or (when selecting a single word) Set as negation word.
4. If you select Set as negation word, the word is set immediately with no further interaction. If you select Set sentiment value, the Set Sentiment dialog appears where you can set a sentiment value between -5 and 5, where values from 0 to 1 are neutral, and then click Add.
5. When you have finished providing feedback on one tile, click the arrow to the right to proceed to the next.
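Stratifyd applies your custom values internally; the sketch below only illustrates the semantics of the two feedback types. The -5 to 5 range and the 0-to-1 neutral band come from the Set Sentiment dialog described above, but the scoring logic and the example words are assumptions for illustration:

```python
# Hypothetical custom sentiment lexicon for a slang-heavy industry, where
# "sick" and "killer" are positive and "hardly" negates the next word.
sentiment_values = {"sick": 4.0, "killer": 3.5, "glitchy": -3.0}  # range -5..5
negation_words = {"hardly", "ain't"}

def score(tokens):
    total, negate = 0.0, False
    for tok in tokens:
        if tok in negation_words:
            negate = True            # flip the polarity of the next scored word
            continue
        value = sentiment_values.get(tok, 0.0)
        total += -value if negate else value
        negate = False
    return total

def polarity(value):
    # Per the Set Sentiment dialog, values from 0 to 1 are neutral.
    if 0 <= value <= 1:
        return "neutral"
    return "positive" if value > 1 else "negative"

print(polarity(score("this update is hardly glitchy".split())))  # positive
```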
Schedule retraining
Why retrain?
When new data is added to a stream, the new data is automatically analyzed using all of the models that are attached to the stream.
For example, if you apply an Unsupervised NLU model to a stream of 1,000 documents, the model first trains using the 1,000 documents, generates a topic model, and then applies the topic model to the 1,000 documents, assigning topics to each document based on the text. If you add another 1,000 documents to this stream, it re-uses the existing topic model, putting the 1,000 new documents into the existing topics, preserving consistent topic numbers.
As you add more and more data to the stream, the original topics may no longer be characteristic of the data. If you want to take all of the existing data into consideration, you can do so by retraining the model. When you retrain a model, it happens in place so that the visualizations remain populated while the data updates.
Note how this differs from scheduling data crawls. When a data connector is set up to automatically refresh data periodically, all data models using the refreshed data stream make predictions on the new data, but they do so using the training they already underwent unless you schedule periodic retraining.
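As a mental model, the difference between applying a model and retraining it might be sketched as follows. The classes and the toy topic assignment are hypothetical; the real topic-modeling internals are not exposed:

```python
class TopicModel:
    """Toy stand-in for the real topic model; assigns stable topic numbers."""
    def __init__(self, documents):
        # Hypothetical: one "topic" per distinct leading word.
        words = sorted({d.split()[0] for d in documents})
        self.topics = {w: i for i, w in enumerate(words)}

    def assign(self, doc):
        return self.topics.get(doc.split()[0], -1)  # -1: no matching topic

class UnsupervisedNLUModel:
    def __init__(self):
        self.topic_model = None

    def apply(self, documents):
        if self.topic_model is None:
            self.topic_model = TopicModel(documents)  # first batch: train
        # Later batches reuse the existing topics, so topic numbers stay
        # consistent as new data arrives.
        return [self.topic_model.assign(d) for d in documents]

    def retrain(self, all_documents):
        # Rebuild the topics from all current data, in place, so dashboard
        # visualizations stay populated while the assignments update.
        self.topic_model = TopicModel(all_documents)
```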
How it works
To facilitate this feature, we store the following information on the model object.
- schedule: the training schedule interval, as an integer count of days, weeks, months, or years
- schedule_last: timestamp of the last scheduled training run, in milliseconds
- schedule_time: timestamp of the next scheduled training run, in milliseconds
- schedule_hash: a hash of the model data from the last scheduled training run
When you schedule retraining and the scheduled time arrives, Stratifyd retrains the model and updates the schedule_last, schedule_time, and schedule_hash values. If there are no changes in the model data, the model version remains unchanged. If the hash value changes, Stratifyd updates the model version.
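A rough sketch of that bookkeeping, using the four fields above (the field names come from this article; the hashing scheme, interval arithmetic, and the `train` callable are assumptions for illustration):

```python
import hashlib
import json
import time

MS = {"days": 86_400_000, "weeks": 604_800_000}  # simplified; months/years vary

def run_scheduled_retrain(model, train):
    """Sketch of the scheduled-retrain bookkeeping described above.
    `model` is a dict carrying the schedule_* fields; `train` is an
    assumed callable that retrains and returns the new model data."""
    now_ms = int(time.time() * 1000)
    if now_ms < model["schedule_time"]:
        return  # the next scheduled run has not arrived yet

    model["data"] = train()
    serialized = json.dumps(model["data"], sort_keys=True).encode()
    new_hash = hashlib.sha256(serialized).hexdigest()
    if new_hash != model["schedule_hash"]:
        model["version"] += 1            # model data changed: new version
        model["schedule_hash"] = new_hash
    # Otherwise the model version remains unchanged.

    model["schedule_last"] = now_ms
    amount, unit = model["schedule"]     # e.g. (2, "weeks")
    model["schedule_time"] = now_ms + amount * MS[unit]
```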
To schedule retraining for a new model
In the advanced section of the model, there is an option to set up a retraining period. If you set up a training period, the model automatically retrains when new data is inserted into the stream after that period expires. The period starts when you apply the model to the stream.
For example, if you apply a Semi-Automatic Taxonomy to a stream at 9 p.m. with a retraining window of 1 day, any data inserted into the stream after 9 p.m. the following day automatically triggers a retrain of the model.
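In other words, the trigger behaves roughly like the following check (a sketch of the rule in this example, not product code):

```python
from datetime import datetime, timedelta

# A model applied to the stream at 9 p.m. with a one-day retraining window
# retrains on the first insert after 9 p.m. the following day.
applied_at = datetime(2023, 5, 1, 21, 0)   # when the model was applied
window = timedelta(days=1)                  # retraining period from Advanced

def triggers_retrain(insert_time, last_trained=applied_at):
    return insert_time > last_trained + window

print(triggers_retrain(datetime(2023, 5, 2, 20, 0)))   # False: inside window
print(triggers_retrain(datetime(2023, 5, 2, 21, 30)))  # True: window expired
```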
1. Begin to create a new model of any type.
2. On the Submit page of the Deploy a New Model wizard, click Advanced.
3. Near the bottom of the Advanced section, under Schedule Model Retrain, select the units of measure for the retraining period (Days by default).
4. In the Schedule Model Retrain box, enter the interval (number of days, weeks, months, or years) at which to retrain the model and click Submit.
To schedule retraining from the Models page
1. From the Home page, select the Models page and then click the tile for the model for which you want to schedule retraining.
2. In the Model Info dialog that appears, near the top, click Schedule Retrain.
3. In the Model Retrain Schedule dialog that appears, select the units of measure for the retraining period (Days by default).
4. Enter the interval (number of days, weeks, months, or years) at which to retrain the model and click Submit.
At the top right corner of the page, a message notifies you that the model retraining is scheduled.
To schedule retraining from a dashboard
If you have an existing model for which you want to schedule retraining, you can set it up from within your dashboard.
1. In your dashboard, click the Data icon to open the Data Ingestion panel, then click the vertical ellipsis button and select Edit.
2. In the Edit Data Streams dialog that appears, click the tile for the model for which you want to schedule retraining.
3. In the Edit model name wizard that appears, click Next, then on the Submit page of the model, click Advanced.
4. Near the bottom of the Advanced section, under Schedule Model Retrain, select the units of measure for the retraining period (Days by default).
5. In the Schedule Model Retrain box, enter the interval (number of days, weeks, months, or years) at which to retrain the model.
6. You can also specify a filter for the data on which to retrain your model. For example, you might want to retrain based only on data coming from a specific country or a specific range of dates. To do so, click Add Filter.
7. If you want to apply the filter to the entire data set instead of just the retraining data, scroll up and clear the checkbox next to Apply training filter to analysis.
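Conceptually, a retraining filter narrows the training data without narrowing the analysis, as in this hypothetical sketch (the field names and values are illustrative):

```python
from datetime import date

# Hypothetical retraining filter: records from one country and date range.
def training_subset(records):
    return [
        r for r in records
        if r["country"] == "US"
        and date(2023, 1, 1) <= r["date"] <= date(2023, 6, 30)
    ]

stream = [
    {"country": "US", "date": date(2023, 3, 4), "text": "..."},
    {"country": "DE", "date": date(2023, 3, 5), "text": "..."},
]
# The model retrains only on the filtered subset. With "Apply training
# filter to analysis" cleared, it still scores every record in the stream.
retraining_data = training_subset(stream)
```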
Model Feedback Loop
Why create feedback?
Finding enough training data to make a model accurate can be difficult, so the feedback loop helps you improve the quality of a model incrementally. Multiple users can create feedback on any supervised model for a manager to review and submit, and you can then schedule the model to retrain on all of that feedback at a convenient time.
You can incorporate feedback from multiple team members into the following types of models:
- AutoLearn
- Random Forest
- Feedforward Neural Network
- Logistic Regression Model
- Semi-Automatic Taxonomy
- Embedding attentive model
- Support Vector Machine
- ZSL model
Feedback takes the form of adding or removing labels, setting the sentiment value of specific words, and setting negation words that may be specific to your industry.
To add or remove labels
1. Open a dashboard with a supervised model that you want to train and click the Data tab.
2. Click the tab for the model that you want to train, and in the *.labels column (where * is the name and type of the model), you can see the Show Model button, indicating that there is no unsubmitted feedback.

3. Mouse over any record for which you want to add a label, and click the Add Label button that appears.

For other types of data, you may see a pencil icon instead of the Add Label button; clicking it opens a dialog where you can modify the label.
4. In the Pick a path dialog that appears, click to select the label that you want to add, or if you have many labels, use the Search box to find it.
5. Your new label is marked with an orange asterisk, and a Submit Feedback button appears in the column header.

6. To remove a label, click the minus sign to its right.
7. A removed label is marked with strike-through text, and the minus sign becomes an undo icon.
Unsaved feedback
If you are called away in the middle of providing feedback before you submit it, the feedback remains in the state in which you left it, even if you close your browser.
One exception: If you clear your browser cache, unsubmitted feedback is lost.
To submit feedback
1. In the *.labels column header, click the Submit Feedback button.
2. In the Add new feedback dialog, any existing feedback lists appear. Select a list to add your feedback to, or click Add to a new list to create a new one.
3. In the dialog that appears, provide a name for the new feedback list and click OK.
A message box informs you that it was added.
How are conflicts handled?
If different people add multiple labels to a single document, all of the labels apply to the document.
Example:
- Document has label A.
- User 1 adds labels B and C.
- User 2 adds labels D and E.
- Result: the document has labels A, B, C, D, and E.
If a person adds a label to a document and another person removes a different label from the document, both the addition and the removal apply.
Example:
- Document has labels A and B.
- User 1 adds label C.
- User 2 removes label A.
- Result: the document has labels B and C.
If a person removes a label from a document, the removal overrides any later addition of the same label unless the removal action is deleted from the feedback list.
Example:
- Document has labels A and B.
- User 1 removes label A.
- User 2 adds label A.
- Result: the document has label B.
- The model owner removes User 1's feedback from the feedback list and retrains the model.
- Result: the document has labels A and B.
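The three rules above can be summarized in a small merge function (a sketch of the documented behavior, not Stratifyd's implementation):

```python
def merge_feedback(original_labels, feedback_items):
    """Merge label feedback per the rules above: all additions apply, and a
    removal overrides any addition of the same label for as long as the
    removal remains in the feedback list."""
    added = {f["label"] for f in feedback_items if f["action"] == "add"}
    removed = {f["label"] for f in feedback_items if f["action"] == "remove"}
    return (set(original_labels) | added) - removed

# Third example above: User 1 removes A, User 2 adds A.
print(merge_feedback({"A", "B"}, [
    {"action": "remove", "label": "A"},
    {"action": "add", "label": "A"},
]))  # {'B'} — re-adding A only takes effect once the removal is deleted
```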
To review feedback
Since a number of people may be providing feedback on a single model, Stratifyd uses queuing to process all of the retraining feedback at once when it is convenient. All team members can see all of the feedback lists when submitting feedback, but only the administrator or model owner can review the list and accept or reject individual feedback items.
1. Log in as the administrator or model owner.
2. On the Home page, click the Advanced tab and select Feedback Lists.

3. Here you can see any feedback lists for all models that you own. Click the feedback list tile that you want to review.

4. In the Edit Custom Feedback List dialog that appears, in the Remove from list column to the far right, click Remove for any feedback that you want to remove from the list.
5. When you have finished reviewing the feedback, click Save. Repeat for any additional feedback lists.
To retrain the model
The administrator or model owner can retrain the model manually, or set up a regular schedule for retraining the model.
Because the model is locked during retraining, it is best to do this at night or on a weekend so that employees can work without disruption. During retraining, anyone trying to access the model is shown a message stating that the model is locked.
1. From the Home page, select the Models tab, and click the tile of the model to retrain.
The Model Info dialog appears.
2. Scroll down to the Utilized Resources section to see all feedback lists, stopwords lists, and other resources associated with the model. Any resources marked "Newly Added" have not yet been incorporated into the model training.
3. To immediately retrain the model, click Retrain.
To set a schedule for retraining, at the top of the dialog click Schedule Retrain. See Schedule retraining for details.
4. In the top right corner of the page, message boxes show you the progress.
5. Once complete, the dialog replaces the "Newly Added" message with the model version number in which the resource was incorporated.