Summary
The power of the Stratifyd platform comes from pre-trained data science models that augment your data. With these models you can discover stories in your data, make insight-driven business decisions, and do it all faster than ever.
Model selection is very important, and equally important is how we set up these models. As a general rule, the larger the data set, the longer the processing time.
The characteristics of the raw data matter because they directly impact the time it takes our back end to finish running:
Models - algorithms that learn details about the data.
- Data dimensions affect model processing time because a model's dependencies may require referencing the entire scope of the data; the more data, the longer those dependencies take to resolve.
Analyses - application of a model to a data stream.
- Data dimensions affect the time analyses take to complete because of the number of results a large data set can produce.
- Example: a large data set will produce more taxonomy matches than a smaller one.
Knowing the key factors above, here are some actionable takeaways that can help improve model processing time:
- Select the right model.
- Limit the scope - if you already have results from a model, leverage what is available before running more data through the system.
Ballpark processing times by data size (assuming an average verbatim of ~2,000 characters, 30 fields, and a taxonomy 3 levels deep):
100’s of records:
- Ingestion - < 10 min
- Auto-Topic Predictive Model - < 10 min
- Taxonomy - < 10 min
- Sentiment - < 10 min
1,000’s of records:
- Ingestion - < 10 min
- Auto-Topic Predictive Model - < 20 min
- Taxonomy - < 20 min
- Sentiment - < 15 min
10,000’s of records:
- Ingestion - < 20 min
- Auto-Topic Predictive Model - < 30 min
- Taxonomy - < 20 min
- Sentiment - < 20 min
100,000’s of records:
- Ingestion - 3-6 hr
- Auto-Topic Predictive Model - 2-3 hr (should stick to less than 200K records)
- Taxonomy - < 6 hr
- Sentiment - < 6 hr
1,000,000’s of records:
- Ingestion - 12-18 hr
- Auto-Topic Predictive Model - N/A (should not run this model at this scale)
- Taxonomy - 12-24 hr
- Sentiment - 12-18 hr
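To make these ballpark figures easier to apply, below is a minimal Python sketch that encodes them as a simple lookup keyed by order of magnitude. The dictionary, function name, and model keys are illustrative assumptions for this article only and are not part of the Stratifyd platform or its API.

```python
# Hypothetical helper (not part of the Stratifyd platform): encodes the
# ballpark processing times listed above so you can sanity-check a job
# before running it. Assumes ~2,000-character verbatims, ~30 fields, and
# a taxonomy 3 levels deep, as stated in this article.

BALLPARK = {
    100:       {"ingestion": "< 10 min", "auto_topic": "< 10 min",
                "taxonomy": "< 10 min",  "sentiment": "< 10 min"},
    1_000:     {"ingestion": "< 10 min", "auto_topic": "< 20 min",
                "taxonomy": "< 20 min",  "sentiment": "< 15 min"},
    10_000:    {"ingestion": "< 20 min", "auto_topic": "< 30 min",
                "taxonomy": "< 20 min",  "sentiment": "< 20 min"},
    100_000:   {"ingestion": "3-6 hr",   "auto_topic": "2-3 hr (keep under 200K records)",
                "taxonomy": "< 6 hr",    "sentiment": "< 6 hr"},
    1_000_000: {"ingestion": "12-18 hr", "auto_topic": "N/A (do not run at this scale)",
                "taxonomy": "12-24 hr",  "sentiment": "12-18 hr"},
}

def ballpark_time(record_count: int, model: str) -> str:
    """Return the ballpark time for the largest order-of-magnitude bucket
    that record_count reaches (e.g. 50,000 records falls in the 10,000's)."""
    bucket = 100  # smallest bucket in the table above
    for b in sorted(BALLPARK):
        if record_count >= b:
            bucket = b
    return BALLPARK[bucket][model]

# Example: roughly how long should Sentiment take on ~50,000 records?
print(ballpark_time(50_000, "sentiment"))  # -> "< 20 min"
```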