Summary
The value of the Stratifyd platform comes from pre-trained data science models that augment your data. With these models you can discover stories in your data, make insight-driven business decisions, and do it all faster than ever.
While model selection is important, decisions around model setup are equally important. In particular, consider the size of your dataset: the larger the dataset, the longer the processing time. The characteristics of the raw data matter because they directly affect how long the back end takes to finish processing.
Two definitions (and the processing tasks behind them) are relevant when considering model selection:
Models - algorithms that learn details about the data.
- Data dimensions affect model processing time because a model's dependencies may need to reference the entire scope of the data. The more data, the longer the dependencies take.
Analyses - application of the model to a stream.
- Data dimensions affect how long analyses take to complete because large datasets produce more results.
- Example: A large dataset will get more taxonomy matches than a smaller one.
With these factors in mind, here are some actionable takeaways that can help reduce model processing time:
- Select the right model. For more information, see How do I know which model to use?
- Limit the scope. If you already have results from a model, leverage what is available before running more data through the system.
Estimating Processing Times
See the table below for estimated processing times for different tasks based on the number of records in your dataset. Note that the times listed assume the following characteristics and circumstances:
- The average verbatim size across the dataset is ~2000 characters.
- The dataset contains 30 fields or fewer.
- For taxonomy analyses, the taxonomy logic is no more than 3 layers deep.
Task | 100's of Records | 1000's of Records | 10,000's of Records | 100,000's of Records | 1,000,000's of Records |
Ingestion | Under 10 min | Under 10 min | Under 20 min | 3 – 6 hrs | 12 – 18 hrs |
Topic Model | Under 10 min | Under 20 min | Under 30 min | 2 – 3 hrs (Keep total volume under 200k) | Should not attempt this analysis. |
Taxonomy | Under 10 min | Under 20 min | Under 20 min | Under 6 hrs | 12 – 24 hrs |
Neural Sentiment | Under 10 min | Under 15 min | Under 20 min | Under 6 hrs | 12 – 18 hrs |
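If you want to look up an estimate programmatically, the table can be sketched as a small lookup keyed by task and record count. This is only an illustration, not part of the Stratifyd platform: the names `ESTIMATES`, `bucket_index`, `estimate`, and `unmet_assumptions` are hypothetical, mapping counts under 100 to the first column is an assumption, and treating "~2000 characters" as a hard threshold of 2000 is also an assumption.

```python
# Estimates transcribed from the table above. Each list holds one value per
# column: hundreds, thousands, tens of thousands, hundreds of thousands,
# and millions of records.
ESTIMATES = {
    "Ingestion": ["Under 10 min", "Under 10 min", "Under 20 min",
                  "3 - 6 hrs", "12 - 18 hrs"],
    "Topic Model": ["Under 10 min", "Under 20 min", "Under 30 min",
                    "2 - 3 hrs (keep total volume under 200k)",
                    "Should not attempt this analysis"],
    "Taxonomy": ["Under 10 min", "Under 20 min", "Under 20 min",
                 "Under 6 hrs", "12 - 24 hrs"],
    "Neural Sentiment": ["Under 10 min", "Under 15 min", "Under 20 min",
                         "Under 6 hrs", "12 - 18 hrs"],
}


def bucket_index(record_count):
    """Map a record count to a table column by its order of magnitude."""
    if record_count < 1:
        raise ValueError("record_count must be at least 1")
    magnitude = len(str(int(record_count))) - 1  # 250000 -> 5
    if magnitude > 6:
        # The table stops at millions of records.
        raise ValueError("the table does not cover 10,000,000+ records")
    # Assumption: counts below 100 fall into the "100's of Records" column.
    return max(magnitude - 2, 0)


def estimate(task, record_count):
    """Return the table's estimate for a task at a given record count."""
    return ESTIMATES[task][bucket_index(record_count)]


def unmet_assumptions(avg_verbatim_chars, field_count, taxonomy_depth=None):
    """List which of the table's stated assumptions a dataset does not meet."""
    problems = []
    if avg_verbatim_chars > 2000:  # assumed cutoff for "~2000 characters"
        problems.append("average verbatim size exceeds ~2000 characters")
    if field_count > 30:
        problems.append("dataset has more than 30 fields")
    if taxonomy_depth is not None and taxonomy_depth > 3:
        problems.append("taxonomy logic is more than 3 layers deep")
    return problems


print(estimate("Taxonomy", 250_000))
print(unmet_assumptions(avg_verbatim_chars=3500, field_count=12))
```

If `unmet_assumptions` returns anything, treat the looked-up estimate as optimistic, since the table's times only hold under the listed conditions.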