Although we use an intuitive visualization to present the Buzzword results, the analysis behind it goes beyond the simple co-occurrence counts that many other solutions offer. Our Signals engine reads and processes every bi-gram (i.e., two meaningful words that appear together) in every document to identify statistically relevant topics of conversation. In addition, Signals extracts Named Entities from all of the content. The word view (i.e., Word Cloud, List View, or Graph View) is generated from these fine-grained elements. The size of each bi-gram is based on its significance score (a combination of the word frequency and our Stratifyd importance score); a larger font indicates a more relevant bi-gram. Specifically, the analysis runs in the following two steps:
Step 1: The Signals engine starts by performing NLP (supported for over 24 languages). In this step, your input documents are tokenized into N-Grams (N >= 2), lemmatized (inflected or variant forms of the same word are grouped together), and stemmed. Junk and stop words are removed, Part-of-Speech information is extracted, Entity information is parsed, and our other internal processing is applied. The result is a large N-Gram-based content network built from your input data files.
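As a rough illustration of the tokenization and stop-word removal described in Step 1, the following minimal Python sketch extracts bi-grams from a few short documents. It is not the Signals engine: the function names, the tiny stop-word list, and the sample sentences are all hypothetical, and lemmatization, stemming, Part-of-Speech tagging, and entity parsing are left out.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real list is far larger).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "in", "it", "but"}

def extract_bigrams(text):
    """Lowercase, tokenize, drop stop words, and pair adjacent remaining tokens."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return list(zip(kept, kept[1:]))

def count_bigrams(documents):
    """Count how often each bi-gram occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(extract_bigrams(doc))
    return counts

# Hypothetical sample input.
docs = [
    "The battery life is great",
    "Battery life was short but the screen is great",
]
bigram_counts = count_bigrams(docs)
print(bigram_counts.most_common(3))
```

Note that in this sketch bi-grams are formed after stop words are removed, so two words separated only by a stop word end up paired. Whether a production pipeline pairs words before or after filtering is a design choice this example does not settle.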
Step 2: Signals runs a multi-model approach on top of the N-Gram-based content network. This includes our proprietary text analytics algorithms extended from Bayesian Neural Network and Generative Model, LSTM (Long Short-Term Memory), Seq2Seq NLU, and others. The goal is to represent and cluster your data inputs into semantically meaningful groups. These groups are backed by statistical significance (e.g., the number attached to each topic group in the Semantic Category Visualization). The top representative terms and topics are always selected and visualized (i.e., as Buzzwords) so that you can follow these hints to ask "what-if" questions.
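To make the font-sizing rule concrete (bi-gram size follows a significance score that combines word frequency with an importance score), here is a small Python sketch. The exact way Signals combines the two values is not documented here, so the product used below is purely an assumption, as are the function names, the point-size range, and the sample numbers.

```python
def significance(frequency, importance):
    """Assumed combination: a simple product of frequency and importance.
    The actual Stratifyd formula is not specified in this article."""
    return frequency * importance

def font_sizes(scores, min_pt=10.0, max_pt=48.0):
    """Linearly map scores onto a font-size range (hypothetical defaults)."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    sizes = {}
    for term, score in scores.items():
        # If every score is equal, give all terms the largest size.
        frac = (score - lo) / span if span else 1.0
        sizes[term] = min_pt + frac * (max_pt - min_pt)
    return sizes

# Hypothetical bi-grams with (frequency, importance) pairs.
terms = {
    "battery life": (40, 0.9),
    "screen size": (25, 0.5),
    "customer service": (10, 0.8),
}
scores = {t: significance(f, w) for t, (f, w) in terms.items()}
sizes = font_sizes(scores)
for term, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {size:.1f} pt")
```

The linear scaling is only one option; a logarithmic scale, for example, would keep a few very frequent bi-grams from dwarfing everything else in a word cloud.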