Text analytics is the latest tool for business intelligence. Sources such as blogs, websites, Twitter, Word documents, PDF documents, and comment fields in customer service databases can all be mined for value. The text can be categorized, interpreted with sentiment analysis, and displayed as a concept web (image below). There is an incredible amount of value in text mining, and with that value comes the cost of doing the work. Below are the items you must maintain to get the full value out of text analytics.
Dictionaries
Text analytics tools often come with built-in dictionaries. These allow you to plug and play with your data sources. The challenge is that not all dictionaries will fit your specific needs. Basic English and other general-purpose dictionaries will not sort your text corpora into the specific positive and negative sentiment categories you need. You will therefore need to modify your dictionaries and continuously update them with emerging terms.
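As a rough illustration of why a stock dictionary falls short, the sketch below scores one comment against a small base lexicon and then against the same lexicon extended with domain terms. The lexicon contents, the weights, and the function names are all hypothetical and not taken from any particular tool.

```python
import re

# Hypothetical starting lexicon, standing in for a tool's built-in dictionary.
BASE_LEXICON = {"good": 1, "great": 1, "bad": -1, "terrible": -1}

# Hypothetical domain terms a team might add after reviewing new text.
DOMAIN_TERMS = {"laggy": -1, "refund": -1, "seamless": 1}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score(text, lexicon):
    """Sum the sentiment weights of every token found in the lexicon."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def unknown_terms(text, lexicon):
    """Return tokens missing from the lexicon, as candidates for review."""
    return sorted({t for t in tokenize(text) if t not in lexicon})


comment = "The app is laggy and I want a refund"
custom_lexicon = {**BASE_LEXICON, **DOMAIN_TERMS}

base_score = score(comment, BASE_LEXICON)      # domain terms are invisible here
custom_score = score(comment, custom_lexicon)  # domain terms now count
```

With the base lexicon the comment scores as neutral, while the extended lexicon scores it as negative. Reviewing the output of `unknown_terms` on a regular basis is one simple way to spot emerging terms that may belong in the dictionary.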
Categories
Systems can build categories automatically and categorize terms based on the dictionary. This doesn’t always build the categories to the exact specification you need. The categories will need to be modified to fit your specific business needs, so that the right sub-categories sit within the main categories. Synonymous terms need to be linked so that the system knows how to build the categories. It takes a consistent effort to make sure terms are in the appropriate categories.
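One way to picture the synonym linking and the category hierarchy is a small lookup, sketched below. The synonym map, the category names, and the `categorize` function are made up for illustration; a real system would hold far more terms.

```python
# Hypothetical synonym map: each variant points to one canonical term.
SYNONYMS = {
    "invoice": "billing",
    "charge": "billing",
    "charged": "billing",
    "courier": "shipping",
    "delivery": "shipping",
}

# Hypothetical hierarchy: main category -> sub-category -> canonical terms.
CATEGORIES = {
    "Account": {"Payments": {"billing"}, "Access": {"login", "password"}},
    "Fulfilment": {"Logistics": {"shipping"}},
}


def categorize(term):
    """Return (main category, sub-category) for a term, or None if unmapped."""
    lowered = term.lower()
    canonical = SYNONYMS.get(lowered, lowered)
    for main, subcategories in CATEGORIES.items():
        for sub, terms in subcategories.items():
            if canonical in terms:
                return (main, sub)
    return None  # unmapped terms need a manual category decision
```

Here `categorize("Invoice")` resolves through the synonym map to the Payments sub-category, while a term such as "warranty" returns `None` and signals that someone needs to decide where it belongs.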
Text Linking
Linking terms together allows you to gain insights into how frequently terms appear together. A concept web, such as the one above, allows you to see how concepts are linked. You can gain insights quickly by understanding the linked concepts. As data is added, new terms will emerge. Your team will need to make sure these terms are linked to the right concepts so that you keep getting the maximum value out of your analysis.
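The link frequencies behind a concept web can be approximated by counting how often two concepts appear in the same document. The sketch below assumes simple whitespace tokenization and a hand-picked concept set; the sample documents and the `link_counts` function are hypothetical.

```python
from collections import Counter
from itertools import combinations


def link_counts(documents, concepts):
    """Count how many documents mention each pair of concepts together."""
    counts = Counter()
    for doc in documents:
        words = set(doc.lower().split())
        # Sort so each pair is always stored in the same order.
        present = sorted(words & concepts)
        counts.update(combinations(present, 2))
    return counts


docs = [
    "battery drains fast after update",
    "update broke the battery indicator",
    "screen is bright",
]
concepts = {"battery", "update", "screen"}
links = link_counts(docs, concepts)
```

In this sample, "battery" and "update" are linked in two documents and "screen" is linked to nothing. Each pair count could serve as the weight of an edge in a concept web.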
Rules
Rules determine which sentiment bucket a specific piece of text belongs to. Rules can be assembled to send terms to positive sentiment, negative sentiment, and emerging terms buckets. The complexity of your text analytics application will determine how many rules need to be managed. Rules will need to be updated so that data keeps going into the appropriate buckets when new text emerges.
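A minimal sketch of rule-based routing follows, assuming an ordered list of regular-expression rules where the first match wins and anything unmatched falls into the emerging terms bucket. The patterns and the `route` function are illustrative only.

```python
import re

# Hypothetical ordered rules: earlier rules take priority over later ones.
RULES = [
    (re.compile(r"\b(not|never)\s+(good|happy|working)\b"), "negative"),
    (re.compile(r"\b(broken|refund|terrible)\b"), "negative"),
    (re.compile(r"\b(love|great|good|happy)\b"), "positive"),
]


def route(text):
    """Return the bucket of the first matching rule, else 'emerging'."""
    lowered = text.lower()
    for pattern, bucket in RULES:
        if pattern.search(lowered):
            return bucket
    return "emerging"  # no rule matched, so flag the text for review
```

Rule order matters in this design: the negation rule sits first so that "not good" is routed to the negative bucket before the positive rule can match "good". Text that lands in the emerging bucket is a prompt to write or update a rule.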
Conclusion
The three V’s of Big Data (volume, velocity, and variety) are present in text analytics. A sheer volume of information is available from many sources. To obtain value out of text analytics, you need to maintain the dictionaries, categories, text linking, and rules that keep your text mining algorithms up to date. If done correctly, you will gain significant insight into what is said about your product or service. Text analytics can also be used to create predictions and alerts about your products or services. The value is tremendous; the investment is the time spent maintaining and structuring the text analytics system.