Machine learning startup BigML now supports text data in its cloud-based prediction service. It has always analyzed numerical fields in complex datasets to determine the relationship between them and any given outcome, and now it will consider the importance of words, too.
BigML, a machine-learning-based cloud service that lets users generate statistical predictions from their complex data, has revamped the service to include textual analysis. No, it won’t analyze the sentiment of your tweets or translate your documents into Spanish, but it will use words as variables when getting to the bottom of how your data is connected.
There are a handful of options for customizing the text field, too, such as the ability to pare words down to their stems (e.g., “greatness” becomes “great”). If you’re into accuracy, BigML also now lets users run ensemble models (or forests) and test the accuracy of their models. Users who build models across very large datasets, or who have built BigML predictions into their applications via API, can use a new feature called PredictServer that runs predictions tens of times faster on a dedicated server.
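For a rough sense of what stemming and words-as-variables mean in practice, here is a minimal Python sketch. It is not BigML's implementation: the suffix list and the `naive_stem` and `term_counts` helpers are hypothetical names made up for illustration, and a real stemmer applies far more rules than this.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny suffix list (an assumption for illustration).
SUFFIXES = ("ness", "ing", "ed", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_counts(text: str) -> Counter:
    """Lowercase the text, split it into words, stem each word, and count them."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(naive_stem(token) for token in tokens)


# Each stemmed word becomes a numeric variable a model could use.
counts = term_counts("Greatness is great; great products ship.")
print(counts)
```

Here “greatness” and both occurrences of “great” collapse into a single “great” count, which is the kind of per-word variable a model can then weigh against an outcome.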
As BigML keeps maturing and adding new features, its toughest task might be figuring out its target users and tailoring the experience around them. I like the service, but the more features it adds, the more I can see how a formal grounding in statistics and data analysis would help me make better use of it. Then again, if I had those skills, I might prefer any number of advanced software packages that let me do a whole lot more.