If you listen to analysts talk about complex data, they all agree: it is growing, and faster than anything that came before. Complex data can mean many things, but to our research group, the ever-increasing volume of naturally occurring human text and speech, from blogs to YouTube videos, enables novel questions for Natural Language Processing (NLP). The dominant characteristic of these new questions is making sense of large amounts of data in many different forms, and extracting useful insights from it.
NLP is hot and getting hotter
NLP is a highly interdisciplinary field of study, drawing on concepts and ideas from Mathematics, Computer Science, and Linguistics. Naturally occurring instances of human language, be it text or speech, are growing at an exponential rate, driven by the popularity of the Web and social media. At the same time, people are increasingly reliant on internet services to search, filter, process, and in some cases even understand the subset of such instances they encounter in their daily lives. Whether you think about it or not, the services that let you do so much with language every day are generally solving well-defined NLP problems that remain under active research.

To put this into context, consider an example. Suppose a blogger is trying to gather the latest information on the earthquake in Chile. Her workflow might consist of the following sequence of web-based tasks. With each task, we include the name of the specific NLP problem being solved by the service performing the task:
• “Show me the 10 most relevant documents on the web about the earthquake in Chile” (Information Retrieval)
• “Show me a useful summary of these 200 news articles about the earthquake in Chile” (Automatic Document Summarization)
• “Translate this Spanish blog into English so I can get the latest information about the earthquake in Chile” (Machine Translation)
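To make the first of these tasks concrete, here is a minimal sketch of the core idea behind Information Retrieval: ranking documents by their relevance to a query. It uses TF-IDF weighting with cosine similarity over a handful of invented example documents; real search engines use far more sophisticated models, and all document and query strings below are illustrative assumptions, not data from any actual service.

```python
import math
from collections import Counter

def tokenize(text):
    """Naive whitespace tokenizer; real systems handle punctuation, stemming, etc."""
    return text.lower().split()

def tfidf_vectors(texts):
    """Build one TF-IDF vector (term -> weight) per input text."""
    tokenized = [tokenize(t) for t in texts]
    n = len(tokenized)
    df = Counter()                      # document frequency of each term
    for toks in tokenized:
        df.update(set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)              # term frequency within this text
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Invented toy corpus: two relevant news snippets and one irrelevant one.
docs = [
    "earthquake strikes chile coastal towns evacuated",
    "chile earthquake relief efforts continue",
    "new smartphone released this week",
]
query = "earthquake in chile"

# Vectorize documents and query together so they share the same IDF statistics.
vecs = tfidf_vectors(docs + [query])
doc_vecs, query_vec = vecs[:-1], vecs[-1]

# Rank document indices by similarity to the query, most relevant first.
ranked = sorted(range(len(docs)),
                key=lambda i: cosine(query_vec, doc_vecs[i]),
                reverse=True)
print("Most relevant:", docs[ranked[0]])
```

Running this ranks the two earthquake snippets above the smartphone one, which shares no terms with the query. The other two tasks, summarization and translation, rest on the same foundation of statistical modeling over text, but each with its own specialized techniques.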