N-grams are sequences of one or more consecutive words: 1 is a unigram, 2 a bigram, 3 a trigram, and so on. You will rarely go beyond 3 unless you are looking for a specific slogan or turn of phrase. N-grams are not restricted to any part-of-speech class, so you are going to get any and all strings. This matters because you can often filter out all words of a particular part-of-speech class (nouns, verbs, adjectives, adverbs, etc.) to improve your signal-to-noise ratio.
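As a rough illustration of the idea, here is a minimal Python sketch that builds n-grams from a whitespace-split sentence and then filters by part of speech before building them again. The helper names `ngrams` and `drop_pos`, the sample sentence, and the hand-written `TOY_POS` lookup are all assumptions made up for this example; in practice you would probably get the tags from a real part-of-speech tagger.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens, in order, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Toy part-of-speech lookup, hand-written for this example only (hypothetical).
TOY_POS = {
    "the": "DET", "quick": "ADJ", "fox": "NOUN", "jumps": "VERB",
    "over": "ADP", "lazy": "ADJ", "dog": "NOUN",
}


def drop_pos(tokens, unwanted):
    """Remove tokens whose toy part-of-speech tag is in the unwanted set."""
    return [t for t in tokens if TOY_POS.get(t) not in unwanted]


# Naive whitespace tokenization; real text needs a proper tokenizer.
tokens = "the quick fox jumps over the lazy dog".split()

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

# Drop determiners and prepositions first, then build bigrams from what is left.
filtered_bigrams = ngrams(drop_pos(tokens, {"DET", "ADP"}), 2)

print(bigrams)
print(filtered_bigrams)
```

Note that filtering before building the n-grams can pair up words that were not adjacent in the original text ("jumps" and "lazy" here). If you need true adjacency, build the n-grams first and then discard the ones that contain an unwanted word.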
Via Ιoannis Saridakis