There’s a lot of junk on the web. There is also a lot of good stuff on the web. And then there is the stuff that’s been lifted from the good and dropped amid the dross—the aggregation, the block-quotes, the straight-off copy-paste jobs.
The extent of that duplication now has a number: according to Matt Cutts, a longtime Google search engineer who developed Google’s family-friendly “SafeSearch” filter and who now leads Google’s web spam team, “something like 25% or 30% of the web’s content is duplicate content.”
That’s not necessarily a bad thing. Not all of the duplication is plagiarized or hastily created traffic-seeking junk. Examples of inoffensive duplication include quotes from blogs that link back to the original blog, or the thousands of pages of technical manuals scattered across the web that are updated with small changes but remain largely the same.
Via Jeff Domansky
Fascinating research and interesting reading for all content producers.
The 25%-30% figure sometimes seems low to me; then again, I hate finding some splogger with my stuff, so my ire may be skewing how I weigh those numbers.