
How to Look Like a Scraper Site

Ken Gaebler

New Google Algorithm for Scraper Sites
This kid may look innocent enough, but he's still a @#%$! scraper (at least according to my new statistical algorithm) ;-)

Google's newest algorithm cracks down on scraper sites and low-quality content sites. Across the web, site owners who have lost traffic because of the Google algorithm change are reeling.

If you have lost organic SEO traffic from Google, you look more like a scraper site than you realize.

Scraper sites, for the uninitiated, are sites that grab content from other sites and republish it on their own. In some cases this is illegal (a violation of copyright law); in other cases it's legal, but borderline unethical/slimy.

The most prominent recent example of the latter to make the news juxtaposed StackOverflow and eFreedom (you can read more about this particular situation at Hacker News). StackOverflow is a collaboratively edited question and answer site for professional and enthusiast programmers. It's a great site. I use it all the time.

Long story short, StackOverflow was pissed because its original content was being copied by eFreedom (something StackOverflow allows), but then, to the surprise of the folks at StackOverflow, eFreedom was often ranking higher than StackOverflow for that content.

Most would agree that's a bad situation: the original creator of content ranking below a copycat. Google spends a ton of time and money trying to make sure it doesn't happen.

In this instance, Google engineers heard this complaint and decided to do something about it. It's not as simple as that, because they are ALWAYS working to kill bad results, but they acknowledge that the StackOverflow situation was a motivator for the recent upheaval in the Google search results.

This game of cat and mouse has been going on for years. What's new is the algorithm Google rolled out last week, which got rid of scraper sites and low-quality sites (or at least tried to).

Our position, shared by many, is that if you consider your site to be a good site and you lost traffic, you resemble a low-quality site more than you think.

It's similar to email spam filtering algorithms. You can write a perfectly legitimate email, and a spam filter may still think it's spam. The reasons might surprise you. For example, starting an email with "Hi Jim," instead of "Jim," gives your email a higher probability of being flagged as spam. In a world where statistical algorithms rule, things you do innocently can put you in a cluster you would never think you belong to.
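To make the "Hi Jim," point concrete, here is a toy naive Bayes filter, a common textbook approach to spam filtering (not a claim about what any particular mail provider actually runs). The six training emails are made up for illustration. Because "hi" shows up more often in the made-up spam than in the made-up legitimate mail, adding it to an otherwise identical message raises the spam probability.

```python
import math
import re
from collections import Counter

# Hypothetical training data, made up for illustration.
# Real filters learn from many thousands of labeled messages.
SPAM = [
    "Hi Jim, claim your free prize now",
    "Hi friend, free money waiting",
    "Hi there, cheap meds, free shipping",
]
HAM = [
    "Jim, the quarterly report is attached",
    "Jim, can we move the meeting to Friday?",
    "Hi Jim, lunch on Friday?",
]

def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z]+", text.lower())

def word_counts(docs):
    """Count how often each word appears across a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts

def spam_probability(email, spam_docs=SPAM, ham_docs=HAM):
    """Naive Bayes estimate of P(spam | words), with add-one smoothing."""
    spam_counts, ham_counts = word_counts(spam_docs), word_counts(ham_docs)
    vocab_size = len(set(spam_counts) | set(ham_counts))
    spam_total, ham_total = sum(spam_counts.values()), sum(ham_counts.values())
    n_docs = len(spam_docs) + len(ham_docs)
    # Start from the class priors: the share of training emails in each class.
    log_spam = math.log(len(spam_docs) / n_docs)
    log_ham = math.log(len(ham_docs) / n_docs)
    # Add each word's smoothed log-likelihood under each class.
    for word in tokenize(email):
        log_spam += math.log((spam_counts[word] + 1) / (spam_total + vocab_size))
        log_ham += math.log((ham_counts[word] + 1) / (ham_total + vocab_size))
    # Turn the two log scores back into a probability between 0 and 1.
    return 1 / (1 + math.exp(log_ham - log_spam))

plain = spam_probability("Jim, are you free Friday?")
friendly = spam_probability("Hi Jim, are you free Friday?")
print(f"'Jim,'    -> {plain:.2f}")
print(f"'Hi Jim,' -> {friendly:.2f}")  # higher than the line above
```

Nothing about "hi" is spammy in itself; the filter just learned that it co-occurs with spam in its training data. That's the clustering effect in miniature.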

Put yourself in Google's shoes for a moment. Here's a thought exercise for you. Let's say you are a college professor with a class of 100 students. Your students turn in their term papers, and you find 13 papers that are identical and 3 that are very similar to the plagiarized paper.

Who's the original author? Without talking to every student, and just by using data, how would you determine who the original author is? Assume you do have access to other data. For example, you can talk to another professor and find out that they received 10 plagiarized papers and that there was some overlap between your class and the plagiarizers. You also know lots of other things, like what grades the students have gotten, whether they carry a backpack to class or not, etc.
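The first half of the professor's problem, spotting which papers are copies of each other, is the easy part. One standard technique, swapped in here for illustration and not a claim about how Google does it, is to break each document into overlapping word "shingles" and compare the resulting sets with Jaccard similarity. The papers and the 0.4 threshold below are hypothetical.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def near_duplicates(papers, threshold=0.4, k=3):
    """Return (name, name, similarity) for every pair at or above threshold.

    The 0.4 default is an arbitrary cutoff chosen for this toy example.
    """
    sets = {name: shingles(text, k) for name, text in papers.items()}
    names = sorted(sets)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = jaccard(sets[first], sets[second])
            if score >= threshold:
                pairs.append((first, second, round(score, 2)))
    return pairs

# Hypothetical term papers.
papers = {
    "alice": "The French Revolution began in 1789 and reshaped European politics.",
    "bob": "The French Revolution began in 1789 and reshaped European politics.",
    "carol": "The French Revolution began in 1789 and changed European politics forever.",
    "dave": "Photosynthesis converts light energy into chemical energy in plants.",
}
print(near_duplicates(papers))
# [('alice', 'bob', 1.0), ('alice', 'carol', 0.42), ('bob', 'carol', 0.42)]
```

Note what this does not tell you: alice and bob score a perfect 1.0, but nothing in the similarity score says which of them wrote it first. That part needs the other data the thought exercise mentions (the other professor's class, past grades, and so on), which is exactly where the hard judgment calls come in.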

Believe it or not, this odd little thought puzzle is exactly the thing that keeps Google engineers up at night. Who should get credit for content and who should rank highest for it?

It's also a problem that can be solved, at least approximately, with mathematical concepts and computer code. Google tracks many different pieces of data, and for a given web page, a given section of a website, or an entire website, it can boil that data down to a single number. Your number then tells Google who you are.
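Here is a rough sketch of what "boil that data down to a single number" can look like: a weighted sum of signals squashed through a logistic function onto a 0 to 100 scale. Every signal name, weight, and site profile below is hypothetical; nobody outside Google knows which signals it actually uses or how they are weighted.

```python
import math

# Hypothetical signals and weights, invented purely for illustration.
# Positive weights push a site toward "scraper", negative ones away from it.
WEIGHTS = {
    "duplicate_content_ratio": 4.0,    # share of text also found on other sites (0-1)
    "ad_to_content_ratio": 3.0,        # ad area relative to content area (0-1)
    "thin_page_ratio": 2.0,            # share of pages with very little text (0-1)
    "editorial_links_per_page": -1.5,  # inbound links from other sites, per page
}
BIAS = -3.0

def site_score(signals):
    """Squash a weighted sum of signals into a 0-100 'looks like a scraper' score."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in signals.items())
    return 100 / (1 + math.exp(-z))  # logistic function, scaled to 0-100

# Hypothetical site profiles.
scraper = {"duplicate_content_ratio": 0.95, "ad_to_content_ratio": 0.8,
           "thin_page_ratio": 0.7, "editorial_links_per_page": 0.1}
syndicator = {"duplicate_content_ratio": 0.9, "ad_to_content_ratio": 0.7,
              "thin_page_ratio": 0.6, "editorial_links_per_page": 0.3}
original = {"duplicate_content_ratio": 0.05, "ad_to_content_ratio": 0.2,
            "thin_page_ratio": 0.1, "editorial_links_per_page": 2.0}

for name, profile in [("scraper", scraper), ("syndicator", syndicator),
                      ("original", original)]:
    print(f"{name}: {site_score(profile):.1f}")
```

With these made-up weights, the site that syndicates a lot of content but is otherwise legitimate lands within about two points of the scraper, while the site with original content and real inbound links scores near zero.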

It's not quite this direct, but if eFreedom's number (or the number for an average scraper or low-quality site) is a 97 and yours is a 96, those two are close enough for Google to say, "There are some aspects of this site with a 96 that suggest to me that they might not be a site that I want to show at the very top of the results." Boom! You just dropped in the rankings because you are similar to a bad site.
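And here is the "close enough to a bad site" step, using the simplest rule that fits the description: nearest centroid, which labels a site by whichever group's average score it sits closer to. The labeled scores are invented, and a real system would presumably compare sites across many dimensions rather than a single number.

```python
# Hypothetical scores for sites that have already been labeled by hand.
KNOWN_SCRAPERS = [97, 98, 95, 99]
KNOWN_GOOD = [12, 20, 8, 35]

def centroid(scores):
    """Average score of a labeled group."""
    return sum(scores) / len(scores)

def looks_like(score, scrapers=KNOWN_SCRAPERS, good=KNOWN_GOOD):
    """Label a site by whichever group's average score it is closer to.

    Exact ties go to "good".
    """
    to_scraper = abs(score - centroid(scrapers))
    to_good = abs(score - centroid(good))
    return "scraper" if to_scraper < to_good else "good"

print(looks_like(96))  # scraper: one point below a typical scraper
print(looks_like(40))  # good
```

The rule never asks whether your site actually copies anyone; it only asks which crowd your number sits closest to.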

In reality, you may have done nothing worse than a few things as innocent as opening an email with "Hi Jim," instead of just "Jim".

Yes, it's a crazy world where statistical models are judging our worth, but it happens all the time. Think credit score. Think GPA. Think about how the TSA decides to frisk you rather than the gal who's next in line.

What's my point?

If the new Google algorithm has ruined your life by killing your organic website traffic, then you need to think about "how to look like a scraper site": what are the attributes of a bad site or a scraper site that Google might not like? Do whatever you can to NOT have those attributes.

If you have thoughts on how to look like a scraper site (what some of the "tells" of a bad site might be), I'd love to hear them.