Tuesday, January 7, 2020

17 Things Bloggers Can Do to Combat Content Scraping

Last week, I received an email from a fellow blogger. She had found some of her posts reproduced without attribution on a sketchy blog site, and she wanted to let me know that she had also found some of my blog posts there. While trying to figure out what to do, I discovered a similar website that had also posted some of my blog posts. The text of the posts was altered slightly and awkwardly, but the content and photos were still all mine. Even the titles of the posts included my blog name.

It was jarring, to say the least.

Imagine seeing your content, including your photos, on another website. I was in good company along with Another Mother Runner, Marathon Investigation, and Women's Running. It made me wonder what these bigger blogs were doing, if anything, to stop their content from being stolen. I also wondered why anyone would want to steal content from my little blog!

Talk about feeling helpless. It's the freaking internet! As much as I wanted to reach through the screen and strangle the thief, it was just a fantasy. I needed to figure out what to do to protect my content and myself.



Stealing content from another person's blog is called scraping. It is usually done through an automated process (a bot). Scrapers steal content for a variety of reasons, but mostly to promote their own spam blogs (splogs). While doing research for this post, I was disheartened to find posts promoting the practice of scraping. One post, Web Scraping for Fun and Profit, shares all kinds of tips for those looking to steal another person's content. I didn't hyperlink to his post for obvious reasons, but it illustrates the point that content scraping is a big deal.

With the growing realization that there was not much I could do to prevent scrapers from stealing my content, I instead decided to take the offensive and figure out how to protect myself. Here's what I learned and some of the steps every blogger should take to protect their blog in the unfortunate event that their content is scraped.



Find out if your content is being scraped. 
This is easy.

  • Copyscape is a search tool where you enter the web address of your content to find out if it is anywhere else besides your blog. This is how I found the second site that was scraping my content. 
  • If you're using a WordPress blog, you can also use trackbacks or 'pings'. Include links to your other posts in your content, and you may be notified when those links turn up in a scraped copy. 
  • Google Search Console (formerly Google Webmaster Tools) will show you who is linking to your site. If you see any unfamiliar domain addresses, you can click on one to see what is happening.
  • Google Alerts is another tool you can use to see if your content is being reproduced. By entering the exact name of your blog post, you'll receive an email of any mentions or uses of the post.
  • Google Analytics is a powerful tool that all bloggers should be using. There is so much data generated from GA but for the purposes of this post, know that you should always look at your referring traffic to see who is referring readers to your blog. Interestingly, while preparing this post, I noted a spike in traffic on a particular blog post in December. That post was hit with a bunch of spam during that time frame, which means most likely a bot hit it. This is also the time frame during which my posts were stolen.
  • Do you use Feedburner? If so, you can check your statistics, including uncommon uses, which identifies areas where your content is being used. If you see a domain that is using your content inappropriately, you can contact the host to ask that your content be removed. 
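
Once one of these tools turns up a suspicious page, it can help to measure how much of your post it actually contains, since scrapers often alter the wording slightly. The following is a minimal sketch in Python, not a replacement for Copyscape. It assumes you have already pasted the text of your post and the text of the suspected copy into two strings. The function names `shingles` and `overlap` are made up for this illustration.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short text: treat the whole thing as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect text."""
    mine = shingles(original, size)
    if not mine:
        return 0.0
    return len(mine & shingles(suspect, size)) / len(mine)


if __name__ == "__main__":
    # Placeholder text; paste your own post and the suspected copy here.
    my_post = "the quick brown fox jumps over the lazy dog near the river"
    their_page = "yesterday the quick brown fox jumps over a sleepy dog near the river"
    print(f"Share of my post found in the copy: {overlap(my_post, their_page):.0%}")
```

A score near 100% suggests a wholesale copy. A lightly reworded copy will score lower but usually well above zero, because runs of five unchanged words are unlikely to match by chance.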


Stopping content scrapers
Not so easy and maybe even futile.

  • Contact the owner of the offending website. My fellow victimized blogger, who initially contacted me when she found my content on the site that scraped hers, tried to get the scraper to take down her content. In her words, he was 'unrepentant'. When I contacted him, he didn't respond.
  • Whois is a public database that provides information related to ownership of a blog site. By entering the URLs of both blogs that scraped my content, I learned that one owner was based in Arizona and the other in Pakistan. I was happy to see that as a Google domain owner, my personal information is protected. If your information isn't private, you may want to consider paying to make it private.
  • Contact the site owner (if their information is available) or the host and tell them that they are in violation of the DMCA. Ask that the offending material be taken down. In my case, I contacted both hosts and received similar responses: basically, too bad, so sad, the host is just the host and has no responsibility for what is being posted. 
  • If contacting the site owner is futile, you can send a cease and desist letter, which outlines the content that has been stolen and gives a 72-hour notice to respond. If this doesn't work, then you can issue a DMCA Takedown Notice. There are sample letters that you can use. There are no guarantees, but it's what we have.
  • Google has a form you can fill out to ask that the offending sites be taken down and not indexed. This is what I did. Since I had 23 posts scraped, I spent the good part of an afternoon filling these out. 
  • Contact other blog owners whose content has been scraped. I sent an email to the owner of Marathon Investigation. I also saw posts from one of my blogging friends on both sites and contacted her as well. How else can we stop these thieves if we don't support each other?
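
When many posts have been scraped, it saves time to keep one list that pairs each original post with its copy, because a takedown request has to identify both the original work and the infringing material. Here is a rough sketch in Python, assuming you have collected the URLs by hand. The `ScrapedPost` class, the `format_report` function, and the example.com addresses are all made up for illustration. The output is only a list to copy from, not a legal notice.

```python
from dataclasses import dataclass


@dataclass
class ScrapedPost:
    title: str
    original_url: str    # where the post lives on your blog
    infringing_url: str  # where the copy appears


def format_report(owner, posts):
    """Build a plain-text list of original/copy URL pairs for a takedown request."""
    lines = [
        f"Copyright owner: {owner}",
        f"Posts copied without permission: {len(posts)}",
        "",
    ]
    for number, post in enumerate(posts, start=1):
        lines.append(f"{number}. {post.title}")
        lines.append(f"   Original: {post.original_url}")
        lines.append(f"   Copy:     {post.infringing_url}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder entries; replace with your own posts and the scraped copies.
    scraped = [
        ScrapedPost("My Race Recap", "https://example.com/race-recap",
                    "https://example.net/stolen-race-recap"),
        ScrapedPost("Winter Running Tips", "https://example.com/winter-tips",
                    "https://example.net/stolen-winter-tips"),
    ]
    print(format_report("Jane Blogger", scraped))
```

The same list can be reused for an email to the host, a cease and desist letter, and each form you fill out.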



Protecting Your Content
These steps may or may not be effective. But it's all we've got. 
  • Add a copyright license to your feed. There are several ways you can do this. The easiest is through Feedburner, where there is a link to Creative Commons. Creative Commons is a nonprofit that provides free licenses you can attach to your work to spell out how others may use it. You can find this option under the "Publicize" tab. 
  • When you write your blog, your content is automatically copyrighted under the law. You don't have to post a copyright notice on your blog but it does let readers know that the content is yours. A copyright notice may deter some people from stealing your content. There are free sample copies of copyright notices that you can customize and add to your website. I've placed one on mine.
  • You don't have to register your blog posts with the U.S. Copyright Office, but it will give you legal recourse if someone steals your content. You do have to register each blog post individually and you can only register 10 per application. You cannot register your blog as 'a collection'.
  • Block the offenders' IP addresses. You can also redirect them to a dummy feed of undesirable content. I kind of like this idea, but if you're a hobby blogger like me, you might need some technical help to do this. 
  • Beat the scrapers at their own game. Use their scraping to your advantage. Insert backlinks to your own content into your posts. This will send readers from the scraped content back to your blog. You can also use the RSS footer to insert a banner which directs readers back to your site. This will appear on the scraped content as well. 
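
For anyone who does have some technical help, the last two ideas can be sketched in a few lines of Python. This is only an illustration under assumptions: it supposes you control the code that builds your feed or serves your pages, which is not the case on every hosted blogging platform. The function names `add_feed_footer` and `is_blocked`, the example.com addresses, and the IP ranges are all hypothetical.

```python
import ipaddress
from html import escape


def add_feed_footer(item_html, post_title, post_url, blog_name):
    """Append an attribution line with a link back to the original post."""
    footer = (
        f'<p>This post, <a href="{escape(post_url)}">{escape(post_title)}</a>, '
        f"first appeared on {escape(blog_name)}.</p>"
    )
    return item_html + "\n" + footer


def is_blocked(ip, blocked_ranges):
    """Return True if the visitor's IP address falls inside any blocked range."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(block) for block in blocked_ranges)


if __name__ == "__main__":
    # Placeholder values; the ranges below are reserved documentation addresses.
    print(add_feed_footer("<p>Post text here.</p>", "My Race Recap",
                          "https://example.com/race-recap", "My Running Blog"))
    print(is_blocked("203.0.113.7", ["203.0.113.0/24"]))
```

Because the footer is part of each feed item, a bot that copies the feed wholesale carries the link back to your blog along with the post.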

Should I Worry About Content Scraping?
After reading this post, you might be asking yourself if it's worth your while to fight the scrapers. I say yes.

Just keep in mind that we're all vulnerable. I don't have a huge following and I was surprised that anyone would steal my content. I did learn that smaller blogs have more to lose from scrapers than big established blogs, especially if the scrapers get more hits on your scraped content than you do on the original. Google might penalize your blog, thinking you're the scraper!

At the very least, you should protect your blog by following the steps I've listed above. If you're a WordPress blogger, you have access to tools and plug-ins you can use to keep your posts safe. And if your content is stolen, take the steps to ask that it be taken down and/or file a DMCA complaint. Google makes the process pretty painless. 

Have you ever been the victim of content scraping? If so, can you share your experience? Do you have a copyright on your blog? What steps will you take to try to prevent this from happening to you?
I'm linking up with Kim and Zenaida for Tuesday Topics.





31 comments :

  1. So, how do they make money? Ads? It’s so frustrating. Thanks for pulling together all these steps.

    1. Some make money through ads. It's not completely clear to me what some of the other scrapers reap from this process. It's awful.

  2. Thanks so much for this post Wendy! As you know, I'm still dealing with this. I kind of keep getting the run around but I'm still going to keep trying to protect my blog.

    1. Part of me wants to shut the whole thing down. I hate seeing my content being bastardized like this. I'm waiting to see what Google does, if anything.

  3. Yikes - As Usual, I Am Totally Unaware - Great Advice For Sure - Rather Unsettling When I Think About It

    Big Hugs Little W
    Cheers

  4. I despise thieves of any kind! Sorry this happened to you. Thanks for the info, it will come in handy if my blog ever becomes worthy of scrapers.

  5. A very useful post, Wendy, thank you so much! Copyscape is really helpful and easy to use. I'm on WordPress and should start to familiarize myself with the tools available to protect my content. Thanks again!

    1. I think it's important to be proactive. I would never have thought anyone would steal my content. It was very upsetting initially but once I learned what it all meant, I felt better.

  6. I didn't know it was called scraping but my blog posts - each and every one - have been reposted on a site. I reported it to Google but they said I'd have to use the form you wrote about to have each post removed. They WILL remove the post once you complete the form and send it in, but I have over eleven years' worth of posts and honestly, I just don't have the time or patience to do that for each post, especially when the site just keeps stealing and posting my new posts. It's very frustrating. I'm interested to hear if you get any better results; I don't think Google cares, sadly.

    1. I'm sure I'll have to keep reporting...so I'm waiting for another week until the posts accumulate. I can't quite believe I have to do this. What concerns me the most are my photos.

  7. I am so sorry that happened to you. I don't know why people would do this kind of thing :( Thanks for all the information. While copyscape didn't find anything regarding my URL (phew!), I will be much more vigilant in the future!

  8. I'm so sorry that this happened to you, but thank you for putting this together! I really need to make more use of Google Analytics--I forget that it can do so many things!

    1. GA is amazing! I knew that before I was scraped but even more so now!

  9. What a mess! I have had content (via eBibs) stolen, and relabeled with a different person's name. Needless to say, I learned quickly to create my own memes (with my own pics) after that, since the site was totally useless in any recourse. It's a shame we can't just do the blogging thing for fun without these kinds of worries. Not all of us are in this to make a profit; it's not right that others try to capitalize on someone else's work.

    1. It's not as much of a mess as it is upsetting to see your content posted on an icky, sketchy website. If only it were as easy as creating a meme...

  10. This is really informative! I have had pins stolen and used before but was not sure what if anything I could do about it. It is really upsetting to learn this is going on and I am not sure what their game is and why they would go to the trouble of doing it. Hope you get some useful feedback.

    1. I don't even want to think about another platform at this point. I'm pretty much ready to give up the game but for my blogging friends.

  11. As far as I know no one has bothered to scrape my running posts, but I get a lot of copying on my professional blog. I usually use a comment on the offending blog to post a cease and desist notice. And I do indeed have backlinks in all my posts which means I get an alert every time a new post is published containing those backlinks - useful stuff. Thank you for sharing this info to help people!

    1. You're lucky that WP has that ability to notify you about backlinks. Blogger is not so generous. It's been a frustrating week for me. Does the cease and desist notice actually work?

    2. I post it in a comment on the post in question. Either they don't have moderation, it goes up in public and the post disappears, or they do have moderation, read it and again, it goes. However it's easier in a way that it's on my professional blog as I can talk about damage to reputation etc. etc.

  12. I had never heard of this term till Kim mentioned it in her Sunday post. Wow! I had NO idea that was happening. So much useful information that I will have to go back and read it again.

  13. Obviously scrapers are well aware of what they're doing, but so many people have absolutely no clue about copyright. It's sad & frustrating.

    Thanks for putting this post together, Wendy!

  14. Ugh, that's so frustrating. Thanks for sharing the tips and what to watch out for! I hadn't really heard of this before recently.

  15. This makes me so frustrated! Years ago I did find my blog content on other sites, however when I used the Copyscape app just now, it did not find anything! I know you say you are most worried about your photos. Did you know there are two different copyrights you can have, a content copyright and a visual copyright?
    Also, spammers do the same thing with Instagram photos. Mine have shown up on many other sites. You should check yours out too.

  16. This is unbelievable and so frustrating! Thank you Wendy for pulling all of this information together - so helpful!!

  17. Thanks for putting all this information together! I have had my blog scraped before (I actually got pingbacks because they linked back to my original post, like that makes it okay!), but I haven't really done anything before. It is really time consuming but I see that it is important. So next time I'll use your tips. Plus I'll look into what WordPress has to offer to avoid this.

    Interestingly, the same day this happened to you, I had a photo of mine turn up on Instagram. I reported it as stolen intellectual property and they had taken it down within a half hour. I don't think the poster had any consequences though.

  18. I'm so sorry this happened to you, Wendy. It's got to be so frustrating. Thanks for putting together such an informative post for the rest of us.
