RSS Feed Scraper Websites and How They Affect Blog Authors

Teri Radichel
Published in Cloud Security
4 min read · Jan 30, 2023


I sent a note to my customers yesterday saying that I’m going to try temporarily putting my blog behind a paywall to fend off RSS scrapers. These are sites that blatantly copy all of your content instead of displaying a portion of the article and then sending readers to the blog for the rest.

This behavior hurts blog authors, especially on a site like Medium, where you may be getting paid for web traffic. Instead of the traffic coming to Medium, your posts are read elsewhere and your Medium stats stay low.

The problem is that I then noticed in my Medium stats that RSS scrapers are still getting the full blog post, even with the paywall. Perhaps they are waiting for me to remove the paywall and then scraping the blog.

Note: If you are reading this content anywhere other than my Medium Cloud Security blog, please contact me on LinkedIn or Twitter and let me know.

https://medium.com/cloud-security

My name is unique and it should be easy to find on either platform:

Teri Radichel

https://linkedin.com/in/teriradichel

https://twitter.com/teriradichel

You can also search for my company, 2nd Sight Lab, LLC to which I assign the copyright at the bottom of each post.

Then authors have to spend their time searching around for duplicated content and reporting it to Google:

The other thing is, I think some of these scrapers are simply rearranging the content, and they are definitely removing links. It could be that in some cases the scrapers are trying to prevent your blogs from getting traction in search engine rankings, as I wrote about in the above post.

Here’s an article I just saw yesterday on the topic that provides a list of RSS Scrapers:

https://www.techbusinessnews.com.au/rss-feed-scraper-websites-and-how-to-stop-them/

They also offer some solutions to help you fend off RSS scrapers. However, if you host your content on Medium, you can’t do this, since Medium controls the web hosting for your content.

How could Medium fix this problem?

First of all, Medium stats need to be more granular, like Google Analytics, showing you more details about the IP addresses that visited your site and whether they arrived via RSS or the web.

Medium could provide a lot more information to make the site more valuable to authors, such as which countries frequent your blog and even which corporate IP ranges. You can identify those ranges using something like MaxMind or possibly Cloudflare, as the article above mentions, or maybe even source them straight from the IP registries: ARIN, RIPE, APNIC, LACNIC, AFRINIC. I’ve written about those before.

Allow authors to block IP ranges they don’t want frequenting their blogs.

Now, the IP addresses that visit your site might not help you on their own, because you have to link each IP to the site that’s hosting your content. The RSS feed could be pulled by one IP and, most likely, published to a site with a different IP. But if certain IPs are known for performing these actions, then you could at least identify them.

Then, Medium needs to allow you to block certain IP addresses from visiting your blog.
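To make the idea of blocking by IP range concrete, here is a minimal sketch using Python’s standard ipaddress module. It is only an illustration of the check a platform could run, not anything Medium offers. The names BLOCKED_RANGES and is_blocked are hypothetical, and the ranges are reserved documentation addresses used as placeholders, not real scraper IPs.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges,
# used only as placeholders, not real scraper addresses.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(ip: str) -> bool:
    """Return True if the visitor IP falls inside any blocked range."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLOCKED_RANGES)


print(is_blocked("192.0.2.44"))   # inside the first range
print(is_blocked("203.0.113.9"))  # not in any listed range
```

As noted above, a check like this only helps once you know which ranges the scrapers pull the feed from.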

Next, show user agents. Same thing. Some user agents are malicious or at least annoying scrapers. Allow authors to block specific user-agents.
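A user-agent block could be as simple as a substring match. The sketch below is an assumption about how such a filter might work, not a description of any Medium feature. The function name and the blocked strings are made up for illustration. Real entries would come from your own traffic stats.

```python
# Hypothetical substrings to match. Real scraper user agents would
# come from the blog's own traffic stats.
BLOCKED_AGENT_SUBSTRINGS = ["examplescraper", "feedcopybot"]


def is_blocked_agent(user_agent: str) -> bool:
    """Case-insensitive substring match against the blocklist."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_SUBSTRINGS)


print(is_blocked_agent("Mozilla/5.0 (compatible; ExampleScraper/1.0)"))
print(is_blocked_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

Keep in mind that the client sets the user agent and can easily fake it, so this only stops scrapers that identify themselves.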

If that’s too complicated, as a short-term fix, Medium could allow authors to block RSS altogether. How many people still use RSS for legitimate reasons? I don’t really know the answer to that question. But for my purposes, I would like to simply block RSS on my blog. I don’t see any option for doing that.

The other thing is, instead of sending the entire blog post in RSS, Medium could deliver a portion of the post and a link. One person who was blatantly copying my blog said the full-text feed was the problem, because other sources don’t do that. An excerpt with a link would drive people who read the posts via RSS to visit the blog.
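Here is a rough sketch of what an excerpt-only feed item might look like, built with Python’s standard xml.etree.ElementTree. The helper names excerpt and build_item, the 50-word default, and the example URL are all assumptions for illustration. I don’t know how Medium actually generates its feeds.

```python
import xml.etree.ElementTree as ET


def excerpt(text: str, max_words: int = 50) -> str:
    """Return the first max_words words, marking any truncation."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " ..."


def build_item(title: str, link: str, body: str, max_words: int = 50) -> ET.Element:
    """Build an RSS <item> that carries an excerpt instead of the full post."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "description").text = excerpt(body, max_words)
    return item


# Placeholder post: 200 repeated words stand in for a full article.
item = build_item("Example post", "https://example.com/post", "word " * 200, max_words=5)
print(ET.tostring(item, encoding="unicode"))
```

A scraper that republishes this item gets only the teaser, and the link still points readers back to the original post.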

The other thing Medium should do is provide referrers in more detail: a complete list. That would also help authors see when other types of advertising and marketing campaigns are successful via parameters in the URL. But at a minimum, authors could tell who is visiting their site from a referrer vs. someone who just comes straight to the site once per day with no referrer to scrape the content. So there needs to be a no-referrer category that shows you which IPs those are.
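As a sketch of what that kind of referrer report could involve, the snippet below groups some made-up visit records by referring host, counts the IPs that arrive with no referrer, and reads a campaign name from the URL. The visit data and function names are hypothetical. Using a utm_campaign query parameter is an assumption based on a common tagging convention, not something Medium documents.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse

# Hypothetical visit records: (ip, referrer, requested_url).
visits = [
    ("192.0.2.10", "https://www.google.com/search?q=rss", "https://example.com/post"),
    ("198.51.100.7", "", "https://example.com/post"),
    ("198.51.100.7", "", "https://example.com/other"),
    ("203.0.113.5", "https://twitter.com/", "https://example.com/post?utm_campaign=launch"),
]


def referrer_category(referrer: str) -> str:
    """Return the referring host, or 'no referrer' when none was sent."""
    return urlparse(referrer).netloc or "no referrer"


def campaign(url: str):
    """Return the utm_campaign value from the URL, or None if absent."""
    values = parse_qs(urlparse(url).query).get("utm_campaign")
    return values[0] if values else None


by_referrer = Counter(referrer_category(ref) for _, ref, _ in visits)
no_referrer_ips = Counter(
    ip for ip, ref, _ in visits if referrer_category(ref) == "no referrer"
)

print(by_referrer)
print(no_referrer_ips)
print(campaign(visits[3][2]))
```

An IP that shows up again and again in the no-referrer count is the kind of visitor worth a closer look.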

Medium is a great, simple blogging platform, but it is almost too simple. Time will tell if I keep my content here. For the moment, I’m hoping for a simple toggle to turn off RSS. Pretty please.

Follow for updates.

Teri Radichel | © 2nd Sight Lab 2023

About Teri Radichel:
~~~~~~~~~~~~~~~~~~~~
⭐️ Author: Cybersecurity Books
⭐️ Presentations: Presentations by Teri Radichel
⭐️ Recognition: SANS Award, AWS Security Hero, IANS Faculty
⭐️ Certifications: SANS ~ GSE 240
⭐️ Education: BA Business, Master of Software Engineering, Master of Infosec
⭐️ Company: Penetration Tests, Assessments, Phone Consulting ~ 2nd Sight Lab
