
Big Tech Is Killing Their Customers

Big tech companies are destroying smaller online platforms through endless data scraping, causing a crisis in revenue and value for content creators.

 

Recently, Dennis Schubert posted a fiery update about a crisis in the “diaspora*” project. The platform’s network infrastructure was collapsing under heavy traffic. But what caused this overload? Shockingly, 70% of the requests came from LLM (Large Language Model) bots operated by major tech companies. These bots ignored robots.txt directives and relentlessly scraped every piece of data they could access.
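For context, here is a minimal sketch of what honoring robots.txt looks like from a crawler's side, using Python's standard `urllib.robotparser`. The rules, paths, and the example.org URLs are illustrative assumptions, not diaspora*'s actual configuration. A well-behaved bot runs a check like this before every request; the bots described above simply skip it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ban one AI crawler entirely and keep
# everyone else out of the wiki revision history.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /wiki/history/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks before fetching.
gptbot_main = parser.can_fetch("GPTBot", "https://example.org/wiki/Main")
other_history = parser.can_fetch("Mozilla/5.0", "https://example.org/wiki/history/42")
other_main = parser.can_fetch("Mozilla/5.0", "https://example.org/wiki/Main")

print(gptbot_main, other_history, other_main)  # False False True
```

Nothing in the protocol enforces these answers. robots.txt is a request, which is why updating it does nothing against a crawler that never reads it.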

Dennis discovered that ChatGPT and Amazon bots even went as far as to scrape the entire edit history of Wiki pages—every single revision. He couldn’t help but ask:

“What are they trying to achieve? Are they analyzing how text evolves?”

This data hoarding significantly strained diaspora*’s servers, slowing the platform for legitimate users. Dennis tried several countermeasures:

  1. Updating robots.txt: Useless, as the bots ignored it.
  2. Rate-limiting: Failed because the bots rotated their IP addresses.
  3. Blocking User Agents: Ineffective, as the bots disguised themselves as regular browsers.
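To see why per-IP rate limiting breaks down against rotating addresses, consider this minimal sketch. The `SlidingWindowLimiter` class, the limit of 5 requests per 60 seconds, and the IP addresses (taken from documentation-reserved ranges) are all assumptions made for illustration; this is not the setup Dennis used.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key, now):
        hits = self._hits[key]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def count_allowed(limiter, ips, requests=100):
    """Send `requests` requests 0.1 s apart, cycling through `ips`."""
    allowed = 0
    for i in range(requests):
        ip = ips[i % len(ips)]
        if limiter.allow(ip, now=i * 0.1):
            allowed += 1
    return allowed


# One client hammering the server from a single address.
single = count_allowed(SlidingWindowLimiter(5, 60.0), ["203.0.113.1"])

# The same traffic spread across 50 rotating addresses.
pool = [f"198.51.100.{n}" for n in range(1, 51)]
rotating = count_allowed(SlidingWindowLimiter(5, 60.0), pool)

print(single, rotating)  # 5 100
```

The limiter does its job against the single address, letting through only 5 of 100 requests. Spread over 50 addresses, each IP sends just 2 requests, stays under the limit, and all 100 get through.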

Frustrated, Dennis likened the situation to a DDoS attack on the entire internet.

Why Does Big Tech Need Our Data?

The answer lies in AI’s insatiable hunger for training data.
High-quality datasets are the backbone of AI models, and the industry is running out of fresh material to train on. As OpenAI engineer James Betker once wrote:

“As I’ve spent these hours observing the effects of tweaking various model configurations and hyperparameters, one thing that has struck me is the similarities between all the training runs.

It’s becoming clear to me that these models are truly approximating their datasets to an incredible degree.”

To stay ahead in the AI arms race, tech giants are aggressively scraping data from every corner of the web—personal blogs, independent wikis, and small projects. They don’t just scrape; they strip the internet bare.

Can We Fight Back?

Big Tech companies have teams of experts who can fend off scrapers without degrading the experience for real users, but small websites and independent projects lack these resources. For individuals, it’s an uphill battle.

Dennis suggested two unconventional methods to fend off bots:

  1. Tarpit Strategy: Generate meaningless random text to trick bots into wasting resources on irrelevant data.
  2. JavaScript Traps: Serve bot-detected requests with JavaScript-heavy content, embedding scripts that only bots would execute, such as crypto mining code.
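The tarpit idea can be sketched in a few lines. This is only an illustration under stated assumptions: the `tarpit_page` function, the word list, and the `/trap/` URL prefix are hypothetical, and the links to further trap pages are an addition of this sketch that goes beyond the random text described above. Seeding the generator from the request path means the same URL always returns the same junk, so the server stores nothing.

```python
import hashlib
import random

# Small vocabulary for filler text; any word list would do.
WORDS = [
    "lorem", "ipsum", "dolor", "sit", "amet", "consectetur",
    "adipiscing", "elit", "sed", "tempor", "magna", "aliqua",
]


def tarpit_page(path, paragraphs=3, links=5):
    """Return meaningless HTML for `path`, with links to more trap pages."""
    # Derive the seed from the path so each URL is stable across requests.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))

    parts = []
    for _ in range(paragraphs):
        length = rng.randint(20, 40)
        sentence = " ".join(rng.choice(WORDS) for _ in range(length))
        parts.append(f"<p>{sentence.capitalize()}.</p>")

    # Each link leads to another generated page, so a crawler that
    # follows links keeps receiving junk instead of real content.
    for _ in range(links):
        slug = "-".join(rng.choice(WORDS) for _ in range(3))
        parts.append(f'<a href="/trap/{slug}">{slug}</a>')

    return "\n".join(parts)


page = tarpit_page("/trap/start")
print(page)
```

In practice this would sit behind whatever bot detection the site already has, and the trap prefix should be disallowed in robots.txt so that only crawlers ignoring it ever end up there.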

While these approaches might work, they’re expensive and technically demanding.


The Zero-Click Internet

What’s Big Tech’s ultimate goal?
To trap users within their ecosystems. By leveraging AI to generate “the best content,” they eliminate the need for users to visit other websites. No more outbound links, no more exploring—their AI serves everything directly, with ads seamlessly integrated.

For individual creators, this means:

  • SEO doesn’t matter anymore.
  • High-quality content won’t reach users.
  • Revenue dries up.

In this new reality, your work is nothing more than fuel for Big Tech’s data engines.

The Inevitable Decline of the Open Web

Big Tech is reshaping the internet, exploiting data while squeezing value out of independent creators. Fighting back is nearly impossible for small websites. The shift is already happening, and it’s irreversible. Ironically, the open web is dying, and the companies that built it are killing it.



Licensed under CC BY-NC-SA 4.0
Last updated on Jan 09, 2025 16:02 CST