this post was submitted on 27 Apr 2024
112 points (95.9% liked)

Linux

45234 readers
1189 users here now

From Wikipedia, the free encyclopedia

Linux is a family of open source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991 by Linus Torvalds. Linux is typically packaged in a Linux distribution (or distro for short).

Distributions include the Linux kernel and supporting system software and libraries, many of which are provided by the GNU Project. Many Linux distributions use the word "Linux" in their name, but the Free Software Foundation uses the name GNU/Linux to emphasize the importance of GNU software, causing some controversy.

Rules

Related Communities

Community icon by Alpár-Etele Méder, licensed under CC BY 3.0

founded 5 years ago
MODERATORS
 

I was reading the reddit thread on Claude AI crawlers effectively DDOSing Linux Mint forums https://libreddit.lunar.icu/r/linux/comments/1ceco4f/claude_ai_name_and_shame/

and I wanted to block all ai crawlers from my selfhosted stuff.

I don't trust crawlers to respect the Robots.txt but you can get one here: https://darkvisitors.com/

Since I use Caddy as a Server, I generated a directive that blocks them based on their useragent. The content of the regex basically comes from darkvisitors.

Sidenote - there is a module for blocking crawlers as well, but it seemed overkill for me https://github.com/Xumeiquer/nobots

For anybody who is interested, here is the block_ai_crawlers.conf I wrote.

(blockAiCrawlers) {
  @blockAiCrawlers {
    header_regexp User-Agent "(?i)(Bytespider|CCBot|Diffbot|FacebookBot|Google-Extended|GPTBot|omgili|anthropic-ai|Claude-Web|ClaudeBot|cohere-ai)"
  }
  handle @blockAiCrawlers {
    abort
  }
}

# Usage:
# 1. Place this file next to your Caddyfile
# 2. Edit your Caddyfile as in the example below
#
# ```
# import block_ai_crawlers.conf
#
# www.mywebsite.com {
#   import blockAiCrawlers
#   reverse_proxy * localhost:3000
# }
# ```
you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 64 points 1 month ago (10 children)
[–] [email protected] 8 points 1 month ago* (last edited 1 month ago) (1 children)

In dark mode, the anchor tags are difficult to read. They're dark blue on a dark background. Perhaps consider something with a much higher contrast?

A picture of a website with a dark purple background and dark blue links.

Apart from that, nice idea - I'm going to deploy the zipbomb today!

[–] [email protected] 5 points 1 month ago

nice catch, thanks (i use light mode most of the time)

load more comments (8 replies)