digdilem

joined 1 year ago
[–] [email protected] 3 points 1 day ago

I used to write to DVDs, but the failure rate was astronomical - around 50% after five years, some with physical separation of the silvering. Plus they're so relatively small these days that they're not worth using.

I've gone through many iterations and currently my home setup is this:

  • I have several systems that make daily backups from various computers and save them onto a hard drive inside one of my servers.
  • That server has an external hard drive attached, powered through a wifi plug controlled by Home Assistant.
  • Once a month, a scheduled task wakes up that external HDD and copies the contents of the online backup directory onto it. It then turns it off again and emails me "Oi, minion. Backups complete, swap them out". That takes five minutes.
  • Then I take the USB disk and put it in my safe, removing the oldest of three (the classic grandfather-father-son rotation) and putting that one back on the server for next time.
  • Once a year, I turn the oldest HDD into an "annual backup" and replace it with a new one. That stops the disks all expiring from old age at the same time, and annual backups aren't usually that valuable.
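In rough Python, the monthly copy job boils down to something like this - a sketch only: the paths are made up, and the wifi-plug toggle and email are separate steps I've left as comments.

```python
import shutil
from datetime import datetime
from pathlib import Path


def copy_backups(src: Path, dest: Path) -> int:
    """Copy the online backup directory onto the offline disk.

    Returns the number of files on the offline copy.
    """
    # (Turning the wifi plug on/off via Home Assistant, and the
    #  "Oi, minion" email, happen before/after this - omitted here.)
    stamp = datetime.now().strftime("%Y-%m")
    target = dest / f"monthly-{stamp}"
    shutil.copytree(src, target, dirs_exist_ok=True)
    return sum(1 for p in target.rglob("*") if p.is_file())


def oldest_of_three(labels: list) -> str:
    """Grandfather-father-son rotation: the oldest label is the disk
    to pull from the safe and reuse next month.
    ISO-dated labels (YYYY-MM) sort chronologically."""
    return min(labels)
```

The rotation helper is trivial precisely because the disk labels are dated - that's the point of labelling them clearly.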

Having the HDDs in the safe means that total failure or ransomware costs me, at most, a month's worth of data. I can survive that. The safe is also fireproof and in a different building to the server.

This sort of thing doesn't need high-capacity HDDs either - USB drives and microSD cards are very capable now. If you're limited on physical space and don't mind slower write times (which is generally fine when automating), microSD cards with clear labelling are just as good. You won't kill them through excessive writes for decades.

I also have a bunch of other stuff that isn't critical - media files, music. None of that is unique; it can all be replaced. All of it is backed up to a secondary "live" directory on the same PC - mostly in case of my own incompetence in deleting something I actually wanted. But none of it is essential - I think it's important to be clear about what you "must save" and what is "nice to save".

The key thing is to sit back and work out a system that's right for you. And it always, ALWAYS should be as automated as you can make it - humans are lazy sods who easily justify not doing stuff. Computers are great at remembering to do repetitive tasks, so use that.

Include checks to ensure the backed-up data is both what you expected it to be, and recoverable - so set a calendar reminder to actually /read/ from a backup drive once or twice a year.
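That read-check can itself be mostly automated - compare checksums between the live backup directory and the offline copy. A sketch of the idea:

```python
import hashlib
from pathlib import Path


def _sha256(path: Path) -> str:
    """Checksum a file in chunks, so big backups don't eat RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_copy(src: Path, dest: Path) -> list:
    """Return relative paths whose offline copy is missing or differs.

    An empty list means the offline disk matches the live backups.
    """
    bad = []
    for p in sorted(src.rglob("*")):
        if p.is_file():
            rel = p.relative_to(src)
            q = dest / rel
            if not q.is_file() or _sha256(p) != _sha256(q):
                bad.append(str(rel))
    return bad
```

It still needs a human to plug the drive in and eyeball the result - which is what the calendar reminder is for.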

[–] [email protected] 6 points 2 days ago

It even has the approval of my wife.

He is the chosen one! Hail him!

[–] [email protected] 11 points 3 days ago (2 children)

Strange. That seems to go against the Home Office's own Code of Practice

"13 3.2.2 Any proposed deployment that includes audio recording in a public place is likely to require a strong justification of necessity to establish its proportionality. There is a strong presumption that a surveillance camera system must not be used to record conversations as this is highly intrusive and unlikely to be justified"

I hope they've taken good legal advice.

[–] [email protected] 16 points 3 days ago

Never say never - unless you're writing clickbait.

[–] [email protected] 42 points 1 week ago

"Justice delayed is justice denied"

[–] [email protected] 1 points 1 week ago

Anarchism is all about working together to build a better world where everyone has all of their needs met,

Hmm, I was working with the classic dictionary definition, which is "a state of disorder due to absence or non-recognition of authority or other controlling systems."

But you're right, anarchism does have that other meaning, so perhaps a better word would be "chaos".

His actions in supporting Trump in the US, promoting hate and extremist views on X globally, and encouraging civil war in the UK do all fit a chaos agenda. That's not about money - at least, not that I can see.

He is one of the world's most dangerous people, however, and I don't say that lightly. Not least because of his history of being unpredictable.

More governments should follow Brazil's example and push back.

[–] [email protected] 1 points 1 week ago (1 children)

He has more than anyone else in the history of the world. By any scale, he's won at money.

[–] [email protected] 1 points 1 week ago

That's been said, and perhaps it has been true. But when you're the richest man in the world by a significant margin, you have literally won at money and to remain competitive you need to move onto other things. Like the power of politics, and working to destabilise multiple countries at once.

[–] [email protected] 6 points 1 week ago (7 children)

He's already got all the money and most of the power. Now his hobby is far right extremism and anarchy.

[–] [email protected] 109 points 1 week ago (15 children)

Over the past year Musk has removed all masks and clearly believes he can operate beyond the law. His motives are clearly to watch the world burn. He is an extremely dangerous, unpredictable and powerful man, threatening democracy across the globe.

Our governments need to protect us from him. Brazil's being brave here, I hope they're just the first.

[–] [email protected] 7 points 1 week ago

I'm inclined to give Linux more benefit of the doubt than, say, Windows. That's because of the motives behind it.

Microsoft have a very long history of making design choices in their software that users don't like, and quite often that's because it suits their interests more than their customers'. They are a commercial business that exists to benefit itself, after all. Same with Apple - money spoils everything pure. You mention privacy, but that's just one more example of someone wanting to benefit financially from you - just in a less transparent and more open-ended way than paying them some cash.

Linux, because that monetary incentive is far smaller, is usually designed simply "to be better". The developers are often primary users of the software. Sure - sometimes developers make choices that confuse users, but that over-arching business interest just isn't there.

[–] [email protected] 63 points 1 week ago* (last edited 1 week ago) (4 children)

Feels like another hate-pushing cesspit to avoid.


On display at the Stromness museum. Carved from whalebone and believed to be a child's doll.

It was discovered at the famous Skara Brae site, then spent years forgotten in a box at the museum before being rediscovered.

https://www.bbc.co.uk/news/uk-scotland-north-east-orkney-shetland-36526874

195
submitted 5 months ago* (last edited 5 months ago) by [email protected] to c/[email protected]

I host a few small low-traffic websites for local interests. I do this for free - and some of them are for a friend who died last year and didn't want all his work to vanish. They don't get many views, so I was surprised when I happened to glance at munin and saw my bandwidth usage had gone up a lot.

I spent a couple of hours working to solve this and did everything wrong. But it was a useful learning experience and I thought it might be worth sharing in case anyone else encounters similar.

My setup is:

Cloudflare DNS -> Cloudflare Tunnel (because my residential ISP uses CGNAT) -> Haproxy (I like Haproxy; amongst other things, it alerts me when a site is down) -> separate Docker containers for each website. All on a Debian server living in my garage.

From Haproxy's stats page, I was able to see which website was gathering attention. It's one running PhpBB for a little forum. Tailing Apache's logs in that container quickly identified the pattern and made it easy to see what was happening.

It was seeing a lot of 404 errors for URLs, all coming from the same user-agent: "claudebot". I know what you're thinking - it's an exploit-scanning bot - but a closer look showed it was trying to fetch normal forum posts, some of which had been deleted months previously, and also robots.txt. That site didn't have a robots.txt, so that was failing. What was weird is that it was requesting at a rate of up to 20 URLs a second, from multiple AWS IPs - and every other request was for robots.txt. You'd think it would take the hint after a million times of asking.

Googling that UA turns up that other PhpBB users have encountered this quite recently - it seems to be fascinated by web forums and absolutely hammers them with the same behaviour I found.

So - clearly a broken and stupid bot rather than a specifically malicious one, right? I think so, but I host these sites on a rural consumer line and it was affecting both system load and bandwidth.

What I did wrong:

  1. In Docker, I tried quite a few things to block the user agent, the country (US-based AWS, and this is a UK regional site) and various IPs. It took me far too long to realise why my changes to .htaccess were failing - the phpbb Docker image I use mounts the site's root directory internally, ignoring my mounted volume. (My own fault - it had been too long since I set it up for me to remember that only certain sub-directories were mounted in.)

  2. Having figured that out, I shelled into the container and edited that .htaccess - but that wouldn't have survived restarting or rebuilding the container, so it wasn't a real solution.

Whilst I was in there, I created a robots.txt file. Not surprisingly, claudebot doesn't actually honour what's in there, and still continues to request it ten times a second.
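For the record, a minimal robots.txt that asks a crawler to go away looks something like this (a generic example, not my exact file - and as noted, this bot ignored it anyway):

```
User-agent: claudebot
Disallow: /

User-agent: *
Crawl-delay: 10
```

robots.txt is purely advisory - well-behaved crawlers honour it, rude ones don't, which is why the real fix had to happen at the proxy layer.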

  3. Thinking there must be another way, I switched to Haproxy. This was much easier - the documentation is very good. And it actually worked - blocking by user-agent (and yes, I'm lucky this wasn't changing) worked perfectly.
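The Haproxy side is only a couple of lines - roughly this (the frontend name is made up, but the ACL is the idea):

```
frontend web-in
    # case-insensitive substring match on the User-Agent header
    acl is_claudebot hdr_sub(User-Agent) -i claudebot
    http-request deny deny_status 403 if is_claudebot
```

`hdr_sub` does a substring match, so it catches the UA however it's padded out, and `-i` makes it case-insensitive.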

I then had to leave for a while and the graphs show it's working. (Yellow above the line is requests coming into haproxy, below the line are responses).

Great - except I'm still seeing half of the traffic, and that's affecting my latency. (Some of you might doubt this, and I can tell you that you're spoiled by an excess of bandwidth...)

  4. That's when the penny dropped and the obvious occurred. I use Cloudflare, so use their firewall, right? No excuses - I should have gone there first. In fact, I did, but I got distracted by the many options and focused on their bot-fighting tools, which didn't work for me. (This bot somehow gets through the captcha challenge even with bot fight mode enabled.)

But their firewall can match on user agent. The actual fix was simply to add a rule for it in the WAF for that domain.
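The WAF custom rule boils down to an expression roughly like this (a sketch in Cloudflare's rules language, with the action set to Block; `lower()` catches case variations):

```
lower(http.user_agent) contains "claudebot"
```

Because Cloudflare sits in front of the tunnel, blocking here means the requests never reach my line at all - which is the bit Haproxy alone couldn't fix.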

And voila - no more traffic through the tunnel for this very rude and stupid bot.

After 24 hours, Cloudflare has blocked almost a quarter of a million requests by claudebot to my little phpbb forum which barely gets a single post every three months.

Moral for myself: stand back and think for a minute before rushing in and trying to fix something the wrong way. I've also taken this as an opportunity to improve Haproxy's internal rate limiting. Like most website hosts, I send far more traffic out than in, and slowing things down when it gets busy really does help.
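The rate limiting is a stick-table affair, roughly like this (frontend name and thresholds are illustrative, not my actual numbers):

```
frontend web-in
    # track per-source-IP request rate over a 10 second window
    stick-table type ip size 100k expire 10m store http_req_rate(10s)
    http-request track-sc0 src
    # refuse clients averaging more than ~20 requests/second
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 200 }
```

A human browsing a tiny phpbb forum never gets anywhere near that rate, so the only things it bites are scrapers.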

This obviously isn't a perfect solution - all claudebot has to do is change its UA, and since it comes from AWS it's pretty hard to block otherwise. One hopes it isn't truly malicious. It would be quite a lot more work to integrate Fail2ban for more bots, but it might yet come to that.

Also, if you write any kind of web bot, please consider that not everyone who hosts a website has a lot of bandwidth, and at least have enough pride to write software good enough not to keep doing the same thing every second. And, y'know, keep an eye on what your stuff is doing out on the internet - not least for your own benefit. Hopefully AWS really shafts claudebot's owners with some big bandwidth charges...

EDIT: It came back the next day with a new UA and an email address linking it to anthropic.com - the Claude 3 AI bot - so it looks like a particularly badly written scraper gathering AI training data.
