Research shows 25% of web pages posted between 2013 and 2023 have vanished. A few organisations are racing to save the echoes of the web, but new risks threaten their very existence.

It’s possible, thanks to surviving fragments of papyrus, mosaics and wax tablets, to learn what Pompeiians ate for breakfast 2,000 years ago. Understand enough Medieval Latin, and you can learn how many livestock were reared at farms in Northumberland in 11th Century England – thanks to the Domesday Book, the oldest document held in the UK National Archives. Through letters and novels, the social lives of the Victorian era – and who they loved and hated – come into view.

But historians of the future may struggle to understand fully how we lived our lives in the early 21st Century. That’s because of a potentially history-deleting combination of how we live our lives digitally – and a paucity of official efforts to archive the world’s information as it’s produced these days.

However, an informal group of organisations are pushing back against the forces of digital entropy – many of them operated by volunteers with little institutional support. None is more synonymous with the fight to save the web than the Internet Archive, an American non-profit based in San Francisco, started in 1996 as a passion project by internet pioneer Brewster Kahl. The organisation has embarked what may be the most ambitious digital archiving project of all time, gathering 866 billion web pages, 44 million books, 10.6 million videos of films and television programmes and more. Housed in a handful of data centres scattered across the world, the collections of the Internet Archive and a few similar groups are the only things standing in the way of digital oblivion.

You are viewing a single thread.
View all comments
11 points

I’m part of the problem, a tiny bit. For altruistic reasons - ok more like “I’m kinda weird, maybe this will make people on IRC like me more” reasons - I ran mspencer.net and hosted web pages for people for free. Ended up with web content for around 100 people, and they weren’t all just using it as a drop box. (Older than wikipedia.org by 199 days, woo!)

Hosted on ancient hardware, nothing even remotely approaching a modern security architecture, I eventually left it to run un-maintained until the IDE HDD died. More recently I got the data off of it. (Heads unstuck themselves while in a cardboard box for a decade? Dunno.) But I don’t know how to get everything back online in a safe way.

I’m a proper software engineer now, I can kinda see how work handles securely hosting web services. Now just throwing everything together on one box feels too lazy and insecure. But I can’t figure out a reasonable security architecture to use. I thought I had one, but I failed to account for VM jackpotting attacks. And it feels like it takes me a month to do what a competent ops person can do in a day.

But that’s a discussion for a different comment section.

permalink
report
reply
4 points

Try to ask in self hosting community here in lemmy

permalink
report
parent
reply
7 points

But I can’t figure out a reasonable security architecture to use. I thought I had one, but I failed to account for VM jackpotting attacks. And it feels like it takes me a month to do what a competent ops person can do in a day.

You’re overthinking it, just secure things enough that you’re ahead of the script kiddies automated scan tools (which isn’t a lot tbh)

The people with actual real skill don’t care about you, they’d rather go after juicy targets, like companies or politicians or rich people

permalink
report
parent
reply
5 points

If it’s static content, nothing beats an AWS S3 bucket.

permalink
report
parent
reply
4 points
*

Last time I went snooping:

15 installs of phpbb, which would require work to put back online as their communities are of course gone. Remove spam, undo defacement, etc.

7 installs of Dormando’s Oekaki BBS Clone

5 installs of WonderCatStudio BBS

4 installs of OekakiPotato / RanmaGuy etc.

and several users who just used php to ‘include’ headers and table of contents page parts.

(Yes I was quite the weeb. Still am, but I was one too. :-) )

permalink
report
parent
reply
5 points

If this was my problem to solve, I would host it internally, as-is, on a virtual machine of your choice, then create a a static html mirror version from the public information and put that up on AWS S3 as a static website.

permalink
report
parent
reply

News

!news@lemmy.world

Create post

Welcome to the News community!

Rules:

1. Be civil

Attack the argument, not the person. No racism/sexism/bigotry. Good faith argumentation only. This includes accusing another user of being a bot or paid actor. Trolling is uncivil and is grounds for removal and/or a community ban. Do not respond to rule-breaking content; report it and move on.


2. All posts should contain a source (url) that is as reliable and unbiased as possible and must only contain one link.

Obvious right or left wing sources will be removed at the mods discretion. We have an actively updated blocklist, which you can see here: https://lemmy.world/post/2246130 if you feel like any website is missing, contact the mods. Supporting links can be added in comments or posted seperately but not to the post body.


3. No bots, spam or self-promotion.

Only approved bots, which follow the guidelines for bots set by the instance, are allowed.


4. Post titles should be the same as the article used as source.

Posts which titles don’t match the source won’t be removed, but the autoMod will notify you, and if your title misrepresents the original article, the post will be deleted. If the site changed their headline, the bot might still contact you, just ignore it, we won’t delete your post.


5. Only recent news is allowed.

Posts must be news from the most recent 30 days.


6. All posts must be news articles.

No opinion pieces, Listicles, editorials or celebrity gossip is allowed. All posts will be judged on a case-by-case basis.


7. No duplicate posts.

If a source you used was already posted by someone else, the autoMod will leave a message. Please remove your post if the autoMod is correct. If the post that matches your post is very old, we refer you to rule 5.


8. Misinformation is prohibited.

Misinformation / propaganda is strictly prohibited. Any comment or post containing or linking to misinformation will be removed. If you feel that your post has been removed in error, credible sources must be provided.


9. No link shorteners.

The auto mod will contact you if a link shortener is detected, please delete your post if they are right.


10. Don't copy entire article in your post body

For copyright reasons, you are not allowed to copy an entire article into your post body. This is an instance wide rule, that is strictly enforced in this community.

Community stats

  • 14K

    Monthly active users

  • 8.9K

    Posts

  • 162K

    Comments