ERROR Insert Coin
When you make a potentially system-breaking change and forget to snapshot the VM beforehand…
Someone set up a script to automatically create daily backups to tape. Unfortunately, it’s still the first tape that was put in there 3.5 years ago; every backup since that one filled up has failed. It might as well have failed silently, because everyone who received the error emails had filtered them into a folder they generally ignored.
And no one ever tried to restore it.
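The missing step in that story is a restore test: a backup only counts once something has actually been restored from it. The sketch below is not what that tape script did (the comment doesn't say). It uses a local tar.gz archive as a stand-in for the tape, and the helper names backup, sha256_tree and verify_by_restore are hypothetical. It restores the archive into a scratch directory and compares SHA-256 digests against the source, so an unreadable or stale backup shows up as a failed check instead of an ignored email.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def sha256_tree(root: Path) -> dict:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def backup(src: Path, archive: Path) -> None:
    """Write every entry of src into a gzipped tar archive (stand-in for the tape)."""
    with tarfile.open(archive, "w:gz") as tar:
        for child in sorted(src.iterdir()):
            tar.add(child, arcname=child.name)  # recursive for directories


def verify_by_restore(src: Path, archive: Path) -> bool:
    """Restore the archive into a scratch directory and compare it with src."""
    # Newer Pythons (and their security backports) support extraction filters;
    # "data" rejects absolute paths and links that escape the target directory.
    extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tempfile.TemporaryDirectory() as scratch:
        try:
            with tarfile.open(archive, "r:gz") as tar:
                tar.extractall(scratch, **extract_kwargs)
        except (OSError, EOFError, tarfile.TarError):
            return False  # missing, unreadable or truncated archive: a failed backup
        return sha256_tree(Path(scratch)) == sha256_tree(src)
```

Running a check like this after every backup job, and alerting on a False result somewhere people actually read, turns a silently failing backup into a visible failure the same night.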
Happened to me as well: after a year I learned our incremental DB backups were wrongly offset by the GMT difference, so we were losing hours of data every time. Fun.
Luckily we never needed them.
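One way an offset like that can happen (a guess, since the comment doesn't say how the incremental backups picked their rows): the job stores its last-run time as naive local wall-clock time but compares it against modification timestamps stored in UTC. The sketch assumes a hypothetical site timezone of UTC+2 and hypothetical helpers buggy_since, correct_since and changed_rows. Labeling the local time as UTC pushes the cutoff two hours too late, so changes made in the first two hours after each backup are skipped by the next incremental run.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical site timezone for the example: UTC+2, no DST.
LOCAL_TZ = timezone(timedelta(hours=2))


def buggy_since(last_run_local: datetime) -> datetime:
    """BUG: labels a naive local wall-clock time as if it were already UTC."""
    return last_run_local.replace(tzinfo=timezone.utc)


def correct_since(last_run_local: datetime) -> datetime:
    """Attach the real local zone first, then convert the instant to UTC."""
    return last_run_local.replace(tzinfo=LOCAL_TZ).astimezone(timezone.utc)


def changed_rows(rows, since_utc: datetime) -> list:
    """Rows whose UTC modification time is after the incremental cutoff."""
    return [row for row in rows if row[1] > since_utc]


# Last backup ran at midnight local time, i.e. 22:00 UTC the previous day.
last_run = datetime(2024, 1, 2, 0, 0)
# A row changed at 01:00 local (23:00 UTC), after that backup had run.
rows = [("order-42", datetime(2024, 1, 1, 23, 0, tzinfo=timezone.utc))]

print(changed_rows(rows, buggy_since(last_run)))    # [] : the change is silently skipped
print(changed_rows(rows, correct_since(last_run)))  # the row is picked up
```

Keeping every timestamp timezone-aware, or storing everything in UTC end to end, removes this class of bug. With a negative UTC offset the same mistake would move the cutoff earlier instead, so backups would overlap: wasteful, but not lossy.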
And now we have Postgres with WAL archiving and I sleep so much better.
Tbh there is nothing more taxing on my mental health than doing maintenance on our production servers.
Never update, never reboot. Clearly the safest method. Tried and true.
This week I ran "sudo shutdown now" on our main service right at the end of the workday because I thought it was a local terminal.
Not a bright move.
I was making after-hours config changes on a pair of mostly-but-not-entirely redundant Cisco L3 switches that basically controlled the entire network at that location. While updating the running configs, I mixed up which SSH session was which switch and accidentally gave both switches the same IP address, and before I noticed the error, I had copied the running config to the startup config.
Due to other limitations, and the fact that these changes were meant to fix DNS issues (so I couldn’t rely on DNS to save me), I ended up repeatedly SSHing in by IP until I reached the right switch, then racing to make the change before my session died from the packets dropped by the mucked-up network I had created. That easily added a couple of hours of cleanup to the maintenance I was doing.
There’s a package called molly-guard, which checks whether you are connected via SSH when you try to shut the machine down. If you are, it asks you for the hostname of the system to make sure you’re shutting down the right one.
Very useful program to just throw onto servers.
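Roughly, the check described above boils down to the following. This is a Python sketch of the idea, not molly-guard's actual code, and the names SSH_VARS, is_ssh_session and allow_shutdown are made up for this example. It treats the presence of the environment variables OpenSSH sets for a session (SSH_CONNECTION, SSH_CLIENT, SSH_TTY) as "connected via SSH", and in that case only allows the shutdown if the typed answer matches the machine's hostname.

```python
import os
import socket
from typing import Callable, Mapping, Optional

# Variables OpenSSH sets in a remote session's environment.
SSH_VARS = ("SSH_CONNECTION", "SSH_CLIENT", "SSH_TTY")


def is_ssh_session(env: Mapping[str, str]) -> bool:
    """True if the environment looks like an OpenSSH login session."""
    return any(env.get(var) for var in SSH_VARS)


def allow_shutdown(
    env: Optional[Mapping[str, str]] = None,
    hostname: Optional[str] = None,
    ask: Callable[[str], str] = input,
) -> bool:
    """Return True if it is OK to go ahead with the shutdown."""
    env = os.environ if env is None else env
    hostname = socket.gethostname() if hostname is None else hostname
    if not is_ssh_session(env):
        return True  # local terminal: no extra question
    # Remote session: make the user prove they know which box this is.
    answer = ask("You are connected via SSH. Type the hostname to shut down: ")
    return answer.strip() == hostname
```

A wrapper would call allow_shutdown() first and only run the real shutdown command when it returns True; env, hostname and ask are parameters only so the logic can be tested without a live SSH session.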
Best thing I did was change my shell prompt so I can easily tell when it isn’t my machine
Happens to everyone
Just having a multitude of terminals open with a mix of test environments and (just for comparison) an open connection to the production servers…
We were at a fair/exhibition once, and on the first day people working on an actual customer project asked us if they could compare their code with ours.
Of course, they flashed the wrong PLC, and we were dead in the water for the first hours of the exhibition.
I still think that place was cursed: we also had to re-solder several connections on our robot, and the cherry on top was the system flash dying. That’s where I had fucked up, because I finished everything late at night and didn’t make a complete backup.
But it seems that when your luck runs out, you lose on all fronts.
At least I was able to restore everything in 20 minutes, which must be some kind of record.
But I was shaking so much from the stress that I couldn’t type efficiently anymore. Luckily, I had a colleague who calmly typed in what I told him, and together we were able to get the showcase up and running again.
Well, at least the beer afterwards tasted like the liquid of the gods
Oops.
Since you’re using sudo, I suggest setting different passwords on production, remote, and personal systems. That way, you’ll get a password error before a tired/distracted command executes in the wrong terminal.