Back Online After Server Failure

June 1st, 2009

This is exactly why I back up as frequently as I do. During this entire process no data was lost.

Yesterday my webserver started acting very strangely. I attempted to upload some photos to my photo gallery. It would not let me create any new content, but I could still view all of the content that was already posted. I SSH’d into the server and found that my system load was at 280 (!!!) and growing by about 2 points per minute. I checked some logs and couldn’t really figure out what was going on. No process was showing a major increase in resource consumption and there were no signs of extra traffic. I tried to close out all of the terminal windows I had open, but two of them would not close, no matter what I tried (killing processes, etc.). After 225 days of uptime, I figured I should just reboot the system.

I waited a few minutes but the system never came back online (not even enough for me to ping it). I waited until about 10 minutes after I gave the shutdown command and then forcefully turned it off. I brought it into my room to work on it and plugged it into a monitor to see where the problem was (I usually run my servers headless). I turned on the power and… nothing. The fans spun and it started drawing power from the wall, but nothing was displayed on the monitor. The system was completely dead.

“Fine. I’ll grab one of my spare computers, install Ubuntu Server, restore the backups, and that will be that,” I thought to myself. The first computer I grabbed did not turn on at all. The green light on the motherboard would glow, but it would not power on. I even tried manually jumping the power connection on the motherboard just in case the power switch was bad. After that attempt failed I put it back on the shelf and moved on to the next spare computer.

“Irregular fan speed. Press F2 to continue.” That was not going to cut it as my webserver. At this point the only spare computer I had left was an older laptop with a broken screen. I used to use this laptop as a live spare webserver, so it already had everything installed. I stopped using it simply because it was so old and underpowered. It got to the GRUB boot loader and froze. Again, that was not going to cut it for use as the primary webserver. <sigh>

With four computers that would not work the way they were supposed to, I decided to just use my personal fileserver “Ducktape” as the webserver as well. I forwarded port 80 to its IP address instead of the old webserver’s, threw up a quick error message, and then went out for dinner in order to relax. After I got home it was a simple matter to restore all of my backed-up MySQL databases, web directories, and apache2 config files. It took about 45 minutes to get everything working correctly, and now I’m back online. I also learned that I didn’t back up the actual web folders correctly. The stored permissions for the files were lost, so I had to fix that (although that’s probably because the restore went onto a new Linux box, which had different user IDs).
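For anyone who hits the same permissions problem, here is a minimal sketch of one way to guard against it: record each file’s mode and numeric owner next to the backup, then reapply them after a restore. This is not the backup setup described in this post; the function names `save_permissions` and `restore_permissions` are made up for illustration, and the example path in the usage demo is only an assumption. Note that numeric user IDs may belong to different accounts on a new box, so the ownership part is best-effort.

```python
import os
import stat
import tempfile


def save_permissions(root):
    """Record mode, UID and GID for every file and directory under root.

    Returns a dict keyed by path relative to root (hypothetical format).
    """
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            if stat.S_ISLNK(info.st_mode):
                continue  # symlink modes are not meaningful on Linux
            rel = os.path.relpath(path, root)
            manifest[rel] = {
                "mode": stat.S_IMODE(info.st_mode),
                "uid": info.st_uid,
                "gid": info.st_gid,
            }
    return manifest


def restore_permissions(root, manifest):
    """Reapply recorded modes and owners to a restored copy of the tree."""
    # Walk the manifest in reverse so children are fixed before their
    # parent directories (a restrictive parent mode could block access).
    for rel, entry in reversed(list(manifest.items())):
        path = os.path.join(root, rel)
        if not os.path.exists(path):
            continue  # file was not restored; nothing to fix
        os.chmod(path, entry["mode"])
        try:
            os.chown(path, entry["uid"], entry["gid"])
        except PermissionError:
            # Changing ownership usually needs root; keep the mode fix anyway.
            pass


if __name__ == "__main__":
    # Small self-contained demo on a throwaway directory.
    with tempfile.TemporaryDirectory() as web_root:
        page = os.path.join(web_root, "index.html")
        open(page, "w").close()
        os.chmod(page, 0o640)

        saved = save_permissions(web_root)   # taken at backup time
        os.chmod(page, 0o600)                # simulate a lossy restore
        restore_permissions(web_root, saved)
        print(oct(stat.S_IMODE(os.stat(page).st_mode)))
```

The manifest is a plain dict, so it could be written out with `json.dump` and stored alongside the web directories in the backup.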

Aside from the multiple simultaneous physical failures, this is how system failures and recoveries are supposed to go. No data was lost.
