aMule Forum

Author Topic: Site downtime explanation, and lot of extra text.  (Read 29841 times)

Kry

  • Ex-developer
  • Retired admin
  • Hero Member
  • *****
  • Karma: -665
  • Offline
  • Posts: 5795
Site downtime explanation, and lot of extra text.
« on: February 25, 2009, 10:19:16 AM »

First part: Informational part of the post:

The server hosting www.amule.org and most of our services had to go into maintenance several hours ago, and now it's operational again. Sorry for the downtime.

Second part: Extras, totally unnecessary information, personal opinion, shameless rant.

I have used multiple filesystems since I started using linux. And I started with Debian Bo, so some googling should give you how long I'm talking about. Granted, I'm definitely not one of the Ancient Ones, but I like to believe I've been using linux for a good chunk of time.
Actually, I might as well mention I have used computers since my first Amstrad CPC 464 and the Intel 8088 based PC I inherited from my cousin, which didn't even have a hard drive and ran MS-DOS 3.3 from 5¼-inch floppies.

That said, I have used the (scary) AMSDOS filesystem, whatever they called the CP/M file system, FAT12/16/32/X, NTFS in pretty much all its incarnations, ext2, ext3, HPFS (I miss OS/2 Warp), HFS Plus, UFS2, JFS, ZFS, ReiserFS, XFS and probably others I can't remember, for I was young and liked to try shiny new stuff, even if only for a few hours.
Out of those, Ext2 (seemed to be the only reasonable linux option at the time), FAT/NTFS (for obvious reasons) and XFS (my current filesystem of choice) are the ones that I have used regularly over the years.

Except for a period, while still in college, when I decided ReiserFS seemed to be cool.

Back then, an 80GB hard drive was serious business. It felt like you could have your entire life in it. And heck, I was busy with some open source projects already, I used debian almost exclusively and was, as a matter of fact, enjoying life a lot. And at this wonderful peak point of my life, ReiserFS happened.

My ENORMOUS 80GB hard drive was divided into two partitions: Linux (70GB) and Windows (10GB). Yes, Windows - I also liked gaming between coding sessions (and, to be honest, instead of them a lot of the time) and I dare anyone to check the state of the Wine project back in 2001.

Of course, as the space in the Windows partition was fairly limited (and as much as Windows has become worse with time, 98 already required 500MB just for the basic installation), I had all my data in the linux partition. Heck, I was in linux most of the time, and didn't trust the FAT support back then. So I lived happily in my 70GB linux partition, with all my data: IRC/IM logs and email backups (I'm a fairly OCD person when it comes to historical data), pictures, music, videos, college data including the required programming exercises for my classes, papers for a couple of classes, electronic documents with information needed to study some classes, and other data, all infinitely valuable to me because, well, it was my data. It was impressive how so much information could fit in that (my first hard drive was a wonderful 320MB chunk of heaven, and I was one of the lucky people - others started with hard drives that had barely more capacity than floppies themselves).

I was, in short, a very happy geek. A very happy, naive geek.

WARNING: Explicit content. If you're offended by swearing, do not continue reading, and just imagine the rest of the story. I don't think it's hard at this point. If you continue reading, keep in mind this is my personal opinion and has nothing to do with the aMule project team as a whole. And it gets kinda technical (read: boring) at some point.

Out of all the filesystems mentioned above, only one of them seems to be consistently out to get me, shamelessly jumping when I least expect it for a full session of anal rape. I give you three chances at guessing it and the first two don't count.

Fucking ReiserFS. If I had to decide between going back in time to kill Hitler or prevent ReiserFS from existing, I wouldn't even blink before packing my time-traveler-compatible suitcase with clothes suitable for the California weather, and wouldn't even bother learning German.

One day in 2001, I was happily using my linux when I suddenly couldn't save the file geoscape.c that I was editing in nano. It just gave an error. A couple terminal windows and some research later, I noticed something very interesting: the dreaded "read-only filesystem" message.

"Well, ", I thought, "shit happens. I will reboot and let it replay the journal, maybe run fsck."

A very, very naive geek.

LILO appeared, I happily chose "1". Linux started booting. Some messages related to ReiserFS later, I had a kernel panic. For someone that had been a linux user for a while, and not the kind of user Ubuntu trains nowadays, a Kernel Panic meant "something is very wrong, but 99% of the time everything will be OK after turning the computer off and on". I even cracked a joke about restarting linux more than windows to one of my flatmates who happened to be around.

I pressed the power button. I waited a few seconds. I pressed it again.

LILO appeared, I chose "1", but more in an annoyed way than happily. The kernel started booting. The kernel failed to mount hda1.

What?

Or, as this was 2001 and my life was still contained in my birth country

Que?

At this point I had gone from "gee-isn't-today-weird" mood to a very sombre one, spiked with nervous laughs. Ideas, ideas, ideas. Think think think. I know: I'll use the debian installation disk to boot and then mount the partition. God, I'm a fucking genius, and a very handsome one at that!

There we go! Installation starts, I switch to the other console, do my magic, attempt to mount the partition.

mount: you must specify the filesystem type

I had just lost everything. Everything. It took a while to sink in, then there was no option but to get drunk.

Believe me when I say I tried every single ReiserFS file recovery system available at the time. Nothing. No way to recover even a lousy .bash_profile. ReiserFS had managed to break the filesystem metadata, and it seemed that a lot of the data as well in the process.

How this happened I don't know. Maybe I could have used some manual recovery to get some of the stuff back, but I was young and impatient, and the option of getting drunk, followed by some reformatting of the partition in good old ext2 and a big "FUCK YOU REISERFS" yell about 5 times a day for a couple of weeks was all I could manage.

For those interested, the hardware didn't fail, and that hard drive is still operational, tho it was donated to a friend a long time ago (he reports that it still works ok).

I lived the next 6 months thinking that I had been the unluckiest guy in the world that day. Yeah, I know, tons of African children die every day, but heck, I was in my early twenties, I was the center of the world and any drama happening to me was THE drama, for the same reason the data I lost was THE data, and possibly the most important loss humanity ever faced. To be honest, time hasn't done much about fixing that part of my personality, so we'll see when I hit my thirties.

Turns out, 6 months after I had my seas turn red and fire rain from the sky, a company I was working for had two servers' filesystems die on them in 48 hours. The first server had ReiserFS as the filesystem of choice, and showed the same behaviour as my own filesystem. The second one too. As a matter of fact, the second one was the backup server.

In that 48h time frame, the data disappeared from Server A, was restored from Server B to A, then disappeared from Server B. Had Server B's filesystem choked a day earlier, they could have faced a seriously shitty situation. They stopped using ReiserFS. Yeah, they should have had RAID, and actually they do nowadays, but we live and learn.

I had two sysadmins as my drinking partners for a few days.

Fast-forward years of joking about ReiserFS, telling this story to new linux users, arguing with ReiserFS zealots, and regrettably more drinking. Let's say for the sake of the story that now we're at 23rd of February, 2009.

I wake up to find the aMule webserver down. After briefly considering ignoring this event and just playing some games, I decide the users deserve better (and the developers too, as we host several other services there) and ssh to the box. Or to be exact, try to ssh to the box. I get some ssh key exchange error, and there's no way to access the server. Well, not the first time this server goes down, I guess that's why the web interface for the company that owns it has a soft reset/hard reset option.

I do a soft reset and the server happily ignores me like I'm a leper or something. Time for a hard reset, never a good thing. After waiting for ping to respond, I ssh to the server again. This time I get in, but everything is AWFULLY slow. First thing to check is top, and mysql is taking 100% of the CPU for no apparent reason. I leave it to do its thing but it won't come back. I check the logs, but mysql has created no logs. My eyes narrow. I do 'touch ~/hello'. The response, of course, is "read only file system". I kill mysql (not like it can do anything to the filesystem at this point) and check dmesg: tons of errors when accessing the filesystem.
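For the curious, the 'touch ~/hello' trick is easy to script. What follows is only a rough sketch in Python, not something that ran on the server: the function name is_writable is made up for illustration, and it assumes that failing to create a temporary file means the directory cannot be written to, which is also what happens when the filesystem has been remounted read-only.

```python
import os
import tempfile


def is_writable(directory):
    """Return True if a temporary file can be created in `directory`.

    Hypothetical helper: it does the same job as 'touch ~/hello',
    but removes its test file again afterwards.
    """
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        # Covers a read-only filesystem, a missing directory and
        # missing permissions; all of them mean "cannot write here".
        return False


if __name__ == "__main__":
    home = os.path.expanduser("~")
    state = "writable" if is_writable(home) else "NOT writable"
    print(home, "is", state)
```

A False result does not say why the write failed, so dmesg is still the place to look for the actual filesystem errors.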

And then I remember and shiver.

The preinstalled linux image in this box from the company that hosts it uses ReiserFS. When deltaHF set up this box, that was the only option, ReiserFS. Had I been involved in setting up the server there would have been hell, and I would have either vetoed the company or tried to convince them to change the filesystem or at least give other options. But for 'real life'-related reasons I wasn't involved in it. And ReiserFS ended up being our filesystem of choice, because we had no choice.

And I, shame on me, did nothing about it after taking over the server administration. Call it a late-twenties crisis that made me naive and innocent again, call it faith in the blooming Open Source community, call it stupidity, call it laziness. The fact is we stayed with ReiserFS.

And when I restarted the box again, it didn't come back.

And when I booted the rescue system and tried to mount the partition, it didn't recognize the filesystem.

At this point I guess the two or three readers that are still loyal to this bahamut of a story will expect me to be either suicidal or ready for a mental institution. Hell, at least be in the mood to go and buy some bottles of vodka.

But I wasn't.

I was grinning.

Because this time, I had ReiserFS by the balls.

See, when I took over the administration of this server I found out we had not one, but two hard drives, and the second one was completely unused. So it was really a no-brainer for me to set up an automated backup from the main hard drive, and as a matter of fact, pipe the output of 'dd if=/dev/sda bs=1MB' to gzip and save it once a week, from the rescue system. And of course, format the backup drive's partition in XFS.
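If you want to copy the idea without remembering the dd incantation, here is a minimal Python sketch of the same pipeline: read the source in 1MB chunks, compress with gzip, write the result somewhere safe. The function names and file paths are invented for illustration and this is not the script that runs on the server. It assumes the source is not being written to while it is read, which is the reason for doing it from the rescue system.

```python
import gzip
import hashlib
import shutil

CHUNK_SIZE = 1_000_000  # mirrors dd's bs=1MB (1000 * 1000 bytes)


def backup_image(source, dest, chunk_size=CHUNK_SIZE):
    """Stream `source` (a device or a plain file) into a gzip file at `dest`."""
    with open(source, "rb") as src, gzip.open(dest, "wb") as out:
        shutil.copyfileobj(src, out, chunk_size)


def _digest(fileobj, chunk_size):
    """SHA-256 of everything readable from `fileobj`, read chunk by chunk."""
    sha = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        sha.update(chunk)
    return sha.hexdigest()


def verify_image(source, dest, chunk_size=CHUNK_SIZE):
    """Return True if decompressing `dest` gives back the bytes of `source`."""
    with open(source, "rb") as src, gzip.open(dest, "rb") as image:
        return _digest(src, chunk_size) == _digest(image, chunk_size)
```

Calling backup_image("/dev/sda", "/backup/sda.img.gz") would be the rough equivalent of the dd and gzip pipe, with the destination path being a made-up example. The verify step is an extra that the original pipeline does not have; a backup that was never read back is only a hope.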

So, fuck you ReiserFS. Fuck you hard. It took me more time to write this post than to restore the data lost, with only one hour of it now living in Data Heaven. As a matter of fact, I think I will keep ReiserFS as the main filesystem, if only to see how he tries to have intercourse with me just to find out that I'm not that guy barely out of his teens and with a sharp, scientific mind that assumed no malice in simple things like a filesystem. I'm a happily married guy that has been in the computer scene long enough to know that perfectly innocuous-looking pieces of software and hardware are, in fact, out to get you. And when the day comes that they do get you (and they will!), they won't even have the courtesy to wear a condom.

And of course, it's good to know that I have become mature enough to avoid taking cheap, insensitive shots at ReiserFS, even if I love black humour, but I couldn't avoid one of the oldest laws of the internet.

Fuck, it's 4am already. Do forgive my misspellings, awful grammar and anything I've done wrong in this text, including but not limited to writing it.

And if you want some friendly advice, never, ever use ReiserFS.

tl;dr: A filesystem killed my brother
« Last Edit: February 25, 2009, 11:10:42 AM by Kry »
Logged

Bagusajalf

  • Approved Newbie
  • *
  • Karma: 0
  • Offline
  • Posts: 40
Re: Site downtime explanation, and lot of extra text.
« Reply #1 on: February 25, 2009, 01:45:18 PM »

Hans Reiser deserves a loooong time in jail. Undoubtedly.
Logged

fabtar

  • Approved Newbie
  • *
  • Karma: 1
  • Offline
  • Posts: 24
Re: Site downtime explanation, and lot of extra text.
« Reply #2 on: February 25, 2009, 02:18:44 PM »

Nice tragic & funny story!
I really like your early computer experiences; you have reminded me of similar experiences of my own:
C64 (Simon basic rules!)
286-16MHz 1M Hard disk 40MBytes (upgrade with 287 coprocessor and 2MBytes of ram)
486-dx4-100Mhz Hard disk 210MBytes
.... and so on

and you have also reminded me of lots of data losses.
Luckily, at about 25 years old (after a miraculous disaster recovery), I realized and imposed on myself a simple rule:
"You shouldn't blindly trust a server's/PC's/disk's reliability"

Since then I have made periodic data backups, and I currently have a backup script which backs up my useful data to an external drive every week.
IMHO, having photos, mail, documents and creative works on a PC without an automatic backup is insane.

P.S.: Thanks for pointing out ReiserFS's weakness.

Logged
Mldonker looking around

Morse

  • Full Member
  • ***
  • Karma: 6
  • Offline
  • Posts: 105
Re: Site downtime explanation, and lot of extra text.
« Reply #3 on: February 25, 2009, 06:02:49 PM »

you should learn russian! that way you'll be able to yell much more than a simple "fuck you reiserfs": you'll be able to describe your feelings and probable relationships with reiserfs for hours without a single repeat!
Logged

Stu Redman

  • Administrator
  • Hero Member
  • *****
  • Karma: 214
  • Offline
  • Posts: 3739
  • Engines screaming
Re: Site downtime explanation, and lot of extra text.
« Reply #4 on: February 25, 2009, 11:33:14 PM »

That's a hell of a good piece of writing. Really. Great suspense and great fun to read. And a happy end.  :D
I actually got frightened that a certain piece of 99% complete work would get involved at some point (shiver).  ;)
Logged
The image of mother goddess, lying dormant in the eyes of the dead, the sheaf of the corn is broken, end the harvest, throw the dead on the pyre -- Iron Maiden, Isle of Avalon

wires

  • Jr. Member
  • **
  • Karma: 6
  • Offline
  • Posts: 83
Re: Site downtime explanation, and lot of extra text.
« Reply #5 on: February 26, 2009, 01:32:22 AM »

Keep working on this story and Hollywood will buy it! "Hell-ReiserFS, Kry's curse"  ;D
Logged

skolnick

  • Global Moderator
  • Hero Member
  • *****
  • Karma: 24
  • Offline
  • Posts: 1188
  • CentOS 6 User
Re: Site downtime explanation, and lot of extra text.
« Reply #6 on: February 26, 2009, 01:55:51 AM »

Great story Kry. Drama (losing your data), suspense (the part with the two servers), horror (no other word to describe the loss of 70GB of data)...it has everything. That's one for telling to your grandchildren (in some years, and only "if any"). I also learned the hard way to avoid ReiserFS, however my data loss was smaller (a couple of gigs...no big deal). Now I wouldn't even try to format a HDD with that piece of crap. I trust ext3. Not the fastest, but so far, reliable.

Regards.
Logged

vitorgatti

  • Approved Newbie
  • *
  • Karma: -1
  • Offline
  • Posts: 5
    • http://vitorgatti.50webs.com
Re: Site downtime explanation, and lot of extra text.
« Reply #7 on: February 26, 2009, 12:18:23 PM »

Whoa, great story... ReiserFS was totally destroyed.
I think I used it once, but at a time when I was switching to Linux (testing a lot of distros), so when I really migrated in 2006, installing Ubuntu 5.10, it said to format using EXT3, and I'm glad Ubuntu chose that for me!

EXT4 FTW :D
Logged

RazZziel

  • Developer
  • Newbie
  • *****
  • Karma: 0
  • Offline
  • Posts: 3
  • Cheerleader Amoeba
Re: Site downtime explanation, and lot of extra text.
« Reply #8 on: February 26, 2009, 03:22:14 PM »

ReiserFS™ is a fine piece of engineering that has served me well for 8 glorious years.

Don't blame it for being part of your personal collection of curses, and the same goes for those by your side who weren't strong enough to avoid getting infected. Maybe you should stop that habit of yours of inducing Matrix failures wherever you go, if you want to have better luck with filesystems next time.

Godspeed Hans Reiser-kun, I love you even if your artwork is fucking creepy.

<3 <3


...brb, backing the fuck up of my hard drives.
Logged

Arichy

  • Full Member
  • ***
  • Karma: 0
  • Offline
  • Posts: 224
Re: Site downtime explanation, and lot of extra text.
« Reply #9 on: February 28, 2009, 12:09:06 PM »

I started with an 8088, too.
Logged
Gentoo i686

lfroen

  • Guest
Re: Site downtime explanation, and lot of extra text.
« Reply #10 on: February 28, 2009, 02:17:19 PM »

Nicely told story. Really.

And regarding ReiserFS ... you should be expecting this kind of failure when using a not-exactly-mainstream (which means not-really-tested) filesystem. I lost my data on such a scale only once - when an HD physically failed. Since then I do backups and don't get impressed by disk or filesystem fuckups.
Logged

fabtar

  • Approved Newbie
  • *
  • Karma: 1
  • Offline
  • Posts: 24
Re: Site downtime explanation, and lot of extra text.
« Reply #11 on: February 28, 2009, 07:47:17 PM »

And regarding ReiserFS ... you should be expecting this kind of failure when using not-exactly-mainstream (which means not-really-tested)  filesystem.

SUSE surprised me by choosing ReiserFS as its default filesystem (see http://www.ewdisonthen.com/opensuse-drops-reiserfs-sign-of-things-to-come-1212.php ), and after that I thought all the wonderful tales about ReiserFS were right.
Then openSUSE dropped ReiserFS quickly... perhaps they caught the infiltrated ReiserFS zealot?
:-D
Logged
Mldonker looking around

Stu Redman

  • Administrator
  • Hero Member
  • *****
  • Karma: 214
  • Offline
  • Posts: 3739
  • Engines screaming
Re: Site downtime explanation, and lot of extra text.
« Reply #12 on: March 01, 2009, 10:46:33 PM »

My personal little story of data loss:

My first computer was a C64, which I got pimped up well at some point with a 256k memory extension (which cost about as much as a 4G set today - just the rams, not counting the CPU extension board required) and a floppy speeder (C64 floppy would perform far below its theoretical speed usually). To use it for copying discs at full speed (10s read / 20s write) I had to develop my own software of course.

After a lot of developing and experimenting I had it almost done. Then one day, after the clack-clack-clack of yet another test run had finished, I suddenly realized I had finally forgotten to swap the discs. Instead of the test disc I had just overwritten the data disc with my source (the one and only of course - I was young and innocent).

I just stood up, went to my bedroom and lay on my bed, swearing and wondering what to do now. I would never get all the pieces together again. Then I suddenly remembered I had just for the first time tried the "integrated development environment" where you could run the assembler and the program directly from the editor instead of the usual save/exit/assemble-from-disc/run.

I raced back up the stairs, checked the computer, and really - I had left it on, and my source was still in RAM. With shaking hands I inserted a floppy and saved it. Then I turned off the C64 and left it for the day. My nerves were truly shot...

Moral: the greatest source of computer trouble is usually sitting at the keyboard.  :D
Logged
The image of mother goddess, lying dormant in the eyes of the dead, the sheaf of the corn is broken, end the harvest, throw the dead on the pyre -- Iron Maiden, Isle of Avalon

btkaos

  • Global Moderator
  • Sr. Member
  • *****
  • Karma: 110
  • Offline
  • Posts: 486
  • Kaos is infinite!
Re: Site downtime explanation, and lot of extra text.
« Reply #13 on: March 02, 2009, 08:16:39 PM »

Reiserfs is really bad, I'm sorry for you Kry.

I highly recommend a video by Theodore Ts'o, main designer of ext2/3/4 filesystems:

http://www.linux-magazine.com/online/news/video_ted_ts_o_on_ext4_btrfs_and_first_steps_with_linux
Logged

gav616

  • Guest
Re: Site downtime explanation, and lot of extra text.
« Reply #14 on: March 02, 2009, 08:30:26 PM »

lol, i read the part about Reiserfs3, why??!

for best data integrity i'd always use ext3 with full journaling mode (not 4, it's immature) or at a push jfs if you like placebo effects.

the only way I would use reiserfs3 is on my /var mount point.
Logged