aMule Forum

Pages: 1 2 [3] 4

Author Topic: known_files a bit fragile....  (Read 24015 times)

Stu Redman

  • Administrator
  • Hero Member
  • *****
  • Karma: 214
  • Offline Offline
  • Posts: 3739
  • Engines screaming
Re: known_files a bit fragile....
« Reply #30 on: September 18, 2011, 01:05:02 PM »

I know, but the fact remains that there is a race condition.
Where should this race condition be?
Logged
The image of mother goddess, lying dormant in the eyes of the dead, the sheaf of the corn is broken, end the harvest, throw the dead on the pyre -- Iron Maiden, Isle of Avalon

Stu Redman

  • Administrator
  • Hero Member
  • *****
  • Karma: 214
  • Offline Offline
  • Posts: 3739
  • Engines screaming
Re: known_files a bit fragile....
« Reply #31 on: September 18, 2011, 02:26:32 PM »

Try 10613, it improves amulegui's performance when too many files are shared.
Logged

Stu Redman

  • Administrator
  • Hero Member
  • *****
  • Karma: 214
  • Offline Offline
  • Posts: 3739
  • Engines screaming
Re: known_files a bit fragile....
« Reply #32 on: September 18, 2011, 09:32:23 PM »

Or rather 10616, which doesn't perpetually rehash files that are duplicated more than twice, in case this is what you are doing...
(It still will rehash them once, but then add them to duplicates correctly at last and be done.)
Logged

stoatwblr

  • Sr. Member
  • ****
  • Karma: 12
  • Offline Offline
  • Posts: 318
Re: known_files a bit fragile....
« Reply #33 on: September 19, 2011, 01:50:46 AM »

I know, but the fact remains that there is a race condition.
Where should this race condition be?

The race condition exists while known or known2 are being written out - on my system known is taking about 15 seconds to be fully populated.

By writing the files out and then rotating them into place afterwards, the window for system events or sigkills leaving an incomplete known file is _much_ shorter.

For whatever reason known2 seems to be much more robust.

I know you say that known shouldn't be being truncated, but something is causing the entire file to be rewritten in place. As long as this occurs there's a possibility that it can be corrupted with no valid copy on the system.
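
The write-then-rename idea above can be sketched roughly as follows. This is an illustration in Python, not aMule's code (aMule is C++); the function name atomic_write and the temporary file prefix are made up for the example. It assumes a POSIX filesystem, where renaming within one directory replaces the target in a single step.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data to path so readers see either the old file or the new one.

    Hypothetical helper for illustration; not part of aMule.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the rename stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".known.", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the contents to disk before renaming
        # Swap the finished copy into place; the old file stays valid until now.
        os.replace(tmp_path, path)
    except BaseException:
        # The existing file was never touched; just discard the partial copy.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this approach a kill during the slow write phase only loses the temporary copy, and the previous known.met remains intact.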

Logged

stoatwblr

  • Sr. Member
  • ****
  • Karma: 12
  • Offline Offline
  • Posts: 318
Re: known_files a bit fragile....
« Reply #34 on: September 19, 2011, 01:53:28 AM »

Or rather 10616, which doesn't perpetually rehash files that are duplicated more than twice, in case this is what you are doing...
(It still will rehash them once, but then add them to duplicates correctly at last and be done.)

There's only one copy of the files onboard.

amuled certainly seems to be hashing every file twice for some reason - but on the second pass lsof shows none of the shared files are being touched - just known.met being continually rewritten.
Logged

stoatwblr

  • Sr. Member
  • ****
  • Karma: 12
  • Offline Offline
  • Posts: 318
Re: known_files a bit fragile....
« Reply #35 on: September 19, 2011, 02:27:18 PM »

Or rather 10616

Starts quickly, then starts logging the following error over and over again. Amulegui and amulecmd can't connect.

total jobs = 100, too many jobs (about 10 at a time, every 10 seconds - thousands so far)

Google searches suggest it's coming out of  libupnp.

Logged

stoatwblr

  • Sr. Member
  • ****
  • Karma: 12
  • Offline Offline
  • Posts: 318
Re: known_files a bit fragile....
« Reply #36 on: September 19, 2011, 02:35:20 PM »

Google searches suggest it's coming out of  libupnp.

See http://comments.gmane.org/gmane.linux.upnp-sdk.general/271 - there was no followup on this though.

I've been seeing the "too many jobs" error message a bit recently, but usually only after about 12-24 hours.

The files I'm currently sharing are 150-300Mb apiece. One of the end targets is to use this method for scientific data distribution - long-term that'd be a few million files at around 200Mb each (planetary albedo data for climate modelling)...




Logged

stoatwblr

  • Sr. Member
  • ****
  • Karma: 12
  • Offline Offline
  • Posts: 318
Re: known_files a bit fragile....
« Reply #37 on: September 19, 2011, 08:46:07 PM »

ok, 10616:

2011-09-19 19:35:17: amule.cpp(1281): KnownFileList: Failsafe for crash on file hashing creation
 2011-09-19 19:35:17: KnownFileList.cpp(131): KnownFileList: start saving known.met
 2011-09-19 19:36:45: KnownFileList.cpp(165): KnownFileList: finished saving known.met
 2011-09-19 19:36:45: amule.cpp(1275): KnownFileList: Safe adding file to sharedlist: X
 2011-09-19 19:36:45: amule.cpp(1281): KnownFileList: Failsafe for crash on file hashing creation
 2011-09-19 19:36:45: KnownFileList.cpp(131): KnownFileList: start saving known.met
 2011-09-19 19:38:15: KnownFileList.cpp(165): KnownFileList: finished saving known.met
 2011-09-19 19:38:15: KnownFileList.cpp(131): KnownFileList: start saving known.met
 2011-09-19 19:39:41: KnownFileList.cpp(165): KnownFileList: finished saving known.met
 2011-09-19 19:39:42: amule.cpp(1275): KnownFileList: Safe adding file to sharedlist: Y
 2011-09-19 19:39:42: amule.cpp(1281): KnownFileList: Failsafe for crash on file hashing creation
 2011-09-19 19:39:42: KnownFileList.cpp(131): KnownFileList: start saving known.met
 2011-09-19 19:41:09: KnownFileList.cpp(165): KnownFileList: finished saving known.met

Note the amount of time spent saving each known.met - I've benchmarked the disk system at something in excess of 75Mb/s (11 drive ZFS RAIDZ2 with SSD read and write cache), so taking this long to write out ~9Mb is a bit odd.

This long write period is why I suggest writing out, then renaming into place.
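
To put numbers on the log above, here is a small Python calculation. The timestamps are copied from the log; the file size and disk throughput are the figures quoted in this post, with "Mb" read as megabytes (an assumption).

```python
from datetime import datetime

# (start, finish) timestamps of the four known.met saves in the log above.
SAVES = [
    ("2011-09-19 19:35:17", "2011-09-19 19:36:45"),
    ("2011-09-19 19:36:45", "2011-09-19 19:38:15"),
    ("2011-09-19 19:38:15", "2011-09-19 19:39:41"),
    ("2011-09-19 19:39:42", "2011-09-19 19:41:09"),
]
FMT = "%Y-%m-%d %H:%M:%S"

FILE_SIZE_MB = 9.0    # "~9Mb" from the post, read as megabytes (assumption)
DISK_MB_PER_S = 75.0  # benchmarked throughput quoted in the post


def save_durations(saves):
    """Return the duration of each save in whole seconds."""
    return [
        int((datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds())
        for start, end in saves
    ]


durations = save_durations(SAVES)
mean_s = sum(durations) / len(durations)
expected_s = FILE_SIZE_MB / DISK_MB_PER_S

print("seconds per save:", durations)
print("mean seconds per save:", mean_s)
print("seconds expected at raw disk speed:", expected_s)
print("effective MB/s:", FILE_SIZE_MB / mean_s)
```

Each save takes between 86 and 90 seconds, against roughly 0.12 seconds if the raw disk throughput were the limit.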
Logged

stoatwblr

  • Sr. Member
  • ****
  • Karma: 12
  • Offline Offline
  • Posts: 318
Re: known_files a bit fragile....
« Reply #38 on: September 19, 2011, 09:36:13 PM »

One thing which has shown up with the extra logging is that what's being written out to known.met is a _long_ way behind what's being opened by amuled (shown with lsof).

So it's not that things are being rehashed twice, but that writing the hash results to known.met is going on long after the files have been closed. Why is it lagging so far?

 
Logged

Stu Redman

  • Administrator
  • Hero Member
  • *****
  • Karma: 214
  • Offline Offline
  • Posts: 3739
  • Engines screaming
Re: known_files a bit fragile....
« Reply #39 on: September 19, 2011, 11:00:40 PM »

taking this long to write out ~9Mb is a bit odd
QUITE a bit, yes. My known.met is ~5MB and took 4s to save, with optimize off.

Are you building with optimize on btw? If not you should try that. You can build with debug AND optimize at the same time.

What's probably happening:
- background hashing finishes, posts event
- foreground event queue processes event and starts writing known.met
- background hashing finishes, posts event
- background hashing finishes, posts event
- background hashing finishes, posts event
- foreground event queue finishes writing known.met
- foreground event queue does other important things
- background hashing finishes, posts event
- foreground event queue processes event and starts writing known.met
...

So events stack up and up, because background hashes faster than foreground can process.
Oh, and each foreground event will block (because knownFileListMutex is locked), and so spawn a new event handler instance.  :o
The core of the problem is - why does writing take so long? Is your system thrashing (continuously swapping aMule's memory in and out) ?
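
One possible way to stop the events from stacking up, sketched here as an assumption and not as what aMule does: let the foreground handler take every queued "hash finished" event in one go and write known.met once for the whole batch. The names below (drain_and_save, pending, known) are hypothetical and the sketch is in Python for brevity.

```python
from collections import deque


def drain_and_save(pending, known, save):
    """Move all queued hash results into the known list, then save once.

    Returns the number of results that were added.
    """
    added = 0
    while pending:
        known.append(pending.popleft())
        added += 1
    if added:
        save(known)  # one save per batch instead of one save per hashed file
    return added


# Five hash results arrive while the foreground is busy.
pending = deque(["file_a", "file_b", "file_c", "file_d", "file_e"])
known = []
save_calls = []  # records how many entries each save wrote


def record_save(files):
    save_calls.append(len(files))


drain_and_save(pending, known, record_save)
drain_and_save(pending, known, record_save)  # queue is empty: nothing is saved
print(save_calls)
```

The trade-off is that a crash before the batched save loses every hash result in that batch, which may be exactly what the per-file "failsafe" save in the log is meant to avoid.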
Logged

stoatwblr

  • Sr. Member
  • ****
  • Karma: 12
  • Offline Offline
  • Posts: 318
Re: known_files a bit fragile....
« Reply #40 on: September 19, 2011, 11:12:04 PM »


The core of the problem is - why does writing take so long? Is your system thrashing (continuously swapping aMule's memory in and out) ?

Not at all, no swapping happening at all and right now there's 1500Mb free (8Gb system). Other processes are writing very quickly.

On the CPU front, one thread is chewing 99.9% (of one CPU), whilst 8 other threads are using 0.0%.

CPU is an Intel Core 2 Duo @ 2.6GHz.

Logged

stoatwblr

  • Sr. Member
  • ****
  • Karma: 12
  • Offline Offline
  • Posts: 318
Re: known_files a bit fragile....
« Reply #41 on: September 19, 2011, 11:21:24 PM »

Current build parameters:

./configure --enable-optimize --enable-amule-daemon --enable-amulecmd --enable-webserver --enable-amule-gui --enable-cas --enable-wxcas --enable-alc --enable-alcc --enable-xas --enable-geoip --enable-mmap --enable-fileview --with-zlib --enable-ccache
Logged

stoatwblr

  • Sr. Member
  • ****
  • Karma: 12
  • Offline Offline
  • Posts: 318
Re: known_files a bit fragile....
« Reply #42 on: September 20, 2011, 11:56:13 PM »

I've moved the amule directory to a dedicated SSD.

Writes are a lot faster, but it's still taking 4 seconds to write known.met, while bonnie can easily hit 100Mb/s (multithreaded) on this drive.

Logged

Stu Redman

  • Administrator
  • Hero Member
  • *****
  • Karma: 214
  • Offline Offline
  • Posts: 3739
  • Engines screaming
Re: known_files a bit fragile....
« Reply #43 on: September 21, 2011, 11:00:17 PM »

The data has to be prepared in a complicated way which takes quite some CPU. You can't compare that against the raw transfer rate.
So how did that affect the overall performance? Did the problem go away?
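
To check how much of a save is CPU-bound preparation and how much is disk I/O, the two phases can be timed separately. The Python sketch below only shows the measuring technique: the record layout is invented for the example and is not the real known.met format, and serialize and timed_save are hypothetical names.

```python
import io
import os
import struct
import tempfile
import time


def serialize(records):
    """Pack (name, size) records into bytes using a made-up layout."""
    buf = io.BytesIO()
    for name, size in records:
        raw = name.encode("utf-8")
        buf.write(struct.pack("<H", len(raw)))  # 2-byte name length
        buf.write(raw)                          # name bytes
        buf.write(struct.pack("<Q", size))      # 8-byte file size
    return buf.getvalue()


def timed_save(records, path):
    """Serialize in memory first, then write, timing each phase on its own."""
    t0 = time.perf_counter()
    blob = serialize(records)
    t1 = time.perf_counter()
    with open(path, "wb") as f:
        f.write(blob)
        f.flush()
        os.fsync(f.fileno())
    t2 = time.perf_counter()
    return {"serialize_s": t1 - t0, "write_s": t2 - t1, "bytes": len(blob)}


out_path = os.path.join(tempfile.mkdtemp(), "example.met")
result = timed_save([("a", 1), ("bc", 2)], out_path)
print(result)
```

If serialize_s dominates, a faster disk will not help much; if write_s dominates, the storage is the bottleneck.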
Logged

stoatwblr

  • Sr. Member
  • ****
  • Karma: 12
  • Offline Offline
  • Posts: 318
Re: known_files a bit fragile....
« Reply #44 on: September 23, 2011, 09:26:24 PM »


It's gone away for the moment, but returns if a normal HDD is used.

I'd imagine that as more shares are added the problem will return even with SSD.
Logged