aMule Forum


Author Topic: Download part _X_ is Corrupt -- too often  (Read 29282 times)

werner

  • Approved Newbie
  • *
  • Karma: 0
  • Offline Offline
  • Posts: 47
RE: no gdb - no change
« Reply #30 on: January 23, 2004, 05:17:19 PM »

Quote
Now trying with a more recent glibc. (2.3.2-4.80.8 )

Of course, upon realizing that I wasn't running gdb, and wasn't
watching the machine either, aMule crashed almost instantly :-(

With this glibc, the gdb setup procedure changes as follows:

Program received signal SIG32, Real-time event 32.
0x406dea35 in pthread_getconcurrency () from /lib/i686/libpthread.so.0
(gdb) ha SIG32 nostop noprint pass

- Werner
Logged

werner

  • Approved Newbie
  • *
  • Karma: 0
  • Offline Offline
  • Posts: 47
mixed result after libc upgrade
« Reply #31 on: January 23, 2004, 09:48:43 PM »

After the libc upgrade, 4.5h (16-20:30 UTC), 5.5-19 MB/h:

bad 1 good 0 ich 0 other 40
bad 1 good 1 ich 0 other 73
bad 1 good 2 ich 0 other 95
bad 1 good 3 ich 0 other 140
bad 1 good 4 ich 0 other 153
bad 1 good 5 ich 0 other 209
bad 1 good 6 ich 0 other 233
bad 1 good 7 ich 0 other 288
bad 2 good 7 ich 0 other 332

... and then aMule segfaulted. (I've changed aMule to also print
the counters when incrementing the "good" ones.)

This differs from the roughly 1:2:0:200 ratios I obtained with an older
libc. However, there's not enough data in this for me to claim that
the result is significant.
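
To compare runs in the same "bad:good:ich:other" form, the counter lines can be scaled so that bad = 1. A minimal sketch in Python, assuming every line has exactly the form `bad N good N ich N other N`; the function names are made up for illustration and are not part of aMule or of the counting patch:

```python
import re

# Matches the counter lines printed by the patched aMule in this thread.
LINE_RE = re.compile(r"bad (\d+) good (\d+) ich (\d+) other (\d+)")

def parse_counters(line):
    """Return (bad, good, ich, other) as ints, or None if the line does not match."""
    m = LINE_RE.search(line)
    return tuple(int(x) for x in m.groups()) if m else None

def normalize(counters):
    """Scale the counters so that bad == 1; returns None when bad is 0."""
    bad = counters[0]
    if bad == 0:
        return None
    return tuple(round(c / bad, 1) for c in counters)

last = parse_counters("bad 2 good 7 ich 0 other 332")
print(normalize(last))  # (1.0, 3.5, 0.0, 166.0)
```

With only two bad hashes in the run, a single extra bad chunk would change the scaled numbers a lot, which is why such a short run says little.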

Note that, if indeed a libc bug caused data corruption, some of the
corrupt data may be on disk (aMule does commit unverified data
to disk, right ?), so even if the problem is fixed now, it may take a
while until the corrupt chunks drop to near-zero.

- Werner
Logged

werner

  • Approved Newbie
  • *
  • Karma: 0
  • Offline Offline
  • Posts: 47
apparently no change
« Reply #32 on: January 24, 2004, 06:33:23 AM »

A longer run (8.5h, 21-5:30 UTC, 9-35 MB/h) with the new libc
didn't show an improvement:

bad 1 good 0 ich 0 other 67
bad 2 good 0 ich 0 other 115
bad 3 good 0 ich 0 other 130
bad 4 good 0 ich 3 other 172
bad 4 good 1 ich 3 other 198
bad 4 good 2 ich 3 other 218
bad 4 good 3 ich 4 other 291
bad 4 good 4 ich 4 other 380
bad 4 good 5 ich 4 other 431
bad 5 good 5 ich 4 other 472
bad 5 good 6 ich 4 other 566
bad 5 good 7 ich 5 other 645
bad 5 good 8 ich 6 other 701
bad 5 good 9 ich 6 other 728
bad 5 good 10 ich 6 other 794
bad 5 good 11 ich 6 other 837
bad 5 good 12 ich 6 other 922

After shutdown:

bad 5 good 13 ich 7 other 983
bad 5 good 14 ich 7 other 985
bad 5 good 15 ich 7 other 992
bad 5 good 16 ich 7 other 996
bad 6 good 16 ich 7 other 1010
bad 7 good 16 ich 7 other 1011

Next is the upgrade from wxBase/wxGTK 2.4.1 to 2.4.2.

- Werner
Logged

werner

  • Approved Newbie
  • *
  • Karma: 0
  • Offline Offline
  • Posts: 47
more data points
« Reply #33 on: January 24, 2004, 10:15:41 PM »

First run, ca. 8h, ended by X session failing (weird. not sure what's
going on):

bad 16 good 41 ich 1 other 3148

Second run, ca. 4.5h, UTC 16:30-21, ca. 20 MB/h, ended by aMule
segfaulting (see posting in "Backtraces"):

bad 3 good 5 ich 0 other 397

We seem to be back to the 1:2:0:200 ratio. Of course, if my theory
that most of the corruption is still on disk is true, I'll need a few
hundred more good or bad hashes before there should be any
significant change in the numbers. We'll see.

Both runs with the latest glibc (without fopen data corruption bug),
wx 2.4.2. Note: aMule feels a lot less responsive since upgrading
to wx 2.4.2.

- Werner
Logged

whomnet

  • Approved Newbie
  • *
  • Karma: 0
  • Offline Offline
  • Posts: 17
Re: Download part _X_ is Corrupt -- too often
« Reply #34 on: January 25, 2004, 10:53:03 AM »

werner, do you have any conclusions? I have the same problem. Today I will move the "Temp" and "Incoming" directories to a FAT32 partition to see if it makes a difference. I think it could also be some synchronization problem, but I don't know how to prove my hypothesis (I only know that, although aMule says the parts are corrupt, they aren't when checked with DonkeyDoctor).

As you can see (Kry, deltaHF) it is not an isolated problem.

"We will keep you informed" 8)
Logged

knecht666

  • Approved Newbie
  • *
  • Karma: 0
  • Offline Offline
  • Posts: 24
    • http://knecht.homelinux.net
corrupted parts
« Reply #35 on: January 25, 2004, 03:39:19 PM »

I also get about 20-30 corrupted parts in 24h, having compiled wxGTK, wxBase and aMule by hand, running on a 770 MHz AMD with SuSE 9 and a 2.4.23 kernel. The file system is ReiserFS.
It is surely not an isolated problem . . .

Thanks for making aMule, it's great

Sebastian :]
Logged
---------------------------------------
fiat veritas pereat mundus

deltaHF

  • Evil Admin
  • Former Developer
  • Hero Member
  • *****
  • Karma: 6
  • Offline Offline
  • Posts: 3920
  • .. Legends may sleep, but they never die ..
    • http://www.amule.org
Re: Download part _X_ is Corrupt -- too often
« Reply #36 on: January 25, 2004, 04:04:53 PM »

Well, I finished 7 downloads in the last 24h ..

I'm getting corrupted parts on only 1 file, which isn't finished yet.

werner

  • Approved Newbie
  • *
  • Karma: 0
  • Offline Offline
  • Posts: 47
Re: Download part _X_ is Corrupt -- too often
« Reply #37 on: January 25, 2004, 05:38:19 PM »

Quote
Originally posted by whomnet
werner, do you have any conclusions?

Unfortunately not. I now have a fairly long run (19+ h so far,
~45 MB/h, started at UTC 21). The pattern has changed, but
I'm not sure how to interpret it. The last entry:

bad 13 good 94 ich 61 other 5383

good/bad is considerably better than it used to be, but also
a lot of "ich" is happening, and I don't know how to factor this
in. Also, weekend traffic may have different properties than
workday traffic.

The good thing is that, if the corruption is also on disk, and one
of the changes I've made (libc upgrade, wx upgrade) has stopped
it, enough parts should have been hashed by tonight that future
tests should yield a different good/bad ratio (well, as long as ICH
doesn't keep on adding noise).

Regarding FAT32: this seems very odd to me. About the only thing
that is different is that FAT32 internally doesn't have holes, so a
file always occupies the space indicated by its length. Now, while
it's true that aMule is a heavy user of holes, so are a lot of other
programs, and I'd be rather surprised if there was any kernel bug in
the handling of holes. For user space (libc, application), it's almost
transparent what the kernel does with holes.
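
For readers unfamiliar with holes: a file gets one when a program seeks past the end and writes there. The sketch below (Python, Unix-like systems; the file name and helper are hypothetical, and this is not how aMule itself writes its part files) shows that the application sees the full length and reads zeros from the hole, while the space actually allocated depends on the file system:

```python
import os
import tempfile

def make_sparse_file(path, hole_bytes, payload=b"x"):
    """Seek past a hole and write a small payload at the end of the file."""
    with open(path, "wb") as f:
        f.seek(hole_bytes)   # nothing is written for the skipped range
        f.write(payload)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "part.tmp")  # hypothetical file name
    make_sparse_file(path, 10 * 1024 * 1024)
    st = os.stat(path)
    apparent = st.st_size  # length the application sees
    # st_blocks is only available on Unix-like systems; 512-byte units
    # are assumed here, which is the usual convention on Linux.
    allocated = getattr(st, "st_blocks", 0) * 512
    print("apparent:", apparent, "allocated:", allocated)
    # Reading from the hole returns zero bytes regardless of the file system.
    with open(path, "rb") as f:
        assert f.read(16) == b"\0" * 16
```

On a file system with hole support the allocated figure is typically far smaller than the apparent one; on FAT32 the two would be about the same. Either way the bytes read back are identical, which is the sense in which holes are transparent to user space.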

- Werner
Logged

werner

  • Approved Newbie
  • *
  • Karma: 0
  • Offline Offline
  • Posts: 47
24h trace
« Reply #38 on: January 25, 2004, 10:31:19 PM »

Saturday, 21 UTC to Sunday, ~21 UTC, ca. 46 MB/h, ended with a
segfault (see "Backtraces"), same system as before. Last entry:

bad 16 good 115 ich 61 other 6408

Full trace attached. What's special: lots of ICHs (a lot more than I used
to get), and a significantly better good/bad ratio (which may or may
not be significant).

The general trend seems to be for things to start poorly, and then to
improve. This could be just noise, it could be a normal property of
long-running sessions, it could be a weekend effect, or it could
mean that my files are getting more healthy.

BTW, I'd recommend in any case to upgrade libc if the current
version has the fopen bug. It's very subtle, and can cause weird
problems.

- Werner
Logged

werner

  • Approved Newbie
  • *
  • Karma: 0
  • Offline Offline
  • Posts: 47
tiny trace, nevertheless interesting
« Reply #39 on: January 26, 2004, 12:27:52 AM »

From the run that ended in a very early (1-2h) crash:

bad 0 good 1 ich 0 other 55
bad 0 good 2 ich 0 other 91
bad 0 good 3 ich 0 other 118

What's interesting here is that this is the first time since taking these
traces that I had "good" count up without any "bad" hashes. This
would support the "libc" theory.

Things that don't support it:
 - reports that independent (non-Unix :-( ) tools have verified the
   correctness of on-disk files for other people
 - reports that moving to FAT32 made the problems disappear
   instantly

Of course, there's always the possibility that we're seeing the results
of more than one problem.

- Werner
Logged

werner

  • Approved Newbie
  • *
  • Karma: 0
  • Offline Offline
  • Posts: 47
one more data point
« Reply #40 on: January 26, 2004, 06:34:54 PM »

9.25h, UTC 8-17:15, 65 MB/h, run ended with segfault:

bad 11 good 64 ich 31 other 3965

Less good than yesterday's 24h run, but apparently better than
the early ones. Lots of ICH, that may or may not skew the numbers.

A common pattern I notice is that, on each run, initially relatively
many corruptions are detected, and that things improve over time.
This is probably because of how aMule selects sources.
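
One way to put a number on "starts poorly, then improves" is to compare the share of bad hashes in an early segment of a run with the share in the rest. A rough sketch in Python, using two snapshots from the trace of this run; the helper name and the choice of split point are arbitrary assumptions made for illustration:

```python
def bad_share(start, end):
    """Share of bad hashes among all hash results between two (bad, good) snapshots."""
    bad = end[0] - start[0]
    good = end[1] - start[1]
    total = bad + good
    return bad / total if total else None

# (bad, good) snapshots taken from the trace of the 9.25h run in this post.
begin = (0, 0)
early = (5, 7)     # line "bad 5 good 7 ich 0 other 491"
final = (11, 64)   # line "bad 11 good 64 ich 31 other 3965"

print(round(bad_share(begin, early), 2))  # 0.42
print(round(bad_share(early, final), 2))  # 0.1
```

The early segment contains only 12 hash results, so the difference may still be noise; a different split point would give different figures.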

- Werner

Code:
bad 1 good 0 ich 0 other 107
bad 1 good 1 ich 0 other 156
bad 1 good 2 ich 0 other 203
bad 1 good 3 ich 0 other 235
bad 2 good 3 ich 0 other 335
bad 2 good 4 ich 0 other 378
bad 3 good 4 ich 0 other 387
bad 4 good 4 ich 0 other 405
bad 4 good 5 ich 0 other 466
bad 4 good 6 ich 0 other 471
bad 4 good 7 ich 0 other 472
bad 5 good 7 ich 0 other 491
bad 5 good 8 ich 0 other 569
bad 5 good 9 ich 0 other 710
bad 5 good 10 ich 4 other 935
bad 5 good 11 ich 4 other 1059
bad 5 good 12 ich 4 other 1108
bad 5 good 13 ich 4 other 1235
bad 5 good 14 ich 4 other 1301
bad 5 good 15 ich 4 other 1369
bad 5 good 16 ich 4 other 1524
bad 5 good 17 ich 4 other 1530
bad 5 good 18 ich 4 other 1554
bad 5 good 19 ich 4 other 1660
bad 5 good 20 ich 4 other 1699
bad 5 good 21 ich 4 other 1709
bad 5 good 22 ich 4 other 1775
bad 5 good 23 ich 4 other 1814
bad 5 good 24 ich 4 other 1815
bad 5 good 25 ich 4 other 1941
bad 6 good 25 ich 4 other 1960
bad 6 good 26 ich 4 other 2057
bad 6 good 27 ich 4 other 2059
bad 6 good 28 ich 4 other 2076
bad 6 good 29 ich 4 other 2084
bad 6 good 30 ich 5 other 2143
bad 6 good 31 ich 6 other 2151
bad 6 good 32 ich 6 other 2152
bad 6 good 33 ich 10 other 2183
bad 6 good 34 ich 15 other 2453
bad 6 good 35 ich 16 other 2481
bad 6 good 36 ich 17 other 2527
bad 6 good 37 ich 17 other 2548
bad 6 good 38 ich 17 other 2568
bad 6 good 39 ich 19 other 2637
bad 6 good 40 ich 20 other 2674
bad 6 good 41 ich 24 other 2720
bad 7 good 41 ich 25 other 2743
bad 7 good 42 ich 25 other 2789
bad 7 good 43 ich 25 other 2797
bad 7 good 44 ich 25 other 2812
bad 7 good 45 ich 25 other 2852
bad 7 good 46 ich 25 other 3032
bad 7 good 47 ich 25 other 3034
bad 7 good 48 ich 26 other 3180
bad 7 good 49 ich 26 other 3204
bad 8 good 49 ich 26 other 3329
bad 8 good 50 ich 26 other 3376
bad 8 good 51 ich 26 other 3395
bad 9 good 51 ich 26 other 3422
bad 9 good 52 ich 28 other 3517
bad 9 good 53 ich 31 other 3538
bad 9 good 54 ich 31 other 3551
bad 9 good 55 ich 31 other 3563
bad 9 good 56 ich 31 other 3577
bad 9 good 57 ich 31 other 3600
bad 10 good 57 ich 31 other 3615
bad 10 good 58 ich 31 other 3719
bad 10 good 59 ich 31 other 3827
bad 10 good 60 ich 31 other 3852
bad 11 good 60 ich 31 other 3856
bad 11 good 61 ich 31 other 3863
bad 11 good 62 ich 31 other 3923
bad 11 good 63 ich 31 other 3940
bad 11 good 64 ich 31 other 3965

Edited by BigBob so the trace can be read inline without having to download the attachment ...
Logged

whomnet

  • Approved Newbie
  • *
  • Karma: 0
  • Offline Offline
  • Posts: 17
Re: Download part _X_ is Corrupt -- too often
« Reply #41 on: January 27, 2004, 02:41:40 PM »

Using werner's data I have made plots of the evolution of the chunk counts, and have some (important?) observations:

1.- The corrupted chunks appear at a roughly constant rate; the other types don't affect this rate.
2.- If both the good chunks and the ICH-recovered chunks are counted as "total good", the ratio of total good to bad chunks fluctuates around 4:6, which is pretty unacceptable.

I am going to apply werner's patch to my aMule and recompile it to check whether my system shows the same proportions, and also to try to find some (easy) solution...

[PS: edited to attach the resulting figures in zipped PDF format]
« Last Edit: January 27, 2004, 02:43:32 PM by whomnet »
Logged

werner

  • Approved Newbie
  • *
  • Karma: 0
  • Offline Offline
  • Posts: 47
more data, again
« Reply #42 on: January 27, 2004, 04:20:47 PM »

Quote
Originally posted by whomnet
2.- If both the good chunks and the ICH-recovered chunks are counted as "total good", the ratio of total good to bad chunks fluctuates around 4:6, which is pretty unacceptable.

Seems that you have good and bad reversed. Fortunately, I normally
don't get more bad than good chunks :-)

BTW, Monday night was a good night, UTC 22-5 or 6, ~65 MB/h:

bad 0 good 1 ich 0 other 16
bad 0 good 2 ich 0 other 277
bad 0 good 3 ich 0 other 555
bad 0 good 4 ich 0 other 680
bad 0 good 5 ich 0 other 740
bad 0 good 6 ich 0 other 789
bad 1 good 6 ich 0 other 826
bad 1 good 7 ich 0 other 908
bad 2 good 7 ich 0 other 930
bad 2 good 8 ich 0 other 937
bad 2 good 9 ich 0 other 1039
bad 2 good 10 ich 0 other 1129
bad 2 good 11 ich 0 other 1158
bad 2 good 12 ich 0 other 1187
bad 2 good 13 ich 0 other 1279
bad 2 good 14 ich 0 other 1296
bad 3 good 14 ich 0 other 1392
bad 3 good 15 ich 0 other 1447
bad 3 good 16 ich 0 other 1453
bad 4 good 16 ich 0 other 1458
bad 4 good 17 ich 0 other 1495
bad 4 good 18 ich 0 other 1567
bad 4 good 19 ich 0 other 1580
bad 4 good 20 ich 0 other 1653
bad 5 good 20 ich 3 other 1827
bad 6 good 20 ich 3 other 1837
bad 6 good 21 ich 3 other 1852
bad 6 good 22 ich 3 other 1854
bad 6 good 23 ich 3 other 1960
bad 6 good 24 ich 3 other 2057
bad 6 good 25 ich 3 other 2077
bad 6 good 26 ich 3 other 2136
bad 7 good 26 ich 3 other 2193
bad 7 good 27 ich 3 other 2393
bad 7 good 28 ich 3 other 2496
bad 7 good 29 ich 3 other 2628
bad 7 good 30 ich 3 other 2811
bad 7 good 31 ich 3 other 2853
bad 8 good 31 ich 3 other 2869
bad 8 good 32 ich 3 other 2894
bad 8 good 33 ich 3 other 2923
bad 8 good 34 ich 3 other 2983
bad 8 good 35 ich 3 other 2994
bad 9 good 35 ich 3 other 3017
bad 9 good 36 ich 3 other 3018
bad 9 good 37 ich 3 other 3360
bad 9 good 38 ich 3 other 3433
bad 9 good 39 ich 3 other 3455
bad 9 good 40 ich 3 other 3514
bad 9 good 41 ich 3 other 3554
bad 9 good 42 ich 3 other 3670
bad 9 good 43 ich 3 other 3799
bad 9 good 44 ich 3 other 3850
bad 9 good 45 ich 3 other 3856
bad 9 good 46 ich 3 other 3889
bad 10 good 46 ich 3 other 3933
bad 10 good 47 ich 3 other 3977
bad 10 good 48 ich 3 other 3984
bad 10 good 49 ich 3 other 4061
bad 10 good 50 ich 3 other 4174
bad 10 good 51 ich 3 other 4318
bad 10 good 52 ich 3 other 4353
bad 10 good 53 ich 3 other 4354
bad 10 good 54 ich 3 other 4457
bad 10 good 55 ich 3 other 4463
bad 10 good 56 ich 3 other 4518
bad 10 good 57 ich 3 other 4564
bad 11 good 57 ich 3 other 4617
bad 11 good 58 ich 3 other 4650
bad 12 good 58 ich 3 other 4653
bad 12 good 59 ich 3 other 4656
bad 12 good 60 ich 3 other 4718
bad 12 good 61 ich 3 other 4721
bad 12 good 62 ich 3 other 4740
bad 13 good 62 ich 3 other 4758

And during shutdown:

bad 13 good 63 ich 3 other 4814
bad 13 good 64 ich 3 other 4820
bad 13 good 65 ich 3 other 4826
bad 13 good 66 ich 3 other 4836
bad 14 good 66 ich 3 other 4846
bad 14 good 67 ich 3 other 4849
bad 15 good 67 ich 3 other 4850
bad 15 good 68 ich 3 other 4855
bad 15 good 69 ich 3 other 4869

- Werner
Logged

whomnet

  • Approved Newbie
  • *
  • Karma: 0
  • Offline Offline
  • Posts: 17
Re: Download part _X_ is Corrupt -- too often
« Reply #43 on: January 27, 2004, 08:38:11 PM »

:] Oh, yes, I swapped the good and bad numbers :S

Hehehe, now that I've noticed that, the correct graphs are the ones attached, and the conclusions are more logical  :))

1- As before, the bad chunks follow a linear progression, as do the good chunks. However, the good chunks grow at a steeper rate than the bad ones.

2- When ICH is taken into account, it seems that after some number of blocks ICH recovers blocks more successfully (this is logical: I think most of those blocks had already been downloaded correctly, so they are easy to recover). From that point on the bad blocks matter less, but they still occur at a rate of about 10:1 (10 good, 1 bad).
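
To check a figure like 10:1 against a given run, the final counter line can be reduced to a single "total good : bad" number. A small Python sketch, assuming that ICH-recovered chunks count as good (that is the assumption made in point 2, not something the counters themselves establish); the function name is made up for illustration:

```python
def total_good_to_bad(bad, good, ich):
    """Return (good + ich) / bad, or None if no bad chunks were seen."""
    if bad == 0:
        return None
    return (good + ich) / bad

# Final counter lines of three runs posted in this thread: (bad, good, ich).
runs = {
    "24h trace": (16, 115, 61),
    "9.25h run": (11, 64, 31),
    "Monday night": (15, 69, 3),
}
for name, (bad, good, ich) in runs.items():
    print(f"{name}: {total_good_to_bad(bad, good, ich):.1f} : 1")
# 24h trace: 11.0 : 1
# 9.25h run: 8.6 : 1
# Monday night: 4.8 : 1
```

The spread between runs is large, mostly because the ICH count varies so much from one run to the next.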

I am trying a solution, and will use werner's patch to see what happens to my blocks. Let's see what numbers I get tomorrow...
Logged

werner

  • Approved Newbie
  • *
  • Karma: 0
  • Offline Offline
  • Posts: 47
Re: Download part _X_ is Corrupt -- too often
« Reply #44 on: January 27, 2004, 09:57:43 PM »

Quote
Originally posted by whomnet
2- When ICH is taken into account,

What confuses me with ICH is that there doesn't seem to be a direct
quantitative correlation between the message that ICH has done
something, and the code executing the ICH path. E.g. I once had
a few dozen (> 30, I think) ICHs counted in rapid succession, but
got only a single message.

So, quite clearly, there's something about ICH I don't understand :-(

- Werner
Logged