aMule Forum

Pages: 1 [2] 3 4 5

Topic: aMule SVN 9385 crash on 64bit Debian

GonoszTopi

  • The current man in charge of most things.
  • Administrator
  • Hero Member
  • *****
  • Karma: 169
  • Offline
  • Posts: 2685
Re: aMule SVN 9385 crash on 64bit Debian
« Reply #15 on: February 14, 2009, 11:29:49 AM »

Nice work. It's good to know that, after all, it was neither aMule nor wxWidgets but a library even deeper down...
Logged
concordia cum veritate

btkaos

  • Global Moderator
  • Sr. Member
  • *****
  • Karma: 110
  • Offline
  • Posts: 486
  • Kaos is infinite!
Re: aMule SVN 9385 crash on 64bit Debian
« Reply #16 on: February 14, 2009, 04:55:22 PM »

Indeed, it seems to be a serious bug in libxcb (mangling a 64-bit quantity). Never forget that int is 32 bits even on amd64.
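
To make the 32-bit point concrete, here is a minimal sketch in Python. It only models what happens when a 64-bit request counter is stored in a 32-bit unsigned int somewhere along the way. The names `through_32bit_int` and `is_mangled` are made up for illustration, and none of this is taken from the libxcb source.

```python
# Model of a 64-bit request counter squeezed through a 32-bit unsigned int.
# Illustrative only; not taken from libxcb or libX11.

MASK_32 = 0xFFFFFFFF  # an unsigned 32-bit int keeps only the low 32 bits


def through_32bit_int(request: int) -> int:
    """Return what survives when a 64-bit value is stored in a 32-bit int."""
    return request & MASK_32


def is_mangled(request: int) -> bool:
    """True when the 32-bit copy no longer equals the 64-bit original."""
    return through_32bit_int(request) != request


if __name__ == "__main__":
    # Values below 2**32 survive unchanged; values at or above it do not.
    for request in (500_000_000, 2**32 - 1, 2**32, 2**32 + 5):
        print(request, through_32bit_int(request), is_mangled(request))
```

Up to 2^32 - 1 both views agree, so a bug of this shape would stay hidden until the counter crosses 2^32.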

In 8 hours of uptime my aMule just went beyond dpy->request number 500,000,000, so I assume I should hit the bug in 56 hours. Anyway, I have aMule running under gdb so I can check the value of dpy->request at any time.

Unfortunately, for Ubuntu/Debian users, upgrading to libxcb 1.1.93 is not trivial due to packaging changes: either you use the experimental/jaunty packages (basically libx11, libxcb, x11proto and xcb-proto) or you try to apply the patch by hand.
Logged

Stu Redman

  • Administrator
  • Hero Member
  • *****
  • Karma: 214
  • Offline
  • Posts: 3739
  • Engines screaming
Re: aMule SVN 9385 crash on 64bit Debian
« Reply #17 on: February 14, 2009, 10:13:17 PM »

Quote
That only works when you are in the < 4Gb of RAM range of memory. I have machines with 8Gb and 16Gb of memory, so running 32bits is pointless.
Wrong. Running a 32 bit OS may be pointless with that much memory, but running a 32 bit app isn't. As long as an app doesn't need that much memory (like aMule), it's pointless running it at 64 bit. It only uses more memory that way, and therefore it also runs slower.

Quote
Anyways, this is not a 32 vs 64 bits bug, but a threading one.
Just take a look at the crash reports here, and count how many are about 64 bit versions. There's something seriously wrong with the 64 bit builds, causing strange pointer corruptions.

Quote
Well, you can do the same and install 32 bit libraries. ... It is, however, not necessary with open source applications and therefore not done.
I'd start suggesting to users with these strange crashes to start using 32 bit builds, as long as we're unable to fix the problems with the 64 bit version.
Logged
The image of mother goddess, lying dormant in the eyes of the dead, the sheaf of the corn is broken, end the harvest, throw the dead on the pyre -- Iron Maiden, Isle of Avalon

btkaos

  • Global Moderator
  • Sr. Member
  • *****
  • Karma: 110
  • Offline
  • Posts: 486
  • Kaos is infinite!
Re: aMule SVN 9385 crash on 64bit Debian
« Reply #18 on: February 15, 2009, 12:17:40 PM »

Quote
That only works when you are in the < 4Gb of RAM range of memory. I have machines with 8Gb and 16Gb of memory, so running 32bits is pointless.
Wrong. Running a 32 bit OS may be pointless with that much memory, but running a 32 bit app isn't. As long as an app doesn't need that much memory (like aMule), it's pointless running it at 64 bit. It only uses more memory that way, and therefore it also runs slower.

This is a hotly debated topic, given that on AMD64 your application has a lot of extra registers to use. long int math is cheaper as well. I'd say the consensus is that performance is on par, and sometimes better, in 64-bit. As for aMule, it seems to use about 10% more virtual memory for me, with almost the same RSS.

Quote
Anyways, this is not a 32 vs 64 bits bug, but a threading one.
Just take a look at the crash reports here, and count how many are about 64 bit versions. There's something seriously wrong with the 64 bit builds, causing strange pointer corruptions.
I saw those bugs and couldn't reproduce them. Right now aMule is working fantastically for me on my 64-bit setup (I have to wait for request number 2^32 to be really sure of that; right now it is at 1931804051). Keep in mind that if the xcb patch solves this issue, a lot of the crashes seen here are thankfully solved.
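
As a rough idea of how long that wait is, the sketch below extrapolates from the earlier observation of about 500,000,000 requests in 8 hours of uptime. It assumes the request rate stays constant, which it may well not, and `hours_until_wrap` is just an illustrative helper name.

```python
# Rough estimate of when the request counter reaches 2**32.
# Assumption: the rate stays at the earlier observation of
# 500,000,000 requests in 8 hours of uptime.

WRAP_POINT = 2**32


def hours_until_wrap(current: int, requests_seen: int, hours_seen: float) -> float:
    """Hours left until `current` reaches 2**32 at the observed average rate."""
    rate_per_hour = requests_seen / hours_seen
    return (WRAP_POINT - current) / rate_per_hour


if __name__ == "__main__":
    remaining = hours_until_wrap(1_931_804_051, 500_000_000, 8.0)
    print(f"about {remaining:.1f} hours to go")
```

With those inputs the estimate comes out at a bit under 38 hours.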

[Let me digress a bit here: the fact that aMule is making such a number of requests is highly suspicious. For instance, XDefineCursor is being called 10 times a second from toolbar code. WTF wx! I'll have a look at some profiling information, but I cannot promise anything.]
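
For a sense of scale, the figures in this thread can be turned into an average request rate. The sketch below assumes 500,000,000 requests over 8 hours of uptime (from the earlier post) and treats each XDefineCursor call as exactly one X request, which is a simplification. The function names are illustrative.

```python
# Average X request rate implied by the figures quoted in this thread.
# Assumptions: 500,000,000 requests in 8 hours of uptime, and each
# XDefineCursor call counted as exactly one request.


def requests_per_second(total_requests: int, uptime_hours: float) -> float:
    """Average number of requests per second over the whole uptime."""
    return total_requests / (uptime_hours * 3600)


def share_of_total(calls_per_second: float, total_per_second: float) -> float:
    """Fraction of the total request rate produced by one caller."""
    return calls_per_second / total_per_second


if __name__ == "__main__":
    total_rate = requests_per_second(500_000_000, 8.0)
    cursor_share = share_of_total(10.0, total_rate)
    print(f"{total_rate:.0f} requests/s, cursor share {cursor_share:.4%}")
```

Under those assumptions the average is roughly 17,000 requests per second, so the cursor calls alone would be a very small share of the total and other callers would have to account for the rest.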

I'm sorry I was mistaken about this one at first. A request number mismatch on an async X connection should logically be caused by a threading bug (the application starting a request before the previous one has ended).

The probability of the bug being in such a vital library was IMHO really small, given how long such a library has been in production. In this case the bug was in libxcb, which I didn't expect; fortunately it's fixed in newer versions.

Quote
I'd start suggesting to users with these strange crashes to start using 32 bit builds, as long as we're unable to fix the problems with the 64 bit version.

Of course a 64-bit environment is less mature than a 32-bit one (and 64-bit users should be aware of the problems they will have). But what you suggest is a bad idea for two reasons:
  • Debian-based distributions have poor multi-arch support. In any case, you end up with a lot of duplicated libraries. As wuischke said, there's no reason to duplicate libraries that are open source; if they are well written they should run in 32, 64 and 128 bits. Just think of all the 32-bit libraries needed to run aMule.
  • Bugs happening in 64 bits are bugs nevertheless. I don't know why you are unable to fix the problems. Maybe they are just not biting you in 32 bits right now. Ignoring the bug won't make it go away. Maybe in two years' time, when 16 GB setups are common, 32-bit will be the less supported arch.

I'd say quite the opposite: users who are bothered by 64 bits should use a 32-bit distribution, not run 32-bit binaries in a native 64-bit environment. If they have more than 4 GB of RAM they may use a 64-bit kernel. That was my previous setup, but then I ran into the case where some of my applications wanted to profit from 16 GB of RAM, so the upgrade was needed.

Another option is simply to use a chroot.

Of course, if your distribution has a multi-arch setup you have none of these problems.
Logged

wuischke

  • Developer
  • Hero Member
  • *****
  • Karma: 183
  • Offline
  • Posts: 4292
Re: aMule SVN 9385 crash on 64bit Debian
« Reply #19 on: February 15, 2009, 12:35:10 PM »

Quote
[Let me digress a bit here: the fact that aMule is making such a number of requests is highly suspicious. For instance, XDefineCursor is being called 10 times a second from toolbar code. WTF wx! I'll have a look at some profiling information, but I cannot promise anything.]
It would be very kind of you to have a further look at that. We could really use more users like you, thanks a lot for your work!

P.S. You got yourself another honorable mention in the Changelog. Your third now. We really owe you a lot.
Logged

Stu Redman

  • Administrator
  • Hero Member
  • *****
  • Karma: 214
  • Offline
  • Posts: 3739
  • Engines screaming
Re: aMule SVN 9385 crash on 64bit Debian
« Reply #20 on: February 15, 2009, 01:02:12 PM »

Quote
Bugs happening in 64 bits are bugs nevertheless. I don't know why you are unable to fix the problems.
Because we're no gods (except for Kry of course). We know our code well and can fix anything that's directly wrong with it. However, if a certain pointer variable gets accessed in 20 places, 3 of them are write accesses (all clean), and the backtrace shows some of its bits suddenly toggling, then we're stumped. If I then replace some of the read accesses, and suddenly the problem goes away, what should be the conclusion? That on 64 bit, pointers can be destroyed by read access ?!? See here.

With an issue like that I suspect bugs in the compiler or in the kernel, maybe registers not being restored correctly on a task switch or something. Also, pure statistics tell me that not everybody is affected by the problem. IIRC we had two true crash bugs caused by our own code in 2008. In both cases we got at least 10 independent crash reports within a few days (people never read before posting ::) ).
If you start distrusting your platform you can stop working with it right away. You never know if a problem is caused by yourself or by something you have no influence on. And with 64 bit Linux we are at this point now. The X problem is not the only 64 bit issue, you know.
Logged

wuischke

  • Developer
  • Hero Member
  • *****
  • Karma: 183
  • Offline Offline
  • Posts: 4292
Re: aMule SVN 9385 crash on 64bit Debian
« Reply #21 on: February 15, 2009, 01:39:26 PM »

Stu: I've been using 64 bit Linux over the course of 2 years and I can't confirm any kernel or compiler bugs. My user experience has literally been the same.

I'm not sure, but I think 40 bits (1 TB) is the processor limit for memory addresses. I can't tell if there's such a limitation in the Linux kernel, too.
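
The arithmetic behind the 1 TB figure, as a small Python sketch. It only converts an address width in bits into a byte count; whether 40 bits is the actual limit of a given processor or kernel is not something the code checks. `addressable_bytes` and `human_size` are illustrative names.

```python
# Convert an address width in bits into the amount of addressable memory.

UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for the given address width."""
    return 2 ** address_bits


def human_size(num_bytes: int) -> str:
    """Format an exact multiple of 1024 using binary units."""
    index = 0
    while num_bytes >= 1024 and num_bytes % 1024 == 0 and index < len(UNITS) - 1:
        num_bytes //= 1024
        index += 1
    return f"{num_bytes} {UNITS[index]}"


if __name__ == "__main__":
    for bits in (32, 40, 64):
        print(bits, "bits ->", human_size(addressable_bytes(bits)))
```

So 32 address bits cover 4 GiB and 40 bits cover 1 TiB, which matches the figures used in this thread.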

Anyway: Is this easily reproducible without requiring a lot of traffic? Do we have a native win64 version? (There's mingw64, but I've never used it.) I could test it on 64 bit Linux and BSD and on 64 bit Windows (Server 2003/Vista/Server 2008), if you want me to.
Logged

Stu Redman

  • Administrator
  • Hero Member
  • *****
  • Karma: 214
  • Offline
  • Posts: 3739
  • Engines screaming
Re: aMule SVN 9385 crash on 64bit Debian
« Reply #22 on: February 15, 2009, 04:12:17 PM »

Well, for some people it appears to be reproducible easily enough (see the permanent crash reports), but not for all, as I tried to point out by arguing statistically. I don't know if there's a common pattern here (distro, compiler). Ubuntu sticks out, but that may simply be because many people are using it. (I tried to reproduce the "Ubuntu broken cryptopp" issue one day, btw, and couldn't, so I don't know what that's about either.)

I have never tried to compile aMule for win64 since I have no win64 at hand to run it on (or compile it on). And I don't see much of a point in it. It's not a 64 bit issue really, it's just an issue appearing only on 64bit Linux.

What about your distro-agnostic builds? Could we tell people with problems to run them instead and see if the problem still appears there?
Logged

wuischke

  • Developer
  • Hero Member
  • *****
  • Karma: 183
  • Offline
  • Posts: 4292
Re: aMule SVN 9385 crash on 64bit Debian
« Reply #23 on: February 15, 2009, 04:49:29 PM »

Crypto++ uses assembler and seems to be sensitive to the CPU used, although I haven't checked the assembler source code to verify that. (I only partly know x86 assembler.) Overly aggressive compiler optimization seems to break some things, too.

Well, we know that it only happens on 64 bit. The symptom is the 40th bit of the tray icon pointer being set to one. It seems to be related to receiving a message. The number of cores doesn't seem to be important.
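
One way to pin down a symptom like this is to XOR the expected pointer value with the corrupted one and list the bits that differ. A minimal sketch follows. The pointer values are invented for the example, bits are counted from 0 (so whether this matches the "40th bit" above depends on the counting convention), and `flipped_bits` is a hypothetical helper.

```python
# List which bits differ between an expected and an observed pointer value.
# The pointer values used here are invented for illustration.

from typing import List


def flipped_bits(expected: int, observed: int) -> List[int]:
    """Return the zero-based positions of the bits that differ."""
    diff = expected ^ observed
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


if __name__ == "__main__":
    good_pointer = 0x0000000001A2B3C0        # invented example value
    bad_pointer = good_pointer | (1 << 40)   # same value with bit 40 set
    print(hex(bad_pointer), flipped_bits(good_pointer, bad_pointer))
```

A single entry in the result would point at one stray bit rather than a wholesale overwrite of the pointer.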

I'll try the message sending the day after tomorrow (if I don't forget) and see if I can reproduce it. If so, the fun part involving various compilers, standard libraries and kernels can begin...

Do you have other suggestions for testing?

Edit: Re: distro-agnostic: They are only 32 bit currently and I seem to have misplaced my compiling environment. But I'll start the download again.
« Last Edit: February 15, 2009, 04:58:38 PM by wuischke »
Logged

wires

  • Jr. Member
  • **
  • Karma: 6
  • Offline
  • Posts: 83
Re: aMule SVN 9385 crash on 64bit Debian
« Reply #24 on: February 16, 2009, 08:50:38 AM »

I volunteer to test patches. I run 64-bit Fedora 9 on a single-core AMD64 CPU. Since Stu added that boolean state variable the original tray window crash has disappeared, but I've seen some other fishy pointers on the forum, so it seems to me that the problem was "relocated".
Logged

Stu Redman

  • Administrator
  • Hero Member
  • *****
  • Karma: 214
  • Offline
  • Posts: 3739
  • Engines screaming
Re: aMule SVN 9385 crash on 64bit Debian
« Reply #25 on: February 17, 2009, 12:05:02 AM »

That is the other foul apple that keeps turning up. It's a corrupted list, and nobody knows how it gets corrupted.
Logged

btkaos

  • Global Moderator
  • Sr. Member
  • *****
  • Karma: 110
  • Offline
  • Posts: 486
  • Kaos is infinite!
Re: aMule SVN 9385 crash on 64bit Debian
« Reply #26 on: February 17, 2009, 02:33:50 AM »

Quote
That is the other foul apple that keeps turning up. It's a corrupted list, and nobody knows how it gets corrupted.
Sorry Stu, this bug is in libxcb, already fixed in its latest release.
Logged

wuischke

  • Developer
  • Hero Member
  • *****
  • Karma: 183
  • Offline
  • Posts: 4292
Re: aMule SVN 9385 crash on 64bit Debian
« Reply #27 on: February 18, 2009, 03:21:48 PM »

I could not reproduce any crashes with aMule 2.2.3 on a 64 bit C2D running Arch64. (libxcb 1.1.93, crypto++ 5.5.2, wxGtk 2.8.9, on Xfce to have a panel.)

(Edit: I tried sending messages and changing the upload/download speed from the tray icon while uploading and downloading over the course of about 4 hours.)
« Last Edit: February 18, 2009, 03:23:26 PM by wuischke »
Logged

Stu Redman

  • Administrator
  • Hero Member
  • *****
  • Karma: 214
  • Offline
  • Posts: 3739
  • Engines screaming
Re: aMule SVN 9385 crash on 64bit Debian
« Reply #28 on: February 18, 2009, 09:32:33 PM »

Quote
Sorry Stu, this bug is in libxcb, already fixed in its latest release.
I linked the wrong thread - I meant that one  :-[.
Logged

wires

  • Jr. Member
  • **
  • Karma: 6
  • Offline
  • Posts: 83
Re: aMule SVN 9385 crash on 64bit Debian
« Reply #29 on: February 18, 2009, 11:05:17 PM »

Quote
I could not reproduce any crashes with aMule 2.2.3 on a 64 bit C2D running Arch64. (libxcb 1.1.93, crypto++ 5.5.2, wxGtk 2.8.9, on Xfce to have a panel.)

(Edit: I tried sending messages and changing the upload/download speed from the tray icon while uploading and downloading over the course of about 4 hours.)

You should use an older version like 2.2.2 (anything prior to Stu's patch). The moment it receives a message it crashes. I don't have a debug install of this version right now, but I got this backtrace just as I was writing this, by sending a chat message from aMule SVN 9437 to 2.2.2:
Code:
--------------------------------------------------------------------------------
A fatal error has occurred and aMule has crashed.
Please assist us in fixing this problem by posting the backtrace below in our
'aMule Crashes' forum and include as much information as possible regarding the
circumstances of this crash. The forum is located here:
    http://forum.amule.org/index.php?board=67.0
If possible, please try to generate a real backtrace of this crash:
    http://www.amule.org/wiki/index.php/Backtraces

----------------------------=| BACKTRACE FOLLOWS: |=----------------------------
Current version is: aMule 2.2.2 using wxGTK2 v2.8.9
Running on: Linux 2.6.27.12-78.2.8.fc9.x86_64 x86_64

[2] wxString::~wxString() in amule [0x44658f]
[3] wxFatalSignalHandler in /usr/lib64/libwx_baseu-2.8.so.0[0x3dceafa01c]
[4] ?? in /lib64/libpthread.so.0 [0x35fa40ed30]
[5] wxColour::wxColour(unsigned char, unsigned char, unsigned char, unsigned char) in amule [0x56d110]
[6] wxDataObjectSimple::~wxDataObjectSimple() in amule [0x518428]
[7] wxDataObjectSimple::~wxDataObjectSimple() in amule [0x518afe]
[8] wxEvtHandler::ProcessEventIfMatches(wxEventTableEntryBase const&, wxEvtHandler*, wxEvent&) in /usr/lib64/libwx_baseu-2.8.so.0[0x3dceaf5989]
[9] wxEventHashTable::HandleEvent(wxEvent&, wxEvtHandler*) in /usr/lib64/libwx_baseu-2.8.so.0[0x3dceaf6b64]
[10] wxEvtHandler::ProcessEvent(wxEvent&) in /usr/lib64/libwx_baseu-2.8.so.0[0x3dceaf6c57]
[11] wxTimerBase::Notify() in /usr/lib64/libwx_gtk2u_core-2.8.so.0[0x3fe82e8da6]
[12] ?? in /usr/lib64/libwx_gtk2u_core-2.8.so.0 [0x3fe81eedcb]
[13] ?? in /lib64/libglib-2.0.so.0 [0x375f437beb]
[14] g_main_context_dispatch in /lib64/libglib-2.0.so.0[0x375f43742b]
[15] ?? in /lib64/libglib-2.0.so.0 [0x375f43ac0d]
[16] g_main_loop_run in /lib64/libglib-2.0.so.0[0x375f43b13d]
[17] gtk_main in /usr/lib64/libgtk-x11-2.0.so.0[0x3fe7983db0]
[18] wxEventLoop::Run() in /usr/lib64/libwx_gtk2u_core-2.8.so.0[0x3fe81e6718]
[19] wxAppBase::MainLoop() in /usr/lib64/libwx_gtk2u_core-2.8.so.0[0x3fe826fa6b]
[20] wxEntry(int&, wchar_t**) in /usr/lib64/libwx_baseu-2.8.so.0[0x3dcea99b9d]
[21] std::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) in amule [0x5136f2]
[22] __libc_start_main in /lib64/libc.so.6[0x35f981e32a]
[23] ?? in amule [0x445599]

Let me know if I can help, OK?
Logged