aMule Forum

English => Backtraces => Topic started by: Corwin on January 21, 2009, 06:22:35 PM

Title: aMule SVN 9385 crash on 64bit Debian
Post by: Corwin on January 21, 2009, 06:22:35 PM
amule: ../../src/xcb_lock.c:33: _XCBUnlockDisplay: Assertion `xcb_get_request_sent(dpy->xcb->connection) == dpy->request' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7f6ec73e86f0 (LWP 17870)]
0x00007f6ec4572ed5 in raise () from /lib/libc.so.6
(gdb) bt
#0  0x00007f6ec4572ed5 in raise () from /lib/libc.so.6
#1  0x00007f6ec45743f3 in abort () from /lib/libc.so.6
#2  0x00007f6ec456bdc9 in __assert_fail () from /lib/libc.so.6
#3  0x00007f6ec19c13c7 in ?? () from /usr/lib/libX11.so.6
#4  0x00007f6ec19971de in XDefineCursor () from /usr/lib/libX11.so.6
#5  0x00007f6ec3b2ead0 in gdk_window_set_cursor () from /usr/lib/libgdk-x11-2.0.so.0
#6  0x00007f6ec57dc119 in wxWindow::OnInternalIdle () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#7  0x00007f6ec5852c2d in wxAppBase::SendIdleEvents () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#8  0x00007f6ec5852c64 in wxAppBase::SendIdleEvents () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#9  0x00007f6ec5852ec4 in wxAppBase::ProcessIdle () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#10 0x00007f6ec57b33b6 in ?? () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#11 0x00007f6ec297078b in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#12 0x00007f6ec2973f5d in ?? () from /usr/lib/libglib-2.0.so.0
#13 0x00007f6ec297448d in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#14 0x00007f6ec3eb9737 in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0
#15 0x00007f6ec57ca798 in wxEventLoop::Run () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#16 0x00007f6ec5852cfb in wxAppBase::MainLoop () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#17 0x00007f6ec50d6cbd in wxEntry () from /usr/lib/libwx_baseu-2.8.so.0
#18 0x000000000068a5f3 in main (argc=1, argv=0x7fffcf52f9d8) at amule-gui.cpp:94
(gdb) bt full
#0  0x00007f6ec4572ed5 in raise () from /lib/libc.so.6
No symbol table info available.
#1  0x00007f6ec45743f3 in abort () from /lib/libc.so.6
No symbol table info available.
#2  0x00007f6ec456bdc9 in __assert_fail () from /lib/libc.so.6
No symbol table info available.
#3  0x00007f6ec19c13c7 in ?? () from /usr/lib/libX11.so.6
No symbol table info available.
#4  0x00007f6ec19971de in XDefineCursor () from /usr/lib/libX11.so.6
No symbol table info available.
#5  0x00007f6ec3b2ead0 in gdk_window_set_cursor () from /usr/lib/libgdk-x11-2.0.so.0
No symbol table info available.
#6  0x00007f6ec57dc119 in wxWindow::OnInternalIdle () from /usr/lib/libwx_gtk2u_core-2.8.so.0
No symbol table info available.
#7  0x00007f6ec5852c2d in wxAppBase::SendIdleEvents () from /usr/lib/libwx_gtk2u_core-2.8.so.0
No symbol table info available.
#8  0x00007f6ec5852c64 in wxAppBase::SendIdleEvents () from /usr/lib/libwx_gtk2u_core-2.8.so.0
No symbol table info available.
#9  0x00007f6ec5852ec4 in wxAppBase::ProcessIdle () from /usr/lib/libwx_gtk2u_core-2.8.so.0
No symbol table info available.
#10 0x00007f6ec57b33b6 in ?? () from /usr/lib/libwx_gtk2u_core-2.8.so.0
No symbol table info available.
#11 0x00007f6ec297078b in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
No symbol table info available.
#12 0x00007f6ec2973f5d in ?? () from /usr/lib/libglib-2.0.so.0
No symbol table info available.
#13 0x00007f6ec297448d in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
No symbol table info available.
#14 0x00007f6ec3eb9737 in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0
No symbol table info available.
#15 0x00007f6ec57ca798 in wxEventLoop::Run () from /usr/lib/libwx_gtk2u_core-2.8.so.0
No symbol table info available.
#16 0x00007f6ec5852cfb in wxAppBase::MainLoop () from /usr/lib/libwx_gtk2u_core-2.8.so.0
No symbol table info available.
#17 0x00007f6ec50d6cbd in wxEntry () from /usr/lib/libwx_baseu-2.8.so.0
No symbol table info available.
#18 0x000000000068a5f3 in main (argc=1, argv=0x7fffcf52f9d8) at amule-gui.cpp:94
No locals.
(gdb) thread apply all bt

Thread 4 (Thread 0x4323d950 (LWP 17885)):
#0  0x00007f6ec7102fad in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x00007f6ec512dde9 in wxConditionInternal::WaitTimeout () from /usr/lib/libwx_baseu-2.8.so.0
#2  0x00007f6ec512f172 in wxSemaphoreInternal::WaitTimeout () from /usr/lib/libwx_baseu-2.8.so.0
#3  0x00000000007d8ed1 in CTimerThread::Entry (this=0x205d5f0) at Timer.cpp:64
#4  0x00007f6ec512f35a in wxThreadInternal::PthreadStart () from /usr/lib/libwx_baseu-2.8.so.0
#5  0x00007f6ec70fefc7 in start_thread () from /lib/libpthread.so.0
#6  0x00007f6ec46105ad in clone () from /lib/libc.so.6
#7  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x4223b950 (LWP 17878)):
#0  0x00007f6ec71060e1 in nanosleep () from /lib/libpthread.so.0
#1  0x00007f6ec5134e6c in wxMicroSleep () from /usr/lib/libwx_baseu-2.8.so.0
#2  0x00000000005d70de in UploadBandwidthThrottler::Entry (this=0x2950400) at UploadBandwidthThrottler.cpp:320
#3  0x00007f6ec512f35a in wxThreadInternal::PthreadStart () from /usr/lib/libwx_baseu-2.8.so.0
#4  0x00007f6ec70fefc7 in start_thread () from /lib/libpthread.so.0
#5  0x00007f6ec46105ad in clone () from /lib/libc.so.6
#6  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f6ec73e86f0 (LWP 17870)):
#0  0x00007f6ec4572ed5 in raise () from /lib/libc.so.6
#1  0x00007f6ec45743f3 in abort () from /lib/libc.so.6
#2  0x00007f6ec456bdc9 in __assert_fail () from /lib/libc.so.6
#3  0x00007f6ec19c13c7 in ?? () from /usr/lib/libX11.so.6
#4  0x00007f6ec19971de in XDefineCursor () from /usr/lib/libX11.so.6
#5  0x00007f6ec3b2ead0 in gdk_window_set_cursor () from /usr/lib/libgdk-x11-2.0.so.0
#6  0x00007f6ec57dc119 in wxWindow::OnInternalIdle () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#7  0x00007f6ec5852c2d in wxAppBase::SendIdleEvents () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#8  0x00007f6ec5852c64 in wxAppBase::SendIdleEvents () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#9  0x00007f6ec5852ec4 in wxAppBase::ProcessIdle () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#10 0x00007f6ec57b33b6 in ?? () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#11 0x00007f6ec297078b in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#12 0x00007f6ec2973f5d in ?? () from /usr/lib/libglib-2.0.so.0
#13 0x00007f6ec297448d in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#14 0x00007f6ec3eb9737 in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0
#15 0x00007f6ec57ca798 in wxEventLoop::Run () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#16 0x00007f6ec5852cfb in wxAppBase::MainLoop () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#17 0x00007f6ec50d6cbd in wxEntry () from /usr/lib/libwx_baseu-2.8.so.0
#18 0x000000000068a5f3 in main (argc=1, argv=0x7fffcf52f9d8) at amule-gui.cpp:94
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wuischke on January 21, 2009, 07:26:48 PM
Again the xcb_lock.c:33 and the cursor...

Please do me a favour and report this bug to the Debian developers as well. We've got the same crash for Ubuntu (ref (http://www.amule.org/amule/index.php?topic=16458.0)) and I'm beginning to suspect this might somehow be related to the distribution's gtk2 or xcb packages. (When you google this error, you'll find a couple of reports in the Ubuntu bug tracker.)

Would you be so kind as to include the link to the bug tracker entry?
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: Corwin on January 24, 2009, 12:30:41 PM
It probably has more to do with the user base: Ubuntu has more users than any other Linux distro.

The first Google result I pulled up was Red Hat:
https://bugzilla.redhat.com/show_bug.cgi?id=478689

Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: btkaos on February 11, 2009, 05:03:23 PM
I recently moved to 64 bits and hit the bug.

Ummm, it seems WXGTK may not be taking some necessary precautions.

Could you try the attached patch? I'm testing it, but as the crash happens after about 3 days of uptime, it will take a while to see how it works.


Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: Stu Redman on February 11, 2009, 08:16:28 PM
I don't think we call GUI functions from threads. At least I hope so...
There's something fishy like that in the part file importer iirc, but ordinary usage should be fine. So I don't expect it helps.

Quote from: http://tronche.com/gui/x/xlib/display/XInitThreads.html
It is only necessary to call this function if multiple threads might use Xlib concurrently.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: btkaos on February 12, 2009, 12:55:50 AM
Stu, note that the crash happens in kinda "automatic" code (like changing the cursor). What is more, it crashes after several hours without user input. This smells like a threading bug to me.

By the way, similar crashes have been fixed that way. My backtrace gives 99% possibility for this to be an application (or WX) bug.

Of course the patch is not ready to go as is into aMule, but if it works it should give us a good idea on what is going on.

As a complete newbie to the aMule code, I'm sorry I didn't have time to study the threading setup and supply a more in-depth analysis.

As the patch is completely harmless, I'd suggest that everyone who sees the crash test it.

I've been unable to get more than 3 days of uptime since my move to 64 bits, so in 4 days we'll have a good idea of how effective the patch is for me.

[I don't want to troll, but I find the gtk+wx combination to be quite a buggy setting (two bugs for me in a month), at least compared with something rock-solid like Qt. The code quality of gtk+wx is not optimal either]
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: Stu Redman on February 12, 2009, 09:03:12 PM
My backtrace gives 99% possibility for this to be an application (or WX) bug.
Which backtrace ? And how did you calculate that probability?


[I don't want to troll, but I find the gtk+wx combination to be quite a buggy setting (two bugs for me in a month), at least compared with something rock-solid like Qt. The code quality of gtk+wx is not optimal either]
That's not trolling.
I'd guess once you ported the whole thing to QT you'd find it has bugs/problems/limitations too.
Anyway, we're stuck with wx. a Mule without wx wouldn't be aMule, but an entirely different project.

BtW: I don't know much about 64 bit Linux, except that it sucks and always makes strange pointer corruptions and crashes.  >:(
(That's a better attempt at trolling.  ;))
With 64 bit Windows you still can run 32 bit apps. What about Linux ? Does 32 bit aMule run on 64 bit Linux ? It doesn't take 4Gb of Memory you know. If it runs I'd just drop the 64 bit builds.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wuischke on February 12, 2009, 09:50:17 PM
Microsoft (or Apple) use duplicate libraries (both in 32 bit/64 bit or intel/ppc) for backwards compatibility with older software.

Well, you can do the same and install 32 bit libraries. If you install all necessary libraries in 32 bit, you can use a 32 bit aMule on a 64 bit Linux. Some distributions do this to get skype, zattoo or other closed-source, 32 bit-only software to work. It is, however, not necessary with open source applications and therefore not done.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: btkaos on February 12, 2009, 10:13:42 PM
Oh, I'm sorry STU, I was quite careless in my debugging session and lost the backtrace. I'm really sorry about my newbie error of not saving the backtrace before proceeding to actually debug the application. That is why I didn't post my backtrace.

However, the backtrace itself showed that Xlib was being called from two different threads. The code paths in aMule, GTK+ and WX were all fine. Every variable was right; the bug lay in the use of Xlib.

I must admit that by "99% probability" I meant "it is really likely the bug is a multithreading bug". Look: no user action, just some race condition and bang!

aMule is not the only app affected. For instance, see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=513829 (in particular, note how the bug is not distro-specific, as Fedora is also affected, and how the XInitThreads() call must be placed *before* gtk_init()).

I said 99 percent because with the superficial debugging I did there's no way to prove it. However, I'm at 28 hours of uptime and aMule is now rock-solid with the patch.

Regards, Billkaos
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: btkaos on February 12, 2009, 10:16:11 PM
By the way, use of gdk_threads_init() is not needed, as aMule is indeed using GTK+ correctly, only from one thread. If two or more threads were to interact with GTK+, this call would have to be added.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: btkaos on February 13, 2009, 01:08:22 AM
BtW: I don't know much about 64 bit Linux, except that it sucks and always makes strange pointer corruptions and crashes.  >:(
(That's a better attempt at trolling.  ;))
With 64 bit Windows you still can run 32 bit apps. What about Linux ? Does 32 bit aMule run on 64 bit Linux ? It doesn't take 4Gb of Memory you know. If it runs I'd just drop the 64 bit builds.
That only works when you are in the < 4Gb of RAM range of memory. I have machines with 8Gb and 16Gb of memory, so running 32bits is pointless.

Anyways, this is not a 32 vs 64 bits bug, but a threading one.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: GonoszTopi on February 13, 2009, 12:51:47 PM
However, the backtrace itself showed that Xlib was being called from two different threads. The code paths in aMule, GTK+ and WX were all fine. Every variable was right; the bug lay in the use of Xlib.

If you ever can reproduce the bug, please think of us and post a backtrace. I'm really interested in it (maybe we can do something against it, or it might reveal other bugs).

Ummm, it seems WXGTK may not be taking some necessary precautions.

I'd say the patch should better be sent to the wxGTK developers (http://trac.wxwidgets.org/wiki). I mean it'd be better to patch wxGTK.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: btkaos on February 14, 2009, 03:42:48 AM
If you ever can reproduce the bug, please think of us and post a backtrace. I'm really interested in it (maybe we can do something against it, or it might reveal other bugs).
It seems I miscalculated the probability and I'm still clueless about what is going on.

I got:
Code: [Select]
Thread 1 (Thread 0x7f00414a7780 (LWP 2455)):
#0  0x00007f003e92d015 in raise () from /lib/libc.so.6
#1  0x00007f003e92eb83 in abort () from /lib/libc.so.6
#2  0x00007f003e925d89 in __assert_fail () from /lib/libc.so.6
#3  0x00007f003e63b867 in _XCBUnlockDisplay (dpy=0x2c66a00) at ../../src/xcb_lock.c:33
#4  0x00007f003e61124e in XDefineCursor (dpy=0x2c66a00, w=65012340, cursor=65011719)
    at ../../src/DefCursor.c:47
#5  0x00007f003dbcc178 in gdk_window_x11_set_cursor (window=0x5753dc0, cursor=0x4ff5640)
    at /build/buildd/gtk+2.0-2.14.4/gdk/x11/gdkwindow-x11.c:2912
#6  0x00007f003fe2acf5 in wxToolBar::OnInternalIdle () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#7  0x00007f003fe09413 in wxFrame::OnInternalIdle () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#8  0x00007f003fe35e7d in wxAppBase::SendIdleEvents () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#9  0x00007f003fe36114 in wxAppBase::ProcessIdle () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#10 0x00007f003fd956b4 in ?? () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#11 0x00007f003c0a2d3b in IA__g_main_context_dispatch (context=0x2c43950)
    at /build/buildd/glib2.0-2.18.2/glib/gmain.c:2144
#12 0x00007f003c0a650d in g_main_context_iterate (context=0x2c43950, block=1, dispatch=1,
    self=<value optimized out>) at /build/buildd/glib2.0-2.18.2/glib/gmain.c:2778
#13 0x00007f003c0a6a3d in IA__g_main_loop_run (loop=0x2cd3020)
    at /build/buildd/glib2.0-2.18.2/glib/gmain.c:2986
#14 0x00007f003df33727 in IA__gtk_main () at /build/buildd/gtk+2.0-2.14.4/gtk/gtkmain.c:1200
#15 0x00007f003fdacd18 in wxEventLoop::Run () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#16 0x00007f003fe35f4b in wxAppBase::MainLoop () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#17 0x00007f003f6d073d in wxEntry () from /usr/lib/libwx_baseu-2.8.so.0
#18 0x0000000000524a52 in main ()

We can inspect frame #3, and print the values involved in the assertion:
Code: [Select]
(gdb) print dpy->xcb->connection
$24 = (xcb_connection_t *) 0x2c67440
(gdb) print dpy->request
$25 = 4294967297
$25 looks suspicious, as it is 2^32 + 1. It seems to be an off-by-one bug. However, I'm no expert in 64-bit debugging, so I may be missing something.

What is really curious is frame #6, which calls gdk_window_set_cursor() from an idle handler:
Code: [Select]
void wxToolBar::OnInternalIdle()
{
    // Check if we have to show window now
    if (GtkShowFromOnIdle()) return;
   
    wxCursor cursor = m_cursor;
    if (g_globalCursor.Ok()) cursor = g_globalCursor;

    if (cursor.Ok())
    {
        /* I now set the cursor the anew in every OnInternalIdle call
           as setting the cursor in a parent window also effects the
           windows above so that checking for the current cursor is
           not possible. */

        if (HasFlag(wxTB_DOCKABLE) && (m_widget->window))
        {
            /* if the toolbar is dockable, then m_widget stands for the
               GtkHandleBox widget, which uses its own window so that we
               can set the cursor for it. if the toolbar is not dockable,
               m_widget comes from m_toolbar which uses its parent's
               window ("windowless windows") and thus we cannot set the
               cursor. */
            gdk_window_set_cursor( m_widget->window, cursor.GetCursor() );
        }
    // BOOM!!

Quoting http://library.gnome.org/devel/gdk/stable/gdk-Threads.html

Quote
Idles, timeouts, and input functions from GLib, such as g_idle_add(), are executed outside of the main GTK+ lock. So, if you need to call GTK+ inside of such a callback, you must surround the callback with a gdk_threads_enter()/gdk_threads_leave() pair or use gdk_threads_add_idle_full() which does this for you. However, event dispatching from the mainloop is still executed within the main GTK+ lock, so callback functions connected to event signals like GtkWidget::button-press-event, do not need thread protection.

I wonder if it is related.
Title: Full backtrace, not really useful
Post by: btkaos on February 14, 2009, 03:46:32 AM
Code: [Select]
(gdb) thread apply all bt full

Thread 4 (Thread 0x40945950 (LWP 2467)):
#0  0x00007f00410b055d in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
No symbol table info available.
#1  0x00007f003f728699 in wxConditionInternal::WaitTimeout () from /usr/lib/libwx_baseu-2.8.so.0
No symbol table info available.
#2  0x00007f003f729a22 in wxSemaphoreInternal::WaitTimeout () from /usr/lib/libwx_baseu-2.8.so.0
No symbol table info available.
#3  0x0000000000604ed8 in CTimerThread::Entry ()
No locals.
#4  0x00007f003f729c0a in wxThreadInternal::PthreadStart () from /usr/lib/libwx_baseu-2.8.so.0
No symbol table info available.
#5  0x00007f00410ac3ea in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#6  0x00007f003e9e0cbd in clone () from /lib/libc.so.6
No symbol table info available.
#7  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 2 (Thread 0x41df6950 (LWP 2465)):
#0  0x00007f00410b3851 in nanosleep () from /lib/libpthread.so.0
No symbol table info available.
#1  0x00007f003f72f8bc in wxMicroSleep () from /usr/lib/libwx_baseu-2.8.so.0
No symbol table info available.
#2  0x00000000004f5032 in UploadBandwidthThrottler::Entry ()
No locals.
#3  0x00007f003f729c0a in wxThreadInternal::PthreadStart () from /usr/lib/libwx_baseu-2.8.so.0
No symbol table info available.
#4  0x00007f00410ac3ea in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#5  0x00007f003e9e0cbd in clone () from /lib/libc.so.6
No symbol table info available.
#6  0x0000000000000000 in ?? ()
No symbol table info available.

Thread 1 (Thread 0x7f00414a7780 (LWP 2455)):
#0  0x00007f003e92d015 in raise () from /lib/libc.so.6
No symbol table info available.
#1  0x00007f003e92eb83 in abort () from /lib/libc.so.6
No symbol table info available.
#2  0x00007f003e925d89 in __assert_fail () from /lib/libc.so.6
No symbol table info available.
#3  0x00007f003e63b867 in _XCBUnlockDisplay (dpy=0x2c66a00) at ../../src/xcb_lock.c:33
__PRETTY_FUNCTION__ = "_XCBUnlockDisplay"
#4  0x00007f003e61124e in XDefineCursor (dpy=0x2c66a00, w=65012340, cursor=65011719)
    at ../../src/DefCursor.c:47
No locals.
#5  0x00007f003dbcc178 in gdk_window_x11_set_cursor (window=0x5753dc0, cursor=0x4ff5640)
    at /build/buildd/gtk+2.0-2.14.4/gdk/x11/gdkwindow-x11.c:2912
impl = (GdkWindowImplX11 *) 0x5753e60
xcursor = 6
#6  0x00007f003fe2acf5 in wxToolBar::OnInternalIdle () from /usr/lib/libwx_gtk2u_core-2.8.so.0
No symbol table info available.
#7  0x00007f003fe09413 in wxFrame::OnInternalIdle () from /usr/lib/libwx_gtk2u_core-2.8.so.0
No symbol table info available.
#8  0x00007f003fe35e7d in wxAppBase::SendIdleEvents () from /usr/lib/libwx_gtk2u_core-2.8.so.0
No symbol table info available.
#9  0x00007f003fe36114 in wxAppBase::ProcessIdle () from /usr/lib/libwx_gtk2u_core-2.8.so.0
No symbol table info available.
#10 0x00007f003fd956b4 in ?? () from /usr/lib/libwx_gtk2u_core-2.8.so.0
No symbol table info available.
#11 0x00007f003c0a2d3b in IA__g_main_context_dispatch (context=0x2c43950)
    at /build/buildd/glib2.0-2.18.2/glib/gmain.c:2144
No locals.
#12 0x00007f003c0a650d in g_main_context_iterate (context=0x2c43950, block=1, dispatch=1,
    self=<value optimized out>) at /build/buildd/glib2.0-2.18.2/glib/gmain.c:2778
max_priority = 300
timeout = 0
some_ready = 1
nfds = 200
allocated_nfds = <value optimized out>
fds = (GPollFD *) 0xbe8a5e0
__PRETTY_FUNCTION__ = "g_main_context_iterate"
#13 0x00007f003c0a6a3d in IA__g_main_loop_run (loop=0x2cd3020)
    at /build/buildd/glib2.0-2.18.2/glib/gmain.c:2986
self = (GThread *) 0x2c44cc0
__PRETTY_FUNCTION__ = "IA__g_main_loop_run"
#14 0x00007f003df33727 in IA__gtk_main () at /build/buildd/gtk+2.0-2.14.4/gtk/gtkmain.c:1200
tmp_list = (GList *) 0x0
functions = (GList *) 0x0
init = (GtkInitFunction *) 0x5a28600
loop = <value optimized out>
#15 0x00007f003fdacd18 in wxEventLoop::Run () from /usr/lib/libwx_gtk2u_core-2.8.so.0
No symbol table info available.
#16 0x00007f003fe35f4b in wxAppBase::MainLoop () from /usr/lib/libwx_gtk2u_core-2.8.so.0
No symbol table info available.
#17 0x00007f003f6d073d in wxEntry () from /usr/lib/libwx_baseu-2.8.so.0
No symbol table info available.
#18 0x0000000000524a52 in main ()
No locals.
(gdb)
[I need to fix wx debugging information]
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: btkaos on February 14, 2009, 04:14:34 AM
I made a mistake.

dpy->request is the sequence number, and I didn't print it right in gdb:

Quote
(gdb) print xcb_get_request_sent(dpy->xcb->connection)
$32 = 1
(gdb) print dpy->request
$33 = 4294967297

See? The display sequence number is 2^32+1 (4294967297), but xcb mangled it to 32 bits, so an overflow occurred, which in fact triggers the bug.

So the bug seems to be in xcb, and this commit may fix it:

http://cgit.freedesktop.org/xcb/libxcb/commit/?id=baff35a04b0e8d21821850a405a550d86a8aeb6f

I'm testing the fixed libxcb right now. As it is an overflow, it explains the long time needed to reach it: it happens just when amule reaches sequence number 4294967296 (wow!). It also explains why other applications were hitting the bug earlier: they used up sequence numbers more quickly, i.e. they are more X11 intensive.
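
To make the wraparound concrete, here is a minimal standalone sketch of the same comparison. It is not libxcb's code: `truncate_to_32`, `counters_agree` and `require_agreement` are hypothetical names, and the fixed-width types are an assumption standing in for Xlib's 64-bit `dpy->request` and the 32-bit value xcb reported.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for storing a 64-bit sequence number in a
// 32-bit unsigned integer: the upper 32 bits are simply discarded.
constexpr std::uint32_t truncate_to_32(std::uint64_t request) {
    return static_cast<std::uint32_t>(request);
}

// Same shape as the failed assertion: compare the truncated value
// with the full 64-bit counter.
constexpr bool counters_agree(std::uint64_t xlib_request) {
    return truncate_to_32(xlib_request) == xlib_request;
}

// Runtime version of the check; aborts when the two values disagree.
inline void require_agreement(std::uint64_t xlib_request) {
    assert(counters_agree(xlib_request));
}

// Values taken from the gdb session above.
static_assert(truncate_to_32(4294967297ULL) == 1, "2^32 + 1 truncates to 1");
static_assert(!counters_agree(4294967297ULL), "comparison fails after the wrap");
static_assert(counters_agree(4294967295ULL), "comparison still holds below 2^32");
```

Under these assumptions the check holds for every request number below 2^32 and fails from 2^32 onwards, which matches a crash that only shows up after days of uptime.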
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: GonoszTopi on February 14, 2009, 11:29:49 AM
Nice work. It's good to know that after all it wasn't aMule nor wxWidgets, but a library even deeper...
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: btkaos on February 14, 2009, 04:55:22 PM
Indeed it seems like a serious bug in libxcb (mangling a 64-bit quantity). Never forget int is 32 bits even on amd64.

In 8 hours of uptime my amule just went beyond dpy->request number 500,000,000, so I assume I should hit the bug in 56 hours. Anyway, I have amule running under gdb so I can check the value of dpy->request at any time.
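
The estimate above is simple rate arithmetic. As a rough sketch, assuming the counter keeps growing at a constant rate (`hours_until_wrap` is a hypothetical helper, not anything from aMule or Xlib):

```cpp
#include <cstdint>

// Hypothetical helper: hours left until a request counter reaches 2^32,
// assuming it keeps growing at the average rate observed so far.
// Both arguments must be greater than zero.
inline double hours_until_wrap(std::uint64_t current_count, double elapsed_hours) {
    const double wrap_point = 4294967296.0;  // 2^32
    const double rate_per_hour = static_cast<double>(current_count) / elapsed_hours;
    return (wrap_point - static_cast<double>(current_count)) / rate_per_hour;
}
```

With exactly 500,000,000 requests after 8 hours, `hours_until_wrap(500000000, 8.0)` comes out a little under 61 hours. That is the same ballpark as the figure quoted above, given that the real counter was already somewhat past 500,000,000.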

Unfortunately for Ubuntu/Debian users, upgrading to libxcb 1.1.93 is not trivial due to packaging changes: either you use experimental/jaunty packages (basically libx11, libxcb, x11proto and xcb-proto) or you try to apply the patch by hand.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: Stu Redman on February 14, 2009, 10:13:17 PM
That only works when you are in the < 4Gb of RAM range of memory. I have machines with 8Gb and 16Gb of memory, so running 32bits is pointless.
Wrong. Running a 32 bit OS may be pointless with that much memory, but running a 32 bit app isn't. As long as an app doesn't need that much memory (like aMule), it's pointless running it at 64 bit. It only uses more memory that way, and therefore it also runs slower.

Anyways, this is not a 32 vs 64 bits bug, but a threading one.
Just take a look at the crash reports here, and count how many are about 64 bit versions. There's something seriously wrong with the 64 bit builds, causing strange pointer corruptions.

Well, you can do the same and install 32 bit libraries. ... It is, however, not necessary with open source applications and therefore not done.
I'd start suggesting to users with these strange crashes to start using 32 bit builds, as long as we're unable to fix the problems with the 64 bit version.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: btkaos on February 15, 2009, 12:17:40 PM
That only works when you are in the < 4Gb of RAM range of memory. I have machines with 8Gb and 16Gb of memory, so running 32bits is pointless.
Wrong. Running a 32 bit OS may be pointless with that much memory, but running a 32 bit app isn't. As long as an app doesn't need that much memory (like aMule), it's pointless running it at 64 bit. It only uses more memory that way, and therefore it also runs slower.

This is a hotly debated topic, given that in AMD64 your application has a lot of extra registers to use. long int math is cheaper as well. I'd say the consensus is that the performance is even and sometimes better in 64bit. Regarding aMule, it seems to use about 10% more virtual memory for me, with almost the same RSS.

Anyways, this is not a 32 vs 64 bits bug, but a threading one.
Just take a look at the crash reports here, and count how many are about 64 bit versions. There's something seriously wrong with the 64 bit builds, causing strange pointer corruptions.
I saw those bugs and I couldn't reproduce them. Right now aMule is working fantastically for me in my 64bit setup (I have to wait for request number 2^32 to be really sure of that; right now it's at 1931804051). Keep in mind that if the xcb patch solves this issue, a lot of the crashes seen here are thankfully solved.

[Let me digress here a bit: the fact that aMule is making such a number of requests is highly suspicious; for instance, XDefineCursor is being called 10 times a second from toolbar code. WTF wx! I'll have a look at some profiling information, but I cannot promise anything]

I'm sorry I was mistaken at first with this one. A mismatch of request numbers in an async X connection should logically be caused by a threading bug (the application starting a request before the previous one ended).

The probability of the bug being in such a vital library was IMHO really small, given the time such a library has been in production. In this case the bug was in libxcb, I didn't expect that, and fortunately it's fixed in new versions.
I'd start suggesting to users with these strange crashes to start using 32 bit builds, as long as we're unable to fix the problems with the 64 bit version.

Of course a 64-bit environment is less mature than a 32-bit one (and 64-bit users should be aware of the problems they will have). But what you suggest is a bad idea for two reasons:

I'd say quite the opposite: users who are bothered by 64 bits should use a 32-bit distribution, not run 32-bit binaries in a native 64-bit environment. If they have >4Gb of RAM they may use a 64-bit kernel. That was my previous setup, but then I ran into the case where some of my applications wanted to profit from 16Gb of RAM, so the upgrade was needed.

Another option is to just use a chroot.

Of course, if your distribution has multi-arch setup you have none of these problems.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wuischke on February 15, 2009, 12:35:10 PM
Quote
[Let me digress here a bit: the fact that aMule is making such a number of requests is highly suspicious; for instance, XDefineCursor is being called 10 times a second from toolbar code. WTF wx! I'll have a look at some profiling information, but I cannot promise anything]
It would be very kind of you to have a further look at that. We could really use more users like you, thanks a lot for your work!

P.S. You got yourself another honorable mention in the Changelog. Your third now. We really owe you a lot.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: Stu Redman on February 15, 2009, 01:02:12 PM
Bugs happening in 64 bits are bugs nevertheless. I don't know why are you unable to fix the problems.
Because we're no gods (except for Kry of course). We know our code well and can fix anything that's directly wrong with it. However, if a certain pointer variable gets accessed in 20 places, 3 of them are write accesses (all clean), and backtrace shows some bits of it suddenly toggle, then we're stumped. If I then replace some of the read accesses, and suddenly the problem goes away, what should be the conclusion? That on 64 bit pointers can be destroyed by read access ?!? See here (http://www.amule.org/amule/index.php?topic=16214.0).

I'm suspecting bugs in the compiler or in the kernel at an issue like that, maybe registers not being restored correctly on task change or something. Also, pure statistics tells me that not everybody is involved in the problem. Iirc we had two true crash bugs caused by ourselves in our code in 2008. In both cases we got at least 10 independent crash reports in a few days (people never read before posting  ::) ) .
If you start distrusting your platform you can stop working with it right away. You never know if a problem is caused by yourself or by something you have no influence on. And with 64 bit Linux we are at this point now. The X problem is not the only 64 bit issue you know.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wuischke on February 15, 2009, 01:39:26 PM
Stu: I've been using 64 bit Linux over the course of 2 years and I can't confirm any kernel or compiler bugs. My user experience has literally been the same.

I'm not sure, but I think 40bit (1TB) is the processor limit for memory addresses. I can't tell if there's such a limitation in the linux kernel, too.

Anyway: Is this easily reproducible without requiring a lot of traffic? Do we have a native win64 version? (There's mingw64, but I've never used it.) I could test it on 64 bit Linux and BSD and on 64 bit Windows (Server 2003/Vista/Server 2008), if you want me to.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: Stu Redman on February 15, 2009, 04:12:17 PM
Well, for some people it appears to be reproducible easily enough (see the permanent crash reports), but not for all, as I tried to point out arguing statistically. I don't know if there's a common pattern here (distro, compiler). Ubuntu sticks out, but that may simply be because many people are using it. (I tried to reproduce the "Ubuntu broken cryptopp" issue one day btw and couldn't, so I don't know what that's about either.)

I have never tried to compile aMule for win64 since I have no win64 at hand to run it on (or compile it on). And I don't see much of a point in it. It's not a 64 bit issue really, it's just an issue appearing only on 64bit Linux.

What about your distro agnostic builds? Could we tell people with problems to run them instead and see if the problem appears there still?
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wuischke on February 15, 2009, 04:49:29 PM
Crypto++ uses assembler and seems to be sensitive to the cpu used, although I haven't checked the assembler source code to verify that. (I know x86 assembler only partly.) Exaggerated compiler optimization seems to break some things, too.

Well, we know that it only happens on 64 bit. The symptom is the 40th bit of the tray icon pointer set to one. It seems to be related to receiving a message. The number of cores doesn't seem to be important.

I'll try the message sending the day after tomorrow (If I don't forget to do so.) and see if I can reproduce it. If yes, the fun part involving various compilers, standard libraries and kernels can begin...

Do you have other suggestions for testing?

Edit: Re: distro-agnostic: They are only 32 bit currently and I seem to have misplaced my compiling environment. But I'll start the download again.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wires on February 16, 2009, 08:50:38 AM
I offer myself for testing patches. I use 64 bit Fedora 9 on a single core AMD64 CPU. Since Stu added that boolean state variable the original tray window crash has disappeared, but I've seen some other fishy pointers on the forum, so it seems to me that the problem was "relocated".
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: Stu Redman on February 17, 2009, 12:05:02 AM
That (http://www.amule.org/amule/index.php?topic=16497.0) is the other foul apple that keeps turning up. It's a corrupted list, and nobody knows how it gets corrupted.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: btkaos on February 17, 2009, 02:33:50 AM
Quote from: Stu Redman on February 17, 2009, 12:05:02 AM
That (http://www.amule.org/amule/index.php?topic=16497.0) is the other foul apple that keeps turning up. It's a corrupted list, and nobody knows how it gets corrupted.
Sorry STU, this bug is in libxcb, already fixed in its latest release.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wuischke on February 18, 2009, 03:21:48 PM
I could not reproduce any crashes with aMule 2.2.3 on a 64 bit C2D running Arch64. (libxcb 1.1.93, crypto++ 5.5.2, wxGtk 2.8.9 on a XFCE to have a panel.)

(Edit: I tried sending messages, changing upload/download speed from tray icon while up-and downloading over the course of about 4 hours)
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: Stu Redman on February 18, 2009, 09:32:33 PM
Quote from: btkaos on February 17, 2009, 02:33:50 AM
Sorry STU, this bug is in libxcb, already fixed in its latest release.
I linked the wrong thread - I meant that one (http://www.amule.org/amule/index.php?topic=16596.0)  :-[.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wires on February 18, 2009, 11:05:17 PM
Quote from: wuischke on February 18, 2009, 03:21:48 PM
I could not reproduce any crashes with aMule 2.2.3 on a 64 bit C2D running Arch64. (libxcb 1.1.93, crypto++ 5.5.2, wxGtk 2.8.9 on a XFCE to have a panel.)

(Edit: I tried sending messages, changing upload/download speed from tray icon while up-and downloading over the course of about 4 hours)

You should use an older version like 2.2.2 (any prior to Stu's patch). The moment it receives a message it crashes. I don't have a debug install for this version right now, but I got this backtrace just now by sending a chat message from aMule SVN 9437 to 2.2.2:
Code: [Select]
--------------------------------------------------------------------------------
A fatal error has occurred and aMule has crashed.
Please assist us in fixing this problem by posting the backtrace below in our
'aMule Crashes' forum and include as much information as possible regarding the
circumstances of this crash. The forum is located here:
    http://forum.amule.org/index.php?board=67.0
If possible, please try to generate a real backtrace of this crash:
    http://www.amule.org/wiki/index.php/Backtraces

----------------------------=| BACKTRACE FOLLOWS: |=----------------------------
Current version is: aMule 2.2.2 using wxGTK2 v2.8.9
Running on: Linux 2.6.27.12-78.2.8.fc9.x86_64 x86_64

[2] wxString::~wxString() in amule [0x44658f]
[3] wxFatalSignalHandler in /usr/lib64/libwx_baseu-2.8.so.0[0x3dceafa01c]
[4] ?? in /lib64/libpthread.so.0 [0x35fa40ed30]
[5] wxColour::wxColour(unsigned char, unsigned char, unsigned char, unsigned char) in amule [0x56d110]
[6] wxDataObjectSimple::~wxDataObjectSimple() in amule [0x518428]
[7] wxDataObjectSimple::~wxDataObjectSimple() in amule [0x518afe]
[8] wxEvtHandler::ProcessEventIfMatches(wxEventTableEntryBase const&, wxEvtHandler*, wxEvent&) in /usr/lib64/libwx_baseu-2.8.so.0[0x3dceaf5989]
[9] wxEventHashTable::HandleEvent(wxEvent&, wxEvtHandler*) in /usr/lib64/libwx_baseu-2.8.so.0[0x3dceaf6b64]
[10] wxEvtHandler::ProcessEvent(wxEvent&) in /usr/lib64/libwx_baseu-2.8.so.0[0x3dceaf6c57]
[11] wxTimerBase::Notify() in /usr/lib64/libwx_gtk2u_core-2.8.so.0[0x3fe82e8da6]
[12] ?? in /usr/lib64/libwx_gtk2u_core-2.8.so.0 [0x3fe81eedcb]
[13] ?? in /lib64/libglib-2.0.so.0 [0x375f437beb]
[14] g_main_context_dispatch in /lib64/libglib-2.0.so.0[0x375f43742b]
[15] ?? in /lib64/libglib-2.0.so.0 [0x375f43ac0d]
[16] g_main_loop_run in /lib64/libglib-2.0.so.0[0x375f43b13d]
[17] gtk_main in /usr/lib64/libgtk-x11-2.0.so.0[0x3fe7983db0]
[18] wxEventLoop::Run() in /usr/lib64/libwx_gtk2u_core-2.8.so.0[0x3fe81e6718]
[19] wxAppBase::MainLoop() in /usr/lib64/libwx_gtk2u_core-2.8.so.0[0x3fe826fa6b]
[20] wxEntry(int&, wchar_t**) in /usr/lib64/libwx_baseu-2.8.so.0[0x3dcea99b9d]
[21] std::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) in amule [0x5136f2]
[22] __libc_start_main in /lib64/libc.so.6[0x35f981e32a]
[23] ?? in amule [0x445599]

Let me know if I can help ok?
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wuischke on February 19, 2009, 09:33:19 AM
Ah, thank you. It should be easily reproducible then, I hope.

You might have a look at http://www.amule.org/wiki/index.php/Backtraces to create more extensive backtraces for us. This will make it easier to understand what's happening.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wires on February 20, 2009, 12:29:42 AM
There are some backtraces here: http://www.amule.org/amule/index.php?topic=16214.0 and you can find a new one attached

Taken from aMule 2.2.2 compiled with --with-wxdebug

Regards
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wires on February 20, 2009, 01:34:54 AM
Ok I've got new data. I've included 2 uint64 fences (same size as void*) around m_wndTaskbarNotifier in amuleDlg.h (aMule 2.2.2)

Code: [Select]
m_fence1( 0L ),
m_wndTaskbarNotifier(NULL),
m_fence2 ( 0L ),

and then added a check on them in ShowTransferRate
Code: [Select]
        if (m_fence1 || m_fence2) {
                AddLogLine(true , CFormat( _("BROKEN FENCE: %x,%x" )) % m_fence1 % m_fence2);
        }
        wxASSERT((m_wndTaskbarNotifier != NULL) == thePrefs::UseTrayIcon());
        if (m_wndTaskbarNotifier) {
                // set trayicon-icon
                int percentDown = (int)ceil((kBpsDown*100) / thePrefs::GetMaxGraphDownloadRate());
                UpdateTrayIcon( ( percentDown > 100 ) ? 100 : percentDown);

                wxString buffer2;
                if ( theApp->IsConnected() ) {
                        buffer2 = CFormat(_("aMule (%s | Connected)")) % buffer;
                } else {
                        buffer2 = CFormat(_("aMule (%s | Disconnected)")) % buffer;
                }
                m_wndTaskbarNotifier->SetTrayToolTip(buffer2);
        }

I was really surprised as I was unable to repeat the crash for a while. Finally I selected the transfers window and sent another message, and this is the result:
Code: [Select]
2009-02-20 01:12:07: BaseClient.cpp(2140): ED2k Client: 'http://emule-project.net' has passed the secure identification, V2 State: 0
2009-02-20 01:12:08: ClientTCPSocket.cpp(803): New message from 'wires' (IP:192.168.1.33)
2009-02-20 01:12:08: BROKEN FENCE: 0,10000000000

m_fence2 has that nasty 0x10000000000, so is it possible that CamuleDlg::SetActiveDialog is causing the crash?
grep SetActiveDialog *cpp
amuleDlg.cpp:   SetActiveDialog(DT_TRANSFER_WND, m_transferwnd);
amuleDlg.cpp:void CamuleDlg::SetActiveDialog(DialogType type, wxWindow* dlg)
amuleDlg.cpp:               SetActiveDialog(DT_NETWORKS_WND, m_serverwnd);
amuleDlg.cpp:               SetActiveDialog(DT_SEARCH_WND, m_searchwnd);
amuleDlg.cpp:               SetActiveDialog(DT_TRANSFER_WND, m_transferwnd);
amuleDlg.cpp:               SetActiveDialog(DT_SHARED_WND, m_sharedfileswnd);
amuleDlg.cpp:               SetActiveDialog(DT_CHAT_WND, m_chatwnd);
amuleDlg.cpp:               SetActiveDialog(DT_STATS_WND, m_statisticswnd);
ChatWnd.cpp:         theApp->amuledlg->SetActiveDialog(CamuleDlg::DT_CHAT_WND, this);

ChatWnd has the only invocation of this function outside amuleDlg.

Regards
Title: Re: aMule SVN 9385 crash on 64bit Debian A
Post by: btkaos on February 20, 2009, 03:04:44 AM
Be careful, it seems there are more 64 bit related bugs in libxcb/libx11. See the following gem in _XSend (xcb_io.c):

Code: [Select]
        if(dpy->xcb->event_owner != XlibOwnsEventQueue || dpy->async_handlers)
        {
                unsigned int sequence;
                for(sequence = dpy->xcb->last_flushed; sequence < dpy->request; ++sequence)
                {

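Here we go again. dpy->request is declared as long (8 bytes on amd64) but sequence is an unsigned int, which is 32 bits. So once dpy->request is larger than what fits in 32 bits, sequence wraps around before it can reach dpy->request and the loop never stops.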

This one causes aMule to eat all the available memory! See the funny backtrace (aMule interrupted while it's busy)
Code: [Select]
#0  0x00007fcd3d650f22 in _int_malloc () from /lib/libc.so.6
#1  0x00007fcd3d652658 in malloc () from /lib/libc.so.6
#2  0x00007fcd3d316d3a in _XSend (dpy=0x1f67c00, data=0x0, size=0) at ../../src/xcb_io.c:306
#3  0x00007fcd3d316f81 in _XReply (dpy=0x0, rep=0x7fff481b4990, extra=0, discard=0)
    at ../../src/xcb_io.c:450
#4  0x00007fcd3d2f4246 in XGetWindowProperty (dpy=0x1f67c00, w=80782988, property=254, offset=0,
    length=9223372036854775807, delete=0, req_type=4, actual_type=0x7fff481b4a98,
    actual_format=0x7fff481b4aa0, nitems=0x7fff481b4a90, bytesafter=0x7fff481b4a88, prop=0x7fff481b4a80)
    at ../../src/GetProp.c:64
#5  0x00007fcd3c89088d in gdk_event_translate (display=0x1f77060, event=0x5565550, xevent=0x7fff481b4d30,
    return_exposes=0) at /build/buildd/gtk+2.0-2.14.4/gdk/x11/gdkevents-x11.c:533
#6  0x00007fcd3c890c47 in _gdk_events_queue (display=0x1f77060)
    at /build/buildd/gtk+2.0-2.14.4/gdk/x11/gdkevents-x11.c:2299
#7  0x00007fcd3c89106e in gdk_event_dispatch (source=<value optimized out>,
    callback=0x7fcd3d941a80 <main_arena+128>, user_data=0x0)
    at /build/buildd/gtk+2.0-2.14.4/gdk/x11/gdkevents-x11.c:2359
#8  0x00007fcd3ad7dd3b in IA__g_main_context_dispatch (context=0x1f44950)
    at /build/buildd/glib2.0-2.18.2/glib/gmain.c:2144
#9  0x00007fcd3ad8150d in g_main_context_iterate (context=0x1f44950, block=1, dispatch=1,
    self=<value optimized out>) at /build/buildd/glib2.0-2.18.2/glib/gmain.c:2778
#10 0x00007fcd3ad81a3d in IA__g_main_loop_run (loop=0x82a5040)
    at /build/buildd/glib2.0-2.18.2/glib/gmain.c:2986
#11 0x00007fcd3cc0e727 in IA__gtk_main () at /build/buildd/gtk+2.0-2.14.4/gtk/gtkmain.c:1200
#12 0x00007fcd3ea86d18 in wxEventLoop::Run () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#13 0x00007fcd3eadbe15 in wxDialog::ShowModal () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#14 0x00000000005627eb in CDownloadListCtrl::OnViewFileInfo ()
#15 0x00007fcd3e404ca9 in wxEvtHandler::ProcessEventIfMatches () from /usr/lib/libwx_baseu-2.8.so.0
#16 0x00007fcd3e405e84 in wxEventHashTable::HandleEvent () from /usr/lib/libwx_baseu-2.8.so.0
#17 0x00007fcd3e405f77 in wxEvtHandler::ProcessEvent () from /usr/lib/libwx_baseu-2.8.so.0
#18 0x00007fcd3eb8e639 in wxWindowBase::TryParent () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#19 0x00007fcd3e405f00 in wxEvtHandler::ProcessEvent () from /usr/lib/libwx_baseu-2.8.so.0
#20 0x00007fcd3ebbc2e5 in wxScrollHelperEvtHandler::ProcessEvent () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#21 0x00007fcd3eb68640 in wxMenuBase::SendEvent () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#22 0x00007fcd3eaf6623 in ?? () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#23 0x00007fcd3b62825d in IA__g_closure_invoke (closure=0x7b11e70, return_value=0x0, n_param_values=1,
    param_values=0xb14d0a0, invocation_hint=0x7fff481b57d0)
    at /build/buildd/glib2.0-2.18.2/gobject/gclosure.c:767
#24 0x00007fcd3b63df5d in signal_emit_unlocked_R (node=0x444d710, detail=0, instance=0xa9b4970,
    emission_return=0x0, instance_and_params=0xb14d0a0)
    at /build/buildd/glib2.0-2.18.2/gobject/gsignal.c:3244
#25 0x00007fcd3b63f608 in IA__g_signal_emit_valist (instance=0xa9b4970, signal_id=<value optimized out>,
    detail=0, var_args=0x7fff481b59b0) at /build/buildd/glib2.0-2.18.2/gobject/gsignal.c:2977
#26 0x00007fcd3b63fb33 in IA__g_signal_emit (instance=0x0, signal_id=1033116288, detail=0)
    at /build/buildd/glib2.0-2.18.2/gobject/gsignal.c:3034
#27 0x00007fcd3cd1cfab in IA__gtk_widget_activate (widget=0xa9b4970)
    at /build/buildd/gtk+2.0-2.14.4/gtk/gtkwidget.c:4776
#28 0x00007fcd3cc2196d in IA__gtk_menu_shell_activate_item (menu_shell=0x7c5fa80, menu_item=0xa9b4970,
    force_deactivate=<value optimized out>) at /build/buildd/gtk+2.0-2.14.4/gtk/gtkmenushell.c:1139
#29 0x00007fcd3cc233b5 in gtk_menu_shell_button_release (widget=0x7c5fa80, event=0x5565400)
    at /build/buildd/gtk+2.0-2.14.4/gtk/gtkmenushell.c:678
#30 0x00007fcd3cc14888 in _gtk_marshal_BOOLEAN__BOXED (closure=0x43f52a0, return_value=0x7fff481b5ce0,
    n_param_values=<value optimized out>, param_values=0x74810a0, invocation_hint=<value optimized out>,
    marshal_data=0x7fcd3cc1a590) at /build/buildd/gtk+2.0-2.14.4/gtk/gtkmarshalers.c:84
#31 0x00007fcd3b62825d in IA__g_closure_invoke (closure=0x43f52a0, return_value=0x7fff481b5ce0,
    n_param_values=2, param_values=0x74810a0, invocation_hint=0x7fff481b5ca0)
    at /build/buildd/glib2.0-2.18.2/gobject/gclosure.c:767
#32 0x00007fcd3b63dc3b in signal_emit_unlocked_R (node=0x43f5310, detail=0, instance=0x7c5fa80,
    emission_return=0x7fff481b5e20, instance_and_params=0x74810a0)
    at /build/buildd/glib2.0-2.18.2/gobject/gsignal.c:3282
#33 0x00007fcd3b63f48a in IA__g_signal_emit_valist (instance=0x7c5fa80, signal_id=<value optimized out>,
    detail=0, var_args=0x7fff481b5e80) at /build/buildd/glib2.0-2.18.2/gobject/gsignal.c:2987
#34 0x00007fcd3b63fb33 in IA__g_signal_emit (instance=0x0, signal_id=1033116288, detail=0)
    at /build/buildd/glib2.0-2.18.2/gobject/gsignal.c:3034
#35 0x00007fcd3cd176be in gtk_widget_event_internal (widget=0x7c5fa80, event=0x5565400)
    at /build/buildd/gtk+2.0-2.14.4/gtk/gtkwidget.c:4745
#36 0x00007fcd3cc0d1f3 in IA__gtk_propagate_event (widget=0x7c5fa80, event=0x5565400)
    at /build/buildd/gtk+2.0-2.14.4/gtk/gtkmain.c:2391
#37 0x00007fcd3cc0e313 in IA__gtk_main_do_event (event=0x5565400)
    at /build/buildd/gtk+2.0-2.14.4/gtk/gtkmain.c:1596
#38 0x00007fcd3c89109c in gdk_event_dispatch (source=<value optimized out>, callback=<value optimized out>,
    user_data=<value optimized out>) at /build/buildd/gtk+2.0-2.14.4/gdk/x11/gdkevents-x11.c:2365
#39 0x00007fcd3ad7dd3b in IA__g_main_context_dispatch (context=0x1f44950)
    at /build/buildd/glib2.0-2.18.2/glib/gmain.c:2144
#40 0x00007fcd3ad8150d in g_main_context_iterate (context=0x1f44950, block=1, dispatch=1,
    self=<value optimized out>) at /build/buildd/glib2.0-2.18.2/glib/gmain.c:2778
#41 0x00007fcd3ad816cb in IA__g_main_context_iteration (context=0x1f44950, may_block=1)
    at /build/buildd/glib2.0-2.18.2/glib/gmain.c:2841
#42 0x00007fcd3cc0e5d1 in IA__gtk_main_iteration () at /build/buildd/gtk+2.0-2.14.4/gtk/gtkmain.c:1288
#43 0x00007fcd3eaf52fd in wxWindow::DoPopupMenu () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#44 0x000000000056d023 in CDownloadListCtrl::OnMouseRightClick ()
#45 0x00007fcd3e404ca9 in wxEvtHandler::ProcessEventIfMatches () from /usr/lib/libwx_baseu-2.8.so.0
#46 0x00007fcd3e405e84 in wxEventHashTable::HandleEvent () from /usr/lib/libwx_baseu-2.8.so.0
#47 0x00007fcd3e405f77 in wxEvtHandler::ProcessEvent () from /usr/lib/libwx_baseu-2.8.so.0
#48 0x0000000000637192 in MuleExtern::wxListMainWindow::SendNotify ()
#49 0x000000000063a883 in MuleExtern::wxListMainWindow::OnMouse ()
#50 0x00007fcd3e404ca9 in wxEvtHandler::ProcessEventIfMatches () from /usr/lib/libwx_baseu-2.8.so.0
#51 0x00007fcd3e405e84 in wxEventHashTable::HandleEvent () from /usr/lib/libwx_baseu-2.8.so.0
#52 0x00007fcd3e405f77 in wxEvtHandler::ProcessEvent () from /usr/lib/libwx_baseu-2.8.so.0
#53 0x00007fcd3e405f00 in wxEvtHandler::ProcessEvent () from /usr/lib/libwx_baseu-2.8.so.0
#54 0x00007fcd3ebbc2e5 in wxScrollHelperEvtHandler::ProcessEvent () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#55 0x00007fcd3ea99d1f in ?? () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#56 0x00007fcd3cc14888 in _gtk_marshal_BOOLEAN__BOXED (closure=0x4a62190, return_value=0x7fff481b6fe0,
    n_param_values=<value optimized out>, param_values=0x4931c90, invocation_hint=<value optimized out>,
    marshal_data=0x7fcd3ea99ba0) at /build/buildd/gtk+2.0-2.14.4/gtk/gtkmarshalers.c:84
#57 0x00007fcd3b62825d in IA__g_closure_invoke (closure=0x4a62190, return_value=0x7fff481b6fe0,
    n_param_values=2, param_values=0x4931c90, invocation_hint=0x7fff481b6fa0)
    at /build/buildd/glib2.0-2.18.2/gobject/gclosure.c:767
#58 0x00007fcd3b63df5d in signal_emit_unlocked_R (node=0x43f4e80, detail=0, instance=0x49b64c0,
    emission_return=0x7fff481b7120, instance_and_params=0x4931c90)
    at /build/buildd/glib2.0-2.18.2/gobject/gsignal.c:3244
#59 0x00007fcd3b63f48a in IA__g_signal_emit_valist (instance=0x49b64c0, signal_id=<value optimized out>,
    detail=0, var_args=0x7fff481b7180) at /build/buildd/glib2.0-2.18.2/gobject/gsignal.c:2987
#60 0x00007fcd3b63fb33 in IA__g_signal_emit (instance=0x0, signal_id=1033116288, detail=0)
    at /build/buildd/glib2.0-2.18.2/gobject/gsignal.c:3034
#61 0x00007fcd3cd176be in gtk_widget_event_internal (widget=0x49b64c0, event=0x554c410)
    at /build/buildd/gtk+2.0-2.14.4/gtk/gtkwidget.c:4745
#62 0x00007fcd3cc0d1f3 in IA__gtk_propagate_event (widget=0x49b64c0, event=0x554c410)
    at /build/buildd/gtk+2.0-2.14.4/gtk/gtkmain.c:2391
#63 0x00007fcd3cc0e313 in IA__gtk_main_do_event (event=0x554c410)
    at /build/buildd/gtk+2.0-2.14.4/gtk/gtkmain.c:1596
#64 0x00007fcd3c89109c in gdk_event_dispatch (source=<value optimized out>, callback=<value optimized out>,
    user_data=<value optimized out>) at /build/buildd/gtk+2.0-2.14.4/gdk/x11/gdkevents-x11.c:2365
#65 0x00007fcd3ad7dd3b in IA__g_main_context_dispatch (context=0x1f44950)
    at /build/buildd/glib2.0-2.18.2/glib/gmain.c:2144
#66 0x00007fcd3ad8150d in g_main_context_iterate (context=0x1f44950, block=1, dispatch=1,
    self=<value optimized out>) at /build/buildd/glib2.0-2.18.2/glib/gmain.c:2778
#67 0x00007fcd3ad81a3d in IA__g_main_loop_run (loop=0x73b53a0)
    at /build/buildd/glib2.0-2.18.2/glib/gmain.c:2986
#68 0x00007fcd3cc0e727 in IA__gtk_main () at /build/buildd/gtk+2.0-2.14.4/gtk/gtkmain.c:1200
#69 0x00007fcd3ea86d18 in wxEventLoop::Run () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#70 0x00007fcd3eb0ff4b in wxAppBase::MainLoop () from /usr/lib/libwx_gtk2u_core-2.8.so.0
#71 0x00007fcd3e3aa73d in wxEntry () from /usr/lib/libwx_baseu-2.8.so.0
#72 0x0000000000524a52 in main ()

aMule is using some memory:
Code: [Select]
USER       PID     %CPU %MEM    VSZ            RSS            TTY      STAT START   TIME COMMAND
btkaos    21059 23.7      44.3    7847760 6739124 pts/1 Tl   Feb18 699:06 /usr/local/bin/amule

[That is more than 7 GiB of virtual memory]

As far as I know it is not fixed yet, see http://cgit.freedesktop.org/xorg/lib/libX11/tree/src/xcb_io.c#n276

It seems getting > 4 days uptime on AMD64 is challenging at the moment. I've patched libx11, let's see what pops up next.

I couldn't imagine X developers were so careless about this kind of issue, all the more so as GCC warns about this specific error. It seems aMule being so X intensive has a good effect :)
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: Stu Redman on February 21, 2009, 05:05:30 PM
Quote from: wires on February 20, 2009, 01:34:54 AM
Ok I've got new data.
This is very interesting.
Could you repeat the experiment and this time initialize the fences with 0xaaaaaaaaaaaaaaaa ? I'd like to see which part exactly gets overwritten.
I'm thinking about different alignment being used in different modules, but SetActiveDialog is also defined in amuleDlg.cpp so that should not be a possible explanation.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wires on February 23, 2009, 12:17:33 AM
Hi Stu, here you are

Code: [Select]
2009-02-22 23:58:04: BROKEN FENCE: aaaaaaaaaaaaaaaa,aaaaaaaaaaaaaaaa
2009-02-22 23:58:09: ClientTCPSocket.cpp(179): ED2k Client: Accepted connection from 192.168.1.33
2009-02-22 23:58:09: BaseClient.cpp(985): Local Client Protocol: Local Client: OP_HELLOANSWER to 192.168.1.33
2009-02-22 23:58:09: BaseClient.cpp(799): Local Client Protocol: Local Client: OP_EMULEINFO/OS_INFO to 192.168.1.33
2009-02-22 23:58:09: BaseClient.cpp(2173): Local Client Protocol: Local Client: OP_SECIDENTSTATE to 192.168.1.33
2009-02-22 23:58:09: BaseClient.cpp(2005): Local Client Protocol: Local Client: OP_PUBLICKEY to 192.168.1.33
2009-02-22 23:58:09: ClientTCPSocket.cpp(803): New message from 'wires' (IP:192.168.1.33)
2009-02-22 23:58:09: BaseClient.cpp(2067): Local Client Protocol: Local Client: OP_SIGNATURE to 192.168.1.33
2009-02-22 23:58:09: BaseClient.cpp(2140): ED2k Client: 'wires' has passed the secure identification, V2 State: 0
2009-02-22 23:58:09: BROKEN FENCE: aaaaaaaaaaaaaaaa,aaaa01aaaaaaaaaa
2009-02-22 23:58:15: BROKEN FENCE: aaaaaaaaaaaaaaaa,aaaa01aaaaaaaaaa
2009-02-22 23:58:20: BROKEN FENCE: aaaaaaaaaaaaaaaa,aaaa01aaaaaaaaaa
2009-02-22 23:58:25: BROKEN FENCE: aaaaaaaaaaaaaaaa,aaaa01aaaaaaaaaa

Just 1 byte gets overwritten... It makes no sense to me  :(
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: Stu Redman on February 23, 2009, 11:32:12 PM
Well, you're the first who brought a little light to the issue. Can't tell if it's enough, but that's certainly more information than we got from the umpteen backtraces before. Thank you!
To get a little more light: change your fence check to
Code: [Select]
        if (m_fence1 != m_fence2) {
                AddLogLine(true , CFormat( _("BROKEN FENCE xxx: %x,%x" )) % m_fence1 % m_fence2);
        }
and sprinkle it over the code, changing the xxx to something different each time so you can identify when it occurs. Especially before and after the
Code: [Select]
m_nActiveDialog = type;
in SetActiveDialog.
I'd also like to see the output of
Code: [Select]
CFormat(wxT("sb %d sD %d %x %x %x %x %x"))  %  sizeof(bool) % sizeof(DialogType)
% (uint64) & m_wndTaskbarNotifier
% (uint64) & m_fence2
% (uint64) & m_nActiveDialog
% (uint64) & m_is_safe_state
% (uint64) & m_BlinkMessages
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wires on February 26, 2009, 01:23:49 AM
The addresses and sizes you requested:
Code: [Select]
2009-02-24 21:47:37: sb 1 sD 4 &taskBar 7f91049df5e8 &fence2 7f91049df5f0 &activeDiag 7f91049df5f8 &is_safe_state 7f91049df5fc
&blink 7f91049df5fd

This will darken the issue... Once I put the fences check outside ShowTransferRate the behaviour was like this:

Code: [Select]
2009-02-24 21:47:38: KnownFileList.cpp(94): KnownFileList: Reading 0 known files from file format 0x0e.
2009-02-24 21:47:38: KnownFileList.cpp(106): KnownFileList: Finished reading known files
2009-02-24 21:47:38: ClientCreditsList.cpp(168): Creditfile loaded, 0 clients are known
2009-02-24 21:47:38: IPFilter.cpp(109): Loading IP-filters 'ipfilter.dat' and 'ipfilter_static.dat'.
2009-02-24 21:47:38: IPFilter.cpp(333): Loaded 0 IP-ranges from '/home/xxxxxx/.aMule/ipfilter.dat'. 0 malformed lines were discarded.
2009-02-24 21:47:38: IPFilter.cpp(333): Loaded 0 IP-ranges from '/home/xxxxxx/.aMule/ipfilter_static.dat'. 0 malformed lines were discarded.
2009-02-24 21:47:38: ExternalConn.cpp(169): External connections disabled in config file
2009-02-24 21:47:38: MuleUDPSocket.cpp(81): Created Server UDP-Socket at port 4665
2009-02-24 21:47:38: MuleUDPSocket.cpp(81): Created Client UDP-Socket at port 4672
2009-02-24 21:47:38: amuleDlg.cpp(213):
2009-02-24 21:47:38: amuleDlg.cpp(215):  - This is aMule 2.2.2 using wxGTK2 v2.8.9 (Debugging) based on eMule.
2009-02-24 21:47:38: amuleDlg.cpp(217):    Running on Linux 2.6.27.15-78.2.23.fc9.x86_64 x86_64
2009-02-24 21:47:38: amuleDlg.cpp(219):  - Visit http://www.amule.org to check if a new version is available.
2009-02-24 21:47:38: amuleDlg.cpp(220):
2009-02-24 21:47:38: IP2Country.cpp(104): Loaded 248 flag bitmaps.
2009-02-24 21:47:38: ServerList.cpp(83): Loading server.met file: /home/invitado/.aMule/server.met
2009-02-24 21:47:38: ServerList.cpp(168): 7 servers in server.met found
2009-02-24 21:47:38: DownloadQueue.cpp(169): No part files found
2009-02-24 21:47:38: SharedFileList.cpp(352): Found 0 known shared files
2009-02-24 21:47:38: ThreadScheduler.cpp(116): ThreadScheduler: Scheduler created.
2009-02-24 21:47:38: ThreadScheduler.cpp(229): ThreadScheduler: Task scheduled: AICH Syncronizing -
2009-02-24 21:47:38: ThreadScheduler.cpp(79): ThreadScheduler: Starting scheduler
2009-02-24 21:47:38: ThreadScheduler.cpp(161): ThreadScheduler: Scheduler thread started
2009-02-24 21:47:38: ThreadScheduler.cpp(264): ThreadScheduler: Entering scheduling loop
2009-02-24 21:47:38: ThreadScheduler.cpp(274): ThreadScheduler: Resorting tasks
2009-02-24 21:47:38: ThreadScheduler.cpp(288): ThreadScheduler: Current task: AICH Syncronizing -
2009-02-24 21:47:38: ThreadTasks.cpp(265): AICH-Hasher: Syncronization thread started.
2009-02-24 21:47:38: ThreadTasks.cpp(309): AICH-Hasher: Masterhashes of known files have been loaded.
2009-02-24 21:47:38: ThreadScheduler.cpp(308): ThreadScheduler: Completed task 'AICH Syncronizing', 0 tasks remaining.
2009-02-24 21:47:38: ThreadScheduler.cpp(324): ThreadScheduler: Last task, calling OnLastTask
2009-02-24 21:47:38: ThreadScheduler.cpp(278): ThreadScheduler: No more tasks, stopping
2009-02-24 21:47:38: ThreadScheduler.cpp(329): ThreadScheduler: Leaving scheduling loop
2009-02-24 21:47:38: amule.cpp(1884): General: Running: 2.2.2, Version check: 2.2.3
2009-02-24 21:47:38: amule.cpp(1905): You are using an outdated version of aMule!
2009-02-24 21:47:38: amule.cpp(1906): Your aMule version is 2.2.2 and the latest version is 2.2.3
2009-02-24 21:47:38: amule.cpp(1907): The latest version can always be found at http://www.amule.org
2009-02-24 21:47:38: ClientList.cpp(1060): ED2k Client: Cleaned ClientList, removed 0 not used known clients
2009-02-24 21:47:42: BROKEN FENCE ShowTransferRate: 0,aaaaaaaaaaaaaaaa
2009-02-24 21:47:47: BROKEN FENCE ShowTransferRate: 0,aaaaaaaaaaaaaaaa
I've run the test with the check commented out in many places and always got the same result, so it has to be scaring the bug away  :). Activating the check just inside ShowTransferRate, fence2 becomes aaaa01aaaaaaaaaa and fence1 stays aaaa....

My last test was patching r9450 to revert the m_TrayIcon and...... It worked! :o If you can publish a revert patch I would like to test it to confirm that it actually works without the boolean state member. I think that working on a 2.2.2 source isn't so useful.

Regards

Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: Stu Redman on February 26, 2009, 09:44:32 PM
So an enum is 32 bit on a 64 bit compiler. Interesting.

So now you got fence1 set to all zero ?  ??? This is making no sense at all. You sure you initialized fence1 to aaaaaaaa ?

Here's the patch you requested.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wires on February 26, 2009, 09:49:22 PM
Quote from: Stu Redman on February 26, 2009, 09:44:32 PM
You sure you initialized fence1 to aaaaaaaa ?

 ;D that's exactly the first thing I checked!

I'll try the patch and post the results.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wires on February 27, 2009, 01:46:42 AM
I didn't like what I got so I applied the revert patch to a clean directory to start again (2.2.2).
This may look better:

Code: [Select]
2009-02-27 01:21:38: ClientTCPSocket.cpp(795): Remote Client Protocol: Remote Client: OP_MESSAGE from 192.168.1.33
2009-02-27 01:21:38: ClientTCPSocket.cpp(805): New message from 'wires' (IP:192.168.1.33)
2009-02-27 01:21:38: DEBUG FENCE CChatWnd.ProcessMessage 2: aaaaaaaaaaaaaaaa,aaaa01aaaaaaaaaa
2009-02-27 01:21:38: ClientTCPSocket.cpp(2005): Remote Client Protocol: Packet with protocol c5, opcode 85, size 77 received from 192.168.1.33

The modified version of ChatWnd.cpp
Code: [Select]
void CChatWnd::ProcessMessage(uint64 sender, const wxString& message)
{
        theApp->amuledlg->DebugFences(wxT("CChatWnd.ProcessMessage 1"));
        if ( !theApp->amuledlg->IsDialogVisible(CamuleDlg::DT_CHAT_WND) ) {
                theApp->amuledlg->SetMessageBlink(true);
        }
        theApp->amuledlg->DebugFences(wxT("CChatWnd.ProcessMessage 2"));
        if (chatselector->ProcessMessage(sender, message)) {
                // Check to enable the window controls if needed
                CheckNewButtonsState();
        }
        theApp->amuledlg->DebugFences(wxT("CChatWnd.ProcessMessage 3"));
}

Seeing this, I've realized that the chat icon does not blink after receiving a message. Both IsDialogVisible and SetMessageBlink are quite simple so I'm lost again.

Also, the failure happens only when the active window is not the chat window
Code: [Select]
2009-02-27 01:29:49: ClientTCPSocket.cpp(795): Remote Client Protocol: Remote Client: OP_MESSAGE from 192.168.1.33
2009-02-27 01:29:49: ClientTCPSocket.cpp(805): New message from 'wires' (IP:192.168.1.33)
2009-02-27 01:29:50: ClientTCPSocket.cpp(2005): Remote Client Protocol: Packet with protocol c5, opcode 85, size 77 received from 192.168.1.33
2009-02-27 01:29:50: ClientTCPSocket.cpp(1371): Remote Client Protocol: Remote Client: OP_PUBLICKEY from 192.168.1.33
2009-02-27 01:29:50: ClientTCPSocket.cpp(2005): Remote Client Protocol: Packet with protocol c5, opcode 86, size 49 received from 192.168.1.33
2009-02-27 01:29:50: ClientTCPSocket.cpp(1387): Remote Client Protocol: Remote Client: OP_SIGNATURE from 192.168.1.33
2009-02-27 01:29:50: BaseClient.cpp(2140): ED2k Client: 'wires' has passed the secure identification, V2 State: 0
2009-02-27 01:30:07: ClientTCPSocket.cpp(2005): Remote Client Protocol: Packet with protocol e3, opcode 4e, size 4 received from 192.168.1.33
2009-02-27 01:30:07: ClientTCPSocket.cpp(795): Remote Client Protocol: Remote Client: OP_MESSAGE from 192.168.1.33
2009-02-27 01:30:07: ClientTCPSocket.cpp(805): New message from 'wires' (IP:192.168.1.33)
2009-02-27 01:30:07: DEBUG FENCE CChatWnd.ProcessMessage 2: aaaaaaaaaaaaaaaa,aaaa01aaaaaaaaaa


Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: btkaos on February 27, 2009, 01:14:18 PM
Quote from: Stu Redman on February 26, 2009, 09:44:32 PM
So an enum is 32 bit on a 64 bit compiler. Interesting.
STU, I'm not sure what you mean. In ANSI C an enum is just an integer (32 bits in both i386 and AMD64)

However, in C++ the behavior is slightly different, in the sense that an enum is "promoted" to an integer, but going from an integer to an enum needs an explicit cast.

MSVC allows declaring the underlying type of enums, but AFAICT this is not portable.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: Stu Redman on February 27, 2009, 11:24:56 PM
Quote from: wires on February 27, 2009, 01:46:42 AM
I didn't like what I got so I applied the revert patch to a clean directory to start again (2.2.2).
Which version did you use? You can't apply the revert-patch to 2.2.2 since the change is only in 2.2.3 and SVN.

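Quote from: Stu Redman on February 26, 2009, 09:44:32 PM
So an enum is 32 bit on a 64 bit compiler. Interesting.
Quote from: btkaos on February 27, 2009, 01:14:18 PM
STU, I'm not sure what you mean. In ANSI C an enum is just an integer (32 bits in both i386 and AMD64)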
Well, it's muddy enough. I remember "int = native machine word size" and "short <= int <= long" (very helpful, thanks). Guess that was true for 16-32 but not for 32-64.

So what do we have?
Code: [Select]
variable                                   offset (dez)
CMuleTrayIcon *m_wndTaskbarNotifier;        0
uint64 fence2;                              8
DialogType m_nActiveDialog;                16
bool m_is_safe_state;                      20
bool m_BlinkMessages;                      21

strange 0x01 (== true) turns up at         13

Could it be that m_BlinkMessages is written to the wrong address?
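
The offsets in the table can be checked with `offsetof`. The struct below is only a stand-in and not the real dialog class: `DlgMembers`, `DT_NONE`, the `kOff...` constants and `stray_write_distance` are hypothetical names, `CMuleTrayIcon` is merely forward-declared, and the numbers assume an LP64 target such as Linux on amd64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

class CMuleTrayIcon;            // only a pointer is stored, so a forward declaration is enough
enum DialogType { DT_NONE };    // placeholder; the real enum has more values

// Hypothetical cut-down copy of the members from the table, fence included.
struct DlgMembers {
    CMuleTrayIcon *m_wndTaskbarNotifier;
    std::uint64_t  fence2;
    DialogType     m_nActiveDialog;
    bool           m_is_safe_state;
    bool           m_BlinkMessages;
};

const std::size_t kOffFence  = offsetof(DlgMembers, fence2);
const std::size_t kOffDialog = offsetof(DlgMembers, m_nActiveDialog);
const std::size_t kOffSafe   = offsetof(DlgMembers, m_is_safe_state);
const std::size_t kOffBlink  = offsetof(DlgMembers, m_BlinkMessages);

// Distance between where m_BlinkMessages lives and where a stray byte was seen.
std::size_t stray_write_distance(std::size_t observed_offset)
{
    assert(observed_offset <= kOffBlink);
    return kOffBlink - observed_offset;
}
```

On an LP64 build the constants should come out as 8, 16, 20 and 21, and `stray_write_distance(13)` as 8, which is exactly the size of one pointer.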

Please try the following:
- turn on verbose mode so you see the full compiler invocation
- compile ChatWnd.cpp and amuleDlg.cpp by hand, replacing -c with -S so it generates assembler source
- post the assembler files here (along with the full calls to the compiler used)
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: GonoszTopi on February 28, 2009, 12:17:45 AM
I remember "int = native machine word size" and "short <= int <= long" (very helpful, thanks). Guess that was true for 16-32 but not for 32-64.

If `int' were 64 bits wide on a 64-bit arch, there would be no type left for either 16 or 32 bits.

See http://gcc.gnu.org/onlinedocs/gccint/Type-Layout.html for more information on type sizes.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: btkaos on February 28, 2009, 03:27:19 AM
So an enum is 32 bit on a 64 bit compiler. Interesting.
STU, I'm not sure what you mean. In ANSI C an enum is just an integer (32 bits in both i386 and AMD64)
Well, it's muddy enough. I remember "int = native machine word size" and "short <= int <= long" (very helpful, thanks). Guess that was true for 16-32 but not for 32-64.
STU, I'm not yet familiar with this particular bug, but the size of C types in 64-bit land surprised me as well. I assumed int would be 64 bits on AMD64, but sizeof(int) is 4 there, the same as on i386. The real problem is unsigned long: sizeof(unsigned long) is 4 on i386 but 8 on AMD64, and this is where most bugs happen.

Such is life, I guess they have good reasons for this choice.
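
A small sketch of the data models discussed here. The ordering of short, int and long is guaranteed by the language; the `DataModel` enum and the two functions are hypothetical helpers, and the classification only covers the common combinations.

```cpp
#include <cassert>
#include <cstddef>

// Guaranteed by the C and C++ standards on every platform.
static_assert(sizeof(short) <= sizeof(int), "short must not be wider than int");
static_assert(sizeof(int) <= sizeof(long), "int must not be wider than long");

enum DataModel { ILP32, LP64, LLP64, ILP64, OTHER_MODEL };

// Classify a data model from the sizes (in bytes) of int, long and pointers.
// ILP32: Linux on i386. LP64: Linux on amd64. LLP64: 64-bit Windows.
DataModel data_model(std::size_t int_size, std::size_t long_size, std::size_t ptr_size)
{
    assert(int_size > 0 && long_size > 0 && ptr_size > 0);
    if (int_size == 4 && long_size == 4 && ptr_size == 4) return ILP32;
    if (int_size == 4 && long_size == 8 && ptr_size == 8) return LP64;
    if (int_size == 4 && long_size == 4 && ptr_size == 8) return LLP64;
    if (int_size == 8 && long_size == 8 && ptr_size == 8) return ILP64;
    return OTHER_MODEL;
}

// Data model of the platform this code is compiled for.
DataModel this_platform()
{
    return data_model(sizeof(int), sizeof(long), sizeof(void *));
}
```

On the 64-bit Debian box from this thread `this_platform()` should report `LP64`: int stays at 4 bytes, while long and pointers grow to 8.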
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wires on February 28, 2009, 08:30:15 PM
I've found these links that may be useful:
64-BIT PROGRAMMING MODELS (http://www.unix.org/version2/whatsnew/lp64_wp.html) about Unix standards
AMD64 and LP64 (http://hbfs.wordpress.com/2008/10/28/the-lp64-model-and-the-amd64-instruction-set/) for me the important note here is that Linux is LP64 (instead of ILP64) on AMD64.

Which version did you use? You can't apply the revert-patch to 2.2.2 since the change is only in 2.2.3 and SVN.
You're right, I applied the patch to nothing. The original idea was to apply it to the latest SVN version just to see how it behaves, but in the end I decided to start from zero with 2.2.2.

The assemblers and command lines are attached. Please carefully review the compilation commands for errors, I've been failing with builds and other things and I don't want to be the one that made you read assembler for pleasure  ::)



The only language where you can do high quality development work without understanding the intricate details of the compiler is "english", as used by a manager when telling the coders what to do. (Well, perhaps...)
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: Stu Redman on March 01, 2009, 12:21:05 AM
Thank you very much for your effort! But the assembler is hard to chew. And I didn't even find that SetMessageBlink function.  ???

But I noticed something interesting:
amuleDlg is compiled with the app and with -pthread
ChatWnd is compiled with libmuleappgui and without -pthread
Doku (http://gcc.gnu.org/onlinedocs/gcc/IA_002d64-Options.html#IA_002d64-Options) says about -pthread:
Quote
Add support for multithreading using the POSIX threads library. This option sets flags for both the preprocessor and linker. It does not affect the thread safety of object code produced by the compiler or that of libraries supplied with it. These are HP-UX specific flags.

Now, where does this flag come from, and might it influence something?
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: GonoszTopi on March 01, 2009, 01:13:43 AM
Thank you very much for your effort! But the assembler is hard to chew. And I didn't even find that SetMessageBlink function.  ???

Because it's an inline function.

Now, where does this flag come from, and might it influence something?
Code: [Select]
$ grep -- -pthread Compilation.flags
LIBUPNP_CFLAGS = -pthread 
WXBASE_LIBS = -L/usr/local/lib -pthread   -lwx_baseud_2.8.9_net-2.8 -lwx_baseud_2.8.9-2.8
WX_CFLAGS = -I/usr/local/lib/wx/include/gtk2-unicode-debug-2.8-2.8.9 -I/usr/local/include/wx-2.8-2.8.9 -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES -D__WXDEBUG__ -D__WXGTK__ -pthread
WX_CFLAGS_ONLY = -pthread
WX_CXXFLAGS = -I/usr/local/lib/wx/include/gtk2-unicode-debug-2.8-2.8.9 -I/usr/local/include/wx-2.8-2.8.9 -D_FILE_OFFSET_BITS=64 -D_LARGE_FILES -D__WXDEBUG__ -D__WXGTK__ -pthread
WX_LIBS = -L/usr/local/lib -pthread   -lwx_gtk2ud_2.8.9_adv-2.8 -lwx_gtk2ud_2.8.9_core-2.8 -lwx_baseud_2.8.9_net-2.8 -lwx_baseud_2.8.9-2.8
$
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: Stu Redman on March 01, 2009, 10:10:19 PM
Wires, please try the following:

- in src/Makefile find the line WX_CPPFLAGS =
- add the option -pthread
- rebuild (make clean; make), check if the fence still triggers

Strange option that. GCC doku lists it under "IA64 architecture specific" (also PPC and Sparc). So why is it defined and accepted in my Ubuntu 32 also ?!?

SetMessageBlink is defined as callable function in ChatWnd.s btw. That's not what I would expect of  "inline".

Am I understanding correctly that Ubuntu 64 is for AMD only? What about 64 bit Intel CPUs (like mine)?
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wuischke on March 01, 2009, 10:34:23 PM
The architecture is called amd64, because AMD introduced this extension to the x86 instruction set. The extended instruction set has been licensed by Intel and is used in its x86 64 bit processors. Sometimes you'll find the name x86_64, too, that's the same thing.

IA32 is more or less synonymous to x86; IA64 on the other hand is something different, that's Intel's Itanium. (Mentioning Itanium: They killed Alpha (http://en.wikipedia.org/wiki/DEC_Alpha)! :'()

In short: amd64 works fine on 64 bit capable Intel (x86) processors. In my case, that's a mobile C2D.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: Stu Redman on March 01, 2009, 11:03:43 PM
Ok, maybe I'll try to get it running in my VM (no idea if that's possible with a 32 bit host).
They have a dirty kind of humor btw - or is it just me ?  ;D
(http://www.ubuntu.com/files/buttons/buttonlarge1.png)
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wires on March 02, 2009, 01:04:17 AM
Strange option that. GCC doku lists it under "IA64 architecture specific" (also PPC and Sparc). So why is it defined and accepted in my Ubuntu 32 also ?!?

From man page (man 7 pthreads)
Quote
Compiling on Linux
       On Linux, programs that use the Pthreads API should be compiled using cc -pthread

SetMessageBlink is defined as callable function in ChatWnd.s btw. That's not what I would expect of  "inline".

One of the compilation flags is -fno-inline so `inline' keyword is ignored.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: GonoszTopi on March 02, 2009, 07:06:19 PM
I think I found the source of the bug. Or at least I found a bug. And it has nothing to do with -pthread.

wires, could you please try the following patch?
Code: [Select]
diff --git a/src/Makefile.am b/src/Makefile.am
index be7eda7..3499267 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -110,7 +110,7 @@ libmuleappgui_a_SOURCES = \
        MuleCollection.cpp \
        muuli_wdr.cpp
 
-libmuleappgui_a_CPPFLAGS = $(AM_CPPFLAGS) $(WX_CPPFLAGS) -I$(srcdir)/libs -I$(srcdir)/include $(LIBUPNP_CPPFLAGS)
+libmuleappgui_a_CPPFLAGS = $(AM_CPPFLAGS) $(WX_CPPFLAGS) -I$(srcdir)/libs -I$(srcdir)/include $(LIBUPNP_CPPFLAGS) $(GEOIP_CPPFLAGS)
 
 core_sources = \
        RC4Encrypt.cpp \

Please re-run autogen.sh and configure, delete at least libmuleappgui.a (or 'make clean'), and rebuild amule.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wires on March 04, 2009, 10:02:51 AM
Ok! It's done! And the winner is GonoszTopi  ;D!!! I've tried the FLAGS patch from Stu and it didn't work, but the last patch from GonoszTopi actually works!! The message is received, the icon blinks and the fences stay clear. I would like to know what the cause is: how can these compilation flags break the runtime in such a way  ???. Here is the log for a successful chat session:
Code: [Select]
2009-03-04 09:32:20: ClientTCPSocket.cpp(336): Remote Client Protocol: Remote Client: OP_HELLO from 192.168.1.33
2009-03-04 09:32:20: BaseClient.cpp(985): Local Client Protocol: Local Client: OP_HELLOANSWER to 192.168.1.33
2009-03-04 09:32:20: BaseClient.cpp(799): Local Client Protocol: Local Client: OP_EMULEINFO/OS_INFO to 192.168.1.33
2009-03-04 09:32:20: BaseClient.cpp(2173): Local Client Protocol: Local Client: OP_SECIDENTSTATE to 192.168.1.33
2009-03-04 09:32:20: ClientTCPSocket.cpp(2005): Remote Client Protocol: Packet with protocol c5, opcode 87, size 5 received from 192.168.1.33
2009-03-04 09:32:20: ClientTCPSocket.cpp(1346): Remote Client Protocol: Remote Client: OP_SECIDENTSTATE from 192.168.1.33
2009-03-04 09:32:20: ClientTCPSocket.cpp(2005): Remote Client Protocol: Packet with protocol e3, opcode 4e, size 8 received from 192.168.1.33
2009-03-04 09:32:20: ClientTCPSocket.cpp(795): Remote Client Protocol: Remote Client: OP_MESSAGE from 192.168.1.33
2009-03-04 09:32:20: ClientTCPSocket.cpp(805): New message from 'wires' (IP:192.168.1.33)
2009-03-04 09:32:20: ClientTCPSocket.cpp(2005): Remote Client Protocol: Packet with protocol c5, opcode 1, size 17 received from 192.168.1.33
2009-03-04 09:32:20: ClientTCPSocket.cpp(1326): Remote Client Protocol: Remote Client: OP_EMULEINFO is an OS_INFO
2009-03-04 09:32:20: ClientTCPSocket.cpp(2005): Remote Client Protocol: Packet with protocol c5, opcode 85, size 77 received from 192.168.1.33
2009-03-04 09:32:20: ClientTCPSocket.cpp(1371): Remote Client Protocol: Remote Client: OP_PUBLICKEY from 192.168.1.33
2009-03-04 09:32:20: BaseClient.cpp(2067): Local Client Protocol: Local Client: OP_SIGNATURE to 192.168.1.33
2009-03-04 09:32:20: ClientTCPSocket.cpp(2005): Remote Client Protocol: Packet with protocol c5, opcode 86, size 49 received from 192.168.1.33
2009-03-04 09:32:20: ClientTCPSocket.cpp(1387): Remote Client Protocol: Remote Client: OP_SIGNATURE from 192.168.1.33
2009-03-04 09:32:20: BaseClient.cpp(2140): ED2k Client: 'wires' has passed the secure identification, V2 State: 0
2009-03-04 09:33:00: BaseClient.cpp(1321): ED2k Client: --- Deleted client "Client wires on IP:Port 192.168.1.33:14662 using aMule SVN v2.3.0 aMule SVN"; Reason was Timeout

These are the steps I've followed to rebuild amule:
1. rm -rf build
2. cd ../src
3. apply GonoszTopi patch to Makefile.am (manually add GEOIP_CPPFLAGS to libmuleappgui_a_CPPFLAGS)
4. cd ..
5. ./autogen.sh
6. mkdir build
7. cd build
8. $ ../configure --prefix=/home/xxxxx/aMule-SVN --enable-dependency-tracking --disable-upnp --enable-geoip --with-wxdebug --disable-optimize --enable-debug --disable-wxcas --disable-alc --disable-alcc --with-wx-debug --with-wx-config=/home/xxxx/wxGTK/bin/wx-config --with-wx=/home/xxxxx/wxGTK
9. make
10. make install

Thank you all for your time. Let me know if you need me to run more tests ok?

This is a bit off topic but during make I've found this:
Code: [Select]
$ make
Parsing 2 files
FileName: ECTagTypes
FileContent: EC tag types for use on the ec library.
Reading content section...
Datatype: Enum
Dataname: ECTagTypes
DataType: uint8
No more content sections
All info parsed
FileName: ECCodes
FileContent: EC codes and type definition.
Reading content section...
Datatype: TypeDef
Reading content section...
Datatype: Enum
Dataname: ProtocolVersion
DataType: uint16
Reading content section...
Datatype: Enum
Dataname: ECFlags
DataType: uint32
Reading content section...
Datatype: Enum
Dataname: ECOpCodes
DataType: uint8
Reading content section...
Datatype: Enum
Dataname: ECTagNames
DataType: uint16
Reading content section...
Datatype: Enum
Dataname: EC_DETAIL_LEVEL
DataType: uint8
Reading content section...
Datatype: Enum
Dataname: EC_SEARCH_TYPE
DataType: uint8
Reading content section...
Datatype: Enum
Dataname: EC_STATTREE_NODE_VALUE_TYPE
DataType: uint8
Reading content section...
Datatype: Enum
Dataname: EcPrefs
DataType: uint32
Maybe it is not important, but enums are 32 bit on x86_64, so why does it detect uint8/16/32 values? Feel free to quote this in a new post in the devel section, ok? Not sure if that's the right way.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wuischke on March 04, 2009, 10:37:57 AM
Hi wires,

you are correct, enums are 32 bit. The output you see is related to the EC protocol. Basically a certain action is encoded by a number. When I send for instance 0x09, aMule knows I want to add a link. In the code we don't write 0x09 every time, but instead put all so called op codes in an enum. In ECCodes.h (src/libs/ec/cpp/) you'll find something like the following:
Code: [Select]
enum ECOpCodes {
...
EC_OP_ADD_LINK                      = 0x09,
...
Every time we use the enum value EC_OP_ADD_LINK in our code, it will be replaced by 0x09. But 0x09 fits in 8 bits, so it would waste a lot of bandwidth if we used a 32 bit value every time we wanted to send a '9'. Therefore we convert the 32 bit value to an 8 bit value when sending it.

The output you see comes from a Perl script that generates the C++ code files from an abstract file, which makes changing data types (maybe we'll have more than 255 op codes one day and need a uint16) and values easy.
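
A minimal sketch of that narrowing step. Only `EC_OP_ADD_LINK = 0x09` comes from the post; `write_opcode` and `read_opcode` are hypothetical helpers and not the actual API of the EC library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Only this one value is taken from the post; the real enum has many more.
enum ECOpCodes {
    EC_OP_ADD_LINK = 0x09
};

// Hypothetical helper: append an op code to an outgoing buffer as a single byte.
void write_opcode(std::vector<std::uint8_t> &out, ECOpCodes op)
{
    assert(static_cast<unsigned>(op) <= 0xFFu);   // must fit the uint8 wire type
    out.push_back(static_cast<std::uint8_t>(op));
}

// Hypothetical helper: turn the byte back into the enum on the receiving side.
ECOpCodes read_opcode(const std::vector<std::uint8_t> &in, std::size_t pos)
{
    return static_cast<ECOpCodes>(in.at(pos));
}
```

In memory the enum still has whatever size the compiler gives it, but only one byte per op code goes over the wire.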
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: GonoszTopi on March 04, 2009, 12:04:43 PM
I would like to know what the cause is: how can these compilation flags break the runtime in such a way  ???.

First of all, Stu pointed out that somehow the bool value of m_BlinkMessages is written with a -8 bytes offset. Later in your compilation logs I discovered that amuleDlg.cpp was compiled with -DENABLE_IP2COUNTRY=1, but not ChatWnd.cpp. Then I was almost certain what the bug was, and soon found the following in amuleDlg.h:
Code: (amuleDlg.h) [Select]
#ifdef ENABLE_IP2COUNTRY
        CIP2Country*            m_IP2Country;
#endif

I was wondering why it affected only 64-bit, but it can either be related to data alignment done by the compiler, or something black magic.
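
A sketch of the effect, under some assumptions. In the real bug both source files saw the same class, once with and once without the macro; to show the result in a single file the two views are written here as two separate structs. `ViewWithGeoIP`, `ViewWithoutGeoIP`, `DT_NONE` and `blink_displacement` are hypothetical names, and the exact position of m_IP2Country in the class is assumed (it only has to come before the other members).

```cpp
#include <cassert>
#include <cstddef>

class CIP2Country;              // forward declarations; only pointers are stored
class CMuleTrayIcon;
enum DialogType { DT_NONE };    // placeholder for the real enum

// The class as amuleDlg.cpp saw it (compiled with -DENABLE_IP2COUNTRY=1).
struct ViewWithGeoIP {
    CIP2Country   *m_IP2Country;
    CMuleTrayIcon *m_wndTaskbarNotifier;
    DialogType     m_nActiveDialog;
    bool           m_is_safe_state;
    bool           m_BlinkMessages;
};

// The class as ChatWnd.cpp saw it (macro not defined).
struct ViewWithoutGeoIP {
    CMuleTrayIcon *m_wndTaskbarNotifier;
    DialogType     m_nActiveDialog;
    bool           m_is_safe_state;
    bool           m_BlinkMessages;
};

// How far apart the two translation units place m_BlinkMessages.
std::size_t blink_displacement()
{
    const std::size_t with_geoip    = offsetof(ViewWithGeoIP, m_BlinkMessages);
    const std::size_t without_geoip = offsetof(ViewWithoutGeoIP, m_BlinkMessages);
    assert(with_geoip > without_geoip);
    return with_geoip - without_geoip;
}
```

The displacement is the size of one pointer, so code built without the macro writes m_BlinkMessages that many bytes too early into an object laid out by code built with it.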
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: Stu Redman on March 04, 2009, 08:01:49 PM
Ahhh - that's the bad thing about magic tricks. You're always so disappointed when you find out how it is done...  :D
Good to have that one nailed finally.

I was wondering why it affected only 64-bit, but it can either be related to data alignment done by the compiler, or something black magic.
Code: [Select]
CMuleTrayIcon *m_wndTaskbarNotifier;
DialogType m_nActiveDialog;
bool m_is_safe_state;
bool m_BlinkMessages;
With a displacement of 4 instead of 8 (32-bit pointers) it only smashes m_nActiveDialog and not m_wndTaskbarNotifier. I don't know the exact consequences of that - maybe only an extra dialog redraw or something, but no crash.
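
This reasoning can be modelled on any machine by letting a fixed-width integer stand in for the pointer. It is only a model: `Layout`, `Victim` and `misplaced_write_target` are hypothetical names, the enum is assumed to take 4 bytes, and the debug fence is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// PtrT stands in for a pointer: std::uint32_t models i386, std::uint64_t models amd64.
template <typename PtrT>
struct Layout {
    PtrT          m_wndTaskbarNotifier;
    std::uint32_t m_nActiveDialog;      // enum assumed to be 4 bytes
    bool          m_is_safe_state;
    bool          m_BlinkMessages;
};

enum Victim { HIT_NOTIFIER, HIT_ACTIVE_DIALOG, HIT_OTHER };

// Which member is overwritten when the write to m_BlinkMessages
// lands sizeof(PtrT) bytes too early.
template <typename PtrT>
Victim misplaced_write_target()
{
    typedef Layout<PtrT> L;
    const std::size_t right = offsetof(L, m_BlinkMessages);
    assert(right >= sizeof(PtrT));
    const std::size_t wrong = right - sizeof(PtrT);

    const std::size_t dialog_begin = offsetof(L, m_nActiveDialog);
    const std::size_t dialog_end   = dialog_begin + sizeof(std::uint32_t);

    if (wrong < sizeof(PtrT)) return HIT_NOTIFIER;
    if (wrong >= dialog_begin && wrong < dialog_end) return HIT_ACTIVE_DIALOG;
    return HIT_OTHER;
}
```

With a 64-bit stand-in the stray byte lands at offset 5, inside the pointer; with a 32-bit stand-in it also lands at offset 5, but there that byte belongs to m_nActiveDialog.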
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: wires on March 05, 2009, 12:11:41 AM
Well, thank you for your explanations. I've enjoyed this chase!  ;)
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: btkaos on March 05, 2009, 02:35:29 AM
First of all, Stu pointed out that somehow the bool value of m_BlinkMessages is written with a -8 bytes offset. Later in your compilation logs I discovered that amuleDlg.cpp was compiled with -DENABLE_IP2COUNTRY=1, but not ChatWnd.cpp. Then I was almost certain what the bug was, and soon found the following in amuleDlg.h:
Code: (amuleDlg.h) [Select]
#ifdef ENABLE_IP2COUNTRY
        CIP2Country*            m_IP2Country;
#endif
Impressive catch GonoszTopi!!
Quote
I was wondering why it affected only 64-bit, but it can either be related to data alignment done by the compiler, or something black magic.
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: GonoszTopi on March 05, 2009, 09:27:00 PM
Impressive catch GonoszTopi!!
You know, it isn't hard at all when others do the slave work ;)
Title: Re: aMule SVN 9385 crash on 64bit Debian
Post by: Stu Redman on March 05, 2009, 09:48:33 PM
Well, thank you for your explanations. I've enjoyed this chase!  ;)
Thank you for helping, wires - without your fence idea and all your testing it would not have been possible.