aMule Forum
English => aMule crashes => Topic started by: GuybrushThreepwood on October 13, 2009, 03:07:14 PM
-
Hello to everyone! :) I don't know if it's really a bug or if it's something related to my particular situation but I'm experiencing a very annoying problem with the latest aMule SVN versions.
Before describing the issue, I'd like to describe my situation:
1) I run the aMule daemon on a MIPS router and control it remotely with aMuleGUI
2) I've built many aMule versions (and other software too) with no issues at all, always using the same toolchain (let me know if you need the list of installed dependencies)
3) The last aMule version that I built before this one is SVN 9548, which is rock solid on my router
4) Today I built the latest SVN 9838 just to see whether something was wrong with SVN 9834, but it behaves exactly the same
A few days ago I decided to build a newer version and downloaded the SVN 9834 source code. Compilation went fine and everything seems to work, sometimes for only a few minutes, sometimes for an hour, sometimes for several hours, but sooner or later I get many consecutive:
WARNING! Client UDP-Socket ... discarded packet due to errors (2) while sending.
errors (we're talking about dozens of them), aMule can no longer upload or download anything and, after a while, the connection drops and I need to reboot the router.
I repeat: I don't know if it's really a bug, but I'm rather confident that it is, because the older SVN 9548 works fine. The only other explanation could be that in the meantime something has been added that's too hard for a router to handle, given its limited resources.
I hope to receive some advice soon and I thank you all in advance.
-
Hi and thanks for your reply. As I said, I've taken into account the possibility of a resource shortage but, unless the later SVNs have introduced some features that are very hard to handle, how could you explain that SVN 9548 works PERFECTLY under the SAME conditions? That's strange, isn't it? Apart from this, I've done a quick, superficial search (I'll dig deeper as soon as possible) and this error seems quite widespread (though, as I said, my search was very superficial and I don't know the different scenarios). When you can, please tell me what you think of this...
-
SVN versions 9353 - 9647 are totally fucked (upload is broken) and should not be used :-[
How is your upload speed with the two versions in comparison?
Same configure options, especially the mmap option ? Same wxWidgets version?
-
@Stu Redman
Hi and thanks for your reply. I downloaded SVN 9548 a long time ago, when it was the current one, and it works very well for me: I get good download speeds (probably a little better than the ones achieved with SVN 9834/38, nothing major though) and, most importantly, it's very stable (even after prolonged use), unlike the newer SVN 9834/38, which aren't usable because of the described issue.
I've used EXACTLY the same toolchain to build all the versions and it includes wxWidgets 2.8.10 that's the latest stable release. If you need to know the other dependencies that I've included in my toolchain, you could look here:
http://forum.amule.org/index.php?PHPSESSID=7263b2379436f7b195c7f3f79e4a3161&topic=16721.msg88778#msg88778
Tell me if you need to know something more. Have a good day! :)
-
It would be best if you could bisect to find the revision from which it becomes unusable for you, using the public git repository (http://repo.or.cz/w/amule.git). ~300 commits is way too many to check when looking for a bug we cannot reliably (or at all) reproduce.
"I would suspect the one experiencing the bug probably is more motivated to fix it than people who don't, so I suspect you're in trouble there. ;-)"
-
Guybrush, did you pass the --enable-mmap option to configure? IIRC that was introduced after 9548 (it was probably always-on then). That's essential for your environment.
Great if 9548 is stable for you, but it makes you a leech. 8)
-
@Stu Redman
Excuse me... Why should I be a leech? Is it because I compiled an old development version back when it was the current one, as SVN 9838 is now??? Honestly I don't understand...
By the way, I didn't use the --enable-mmap configure switch: I didn't know that it was needed. Thanks.
-
Because you don't upload, or upload bad stuff. Not intentionally of course. Until now when I told you that this old version is broken.
But with the --enable-mmap current SVN should work just as stable. :)
-
But with the --enable-mmap current SVN should work just as stable. :)
Also without. Should.
-
Not on a mips router running out of memory.
-
If 9548 ran, 9838 should also.
-
Because you don't upload, or upload bad stuff. Not intentionally of course. Until now when I told you that this old version is broken.
But with the --enable-mmap current SVN should work just as stable. :)
I didn't know that. The upload speeds are very good and I've uploaded a lot of stuff... So was everything that I uploaded corrupted? It's a pity...
I'll rebuild SVN 9838 using the --enable-mmap switch and keep you updated about the results. For now, thank you! :)
If 9548 ran, 9838 should also.
According to what Stu Redman said in one of the previous posts, SVN 9548 could have had that option enabled by default, and this would explain everything... I must remember to use that switch from now on.
-
True. Having access to the source again I could check. Memory mapping code was introduced in r9377, configure option in r9563. In between it was used unconditionally.
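As a rough illustration of what these revision numbers mean for the builds discussed in this thread, here is a minimal sketch. The helper and enum names are hypothetical and not part of aMule; only the two revision numbers (r9377 and r9563) come from the post above.

```cpp
// Hypothetical helper, not aMule code: classifies an SVN revision using the
// two revision numbers quoted in the post above.
enum class MmapSupport { NotPresent, AlwaysOn, ConfigureOption };

MmapSupport MmapSupportFor(int svnRevision)
{
    if (svnRevision < 9377) {
        return MmapSupport::NotPresent;      // memory mapping code not yet introduced
    }
    if (svnRevision < 9563) {
        return MmapSupport::AlwaysOn;        // used unconditionally
    }
    return MmapSupport::ConfigureOption;     // controlled by --enable-mmap
}
```

Under these numbers, 9548 falls in the "always on" range, while 9834 and 9838 only use memory mapping when --enable-mmap is passed to configure.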
-
So was all that I've uploaded corrupted?
Only uploads from finished files. :-[
-
Bad news... :( I've rebuilt the aMule daemon using the --enable-mmap configure switch but I'm still getting the same error, and I now understand almost exactly what happens.
After some hours of stable operation, I start to see hundreds (I'm not exaggerating) of consecutive "WARNING! Client UDP-Socket discarded..." warnings in the logfile. After this, the swap partition that I use alongside aMule crashes (I used it with the older versions and with other software with no issues at all, so I'm certain that it's ok) and I can't turn the swap off without rebooting the router: the swapoff command always fails with an error. If I leave things as they are, after a while the router crashes completely and I have to reboot it. There is definitely something too hard to handle for a simple router that might have been added in the newer aMule versions...
-
Are you using Kad or ED2K ?
-
Kad only, exactly as I did with the older versions. I've seen that the old "established" servers don't exist anymore and they've been replaced by some new unfamiliar servers (only six according to the peerates server list) with a very low number of users and shared files so I've decided to disable the ed2k connection and leave the kad connection alone. As said, this is exactly what I did with the older versions... Just to be clear: I've rebuilt the SVN 9838 (my previous post refers to this version) instead of the SVN 9834 but I don't think that it makes a great difference.
-
Error 2 under Linux is
#define ENOENT 2 /* No such file or directory */
error is raised in src/MuleUDPSocket.cpp:
bool CMuleUDPSocket::SendTo(uint8_t *buffer, uint32_t length, uint32_t ip, uint16_t port)
{
    // Just pretend that we sent the packet in order to avoid infinite loops.
    if (!(m_socket && m_socket->Ok())) {
        return true;
    }
    amuleIPV4Address addr;
    addr.Hostname(ip);
    addr.Service(port);
    // We better clear this flag here, status might have been changed
    // between the U.B.T. addition and the real sending happening later
    m_busy = false;
    bool sent = false;
    m_socket->SendTo(addr, buffer, length);
    if (m_socket->Error()) {
        wxSocketError error = m_socket->LastError();
        if (error == wxSOCKET_WOULDBLOCK) {
            // Socket is busy and can't send this data right now,
            // so we just return not sent and set the wouldblock
            // flag so it gets resent when socket is ready.
            m_busy = true;
        } else {
            // An error which we can't handle happended, so we drop
            // the packet rather than risk entering an infinite loop.
            AddLogLineNS((wxT("WARNING! ") + m_name + wxT(": Packet to "))
                << Uint32_16toStringIP_Port(ip, port)
                << wxT(" discarded due to error (") << error << wxT(") while sending."));
            sent = true;
        }
    } else {
        AddDebugLogLineM(false, logMuleUDP, (m_name + wxT(": Packet sent ("))
            << Uint32_16toStringIP_Port(ip, port) << wxT("): ")
            << length << wxT("b"));
        sent = true;
    }
    return sent;
}
so the socket should be ok :( but ENOENT probably means an invalid handle due to a failed socket allocation, or some memory corruption that caused the socket number to change...
-
@freddy77
So you're confirming that this issue could be related to a bug, aren't you? By the way, if I remember correctly, you were using aMule on a MIPS router like mine, right? Haven't you experienced this issue? Thanks for your reply! :)
-
Error 2 under Linux is
#define ENOENT 2 /* No such file or directory */
[...]
so the socket should be ok :( but ENOENT probably means an invalid handle due to a failed socket allocation, or some memory corruption that caused the socket number to change...
Error 2 actually refers to wxSOCKET_IOERR.
-
Error 2 actually refers to wxSOCKET_IOERR.
which could be anything :(
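To make the distinction concrete: the number in the log line is the value of the wxSocketError returned by LastError(), not a Linux errno. Below is a minimal standalone sketch that maps such a value to a name. The function name is hypothetical, and the ordering of the names is an assumption based on my reading of the wxWidgets 2.8 headers; only the mapping 2 = wxSOCKET_IOERR is confirmed in this thread.

```cpp
#include <string>

// Hypothetical helper, not aMule code. The order of the names below is an
// assumption taken from the wxSocketError enum in wxWidgets 2.8; only the
// value 2 (wxSOCKET_IOERR) is confirmed by the posts above.
std::string SocketErrorName(int code)
{
    static const char *const names[] = {
        "wxSOCKET_NOERROR",    // 0
        "wxSOCKET_INVOP",      // 1
        "wxSOCKET_IOERR",      // 2
        "wxSOCKET_INVADDR",    // 3
        "wxSOCKET_INVSOCK",    // 4
        "wxSOCKET_NOHOST",     // 5
        "wxSOCKET_INVPORT",    // 6
        "wxSOCKET_WOULDBLOCK", // 7
        "wxSOCKET_TIMEDOUT",   // 8
        "wxSOCKET_MEMERR"      // 9
    };
    const int count = static_cast<int>(sizeof(names) / sizeof(names[0]));
    if (code < 0 || code >= count) {
        return "unknown";
    }
    return names[code];
}
```

Since wxSOCKET_IOERR is a catch-all, the underlying system error is lost by the time the warning is logged, which is why the message alone does not identify the cause.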
-
Kad only, exactly as I did with the older versions.
Kad is kind of a memory hog. How much memory does your router have? Did you just change the executable, or the config dir too? If you have a new id you might have hit a hot spot in the network and run into memory problems.
-
Kad only, exactly as I did with the older versions.
Kad is kind of a memory hog. How much memory does your router have? Did you just change the executable, or the config dir too? If you have a new id you might have hit a hot spot in the network and run into memory problems.
Yes, I know that the KAD connection is really "heavy" on resources, but how do you explain the fact that the older (and bugged, now I know it and I have to thank you for that! ;) ) SVN 9548 is stable after days of continuous operation under EXACTLY the same conditions (KAD only)? It seems quite strange, doesn't it? My router has 32 MB of memory and I use a 128 MB swap partition along with it. I've changed only the executable and the id is the same. Any advice is appreciated. Thank you! :)
@freddy77
Haven't you experienced this issue on your router? Thanks for your reply! :)
-
In my tests I didn't use Kad...
-
Hi all,
last night my wireless link went down and the log window got full of messages like this
2009-10-21 00:50:26: MuleUDPSocket.cpp(333): WARNING! Client UDP-Socket: Packet to xxx.xxx.xxx.xxx:xxxx discarded due to error (2) while sending.
In another thread someone said something about the log window consuming a lot of memory because it is/was not limited. Maybe this is the problem on an embedded system?
For me it's been enough to shut down Kad and then bootstrap from known clients.
I'm not sure but I think this also happens when my xDSL link hangs and the wireless keeps working. I'm able to ping the router but I can't go outside my local network.
I'm using SVN 9833 on x86_64, Kad only.
Maybe a connection check could be implemented over TCP: something like opening a TCP connection to www.amule.org:80 when this error is detected, and marking the link as unavailable on any result other than "connection refused" or "ok". The problem is that, if not handled properly, this check can take up to 4 minutes to time out...
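A minimal sketch of such a check, assuming a POSIX system and an already resolved IPv4 address. This is not aMule code and the function names are hypothetical. It uses a non-blocking connect with an explicit timeout so that the probe cannot hang for minutes.

```cpp
#include <cerrno>
#include <cstdint>
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// "ok" (0) and "connection refused" both show that the remote host answered,
// so only other outcomes mark the link as unavailable.
bool LinkSeemsUp(int connectErrno)
{
    return connectErrno == 0 || connectErrno == ECONNREFUSED;
}

// Tries to connect to ipv4:port, waiting at most timeoutMs milliseconds.
// Returns 0 on success, otherwise an errno value describing the failure.
int ProbeTcp(const char *ipv4, uint16_t port, int timeoutMs)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return errno;
    }
    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1) {
        close(fd);
        return EINVAL;
    }
    // Switch to non-blocking mode so connect() returns immediately.
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        int saved = errno;
        close(fd);
        return saved;
    }
    int result = 0;
    if (connect(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) != 0) {
        if (errno != EINPROGRESS) {
            result = errno;
        } else {
            // Wait until the socket becomes writable or the timeout expires.
            pollfd pfd = { fd, POLLOUT, 0 };
            int ready = poll(&pfd, 1, timeoutMs);
            if (ready == 0) {
                result = ETIMEDOUT;
            } else if (ready < 0) {
                result = errno;
            } else {
                // Fetch the final status of the connection attempt.
                socklen_t len = sizeof(result);
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &result, &len) != 0) {
                    result = errno;
                }
            }
        }
    }
    close(fd);
    return result;
}
```

A caller could then write something like LinkSeemsUp(ProbeTcp(address, 80, 5000)), where the address and the 5 second timeout are placeholders chosen for illustration.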
-
In my tests I didn't use Kad...
I understand but, if the situation is the one depicted by the peerates server list, the only way to download something is using Kad... Apart from this, even with a daily updated server list and ipfilter.dat (I update them with the same script that launches the aMule daemon), I don't feel very safe, as new fake or, worse, spy servers pop up every day. Even servers that were previously trustworthy sometimes become spy servers, and this isn't a good situation. I feel much safer with the ed2k connection disabled, but this is just my idea (maybe wrong).
That said, I still don't understand why the older versions didn't suffer from this problem. I've always used KAD (along with ed2k before, alone now) and everything was always fine. For me it's quite obvious that some modifications made in the meantime have introduced this problem (and I perfectly understand that it's VERY difficult to track down which of them).
I also understand that this problem is marginal, because I think that most aMule users don't run it on an embedded system, but a memory leak could be significant even on a system with 4 GB of RAM, simply because it's wasted memory. Again, this is only an idea (probably wrong). Have a good day! :)
-
Wires: the logwindow issue is related only to the gui apps (amule and amulegui). amuled wastes no memory on logging. It just writes the logfile, and reads it back when a gui connects.
And yes, aMule behaves poorly when internet connection breaks and later re-establishes. I've often found it sitting dumb and disconnected even though connection was fine again. :(
Guybrush: nobody debates the usefulness of Kad. The question is simply if it is able to run at all on a low-memory router. If it worked before it may have been a very close thing, and some little change or new feature taking memory (like the corruption blackbox) might have pushed it over the edge now. But "memory problem" is a pure speculation, it's just the most common problem on embedded.
What is your router btw?
-
@Stu Redman
Thank you very much for your reply. :) So, according to you, should I abandon the idea of using the newer aMule versions? It's a pity, but I would completely understand, because aMule development can't be held back by the attempt to make it work satisfactorily everywhere, even on systems with very few resources available, like a router.
You speak about a "corruption blackbox". Is there a version without the SVN 9548 bug but also without this blackbox? Maybe I could try it, or maybe I could patch the source code to disable it (if possible, just a guess)... My router is based on the BCM6348 and, as I said, has 32 MB of RAM. Have a good day! :)
-
CBB was introduced before the critical bugfix.
If you want to disable it, just delete all lines with m_CorruptionBlackBox from PartFile.cpp .
I could also provide a fix for the critical bugs for your working SVN version.
-
CBB was introduced before the critical bugfix.
If you want to disable it, just delete all lines with m_CorruptionBlackBox from PartFile.cpp .
I could also provide a fix for the critical bugs for your working SVN version.
Thanks for telling me. I'll try it ASAP and post the results here. Have a good day! :)
-
@Stu Redman
I've just built the latest SVN 9847 following your guidelines (using the --enable-mmap switch and after patching PartFile.cpp to disable the Corruption Blackbox) and I've started testing it. So far, it has been up for five hours and everything is still fine.
-
Nothing. :( I still get a lot of warnings:
WARNING! Client UDP-Socket: Packet to ... discarded due to error (2) while sending.
After a while it stops downloading and uploading and it freezes the router. The only change is that now I'm able to successfully execute the swapoff command, nothing relevant though.
-
Does this happen after the router disconnects or the IP changes? Perhaps the router loses its default route and many socket functions (like bind or send) fail...
An strace would be helpful, but it's hard to run one when you have such problems :(
I would emulate this behavior using a normal PC and removing the default route manually.
-
No, the router remains connected and loses the connection only when it crashes for good, and the IP is static. I don't understand what could be the cause of this behaviour.
There are two amule.conf entries that I've successfully used for a long time with no issues at all, but I'm reviewing everything, so let's talk about them too:
QueueSizePref=5 (I think that the default value 50 is too high for a router with limited resources)
FileBufferSizePref=70 (that's almost 1 MB; I use this value instead of the default 16, which I find too low, to reduce disk accesses)
As for the rest, everything else is almost at default values. I don't think that those values could cause my problems (maybe only FileBufferSizePref, as it takes more memory, but I don't think that less than a megabyte more than the default value could be so relevant), especially considering that everything was fine with those settings and the older versions.
Just a question: what's the part of the code that triggers those warnings? I was just wondering: is it possible that the older SVNs (like the 9548) didn't include this check and this is the only reason why the warning doesn't appear in the log file?
-
Just a question: what's the part of the code that triggers those warnings?
See here (http://forum.amule.org/index.php?topic=17343.msg93729#msg93729).
I was just wondering: is it possible that the older SVNs (like the 9548) didn't include this check and this is the only reason why the warning doesn't appear in the log file?
Now that you mention it - I replaced the former printf with an AddLogLineNS in 9573. So yeah, if you redirect the output of your working version (or just copy over the current MuleUDPSocket.cpp - there are no important changes) you will probably find the same messages. So they could be unrelated to your problem.
-
I can confirm that SVN 9548 didn't display those warnings only because it uses a printf instead of appending them to the logfile. Everything is EXACTLY as Stu Redman has said. I've replaced the printf in MuleUDPSocket.cpp and rebuilt SVN 9548, and I get the same warnings, with the only exception (quite a big one though) that the router and the swapfile remain stable. I've also understood what causes those warnings: as freddy77 already guessed, they arise when the connection drops.
After these checks, I built the latest SVN 9852 as I did before with SVN 9847 (I patched PartFile.cpp to disable the Corruption BlackBox), but this time I also patched MuleUDPSocket.cpp to disable the display (not the handling) of those warnings.
The warnings aren't there anymore (obviously! ;) ) but, even with the Corruption Blackbox disabled, the router isn't stable and the swapfile crashes (unlike what happened with the previously built SVN 9847, which wasn't very stable either but at least didn't make the swap crash).
I have to add a slightly off-topic issue: aMuleGUI isn't stable at all and has often crashed, while all the previous versions that I've tested (even SVN 9847) were very stable. Has something related to aMuleGUI changed in the meantime? If so, maybe those modifications need checking.
I'll probably have to use a version much older than the latest ones: they definitely have something too hard to handle for a router with limited resources (unless you can give me some other advice like the one about the Corruption BlackBox). It's a pity, because the newer versions are much better than SVN 9548...
I hope that my testing has been useful and thank you all (especially Stu Redman, whose advice has been very useful) for your help. :)
-
Just what I expected. So we still have no clue what is causing your problem. :(
Only way I can think of to narrow it down is if we try more versions. One in between 9548 and now, if it fails one in the middle before that and so on, until we can locate the break. Sounds like a lot but really isn't, because many versions have no changes that could be related to this.
If you're willing to test more versions I'll provide them for you. Of course, when testing you have to keep in mind that two sessions are never the same. Your downloads change, the other clients change and so on. So "works" often is "not crashed yet". :-\
A word about remote gui: you should always use the one that belongs to the same SVN version as the daemon, at least if there were EC changes in between. And there were a lot of them this year. :)
If there are still crashes please post a backtrace (if it's from a fairly current version at least), but make a new thread for that.
-
Only way I can think of to narrow it down is if we try more versions. One in between 9548 and now, if it fails one in the middle before that and so on, until we can locate the break. Sounds like a lot but really isn't, because many versions have no changes that could be related to this.
If you're willing to test more versions I'll provide them for you.
Hi and thanks for your reply. Yes, I'm ready to test more versions if you tell me where to find them. To begin, I could try to rebuild SVN 9834 with the Corruption BlackBox and that warning notification disabled, but I don't know if it would be enough. Tell me what you think. I guess that, apart from the Corruption BlackBox, there isn't anything else to disable to free some memory...
Of course, when testing you have to keep in mind that two sessions are never the same. Your downloads change, the other clients change and so on. So "works" often is "not crashed yet". :-\
I completely understand this, but SVN 9548 was very stable and I've used it for a while, though on the other hand I know that it's outdated now and that it suffers from a severe bug.
A word about remote gui: you should always use the one that belongs to the same SVN version as the daemon, at least if there were EC changes in between. And there were a lot of them this year. :)
I know that, and that's why every time I build a new daemon version I rebuild aMuleGUI too. Both amuled and aMuleGUI are the same version (SVN 9852). That surely isn't the problem.
-
Let's start right in the middle with revision 9700: http://www.megaupload.com/?d=WBQG5E7J
(you can also pm me your mail address if it can handle 4.5 MB files).
Unpack with tar xjf 9700.tbz
-
Let's start right in the middle with revision 9700: http://www.megaupload.com/?d=WBQG5E7J
Or you can get it also as a tar.gz (http://repo.or.cz/w/amule.git/snapshot/49f94817335034dbc7bfa458943b343902a60f9b.tar.gz) or a zip (http://repo.or.cz/w/amule.git/snapshot/49f94817335034dbc7bfa458943b343902a60f9b.zip) archive ;)
-
Forgot all about that one... :-[
You can get most of the tarballs from there, Guybrush.
-
I believe you have better things to do than generating tarballs when they're readily available 8)
-
Thank you both! :) I'll be away for the next few days, but I've already downloaded the archive and I'll keep you updated.
-
I've built SVN 9700 (both the daemon and aMuleGUI) and it's less stable than SVN 9852: it crashes even sooner. :( Do you have an explanation for this? I'll test it more thoroughly, but I'm not very confident that it will solve my problems... Maybe it lacks some kind of optimization that's crucial, I don't know.
-
If I had an explanation I would fix that bug and move on. ;)
Some wild guesses: improvements since 9700 dampen the bad effect, general "weather" (you know, two sessions are never the same).
Now try (9700 + 9548)/2 . (And always build with mmap.)
-
Now try (9700 + 9548)/2 . (And always build with mmap.)
That's rev. 9624: tar.gz (http://repo.or.cz/w/amule.git/snapshot/393a11359200ab0b09b9c2ab1ad0a7e6cf49aab8.tar.gz) or zip (http://repo.or.cz/w/amule.git/snapshot/393a11359200ab0b09b9c2ab1ad0a7e6cf49aab8.zip)
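The midpoint arithmetic used here is ordinary bisection between the last known good revision and the first known bad one. A minimal sketch follows; the function names are hypothetical, and the revision numbers in the comments are the ones mentioned in this thread.

```cpp
// Hypothetical helpers illustrating the bisection step used in this thread.

// Revision halfway between the last known good and the first known bad one.
int NextRevisionToTest(int lastGood, int firstBad)
{
    return lastGood + (firstBad - lastGood) / 2;
}

// True once the two revisions are adjacent, i.e. firstBad introduced the break.
bool BisectionDone(int lastGood, int firstBad)
{
    return firstBad - lastGood <= 1;
}

// NextRevisionToTest(9548, 9700) gives 9624,
// NextRevisionToTest(9548, 9624) gives 9586,
// NextRevisionToTest(9548, 9586) gives 9567.
```

Each failed test moves firstBad down to the tested revision and each successful one moves lastGood up, so the range halves at every step. As noted elsewhere in the thread, not every revision exists in the git mirror, so the computed number may need to be adjusted to the nearest available one.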
-
Again, thanks to both of you! :) I'll try SVN 9624. What I was wondering is this: considering that 9700 works even worse than 9852 (with the same downloads, so under similar if not identical conditions), maybe in the meantime you've applied some optimizations, and this would mean that I could try a version newer than 9700 instead of an older one like 9624... As I said, it's just a guess and that's why I was asking...
-
Hello and excuse me for my long absence. I've finally managed to find some time to build 9624 as you suggested. Nothing has changed or improved: almost the same behaviour as the other versions tested. The only stable version remains 9548.
I ask you for an opinion on this again:
...considering that 9700 works even worse than 9852 (with the same downloads, so under similar if not identical conditions), maybe in the meantime you've applied some optimizations, and this would mean that I could try a version newer than 9700 instead of an older one like 9624....
Apart from this, I'm starting to think that either the problem isn't related to the CorruptionBlackBox (though it is certainly a major improvement over 9548) or commenting out some lines isn't enough to disable it. If it helps, I could send the files I've modified to anyone who asks for them. Apart from modifying PartFile.cpp to disable the BlackBox and MuleUDPSocket.cpp to disable the logging of discarded packets when the connection drops, please consider that I also had to make a small modification to amule.cpp to get downloads over 1 GB working, as described here:
http://forum.amule.org/index.php?topic=16721.msg88939#msg88939
This modification has proved to be ok with all the versions up to 9548 that I've built. Tell me if you happen to have some ideas: in the meantime I'll try to build the latest SVN and see how it behaves. Have a good day! :)
-
If there was a regression between 9548 and 9624 you should try a version in between to narrow down what change is causing the problem.
What you should do first however is build 9548 in your current build environment and see if it still performs as well as before. Because it could also be a problem of your build environment.
-
What you should do first however is build 9548 in your current build environment and see if it still performs as well as before. Because it could also be a problem of your build environment.
Already done! ;) I've modified MuleUDPSocket.cpp to log the "discarded packet" warnings too (9548 originally uses a printf, as you told me) and 9548 is rock solid as always. The build environment is certainly fine, as I've successfully used it to build many aMule versions and many other things. I've built the latest 9960 (without commenting out the calls to the BlackBox; I've only modified amule.cpp for the 1 GB issue and MuleUDPSocket.cpp to disable the logging of the "discarded packet" warnings): let's see how it behaves. I've tweaked the options a little but nothing major (I've disabled the GeoIP support). I'm not sure that using an older version is the key... I was wondering: you told me that 9548 has a bug that corrupts sent packets: is it possible that the "cure" for this issue is the problem? I don't think so...
-
As long as your upload reaches your set upload rate it has no effect.
Oh, and you are using the same configure options for all your builds I hope?
GeoIP has no influence on the daemon.
-
As long as your upload reaches your set upload rate it has no effect.
Yes, it does.
Oh, and you are using the same configure options for all your builds I hope?
Yes, obviously.
GeoIP has no influence on the daemon.
Thanks for telling me! ;)
By the way, the newly built 9960 with the CorruptionBlackBox enabled isn't stable (on my router: it's clear that mine is a peculiar situation, first of all because of the very limited resources).
I've tried it three times and after more or less an hour and a half it always stops uploading and downloading. The connection doesn't drop (I've checked and the router reports the ADSL carrier as up) but I can't reach any WAN address (I can still reach the web interface of the router and even telnet to it). The swap doesn't crash completely as with other versions tested (like 9852, if I remember correctly) but the swapoff command takes longer than usual and (this is strange) the swap usage remains EXACTLY the same all the time after aMule starts playing up.
I've rebuilt 9960 with the CorruptionBlackBox disabled and it has been uploading and downloading fine for two and a half hours now: let's see if it's more stable. So far I can say that it's certainly more stable than the same version with the CorruptionBlackBox enabled, built with the same build environment and the same options, under the same conditions (same number and type of downloads etc.). This is quite obvious, as the CorruptionBlackBox is certainly "heavy" on resources, but I've reported the test results anyway for the sake of completeness.
I will keep you updated.
-
Are you watching all the time with the remotegui? I just fixed an evil core memleak when EC was active in 9963. It was introduced in 9704. (By me, yeah, but the actual bug was already lurking in the code and waiting for someone making use of incremental tags...)
-
Bad news. I left aMule working overnight (when I left it last night everything was still working as expected) and this morning I found it struggling: the hard drive didn't stop working for a second, and upload and download speeds were awful and erratic. I checked and, as expected, the swap had given up; I needed to reboot the router because the swapoff command didn't help. EXACTLY as with the other versions from 9838 onwards (some of the older versions tested, like 9624, behaved somewhat differently, as described in the previous posts, but weren't stable anyway).
I can't even imagine what could make the swap crash this way (even considering that the router remains stable, the ADSL carrier stays up and I can reach any WAN address with no issues). :(
In response to your question: no, I use the remote GUI only for a couple of minutes to check how things are going, but keep it closed most of the time, as I think that establishing and keeping the connection open is another load for the poor router. I'm starting to think about giving up! :(
By the way, if you think that it could help, I'll build the newest 9963: the bug was introduced with 9704 and this makes sense but, again, I only use the remote GUI for a couple of minutes, a few times a day.
Just one last thing: the Italian translation is almost fine, but with 9960 (I could be wrong, but I don't remember this "issue" with other versions) when I exit aMuleGUI it asks for confirmation in English. It's just a very marginal issue.
-EDIT-
Confirmed: aMuleGUI 9852 (the last one I built before 9960; it works, so it uses the same protocol version) asks for confirmation in Italian, not in English like 9960. By the way, I don't think there is any need to waste time on this.
-
when I exit aMuleGUI it asks for confirmation in English
Feel free to help update the Italian translation. :)
Instead of trying latest versions just return to the original plan to find out when the regression was introduced.
-
Feel free to help update the Italian translation. :)
I would be very happy to do it if you could tell me how.
Instead of trying latest versions just return to the original plan to find out when the regression was introduced.
I fear that I'll end up building 9549! ;) Just kidding! The next candidate is (9548+9624)/2=9586: excuse me for the dumb question, but how could I use the aMule Git (http://repo.or.cz/w/amule.git/) to download it? Thanks! :)
-
Replying to myself: 9586 should be this:
http://repo.or.cz/w/amule.git/snapshot/ab595d4c648cf91dd3ea55314ac26cb3ca3b3368.tar.gz
I'll build it ASAP.
-
I would be very happy to do it if you could tell me how.
http://wiki.amule.org/index.php/Translations
-
http://wiki.amule.org/index.php/Translations
Excuse me for asking another dumb question! ;) I already knew that page and I've used it to localize my remote GUI, but I had completely forgotten about it! Thanks!
-
The next candidate is (9548+9624)/2=9586: excuse me for the dumb question but how could I use the aMule Git (http://repo.or.cz/w/amule.git/) to download it? Thanks! :)
The easiest way is using git (http://git-scm.com/). First you have to clone the repository with
$ git clone git://repo.or.cz/amule.git
(this only has to be done once). This will put the clone into ./amule. You can name another directory to hold the mirror at the end of the command line.
Now enter the directory where your clone resides. Second, you select which revision you want to test (rev.9586 in your case) and check out that revision with
$ git checkout amule-svn-r9586
Please note that not every single revision is mirrored to git (most likely because those commits affected only other branches or svn properties not meaningful to git). In case you get an error like below, just try decreasing the version number until it succeeds.
$ git checkout amule-svn-r9953
error: pathspec 'amule-svn-r9953' did not match any file(s) known to git.
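The "decrease the version number until it succeeds" step can be automated. What follows is only a minimal sketch: both function names are hypothetical, and it assumes the mirror exposes each mirrored revision as a ref named amule-svn-rNNNN, as in the checkout commands above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the clone has a ref called amule-svn-r<N>.
# Assumes it is run from inside the cloned repository.
rev_exists() {
    git rev-parse --quiet --verify "amule-svn-r$1^{commit}" >/dev/null 2>&1
}

# Print the highest mirrored revision that is <= $1, giving up below $2.
nearest_mirrored_rev() {
    rev=$1
    min=$2
    while [ "$rev" -ge "$min" ]; do
        if rev_exists "$rev"; then
            echo "$rev"
            return 0
        fi
        rev=$((rev - 1))
    done
    return 1
}

# Usage (inside the clone):
#   rev=$(nearest_mirrored_rev 9953 9900) && git checkout "amule-svn-r$rev"
```

The lower bound keeps the loop from running forever if the naming assumption turns out to be wrong.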
-
@GonoszTopi
Thanks for your advice: I will follow it.
Bad news, again! :( Even 9586 makes the swap crash: how is this possible? ??? I've built several aMule versions (stable ones too, but mostly SVN ones) before 9548 and everything was always fine (9548 included, as already said; I've recently rebuilt 9548 adding the logging feature for discarded packets and it's still fine, which confirms that my build environment is OK). Why does every newer version have some kind of issue? :(
By the way, 9586 behaves exactly like 9960: very poor download speeds, erratic upload speeds and the general impression that aMule is struggling, up to the swap crash (aMule, the ADSL carrier and the rest are still "fine" but the swap crashes; when I try to run swapoff it says that it cannot allocate memory and I need to reboot the router as always).
9586 isn't much newer than 9548: I'm really starting to fear that I will end up building 9549... The next candidate is 9567. Just one doubt: are my installed dependencies:
http://forum.amule.org/index.php?PHPSESSID=7925dbe2a8ad143cd2420881c9b8f855&topic=16721.msg88778#msg88778
still fine (I guess so otherwise I might receive errors when configuring) or do I need to update anything? I've already pointed out this in the previous posts but I ask this again, just to be sure. Is this modification:
http://forum.amule.org/index.php?topic=16721.msg88939#msg88939
(I've successfully applied it up to 9548. Without it I can't handle downloads larger than 1 GB. Someone said that newer versions, 9548 included, don't suffer from this issue, which is related to an incompatibility with older versions of uclibc like the one included in my toolchain, but I've tried and I can rule that out.) legitimate, or could it be the cause of my problems?
Just one last wild guess: could this issue be related to a faulty swap partition? Maybe the newer versions require more swap space and they reach a damaged swap area that wasn't reached before... Apart from the fact that I've checked the swap and everything is reported to be fine, I find this explanation very questionable. Thanks for your help: I REALLY appreciate your effort! :)
-
Now what do we have in this range ?
- signal handler for mmap (introduced in 9549, fixed and attached to configure option in 9563)
- autoclose of partfiles to reduce file handles (9585)
Try 9584 (or 9563 or something in between, doesn't matter which).
Or try current SVN and patch FileArea.cpp:
//#if !defined(HAVE_SIGACTION) || !defined(SA_SIGINFO) || !defined(HAVE_MMAP)
#if 1
class CFileAreaSigHandler
{
-
Thanks for your reply. I will try the latest SVN, patching it as you've suggested. In the meantime, I've rebuilt 9548 and I'm testing it: so far it's rock solid as always. It certainly isn't a build environment issue...
Can I continue patching amule.cpp as described? Do you think that disabling the CorruptionBlackBox could help? I've tried 9960 with and without it with no evident changes... My idea is to build the latest SVN
1) patching amule.cpp as always
2) patching MuleUDPSocket.cpp to disable the logging of discarded packets (just to keep the log file clean, this is certainly an irrelevant modification)
3) patching FileArea.cpp as you've suggested
What do you think?
-
Leave CorruptionBlackBox alone. 9586 failed too, and that was before CBB was introduced.
-
As said, I was already planning to leave PartFile.cpp untouched but thanks for your confirmation. I will keep you updated...
-
I've just built 9970 patching FileArea.cpp as you've suggested (I've patched amule.cpp and MuleUDPSocket.cpp as always too) and things are even worse now: the swap crashes after only one hour (I don't know if this is just a coincidence).
-
Then try 9584 (or less) without patching filearea.cpp. If this works it's the file autoclose feature that's causing the trouble (though I have no idea why yet).
-
Then try 9584 (or less) without patching filearea.cpp. If this works it's the file autoclose feature that's causing the trouble (though I have no idea why yet).
Hello everyone! :) First of all, excuse me for my long absence, but I had some problems and couldn't find the time to build 9584 earlier... I've finally managed to build 9584 and everything seems (I'm still testing it) to be as fine as with 9548. What does this mean? Do I have to give up on the newer versions, or would it be possible to make some modifications to the source code? I remember that Stu Redman was talking about the file autoclose feature... Thanks for your reply and have a good day!
-
So it's the file autoclose. Hm.
One thing: iirc there is a global limit for file handles and network connections in amuled. autoclose greatly reduces used file handles and thus allows for more network connections. Which could be what is breaking your router...
What is your setting for maximum number of network connections? Can you check how many are actually open both with working and not working versions?
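Besides the connection counters that aMule reports itself, the open connections can also be counted from a shell on the router. This is a rough sketch, assuming a netstat that accepts -t and -n (BusyBox and net-tools both do); the function name is made up, and it counts every established TCP connection on the box, not only those belonging to amuled.

```shell
#!/bin/sh
# Count ESTABLISHED TCP connections in "netstat -tn" output read from stdin.
# Column 6 of that output is the connection state; header lines are skipped
# because their first field does not start with "tcp".
count_established() {
    awk '$1 ~ /^tcp/ && $6 == "ESTABLISHED" { n++ } END { print n + 0 }'
}

# Usage on the router:
#   netstat -tn | count_established
```

Sampling this once with a working build and once with a failing one gives the comparison asked for above.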
-
As always, thanks for your great effort: I REALLY REALLY appreciate your help! Sad to say, but 9584 doesn't work either. I've tested it for fourteen hours of continuous usage: it worked well (good general stability and download speeds) for at least ten hours (this alone was certainly better than what the other versions achieved, apart from 9548) but, after fourteen hours and after completing a download (that's always a very heavy task so it could be the reason, though I don't believe so because it doesn't happen with 9548), the router started to struggle (the ADSL connection was still up and working but the hard drive connected to the router didn't stop reading/writing for a single second). In fact, as I expected, the swap had crashed: as always, when I tried to run the swapoff command, I received the 'Couldn't allocate memory' error message in response and I could only reboot the router.
Now I ask you: is there any RELEVANT modification between 9548 and 9584??? The next candidate is 9566: as I feared in one of my previous posts, I will end up building 9549... :( Is it possible that the latest working version on my router is 9548??? :( I remember that 9548 has a bug that causes corrupted packets to be sent: could the fix for that bug be the cause? If I remember well, I patched 9548 for that bug and it was still OK. By the way, tomorrow I will test 9584 again to see if it makes the swap crash again.
What is your setting for maximum number of network connections? Can you check how many are actually open both with working and not working versions?
I will post the information that you're asking for ASAP.
-
I'm asking again: did you enable memory mapping ?
Also, please answer my questions.
-
I'm asking again: did you enable memory mapping ?
Also, please answer my questions.
Excuse me: I didn't mean to neglect anything, but you didn't mention memory mapping in your last post. At least this point is more than clear: I ALWAYS enable memory mapping with the --enable-mmap configure switch.
Today I'm testing the older 9548 and it's proving to be very stable as always: the swap is still fine and here are the details that you've asked for (in your last post you did ask for this information but nothing about memory mapping, otherwise I would have replied promptly and in as much detail as possible, as I've always tried to do):
I use these settings:
MaxConnections=500
MaxConnectionsPerFiveSeconds=10
and this is what I get:
Active connections: 68
Average connections: 58.57
Peak connections: 108
Limit reached: never
Everything seems fine (at least with 9548). I will continue to follow 9548's behaviour today and tomorrow I will do the same with 9584. Please advise if you need me to make some modifications or monitor some other aspects. Have a good day! :)
-
Just wanted to make sure about the mmap. It has been a while... :D
If you run into problems with 9584, lower your MaxConnections to 100 or even 70 and retry.
-
If you run into problems with 9584, lower your MaxConnections to 100 or even 70 and retry.
I had already tried lowering the MaxConnections value to 100 with other versions and it didn't help, but I will try, maybe lowering it to 70 as you suggest, though I fear that could be too low a value. By the way, lowering the MaxConnections value could help only if I saw the number of connections often going near or over that limit while using 9584; otherwise this tweak wouldn't affect anything.
9548 is still working fine (as expected! ;) This is the version that I've used more than any other so far and it has always proved to be very stable: it's only that I'd like to take advantage of the improvements included in the newest versions, though I'm starting to fear that I won't be able to... :( )
-
There is some news... I've stopped testing 9548 (it was only a waste of time as it was still working perfectly as always) and brought forward the planned testing of 9584. I've also reduced the MaxConnections value to 100 but, after only a couple of hours, the hard disk connected to the router started to struggle again: sure that something was wrong, I opened aMuleGUI and saw that everything was messed up (erratic upload speed, zero download speed and all the downloads had lost their respective queues). This time, I noticed something interesting in the logfile that I don't remember having seen before:
2010-06-01 21:45:59: Invalid Kad tag; type=0x09 name=
Two messages exactly like this were reported within twenty minutes. As always, I couldn't successfully complete the swapoff command: "cannot allocate memory" as expected. I hope that this is something new that could shed some light on this issue...
-
I've made a quick search and I've found this topic:
http://www.amule.org/amule/index.php?topic=16193.0
on the aMule forum. It seems that the error that I get isn't something to be alarmed by (Kry speaks of 'checking') so how could this be related to the swap crash? Maybe it's only a coincidence... When you can, please tell me if I need to build another version between 9548 and 9584 (9566 is the next candidate) or try anything else... Thanks for your advice! :)
-
No idea. There isn't much left.
Try 9566 next.
-
No idea. There isn't much left.
Try 9566 next.
Have you lost all hope, like me? I was doing some more research and I found this topic:
http://forums.gentoo.org/viewtopic-t-492270-start-0.html
Nothing special or new but it made me think... It probably isn't a bug but only a high demand for memory (for some mysterious reason) that my poor router can't satisfy, but I ask myself: "Why such a high swap usage when it never happened with 9548? What modification caused this?".
By the way, I've built 9566 and I will try it tomorrow. I hope, if not to succeed, at least that you won't start to hate me! ;)
-
Hey, this thread hasn't half the posts RRM's had, and we found the problem in the end. So don't lose hope. :)
-
Even 9566 crashes as always so, before building the next candidate (9556), I reinstalled the good old 9548 to test it under EXACTLY the same conditions that 9566 had just failed with. As expected, EVERYTHING was fine as always. Then I built 9556 and I'm testing it right now: so far it's working absolutely fine.
I'm not planning to give up but this is starting to become a nightmare! :( How many SVNs have I built, configured and tested so far? I've lost the count! ;)
-
Did you reinstall or rebuild 9548?
Please rebuild it. Maybe it's something with your environment.
-
I'm not planning to give up but this is starting to become a nightmare! :( How many SVNs have I built, configured and tested so far? I've lost the count! ;)
Not much left:
[9549, 9551, 9553] <(9556)> [9561, 9563]
The revisions missing from this list are either
- commits on another branch (9550, 9554, 9555, 9557, 9559), or
- MacOSX-specific (9552, 9558, 9560), or
- shell script fix (9562), or
- translation update (9564, 9565).
-
Did you reinstall or rebuild 9548?
Please rebuild it. Maybe it's something with your environment.
You had already suggested this (maybe you've forgotten, which is normal because you're following not only this issue but many other, more complex ones! ;) ) and I had already rebuilt it with my current build environment, replacing the printf in MuleUDPSocket.cpp to reproduce the logging of discarded packets (due to disconnections, as already understood in this thread). Yesterday I only reinstalled it, but I had rebuilt it from scratch only a couple of months ago with my current build environment: it isn't the 'original' 9548 that I built as soon as it was released.
By the way, my build environment has never changed since I set it up; I have successfully used it to build other things too, and I continue to do so.
That said, I've used 9556 for eighteen hours (a good test, I think) and it has worked very well (like 9548, I could say). The only strange thing happened when I decided to exit aMule... I use a little script that kills all the processes related to aMule and, if the startup script activated it, turns the swap off: when called, the 'killall amuled' command failed (all the amuled -f instances could still be seen with 'ps') and the subsequent 'swapoff /dev/sda6' (sda6 being my swap partition) failed with the recurrent 'Cannot allocate memory' message. This time it was something different though because, as said, aMule was stable: after almost a minute all the amuled instances had terminated and I could then run the swapoff command, whereas I think that the other newer versions tested actually crashed and the swap issue was only a side effect.
I'm thinking about retrying 9566 with a couple of checks that I have in mind (more or less the ones made with 9556) to see what happens. With the right results, it could be that the 'culprit' is the file autoclose feature, as already guessed by StuRedman, and I could try to build one of the latest SVNs with the CBB and the file autoclose feature disabled, though this could be too reckless... ;)
Thanks to GonoszTopi too for his summary; I find it very helpful.
-
There is some news... 9566 remained more or less stable but proved to be heavier on resources than 9556: the router seemed to swap a lot, as the hard drive was almost constantly working, while 9556 was much 'quieter' (with the same downloads, so under almost the same conditions).
I've finally understood the 'Cannot allocate memory' issue: with 9548 all the amuled -f instances are terminated IMMEDIATELY and the following swapoff command succeeds, while with 9566 and 9556 (and very likely the other versions tested too) termination takes much longer, and this leads to the 'Cannot allocate memory' error message because the swap is still in use. What was strange is that while 9556 took almost a minute to terminate all the amuled -f instances, 9566 took at least five minutes: I don't know whether this is relevant or not... Is there any change from 9549 on that could explain the increase in the time needed for the amuled instances to terminate? By the way, 9548 remains the most stable version that I've tested so far.
Are there any relevant changes between 9556 and 9566? What could be the next step? Could the file autoclose feature really be a culprit? Is there a simple way to disable it (along with the CBB) so that I could try, let's say, one of the newer versions by 'patching' it a little, or is that out of the question? Thanks for your replies.
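Since swapoff fails while the amuled instances are still shutting down, a stop script can wait for them to disappear before touching the swap. A minimal sketch under stated assumptions: pidof is available on the router, the function names are hypothetical, and /dev/sda6 is the swap partition mentioned earlier in the thread.

```shell
#!/bin/sh
# Succeeds while at least one process with the given name is still alive.
is_running() {
    pidof "$1" >/dev/null 2>&1
}

# Wait up to $2 seconds for every process named $1 to exit.
# Returns 0 once they are gone, 1 if the timeout is reached first.
wait_for_exit() {
    name=$1
    timeout=$2
    waited=0
    while is_running "$name"; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage in a stop script (allow ten minutes, as shutdown took up to five):
#   killall amuled
#   wait_for_exit amuled 600 && swapoff /dev/sda6
```

This only works around the slow shutdown; it does not explain why the newer revisions take longer to terminate.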
-
About the termination issue I suspect the signal handler introduced in 9549 might be related. Maybe that's the issue. Can you try 9549 next?
So what do we have:
9548 works
9549 mmap file area signal handler introduced
9561 fix for signal handler, mmap disabled completely
9563 configure option for mmap
9572 New functions for converting Kad IPs to string.
9585 file autoclose
9590 CBB
To disable the signal handler just replace the
#if !defined(HAVE_SIGACTION) || !defined(SA_SIGINFO)...
near the beginning of FileArea.cpp with #if 1
-
Thanks for your reply: great as always! I will try 9549 ASAP.
-
I've 'patched' 9549 and built it. It has been running for six hours now and it's going very well. I've already tried a killall on the amuled processes and it's as fast and reliable as with 9548, so no issues with the swapoff command or anything else. The signal handler is very likely to be the cause, as Stu expected, but I will keep you updated.
-
If this keeps working walk your way up to newer versions, always keeping the signal handler disabled.
-
9549 with the signal handler disabled has proven to be stable, so I've built 10210 patching FileArea.cpp (to disable the signal handler), MuleUDPSocket.cpp (to disable the logging of packets discarded due to disconnections) and amule.cpp (to avoid the uclibc 1 GB issue), the last two modifications being exactly the same ones that I've been successfully applying for ages now.
It seemed to work well, but after almost four hours I tried to connect to it and aMuleGUI reported no connection (both Kad and ed2k were down), no downloads in the downloads list (though there were some; I'm speaking of downloads in general, not only active ones) and no uploads. The aMuleGUI window was completely blank apart from the interface. Though the ADSL carrier was up, I couldn't browse at all. As soon as I stopped aMule, everything went back to normal: I could run the swapoff command and browse with no issues at all, without needing to reboot the router.
Now I've started aMule again: let's see what happens... What I've noticed is that the connection from aMuleGUI (and the refresh rate too) is VERY VERY slow.
-
Try 10140 (old GUI). There are issues with amulegui and the new gui at the moment.
-
First of all, excuse me for my long, long absence. I haven't lost interest in this subject at all, but I've had some severe family problems and I couldn't do any more testing.
By the way, some days ago I managed to download the latest 10306 (at that time, I hadn't yet read the last post from Stu) and I built it: though, according to the changelog, some fixes to the GUI were applied in 10305, I still found the same issues as with 10210. I can't see the statistics in the GUI anymore and, what's worse, aMule is clearly struggling, though the swap issue is finally over thanks to the signal handler disabling suggested by Stu.
Now I'm going to download and build 10140 as suggested by Stu (thank you very much as always! ;) ) but first I'd like to know whether the GUI issues are still present in 10306 or whether they're solved, as it seems according to the changelog. Thanks for everything! :)
-
Welcome back! Sorry to hear about your problems. Real life should always have priority over projects like this.
You must excuse however that my thinking process about your problem is more or less reset to zero meanwhile. ;)
There's nothing improving your situation since 10302.
I don't know when exactly the stat tree was broken but I'm planning to fix it before 2.3. There are more pressing issues atm though.
10140 will be worth a try, but I doubt it's better performance-wise than 10302+ . Versions between are a bit CPU load heavy on the core when the GUI is connected.
What exactly do you mean with "struggling" ? Can you compare performance with connected gui and without?
-
Hi and thanks for your kind reply! :) As already explained in one of my previous messages (which I perfectly understand you couldn't remember, considering all the issues that you have to deal with), by 'struggling' I mean that aMule shows erratic download and upload speeds going up and down, and the hard drive connected to the router doesn't stop for a second, which usually means that it's swapping a lot. I hope that things are clearer now. By the way, in the meantime I've built 10140 as you suggested and it's proving to be quite stable, though I plan to test it more thoroughly before giving a final verdict. I will keep you updated. Thanks for everything! :)
-
Hello again! I'm continuing to test 10140 and it's still going well. I have two doubts though: you told me that 10140 includes the old GUI, but I've tried aMuleGUI 10210 with amuled 10140 and it works. How is this possible if 10210 uses a new GUI? Something must have changed for sure, because so far 10140 is much better than 10210, but I don't understand how the 10210 GUI could work with the 10140 daemon if these two versions use different GUIs...
Apart from this, even with 10140 I can't see the statistics: what's changed and what's preventing me from doing it? I'm not talking about the graphs (they never worked for me, probably because of some missing dependencies that I never cared about because I don't need the graphs and I fear that they're quite heavy to handle) but about the simple text statistics (average upload and download speeds, time connected, number of reconnections etc.) that have always worked like a charm for me with the 9xxx SVNs... Thanks! :)
-
Well, it could work together somehow, but you're on your own with any problems. :)
As I already said - stats have been broken for some time, and will stay that until I fix them. Graphs have never worked and won't any time soon.
-
Well, it could work together somehow, but you're on your own with any problems. :)
Clear! ;) I always use the GUI with the same daemon version anyway, as I'm sure that's the best practice. That was only a quick check that I did. By the way, I can confirm that so far it seems that with 10140 the good old days of 9548 have come back (with a lot of new improvements too)!
As I already said - stats have been broken for some time, and will stay that until I fix them. Graphs have never worked and won't any time soon.
Excuse me, I missed that. As for the graphs, I don't care about them at all. I will continue to test 10140 and, when I'm sure that everything is fine, I will write a post here summarizing the various modifications needed to make it work on my router (thank you for almost all of them! :) )
Just one last question: which of the later changes to the GUI could cause my problems? Will I never be able to use any version newer than 10140? Thanks for everything! :)
-
For the new gui the clients have to be transported through EC. Functions like showing uploads can't be implemented reasonably otherwise anymore. There is a benefit - double-clicking a download in amulegui didn't show the sources before. Now it does. (Single click to be precise.)
Collecting this info takes a bit of time however. If your router is already close to the edge this might tip it over.
You can try if the clients are the problem by patching ExternalConn.cpp lines 663 +:
    }
    //// Add clients
    //CECEmptyTag clients(EC_TAG_CLIENT);
    //const CClientList::IDMap& clientList = theApp->clientlist->GetClientList();
    //for (CClientList::IDMap::const_iterator it = clientList.begin(); it != clientList.end(); it++) {
    //    const CUpDownClient* cur_client = it->second;
    //    CValueMap &valuemap = tagmap.GetValueMap(cur_client->ECID());
    //    clients.AddTag(CEC_UpDownClient_Tag(cur_client, EC_DETAIL_INC_UPDATE, &valuemap));
    //}
    //response->AddTag(clients);
    return response;
}
Please tell me if that makes current SVN usable for you.
-
I will within a day or two and I will post here the results...
-
Please tell me if that makes current SVN usable for you.
Hi! Unfortunately, I've built 10312 with those lines commented out, but no luck: it isn't as stable as 10140 and the older 9xxx SVNs are; instead it behaves like 10306 with ExternalConn.cpp untouched... Are there any other relevant changes between, let's say, 10140 and 10210 that may cause my problems?
Just to keep things clear, 10210 was the first one of the '10xxx family' that I've built and it has issues too though somewhat different from the ones shown by 10306, please read these posts:
http://forum.amule.org/index.php?topic=17343.msg98190#msg98190
http://forum.amule.org/index.php?topic=17343.msg99495#msg99495
for reference. Thanks again!
-
Hmm, please try 10191 (or little earlier) - before EC implementation of new gui, and if it fails too, 10145 (or little later) - right after intro of new gui.
-
Interesting Guybrush. You could use "git bisect" in order to find the exact revision causing problems.
-
Nah, do what I said. There are some versions in that range that you better not try. :-\
-
I agree with you STU, but keep in mind that git bisect has log(n) complexity on the number of svn revisions.
Regards,
BTK
-
In order to save you some math, the log(n) thing means that if you need to try more than 5 revisions, you should use git bisect.
Regards,
BTK
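To put a number on the log(n) remark, here is a small sketch. The function name is hypothetical; it prints the rounded-up base-2 logarithm of the number of candidate revisions, which approximates how many build-and-test rounds a bisection needs. The commented session afterwards uses standard git bisect subcommands, with 10140 as the good and 10210 as the bad revision reported earlier in this thread, and assumes both are mirrored under those names.

```shell
#!/bin/sh
# Approximate number of build-and-test rounds needed to bisect n candidates
# (the rounded-up base-2 logarithm of n).
bisect_rounds() {
    n=$1
    rounds=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        rounds=$((rounds + 1))
    done
    echo "$rounds"
}

# Example: 10140..10210 spans about 70 revisions.
#   bisect_rounds 70
#
# A bisect session inside the clone would look roughly like this:
#   git bisect start
#   git bisect bad  amule-svn-r10210
#   git bisect good amule-svn-r10140
#   ... build and test the revision git checks out, then report it:
#   git bisect good      # or: git bisect bad
#   git bisect skip      # for a revision that cannot be built or tested
#   git bisect reset     # when finished, return to the original checkout
```

The skip subcommand matters here, given the warning that some revisions in that range are better left untried.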
-
First of all, thanks to you both for your replies. I'd like to use the git bisect function if necessary, but I have never used it so I will have to find out how.
By the way, I've chosen 10190 and 10148 following Stu's advice. I've left ExternalConn.cpp untouched in both of them.
I haven't tested 10148 yet, but I've been using 10190 for some days now (that's why I didn't come back here earlier: I wanted to test it more thoroughly before giving a final verdict) and it has proved to be quite stable. I have the impression that it's quite a bit heavier on resources than 10140, but it seems to work well. Now we have to understand what are the changes made between 10190 and 10210 that could cause my problems... I'm available to help should any of you need more detailed information.
-
Now we have to understand what are the changes made between 10190 and 10210 that could cause my problems...
See here (http://forum.amule.org/index.php?topic=17343.msg99516#msg99516).
-
Stat tree discussion (http://forum.amule.org/index.php?topic=18320.msg99801#msg99801)
-
Hi and thanks for your reply. I knew that but 'patching' ExternalConn.cpp didn't help with 10312 as already said so the problem might be elsewhere... If you think that it could be useful, I can rebuild 10210 modifying ExternalConn.cpp but I'm not sure that this would help. I've read that you've fixed the GUI stats in 10354, thank you! It's a pity but I fear that by now I can't use it (though I'm really tempted to build and test it ;) )... By the way, thank you very much for your great effort to make aMule better and better and for your help to me too. :)
-
The only performance impact in 10190 - 10210 is the transmission of clients, which is negated by the ExternalConn patch.
Then again, this is only of relevance when amulegui is connected.
What did you say happens if you use amulecmd to control your daemon?
-
I've always used aMuleGUI to control the daemon running on my router, amulecmd only to automatically handle ed2k links. I will try to build the latest SVN patching ExternalConn.cpp and I will post here the results. Thank you! :)
-
Hello again! I've been using 10360 (built with the usual modifications plus the one to ExternalConn.cpp) for two days now and it seems to work very well (with the GUI stats now working again too thanks to Stu ;D ).
I can't imagine what could have changed since 10312 (maybe some optimizations, I don't know) that has made 10360 stable on my router, but I will continue to test it and I will keep you updated. If everything is fine, as I expect, I will summarize here all the modifications needed to get the latest SVN to work on my router. Thanks for everything! :)
-
Hello to everyone! :) First of all, excuse me for my long, long absence, but I couldn't test the 'newly' built SVN 10360 like I wanted to because of some issues that had nothing to do with aMule or computing in general.
By the way, though late for sure, I've finally managed to test the 'patched' 10360 as thoroughly as I wanted to. I can confirm that it's as stable as the much older 9548, and this is GREAT considering all the improvements made between 9548 and 10360! ;D
As promised, I'm going to summarize all the changes that I've made to the aMule sources to make newer versions work satisfactorily on my mips router, that is, on a low-resource system. Before starting, I'd like to point out two general but quite important things:
- The consecutive "WARNING! Client UDP-Socket discarded packet..." warnings that are the subject of this thread are due to drops of the ADSL connection
- It's necessary to add the --enable-mmap configure switch to deal with low memory availability
That said, I can start summarizing all the modifications made to the aMule SVN 10360 sources:
- As already discussed here:
http://forum.amule.org/index.php?PHPSESSID=7263b2379436f7b195c7f3f79e4a3161&topic=16721.msg88778#msg88778
I've patched amule.cpp, commenting out these two lines:
// rl.rlim_cur = rl.rlim_max;
// setrlimit(resType, &rl);
Without this modification I couldn't handle downloads larger than 1 GB, but please bear in mind that this may be necessary only in my particular situation because of the uclibc version built into my toolchain.
- In MuleUDPSocket.cpp I've commented out this portion of code:
/* AddLogLineNS((wxT("WARNING! ") + m_name + wxT(": Packet to "))
<< Uint32_16toStringIP_Port(ip, port)
<< wxT(" discarded due to error (") << error << wxT(") while sending."));
*/ sent = true;
ONLY to disable the logging of the packets discarded because of a dropped connection.
- In FileArea.cpp I've done this modification to disable the signal handler:
//#if !defined(HAVE_SIGACTION) || !defined(SA_SIGINFO) || !defined(HAVE_MMAP)
#if 1
class CFileAreaSigHandler
{
- In ExternalConn.cpp I've commented out this portion of code to disable the transmission of clients through EC:
    // Add clients
/*  CECEmptyTag clients(EC_TAG_CLIENT);
    const CClientList::IDMap& clientList = theApp->clientlist->GetClientList();
    for (CClientList::IDMap::const_iterator it = clientList.begin(); it != clientList.end(); it++) {
        const CUpDownClient* cur_client = it->second;
        CValueMap &valuemap = tagmap.GetValueMap(cur_client->ECID());
        clients.AddTag(CEC_UpDownClient_Tag(cur_client, EC_DETAIL_INC_UPDATE, &valuemap));
    }
    response->AddTag(clients);
*/  return response;
}
That's all! It's more or less the summary of this thread along with some other issues discussed in another one. First of all I'd like to say a GREAT "Thank you!" to StuRedman and all the others who have tried (and often succeeded) to help me. I've seen that in the meantime SVN 10360 has become quite outdated, but I don't plan to build newer versions soon (I'd like to but I don't have the time). I ask you this anyway: do you believe that newer versions are still usable for me? I'm just curious, because SVN 10360 completely satisfies me. Thanks again, really! :)
-
Hello Guybrush,
thank you for summing this all up! Let's take a look at the issues:
Without this modification I couldn't handle downloads which size was over 1Gb but please bear in mind that this could be necessary only for my particular situation because of the uclibc version built into my toolchain.
Hm, since 9539 UnlimitResource(RLIMIT_FSIZE) is not called anymore if __UCLIBC__ is defined. Please check whether it works without this patch too. (Mind - your patch cuts ALL Unlimits, not just the broken one with the file size.)
/* AddLogLineNS((wxT("WARNING! ") + m_name + wxT(": Packet to "))
<< Uint32_16toStringIP_Port(ip, port)
<< wxT(" discarded due to error (") << error << wxT(") while sending."));
I'd just change this to an AddLogLineN and so log it only to the log file and not to the console.
In FileArea.cpp I've done this modification to disable the signal handler:
I think I will just disable the signal handler for __UCLIBC__ .
In ExternalConn.cpp I've commented this portion of code to disable the clients transportation through EC:
This severely cuts into the client visualization of course. Please try:
// Add clients
CECEmptyTag clients(EC_TAG_CLIENT);
const CClientList::IDMap& clientList = theApp->clientlist->GetClientList();
for (CClientList::IDMap::const_iterator it = clientList.begin(); it != clientList.end(); it++) {
const CUpDownClient* cur_client = it->second.GetClient();
if (!cur_client->IsDownloading()) {
continue;
}
CValueMap &valuemap = tagmap.GetValueMap(cur_client->ECID());
clients.AddTag(CEC_UpDownClient_Tag(cur_client, EC_DETAIL_INC_UPDATE, &valuemap));
}
response->AddTag(clients);
This will show the active uploads again at minimum extra CPU load. I could make a preference setting for this.
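As a self-contained illustration of that filter (a sketch only: ToyClient, ToyClientMap and CollectTransmittedIds are hypothetical stand-ins, not aMule classes), the loop skips every client that is not currently downloading from us, so only the active uploads are collected:

```cpp
#include <cassert>
#include <stdint.h>
#include <map>
#include <vector>

// Hypothetical stand-in for CUpDownClient.
struct ToyClient {
	uint32_t ecid;      // ID used to identify the client over EC
	bool downloading;   // true if the remote client is downloading from us

	bool IsDownloading() const { return downloading; }
	uint32_t ECID() const { return ecid; }
};

typedef std::map<uint32_t, const ToyClient*> ToyClientMap;

// Return the ECIDs of the clients that would be transmitted to the GUI.
std::vector<uint32_t> CollectTransmittedIds(const ToyClientMap& clientList)
{
	std::vector<uint32_t> ids;
	for (ToyClientMap::const_iterator it = clientList.begin(); it != clientList.end(); ++it) {
		const ToyClient* cur_client = it->second;
		assert(cur_client != nullptr);
		if (!cur_client->IsDownloading()) {
			continue;	// not an active upload: skip it
		}
		ids.push_back(cur_client->ECID());
	}
	return ids;
}
```

With three clients of which two are downloading, only those two IDs end up in the result, in map-key order.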
The idea is that in the future an unpatched amuled should work on your machine.
I don't expect any new problems with current SVN instead of 10360.
-
Thanks for your advice, as always. I will try to build a newer SVN following your suggestions ASAP. Have a good day! :)
-
SVN 10479 should now work out-of-the-box for you, Guybrush. Please verify.
To disable clients, set this in amule.conf:
[ExternalConnect]
TransmitOnlyUploadingClients=1
This will still show the active uploads.
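For readers unfamiliar with the file layout, here is a toy sketch of how such a key could be looked up in INI-style text. aMule's real configuration code is not shown in this thread; ReadBoolSetting and Trim are hypothetical helpers written only for this illustration, and treating "1" as the only true value is an assumption.

```cpp
#include <sstream>
#include <string>

// Strip leading and trailing spaces, tabs and carriage returns.
static std::string Trim(const std::string& s)
{
	const char* ws = " \t\r";
	const std::string::size_type first = s.find_first_not_of(ws);
	if (first == std::string::npos) {
		return std::string();
	}
	const std::string::size_type last = s.find_last_not_of(ws);
	return s.substr(first, last - first + 1);
}

// Hypothetical helper: look up "key" inside "[section]" of INI-style text.
// If the key is found, returns true only when its value is exactly "1".
// If the key is not found in that section, returns defaultValue.
bool ReadBoolSetting(const std::string& iniText, const std::string& section,
                     const std::string& key, bool defaultValue)
{
	std::istringstream in(iniText);
	std::string line;
	bool inSection = false;
	while (std::getline(in, line)) {
		line = Trim(line);
		if (line.empty()) {
			continue;
		}
		if (line[0] == '[') {
			// A new section header starts here.
			inSection = (line == "[" + section + "]");
			continue;
		}
		const std::string::size_type eq = line.find('=');
		if (inSection && eq != std::string::npos && Trim(line.substr(0, eq)) == key) {
			return Trim(line.substr(eq + 1)) == "1";
		}
	}
	return defaultValue;
}
```

The section check matters: the same key name under a different section header is ignored.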
-
Should be fixed by rev. 10489.
-
Hello and excuse me for my long absence. I have built SVN 10494 (forcing mmap to 1 as always, just to tie in with what was discussed in the previous posts) and everything works very well out of the box, as suggested by Stu.
I've thoroughly tested it over the last months (though not continuously) and it has proven to be very stable. I'm now back to the good old days of the 'ancient' ;) SVN 9548 but with a lot of new improvements. I'd like to thank you all (especially Stu) for your help. :) Don't hesitate to tell me if you want me to test newer versions on my router, in case something that could cause trouble on low-resource systems has changed in the meantime. Thanks again!
-
Hi Guybrush,
good to hear it works at last for you!
We have a release candidate for 2.3.1 out now (based on SVN 10600). If you want you can check it out, though I don't expect any problems compared to what you have now. But you never know.
-
Hello again to everyone! :) After a long absence, I've decided to build a newer SVN (I've used the aforementioned SVN 10494 in the last months and it has always worked like a charm) and I've downloaded 10786. I was forced to refresh my toolchain because the required version of wxWidgets now is 2.8.12 instead of the older 2.8.10 but everything has gone smoothly on this side. The issues have begun a little later though...
The new SVN isn't reliable on my router: after a while it starts to misbehave and the swap crashes (exactly like all the other versions I tested between 9548 and 10494: "cannot allocate memory" when trying to call the swapoff command, etc.).
What's even worse (but here we're going slightly off topic) is that, even though I've built the same version of aMuleGUI, I can NEVER (not even at the beginning) connect to the daemon running on the router: it always complains about a wrong password while the password is completely fine, still the same one I always use, simply generated with the md5sum, cut etc. 'trick'.
You all surely have much more urgent and important issues to attend to, so I'm not starting another struggle like the one that ended a little less than a year ago. I'd only like to see if I can find a solution without too much trouble! ;)
-
Hi Guybrush, long time no see!
You caught a broken version. Try 10788 (or maybe 10781, because there were some changes in between I'm unsure of).
Are you sure you have built and run a matching version of amulegui? Any amulegui version since 2.3.1 should work with any amuled version since 2.3.1 by the way, there have been no breaking changes since (yet).
-
Hello Stu, I'm very glad to be here again talking with you after such a long time! What to say if not 'Wonderful!!!' when thinking about the fact that, among all the versions I could download and build, I've caught a broken one... I'm a lucky man! ;)
That said, I confirm to you that I've built aMuleGUI 10786. Apart from the fact that I always rebuild aMuleGUI to match the version of the daemon, this time I couldn't have done anything different even if I had wanted to, because aMuleGUI 10494 doesn't work with amuled 10786 due to a protocol change.
The password is right but it rejects any connection attempt complaining about it being wrong, I don't know why. I'll build another SVN soon (I guess that everything including and past 10788 is fine, isn't it?) and post here the results ASAP. Have a good day! :)
-
Hi! Just to be on the safe side, I've built 10781 as you suggested, Stu. It seems fine, but I haven't thoroughly tested it, both for lack of time and because aMuleGUI still doesn't accept any password: I've tried replacing it with a freshly generated one, both from the command line and through amuleweb, to no avail. (I know the two ways do the same thing, but I tried both to rule out that something has changed in the password generation in the meantime.)
I don't know what to think and the problem is that the subject of this topic is something completely different so I'm thinking about opening a new thread to avoid mixing things up but before doing it I'd like to go deeper into things and understand if this issue is real or something related to me only. I will keep you updated.