aMule Forum
English => aMule Help => Topic started by: p.r. on January 25, 2009, 12:01:43 AM
-
I run both rtorrent and amuled on my home NAS (ppc603e 200 MHz/32 MB/uClibc Linux), and I've noticed strange behavior in aMule.
When rtorrent performs its final file hashing, it does NOT slow down itself (download/upload speed) or other applications, such as amule.
When amule performs its final file hashing, it DOES slow down itself and other applications, such as rtorrent, very noticeably.
I think that this difference may be due to the different libraries: openssl 0.9.7m (libcrypto.so.0.9.7) and crypto++ 5.5.2 (built into aMule 2.2.3).
Therefore, I want to know:
1. Is it possible to compile aMule with OpenSSL?
2. Is it possible to make aMule (by changing the settings or the source code) NOT use more than 50% of CPU time?
-
1. Is it possible to compile aMule with OpenSSL?
No.
2. Is it possible to make aMule (by changing the settings or the source code) NOT use more than 50% of CPU time?
Most likely, yes.
-
But how?
Could some kind of "hash strategy switcher" be implemented, so that
the user could choose between different options, for example:
1. as fast as possible
2. don't exceed a certain data rate (xx MB/s)
3. don't use more than xx% of CPU time?
Sorry if I've written something stupid, but it looks like aMule by default tries to hash files as fast as possible.
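Options 2 and 3 in the list above both come down to pacing the hashing loop: hash a block, then sleep long enough to stay under the limit. The following is only a minimal sketch of that arithmetic, not aMule code. The names ThrottleDelay and DutyCycleSleep are hypothetical, and actually calling a sleep function between blocks is left to the caller.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

typedef std::chrono::milliseconds Millis;

// Option 2 (data-rate cap). Hypothetical helper: given how many bytes have
// been hashed and how much wall-clock time that took, return how long the
// loop should sleep so the average rate stays at or below maxBytesPerSec.
Millis ThrottleDelay(std::uint64_t bytesDone, Millis elapsed,
                     std::uint64_t maxBytesPerSec)
{
    assert(maxBytesPerSec > 0);
    // Minimum time the work is allowed to take at the configured rate.
    const Millis target(static_cast<Millis::rep>(bytesDone * 1000 / maxBytesPerSec));
    return target > elapsed ? target - elapsed : Millis(0);
}

// Option 3 (CPU-share cap). Hypothetical helper: after being busy for `busy`,
// sleep long enough that busy / (busy + sleep) equals maxCpuPercent / 100.
Millis DutyCycleSleep(Millis busy, unsigned maxCpuPercent)
{
    assert(maxCpuPercent > 0 && maxCpuPercent <= 100);
    const Millis::rep pct = static_cast<Millis::rep>(maxCpuPercent);
    return Millis(busy.count() * (100 - pct) / pct);
}
```

With a 50% limit, DutyCycleSleep returns the same duration the thread was busy, so the hasher would work and rest in equal slices. This limits average load only; it does not change how the scheduler treats the thread while it is running.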
-
When the final rehash happens, the file is still a partfile, so the hasher thread gets high priority. That's what is causing the general slowdown.
-
And how do I reduce the priority of the hasher thread?
So I found a solution.
Check ThreadTasks.cpp:50
CHashingTask::CHashingTask(const CPath& path, const CPath& filename, const CPartFile* part)
// GetPrintable is used to improve the readability of the log.
: CThreadTask(wxT("Hashing"), path.JoinPaths(filename).GetPrintable(), (part ? ETP_High : ETP_Normal)),
You can change ETP_High to ETP_Low if you want, or even to ETP_Normal, but it will make hashing much slower on your system. OTOH it should prevent CPU starvation.
Let's see how it works.
-
Will amuled also slow down?
-
No luck with reducing the priority of the hasher thread:
" : CThreadTask(wxT("Hashing"), path.JoinPaths(filename).GetPrintable(), (part ? ETP_Low : ETP_Low))"
No change in amuled's behaviour; I got the same slowdown.
-
This priority affects only the order in which tasks are executed in aMule. Probably the hashing takes too much time.
-
// Try to avoid reducing the latency of the main thread
m_thread->SetPriority(WXTHREAD_MIN_PRIORITY);
There seems to be no easy way to reduce CPU load.
-
Final file hashing speed:
rtorrent ~150 MB/minute and NO system slow down
amuled ~50 MB/minute and FULL system slow down
What could be the cause of such awful inefficiency?
-
Are you sure it is the hashing? Are your incoming and temp in the same partition/disk?
-
Well, torrent uses SHA-1 on small pieces (64 kB to 4 MB according to a large online encyclopaedia). ed2k (i.e. aMule) uses bigger chunks of 9500 kB, which are hashed using MD4. aMule additionally uses AICH (http://www.amule.org/wiki/index.php/AICH), which uses SHA-1 to hash small blocks of 180 kB, i.e. it creates 52+1 hashes per chunk. (Furthermore, aMule copies the whole file after verification, whereas torrent usually does not. That's what Kry's question is targeting.)
Therefore it's safe to assume that aMule is doing more work when hashing. This does not, however, explain the slowdown. If your computer is low on RAM, though, the slowdown might be caused by aMule's higher RAM usage when hashing. (It currently loads the whole 9.28 MB chunk into RAM, iirc.) This might cause swapping, which is always a lot slower. freddy77 is working on patches to lower the RAM usage, so this might be improved in the future.
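The "52+1" and "9.28 MB" figures in the post above follow from the two sizes it quotes. Here is a small sketch of that arithmetic, assuming both sizes are multiples of 1024 bytes (9500 x 1024 and 180 x 1024); the constant and function names are made up for illustration and are not aMule identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Sizes as quoted in the post, read as multiples of 1024 bytes (assumption).
const std::uint64_t kChunkSize     = 9500ULL * 1024;  // 9,728,000 bytes, about 9.28 MiB
const std::uint64_t kAichBlockSize = 180ULL * 1024;   //   184,320 bytes

// Number of complete 180 kB blocks that fit into one chunk.
std::uint64_t FullBlocksPerChunk()
{
    return kChunkSize / kAichBlockSize;
}

// Size of the shorter block left over at the end of a chunk (0 if none).
std::uint64_t TailBlockSize()
{
    return kChunkSize % kAichBlockSize;
}

// Total block hashes per chunk: the full blocks plus one for the tail.
std::uint64_t BlockHashesPerChunk()
{
    return FullBlocksPerChunk() + (TailBlockSize() != 0 ? 1 : 0);
}
```

So each chunk holds 52 full blocks plus one 140 kB tail block, which gives the 53 block hashes, on top of the single MD4 hash for the chunk as a whole.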
-
Are you sure it is the hashing? Are your incoming and temp in the same partition/disk?
Yes, these directories are on the same disk/partition, and the disk is internal ATA133, not external USB.
All directory locations are the defaults:
# ls -la1 /mnt/HD_a2/home/p2p/.aMule
.
..
ED2KLinks_lock
Incoming
Queue
Temp
amule.conf
clients.met
clients.met.BAK
cryptkey.dat
emfriends.met
ipfilter.dat
ipfilter_static.dat
key_index.dat
known.met
known2_64.met
last_version_check
lastversion
load_index.dat
logfile
logfile.bak
muleLock
nodes.dat
preferences.dat
preferencesKad.dat
server.met
server_met.old
shareddir.dat
src_index.dat
wuischke
Thank you for the comprehensive explanation of the difference between torrent and mule hashing.
AICH was disabled by default:
ICH=1
AICHTrust=0
Should I disable ICH too?
I hope that freddy77 will make this patch.
Thank you all for understanding.
-
These options do not do what you think. AICHTrust means that you trust every hash you receive from someone else. If you disable this (which is the right thing to do™), you only trust a hash if more than 10 clients have sent you the hash and at least 92% of all clients had this hash.
ICH is an old way to recover data, but a lot less efficient than AICH. You can safely disable it, but it will not improve hashing speed.
Please have a look at the available RAM during hashing and see if it's really caused by swapping. Seeing your machine has 32 MB RAM, it's very well possible.
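The trust rule described above (with AICHTrust disabled) can be written as a small predicate. This is only a reading of the rule as stated in the post, with the thresholds of 10 clients and 92% taken from it; the function name and the way votes are counted are assumptions, not aMule's actual code.

```cpp
#include <cassert>

// Hypothetical predicate: `agreeing` clients sent this particular hash,
// out of `total` clients that sent any hash for the file.
bool IsHashTrusted(unsigned agreeing, unsigned total)
{
    assert(agreeing <= total);
    const bool enoughClients = agreeing > 10;                   // "more than 10 clients"
    const bool enoughShare = agreeing * 100ULL >= total * 92ULL; // "at least 92%"
    return enoughClients && enoughShare;
}
```

For example, 11 of 11 clients agreeing would pass, while 10 of 10 (too few clients) or 91 of 100 (share too low) would not.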
-
wuischke
So is there no way to disable AICH?
Swapping... yes, it is present, but it isn't so heavy.
When I run both rtorrent and amuled, free MEM is almost constant: 500...700 KB,
used SWAP is variable: 16000...29000 KB.
-
AICH doesn't happen during the final hashing, as far as I know.
-
You are correct, Kry. I apologize for misinforming you, p.r.
// We can only create the AICH hashset if the file is a knownfile or
// if the partfile is complete, since the MD4 hashset is checked first,
// so that the AICH hashset only gets assigned if the MD4 hashset
// matches what we expected. Due to the rareity of post-completion
// corruptions, this gives us a nice speedup in most cases.
-
Mmm... my patch is already in SVN... however, I don't understand! My router has 64 MB but still hangs when an upload starts... perhaps the problem is that I don't have swap :( Yes, you might think that if you have 29 MB of swap used + 32 MB of RAM you have 29+32 = 61 MB used... but this is not true... you have more RAM. The reason is that unaccessed anonymous memory doesn't take physical RAM and is only allocated in swap. I have about 40 MB free (cache+free), but taking into account the process size (about 10 MB... really huge!), 20 MB (see below) and the cache needed to not slow down too much, that is not that much...
I ran my SVN version for a day on my Intel test machine and memory (with Kad enabled!) stayed under 23 MB... I really don't understand... I'll try enabling memory overcommit (I know it's crazy, but it can help). Perhaps I have a problem in the mmap code on MIPS that causes more memory allocation? I'll add some logging. I also have to test a .so I wrote to override the allocation functions and put allocations in a mmapped area so as to emulate swap (but only for amuled :( ).
Perhaps using a different SHA/MD4/whatever implementation (like OpenSSL) could help. Is there a test program to compare hashing speeds?
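On the question of a test program: the thread does not point to one, but a timing harness is easy to sketch. The version below is a hypothetical example, not an existing tool. It times any hash callable over an in-memory buffer and reports MB/minute (1 MB taken as 1024 x 1024 bytes), the unit used for the figures in this thread. To stay self-contained it uses FNV-1a as a stand-in, which is NOT MD4 or SHA-1; for a real comparison one would pass wrappers around the Crypto++ and OpenSSL digests instead.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in digest for the sketch only (FNV-1a, 32 bit). Not MD4 or SHA-1.
std::uint32_t Fnv1a(const unsigned char* data, std::size_t len)
{
    std::uint32_t h = 2166136261u;
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

// Convert "bytes processed in `seconds`" to MB per minute.
double MBPerMinute(std::uint64_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (bytes / (1024.0 * 1024.0)) * (60.0 / seconds);
}

// Hash a buffer of bufferSize bytes `rounds` times and return the throughput.
// HashFn is any callable taking (const unsigned char*, std::size_t).
template <typename HashFn>
double MeasureMBPerMinute(HashFn hash, std::size_t bufferSize, unsigned rounds)
{
    std::vector<unsigned char> buffer(bufferSize, 0xA5);
    volatile std::uint32_t sink = 0;  // keeps the compiler from dropping the work
    const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < rounds; ++i) {
        sink = sink ^ hash(buffer.data(), buffer.size());
    }
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    // Guard against a zero reading on coarse clocks.
    const double seconds = elapsed.count() > 0.0 ? elapsed.count() : 1e-9;
    return MBPerMinute(static_cast<std::uint64_t>(bufferSize) * rounds, seconds);
}
```

A call such as MeasureMBPerMinute(Fnv1a, 9500 * 1024, 20) would hash one chunk-sized buffer twenty times. Because the data stays in RAM, this measures only the CPU cost of the digest and none of the disk or swap effects discussed here.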
-
I'm going to use cpulimit ( http://cpulimit.sourceforge.net ).
I don't know how to use it the right way, because I have 5 processes with the same name ("amuled") but only one of them is badly behaved.
amuled performing final hashing (PID 1152):
PID USER STATUS RSS PPID %CPU %MEM COMMAND
1152 p2p R 11832 761 58.2 38.9 amuled
704 p2p R 4908 701 14.2 16.1 rtorrent
4 root SW 0 1 1.6 0.0 kswapd
760 p2p D 11832 1 0.8 38.9 amuled
1242 root R 196 1203 0.6 0.6 exe
762 p2p S 11832 761 0.2 38.9 amuled
764 p2p S 11832 761 0.0 38.9 amuled
761 p2p S 11832 760 0.0 38.9 amuled
410 root S 112 1 0.0 0.3 chkbutton
700 root S 68 1 0.0 0.2 dtach
667 root S 68 1 0.0 0.2 busybox
411 root S 68 1 0.0 0.2 webs
1 root S 48 0 0.0 0.1 init
588 root S 48 1 0.0 0.1 ftpd
533 root S 44 1 0.0 0.1 sh
1203 root S 28 667 0.0 0.0 sh
701 p2p S 28 700 0.0 0.0 sh
364 root SW 0 1 0.0 0.0 kjournald
3 root SWN 0 1 0.0 0.0 ksoftirqd_CPU0
34 root SW 0 1 0.0 0.0 loop0
8 root SW 0 1 0.0 0.0 mtdblockd
Take a look at the kernel swap daemon (kswapd): its activity is not heavy on the CPU and takes only 1.6%.
For comparison, amuled in normal operation:
PID USER STATUS RSS PPID %CPU %MEM COMMAND
704 p2p S 7284 701 16.3 24.0 rtorrent
760 p2p R 5804 1 0.8 19.1 amuled
1242 root R 196 1203 0.2 0.6 exe
667 root S 68 1 0.2 0.2 busybox
4 root SW 0 1 0.2 0.0 kswapd
762 p2p S 5804 761 0.0 19.1 amuled
764 p2p S 5804 761 0.0 19.1 amuled
761 p2p S 5804 760 0.0 19.1 amuled
1203 root S 220 667 0.0 0.7 sh
410 root S 132 1 0.0 0.4 chkbutton
1 root S 80 0 0.0 0.2 init
700 root S 68 1 0.0 0.2 dtach
411 root S 68 1 0.0 0.2 webs
588 root S 48 1 0.0 0.1 ftpd
533 root S 44 1 0.0 0.1 sh
701 p2p S 28 700 0.0 0.0 sh
364 root SW 0 1 0.0 0.0 kjournald
3 root SWN 0 1 0.0 0.0 ksoftirqd_CPU0
34 root SW 0 1 0.0 0.0 loop0
8 root SW 0 1 0.0 0.0 mtdblockd
686 root SW 0 1 0.0 0.0 dropbear
freddy77
How can I get your patch?
-
http://amule.uw.hu/tarballs/
-
News... I still don't know if it's good or bad :(
I enabled overcommit in the kernel (/proc/sys/vm/overcommit_memory) and ran amuled with my patch, and it started working. The problem is that after a while (some minutes) memory got exhausted :(
I saw that my mmap patch does the job (now hashing doesn't stop the router), but I cannot run amuled.
Perhaps I'll manage to build a modified kernel with swap support (not as easy as it seems).
Well... I have still found two ways to reduce memory usage:
- do not buffer downloaded files (I have a partial patch that still uses mmap to avoid double memory use... I'll try to fix it or I'll save files directly)
- use UTF-8 for string encoding... not that easy :(
-
Have you tried disabling Kad?
-
Yes, Kad disabled :(
-
IP filter disabled?
There are several lists (sources, upload queue) that are not very dynamic and take a lot of space (I'm just guessing here, I didn't measure). Maybe you can move some of the constant data (strings, hashes, ...) to a mmap file, possibly stored on a USB flash drive? I'd simply let the file grow over time (what are a few GB nowadays?) and clean it up by restarting the app every 24 h or so.
-
Well, torrent uses SHA-1 on small pieces (64 kB to 4 MB according to a large online encyclopaedia). ed2k (i.e. aMule) uses bigger chunks of 9500 kB, which are hashed using MD4. aMule additionally uses AICH (http://www.amule.org/wiki/index.php/AICH), which uses SHA-1 to hash small blocks of 180 kB, i.e. it creates 52+1 hashes per chunk. (Furthermore, aMule copies the whole file after verification, whereas torrent usually does not. That's what Kry's question is targeting.)
Therefore it's safe to assume that aMule is doing more work when hashing. This does not, however, explain the slowdown. If your computer is low on RAM, though, the slowdown might be caused by aMule's higher RAM usage when hashing. (It currently loads the whole 9.28 MB chunk into RAM, iirc.) This might cause swapping, which is always a lot slower. freddy77 is working on patches to lower the RAM usage, so this might be improved in the future.
1. But MD4 isn't significantly slower (by a factor of 3); it's even a little bit faster than SHA-1.
2. Are there any changes in 2.2.4 concerning memory management during the final file hashing?
-
2. Are there any changes in 2.2.4 concerning memory management during the final file hashing?
No. Freddy's mmap patch is in trunk only.
-
And what about data alignment?
Does aMule try to perform aligned memory access on PowerPC, or doesn't it care about it?
And why is PowerPC not mentioned in ArchSpecific.h?
Sorry if I'm asking stupid questions, but I googled this info (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=487803) and supposed that it could be a cause of my problem too.
-
That should have nothing to do with final hashing.
There is no Cyptopp.cc file in this project, and no RawPoke in cryptopp at all.
You're right about PowerPC missing from ArchSpecific.h (though all RawPokes are used on already aligned data as far as I can see). What's the predefine for it: __powerpc__ ?
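For readers wondering what the alignment concern looks like in code: a common way to read or write a 32-bit little-endian value at an arbitrary address, without relying on the CPU tolerating unaligned access or on host byte order, is to go through memcpy and assemble the bytes by hand. The sketch below illustrates that technique only. The names PeekUInt32LE and PokeUInt32LE are hypothetical and are not the functions in ArchSpecific.h.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read a little-endian 32-bit value from any address, aligned or not.
// The result is the same on big-endian (e.g. PowerPC) and little-endian hosts.
std::uint32_t PeekUInt32LE(const void* p)
{
    unsigned char b[4];
    std::memcpy(b, p, sizeof b);  // byte copy, so no alignment requirement
    return  static_cast<std::uint32_t>(b[0])
         | (static_cast<std::uint32_t>(b[1]) << 8)
         | (static_cast<std::uint32_t>(b[2]) << 16)
         | (static_cast<std::uint32_t>(b[3]) << 24);
}

// Write a 32-bit value in little-endian byte order to any address.
void PokeUInt32LE(void* p, std::uint32_t v)
{
    const unsigned char b[4] = {
        static_cast<unsigned char>(v & 0xFF),
        static_cast<unsigned char>((v >> 8) & 0xFF),
        static_cast<unsigned char>((v >> 16) & 0xFF),
        static_cast<unsigned char>((v >> 24) & 0xFF)
    };
    std::memcpy(p, b, sizeof b);
}
```

Reading at buf + 1 of a byte array is deliberately misaligned and still well defined with this approach, at the cost of a few extra shifts compared with a direct pointer cast.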
-
I wish I knew. Maybe __powerpc__.
-
Just try it.
#ifdef __powerpc__
ARRGH
#endif
If you get a compile error it's correct. :D
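A gentler variant of the same test is to turn the predefined macro into a value instead of a compile error, so one source file builds everywhere and reports what the compiler defined. This is a hypothetical sketch: DetectedArch is a made-up name, and besides __powerpc__ it checks __PPC__ and _ARCH_PPC, which are other spellings GCC uses for PowerPC targets.

```cpp
#include <cassert>
#include <cstring>

// Returns a short name for the target architecture, based purely on
// macros predefined by the compiler that builds this file.
const char* DetectedArch()
{
#if defined(__powerpc__) || defined(__PPC__) || defined(_ARCH_PPC)
    return "powerpc";
#elif defined(__i386__)
    return "i386";
#elif defined(__x86_64__)
    return "x86_64";
#else
    return "unknown";
#endif
}
```

The answer depends on which compiler processes the file, so it has to be built with the cross compiler (powerpc-linux-uclibc-gcc in this thread), not the native one, to say anything about the NAS target.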
-
Or you can try
echo x | cpp -dM - | sort
to see all predefined symbols.
-
Stu Redman
I can't find __powerpc__,
only
#define __i386__ 1
#define __tune_i486__ 1
-
Do amule try to perform aligned memory access with powerpc or don't care about it?
And why powerpc is not mentioned in ArchSpecific.h?
Now, do you have a PowerPC build environment (which can't have __i386__ defined), or was your question purely academic?
-
Stu Redman
My question wasn't academic; it's practical.
I suppose that my build environment is hardcoded into the cross toolchain.
I'm trying to find the cause of this inefficiency:
final file hashing speed:
rtorrent + openssl ~150 MB/minute with NO system slow down
amuled + cryptopp ~50 MB/minute with FULL system slow down
Everything was built with CFLAGS="-O3 -mcpu=603e -mtune=603e" CXXFLAGS="-O3 -mcpu=603e -mtune=603e"
So, without any programming knowledge, I can only guess at the real cause: maybe amuled and/or cryptopp does pointless work because of wrong alignment/endianness, or I don't know what else it could be...
-
So if you have a PowerPC cross-compile toolchain, please look for the predefines in it (and not in your machine's native compiler, as you obviously did last time, given that __i386__ was defined).
I can't add support for PowerPC if I don't know the predefine. I'm quite sure it's __powerpc__ , but I'd like to have it verified by you.
-
There is no predefined __powerpc__ at all, neither in native nor cross gcc.
In the "powerpc-linux-uclibc-gcc -dumpspecs" output I find -D__powerpc__, but it's only for BSD, not Linux.
-
Or you can try echo x | cpp -dM - | sort
to see all predefined symbols.
Please do just that with your powerpc-linux-uclibc-gcc. ::)
-
Strange, but I get different hashing speeds for files that were just downloaded and for files added by hand to the Incoming directory:
- added file ~115 MB/minute with slight system slow down
- downloaded file ~50 MB/minute with FULL system slow down
Why?
-
If you raise issues and then don't answer my questions, I see no reason to help you.
-
The problem disappeared by itself after I downgraded CryptoPP from version 5.5.2 to 5.2.0.
Now the hashing speed ranges from 30 MB/min up to 120 MB/min, depending on whether rtorrent is active and, if so, how busy it is, and WITHOUT any system slowdowns.
BTW, I also carried out an experiment with the new CryptoPP version 5.6, but the result was even worse: the hashing thread of aMuled used 80%...90% of the CPU, with full system slowdowns.
Why the hashing thread of aMuled with newer versions of CryptoPP tries to act like a real-time process, not letting any other processes or even the system share the CPU, remains a complete mystery to me.
Stu Redman
There is NO predefined __powerpc__ for Linux target platforms in my crosstoolchains, it is predefined only for BSD.
freddy77
Thanks for the mmap patch.
While it does not affect the CPU load in my case, it is useful for memory usage: aMuled now uses memory in a wave-like manner.
-
p.r,
Out of curiosity, could you post the results of "gcc -dM -E - < /dev/null | sort"?
Cheers!
-
phoenix
#define PPC 1
#define _ARCH_PPC 1
#define _BIG_ENDIAN 1
#define _CALL_SYSV 1
#define __BIG_ENDIAN__ 1
#define __CHAR_BIT__ 8
#define __CHAR_UNSIGNED__ 1
#define __DBL_DENORM_MIN__ 4.9406564584124654e-324
#define __DBL_DIG__ 15
#define __DBL_EPSILON__ 2.2204460492503131e-16
#define __DBL_MANT_DIG__ 53
#define __DBL_MAX_10_EXP__ 308
#define __DBL_MAX_EXP__ 1024
#define __DBL_MAX__ 1.7976931348623157e+308
#define __DBL_MIN_10_EXP__ (-307)
#define __DBL_MIN_EXP__ (-1021)
#define __DBL_MIN__ 2.2250738585072014e-308
#define __DECIMAL_DIG__ 17
#define __ELF__ 1
#define __FINITE_MATH_ONLY__ 0
#define __FLT_DENORM_MIN__ 1.40129846e-45F
#define __FLT_DIG__ 6
#define __FLT_EPSILON__ 1.19209290e-7F
#define __FLT_EVAL_METHOD__ 0
#define __FLT_MANT_DIG__ 24
#define __FLT_MAX_10_EXP__ 38
#define __FLT_MAX_EXP__ 128
#define __FLT_MAX__ 3.40282347e+38F
#define __FLT_MIN_10_EXP__ (-37)
#define __FLT_MIN_EXP__ (-125)
#define __FLT_MIN__ 1.17549435e-38F
#define __FLT_RADIX__ 2
#define __GNUC_MINOR__ 3
#define __GNUC_PATCHLEVEL__ 3
#define __GNUC__ 3
#define __GXX_ABI_VERSION 102
#define __INT_MAX__ 2147483647
#define __LDBL_DENORM_MIN__ 4.9406564584124654e-324L
#define __LDBL_DIG__ 15
#define __LDBL_EPSILON__ 2.2204460492503131e-16L
#define __LDBL_MANT_DIG__ 53
#define __LDBL_MAX_10_EXP__ 308
#define __LDBL_MAX_EXP__ 1024
#define __LDBL_MAX__ 1.7976931348623157e+308L
#define __LDBL_MIN_10_EXP__ (-307)
#define __LDBL_MIN_EXP__ (-1021)
#define __LDBL_MIN__ 2.2250738585072014e-308L
#define __LONG_LONG_MAX__ 9223372036854775807LL
#define __LONG_MAX__ 2147483647L
#define __NO_INLINE__ 1
#define __PPC 1
#define __PPC__ 1
#define __PTRDIFF_TYPE__ int
#define __REGISTER_PREFIX__
#define __SCHAR_MAX__ 127
#define __SHRT_MAX__ 32767
#define __SIZE_TYPE__ unsigned int
#define __STDC_HOSTED__ 1
#define __USER_LABEL_PREFIX__
#define __USING_SJLJ_EXCEPTIONS__ 1
#define __VERSION__ "3.3.3"
#define __WCHAR_MAX__ 2147483647
#define __WCHAR_TYPE__ long int
#define __WINT_TYPE__ unsigned int
#define __gnu_linux__ 1
#define __linux 1
#define __linux__ 1
#define __powerpc 1
#define __powerpc__ 1
#define __unix 1
#define __unix__ 1
#define linux 1
#define powerpc 1
#define unix 1
-
There is no predefined __powerpc__ at all, neither in native nor cross gcc.
#define __powerpc__ 1
?
-
There is no predefined __powerpc__ at all, neither in native nor cross gcc.
#define __powerpc__ 1
?
Or you can try echo x | cpp -dM - | sort
to see all predefined symbols.
"echo x | cpp -dM - | sort" doesn't show any predefined symbols that would be valid for powerpc-linux-uclibc-gcc (PowerPC 603e, Linux), only for the native gcc (x86, Linux).