Topic: quick rehashing suggestion (Read 3505 times)

m2kio · « **on:** April 07, 2005, 01:42:18 PM »

when i rename a file and want to keep sharing it, aMule hashes it again. on the other hand, it remembers files by name and does not rehash them if the name has not changed.

so i assume aMule already has some code to remember and recognize a file.

i suggest that a file is recognized not only by name but also by file size and/or modification date and/or inode. this information is harder to change than the file name. in case of doubt, aMule could compare the md4 sum of the first chunk to decide whether the file is known or new.
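As a rough illustration of the size / modification date / inode idea, here is a minimal Python sketch. `FileKey` and `file_key` are hypothetical names, not anything from aMule, and the inode field is only meaningful on platforms and filesystems that report one.

```python
import os
from typing import NamedTuple


class FileKey(NamedTuple):
    """Rename-tolerant identity hints for a file (hypothetical helper)."""
    size: int      # file length in bytes
    mtime_ns: int  # last modification time in nanoseconds
    inode: int     # not portable; treat as an optional hint only


def file_key(path: str) -> FileKey:
    # None of these values depend on the file name, so a plain rename
    # on the same filesystem is expected to leave the key unchanged.
    st = os.stat(path)
    return FileKey(st.st_size, st.st_mtime_ns, st.st_ino)
```

A key like this can only suggest that a file is probably already known. Copying a file to another disk, or touching it, changes some of the fields, which is why a content check on the first chunk is proposed as the tie-breaker.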

hashing is a pretty long task and it would be good to reduce the work here.

... m2kio !

Kry · « **Reply #1 on:** April 07, 2005, 02:11:31 PM »

A file is identified on the filesystem by name. We can't change that.

Xaignar · « **Reply #2 on:** April 07, 2005, 03:55:49 PM »

The problem with using such things as the inode to identify the files is that it isn't portable.

m2kio · « **Reply #3 on:** April 07, 2005, 04:14:58 PM »

Quote

Originally posted by Xaignar
The problem with using such things as the inode to identify the files is that it isn't portable.

i'm aware of this. the simplest idea is the file length. long files only very occasionally have the same file length (ok, maybe except iso images), and the file length is even part of the ed2k identifier. so something like the following in the hasher should do it:

Code:

if filename is unknown
  if file length is known
    if md4(first chunk of file) == md4(first chunk of known file)
      change name in data base
      return // don't hash again
    endif
  endif
endif
hash(file)  // unknown

the file name is changed more easily than anything else about a file.
so a check of file length + md4(1st chunk) is more accurate than looking up the file name.
everything else was just additional ideas that could be optionally enabled per platform.
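The pseudocode above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not aMule code: `KnownFile`, `known_by_size` and `find_renamed` are hypothetical names, the database is assumed to store a digest of each file's first chunk, and MD5 stands in for MD4 because `hashlib` does not guarantee MD4 on every build. The chunk size is the ed2k part size of 9,728,000 bytes.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Dict, List, Optional

ED2K_PART_SIZE = 9_728_000  # size of one ed2k chunk in bytes


def first_chunk_digest(path: str) -> bytes:
    """Hash at most the first ed2k chunk of the file.

    Assumption: MD5 is a stand-in for MD4 in this sketch.
    """
    h = hashlib.md5()
    remaining = ED2K_PART_SIZE
    with open(path, "rb") as f:
        while remaining > 0:
            block = f.read(min(1 << 20, remaining))
            if not block:
                break  # file is shorter than one chunk
            h.update(block)
            remaining -= len(block)
    return h.digest()


@dataclass
class KnownFile:  # hypothetical database record
    name: str
    size: int
    first_chunk: bytes


def find_renamed(path: str,
                 known_by_size: Dict[int, List[KnownFile]]) -> Optional[KnownFile]:
    """Return the known entry this file matches, or None if it must be hashed."""
    candidates = known_by_size.get(os.path.getsize(path), [])
    if not candidates:
        return None  # file length is unknown: hash normally
    digest = first_chunk_digest(path)
    for entry in candidates:
        if entry.first_chunk == digest:
            entry.name = os.path.basename(path)  # change name in database
            return entry  # don't hash again
    return None
```

The caller would run the full hash only when `find_renamed` returns `None`. Note that a match on length plus first chunk does not prove the rest of the file is identical; it only makes a mismatch unlikely.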

... m2kio !

ken · « **Reply #4 on:** April 07, 2005, 09:47:00 PM »

A different solution to the same problem is to allow aMule to rename files in the Shared Files screen. When it does this, it will of course update the known*.met entries for that file so that it won't have to rehash.

Xaignar · « **Reply #5 on:** April 07, 2005, 09:50:01 PM »

Ken: Yes, but it's not very pleasant to have to go through aMule to rename shares, and you can bet that most people won't bother anyway.

Vollstrecker · « **Reply #6 on:** April 07, 2005, 10:33:41 PM »

Why does the filename have to be used? Once a file is finished, I think it doesn't change anymore. So the file could be recognised by an md5sum or similar instead of the filename. I think doing an md5 on the whole file is much faster than rehashing and would allow renaming it.

Xaignar · « **Reply #7 on:** April 07, 2005, 10:39:48 PM »

That won't be all that much faster than normal hashing, and only because the hashing thread also calculates the AICH hashset.

m2kio · « **Reply #8 on:** April 07, 2005, 10:42:14 PM »

it's exactly the hashing that i wanted to _avoid_ !

Greetings to Hesse

aMule Forum
