aMule Forum

Topic: Probability based fake elimination (Read 2194 times)

quakemire

  • Newbie
  • Karma: 0
  • Posts: 4
Probability based fake elimination
« on: April 23, 2007, 10:21:47 PM »

It sounds complicated, but it is rather easy (and unique; sic those other clients, boys!).

Imagine you search for "amule source code" and get 3 hits (that is, three distinct files with distinct MD4 hashes):

A: "Amule Source Code.zip"
B: "source code of the amule client.tar.bz"
C: "amuleus sourceros codeos des internationals espanol del sol arriva!.txt"

(I made up the last one. I don't speak Spanish, so please don't be upset; it is just an example.)

Imagine that A, B, and C each come with two additional names:

A2: "Free Emule client source.zip"
A3: "open source client.zip"

B2: "Sex, Porn, Virus.exe"
B3: "Amule Source for Linux.zip"

C2: "Olala, los torreros!.txt"
C3: "Juanitas de Madrid.txt"

Now, here is the catch: compare your search terms with _all_ the names of _all_ the files and find out how often your keywords show up in the names people give to one and the same file. Also take into account how _many_ people call a file by a certain name and how _much_ of the file they already have (a seeder should know a fake). Furthermore, if the same word(s) turn up in _more_ alternative filenames than _any_ of _our_ keywords, something must be wrong (normally you would also see a B4: "Virus construction kit.zip", a B5: "Porn! Best ever.avi", a B6: "SEX SEX SEX.rar" and so on).
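The last check in the paragraph above (a foreign word that shows up in more names than any keyword does) could be sketched in Python roughly like this. It is a minimal sketch, not aMule code: `words` and `suspicious_word` are hypothetical helpers, matching is assumed to be on whole words, and file extensions are assumed to be ignored so that ".zip" or ".avi" never count as a suspicious word.

```python
import os
import re
from collections import Counter
from typing import Optional


def words(text: str) -> set[str]:
    """Distinct lower-cased alphanumeric words of a query or filename stem."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def suspicious_word(query: str, names: list[str]) -> Optional[str]:
    """Return the most frequent non-keyword word if it appears in more
    alternative names than any query keyword does, otherwise None."""
    keywords = words(query)
    # In how many names does each word occur? Extensions are dropped first.
    counts = Counter(
        word for name in names for word in words(os.path.splitext(name)[0])
    )
    best_keyword = max((counts[k] for k in keywords), default=0)
    for word, count in counts.most_common():
        if word not in keywords:
            # Only the most frequent foreign word matters for this check.
            return word if count > best_keyword else None
    return None


# File A from the example: "source" is in every name, nothing beats it.
file_a = ["Amule Source Code.zip", "Free Emule client source.zip", "open source client.zip"]
print(suspicious_word("amule source code", file_a))  # None

# Invented names for illustration only: "porn" is in 3 names, "amule" in 1.
file_x = ["Amule Source for Linux.zip", "Porn! Best ever.avi", "Porn clip.avi", "Free porn.rar"]
print(suspicious_word("amule source code", file_x))  # porn
```

Dropping the extension is a design choice: without it, a popular container format could outnumber the real keywords on a perfectly good file.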

File A is obviously genuine, since at least one keyword shows up in _all_ of its filenames.

File B may or may not be a fake, since there are filenames in which none of the keywords appear.

File C must be a fake (or an error of the search engine or the server; useless either way), since none of the keywords appear in any of its alternative filenames.
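The three verdicts above follow a simple all / some / none rule, which a minimal Python sketch can reproduce on the example names from this post. The helper names `words` and `classify` are hypothetical, and whole-word matching is an assumption; it is what keeps "amuleus" from counting as a hit for "amule" (plain substring matching would wrongly accept file C's first name).

```python
import re


def words(text: str) -> set[str]:
    """Distinct lower-cased alphanumeric words of a query or filename."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def classify(query: str, names: list[str]) -> str:
    """Apply the all / some / none rule to every name announced for one file."""
    if not names:
        return "unknown"
    keywords = words(query)
    hits = sum(1 for name in names if keywords & words(name))
    if hits == len(names):
        return "genuine"    # like file A: a keyword in every name
    if hits == 0:
        return "fake"       # like file C: no keyword in any name
    return "uncertain"      # like file B: some names match, some don't


FILES = {
    "A": ["Amule Source Code.zip", "Free Emule client source.zip", "open source client.zip"],
    "B": ["source code of the amule client.tar.bz", "Sex, Porn, Virus.exe", "Amule Source for Linux.zip"],
    "C": [
        "amuleus sourceros codeos des internationals espanol del sol arriva!.txt",
        "Olala, los torreros!.txt",
        "Juanitas de Madrid.txt",
    ],
}

for label, names in FILES.items():
    print(label, classify("amule source code", names))
# A genuine
# B uncertain
# C fake
```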

I am not a good coder, but I think that if you make this a bit more fine-grained, using the numbers aMule already _knows_ about the search and its results and the numbers aMule _learns_ once it starts downloading, then you could compute a "confidence factor" or "anti-fake indicator" and show it as a column in the download window. So one would start files A, B and C with "white" indicators and, after a few minutes, see C falter to "Red Alert", B turn yellow and A gain a nice green color. One would then cancel B and C and thereby take the load off the network, one's own connection and one's own computer. And it would whiten the knuckles of the eMule, Lphant and MLDonkey developers, because _they_ (or their supporters) didn't come up with it, hi hi...
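One way the "confidence factor" and its colors might be computed is sketched below in Python. Everything here is an assumption for illustration: the `NameReport` record, the weighting (each source counts once, a peer holding the complete file counts twice), and the 0.8 / 0.3 color thresholds are hypothetical choices, not anything aMule provides. "White" simply means no source data has arrived yet.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class NameReport:
    """One alternative name for a file as reported by peers (hypothetical)."""
    name: str
    sources: int         # number of peers announcing this name
    completeness: float  # average share of the file those peers hold, 0.0 to 1.0


def words(text: str) -> set[str]:
    """Distinct lower-cased alphanumeric words of a query or filename."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def weight(report: NameReport) -> float:
    # Assumed weighting: every source counts once, a seeder counts twice.
    return report.sources * (1.0 + report.completeness)


def confidence(query: str, reports: list[NameReport]) -> Optional[float]:
    """Weighted share of names containing a query keyword; None if no data yet."""
    total = sum(weight(r) for r in reports)
    if total == 0:
        return None
    keywords = words(query)
    matching = sum(weight(r) for r in reports if keywords & words(r.name))
    return matching / total


def indicator(score: Optional[float]) -> str:
    """Map a score to a color; the thresholds are placeholders to be tuned."""
    if score is None:
        return "white"
    if score >= 0.8:
        return "green"
    if score >= 0.3:
        return "yellow"
    return "red"


# File B with invented source counts, for illustration only.
file_b = [
    NameReport("source code of the amule client.tar.bz", sources=3, completeness=0.2),
    NameReport("Sex, Porn, Virus.exe", sources=5, completeness=1.0),
    NameReport("Amule Source for Linux.zip", sources=2, completeness=0.5),
]
print(indicator(confidence("amule source code", [])))      # white
print(indicator(confidence("amule source code", file_b)))  # yellow (score about 0.40)
```

Weighting by completeness follows the idea that "a seeder should know a fake"; any real implementation would need to tune both the weights and the thresholds against actual search results.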

Quakemire
« Last Edit: April 23, 2007, 10:43:27 PM by quakemire »

skolnick

  • Global Moderator
  • Hero Member
  • Karma: 24
  • Posts: 1188
  • CentOS 6 User
Re: Probability based fake elimination
« Reply #1 on: April 24, 2007, 05:15:59 AM »

OK, there are files that are misnamed just to make people download them, so that there are more complete sources. But think of this: in your example, what would happen if you searched for "torreros"? File C would show up because of its name C2, and it could well be a useful file for that search, but it would show as a fake because of its other names. How would you solve that? I don't think it's aMule's responsibility to "guess" what is a fake and what is not; the user should decide that.
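The objection can be checked with a minimal Python sketch of the keyword-coverage idea from the first post (the `coverage` helper is hypothetical and assumes whole-word matching). For the query "torreros", only one of file C's three names matches, so C would still look doubtful even though it may be exactly the file the user is after.

```python
import re


def words(text: str) -> set[str]:
    """Distinct lower-cased alphanumeric words of a query or filename."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def coverage(query: str, names: list[str]) -> float:
    """Fraction of a file's names that contain at least one query keyword."""
    if not names:
        return 0.0
    keywords = words(query)
    return sum(1 for name in names if keywords & words(name)) / len(names)


FILE_C = [
    "amuleus sourceros codeos des internationals espanol del sol arriva!.txt",
    "Olala, los torreros!.txt",
    "Juanitas de Madrid.txt",
]

print(coverage("amule source code", FILE_C))  # 0.0: no name matches
print(coverage("torreros", FILE_C))           # one name out of three matches
```

The same file gets a different score depending on the query, which is exactly the weakness pointed out above: the measure says how well names fit a search, not whether the file itself is fake.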

Regards.