aMule Forum


Author Topic: filtering leeches from search results  (Read 3075 times)

hanan

  • Newbie
  • Karma: 0
  • Offline Offline
  • Posts: 2
filtering leeches from search results
« on: February 27, 2008, 11:36:41 PM »

Every search returns at least 20 popular results, which usually contain unwanted programs.

Stopping them from using the network is probably difficult.

An alternative is to filter the results based on the hash of the file.
My proposal: upon connection, the client issues two or three random searches that share no keywords. Any hash code that appears in more than one of these result sets comes from a leech; it is removed from the results and banned for the entire session.

I have no idea how much bandwidth these fake files take, but filtering makes sense, if only for the sake of a better user experience and to teach the fake-file providers a lesson.

Absurdität

  • Newbie
  • Karma: 0
  • Offline Offline
  • Posts: 4
Re: filtering leeches from search results
« Reply #1 on: February 29, 2008, 08:42:45 PM »

Looks like a good idea. For me it's more like 40 unwanted results, and they are mainly porn movie names.

Xaignar

  • Admin and Code Junky
  • Hero Member
  • *****
  • Karma: 19
  • Offline Offline
  • Posts: 1103
Re: filtering leeches from search results
« Reply #2 on: March 01, 2008, 05:42:25 PM »

That most likely won't work. The number of shared files is in the millions, and the number of results returned from a search is in the hundreds. So the chances of hitting the same hash multiple times based on random keywords are pretty damn slim to begin with.
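To put rough numbers on that, here is a minimal sketch. It assumes each search returns files drawn uniformly at random from everything shared, which is an idealization, and the figures of 1,000,000 files and 300 results are illustrative values, not measurements. The function name `expected_overlap` is made up for this example.

```python
def expected_overlap(total_files: int, results_per_search: int) -> float:
    """Expected number of hashes common to two independent searches.

    Assumption: each search returns `results_per_search` distinct files
    picked uniformly at random from `total_files`. Each file in the first
    result set then has probability results_per_search / total_files of
    also being in the second one.
    """
    return results_per_search * results_per_search / total_files


# Illustrative values only: "millions" of files, "hundreds" of results.
overlap = expected_overlap(total_files=1_000_000, results_per_search=300)
print(f"expected common hashes: {overlap:.2f}")
```

With these assumed values the expected overlap is well below one hash per pair of searches.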

Moreover, since having multiple filenames for the same (valid) file is possible, your method would result in an unknown number of false positives. Indeed, it would take user intervention to determine which were the false positives and which were true positives, further complicating the issue. And it would be pretty hard to come up with a list of keywords which would not return valid shared results.

hanan

  • Newbie
  • Karma: 0
  • Offline Offline
  • Posts: 2
Re: filtering leeches from search results
« Reply #3 on: March 01, 2008, 09:20:35 PM »

Let me clarify:
1. Search results are supposed to be based on the search text, so two distinct searches should return different results.
2. The hash codes are used for the actual transfer. Once you start downloading a file, you will get sources whose files have different names but the same size and hash code. This is what lets you rename a file and still share it. So once you select a file from a search result, its file name is meaningless, and the sharing is based only on the hash code (possibly with the file size as well).
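Point 2 can be sketched as follows. This is only an illustration of the idea, not aMule's actual data structures: the `FileKey` type, the `group_by_content` helper, and the hash strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileKey:
    """Identity of a shared file: hash plus size, never the file name."""
    file_hash: str
    size: int


def group_by_content(results):
    """Group (name, hash, size) search results by content identity."""
    groups = {}
    for name, file_hash, size in results:
        groups.setdefault(FileKey(file_hash, size), []).append(name)
    return groups


# Hypothetical results: two different names pointing at the same content.
results = [
    ("holiday_video.avi", "hash-a", 700),
    ("renamed_copy.avi", "hash-a", 700),
    ("other_file.mp3", "hash-b", 5),
]
for key, names in group_by_content(results).items():
    print(key.file_hash, key.size, names)
```

The two differently named entries end up under one key, which is why the name shown in a search result says nothing about what is actually transferred.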

The way the fake files take advantage of the system is by replying to every search with a text that contains the keywords but points to a fake file (usually porn and gambling).
Additionally, they make sure their results have the most sources. I don't know if they trick the system by sending multiple answers, or if their file really exists many times in the mule network. Either way, they are always at the top of the list.

As an example I entered the following two global searches:
1. "knocking on heavens door"
2. "barak obama biography"
You'd expect very few (if any) similar results, but if you sort both result lists by number of sources, you'll find the same hash codes appearing in both.

My proposal is to use random but highly probable keywords in two initial searches and to filter out the matching results.
A more extreme measure would be to stop sharing files with those hash codes, so that even if someone downloaded one of them, he/she would not be using network resources to share it.
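A minimal sketch of the filtering idea, assuming each search result is a simple (hash, file name) pair. The function names, the `min_hits` threshold, and the sample hashes are made up for illustration and are not part of aMule.

```python
from collections import Counter


def find_suspect_hashes(probe_results, min_hits=2):
    """Return hashes seen in at least `min_hits` unrelated probe searches."""
    counts = Counter()
    for results in probe_results:
        # Count each hash once per search, whatever names it appears under.
        counts.update({file_hash for file_hash, _name in results})
    return {h for h, n in counts.items() if n >= min_hits}


def filter_results(results, banned):
    """Drop every result whose hash is in the banned set."""
    return [(h, name) for h, name in results if h not in banned]


# Hypothetical probe searches with no keywords in common.
probe_1 = [("hash-x", "knocking on heavens door.exe"), ("hash-1", "song.mp3")]
probe_2 = [("hash-x", "barak obama biography.exe"), ("hash-2", "book.pdf")]

banned = find_suspect_hashes([probe_1, probe_2])

# A later, real search: the banned hash is filtered out under any name.
later = [("hash-x", "some other title.exe"), ("hash-3", "wanted_file.avi")]
print(filter_results(later, banned))
```

The same `banned` set could also be checked before sharing a downloaded file, which is the "more extreme measure". Note that a valid file matching both probe searches would be banned too, so the choice of probe keywords matters.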