adversarial search state-of-the-art #2547

synctext · 2016-09-20T13:17:16Z

Keyword search within a self-organising system is a challenging unsolved problem.

Detecting and removing spam has proven to be extremely difficult. Creating a trustworthy search service out of unreliable and possibly fraudulent resources is a challenge. A starting point is creating a web-of-trust or another feedback mechanism.

Existing work:

Web of trust for voting within Tribler:

www.ds.ewi.tudelft.nl/fileadmin/pds/reports/2013/PDS-2013-002.pdf
Multichain accounting
BM25 discussed in this thesis
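The thread only names BM25. As a reference point for the keyword-search side, here is a minimal sketch of the standard Okapi BM25 scoring function. It assumes documents are already tokenised into lists of lowercase terms and uses the common defaults k1 = 1.5 and b = 0.75; it is not taken from the thesis mentioned above.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for a tokenised query.

    query: list of terms; docs: list of documents, each a list of terms.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # IDF variant with "+ 1" inside the log so it never goes negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and document-length normalisation (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["ubuntu", "iso", "download"],
    ["ubuntu", "ubuntu", "ubuntu", "free", "movie"],  # keyword-stuffed entry
    ["debian", "iso"],
]
print(bm25_scores(["ubuntu", "iso"], docs))
```

The stuffed second entry repeats "ubuntu" three times yet still ranks below the first entry, which matches both query terms, because k1 saturates repeated occurrences. Saturation only dampens keyword stuffing; it does not stop it, which is one reason purely content-based ranking is not spam-resilient.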

synctext · 2016-09-29T17:13:23Z

Real-world $2.44 million fraud with Amazon reviews/votes, thanks @pimveldhuisen:
https://www.zdnet.com/article/exclusive-inside-a-million-dollar-amazon-kindle-catfishing-scam/

jellelicht · 2016-09-30T12:42:06Z

Most of these were quite useful for gaining some understanding on this topic, thanks @synctext and @pimveldhuisen.

I am currently looking into
"Fighting peer-to-peer SPAM and decoys with object reputation" and, to a lesser extent, parts of "P2P-Based Collaborative Spam Detection and Filtering".

It currently seems that many partial 'solutions' w.r.t. adversarial search exist and have been researched, but most often they heavily depend on some form of centralisation or have another major drawback.

jellelicht · 2016-09-30T12:47:04Z

Also, regarding an often-used WoT-based system:
the GPG short key ID issue that came up recently is interesting, but I am not sure whether focusing on WoTs is wise at this moment, seeing as these are implementation details when looking at the state of the art of adversarial search. WDYT, @synctext?
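To make the short key issue concrete: for a version 4 OpenPGP key, the fingerprint is a 160-bit SHA-1 value, the (long) key ID is its low-order 64 bits, and the "short" ID commonly displayed is its low-order 32 bits. The minimal sketch below shows this; the two fingerprints are made up for illustration and are not real keys.

```python
def _normalise(fingerprint: str) -> str:
    """Strip spaces and upper-case a V4 fingerprint, checking its length."""
    fp = fingerprint.replace(" ", "").upper()
    if len(fp) != 40:
        raise ValueError("expected a 40-hex-digit (160-bit) V4 fingerprint")
    return fp


def short_key_id(fingerprint: str) -> str:
    """Low-order 32 bits (last 8 hex digits) of the fingerprint."""
    return _normalise(fingerprint)[-8:]


def long_key_id(fingerprint: str) -> str:
    """Low-order 64 bits (last 16 hex digits) of the fingerprint."""
    return _normalise(fingerprint)[-16:]


# Hypothetical fingerprints, not real keys: different keys, same short ID.
honest = "0123 4567 89AB CDEF 0123  4567 89AB CDEF DEAD BEEF"
forged = "FEDC BA98 7654 3210 FEDC  BA98 7654 3210 DEAD BEEF"
print(short_key_id(honest) == short_key_id(forged))  # True: both DEADBEEF
print(long_key_id(honest) == long_key_id(forged))    # False
```

Because only 2^32 short IDs exist, generating a key that collides with a chosen target's short ID is feasible, so any trust decision keyed on the short ID can be subverted; comparing full fingerprints avoids this.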

synctext · 2016-09-30T13:32:47Z

@wordempire A lot of abuse, fraud and spam examples can be found in social media and e-commerce. So that is nice stuff to write about.

"[...] but most often they heavily depend on some form of centralisation or have another major drawback."

That is a perfect storyline! Anything more for self-organising systems or P2P? Stuff like https://www.ece.umd.edu/~goergen/docs/sec-nwatch.pdf.

Web-of-trust mechanisms can be a minor part of your report, half of it, or the majority, whichever makes the most interesting story. A list of partial, flawed, and fantasy WoT solutions would be ideal.

synctext · 2016-10-03T12:42:23Z

https://github.com/pimotte/msc-thesis

synctext · 2016-10-07T12:23:57Z

Fraud with search results for direct financial gain.

jellelicht · 2016-10-10T12:44:10Z

lee2006understanding: develops a model that looks at the link between user behavior/awareness and pollution of a p2p network.

jellelicht · 2016-10-10T13:55:18Z

yoshida2009controlling: shows that, in the Winny network, index poisoning is an effective way of dealing with copyright violations for small sets of files. This approach has the potential to disrupt the network as a whole, which may or may not be desirable for an adversary.

jellelicht · 2016-10-12T08:28:17Z

Determine layers of operation and archetypes for each layer. For each type,
refer to the drawbacks and the assumptions made to make it all work.
1. Trust building/subversion
  
  Choosing whether to trust an authority or group of peers can be based on
  a variety of existing decision-making processes. For example, this would
  be the perfect section to refer to the abuse happening w.r.t. Twitter,
  Amazon reviews and of course the GPG short key issue. Proxy measurements
  for trust could also be reviewed here (e.g., if this person shares a
  lot, they might be trustworthy).
2. Index pollution/building
  
  Indexes have to be generated for content in P2P networks, but have been
  polluted in some older networks, such as the Gnutella network. This
  section can focus on whether existing approaches can handle a subset of
  users putting low-quality material online. Find out why it happens
  (user-centric), how it can be prevented and perhaps how it could be
  leveraged to practically block access to an undesirable/illegal resource.
3. Content poisoning
  
  Older P2P protocols used a very coarse checksum to verify entire files.
  It was quite easy to poison a download, forcing the downloader to
  re-download the entire file. Depending on file size, this can be quite
  expensive. Look at the 'evolution' of systems, at some point
  referring to BitTorrent and its per-piece hashing, which partially
  alleviates this issue.
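The difference between whole-file verification and BitTorrent-style per-piece hashing can be sketched in a few lines. BitTorrent v1 stores one SHA-1 digest per fixed-size piece in the torrent metadata; the tiny 4-byte piece size below is a hypothetical value chosen for readability, as real torrents use pieces of hundreds of KiB or more.

```python
import hashlib

PIECE_SIZE = 4  # hypothetical, for readability only


def piece_hashes(data, piece_size=PIECE_SIZE):
    """SHA-1 digest of every fixed-size piece (the last piece may be shorter)."""
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]


def corrupted_pieces(received, expected_hashes, piece_size=PIECE_SIZE):
    """Indices of pieces that fail verification and must be fetched again.

    Assumes `received` has the same length as the original file.
    """
    got = piece_hashes(received, piece_size)
    return [i for i, (g, e) in enumerate(zip(got, expected_hashes)) if g != e]


original = b"abcdefghijklmnop"  # 4 pieces of 4 bytes
poisoned = b"abcdeXghijklmnop"  # one byte altered by a polluter, in piece 1

# Whole-file checksum: detects the problem but cannot localise it,
# so the whole file has to be downloaded again.
print(hashlib.sha1(original).digest() == hashlib.sha1(poisoned).digest())  # False

# Per-piece hashes: only the damaged piece has to be downloaded again.
print(corrupted_pieces(poisoned, piece_hashes(original)))  # [1]
```

A polluter can still make a peer waste one piece per bad upload, which is why per-piece hashing only partially alleviates the problem.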

synctext · 2016-10-12T09:02:52Z

OK, + add 4th or 5th section.

start a .tex file in IEEE format: https://www.google.nl/search?q=ieeee+format

https://scholar.google.com/scholar?q=dht+poisoning
https://scholar.google.com/scholar?q=link+farm
https://scholar.google.com/scholar?q=kazaa+pollution
Reddit, Hacker News: upvote, shadow ban, etc. techniques
https://scholar.google.com/scholar?q=collaborative+spam+filtering
https://en.wikipedia.org/wiki/Stealth_banning
Honesty among drug dealers, 90% satisfaction level with drug deals: https://dl.acm.org/citation.cfm?id=2488408
https://scholar.google.com/scholar?q=explicit+feedback+spam+filtering
User feedback & moderation: https://www.sciencedirect.com/science/article/pii/S0308596108000955

the Tribler voting and spam prevention mechanism
control D, Dispersy, show votecast
sqlitebrowser ~/.Tribler/sqlite/tribler.sdb
browse _ChannelVotes table
create interesting plot
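A minimal sketch of the first counting step with Python's built-in sqlite3 module. The column names (channel_id, voter_id, vote, time_stamp) and the convention that positive values are upvotes and negative values are spam votes are assumptions; check them against the real schema first, e.g. with `.schema _ChannelVotes` in the sqlite3 shell.

```python
import sqlite3

# ASSUMED schema: _ChannelVotes(channel_id, voter_id, vote, time_stamp);
# vote > 0 is treated as an upvote, vote < 0 as a spam vote.
VOTES_PER_CHANNEL = """
SELECT channel_id,
       COUNT(*)                                  AS total,
       SUM(CASE WHEN vote > 0 THEN 1 ELSE 0 END) AS positive,
       SUM(CASE WHEN vote < 0 THEN 1 ELSE 0 END) AS negative
FROM _ChannelVotes
GROUP BY channel_id
ORDER BY total DESC, channel_id
"""


def votes_per_channel(conn):
    """Return (channel_id, total, positive, negative) rows, most-voted first."""
    return conn.execute(VOTES_PER_CHANNEL).fetchall()


# Usage on a real crawl (read-only URI so the analysis cannot modify the data):
#   path = os.path.expanduser("~/.Tribler/sqlite/tribler.sdb")
#   conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
#   print(votes_per_channel(conn)[:10])

# Self-contained demo on an in-memory table with the assumed schema.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE _ChannelVotes "
             "(channel_id INTEGER, voter_id INTEGER, vote INTEGER, time_stamp INTEGER)")
demo.executemany("INSERT INTO _ChannelVotes VALUES (?, ?, ?, ?)",
                 [(1, 10, 1, 100), (1, 11, 1, 200), (1, 12, -1, 300), (2, 10, 1, 150)])
print(votes_per_channel(demo))  # [(1, 3, 2, 1), (2, 1, 1, 0)]
```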

jellelicht · 2017-01-11T13:04:04Z

This was the user study in which the assumption that expert users can quickly assess whether something is spam is questioned: Lee, Uichin, et al. "Understanding Pollution Dynamics in P2P File Sharing." IPTPS. Vol. 6. 2006.

synctext · 2017-01-11T14:15:51Z

first warm-up task: understand and plot key data from the AllChannel content discovery and voting mechanism.

Plot ideas:

of the roughly 2710 channels currently in Tribler: show the number of votes for each channel. The X-axis consists of channels, sorted by number of votes.
of all 250,000 cast votes, try to determine the time of voting. Plot vote activity over time.
Examples of plotting usage
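The two plot ideas reduce to a rank-ordered count and a time histogram. A minimal sketch in plain Python, assuming each vote is available as a (channel_id, timestamp) pair with timestamps in UNIX seconds; the actual unit of the stored time stamps is an assumption to verify against the data.

```python
from collections import Counter
from datetime import datetime, timezone


def votes_ranked(votes):
    """Vote count per channel, sorted from most to least voted.

    Plotted against rank (0, 1, 2, ...) this gives the 'channels sorted by
    number of votes' figure.
    """
    counts = Counter(channel for channel, _ in votes)
    return sorted(counts.values(), reverse=True)


def votes_per_month(votes):
    """Number of votes per calendar month (UTC), in chronological order."""
    months = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")
        for _, ts in votes
    )
    return sorted(months.items())


DAY = 86400
votes = [(1, 0), (1, 40 * DAY), (2, 40 * DAY), (1, 40 * DAY + 10)]
print(votes_ranked(votes))     # [3, 1]
print(votes_per_month(votes))  # [('1970-01', 1), ('1970-02', 3)]
```

Both lists can be handed directly to a plotting library such as matplotlib; if a few channels attract most of the votes, a logarithmic y-axis makes the rank plot easier to read.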

jellelicht · 2017-01-24T14:45:18Z

I am currently still deciding on how to export all my thesis-related artifacts (no generated artifacts in repositories), but for now here is a preview of a plot from last week (in xkcd style, so I won't accidentally include it in a report as-is):

Also, a quick question: is there any more recent work than Niels Zeilemaker's thesis from 2010 regarding the search strategy used in Tribler nowadays? AFAICS, search is done by first looking in the local data and then asking your TasteBuddies for more info, but I could of course be mistaken.

After spending some time thinking about the directions we could go with this project, I would like to expand on the concept of trust, taking into account the possibility of trustees being compromised. Trustees in this case could be something like "friends", people with similar voting behaviour, or perhaps even something that can best be described as "moderators".

Some issues that I would have to research/address/decide on:

How to define and bootstrap "trust" (Perhaps I could look into using something similar to TrustChain, or piggyback on the existing records embedded in TrustChain)
How to handle the combination of content-based reputation (votes) and user-based reputation (weighing or even ignoring some votes)
If I want to base important decisions on someone's votes, I somehow need to make sure Sybil attacks don't influence those decisions too much, or at all.
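One well-known way to turn pairwise trust into a global reputation is EigenTrust (Kamvar, Schlosser and Garcia-Molina, 2003). It is not what Tribler uses and is included only to make the bootstrap and Sybil questions above concrete. The minimal sketch below row-normalises each peer's local ratings into a matrix C, then iterates t = (1 - a) * C^T t + a * p, where p is spread uniformly over a set of pre-trusted peers. The peer names and ratings in the example are hypothetical.

```python
def eigentrust(local, pretrusted, alpha=0.15, iters=100):
    """Global trust per peer (values sum to 1), EigenTrust-style.

    local: dict peer -> dict peer -> non-negative local rating.
    pretrusted: non-empty set of peers trusted a priori (the bootstrap).
    """
    if not pretrusted:
        raise ValueError("need at least one pre-trusted peer")
    peers = sorted(set(local) | {j for row in local.values() for j in row} | set(pretrusted))
    p = {i: (1.0 / len(pretrusted) if i in pretrusted else 0.0) for i in peers}
    # Row-normalise local ratings; peers that trust nobody fall back to p.
    c = {}
    for i in peers:
        row = {j: max(float(s), 0.0) for j, s in local.get(i, {}).items()}
        total = sum(row.values())
        c[i] = {j: s / total for j, s in row.items()} if total > 0 else dict(p)
    # Power iteration with restart to the pre-trusted distribution.
    t = dict(p)
    for _ in range(iters):
        nxt = {j: alpha * p[j] for j in peers}
        for i in peers:
            for j, cij in c[i].items():
                nxt[j] += (1 - alpha) * cij * t[i]
        t = nxt
    return t


honest = {"A": {"B": 1, "C": 1},
          "B": {"A": 1, "C": 1},
          "C": {"A": 1, "B": 1, "S1": 1}}   # C was tricked into rating S1 once
sybils = {"S1": {"S2": 10, "S3": 10},       # a Sybil clique rating itself highly
          "S2": {"S1": 10, "S3": 10},
          "S3": {"S1": 10, "S2": 10}}
trust = eigentrust({**honest, **sybils}, pretrusted={"A"})
print({peer: round(value, 3) for peer, value in sorted(trust.items())})
```

Normalisation means the Sybils gain nothing from rating each other with 10 instead of 1, yet the single attack edge from C still lets the clique accumulate a sizeable share of the global trust (roughly a third in this toy example), because trust that enters the clique only leaves it through the restart term a. This is exactly the concern in the last point above.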

jellelicht · 2017-01-24T14:58:57Z

I would also like to propose a different issue title, as "adversarial search" is usually used in the context of, e.g., game-related AI. How about "Spam-resilient search in decentralized systems"?

jellelicht · 2017-01-26T14:50:02Z

jellelicht · 2017-01-26T14:54:56Z

Problem can be split in two parts:

Prevent spam and spam-related meta-information from entering the network
Prevent spam and spam-related meta-information from hindering the proper usage of the network

synctext · 2017-01-26T15:13:18Z

Survey paper possible elements:

discuss previously proposed systems, with pictures of their architecture (Credence, Sorcery by Yale, ...)
deeply explain AllChannel
Measurements: understand and plot key data from AllChannel content discovery and voting mechanism.

synctext · 2017-07-19T10:29:29Z

ToDo:

start a 2-column IEEE format .tex paper
crawl and process allchannel votes, create a non-xkcd style picture
write down what you did and how it works. {Tribler created an operational system, we crawled this and analysed the distribution of magical votes, etc.}
say something intelligent about AllChannel. {1980 voting spam, 98% 2%, most active outliers, etc.}
read papers from the last 15 years
Write a Problem Description: adversarial search in decentralized systems
Discuss 30+ related works: www.seas.upenn.edu/~cse400/CSE400_2015_2016/reports/report_28.pdf

jellelicht · 2017-08-24T09:57:39Z

Draft version:
main.pdf

synctext · 2017-08-24T10:10:01Z

recent report + discussion: https://news.ycombinator.com/item?id=15055522
Active conference series "rebooting the web-of-trust"
idealist vision: https://idcubed.org/home_page_feature/towards-post-industrial-networked-democracy-decentralized-data-commons-exchange-tokens-trust-value/
motivational storyline also applies here: Swarm size community: content popularity #2783 (comment)
- modification: the algorithm to divide the world into good and evil (in technical terms: honest and dishonest peers) is hard
read: Trust building for your Solutions chapter. https://github.com/blockchain-lab/shared_vision_towards_programmable_economy/blob/master/tex/article.tex

Draft feedback:

improve structure
start sections with clean main point.
Why, What, How
Scientific
Sybil:

jellelicht · 2017-09-13T10:30:53Z

draft v2
main.pdf

jellelicht · 2017-09-13T12:15:17Z

Also @synctext , how would you like me to cite https://github.com/blockchain-lab/shared_vision_towards_programmable_economy/blob/master/tex/article.tex?

synctext · 2017-09-13T13:56:56Z

A system without these drawbacks that consistently classifies peers as honest or dishonest does not currently exist.
Any web page, tweet, blog post, wikipedia edits, news article can be fake or real.
Define IR
solutions for dishonesty
no 3.B
More than [25]
Experiment with real distributed search code. Example
We spend exactly 1 hour with each of these software packages and describe their maturity level. Include a screenshot of each.
- freenet
- gnunet
- YaCy
- Tribler
- etc

jellelicht · 2017-11-07T10:17:02Z

main.pdf

synctext · 2017-11-07T10:29:17Z

Are social media trolls in scope? (New Yorker and BBC, plus Bloomberg.) The official report "Assessing Russian Activities and Intentions in Recent US Elections" was drafted and coordinated among the Central Intelligence Agency (CIA), the Federal Bureau of Investigation (FBI), and the National Security Agency (NSA).
Read "The Sybil Attack - Theory and Practice" by Kelong.

jellelicht · 2017-11-07T10:48:25Z

Influence of mass media on perception (effectively shifting the 'normal' distribution, e.g. the Sybil region): why did this work as effectively as it did?
What are the real-world costs associated with creating many 'troll accounts'?

ghost · 2017-11-07T13:23:10Z

Given how many legitimate news organizations and people are routinely labelled 'trolls' by their competitors (RT, Al Jazeera, CNN and FOX all are, even if they are biased on questions of Russian/Qatari/US-blue-team/US-red-team interest), the question of 'why did this work as effectively as it did' has an underlying truth component: 'because what they were saying was just as true of a constructed narrative of social facts as the competing consensus was'. It's not the whole reason why they are successful, but if we're thinking about search and mass media we should keep in mind that, in addition to the mass media perception shifting going on from one player in the 'troll account' narrative, there is great (perhaps greater) mass media perception management going on from the other player as well. Some success by the other players may serve to balance out the bias of the network itself in favour of the incumbents.

To phrase it in the context of, say, Section 3.3 of Kelong Cong's paper: the 'honest region' does not include either the blue or the red team and everyone associated with it, both meatspace and bot, to the extent that the shared, necessary illusions involved in group membership are held.

synctext · 2018-06-06T08:25:20Z

@ichorid See this ticket of related work. Especially the 8000 fake Twitter accounts.

ichorid · 2018-06-06T09:26:24Z

@synctext thanks, I'll take this stuff into account.

ichorid · 2018-06-09T09:42:27Z

related #3615

synctext · 2018-06-15T12:31:20Z

Broader vision, beyond keyword search: an extensive technical analysis of the threat model in troubled regions. Aid workers face difficult challenges; see "On Enforcing the Digital Immunity of a Large Humanitarian Organization".

synctext · 2020-07-20T12:40:08Z

Status update after a few years:

we avoid solving the general keyword search problem
items are bundled into channels, with 1 publisher
each channel has a number of votes by voters
each voter has a sybil-attack estimator through trustchain
count weighted votes to estimate channel + item relevance ranking.
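The last step of this design can be sketched as follows. How the per-voter Sybil estimate is derived from TrustChain is not described here, so it is passed in as a hypothetical function `sybil_weight(voter)` returning a weight between 0 (almost certainly a Sybil) and 1 (looks honest); the voter and channel names in the example are made up.

```python
from collections import defaultdict


def rank_channels(votes, sybil_weight):
    """Rank channels by the sum of their voters' Sybil-resistance weights.

    votes: iterable of (voter_id, channel_id) pairs.
    sybil_weight: hypothetical callable, voter_id -> weight in [0, 1].
    """
    score = defaultdict(float)
    seen = set()
    for voter, channel in votes:
        if (voter, channel) in seen:  # a voter counts at most once per channel
            continue
        seen.add((voter, channel))
        score[channel] += sybil_weight(voter)
    # Highest weighted score first; ties broken by channel id for stable output.
    return sorted(score.items(), key=lambda item: (-item[1], item[0]))


weights = {"alice": 1.0, "bob": 0.9, "carol": 0.8}  # hypothetical estimator output
votes = [("alice", "linux-isos"), ("bob", "linux-isos"), ("alice", "linux-isos"),
         ("carol", "lectures")]
votes += [(f"sybil{i}", "fake-movies") for i in range(50)]  # 50 fresh identities
ranking = rank_channels(votes, lambda voter: weights.get(voter, 0.01))
print(ranking)
```

Deduplication stops one identity from voting repeatedly, while the weights stop many cheap identities from outvoting a few established ones: in the example, 50 low-weight identities still rank below a channel with a single well-established voter.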

synctext added the type: MSc-Thesis-Work label Sep 20, 2016

synctext assigned jellelicht Sep 20, 2016

synctext mentioned this issue Jun 6, 2018 in: Towards global Consensus on Trust #3357 (Open)

qstokkink added this to the Backlog milestone Nov 27, 2018

synctext mentioned this issue Nov 27, 2018 in: decentralised non-profit payment services #4044 (Closed)

ichorid added this to To do in GigaChannels via automation Jul 18, 2020

ichorid moved this from To do to Discussion topics in GigaChannels Jul 18, 2020

drew2a added the component: channels label Jan 15, 2021

synctext mentioned this issue Apr 9, 2021 in: content popularity community: performance evaluation #3868 (Open)

ichorid mentioned this issue Nov 3, 2021 in: Vadim's testament #6481 (Closed)

synctext mentioned this issue Oct 18, 2023 in: Phd Placeholder: learn-to-rank, decentralised AI, on-device AI, something. #7586 (Open)
