From 04ce285dbc0aa69bb6702e9c0d827431dff5a2e1 Mon Sep 17 00:00:00 2001 From: Ardhi Putra Pratama H Date: Thu, 13 Oct 2016 14:49:37 +0200 Subject: [PATCH] Rephrasing complex word. Processing 12/10 feedback --- 1_introduction.tex | 84 ++++++++++++++++++++++++---------- 2_problemdesc.tex | 106 +++++++++++++++++++++---------------------- bib/bibliography.bib | 22 +++++++++ thesis.tex | 2 + 4 files changed, 136 insertions(+), 78 deletions(-) diff --git a/1_introduction.tex b/1_introduction.tex index 1d12f61..142493f 100644 --- a/1_introduction.tex +++ b/1_introduction.tex @@ -22,25 +22,27 @@ \chapter{Introduction} Peer-to-peer network has many different applications. Some of them are multimedia streaming, online gaming, and file-transfer. All of those applications has different requirement to ensure user has flawless experience. Multimedia streaming, for example, need to achieve two conditions. First, the start up delay must be small to make sure user do not abandon his intent to stream the files. Secondly, the chunk (or piece) loss must be negligible, or at least low enough to provide good quality\cite{2008:givetogetvod:Mol}. Other application, P2P gaming, require more complex situation. Depend on the type of the game (e.g., FPS), peer latency must be under certain threshold\cite{2010:surveyp2pgame:shen}. It also needs to consider bandwidth demand and high security to prevent cheating between user. The most common P2P application, file transfer, obviously need high throughput by maximizing all the connection a user has. -\section{The arise of freeriding} +\section{The freeriding phenomenon} Among all the peer-to-peer usage in the Internet, file-sharing is the most popular one. It started with Napster in 1999 to share music file between its users. It shut down at 2001 and immediately followed by Kazaa and Gnutella afterward. Both services allowed the user to share not only music file but also another type of files. 
Currently, both already shut down because of legal and performance issues. In Gnutella case, majority of users (70\%) stopped to share their files. Moreover, about half of the communication only served by top 1\% of the community \cite{2000:freeridegnutella:adar}. Gnutella suffers from a social phenomenon called \textit{freeriding} on majority of its users. -Freerider, can cause several problems, especially in peer-to-peer network. First and foremost, freeriding behaviour can lead to vulnerabilities in the system. With only few of the user provide the service for many, it is eventually become more centralized than decentralized system. Another well known problem caused by freeriders is the degradation of system performance \cite{2000:freeridegnutella:adar}. If freeriders become majority in file-sharing peer-to-peer system, as they occupy significant amount of resource, eventually bottleneck in the system will occur. As the time goes, honest peer may not feel satisfied and decided to leave the system. With important peer leave the system, it will degrade more and lost the file that used to be served by leaving peer. The system become unhealthy and sooner or later will be completely left by its peers. +Freeriding is defined as user behavior that selfishly consumes resources without giving anything back. This behavior can cause several problems, especially in a peer-to-peer network. First and foremost, freeriding behavior can lead to vulnerabilities in the system. With only a few users providing the service for many, the system eventually becomes more centralized than decentralized. Another well-known problem caused by freeriders is the degradation of system performance \cite{2000:freeridegnutella:adar}. If freeriders become the majority in a file-sharing peer-to-peer system, they occupy a significant amount of resources and a bottleneck will eventually occur. As time goes on, honest peers may become dissatisfied and decide to leave the system. 
When important peers leave the system, it degrades further and loses the files that those peers used to serve. The system becomes unhealthy and sooner or later will be abandoned by its peers. If everyone can free ride, the whole system performance may degrade significantly. In other words, freeriding can lead to systematically worse problem called ``tragedy of the commons'' \cite{1968:tragedycommon:hardin}. This problem is popularized by \citet*{1968:tragedycommon:hardin} in \citeyear{1968:tragedycommon:hardin}. This social dilemma emerge because overuse and overexploitation in the shared resource without feedback from the user. % how can egoist cooperate : The Emergence of Cooperation among Egoists (Robert Axelrod). Solved by tit-for-tat -> good performance. managing supply and demand meulpowder p.7 % freerider behaviour, tit-for-tat result -In \bt, one of the protocol used in file-sharing, it is unlikely a user will extremely freeride. We define this behavior as not upload anything while keep downloading data. Instead of extremely freeriding, it is more common to find \textit{hit and run} behavior \cite{2011:managesupplydemand:meulpolder}. Hit and run (HnR) is a situation where a user has finished downloading then immediately stop his contribution. Hit and run also often referred as one of the freeriding behavior that peer-to-peer community wanted to prevent. \citeauthor{2015:freeriderinbtcommunity:das} also studied the freerider behavior in \bt~communities. They conclude that freerider in \bt~may not degrade performance as long as the swarm has at least one dedicated and available seeders. The potential availability of seeder also become a factor that keeping the swarm alive \cite{2015:freeriderinbtcommunity:das}. +In \bt, one of the protocols used in file-sharing, it is unlikely that a user will \textit{extremely freeride}. We define this behavior as uploading nothing while continuing to download data. 
Instead of extreme freeriding, it is more common to find \textit{hit and run} behavior \cite{2011:managesupplydemand:meulpolder}. Hit and run (HnR) is a situation where a user stops contributing immediately after finishing a download. Hit and run is also often regarded as one of the freeriding behaviors that peer-to-peer communities want to prevent. \citeauthor{2015:freeriderinbtcommunity:das} also studied freerider behavior in \bt~communities. They conclude that freeriders in \bt~may not degrade performance as long as the swarm has at least one dedicated and available seeder. The potential availability of seeders is also a factor that keeps the swarm alive \cite{2015:freeriderinbtcommunity:das}. %. One thing that need to take into consideration is that in their research, they only take four communities as dataset \section{BitTorrent protocol} \bt~\cite{2003:bittorrent:cohen}, nowadays, stand as \textit{de facto} file-sharing protocol on top of peer-to-peer network. It survives until now because \bt~ is a \textit{protocol} that can be implemented by anyone, instead of service that Napster, Kazaa, and Gnutella used to have. To build \bt~environment, it is essential to know the complete view how \bt~work. % tit-for-tat, choking, unchoke, optimistic unchoke -In general view, \bt~consists of peers who participated in file-sharing and \textit{tracker}. \textit{Tracker} is responsible for monitors the distribution and progress of the file and peers in the swarm. \textit{Swarm} is a set of peers formed with the same purpose of downloading or uploading certain files represented in \texttt{.torrent} metadata file. Static \texttt{.torrent} file, which contains information such as tracker addresses and unique hash value of this swarm, is created by peer who wants to publish their files. Peer uses information in \texttt{.torrent} file to connect each other. Files in a swarm consists of several \textit{chunks} or file pieces. 
A chunk is exchanged by the peers in a particular \textit{session}. A peer actively participated in many swarms on the uninterrupted time-frame called \textit{session}. +In general, \bt~consists of peers who participate in file-sharing and a \textit{tracker}. The \textit{tracker} is responsible for monitoring the distribution and progress of the file and the peers in the swarm. A \textit{swarm} is a set of peers formed with the same purpose of downloading or uploading certain files represented in a \texttt{.torrent} metadata file. The static \texttt{.torrent} file, which contains information such as tracker addresses and the unique hash value of the swarm, is created by the peer who wants to publish the files. Peers use the information in the \texttt{.torrent} file to connect to each other. Files in a swarm consist of several \textit{chunks} or file pieces. Chunks are exchanged by the peers in a particular \textit{session}. A \textit{session} is an uninterrupted time frame in which a peer actively participates in many swarms. -In \bt, it is desirable to have many peers upload piece of file to the swarm. This way, swarm can be \textit{healthier}, and overall download speed can increase. However, many peers become a \textit{leecher}, which quit the swarm when his download finished. This behavior also called as \textit{Hit and Run} (HnR) \cite{2014:sustainabilitytorrent:chen}. In general, those unwanted behavior normally forbidden in so-called \textit{private communities}. In such a community, the administrator enforces several policy such as \textit{Share Ratio Enforcement} (SRE). SRE define the amount a user need to upload before able to download from the community \cite{2012:economicbt:kash}. +There are many \bt~\textit{communities} that serve as portals for storing \texttt{.torrent} files. A community usually has its own tracker. In general, communities in \bt~can be divided into two categories: \textit{public} and \textit{private}. 
A public tracker means that everybody can join the swarms served by that tracker. On the other hand, private communities are closed communities that can only be accessed by meeting particular requirements \cite{2010:pubpriv:meulpolder, 2014:sustainabilitytorrent:chen}. Typically, public communities have lower performance than private communities \cite{2010:pubpriv:meulpolder}. Higher performance comes with a drawback: in private communities, it is very difficult to obtain membership and very easy to be kicked out \cite{2013:survivepriv:jia}. + +In \bt, it is desirable to have many peers upload pieces of a file to the swarm. This way, the swarm becomes \textit{healthier} and the overall download speed can increase. However, many peers become \textit{leechers}, which quit the swarm as soon as their download has finished. This behavior is also called \textit{Hit and Run} (HnR) \cite{2014:sustainabilitytorrent:chen}. In general, such unwanted behavior is forbidden in private communities. In such a community, the administrator enforces policies such as \textit{Share Ratio Enforcement} (SRE). SRE defines the amount a user needs to upload before being able to download from the community \cite{2012:economicbt:kash}. % how bittorrent handle freeriding (short term) \bt~uses \textit{tit-for-tat} mechanism to reward good behavior and punish bad behavior. This mechanism tried to solve fairness issue introduced by freeriding behavior \cite{2003:bittorrent:cohen}. \textit{Tit-for-tat} in \bt~ encourage user to only upload file to one who also has uploaded his file somewhere else. Furthermore, it is also ranked by upload amount and speed. Freerider always getting low priority in this mechanism. In this way, \textit{tit-for-tat} incentivizes for user to upload a file. \bt~protocol and its \textit{tit-for-tat} become a standard in file-sharing peer-to-peer system with many clients implemented this protocol. \textit{Tit-for-tat} valid only in a scope of single torrent. 
That means, the configuration from one community can not be carried to another community. This causes \textit{tit-for-tat} works best only in short term transaction and limited parties. @@ -56,15 +58,11 @@ \subsection{Tribler} All of the Tribler main components such as end-to-end encryption, channel discovery, and many others relied in database and dissemination system called \texttt{Dispersy} \cite{2013:dispersy:zeilemaker}. Dispersy maintain and perform the communication between Tribler peers in fully decentralized manner. Dispersy able to circulate the message in one-to-one or one-to-many within a group of node called \texttt{community}. User can adapt and implement its desired \textit{community} by itself. It is including how, what, and where the communication will occur. -Tribler implements several Dispersy \textit{communities} on its core function. \citeauthor{2016:tribler-techdebt:vos} summarize the recent community in Tribler. Important features such as channel discovery, search within community, end-to-end Tor-like operations, and currency mechanism shown in table \ref{tbl:community}. \textit{Channel} is a collection of torrent that has extra capabilities such as vote system, spam prevention, and comment (social) attributes. Every user can create his own channel, add and remove torrent to it, and maintain its activity. Worth to mention that Tribler implemented its own reputation system to incentivize user. Reputable user will get boost from other Tribler user, so it is beneficial in its own way. - -In Tribler, there were several attemps to tackle freerider issue. Give-to-Get \cite{2008:givetogetvod:Mol} is one approach in peer-to-peer streaming video system. It works by give freerider only idle bandwidth slots and therefore their download speed will much slower\footnote{\url{https://www.tribler.org/Give-To-Get/}(Accessed 22 September 2016)}. 
There was also reputation management implemented in \textit{BarterCast4 community}, specifically to prevent freeriding in Tribler \cite{2009:bartercast:meulpolder}. It was used to spread the statistics about upload and download rate of a particular user \cite{2016:tribler-techdebt:vos}. And lastly, there is Multichain \cite{2015:multichain:norberhuis}, the anonymous tamper-proof interaction history that works on onion routing in Tribler network. The relation between MultiChain and BarterCast4 will be discussed in section \ref{section:sec_currency}. - \begin{table}[tbp] \centering \caption{Overview of implemented Dispersy community in Tribler \cite{2016:tribler-techdebt:vos}.} \label{tbl:community} - \begin{tabular}{|l|p{11cm}|} + \begin{tabular}{|c|p{11cm}|} \hline \rowcolor[HTML]{EFEFEF} \multicolumn{1}{|c|}{\cellcolor[HTML]{EFEFEF}{\color[HTML]{333333} \textbf{Community Name}}} & \multicolumn{1}{c|}{\cellcolor[HTML]{EFEFEF}{\color[HTML]{333333} \textbf{Purpose}}} \\ \hline @@ -77,12 +75,25 @@ \subsection{Tribler} \end{tabular} \end{table} +Tribler implements several Dispersy \textit{communities} in its core functionality. \citeauthor{2016:tribler-techdebt:vos} summarize the current communities in Tribler. Important features such as channel discovery, search within a community, end-to-end Tor-like operations, and the currency mechanism are shown in Table \ref{tbl:community}. A \textit{channel} is a collection of torrents with extra capabilities such as a voting system, spam prevention, and comment (social) attributes. Every user can create his own channel, add torrents to it and remove them, and maintain its activity. It is worth mentioning that Tribler implemented its own reputation system to incentivize users. Reputable users get a boost from other Tribler users, so a good reputation is beneficial in its own way. + +In Tribler, there have been several attempts to tackle the freerider issue. Give-to-Get \cite{2008:givetogetvod:Mol} is one approach for peer-to-peer video streaming systems. 
It works by giving freeriders only idle bandwidth slots, so their download speed is much slower\footnote{\url{https://www.tribler.org/Give-To-Get/} (Accessed 22 September 2016)}. Reputation management was also implemented in the \textit{BarterCast4 community}, specifically to prevent freeriding in Tribler \cite{2009:bartercast:meulpolder}. It was used to spread statistics about the upload and download rates of a particular user \cite{2016:tribler-techdebt:vos}. Lastly, there is MultiChain \cite{2015:multichain:norberhuis}, an anonymous tamper-proof interaction history that works over onion routing in the Tribler network. The relation between MultiChain and BarterCast4 will be discussed in section \ref{section:sec_currency}. + \section{Rewarding user contribution} -Freeriding behavior can be prevented by proper incentive mechanism. By showing goodness, specifically, by uploading data to others, user should get a reward. However, users are typically selfish and always tries to maximize their own benefit \cite{2015:incentivep2pgame:kang}. With unclear and non-obvious incentive mechanism, some peers who download a lot may or may not know that freeriding behavior is causing trouble for the system. Therefore, they could suffer from punishment. Reward and punishment can be in many forms such as right to download specific content, get higher download speed, and social acknowledgement. +\label{sec:userreward} +Freeriding behavior can be prevented by a proper incentive mechanism. By behaving well, specifically by uploading data to others, a user should earn a reward. However, users are typically selfish and always try to maximize their own benefit \cite{2015:incentivep2pgame:kang}. With an unclear incentive mechanism, peers who download a lot may not know that freeriding behavior is causing trouble for the system. As a result, they could suffer punishment. 
Rewards and punishments can take many forms, such as the right to download specific content, a higher or lower download speed, and social acknowledgement. To gain the reward is not as trivial as it sounds. Reward comes with good behavior which can be done by uploading content. This requires another user to actually download the content. In a community where the punishment is significantly severe, users typically very selective of its download activity. By downloading more, a user can be suspected with bad behavior that lead to punishment. Similar situation applies if the reward is insufficient. The user who want to get reward may need to standby for a long time waiting someone to download their files \cite{2013:survivepriv:jia}. This approach is inefficient, bandwidth wasting, but commonly practicable\cite{2013:survivepriv:jia}. -\todo{expand:??} +Incentive mechanisms are essential because they are one of the means to increase overall performance. \citeauthor{2011:managesupplydemand:meulpolder} discussed several kinds of incentive mechanisms. These techniques can be combined to complement each other. They are: (i) direct reciprocity, (ii) indirect reciprocity, (iii) centralized reputation, (iv) decentralized reputation, and (v) currency \cite{2011:managesupplydemand:meulpolder}. \textit{Reciprocity} focuses on the relationship between peers. The \textit{reputation} technique is more straightforward: information about past user behavior is stored (centralized or decentralized), iteratively updated, and spread to all peers. The last technique is \textit{currency}, which uses \textit{credit} to incentivize users. + +\bt's \textit{tit-for-tat} implements reciprocation to enforce user contribution. It is based on \textit{direct reciprocation}. However, this technique has many limitations. It uses only a short sliding window of the recent past. 
Although direct reciprocation is effective over a relatively short period, \citeauthor{2011:managesupplydemand:meulpolder} stated that two users are unlikely to encounter each other again in the near future, especially in a swarm with many members \cite{2011:managesupplydemand:meulpolder}. This limitation can be addressed by \textit{indirect reciprocity}, in which users contribute to each other based on a \textit{trust system} that naturally emerges in the swarm. \citeauthor{2005:indirectreciprocity:nowak} categorized indirect reciprocity into two types, upstream and downstream, as shown in Figure \ref{fig:reciprocation} \cite{2005:indirectreciprocity:nowak}. In \textit{downstream} reciprocity, if one peer is observed helping another peer, the observer may be more motivated to seed. This effect is based on trust: it is natural in a society to help the helper. In \textit{upstream} reciprocity, a peer that has received help is more likely to seed to another peer. This phenomenon can be interpreted as the peer who received the data ``giving back'' to the community. +\begin{figure}[ht] + \centering + \includegraphics[width=0.7\textwidth]{pics/reciprocation.pdf} + \caption{Direct and indirect reciprocation} + \label{fig:reciprocation} +\end{figure} % The focus on this thesis is to introduce credit mining system, a system to automatically upload prospected files. This system tries to find a collection of files which give relatively high reward if it uploaded in the future. As the system is implemented to increase user experience, it is implemented in such a way that it will not disturb any kind of user activities. Balancing between gaining high reward, consumed resource, and freeriding prevention become a key question of this thesis work. \subsection{Secure Currency} @@ -91,7 +102,14 @@ \subsection{Secure Currency} Tribler used to have secure reputation system called BarterCast4 \cite{2009:bartercast:meulpolder}. 
Reputation system in a sense, can be used as an incentive mechanism. A user, may receive high reputation if he provide quality content, and may lose reputation if he cheat in the system. Although BarterCast is fully decentralized, the system is vulnerable from attacks. BarterCast4 has no security measure to prevent tampering records. As \citeauthor{2015:multichain:norberhuis} pointed, BarterCast uses self-reported reputation as its base. It is possible for someone change his reputation arbitrarily, then use it to get higher priority in data transaction. MultiChain, introduced in 2015, addressed to solve this issue and will replace BarterCast as the currency in the Tribler community \cite{2015:multichain:norberhuis}. -MultiChain is a secure reputation system which contains interaction history between corresponded peers in a distributed environment \cite{2015:multichain:norberhuis}. MultiChain is inspired by the Block chain technology implemented in cryptocurrency such as Bitcoin. The difference is, if Bitcoin represents computation, MultiChain represents bandwidth used. In blockchain, multiple transactions are combined into a single block. A single block has a pointer to the previous block. If a peer \texttt{A} upload a file to peer \texttt{B}, \texttt{A} may want to increase its reputation by write it to the MultiChain. This transaction is transcribed into one MultiChain block. This block is protected by confirmed signature from both parties, \texttt{A} and \texttt{B}. The protected block can be easily verified by other peer. Note that MultiChain only reputation system. Incentive mechanism is still needed to assign proper reward and punishment fo honest peer and freeriders. +\begin{figure}[h] + \centering + \includegraphics[width=0.9\textwidth]{pics/blockchain.png} + \caption{MultiChain illustration. 
Source : \url{http://www.blockchain-lab.org}} + \label{fig:multichain} +\end{figure} + +MultiChain is a secure reputation system that records the interaction history between corresponding peers in a distributed environment \cite{2015:multichain:norberhuis}. MultiChain is inspired by the blockchain technology used in cryptocurrencies such as Bitcoin. The difference is that Bitcoin represents computation, whereas MultiChain represents the bandwidth used. In a blockchain, multiple transactions are combined into a single block. Each block has a pointer to the previous block of the participating peer. If peer \texttt{A} uploads a file to peer \texttt{B}, \texttt{A} may want to increase its reputation by recording the upload in MultiChain. This transaction is transcribed into one MultiChain block. The block is protected by the signatures of both parties, \texttt{A} and \texttt{B}, and can be easily verified by other peers. Note that MultiChain is only a reputation system. An incentive mechanism is still needed to assign proper rewards and punishments to honest peers and freeriders. Figure \ref{fig:multichain} illustrates how MultiChain works with multiple peers. It also shows that blocks become entangled because every block is shared between peers \cite{2015:multichain:norberhuis}. In a secure network with onion routing, one who relay the network at the cost of its bandwidth need to be incentivized as well. LIRA \cite{2013:lira:jansen} offers lightweight incentive system for user who contribute for the Tor network. It is based on the probability on getting lottery. The lottery itself is a priority on the Tor network which will improve the client performance. However, as LIRA is based on centralized bank in Tor's network, it still induced scalability problem. TorCoin \cite{2014:torcoin:ghosh} is another work covering this issue. Similar with MultiChain, TorCoin proof-of-work is its bandwidth embedded in the blockchain mechanism. 
This work also preserves anonymity, enforces accounting mechanism, and deploys in fully distributed manner. The main problem of TorCoin is that it requires extra specific protocol in Tor that not yet implementable called TorPath. TorPath is a protocol that assign a secure circuit to each client, monitor the route, and generate TorCoin as a proof or relaying or providing bandwidth to other user. In overview, there are three steps of TorPath. First is TorPath forms a group of users to interconnect each other. After that it shuffles and assign the circuit. At this point, Tor circuit is established and then each of the client can route with this established circuit. TorCoin can be mined by user via this TorPath. @@ -104,7 +122,7 @@ \section{Economics in file-sharing} \subsection{P2P currency and incentive} % incentive p2p sveral forms, recipro, reputation, credit -Incentive mechanism in peer-to-peer network is essential as it is one of the property to increase swarm performance. \citeauthor{2011:managesupplydemand:meulpolder} discussed several kinds of incentive mechanism techniques. The technique can be combined and complement each other. Those are : (i) direct reciprocity, (ii) indirect reciprocity, (iii) centralized reputation, (iv) decentralized reputation, and (v) currency \cite{2011:managesupplydemand:meulpolder}. \textit{Reciprocity} focused on the relationship between peers. \bt's \textit{tit-for-tat} is one example. It may be direct between two peers or indirectly by transfer the \textit{trust} to other peer. \textit{Reputation} technique is more straightforward. The information of user behavior in the past is stored (centralized or decentralized). This information iteratively updated and spread through all the peers. BarterCast \cite{2009:bartercast:meulpolder} and MultiChain \cite{2015:multichain:norberhuis} are the example of reputation technique. \textit{Currency}, as we will discuss later, use the concept of \textit{credit}. 
User need to \textit{buy} the content and can get \textit{credit} by providing service. \textit{Private communities} with SRE is the most common implementation of currency mechanism. +In the previous section, we described the techniques used to reward users for contributing more (see section \ref{sec:userreward}). Several implementations of these techniques are available in peer-to-peer systems. \bt's \textit{tit-for-tat} is one example of direct reciprocation. BarterCast \cite{2009:bartercast:meulpolder} and MultiChain \cite{2015:multichain:norberhuis} are examples of the reputation technique. \textit{Currency}, which is the focus of our discussion, uses the concept of \textit{credit}. Users need to \textit{buy} content and can earn \textit{credit} by providing service. \textit{Private communities} with SRE are the most common implementation of the currency mechanism. % incentive in p2p example Different incentive mechanism may be implemented in different type of application. \citeauthor{2015:incentivep2pgame:kang} proposed an incentive mechanism for dynamic and heterogeneous peer with game theory. They take peer capabilities and selfish nature as consideration. The mechanism targeted at wireless and low computing peer which always aim to maximize its own benefit through its credit system. In their system, each peer can set a price for service it provides. The buyer (downloader), in this case, able to negotiate with the seller (uploader) regarding the content price and its bandwidth allocation. This research objective is to maximize the \textit{performance satisfaction factor} where occurred after the transaction \cite{2015:incentivep2pgame:kang}. On the other side, especially in \bt~network, \citeauthor{2010:effortincentive:rahman} proposed effort-based incentive to advocate fairness between peers. They believe that current incentive system disfavor slow peers and eventually will decrease overall performance. 
In this system, user awarded based on its effort, which is relative on its capacity. This mechanism need alteration in \bt~existing policy on unchoke mechanism and peer selection. However, there is an increasing performance. Download speed for slow peers increase up to 63\% at the expense of decreasing speed for fast peer at 4\%. @@ -121,10 +139,36 @@ \subsection{P2P currency and incentive} \subsection{Supply and Demand} \label{section:suppdemand} % supply < demand -Supply and demand for both public and private \bt~communities have been studied by \citeauthor{2009:demandsupplyres:andrade} in \citeyear{2009:demandsupplyres:andrade}. \citeauthor{2009:demandsupplyres:andrade} shows that user who contribute more to the community, actually consume a lot from it. This explains that \bt~users are not altruistic enough to seed continuously. Although a significant amount of demand is successfully served by the community, there is only a few swarm that does not suffer from contention. Two hypothetical reasons \citeauthor{2009:demandsupplyres:andrade} suggested are: (i) an asymmetric number of seeder and leecher, which seeder cannot compensate; and (ii) lack of incentive mechanism in the higher level aside from \bt~\textit{tit-for-tat} \cite{2009:demandsupplyres:andrade}. +Supply and demand in both public and private \bt~communities have been studied intensively \cite{2009:demandsupplyres:andrade, 2010:pubpriv:meulpolder}. \citeauthor{2009:demandsupplyres:andrade} showed that in \bt~communities, supply mostly meets demand. Despite that, public communities have a lower service rate than private communities \cite{2009:demandsupplyres:andrade}. This is also supported by \citeauthor{2010:pubpriv:meulpolder}, who report that public communities have a considerably lower seeder/leecher ratio, which directly affects swarm performance \cite{2010:pubpriv:meulpolder}. 
We define a supply and demand \textit{misalignment} as a situation in which there is not enough supply to serve all the demand without performance degradation. A significantly low seeder/leecher ratio can lead to supply and demand misalignment. -% supply in private > in public; flashcrowd increase demand -In public community, there is less supply compared to private community which enforce SRE \cite{2009:demandsupplyres:andrade}. This affects a file longevity because user seed longer in private community. If this behavior happened in long period, it might produce significant imbalance on supply and demand as seeder kept seeding a particular torrent without switching to another swarm. This phenomenon is accumulated by existing of \textit{flashcrowd} effect. Flashcrowd effect is the sudden increase in resource demand due to various reason. Newly published torrent is one of the reasons where flashcrowd effect take place \cite{2013:swarmevolution:su}. These misalignments between supply and demand can worsen the downloading experience in \bt. 
+\begin{table}[] + \centering + \caption{Supply and demand in public and private communities \cite{2010:pubpriv:meulpolder}} + \label{tbl:supdemand} + \begin{adjustwidth}{-1.5cm}{} + \begin{tabular}{|c|c|c|c|c|c|c|c|l|} + \hline + \multicolumn{1}{|c|}{\multirow{2}{0.1\linewidth}{community}} & \multicolumn{3}{c|}{download speed (kbps)} & \multicolumn{1}{c|}{\multirow{2}{0.1\linewidth}{avg \% unconn}} & \multicolumn{1}{c|}{\multirow{2}{0.1\linewidth}{avg s/l ratio}} & \multicolumn{3}{c|}{seeding duration (hours)} \\ \cline{2-4} \cline{7-9} + \multicolumn{1}{|c|}{} & {mean} & {median} & {top 10\%} & {} & {} & {mean} & {median} & {top 10\%} \\ \hline + \multicolumn{8}{l}{} \\ \hline + The Pirate Bay & {1037} & {333} & {\textgreater2134} & {47.0} & {2.6} & {11.7} & {1.8} & {\textgreater31.4} \\ \hline + EZTV & {928} & {294} & {\textgreater1575} & {48.3} & {6.6} & {18.1} & {4.7} & {\textgreater52.0} \\ \hline + \multicolumn{8}{l}{} \\ \hline + TVTorrents & 3590 & 1362 & \textgreater7692 & 32.5 & 104.5 & 44.1 & 17.9 & \textgreater130.7 \\ \hline + TorrentLeech & {4937} & {1030} & {\textgreater7166} & {33.9} & {25.4} & {50.4} & {16.8} & {\textgreater153.9} \\ \hline + PolishTracker & {8625} & {1331} & {\textgreater14128} & {20.6} & {63.8} & {58.0} & {20.2} & {\textgreater156.0} \\ \hline + \end{tabular} + \end{adjustwidth} +\end{table} + +% supply in private > in public +In public communities, the seeder/leecher ratio is significantly lower than in private communities that enforce SRE \cite{2009:demandsupplyres:andrade,2010:pubpriv:meulpolder}. This results in a supply and demand misalignment that affects overall swarm performance. In a private community with SRE, there are consequences for peers who do not seed. This enforcement leaves the community with many actively seeding peers, in other words, with ample supply. This does not happen in public communities, which usually suffer from undersupply. 
Two possible reasons for this are: (i) an asymmetric number of seeders and leechers, which the seeders cannot compensate for; and (ii) the lack of a higher-level incentive mechanism aside from \bt~\textit{tit-for-tat} \cite{2009:demandsupplyres:andrade}. \citeauthor{2010:pubpriv:meulpolder} stated that in private communities \textit{tit-for-tat} is almost irrelevant, as nearly all of the data comes from the seeders \cite{2010:pubpriv:meulpolder}. This is not surprising: in the same study, summarized in Table \ref{tbl:supdemand}, the seeder/leecher ratio in private communities reaches up to 1589, with an average that can exceed 100. By contrast, public communities have only 2--7 seeders per leecher on average, with a maximum ratio of 46 \cite{2010:pubpriv:meulpolder}. + +% undersupply vs oversupply +In classical peer-to-peer file-sharing systems, it is common for a swarm to be \textit{undersupplied}. Undersupply means that not enough resources are shared within the swarm to serve all the peers that want them. Because users join a swarm primarily to download a file, undersupply occurs frequently. However, with the introduction of private communities that enforce an upload policy such as SRE or a reputation mechanism, the problem has shifted to a phenomenon called \textit{oversupply}. Both undersupply and oversupply are sub-cases of supply and demand misalignment. Undersupply can be solved by adding more high-performance peers to boost the download experience. On the other hand, the oversupply problem is not trivial to solve. + +% oversupply -> fierce upload competition -> unbalance situation +In an oversupplied swarm, users may find it difficult to earn credit by uploading. This is due to a problem that \citeauthor{2011:managesupplydemand:meulpolder} called ``upload competition'' \cite{2011:managesupplydemand:meulpolder}. 
Two conditions on the peers must be fulfilled for a P2P system to be sustainable: peers must be cooperative, and cooperative peers must stay in the swarm as long as possible \cite{2011:managesupplydemand:meulpolder}. Table \ref{tbl:supdemand} shows that users in private communities seed for roughly 50 hours on average. In the upload competition problem, a cooperative peer cannot control its upload rate: which peers download the seeder's chunks is outside the seeder's knowledge and control. This may result in expulsion from a community with SRE, because the cooperative peer appears not to contribute. \citeauthor{2010:crashsustain:rahman} also stated that oversupply may cause the system to seize up by \textit{crashing} \cite{2010:crashsustain:rahman}. \citeauthor{2013:survivepriv:jia} found that a common way to survive expulsion is to seed longer. However, if this behavior persists over a long period, it might produce a significant imbalance between supply and demand, as seeders keep seeding a particular torrent without switching to another swarm. The sustainability of a swarm is at risk in this situation, as the misalignment between supply and demand can worsen the downloading experience in \bt. \begin{figure}[h] \centering @@ -133,12 +177,6 @@ \subsection{Supply and Demand} \label{fig:sysbalance} \end{figure} -% undersupply vs oversupply -In classical file-sharing peer-to-peer system, it is common to see that a swarm is \textit{undersupplied}. Undersupply means that there are not enough resource shared within the swarm to be distributed over the peers who wanted it. The reason why a user join a swarm is to download a file, so that is why undersupplied is commonly occurred. However, with the introduction of private community which enforce upload policy such as SRE or reputation mechanism, the problem shifted to a phenomenon called \textit{oversupply}. Both undersupply and oversupply is the sub-case of supply and demand misalignment. 
Undersupply condition can be solved by adding more high performance peer to boost download experience. In the other hand, oversupply problem is not trivial to solve. - -% oversupply -> fierce upload competition -> unbalance situation -In oversupplied swarm, user may find it difficult to earn the credit by uploading the file. This is because the problem described by \citeauthor{2011:managesupplydemand:meulpolder} called ``upload competition'' \cite{2011:managesupplydemand:meulpolder}. Two conditions from peers perspective must be fulfilled to make P2P system sustain, which is peers must be cooperative, and cooperative peer must stay as long as possible in the swarm \cite{2011:managesupplydemand:meulpolder}. In upload competition problem, cooperative peer can not control his upload rate. The one who will download the seeder chunks is out of the seeder knowledge and control. This may result an expulsion from the community with a SRE because the cooperative peer looks like it do not contribute. \citeauthor{2010:crashsustain:rahman} also stated that oversupply may result to system seize up by \textit{crashing} \cite{2010:crashsustain:rahman}. Sustainability of a swarm is in a risk in this situation. - % balance -> not sustain \citeauthor{2011:managesupplydemand:meulpolder} in his work illustrated the relation between various P2P system properties and its relation to system balance. The illustration shown in figure \ref{fig:sysbalance}. Request and seeding behavior is another term of user downloading and uploading behavior, respectively. Seed selection is one part that responsible for choosing which peer to seed. In \cite{2011:managesupplydemand:meulpolder}, \citeauthor{2011:managesupplydemand:meulpolder} showed that using naive random seeding behavior is not sufficient to make P2P system balance. Unbalance system can lead to unsustainable community. Therefore, it is important to work study seeding behavior for each peer by the implementation of credit mining. 
diff --git a/2_problemdesc.tex b/2_problemdesc.tex index 6de13d8..e8743a8 100644 --- a/2_problemdesc.tex +++ b/2_problemdesc.tex @@ -1,23 +1,20 @@ \chapter{Problem Description} \label{chp:relwork} -In this thesis, we introduce ``Credit mining system'', an automatic investment framework on swarm with multidimensional gain. The prototype of the framework was conducted by \citeauthor{2015:creditmining:capota}, mainly to run credit mining system without any restriction or coordination with any client. With credit mining system, locally, a user can gain credit with internally limited bandwidth allocation without any intervention needed. The credit can be in many forms such as share ratio (upload-to-download ratio), uploaded amount, effort based credit, and many other. From higher perspective, credit mining system will help a swarm to keep alive by providing integral pieces to the peer who need it. Although credit mining system will be implemented in Tribler system, it is possible to apply this feature to any file-sharing system. +In this thesis, we introduce ``Credit mining system'', an automatic investment framework on swarm with multidimensional gain. With credit mining system, locally, a user can gain credit with internally limited bandwidth allocation without any intervention needed. The credit can be in many forms such as share ratio (upload-to-download ratio), uploaded amount, effort based credit, and many other. From higher perspective, credit mining system will help a swarm to keep alive by providing integral pieces to the peer who need it. Although credit mining system will be implemented in Tribler system, it is possible to apply this feature to any file-sharing system. -\todo{3 problem - 1 prior work - 2 question} In this chapter, three problems that are the main concern of this thesis will be elaborated. First we will discuss cooperation and performance problem in peer-to-peer system, specifically \bt. 
The characteristics and importance of cooperation in peer-to-peer network \bt~will be reviewed. Secondly, the issue of \bt~credit and investment will be explained as the main concern of this work. It covers the importance, potential gaining, and desired effect of the possible credit investment. Thirdly, we will illustrate how much the bandwidth resource might be consumed by crawling in \bt~ecosystem. After specifying problems, prior works on credit mining will be reviewed. Further improvements on those work are the core of this thesis. Lastly, two research questions will be formulated. \section{Cooperation and performance in \bt} % social in p2p -In higher abstraction level, it is common to see P2P system, specifically in \bt, as social networking. Many social challenges, such as incentives mechanism, economic value to survive in the community, reputation identification, and user anonymity, addressed in this kind of network. All of those challenges involves peer behavior whether to help each other for the greater goods, selfishly consume all the resource without giving back, or inconsistently act between these two. It can be interpreted as maximizing their benefits and giving as little as possible. \textit{Freeriding} is the term given to this kind of behavior. It is often to describe this peer as \textit{freeriders}. Based on study by \citeauthor{2000:freeridegnutella:adar}, lots of P2P peers are always show self-interest and rationality, that is, freeriding \cite{2000:freeridegnutella:adar}. In Gnutella case, it even reaches 70\% of its user. However, \citeauthor{2005:bittorrentcooperation:andrade} showed that \bt~is indeed increased cooperation with only less than 10\% peer is uploading something. In \textit{private community}, this has gone better with higher SLR \cite{2005:bittorrentcooperation:andrade}. Even \citeauthor{2015:freeriderinbtcommunity:das} found that freerider in \bt~ does not deteriorate system performance\cite{2015:freeriderinbtcommunity:das}. 
All of this fact, however, does not change the fact that P2P users generally are still selfish \cite{2014:userbehaviourprivate:jia}. +In higher abstraction level, it is common to see P2P system, specifically in \bt, as social networking. Many social challenges, such as incentives mechanism, economic value to survive in the community, reputation identification, and user anonymity, addressed in this kind of network. All of those challenges involves peer behavior whether to help each other for the greater goods, selfishly consume all the resource without giving back, or inconsistently act between these two. It can be interpreted as maximizing their benefits and giving as little as possible. \citeauthor{2000:freeridegnutella:adar} showed that lots of P2P peers are always show self-interest and rationality \cite{2000:freeridegnutella:adar} that can be categorized as freeriding. However, \citeauthor{2005:bittorrentcooperation:andrade} showed that \bt~is indeed increased cooperation with only less than 10\% peer is uploading something. In \textit{private community}, this has gone better with higher SLR \cite{2005:bittorrentcooperation:andrade}. Even \citeauthor{2015:freeriderinbtcommunity:das} found that freerider in \bt~ does not deteriorate system performance\cite{2015:freeriderinbtcommunity:das}. All of this fact, however, does not change the fact that P2P users generally are still selfish \cite{2014:userbehaviourprivate:jia}. % cooperation is important in bt In a \bt~system, cooperation between peers is crucial to keep a file available in the network. With more user provides the file, the download speed gained for other users will be increased as well. However, this needs user enthusiasm for providing the file regardless of its needs. For both public and private communities, the number of seeders becomes an issue that made a swarm unhealthy \cite{2010:pubpriv:meulpolder, 2014:sustainabilitytorrent:chen}. 
With freeriders join the swarm, naturally, it will reduce the overall performance. Furthermore, when freeriders become a majority, the swarm is as good as dead \cite{2000:freeridegnutella:adar}. % public vs private community. compare performance -\bt~ community can be divided into two categories : \textit{public} and \textit{private}. A community usually served by a \textit{tracker}. Public tracker means everybody can join the swarm served by that tracker. In the other hand, private communities are closed community which can be accessed by passing particular requirement \cite{2010:pubpriv:meulpolder, 2014:sustainabilitytorrent:chen}. \citeauthor{2010:pubpriv:meulpolder} measured that private communities have 3-5 times higher download speed compared to public communities \cite{2010:pubpriv:meulpolder}. This benefit makes joining private community is typically harder compared to public community. - % issue in both community. Imbalance. -Despite has different performance, both public and private community suffer from a similar issue: ``Poor downloading experience''. It is widely known that public community generally has low SLR which directly affect the swarm performance. In the other hand, private tracker suffers from ``\textit{poor downloading motivation}'' as described by \citeauthor{2014:sustainabilitytorrent:chen}\cite{2014:sustainabilitytorrent:chen} although private community intended to solve low SLR issue. The poor downloading motivation on private tracker affect the sustainability of a swarm. The imbalance of demand and supply will harm new members of private community and gradually degrade the motivation to keep active in the community for another user \cite{2014:sustainabilitytorrent:chen}. +\citeauthor{2010:pubpriv:meulpolder} measured that private communities have 3-5 times higher download speed compared to public communities \cite{2010:pubpriv:meulpolder}. This benefit makes joining private community is typically harder compared to public community. 
Despite their different performance, both public and private communities suffer from a similar issue: a ``poor downloading experience''. It is widely known that public communities generally have a low SLR, which directly affects swarm performance. On the other hand, private trackers suffer from ``\textit{poor downloading motivation}'' as described by \citeauthor{2014:sustainabilitytorrent:chen}~\cite{2014:sustainabilitytorrent:chen}, although private communities were intended to solve the low-SLR issue. Poor downloading motivation on private trackers affects the sustainability of a swarm. The imbalance of demand and supply harms new members of a private community and gradually degrades other users' motivation to stay active in the community \cite{2014:sustainabilitytorrent:chen}. % the necessity to add more reputation management using cooperation. To monetize cooperation. Peer-to-peer file sharing community, especially \bt~ can improve the user downloading experience. It does not give strain to server connection and naturally will download as fast as possible depending on file availability. However, uncooperative peer behavior and low file availability can affect a swarm's health thus reducing download experience. To complement tit-for-tat mechanism, it is necessary to implement global incentive scheme in \bt. Some researchers start by leveraging the reputation system for peers. This also supported by \citeauthor{2002:reputationtotragedy:milinski} that reputation can help solving ``tragedy of the commons'' problem \cite{2002:reputationtotragedy:milinski}. The mechanism can be centralized on decentralized. Private communities that enforce SRE is an example of centralized mechanism. The reputation of user is stored in the server while it update the data in the communication via tracker. 
BarterCast \cite{2009:bartercast:meulpolder} and its successor MultiChain \cite{2015:multichain:norberhuis} are the example of decentralized incentive mechanism that works on top of reputation system. @@ -26,7 +23,8 @@ \section{Cooperation and performance in \bt} %Most of them focused their work on the incentives for peer or alteration of the currency system itself. Tribler for example, working on a MultiChain \cite{2015:multichain:norberhuis} as a secure and accountable currency in P2P system. \citeauthor{2008:givetogetvod:Mol} published free-riding resilient algorithm for Video on Demand (VoD) in P2P environment\cite{2008:givetogetvod:Mol}. \citeauthor{2015:incentivep2pgame:kang} used game theory as a formulation to incentivize peer in order to prevent free-riding behaviour\cite{2015:incentivep2pgame:kang}. They also considered mobile P2P network which only capable of low complexity mechanism. In their work, peers are awarded with different credit depend on connection type and content. % problem: p2p social community is good if all peer is considerable, otherwise, it sucks.bittorrent pattern flashcrowd : many S/L, only at the beginning. deteriorate afterwards. User rewarded for providing old content? -\section{The confusion in \bt~credit investment} +\section{The dilemma in \bt~credit investment} +\todo{high credit : benefit user. Too high (few user): bad for community, imbalance. When to donate/invest, vice versa} In peer-to-peer system, specifically in \bt~protocol, \textit{incentive mechanism} is introduced to tackle freeriding problem \cite{2003:bittorrent:cohen}. With limited resource possessed by each peer, there is a price that need to be paid on accessing resource. The whole collection of transaction created incentive system which should be defined in order to extract user goodness in computational way. In economic terminology, it is necessary to specify a value of the resource that can lead to the ``wealth'' of a user. 
In \bt~system ``credit'' can be defined in various object. For specific, private community such as DIME\footnote{\url{www.dimeadozen.org}}, \citeauthor{2012:economicbt:kash} defined credit as \texttt{4 x upload - download} in bytes, accumulated for all the torrent served in that community \cite{2012:economicbt:kash}. \citeauthor{2015:creditmining:capota} assume the credit on his work as the difference between uploaded and downloaded bytes. \citeauthor{2014:sustainabilitytorrent:chen} mentioned another form of credit that can be earned depend on the activity, for example, seeding more torrents, seeding longer and old torrent, and seeding torrent that consumes large disk space\cite{2014:sustainabilitytorrent:chen}. In \bt, incentive system is enforced by its choking algorithm. It prefers to give the resource to the one who has the highest credit, in this case, the upload rate. In general file-sharing peer-to-peer application, user will receive credit when they upload data to others. @@ -47,55 +45,13 @@ \subsection{Gain return and performance by spending credit} Recent work on helping other user to increase downloading performance using \bt~ has been done. \citeauthor{2014:cloudseed:leon} uses \bt~ protocol to increase user download speed and at the same time reduce datacenters load. They analyze which swarm or file to help using user bandwidth information and number of connected user\cite{2014:cloudseed:leon}. From another perspective, \citeauthor{2015:coalitionbt:zhang} introduced the \textit{coalition} between \bt~ peers. Coalition is a set of peers that cooperate each other in regards to \bt~policy to minimize download completion time. They also propose coalition-compatible choking strategy to replace the current \bt~one. This research lead to significant performance improvement within the coalition \cite{2015:coalitionbt:zhang}. 
Although not using \bt~protocol, in \citeyear{2009:p2phelp:he}, \citeauthor{2009:p2phelp:he} proved that helper peer also can improve the streaming capacity in P2P system\cite{2009:p2phelp:he}. \citeauthor{2016:gameauctionp2pstream:mostafavi} extend this work by introducing auction aspect for uploader to choose which user will receive the bandwidth he donate \cite{2016:gameauctionp2pstream:mostafavi}. \citeauthor{2016:gameauctionp2pstream:mostafavi} used game-theory to propose new framework in uncooperative peers with maximizing the credit gain for helpers. % the importance of healthy swarm -> public good, prevent tragedy of the common. user is selfish, it is good to donate. Reality : they want something in return -> credit. Move the problem into investment problem. -The ideal situation of balanced high performance and sustainability in \bt~community is desired by align supply and demand as discussed in section \ref{section:suppdemand}. By gifting and helping undersupply community adequately, the optimal situation can be achieved and tragedy of the common can be prevented \cite{2002:reputationtotragedy:milinski}. However, commonly, that is not the case. P2P user are typically selfish in economical way \cite{2014:userbehaviourprivate:jia}. By gifting, it will take ones resource without any return. In fact, common user wanted some return as compensation. Therefore, investment is the most feasible method to balance user and community needs. +The ideal situation of balanced high performance and sustainability in \bt~community is desired by align supply and demand as discussed in section \ref{section:suppdemand}. By gifting and helping undersupply community adequately, the optimal situation can be achieved and tragedy of the common can be prevented \cite{2002:reputationtotragedy:milinski}. However, commonly, that is not the case. P2P user are typically selfish in economical way \cite{2014:userbehaviourprivate:jia}. 
\citeauthor{2009:demandsupplyres:andrade} also showed that users who contribute more to the community actually consume a lot from it. This suggests that \bt~users are not altruistic enough to seed continuously. Gifting consumes one's resources without any return, whereas common users want some return as compensation. Therefore, investment is the most feasible method to balance user and community needs. % find low price, sell high price In classical economic principal, the key to gain benefit is to buy low and sell high. However, this property depends on the item and the market condition. If we translate it into \bt~ economic environment, the item is the file, and the condition is the swarm. \citeauthor{2012:economicbt:kash} introduced term is \textit{resale value}. Resale value is the amount of \textit{gross} credit one will get by uploading a file. In DIME case, it is 4 times uploaded bytes. In other words, resale value is the amount of return one can expect by uploading a file. We saw this mechanism as a way to incentivize user. Because by uploading one byte, a user can get 4 credit which can be used to spend/download 4 bytes. By finding popular item and suitable swarm, the potential of investment become huge. -\section{Optimizing resource} -To gain credit, seeding is necessary. However, users are forced to seed for excessively long time to maintain adequate credit \cite{2013:survivepriv:jia}. \citeauthor{2013:survivepriv:jia} also stated that this activity is commonly practiced although it is not productive. By seeding unproductively, user wastes his resources, such as bandwidth, storage capacity, and computer power. - -In larger scale, if we meant to seed a lot of torrents, for example, a million, the bottleneck occurred will be more fundamental. \citeauthor{2012:milliontorrent:arvid} shows an example how costly the \textit{announce} request accompanied by \textit{response} payload can be. 
Seeding 1 million torrent with announce once per every hour, which is half of the default interval, need 130 kB/s upload and 75 kB/s download bandwidth constantly \cite{2012:milliontorrent:arvid}. This value is significant for most of common Internet connection. - -Most researches have measured \bt~ by crawling its community pages \cite{2013:survivepriv:jia, 2005:bittorrentcooperation:andrade, 2014:userbehaviourprivate:jia, 2010:pubpriv:meulpolder, 2014:sustainabilitytorrent:chen, 2012:economicbt:kash, 2013:investmentcm:capota, 2009:demandsupplyres:andrade, 2011:interswarm:capota}. This way, they can get the data summarized by the pages. Some researches contact the tracker regularly or using its dump logs \cite{2011:yoshida:crawlbtnet, 2005:bittorrentcooperation:andrade, 2015:freeriderinbtcommunity:das, 2011:interswarm:capota}. Most of the research using logs as its dataset only use single tracker to monitor a particular torrent. Few of the research use instrumented client or directly observe the \bt~environment by understanding peer behavior \cite{2010:pubpriv:meulpolder, 2013:swarmevolution:su}. A work conducted by \citeauthor{2010:btworld:wojciechowski} discussed several method of \bt~measurement. The methods shown in table \ref{tbl:btmeasuremethod}. - -\begin{table}[ht] - \centering - \caption{\bt~measurement techniques \cite{2010:btworld:wojciechowski}} - \label{tbl:btmeasuremethod} - \begin{tabular}{|l|l|l|} - \hline - \rowcolor[HTML]{C0C0C0} - \multicolumn{1}{|c|}{\cellcolor[HTML]{C0C0C0}\textbf{Level}} & \multicolumn{1}{c|}{\cellcolor[HTML]{C0C0C0}\textbf{Advantage}} & \multicolumn{1}{c|}{\cellcolor[HTML]{C0C0C0}\textbf{Disadvantage}} \\ \hline - Internet & Excellent coverage & ISP collaboration \\ \hline - Community & Implementation & Peer details \\ \hline - Swarm & Details & Context \\ \hline - Peer & Details & Scalability \\ \hline - \end{tabular} -\end{table} - -In the following section, we will focus on the measurement technique with peer discovery. 
With the trackless \bt evolved in early 2008 by DHT protocol \cite{2008:dht:loewenstern}, monitoring trackers may result in inaccurate result. Moreover, the credit mining mechanism heavily rely on real-time data which can not be obtained from querying the community pages. +This phenomenon is accumulated by the existing of \textit{flashcrowd} effect. Flashcrowd effect is the sudden increase in resource demand due to various reason. Newly published torrent is one of the reasons where flashcrowd effect take place \cite{2013:swarmevolution:su}. \todo{take advantage of flashcrowd} -% peer discovery DHT, PEX, LSD -\subsection{Peer Discovery} -One of the integral part in \bt~protocol is peer discovery. With numerous known peers, the algorithm will have more option on which peer to unchoke. State of the swarm itself often represented by the peer belong to that swarm. As mentioned before, it is relatively costly just to discover new peers if there are a lot of swarms monitored. - -In \bt, there are four methods to discover new or update peer. Those are using centralized trackers, distributed hash table (DHT), peer exchange (PEX), and local service discovery (LSD). The methods will be described below. - -\subsubsection{Tracker Peer Announce} -In original design of \bt, it uses tracker to allow peer discover each other \cite{2003:bittorrent:cohen}. Tracker tends to use random and limited list of peers. Peer contact tracker periodically to expand their peer dictionary. This act of requesting peer to tracker is called \textit{announce}. Usually, most tracker has a policy about recommended interval when to recontact for getting new peers. Violate this policy can result a particular peer blocked. - -\subsubsection{Distributed Hash Table (DHT)} -Originally, peer needs to contact tracker to fetch new peer address and file information. This makes \bt~very dependent on centralized system which vulnerable to single point of failure. 
In 2008, Distributed Hash Table (DHT) was proposed \cite{2008:dht:loewenstern}. Towards a ``trackerless'' \bt~system, DHT allows each peer to become a tracker. DHT stores peer contact information with defined key-space as ``node ID''. Each peer stored other peer's node ID and its address in their own routing table. A ``distance'' is measured on two node ID to define how close those two. ``Distance'' also can be measured between infohash of a torrent and node ID. - -To enrich its peer dictionary, a node can compare a torrent's infohash and node ID in its routing table. If the distance under the threshold, it contacts that node to ask the information of the swarm, which includes the peer list. If contacted node do not know this torrent, it will respond with another node in its table which closest to the provided infohash. -%\todo{expand_:DHT performance?} - -\subsubsection{Peer Exchange (PEX)} -To increase the chance of getting higher downloading speed, having up to date peer is desired. This can be achieved by contacting tracker or using DHT. Reducing the interval of contacting tracker can result in getting a number of updated peer sooner, however, it will put a burden on the tracker itself. Peer Exchange (PEX )\cite{2015:PEX:the8472} is proposed to tackle this problem. PEX used list of peers that bootstrapped from another mechanism. This mechanism allows contacting known peer directly to get and give up-to-date information on swarm. Theoretically, it can keep this swarm together if trackers are down. Specification mentioned in \cite{2015:PEX:the8472} stated a restriction such as number of request per minute and number of peer added or removed in a PEX message. - -\subsubsection{Local Service Directory (LSD)} -To increase the performance when downloading from a swarm, it is preferable to get the file from local network if available. Local service directory (LSD) permit this by discover peers that are in the same local network. 
The transfer rate is much higher compared to other type of peers. In short, LSD uses multicast-like mechanism which broadcast infohash of a torrent. %\\ %Anonymous Relaying performance in Tribler \cite{2015:onionroutetribler:stokkink}\\ %Significant portion when seeding million torrents \cite{2012:milliontorrent:arvid} @@ -109,9 +65,9 @@ \subsubsection{Local Service Directory (LSD)} \section{Prior Credit Mining Research} \label{section:cmprior} -Preliminary work on credit mining has been done by \citeauthor{2015:creditmining:capota} \cite{2015:creditmining:capota, 2013:investmentcm:capota, 2014:bwmarket:capota}. On the prototype they made, they implement complex method with speculative download to assess the swarms\cite{2013:investmentcm:capota}. Extending this work, they introduced \textit{helper} peer to seed low capacity swarm using libtorrent \textit{share mode}\cite{2014:bwmarket:capota}. Recently, they moved into multiple swarm approach and using public community as their research object. With swarm selection policy, they observed whether helper peer can generate high credit with less downloading\cite{2015:creditmining:capota}. +Our work is based on preliminary work by \citeauthor{2015:creditmining:capota} from 2010 till 2013 \cite{2015:creditmining:capota, 2013:investmentcm:capota, 2014:bwmarket:capota}. On the prototype they made, they implement complex method with speculative download to assess the swarms\cite{2013:investmentcm:capota}. Extending this work, they introduced \textit{helper} peer to seed low capacity swarm using libtorrent \textit{share mode}\cite{2014:bwmarket:capota}. Recently, they moved into multiple swarm approach and using public community as their research object. With swarm selection policy, they observed whether helper peer can generate high credit with less downloading\cite{2015:creditmining:capota}. \citeauthor{2015:creditmining:capota} conducted emulation and simulation in their work. 
-\begin{figure}[ht] +\begin{figure}[t] \centering \includegraphics[width=0.7\textwidth]{pics/SDE2013.png} \caption{Speculative download mechanism \cite{2013:investmentcm:capota}}. @@ -138,7 +94,47 @@ \section{Prospecting good investment} Investing can not be separated by another activity called ``prospecting''. We define \textit{prospecting} as the activity to identify and measure a swarm in the hopes of getting more credit by putting a some credit as capital investment. Prospecting is the initial phase of investment, therefore, it is not needed to be comprehensive and only necessary in smaller scale. In economic perspective, not all undersupplied swarm need to be seeded, even more oversupplied swarm. Choosing a swarm to be seeded is depend on a peer available resource, intention, and investment target. In credit based community, correct investment may spark the community thus improving the performance. From user perspective, good prospecting algorithm can result a high return of the resource used from both investing and prospecting. % crawl -> part of prospecting, knowing the swarm. -One important part of prospecting is to identify and measure a particular swarm. Some information can be gathered by querying tracker as central coordinator. A work by \citeauthor{2011:yoshida:crawlbtnet} is the example of contacting tracker regularly to get swarm information \cite{2011:yoshida:crawlbtnet}. As the multi-tracker structure become common, they proposed to contact only one \textit{representative tracker}, which maintain the maximum number of peers in a swarm. BTWorld\footnote{\url{http://btworld.nl/}} has identified four measurement techniques in \bt~\cite{2010:btworld:wojciechowski} as shown in table \ref{tbl:btmeasuremethod}. As investment need real-time data, both \textit{swarm-level} and \textit{peer-level} measurement seems to be the most compatible with prospecting method implementation. 
Both \textit{internet-level} and \textit{community-level} need compiled data from ISP company and community administrators, respectively \todo{any work how to measure swarm by looking at the peers?}. +Most studies have measured \bt~by crawling its community pages \cite{2013:survivepriv:jia, 2005:bittorrentcooperation:andrade, 2014:userbehaviourprivate:jia, 2010:pubpriv:meulpolder, 2014:sustainabilitytorrent:chen, 2012:economicbt:kash, 2013:investmentcm:capota, 2009:demandsupplyres:andrade, 2011:interswarm:capota}. This way, they obtain the data summarized by those pages. Some studies contact the tracker regularly or use its dump logs \cite{2011:yoshida:crawlbtnet, 2005:bittorrentcooperation:andrade, 2015:freeriderinbtcommunity:das, 2011:interswarm:capota}. Most of the studies that use logs as their dataset monitor a particular torrent through a single tracker only. A few studies use an instrumented client or directly observe the \bt~environment by understanding peer behavior \cite{2010:pubpriv:meulpolder, 2013:swarmevolution:su}. BTWorld\footnote{\url{http://btworld.nl/}} has identified four measurement techniques in \bt~\cite{2010:btworld:wojciechowski}, as shown in Table~\ref{tbl:btmeasuremethod}. As investment needs real-time data, \textit{swarm-level} and \textit{peer-level} measurements seem to be the most compatible with the implementation of a prospecting method. Both \textit{internet-level} and \textit{community-level} measurements need compiled data from ISPs and community administrators, respectively \todo{any work how to measure swarm by looking at the peers?}.
+ +\begin{table}[ht] + \centering + \caption{\bt~measurement techniques \cite{2010:btworld:wojciechowski}} + \label{tbl:btmeasuremethod} + \begin{tabular}{|l|l|l|} + \hline + \rowcolor[HTML]{C0C0C0} + \multicolumn{1}{|c|}{\cellcolor[HTML]{C0C0C0}\textbf{Level}} & \multicolumn{1}{c|}{\cellcolor[HTML]{C0C0C0}\textbf{Advantage}} & \multicolumn{1}{c|}{\cellcolor[HTML]{C0C0C0}\textbf{Disadvantage}} \\ \hline + Internet & Excellent coverage & ISP collaboration \\ \hline + Community & Implementation & Peer details \\ \hline + Swarm & Details & Context \\ \hline + Peer & Details & Scalability \\ \hline + \end{tabular} +\end{table} + +In the following section, we will focus on the measurement technique based on peer discovery. Since trackerless \bt~emerged in early 2008 with the DHT protocol \cite{2008:dht:loewenstern}, monitoring trackers alone may give inaccurate results. Moreover, the credit mining mechanism relies heavily on real-time data, which cannot be obtained by querying the community pages. Swarm measurement includes swarm size, peer properties, and swarm popularity. These data can be acquired both directly and indirectly by crawling peers in the swarm. + +% peer discovery DHT, PEX, LSD +\subsection{Peer Discovery} +One of the integral parts of the \bt~protocol is peer discovery. With numerous known peers, the algorithm has more options when choosing which peer to unchoke. The state of the swarm itself is often represented by the peers that belong to it. Merely discovering new peers is relatively costly if many swarms are monitored. \citeauthor{2012:milliontorrent:arvid} gives an example of how costly the \textit{announce} request and its accompanying \textit{response} payload can be when seeding many torrents. Seeding 1 million torrents with one announce per hour, which is half of the default interval, constantly needs 130 kB/s of upload and 75 kB/s of download bandwidth \cite{2012:milliontorrent:arvid}. This is significant for most common Internet connections.
+ +In \bt, there are four methods to discover new peers or update known ones: centralized trackers, distributed hash table (DHT), peer exchange (PEX), and local service discovery (LSD). The methods are described below. To be fully trackerless, the \textit{magnet link} extension is needed in every peer \cite{2008:magnet:hazel}. With a magnet link, a user can join a swarm and complete the download without using a \texttt{.torrent} file as its initial data. + +\subsubsection{Tracker Peer Announce} +The original design of \bt~uses a tracker to allow peers to discover each other \cite{2003:bittorrent:cohen}. A tracker tends to return a random and limited list of peers. Peers contact the tracker periodically to expand their peer dictionary. This act of requesting peers from the tracker is called \textit{announce}. Usually, a tracker has a policy on the recommended interval before a peer may contact it again for new peers. Violating this policy can result in the peer being blocked. + +\subsubsection{Distributed Hash Table (DHT)} +Originally, a peer needs to contact a tracker to fetch new peer addresses and file information. This makes \bt~very dependent on a centralized system, which is vulnerable to a single point of failure. In 2008, the Distributed Hash Table (DHT) was proposed \cite{2008:dht:loewenstern}. As a step towards a ``trackerless'' \bt~system, DHT allows each peer to become a tracker. DHT stores peer contact information, and each node is identified by a ``node ID'' drawn from a defined key space. Each peer stores the node IDs and addresses of other peers in its own routing table. A ``distance'' is measured between two node IDs to define how close they are. The ``distance'' can also be measured between the infohash of a torrent and a node ID. + +To enrich its peer dictionary, a node can compare a torrent's infohash with the node IDs in its routing table. If the distance is under the threshold, it contacts that node to ask for information on the swarm, which includes the peer list.
If the contacted node does not know this torrent, it will respond with another node in its table that is closest to the provided infohash. +%\todo{expand_:DHT performance?} + +\subsubsection{Peer Exchange (PEX)} +To increase the chance of getting a higher download speed, having up-to-date peers is desired. This can be achieved by contacting the tracker or using DHT. Reducing the interval between tracker contacts yields updated peers sooner; however, it puts a burden on the tracker itself. Peer Exchange (PEX) \cite{2015:PEX:the8472} was proposed to tackle this problem. PEX uses a list of peers bootstrapped from another mechanism. It allows a peer to contact known peers directly to get and give up-to-date information on the swarm. Theoretically, it can keep the swarm together if trackers are down. The specification in \cite{2015:PEX:the8472} states restrictions such as the number of requests per minute and the number of peers added or removed in a PEX message. + +\subsubsection{Local Service Discovery (LSD)} +To increase performance when downloading from a swarm, it is preferable to get the file from the local network if available. Local Service Discovery (LSD) permits this by discovering peers that are in the same local network. The transfer rate is much higher compared to other types of peers. In short, LSD uses a multicast-like mechanism that broadcasts the infohash of a torrent. + +\subsection{Credit mining as investment tool} The idea of credit mining system is to help undercapacity swarm, while at the same time to get credit for uploading data. The system try to find which swarm that might have high return by \textit{prospecting}. The investment, which relies in prospecting function, is considered with limited resources as additional requirement. Resource can be in several forms such as bandwidth, memory, or storage. Although the term ``good'' may be relative, we intend to show the efficiency of credit mining from different aspect.
Therefore, we define the first research question as : @@ -153,7 +149,7 @@ \section{Prospecting good investment} In order to answer the question, we formulate technical challenge that need to be solved. The challenges include engineering and performance evaluation aspect. Prospecting swarm and continuously seed to gain credit may disrupt the user activity. In the other hand, it is important to take advantage of unused bandwidth. In the previous work, it is assumed that credit mining system will consume all the bandwidth. In evaluating the system, it is necessary to observe the effect of credit mining system in a whole. This can be achieved by deploying the system in live production environment. Many characteristics of swarm such as low seeder, practically dead swarm, and new published swarm will be considered. Also, the system can be improved by evaluating the properties continuously. \section{Substituting investment cache} -In the first question, we have addressed how to gain credit as much as possible efficiently and in non-disruptive manner. However, this is not answering the limited resource available at the user disposal. In this issue, we will specifically focus on the storage limitation. The term \textit{storage} and \textit{cache} can be used interchangeably. It points to the container used to store the swarm data as the source of investment. +In the first question, we have addressed how to gain as much credit as possible efficiently and in a non-disruptive manner. However, this does not address the limited resources at the user's disposal. Investment is a tedious activity if done manually. Users are often forced to seed for an excessively long time to maintain adequate credit \cite{2013:survivepriv:jia}. \citeauthor{2013:survivepriv:jia} also state that this activity is commonly practiced although it is not productive. By seeding unproductively, a user wastes resources such as bandwidth, storage capacity, and computing power.
On this issue, we focus specifically on the storage limitation. The terms \textit{storage} and \textit{cache} are used interchangeably; both refer to the container used to store the swarm data as the source of investment. To start seeding, the data must be available locally in the storage. By having many data, there are higher chance to seed many swarms as well. Eventually, it is necessary to replace obsolete investment. Several reasons to do so such as gaining less profit, unstable credit, or unreliable swarm. Downloading a swarm from nothing is costly, especially if all the content need to be downloaded. Moreover, as mentioned before, it is better to avoid both underseeded and overseeded swarm because it will affect the sustainability. By replacing old swarm by the new swarm, the balance of the ecosystem must be remained stable. The method to find which swarm that has less impact to replace by the new potential swarm is needed. Although user can control the investing process, it is desirable to do this automatically. Therefore, we define the second research question as : diff --git a/bib/bibliography.bib b/bib/bibliography.bib index f7fa301..5cfd3fa 100644 --- a/bib/bibliography.bib +++ b/bib/bibliography.bib @@ -15,6 +15,13 @@ @Article{ 2008:dht:loewenstern year = {2008} } +@Article{ 2008:magnet:hazel, + title = {Extension for Peers to Send Metadata Files}, + author = {Hazel, Greg and Norberg, Arvid}, + journal = {BitTorrent.org. http://www.bittorrent.org/beps/bep\_0009.html. Accessed: 13 October 2016}, + year = {2008} +} + @Article{ 2008:tribler:pouwelse, archiveprefix = {arXiv}, arxivid = {arXiv:1302.5679v1}, @@ -564,3 +571,18 @@ @INPROCEEDINGS{2011:eternalseed:jia doi={10.1109/P2P.2011.6038746}, ISSN={2161-3559}, month={Aug},} + +@article{2005:indirectreciprocity:nowak, + author = {Nowak, Martin A.
and Sigmund, Karl}, + doi = {10.1038/nature04131}, + issn = {0028-0836}, + journal = {Nature}, + month = {oct}, + number = {7063}, + pages = {1291--1298}, + publisher = {Nature Publishing Group}, + title = {{Evolution of indirect reciprocity}}, + url = {http://www.nature.com/doifinder/10.1038/nature04131}, + volume = {437}, + year = {2005} +} diff --git a/thesis.tex b/thesis.tex index 31311cf..108c421 100644 --- a/thesis.tex +++ b/thesis.tex @@ -14,6 +14,8 @@ % for fancy header on table \usepackage[table,xcdraw]{xcolor} +\usepackage{multirow} +\usepackage{chngpage} % todo notes \usepackage{todonotes}