


Spotlighting Decentralized P2P File Sharing

By:

Archie Kuo and Ethan Le

CS158B

Table of Contents

1. Introduction
2. The Three Paradigms
   2.1 Centralized Directory
   2.2 Decentralized Directory
   2.3 Query Flooding
3. Kazaa
4. BitTorrent
5. Other P2P Systems
6. Social and Security Issues
7. Conclusion

1. Introduction

One of the major advantages of today’s Internet is the ability to access and share information across great distances. Traditionally, a user retrieves the desired information from a server managed by a content provider, an ISP, or a CDN. This follows the client-server model. However, another approach to content sharing eliminates the server. This type of content distribution is known as peer-to-peer (P2P) file sharing, wherein individuals, serving as both client and server, connect directly to each other and retrieve information from each other. “In a peer-to-peer network, any node is able to initiate or complete any supported transaction with any other node. Peer nodes may differ in local configuration, processing speed, and network bandwidth” (Wikipedia, . org/wiki/Peer_to_peer). “Such systems are in a sense, the direct opposite of the client-server model” (Goel, n.d., p. 1). P2P file sharing systems are classified based on their generations. In this paper, we highlight three P2P file sharing paradigms, or architectures: centralized directory, decentralized directory, and query flooding. We focus our attention on the decentralized architecture, discuss its advantages and disadvantages, and describe two specific P2P applications that use it. Finally, we take a step back and briefly touch upon attacks on P2P systems, the social impacts of P2P file sharing, and alternative P2P software.

2. The Three Paradigms

Three P2P file-sharing paradigms are widely used in today’s networks. In this section, we briefly highlight the set-up, advantages, and disadvantages of each system. We then present an in-depth analysis of the decentralized paradigm.

2.1 Centralized Directory

Centralized directory is a first-generation P2P file-sharing paradigm. It rests upon the principle that a single, large server provides the directory service. The server keeps track of every user by IP address, along with the objects or files each user wishes to share. By collecting this information from each active user, the directory server creates a “centralized, dynamic database that maps each object name to a set of IP addresses. When an active peer obtains a new object, or removes an object, it informs the directory server, so that the directory server can update its database” (Kurose and Ross, 2003, p. 167). To keep track of its active peers, the server periodically sends messages to each peer to see whether it is still alive. Alternatively, the server can maintain a permanent TCP connection with each peer, so that as soon as the connection ends, the peer is considered disconnected and is removed from the server’s database.
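The mapping described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the class name `CentralDirectory`, its method names, and the IP addresses are hypothetical and are not taken from Napster or any real directory server.

```python
from collections import defaultdict


class CentralDirectory:
    """Toy directory server: maps each object name to the peers sharing it."""

    def __init__(self):
        self.index = defaultdict(set)  # object name -> set of peer IP addresses

    def register(self, peer_ip, object_names):
        """A peer signs on (or obtains new objects) and reports what it shares."""
        for name in object_names:
            self.index[name].add(peer_ip)

    def remove_object(self, peer_ip, name):
        """A peer reports that it no longer shares one object."""
        self.index[name].discard(peer_ip)
        if not self.index[name]:
            del self.index[name]  # nobody shares this object any more

    def drop_peer(self, peer_ip):
        """Called when a peer stops answering or its connection ends."""
        for name in list(self.index):
            self.remove_object(peer_ip, name)

    def lookup(self, name):
        """Return the peers that currently share the named object."""
        return sorted(self.index.get(name, set()))


directory = CentralDirectory()
directory.register("10.0.0.1", ["song.mp3", "paper.pdf"])
directory.register("10.0.0.2", ["song.mp3"])
print(directory.lookup("song.mp3"))   # both peers share this object
directory.drop_peer("10.0.0.1")
print(directory.lookup("paper.pdf"))  # empty once the only holder is gone
```

Every lookup and every update passes through this one object, which is exactly why the single server becomes both a bottleneck and a single point of failure.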

The centralized directory file-sharing paradigm is straightforward and easy to maintain, since the administrator only needs to manage the server. However, this simplicity brings inherent flaws. The single point of management (i.e. the all-powerful server) translates to a single point of failure. “If the directory server crashes, the entire P2P application network crashes” (Kurose and Ross, 2003, p. 168). Furthermore, the heavy demands on the server create a bottleneck. With only a single server and hundreds of thousands of users requesting service at the same time, the server must maintain a gigantic database and respond to thousands of requests within seconds. The server can be swamped with far more requests than it can handle, and the resulting delay propagates to its users. Finally, it is easy to detect copyright infringement on a centralized directory system and to shut the system down. The server maintains a huge database of all shared materials, which makes it easy to track which shared material is illegal (for example, copyrighted music in MP3 files). Consequently, any violation can result in the closure of the server, which translates to the death of that network (as was the case with Napster).

2.2 Decentralized Directory

A more efficient and more pervasive paradigm for P2P file sharing is the decentralized network approach. Unlike the centralized directory paradigm, it eliminates the directory server. Decentralized networks can be further broken down into two types: super decentralized networks (i.e. decentralized directory), in which some peers act as group leaders, and equal decentralized networks (i.e. query flooding), in which all peers are equal.

With a super decentralized network, or decentralized directory, peers who are group leaders replace the central directory. When a peer signs on to the P2P application, it is assigned to a group leader. Each group leader tracks only the contents of its own group, or neighborhood, by mapping names of content to the IP addresses of its peers. This architecture forms an overlay network in which there is an edge between each peer and its group leader, and group leaders have edges to one another. It is important to note that an “edge” is a virtual, not physical, link between peers. To request an object, a peer queries its group leader, which can then search its own neighborhood or query other group leaders. Once the desired object is located, the peers connect directly to each other. Furthermore, a group leader is just an ordinary peer, not a dedicated server. Consequently, the assignment of each peer’s rank (whether group leader or regular peer) must be managed by a bootstrapping node.

When a peer wants to join the network, it first contacts the bootstrapping node. The bootstrapping node responds with the IP address of one of the group leaders, and the peer then establishes an edge with that group leader. Furthermore, when the peer initially contacts the bootstrapping node, the bootstrapping node can designate the peer as a new group leader. If a peer is a group leader, it needs to know the IP addresses of some (or all) of the other group leaders. It obtains this information as well from the bootstrapping node. Because the bootstrapping nodes are “always on” servers, peers can use DNS to locate them (Kurose and Ross, 2003, p. 170).
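The join and lookup steps above can be sketched as follows. This is a simplified model under stated assumptions: the class names, the `group_size` promotion rule (a new leader is designated only when every existing group is full), and the choice to give every leader an edge to all other leaders are ours, not details from Kurose and Ross or from any deployed system.

```python
class GroupLeader:
    """An ordinary peer that also keeps a small directory for its own group."""

    def __init__(self, ip):
        self.ip = ip
        self.directory = {}      # object name -> set of member IPs (this group only)
        self.other_leaders = []  # overlay edges to other group leaders

    def add_member(self, peer_ip, object_names):
        for name in object_names:
            self.directory.setdefault(name, set()).add(peer_ip)

    def local_lookup(self, name):
        return set(self.directory.get(name, set()))

    def query(self, name):
        """Search this group first, then ask the other group leaders."""
        hits = self.local_lookup(name)
        for leader in self.other_leaders:
            hits |= leader.local_lookup(name)
        return sorted(hits)


class BootstrapNode:
    """Always-on node that assigns joining peers to leaders (assumed policy)."""

    def __init__(self, group_size=2):
        self.group_size = group_size  # peers per group, leader included
        self.leaders = []
        self.member_counts = {}       # leader IP -> current group size

    def join(self, peer_ip, object_names):
        # Prefer an existing group leader that still has room.
        for leader in self.leaders:
            if self.member_counts[leader.ip] < self.group_size:
                leader.add_member(peer_ip, object_names)
                self.member_counts[leader.ip] += 1
                return leader
        # Otherwise designate the joining peer as a new group leader
        # and tell it about the existing leaders (and vice versa).
        new_leader = GroupLeader(peer_ip)
        new_leader.add_member(peer_ip, object_names)
        for leader in self.leaders:
            leader.other_leaders.append(new_leader)
            new_leader.other_leaders.append(leader)
        self.leaders.append(new_leader)
        self.member_counts[peer_ip] = 1
        return new_leader


bootstrap = BootstrapNode(group_size=2)
first = bootstrap.join("10.0.0.1", ["a.txt"])
bootstrap.join("10.0.0.2", ["b.txt"])
bootstrap.join("10.0.0.3", ["c.txt"])
bootstrap.join("10.0.0.4", ["c.txt"])
print(first.query("c.txt"))  # found through the second group leader
```

Note that no single directory holds the whole index; each leader answers only for its own group and relies on its edges to other leaders for everything else.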

The super decentralized network conveniently dissolves the huge, dedicated central database into small databases, each managed by a group leader. Consequently, it is harder to track copyright infringement on a decentralized architecture, and harder to shut it down. However, this paradigm has several drawbacks. First, constructing and maintaining the overlay network requires a complex protocol, coordinated through the bootstrapping node, to account for the constant arrival and departure of peers. Second, this approach does not fully decentralize the network, because the network still depends on a bootstrapping node and on the unequal ranks of group leaders and ordinary peers.

2.3 Query Flooding

In this technique, the peers first organize themselves into an overlay network. The major difference between this overlay network and the one described above is that here all the peers are equal. In other words, there is no hierarchical topology of group leaders and subordinate peers. Instead, each peer is directly connected to a set of other peers, and a complex protocol (run by an always-on bootstrapping node) maintains the overlay network as it changes. In addition, the peers do not maintain directory information relating content to IP addresses. Peers search for content using a technique known as query flooding. This technique requires that a peer forward a request to its neighboring peers, who then forward that request to their neighbors, and so on. If a peer receiving the query has the desired object, it replies to the originator of the query stating that it has the requested object. This architecture, employed by Gnutella, creates a highly decentralized network and simplifies the network design.

However, this design has a major flaw: queries flood the network, which leads to scalability problems. For example, say Alice has three neighbors, each of whom has three neighbors of their own. When Alice wants to find an object A, she sends a query to her three neighbors, who then forward this query to their three neighbors, who in turn continue to forward the query to their neighbors until a reply is received. By the time Alice’s neighbors have sent out their queries, there are 12 queries on the network for the same object (3 from Alice, and 9 from her 3 neighbors). The number of queries in the network for the same object thus grows exponentially with each hop. Extending this scenario to all peers, the network will constantly be plagued with queries, and peers will be bombarded with requests. To combat this, Gnutella put in place a hop count limit, carried in each query as a number called the node-count field. When a peer receives a query, it decrements the node-count by one. A peer stops forwarding the query to its neighbors once the node-count reaches zero. “In this manner, when a node initiates a query, the flooding is localized to a region of the overlay network. This approach reduces the query traffic at the expense of potentially not finding all the peers with the desired content” (Kurose and Ross, 2003, pp. 171-172).
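The Alice example and the hop limit can be reproduced with a small simulation. It is a sketch under our own assumptions: the function name `flood_query`, the dictionary-based overlay, and the rule that a peer ignores a query it has already seen are illustrative choices and do not reproduce Gnutella’s actual message format.

```python
from collections import deque


def flood_query(overlay, holders, origin, wanted, hop_limit):
    """Limited-scope flooding over `overlay` (peer -> list of neighbors).

    `holders` maps a peer to the set of objects it shares.
    Returns (peers that hold `wanted`, number of query messages sent).
    """
    seen = {origin}                       # peers that already handled the query
    frontier = deque([(origin, hop_limit)])
    hits, messages = set(), 0
    while frontier:
        peer, hops_left = frontier.popleft()
        if hops_left == 0:
            continue                      # count exhausted: stop forwarding
        for neighbor in overlay.get(peer, ()):
            messages += 1                 # one query message on the network
            if neighbor in seen:
                continue                  # duplicate query is dropped (assumption)
            seen.add(neighbor)
            if wanted in holders.get(neighbor, ()):
                hits.add(neighbor)        # this peer would reply to the origin
            frontier.append((neighbor, hops_left - 1))
    return hits, messages


# Alice has three neighbors, each of whom has three neighbors of their own.
overlay = {
    "alice": ["b", "c", "d"],
    "b": ["b1", "b2", "b3"],
    "c": ["c1", "c2", "c3"],
    "d": ["d1", "d2", "d3"],
    "b2": ["deep"],
}
holders = {"b2": {"A"}, "deep": {"A"}}

print(flood_query(overlay, holders, "alice", "A", hop_limit=2))  # 3 + 9 = 12 messages
print(flood_query(overlay, holders, "alice", "A", hop_limit=3))  # also reaches "deep"
```

With a limit of two hops the query costs 12 messages and finds only the copy held by `b2`; raising the limit finds the second copy at the price of more traffic, which is the trade-off the quotation describes.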

We will now examine Kazaa and BitTorrent, which use a decentralized architecture, and then highlight the social impacts of, and attacks on, P2P systems.

3. Kazaa

If a single software application is associated with decentralized networks, it is Kazaa. With a user base of over 3 million, Kazaa is the definitive decentralized network application. The basic premise of Kazaa is that users connect to one another over the Internet and share data without the need for a central point of management. Before Kazaa, P2P file sharing was already growing rapidly in popularity through applications such as Napster and Gnutella, but with the advent of Kazaa it exploded. According to a study by Liang et al. (2004), Kazaa at one point accounted for 76% of all P2P file-sharing traffic.

Kazaa, owned by Sharman Networks, has no publicly available design documentation. What is known about how Kazaa works comes from detailed studies of its behavior. As stated, Kazaa is a decentralized network that allows users to share files without the need for a central agent. What is of interest here is that not all nodes in the Kazaa topology belong to the same class. The Kazaa application classifies nodes into two categories: Ordinary Nodes (ON) and Super Nodes (SN). This grouping of nodes is a feature of FastTrack, the second-generation P2P protocol that Kazaa uses. Users are designated as either SN or ON based on their system’s capabilities, such as network connection speed, bandwidth, and processing power. This is done without the user’s knowledge, so users do not know whether they have been designated an SN. Kazaa has approximately 30,000 SNs, and each one acts like a traffic hub that processes data requests from slower ONs, serving approximately 60 to 150 ONs. Each ON is assigned an SN upon logging into the network. The ON gives its SN the metadata for the files it shares, which allows the SN to keep a database of all the files its children are sharing. The resulting topology is a set of trees, with the root of each tree communicating with the roots of other trees.

When a user makes a request, the request propagates to the SN, which in turn communicates with other SNs. The SNs then communicate with their respective ONs, the child nodes of those ONs, and so on until the Time To Live (TTL) of seven expires, giving the user a search that is seven levels deep. Once the correct file is located, the file transfer occurs directly between the two nodes over Hyper Text Transfer Protocol (HTTP), without going through the SN.

In more detail, when a user makes a request, the user’s ON sends a query containing keywords over a TCP connection to its SN. For every match, the SN returns the IP address and metadata of the matching node. Each SN also maintains TCP connections with other SNs, creating an overlay network among SNs. When an SN receives a query, it forwards the query to one or more directly connected SNs, so the query visits only a small subset of the network. SNs shuffle their directly connected SNs approximately every 10 minutes, which allows a larger area of the network to be reached over time.
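The keyword query and the periodic reshuffling of SN connections can be sketched as below. Kazaa’s real protocol is proprietary, so everything here is a hypothetical model: the `SuperNode` class, the `title` metadata field, the `fanout` parameter, and the random reshuffle are our assumptions for illustration only.

```python
import random


class SuperNode:
    """Toy Kazaa-style super node (SN); not the real FastTrack protocol."""

    def __init__(self, name):
        self.name = name
        self.index = []      # (owner IP, metadata) pairs uploaded by child ONs
        self.neighbors = []  # SNs this SN currently holds connections to

    def upload_metadata(self, owner_ip, metadata):
        """An ordinary node reports one shared file to its super node."""
        self.index.append((owner_ip, metadata))

    def local_matches(self, keywords):
        """Files in this SN's own database whose title contains every keyword."""
        wanted = [k.lower() for k in keywords]
        return [(ip, meta) for ip, meta in self.index
                if all(k in meta["title"].lower() for k in wanted)]

    def search(self, keywords, fanout=1):
        """Answer from the local index, then forward to a few connected SNs."""
        results = self.local_matches(keywords)
        for sn in self.neighbors[:fanout]:
            results.extend(sn.local_matches(keywords))
        return results

    def shuffle_neighbors(self, all_supernodes, rng, count=2):
        """Replace the current connections with a fresh random selection."""
        candidates = [sn for sn in all_supernodes if sn is not self]
        self.neighbors = rng.sample(candidates, min(count, len(candidates)))


sn_a, sn_b, sn_c = SuperNode("A"), SuperNode("B"), SuperNode("C")
sn_b.upload_metadata("10.0.0.5", {"title": "Blue Song"})
sn_c.upload_metadata("10.0.0.9", {"title": "Blue Song (live)"})
sn_a.neighbors = [sn_b, sn_c]

print(sn_a.search(["blue", "song"], fanout=1))  # only SN B is asked
sn_a.shuffle_neighbors([sn_a, sn_b, sn_c], random.Random(0))
```

Because each query reaches only `fanout` neighbors, a single search sees a small part of the network; reshuffling the neighbor list is what lets later searches reach different parts of it.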

Detailed analysis shows that Kazaa uses four kinds of TCP traffic: signaling traffic, file transfer traffic, commercial traffic, and instant messaging traffic. Signaling traffic involves the handshake process for connection establishment. All signaling traffic is encrypted. File transfer traffic carries the actual data being transferred from peer to peer without the aid of an intermediate node. This traffic is sent unencrypted over HTTP. The third type, commercial traffic, consists of advertisements sent over HTTP. The fourth type, instant messaging traffic, carries messages that peers send to each other and is encoded as Base64.

As popular as Kazaa was, it is now facing a downturn in usage. Competition and reliability are the driving factors behind Kazaa’s decline in popularity. The next section focuses on Kazaa’s competition.
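It is worth noting that Base64 is an encoding, not encryption: anyone who captures the bytes can decode them without a key. The short sketch below uses Python’s standard `base64` module; the message text is made up, and how Kazaa actually frames its instant messages is not something we can show here.

```python
import base64

message = "hello from a peer"  # hypothetical instant message

# Encode the UTF-8 bytes of the message as Base64 text.
encoded = base64.b64encode(message.encode("utf-8"))

# Any receiver (or eavesdropper) can reverse the encoding without a key.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded == message)
```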

4. BitTorrent

The main alternative to Kazaa today is BitTorrent. So what exactly is BitTorrent? BitTorrent is a peer-to-peer network program that achieves data distribution through user cooperation. Essentially, the users of the network are the network. The data that users download is also the data that they distribute, all without the need for a centralized agent. So why should the general public care? One answer is that “BitTorrent is a free-speech tool.” It allows information to flow freely without the traditionally high costs of widespread distribution. These costs include material, time, and medium costs. If we use a newly created song as an example, the material costs of traditional distribution include pressing CDs, printing artwork, and printing advertising material. The medium costs include distribution to media outlets, getting playtime on radio and TV, and bandwidth for Internet distribution. Time is also a big factor, because all of these tasks take a great deal of time, which contributes to a high opportunity cost. BitTorrent is able to bypass these costs and reach a large audience with virtually no cost for materials, time, or medium.

So how does BitTorrent achieve widespread data distribution so successfully at almost no monetary or bandwidth cost to the distributor? The crux is user cooperation. A requesting machine receives the data in chunks from various users rather than from one central agent. Files are broken up into smaller chunks, usually a quarter of a megabyte in size. As the chunks are distributed throughout the decentralized network, the requesting machine receives them in random order and is able to reassemble the file even though the chunks arrive out of sequence. The requesting machine is able to recognize which chunks are missing and to locate the peers with the best connections from which to fetch them. At the same time, the requesting machine can use its upload channel to distribute the pieces it already has to other requesting peers on the network. This model of distribution provides exponential growth: the more peers that download the file, the more peers there are to distribute it. As one description puts it, “Cooperative distribution can grow almost without limit, because each new participant brings not only demand but also supply.” This leads to nearly limitless growth at little cost.

Now that we have examined how BitTorrent works, how does it compare to other file sharing systems? Some of the alternatives to BitTorrent are eDonkey, eMule, and Kazaa (Mitchell, n.d.). The eDonkey file sharing system, popular in Europe, has a larger number of files being shared and downloaded, which leaves less bandwidth per download compared to BitTorrent. Also of note is that the original version of eDonkey, eDonkey2000, did little to prevent leeching. eDonkey is available on Linux, Macintosh, and Windows. Another alternative to BitTorrent is eMule. The goal of eMule was to improve upon eDonkey by building a larger user base, connecting the eDonkey user base with several other networks. To prevent leeching, eMule uses a “credits” system that rewards file sharing. The main drawback of eMule is that it is slower than most other P2P networks. The most popular alternative to BitTorrent is Kazaa. Both fast and easy to use, Kazaa uses a “participation level” system to encourage sharing. When multiple requests come in for a file, the user with the highest participation level gets it first. That user then passes it on to the user with the next highest level, and so on. This creates a pyramid system of sharing.
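Splitting a file into chunks and rebuilding it from chunks that arrive in random order can be sketched as follows. This is a minimal illustration, assuming each chunk travels with its index; the function names are ours, and the sketch leaves out everything else a real BitTorrent client does (peer selection, uploading, integrity checks).

```python
import random

PIECE_SIZE = 256 * 1024  # a quarter of a megabyte, as described above


def split_into_pieces(data, piece_size=PIECE_SIZE):
    """Return {piece index: piece bytes} for the whole file."""
    return {i: data[offset:offset + piece_size]
            for i, offset in enumerate(range(0, len(data), piece_size))}


def missing_pieces(received, total):
    """Indices the downloader still has to request from other peers."""
    return [i for i in range(total) if i not in received]


def reassemble(received, total):
    """Join the pieces by index, regardless of the order they arrived in."""
    if missing_pieces(received, total):
        raise ValueError("download incomplete")
    return b"".join(received[i] for i in range(total))


original = bytes(range(256)) * 4096 + b"tail bytes"  # 1 MiB plus a short last piece
pieces = split_into_pieces(original)
total = len(pieces)

arrival_order = list(pieces)
random.Random(1).shuffle(arrival_order)  # pieces arrive out of sequence

received = {}
for index in arrival_order[:-1]:
    received[index] = pieces[index]
print(missing_pieces(received, total))   # exactly one piece still missing

received[arrival_order[-1]] = pieces[arrival_order[-1]]
print(reassemble(received, total) == original)
```

Because pieces are keyed by index, the downloader can tell at any moment which ones are missing, and it can already serve the pieces it holds to other peers before its own download finishes.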
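The “participation level” ordering amounts to a priority queue. The sketch below shows one possible way to model it; the function name, the sample levels, and the rule that ties are broken by arrival order are our assumptions, since Kazaa’s actual scheduling is not documented here.

```python
import heapq


def serve_order(requests):
    """Order pending requests for one file by participation level.

    `requests` is a list of (peer, participation_level) in arrival order.
    Higher levels are served first; ties go to the earlier request (assumption).
    """
    heap = [(-level, arrival, peer)
            for arrival, (peer, level) in enumerate(requests)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, peer = heapq.heappop(heap)
        order.append(peer)
    return order


pending = [("ann", 10), ("bob", 250), ("cy", 100), ("dee", 100)]
print(serve_order(pending))  # ['bob', 'cy', 'dee', 'ann']
```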

5. Other P2P Systems

Since BitTorrent is still a work in progress, new developments are constantly on the horizon. Two of the upcoming developments are web seeding and broadcatching. Web seeding uses a PHP script to keep one seed permanently online so that the file never disappears. The basic idea is to set up a machine that takes care of the initial seeding; once there are enough seeds, the initial seeder can be taken down.

Broadcatching, a term coined by Fen Labalme, is a system that uses RSS feeds in conjunction with BitTorrent. Broadcatching has been defined as “a many to one gathering of information, using a network of personalized agents to ideally sift through all available information and return just that which is of possible current interest from trusted, authenticatable sources and in a form and style amenable to the user.” In other words, the user can subscribe to an RSS feed of BitTorrent files, and the client periodically checks the feed for new files and downloads them. For example, if the user likes to watch The Apprentice, the user can subscribe to an RSS feed that will download new episodes of The Apprentice automatically, similar to the Season Pass feature on TiVo.
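The periodic feed check could look roughly like this. It is a sketch only: the feed content and the `example.com` URLs are made up, the function name `new_torrents` is ours, and the code assumes the feed advertises each torrent through a standard RSS 2.0 `<enclosure>` element. A real client would fetch the feed over the network on a timer and hand each new URL to BitTorrent.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed; a real one would be downloaded from the publisher.
FEED = """<rss version="2.0"><channel><title>Example show</title>
<item><title>Episode 1</title>
  <enclosure url="http://example.com/ep1.torrent" type="application/x-bittorrent" length="1"/></item>
<item><title>Show notes</title><link>http://example.com/notes</link></item>
<item><title>Episode 2</title>
  <enclosure url="http://example.com/ep2.torrent" type="application/x-bittorrent" length="1"/></item>
</channel></rss>"""


def new_torrents(rss_xml, already_seen):
    """Return (title, url) for torrent enclosures not seen on earlier checks."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue                      # item carries no downloadable file
        url = enclosure.get("url")
        if url and url.endswith(".torrent") and url not in already_seen:
            already_seen.add(url)         # remember it for the next check
            fresh.append((item.findtext("title", default=""), url))
    return fresh


seen = set()
print(new_torrents(FEED, seen))  # both episodes on the first check
print(new_torrents(FEED, seen))  # nothing new on the second check
```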

6. Social and Security Issues

The biggest issue that decentralized networks, and more specifically P2P applications, face is the question of whether these applications are even legal. On one side, the argument is that these applications are in fact legal, because the software is simply a means of distributing data from node to node, or peer to peer, without the need for a central agent. “The technology underlying file sharing programs is not inherently bad” (The Dark Side, 2003, p. 3). There is nothing inherently wrong or illegal about such a system. What people choose to use the medium for is not the responsibility of its creators. The decentralized nature of these programs creates an unpredictable, uncontrollable environment that contains a “minority” of users who choose to conduct illegal activities on the system.

On the other side, opponents of P2P applications argue that the creators of these applications know that their users intend to use them illegally and do nothing to stop it. They argue that illegal file sharing is foreseeable, and that the creators of Kazaa and BitTorrent are therefore legally responsible for taking measures to prevent it. The biggest proponent of this argument is the music industry, which claims hundreds of millions of dollars in lost revenue due to file sharing.

Another issue faced by these P2P applications, and one that has contributed to the significant downturn in the use of Kazaa, is the issue of fake files. In 2002, the Recording Industry Association of America sued Kazaa, and while the lawsuits were being fought in court, unconventional attacks were carried out on Kazaa itself. The music industry began seeding Kazaa with fake song files, which began to propagate throughout the network. These files would begin with the first 10 seconds of the song and then be filled with bleeps and blips the rest of the way. The goal of this attack was to frustrate users to the point where they would decide to purchase the music instead. This attack is known as a poisoning attack. Other attacks include denial of service attacks, viruses, malware, and identity attacks. Denial of service attacks try to break the network, or at least make it run very slowly. Virus attacks distribute infected files over the network. Often the file description is misleading, for example posing as a popular music file. Malware attacks are similar to virus attacks, except that the files carry other unwanted software, such as spyware. An identity attack consists of tracking down the identity of a user and harassing the user or taking legal action against the user.

7. Conclusion

P2P file-sharing systems allow users to share information without the use of a server. Three architectures for P2P systems (centralized directory, decentralized directory, and query flooding) were presented, with an emphasis on the decentralized directory and query flooding paradigms. Furthermore, we examined Kazaa and BitTorrent to gain a better understanding of how decentralized systems operate. The decentralized directory and query flooding architectures offer a longer lifetime and a more reliable service because they are distributed: they have no single server whose failure or closure takes down the network. In addition, we briefly touched upon the social and security issues plaguing any P2P system.

Bibliography

BitTorrent (2001-2005). Retrieved March 16, 2005, from

BitTorrent Definition (2005). Retrieved March 16, 2005, from

Broadcatching Definition (2005). Retrieved March 16, 2005, from



Goel, Ashish (n.d.). Lecture 17: Introduction to Peer-to-Peer Systems. Retrieved March 16, 2005, from

Kurose, James F., and Ross, Keith W. (2003). Computer Networking: A Top-Down Approach Featuring the Internet. Boston: Addison-Wesley.

Liang, Jian, Kumar, Rakesh, and Ross, Keith W. (2004). Understanding Kazaa. Retrieved March 16, 2005, from

Mitchell, Bradley (n.d.) Top 7 P2P File Sharing Software Programs. Retrieved March 16, 2005, from

Peer-to-peer definition (2005). Retrieved March 16, 2005, from

Peer-to-Peer (P2P) and How Kazaa Works. Retrieved March 16, 2005, from

The Dark Side of a Bright Idea: Could Personal and National Security Risks Compromise the Potential of Peer-to-Peer File-Sharing Networks? Hearing before the Committee on the Judiciary, United States Senate, 108th Congress (2003).

Watson, Stephanie (n.d.). How Kazaa Works. Retrieved March 16, 2005, from
