1 - Kent



Gareth Tyson

“Peer-to-Peer Content Distribution”

B.Sc. (Hons) Computer Science

March 2005

“I certify that the material contained in the dissertation is my own work and does not contain significant portion of unreferenced unacknowledged material. I also warrant that the above statement applies to the implementation of the project and all association documentation.

Regarding the electronically submitted version of this submitted work, I consent to

this being stored electronically and copied for assessment purposes, including the

Department’s use of plagiarism detection systems in order to check the integrity of

assessed work.

I agree to my dissertation being placed in the public domain, with my name

explicitly included as the author of the work.”

Signed,

________________

Gareth Tyson

ABSTRACT

Since the emergence of the Peer-to-Peer revolution the proliferation of the technology has been unstoppable, its many benefits have helped to numb the pain of its inefficiencies but current designs mean that without severe modification to their scalability it will be simply impossible to service the requirements of the growing P2P community.

This project aims to improve the search algorithms of current P2P technology whilst still maintaining some of the better features. This report outlines the design, development and evaluation of the creation of such a P2P network with the emphasis on evaluating the chosen design and its possible derivatives.

ABSTRACT……………………………………………………………………………....3

CONTENTS………………………………………...……………………………………4

1. INTRODUCTION…………………………………………………...….7

1. General Overview of Peer-toPeer………………………………………………7

2. Project Aims……………………………………………………………………..8

3. Report Overview………………………………………………………………...9

2. BACKGROUND………………………………………………………11

1. Terminology of P2P……………………………………………………………11

2. Analysis of P2P Architectures………………………………………………...12

1. Gnutella.....................................................................................................12

2. Kazaa.........................................................................................................15

3. Napster.......................................................................................................17

4. Pastry.........................................................................................................18

3. Legal Background...............................................................................................19

4. Data Resource Management (DRM).................................................................19

1. The Generations of DRM...........................................................................20

2. Encrypted Shell..........................................................................................20

3. Marking......................................................................................................20

4. Centrally Controlled DRM........................................................................21

5. Problems with DRM...................................................................................21

5. Choices in Software and Technology................................................................21

1. C.................................................................................................................22

2. Java............................................................................................................22

3. Java RMI....................................................................................................22

4. JXTA. ........................................................................................................22

6. Justification for Building a New Type of Network..........................................23

7. Summary.............................................................................................................26

3. DESIGN..................................................................................................27

1. General Overview of the System.......................................................................27

2. Overall Structure................................................................................................27

3. Network...............................................................................................................27

1. Connecting New Nodes to the Network.....................................................28

2. Creating the Indexed Structure of the Network.........................................29

3. Creating New Supernodes..........................................................................29

4. Dealing with the Loss of Supernodes.........................................................30

5. Keeping File Lists up to Date....................................................................31

6. Distributing File Lists to Supernodes........................................................32

4. Searching.............................................................................................................32

1. Routing of Search Queries.........................................................................33

2. Search Aggregation...................................................................................34

3. How Supernodes Search their own Databases..........................................34

4. How Supernodes Perform Fuzzy Searches................................................34

5. Downloads/Uploads............................................................................................35

6. System Architecture...........................................................................................37

7. Class Overview....................................................................................................39

1. Design Patterns..........................................................................................41

8. User Interface......................................................................................................42

9. Summary.............................................................................................................46

4. IMPLEMENTATION............................................................................47

1. The GazNet Protocol..........................................................................................47

1. 000-PRESENT – 001-PRESENT? – Pinging a Node................................49

2. 030-CONNECT? – The connection process..............................................49

3. 080-SEARCH=search string – The Searching Process.............................50

4. 120-DOWNLOAD=filename=x=y – The Download Process...................50

5. 190-NEW_SUPERNODE – Supernode creation.......................................51

6. 400-ALPHA_LOCATIONS_UPDATE.......................................................51

2. The ProtocolSocket.............................................................................................51

3. Data Structures...................................................................................................52

1. Range.........................................................................................................52

2. Group.........................................................................................................53

3. The FileDatabase.......................................................................................53

4. Important Algorithms........................................................................................55

1. The Splitting Algorithm..............................................................................55

2. Multi-Source Downloads...........................................................................56

3. Aggregating Search...................................................................................58

5. Summary.............................................................................................................59

5. THE SYSTEM IN OPERATION.........................................................60

1. A Typical Session................................................................................................60

2. Summary.............................................................................................................64

6. PROCESS DESCRIPTION...................................................................65

1. The Connection Process.....................................................................................65

2. The Searching Process........................................................................................65

3. The File Transfer Process..................................................................................66

4. The Splitting Process..........................................................................................67

5. Summary.............................................................................................................67

7. TESTING................................................................................................68

1. The Test Bed........................................................................................................68

2. How the System was Tested...............................................................................68

1. Testing Utilities..........................................................................................69

2. The Simulators...........................................................................................69

3. System Test Results............................................................................................70

4. Summary of Errors.............................................................................................75

1. Networking.................................................................................................75

2. File Transfers.............................................................................................76

3. The User Interface.....................................................................................77

5. Incremental Testing............................................................................................77

6. Summary.............................................................................................................79

8. EVALUATION.......................................................................................80

1. Evaluation of Architecture.................................................................................80

1. Scalability..................................................................................................80

2. Search Algorithms......................................................................................87

3. Reliability and Robustness.........................................................................92

4. File Transfers.............................................................................................93

5. Miscellaneous............................................................................................94

2. Evaluation of User Interface..............................................................................96

1. The Welcome Panel....................................................................................96

2. The Search Panel.......................................................................................96

3. The Download Panel..................................................................................96

4. The Upload Panel......................................................................................97

5. The My Files Panel....................................................................................97

3. Alternative Solutions..........................................................................................97

1. Scalability..................................................................................................97

2. Robustness and Reliability.......................................................................103

3. File Transfers...........................................................................................104

4. Critical Summary of Evaluation.....................................................................105

9. CONCLUSION.....................................................................................106

1. Review of Aims..................................................................................................106

2. Review of Original Project Proposal...............................................................109

3. Suggested Revisions to Design.........................................................................110

4. Suggested Revisions to Implementation.........................................................111

5. Future Work......................................................................................................112

6. Lessons Learnt..................................................................................................113

7. Final Conclusion...............................................................................................113

10. REFERENCES....................................................................................115

11. Appendix – Original Project Proposal..............................................117

The project’s working documents are available at “”

1. Introduction

1. General Overview of Peer to Peer

One of the current major issues in computer science is the concept of Peer to Peer (P2P) computing. The purpose behind P2P is to allow a community of users to be able to carry out services such as sharing files without the need for a central server to run the network. In recent years the proliferation of P2P technology has been immense allowing massive distribution of digital media by parties which previously would simply not have had the resources to perform such a large scale task. The idea behind P2P is that a number of peers can form together into a network without the need to be managed by another party; instead they manage themselves, dynamically changing to deal with the addition and removal of peers. This is obviously an attractive idea and has been picked up by many companies and programmers wishing to create such networks.

The first major P2P network was Napster, which allowed peers to share and download music files. This network fulfilled many of the end requirements of this project, primarily the fast search time, however Napster’s reign of supremacy wasn’t to last forever as, for legal reasons, it was shut down, leaving a gaping cavern left to be filled by other less efficient networks such as Gnutella and Kazaa. The inefficiency of these networks is exactly the reason for this project as it strives to improve upon the glaring problems with performance and scalability that resonate throughout current P2P technology.

The intention of this project is to build a new P2P network, titled ‘GazNet’ that will improve on existing networks, in particular it aims to speed up the process in which a peer searches the network to find content. Currently the majority of P2P networks have extremely slow search times, this is because in many cases every peer in the network must be contacted to complete a search. This can be both highly frustrating for the user and a heavy burden on the underlying network. In the previous sentence it was said that the majority of P2P networks have extremely slow search times, not the entirety; this is because there are networks available that can perform high speed searches, but this performance comes at a price. These networks require the user to enter the exact name of the file they are looking for; the slightest deviation from the correct name will result in no results being returned. This fallibility has meant that there has been little popular interest in them, simply because the average user would rather wait a long time and receive a plethora of results, rather than wait a short time and receive no results. This project aims to increase search performance without the sacrifice of ‘fuzzy’ searching i.e. the ability to enter a search word and receive results even though the filename doesn’t exactly match that word, e.g. a search for ‘Help’ would return the file ‘The Beatles – Help!.mp3’. By fulfilling this requirement a network will have been devised that will no longer force the user to choose between search time and the effectiveness of searches.

1.2 Project Aims

1) To design, develop, and evaluate a P2P network to allow the sharing of files.

2) To allow fast, efficient searching of the network

a. To decrease search time compared to current P2P technologies

b. To allow fuzzy searches that don’t require specific details to be entered e.g. the exact name of the file.

c. To limit the effect network size has on search time

d. To keep search time low even when the overlay network is heavily loaded

e. To keep searches as accurate as possible, so out of date file references aren’t returned in search results i.e. references to files that have already been removed from the network

3) To make the network as scalable as possible in both terms of its performance and in the maximum number of nodes that can be concurrently connected to the network.

4) To make the system as reliable and robust as possible

a. The network must be able to recover from the loss of supernodes

b. The network should be able to a reasonable degree withstand targeted attacks by malicious users

c. Downloads must be able to be resumed without having to re-download all the data already received if transfers are interrupted.

d. The system should deal with errors in an elegant way.

5) To make the network as independent as possible i.e. avoiding the use as of centralised managed servers, removing any one point of failure making it near impossible to shut the network down.

6) To allow sharing of files over the network

a. To allow file transfers to be performed as quickly as possible hiding the user from the underlying actions

7) To minimise the time taken to connect to the network

a. To avoid the use of a centralised server to aid connection to the network.

8) To provide a highly simple, easy to use user interface

a. To let the user quickly and with ease search for files without having to understand and underlying processes

b. To provide the user with some form of discriminatory information about specific downloads so they may make an informed decision on which files to download.

c. To allow the user to modify his/her shares without difficulty

d. To allow the user to view what he/she is currently sharing

e. The interface must be responsible at all times even when actions are being carried out behind the scenes such as a search.

f. The user must be provided with progress details of downloads and be told when a file has finished downloading

g. The user must be provided with a indicator (i.e. bit rate) of how quickly a file is downloading.

Abandoned Aims

Due to time constraints some aims spoken about in the project proposal had to be dropped, these aims have been listed below; these topics have been touched on in the report but have received no implementation or in-depth analysis.

1) To provide the user with an incentive to share his/her files.

2) To create a P2P network to run over a wireless environment.

3) To allow the protection of files using DRM

a. To provide all users with the functionality to protect their own files

b. To allow the safe transfer to already licensed shares

i. To enforce the controlled distribution and sale of licences i.e. stop users from selling mp3 files but also keep the copy themselves

1.5 Report Overview

Chapter 2 (Background): This section gives an in-depth review of current P2P technologies and provides a foundation on which the report can be easily read.

Chapter 3(Design): This section aims to give the reader an overview of how the system architecture and several key processes are carried out.

Chapter 4(Implementation): This section provides a greater insight in to the workings of the system and shows the system that actually was produced.

Chapter 5(The System in Operation): This short section shows a typical session and the interaction between the system and the user.

Chapter 6(Process Description): This section explains the underlying processes that are carried out during a typical session, this includes both user initiated requests and automatic system initiated processes.

Chapter 7(Evaluation): This section evaluates the effectiveness of the finished system and how well it has fulfilled the aims set out in this chapter.

Chapter 8(Testing): This section lays out the testing schemes used to insure the working of the system and how well the system dealt with the testing.

Chapter 9(Conclusion): This section sums up how well the project aims have been fulfilled and what was done wrong.

2. Background

2.1 Terminology of P2P

Architecture – The topology of the network and how information is routed through the network.

Connecting to the network – The process in which a node becomes part of the network, this is typically carried out by finding an already connected node and then tunnelling information through that node.

Distributed Hash Table (DHT) – A type of P2P network that allocates an amount of hash space to nodes in the network, these nodes then look after all files that reside in that hash space.

File list – A list of files and their location in the network (IP address), extra details can also be kept about the files such as their size and type.

Hash algorithm – A mathematical process performed on a string to create a number that is as unique as possible for the purpose of storing data. If a hash algorithm returns 5 it could be then put the data in an array at position 5, then all that is required to locate the data is to perform the hash algorithm again to find out where it is stored. The simplest hash algorithm is to simply to use the first letter of the string being hashed.

Hash Key – The value returned from a hash algorithm

Hash space – The hash keys that a particular area of storage holds, e.g. using a hashing algorithm that hashes on the first letter in a string, position 0 in an array could look after the hash space ‘A’.

Node – Any computer that is connected to the network

P2P Generations – P2P architectures can be split into three generations, the first is centralised P2P structures such as Napster which uses a central file server to search the network. Second generation networks use decentralised file lists. The third is Distributed Hash Tables.

Servent – A node that is neither a server nor a client, instead it performs both duties, this is a frequent occurrence in P2P networks as every node should be the same.

Supernode – A high performance node that is used in the network to reduce the burden on both the overall network and on individual nodes. This is usually done by caching file lists so a smaller number of nodes need to be searched to find files however they can also be used in different fashions such as caching actual files to reduce download time.

2.2 Analysis of P2P Architectures

2.2.1 Gnutella

The Gnutella protocol is probably the most robust P2P protocol available, it is fully distributed and therefore very resilient to faults. The first process a Gnutella servent must carry out is to connect to GnutellaNet, this is done by connecting to an already connected node, currently the most prevalent method for finding connected nodes is through node caches that maintain databases of existing nodes on the network. Once an already connected node has been found, the new node must then attempt to connect to it, to do this it initiates a TCP connection to the servent then sends the following request string (encoded in ASCII),

GNUTELLA CONNECT/\n\n

To this, the servent will reply with

GNUTELLA OK\n\n

if it wishes to accept the connection, any other reply will indicate that the connection has been refused.

When a servent wishes to perform a search it will channel the query through the node it is connected to, it should be noted that a servent can have multiple connections to other servents, in such a situation queries will be forwarded onto all the connected servents. If a servent receives a Query that matches content in its shared folder it will then reply with a QueryHit Descriptor, informing the searching node that it has content of interest to it. The Gnutella protocol is summaries in Table 2.1,

|Descriptor |Description |

|Ping |Used to discover other nodes on the network, a ping request should be replied with a Pong |

|Pong |The response to a Ping; it tells the pinging node that the servent is present and other information such as |

| |its IP address and statistics on the data it is sharing |

|Query |Used to search the network, if a servent receives a Query which matches one or more files that it owns it |

| |will reply with a QueryHit |

|QueryHit |The response to a Query; it will inform the searching servent of information about the host such as its |

| |connection speed, it will also provide a list of the files that match the Query. |

|Push |A mechanism that allows a firewalled servent to contribute file-based data to the network |

Table 2.1 Overview of Gnutella Protocol

Every protocol Descriptor from Table 2.1 will be encapsulated in a Descriptor packet, this is shown in Diagram 2.1. There are 5 fields in the Descriptor, the Descriptor ID which is a uniquely identifiable string, the Payload Descriptor which indicates what type of Descriptor, a TTL field which limits the packets ‘reach’ in the network, the Hops field which records the number of nodes the Descriptor’s traversed and a Payload Length field which contains the length of the payload. This header is followed by the payload, which will be one of the Descriptors from Table 2.1.

The TTL field is used to limit the reach that the message will have over the network, every servent that receives a Descriptor packet will decrement the TTL before it is forwarded on, if a servent receives a Descriptor with a TTL of 0 it will not forward it on, therefore deleting the packet from the network.

|Descriptor ID |Payload Descriptor |Time to Live (TTL) |Hops |Payload Length |

|0 15 |16 |17 |18 |19 22 |

Diagram 2.1 – Descriptor Header

The TTL is the most important field in the Descriptor, this is because this factor controls how large the network is, in practical terms. The number of nodes in the network possesses no significance to an individual user, this is because a singe node only has a certain degree of ‘reach’ in the network, this ‘reach’ is set by two factors, the TTL and the number of nodes it is connected to. If there was no TTL field, a Descriptor would simply circle the network being forwarded by node after node, as the GnutellaNet tree isn’t a spanning tree these Descriptor’s would get forwarded back to nodes that had already received them and therefore never leave the network, it is therefore impossible to actually search the whole network as, unless the searching node knows the entire topology of the network, it is likely that messages will just start looping through the network uncontrollably if a suitability large TTL field is given to extend reach to the furthest away nodes.

Downloads are performed using HTTP; this is because HTTP is an well established file transfer protocol that can carry out the required purpose perfectly well, it is more suited to the application than a protocol such as FTP as it requires no logging in process to be carried out.

This architecture has obvious benefits, it’s fully distributed layout(Diagram 2.2) means that it would be at worst extremely hard and at best impossible for the network to be shutdown. This means that users can always expect the network to be fully working. The only weak point in the structure is the locating of a node to connect to in the first place but this could be overlooked as the problem is resident in every P2P architecture. Its search mechanism means that all queries are carried in real-time meaning that all results will be completely up to date therefore eliminating the frustration of attempting to download files that have already been removed from the network.

Gnutella’s high resilience to break-down however unfortunately comes at a cost and this cost is performance; out of all the P2P architectures Gnutella performs by far the worst, this is because when a node wants to find a file the query must be propagated over the whole network meaning that every single node connected to the network must receive every single query (if the whole network was to be searched). This would be acceptable if the network was running over a high speed LAN with only 500 users but the amount of network traffic created when a search is sent over the internet with 3 millions users running the servent is immense. Obviously it would soon get impractical making Gnutella’s biggest problem scalability, the simple fact is that over any sort of large network Gnutella will not scale, and this is discussed further in the next subsection. Take this problem into the realms of Wireless networking and it suddenly explodes, when someone on a desktop PC searches for a file they will generally be running the internet over a fixed rate connection with multiple web browser windows open. This means that long search times are far more tolerable as the user can both afford the long wait and also entertain themselves during it by carrying out other tasks such as checking their e-mails. A wireless user is generally less fortunate as their connection is usually a metered one and there is less multi-tasking available for the user to carry out things whilst they wait making the search time extremely expensive.

Gnutella’s Scalability (Ritter, 2001)

It has been mentioned earlier that Gnutella is highly un-scalable but exactly to what degree does Gnutella’s architecture limit scalability. In this short analysis of Gnutella’s scalability a few variables will be used,

P – The number of connected users

N – The number of connections held open to other nodes

T – The Time to Live(TTL) of packets, these are used to age packets to stop queries from circling round the network forever.

Due to the structure of Gnutella, P is never relevant to your potential reach, in reality the only limiting factors are N and T. Therefore raising N or T will raise the number of nodes searched and decreasing N and T will reduce the number of nodes that can be searched.

| |

|Class |Description |

|Client |This class calls the P2Pgui class and will create a instance of GazNet |

|Supernode |This class creates a Supernode. |

Table 3.1 – Overview of GazNet Package

|“GazNet.database” package |

|Class |Description |

|DataFileFilter |This object is used solely by IO to filter out files of type ’dat’, these files are incomplete downloads. |

|FileDatabase |This class holds FileDescription classes in a hash table, it can perform a variety of necessary functions |

| |on the database, including searching it, converting it to different formats (e.g. an array) and extracting|

| |a particular set of files. |

|FileDescription |This class represents files, it contains four variables, the file’s name, the file’s size, the owner of |

| |the file (i.e. the IP address) and a hash key. The hash key is not actually a hash key but a string that |

| |is used for searching and will be modified by each supernode that the file passes through; originally the |

| |hash key will be the same as the filename but as the FileDescription passes through each tier the first |

| |letter of the hash key will be removed, this is to facilitate the recursive nature of the search |

| |algorithm. |

|IO |This class deals with all Input/Output aspects of the system such as getting a list of the shared files |

| |and reading/writing administrative files such as ‘NodeFile.dat’. |

|ListOfFiles |This object implements the interface, Enumeration and is used to step through FileDescription arrays. |

|NodeDatabase |This class hold a database of NodeElement classes |

|NodeElement |This class represents nodes in the network |

Table 3.2 – Overview of GazNet.database Package

|“GazNet.download” package |

|Class |Description |

|CompletedListener |This is an interface for the Observer design pattern, it uses the method percentageChanged to |

| |follow the progress of a download/upload. |

|Download |This class performs a single download from one node. |

|DownloadManager |This class implements the Singleton design pattern and therefore this one class manages all |

| |downloads. It is responsible for selecting which nodes to download from and it deals with the |

| |calculations for performing multi-source downloads. |

|Upload |This class performs a single upload to one node. |

|UploadConnections |This class listens on a given port for download requests and will pass the request onto the Upload |

| |class if one is received. |

|UploadControl |This class ensures that large transfers are performed without the connection freezing. |

|UploadListener |This is an interface for the Observer design pattern which monitors new upload requests. |

Table 3.3 – Overview of GazNet.download Package

|“GazNet.gui” package |

|Class |Description |

|DownloadPanel |This class extends javax.swing.JPanel and deals with the displaying off all the current |

| |downloads. |

|MyFilesPanel |This class extends javax.swing.JPanel and deals with which files the user shares and where |

| |downloaded files are stored. |

|P2Pgui |This class extends Tabs and contains the main method to run GazNet. |

|PopupListener |This class listens for right click on the tables in DownloadPanel, SearchPanel and UploadPanel |

| |and will display a popup menu if appropriate. |

|SearchPanel |The class extends javax.swing.JPanel and allows the user to search for files and then displays |

| |the results. |

|SuperNodeConsole |This class is used for debugging and allows someone who’s running a supernode to view the nodes|

| |connected to it and the contents of the FileDatabase. |

|SuperNodeGui |This class extends SuperNodeConsole and adds the functionality of an ActionListener so the user|

| |can actually interact with the console. |

|Tabs |This class combines all the panels into a javax.swing.TabbedPane. |

|UneditableTableModel |This class extends the DefaultTableModel to restrict the user from editing tables that contain |

| |file information. |

|UploadPanel |This class extends javax.swing.JPanel and displays all the current uploads |

|WelcomePanel |This class extends javax.swing.JPanel and is the default tab to be displayed when the gui |

| |starts up. |

Table 3.4 – Overview of GazNet.gui Package

|“work” package |

|Class |Description |

|Connection |This class deals with connecting a node to the network |

|NetworkStartup |This class aids other classes in starting up the network, it follows the Singleton design |

| |pattern and contains many static variables that other classes will reference. It is |

| |responsible for locating and maintaining supernode addresses and when running as a supernode |

| |updating the appropriate servers with new addresses. |

|Node |This class deals with all interactions between the node and the supernode (excluding |

| |searches). It also deals with messages from other nodes such as popup message requests. |

|NodeConnections |This class listens for connections from other nodes and then passes them on to the Node class |

|Search |This class is used by nodes to search for a file. |

|SuperNode |This class deals with all interactions with the supernode from both other supernodes and |

| |nodes. |

|SuperNodeConnections |This class listens for connections to the supernode and then passes them on to the SuperNode |

| |class. |

Table 3.5 – Overview of work Package

|“GazNet.util” package |

|Class |Description |

|Alpha |This class is performs operations on strings, characters and numbers that convert them into |

| |different formats depending on their purpose. |

|BitRate |This class extends StopWatch and calculates the bit rate of a transfer and also provides other |

| |utilities to aid in the displaying on transfer information |

|FTPClient |This is an FTP client used to transfer files that contain the locations of supernodes to web |

| |servers |

|Group |This is a transfer class that wraps all classes that are transmitted over the network. |

|ProtectedInteger |This class is used to syncronise downloads and the compiling of search results. |

|ProtocolSocket |This class follows the Wrapper pattern and wraps the .Socket class adding functionality|

| |such as the ability to send file lists. |

|Range |This class holds the range of hash space that a supernode looks after, it also hold the tier |

| |that the supernode is on. |

|Rating |This class deals generating the performance rating of a node. |

|Reporter |This class is used to print out all debugging messages. |

|Semaphore |This class is a semaphore and is used by the ProtectedInteger class |

|StopWatch |This class measures periods of time. |

Table 3.6 – Overview of GazNet.util Package

3.7.1 Design Patterns

To improve performance and increase the ease and elegance of the system’s development, three design patterns were necessary

Singleton

In a system such as this, it is frequently required for certain variables to stay constant throughout the system, a perfect example of this comes in the form of the routing table; multiple classes will use the routing table however multiple classes can also modify the routing table, it is therefore imperative that some form of control is used to stop two classes using a different routing table. The NetworkStartup class maintains the routing table and a number of other variables that should remain consistent throughout the system. The solution devised was to use the Singleton Design Pattern, a method is provided called getInstance(), calling this method will instantiate a instance of the class and hold a reference to it in a static variable, however if the variable had already been instantiated this instance would be returned instead of creating a new one. By also making the class’s constructor private it would mean that only one instance of the object can ever exist, making this the perfect choice to implement in the NetworkStartup class.

Observer

In certain situations in the system it is necessary for a multitude of objects to keep a single object informed of changes in its status. An example of this is in the progress field of the transfer panes; every time another percent of the file is transferred this must be displayed in the progress field. As, many simultaneous transfers can take place at one time the task of following each file’s progress can become quickly unmanageable. To carry out this process the observer design pattern is used; all classes that must listen (i.e. the transfer table in the transfers pane) must implement an appropriate interface containing a method that can be called to notify them of a change (e.g. updateProgress). Any object wishing to be informed of changes will be added to a vector, then when a change occurs the object will cycle through the vector calling the update method on each listening class.

Decorator

In the system it is sometimes necessary to add functionality to a class in a dynamic way therefore eliminating it from performed through inheritance. Such an instance is in the ProtocolSocket, where extra functionality above a standard java Socket must be supported. To do this the decorator design pattern is employed, the ProtocolSocket creates a shell around the standard java Socket and will pass on method calls to the Socket however before this, it may make modifications to the parameters. Alternatively it may make modifications to what the Socket returns. An example of this is when the read() method is called, the Socket will return an array of bytes but the Protocol Socket will return a string.

3.8 User Interface

The user interface is an extremely important component of the overall system as this is the only way a user has of interacting with the network, one of the aims of the system was to provide a simple, easy to use interface which will therefore require a balance between the level of control the user has over the program and the level of complexity faced by the user.

There were two original designs, the first was a simple looking straight forward GUI (design A) and the second was a GUI more closely modelled on existing P2P file sharing GUIs such as Kazaa (design B).

Design A’s strength lies in its simplicity, it will be both easy to implement and easy to use; any user with a basic grasp of the purpose of the software will be able to begin using the interface within minutes of seeing it. However the limited design allows no customisation and gives the user little extra functionality over the very basics; an example of this is the inability to resize the screen or to select more advanced search options.

[pic]

Diagram 3.5 - GUI design A

Design B’s strengths are exactly what Design A’s weakness are, it will provide more options for the user such as the ability to select different types of searches (film, video, exe etc) and provide features like the resizing of the window. It is obvious that this type of UI would be superior to Design A for many different purposes, mainly everyday common use, this is because users would probably frequently use the application and therefore become quickly accustomed to the GUI and therefore be able to exploit the extra features, such as advanced searching which would allow user to have more control over their search results.

Despite these differences, the two designs have many similarities as they both work on the same main principals. Both have tabs with similar titles and many features reside on both GUIs, the main difference lies in the size and therefore the extra features that can be put into the larger design (Design B).

[pic]

Diagram 3.6 - GUI Design B.

Design A was chosen because it suited most closely the requirements of the system as the system aims to focus on networking aspects rather than HCI (Human Computer Interaction) aspect, because of this, the UI is one of the lower ranking requirements (number 8). Therefore the UI is there to serve the purpose of supplying an interface between the network and the user (or tester) with the greatest ease of use rather than the most functional degree of use. Design A can best serve this purpose as its simplicity allows for quicker implementation and quicker testing, despite this main function, the GUI is also perfectly adequate for use by an average user.

Full Design

The design splits down into five panes, each contained on one tab; this first, the welcome screen, can be seen in Diagram 3.5, it contains two elements, a message of the day screen which is intended to contain information about updates and other information, and a connected status symbol to tell the user if he/she is connected to the network or not.

[pic]

Diagram 3.7 - The search panel

The search pane intends to provide a simple method for searching the network, the user need only type in the search words in the text field then click search. Any results returned will be displayed in the table.

[pic]

Diagram 3.8 - The Download/Upload panel

The upload and download panels are identical, they both contain a table which lists the files being transferred along with the progress of the transfer (in percentage) and the bit rate.

The most complex panel is the ‘My Files’ panel, which looks after the folders that the user wishes to share and the location in which the user wants his/her downloaded files to be put. The panel also displays the number of files that are being shared and their total size. The panel is dominated by a tree which will contain all the folders and all the files in those folders that are shared, the tree can be modified by using the Add and Remove buttons which will allow folders to be added or removed from the shared list. The update button will manually force the node to update the file references on the supernodes in the network. The bottom text field will contain the location in which all downloaded files are placed; the browse button allows the user to find the appropriate file using a file selection window.

Diagram 3.9 - The ‘My Files’ Panel

3.9 Summary

This chapter has given a relatively high level overview of how the system should be implemented, excluding modification made during incremental testing this system should be an accurate representation of the end system. The following chapter covers the implementing of the design and hopes to give the reader a more low-level knowledge of how the system actually works.

4. Implementation

4.1 The GazNet Protocol

Up until now GazNet has always been a reference to the designed application, however a more accurate definition would be that it is the protocol that the application merely implements. The protocol is ASCII based and works over TCP running on port 4001. It is text based, primarily to aid debugging; this was decided because of the complex nature of the network and therefore in anticipation of a number of problems. Another major factor in the choice of a text based protocol was its greater extensibility; it is likely that with prolonged testing and usage that a number of modifications to the protocol will need to be made, similarly as the network increases in size changes may need to be made to improve scalability. Extending a bit oriented protocol may become a complex affair, this is because many fields are already set and cannot be changed in the name of backward compatibility, similarly the size of packets are already set, therefore they cannot be expanded to support extra features such as longer search fields or IPv6 addresses. It is therefore clear that, at least in the early stages of the protocol, that a text based approach is superior, however it is not perfect. The biggest problem with text based protocols is their inefficiency, a good example is GazNet’s pong message (response to a ping), which is 000-PRESENT(Supernode); this message is 22 bytes long despite the fact that there are only two sections, the message type (000-PRESENT) and the type of node, (Supernode), this message could easily be replaced by a much smaller bit orientated message using one byte for the message type (allowing 256 different types of message) and only one bit for the node type. Diagram 7.1 shows such a protocol message, the first 8 bits show the message type (the number 1 represents a pong), and the 9th bit represents what type of node the reply has come from (0 for node and 1 for supernode). This has more than the halved size of the protocol message, which may not make that much difference on a small scale network but on a large network with millions of nodes it will represent a massive reduction in network traffic.

|Message Type |Node Type |

|0 |0 |0 |

|Supernode/ Node |000-PRESENT(Supernode/Node) |A response to a ping, the value in the brackets represents the|

| | |type of node that is replying |

|Supernode/ Node |001-PRESENT? |A ping to find out if a node is online and what its node type |

| | |is (Supernode or node) |

|Supernode/ Node |002-OPEN? |A message that is sent to verify if the TCP connection is |

| | |still open, there is no reply because a exception will |

| | |automatically be created if there is the connection is not |

| | |open |

|Node |010-ALHPA_LOCATIONS? |A request for the current addresses of all 26 top tier |

| | |supernodes |

|Supernode |011-NEW_ALPHA_LOCATIONS= |A message indicating that a node’s routing table is incorrect,|

| | |this message will be followed by a array containing the 26 top|

| | |tier supernodes |

|Supernode |022-NEW_SUPERNODE=supernode |This message will be sent if a newer supernode has been |

| | |created that must deal with a node’s request. |

|Node |030-CONNECT? |A request sent by a node to find out if a supernode will |

| | |accept new connections |

|Node |031-JOIN? |A request sent by a node to find out if a supernode will |

| | |accept more file lists. |

|Supernode |040-CONNECT_OK |A reply from a supernode to a 030 request accepting the |

| | |connection from a node |

|Supernode |041-JOIN_OK |A reply from a supernode to a 031 request accepting a new file|

| | |list from a node |

|Supernode |042-CONNECT_REFUSED |A reply refusing either a 030 or 031 request from a node |

|Supernode/ Node |050-LIST |A request from a supernode for a the file list to be sent |

|Supernode/ Node |060-LIST_READY |A reply from a node to a 050 request which will be followed by|

| | |a file list. |

|Supernode/ Node |070-LIST_OK |A reply from a supernode confirming that the file list is |

| | |valid. |

|Supernode/ Node |071-LIST_ERROR |A reply from a supernode indicating that the file list has |

| | |either not been received or is invalid |

|Node |072-RATING=rating |A message containing the performance rating of a node |

|Supernode |073-RATING_OK |A reply from a supernode saying that the performance rating is|

| | |valid. |

|Supernode |074-RATING_ERROR |A reply from a supernode saying that the performance rating is|

| | |invalid |

|Node |080-SEARCH=search string |A search request from a supernode |

|Supernode |090-SEARCH_OK |A reply from a supernode saying that the query is valid |

|Supernode |091-INVALID_SEARCH |A reply from a supernode saying that the query contains |

| | |invalid characters e.g. ‘@’ |

|Supernode |100-SEARCH_LIST_READY |A message saying that the search list has been prepared. |

|Node |110-SEARCH_LIST_OK |A message saying that the search list has been received and it|

| | |is a valid format |

|Node |120-DOWNLOAD=filename=x=y |A request to download a file stating at offset x of length y |

|Node |130-DOWNLOAD_OK |A reply sent to indicate that the file resides on the node and|

| | |it can be downloaded |

|Node |131-FILE_NOT_PRESENT |An error message indicating the file requested is not in the |

| | |user’s shared folders |

|Node |140-DOWNLOAD_READY |A message sent to the uploading node to indicate the download |

| | |is ready to commence |

|Supernode |190-NEW_SUPERNODE |A message to a node requesting that it converts into a |

| | |supernode |

|Node |200-ACCEPT_SUPERNODE |A reply sent by a node accepting a 190 request |

|Supernode |210-SENDING_LIST | |

|Supernode |400-ALPHA_LOCATIONS_UPADTE | |

|Supernode |410-UPDATE_OK | |

|Supernode |420-UPDATE_REFUSED | |

|Supernode/Node |900-MESSAGE=message |A request sent to a node for a popup message to be displayed |

|Supernode/ Node |999-UNRECOGNISED_COMMAND |A reply sent if the message isn’t recognised as a valid |

| | |protocol message |

Table 4.1 – Protocol Overview

4.1.1 000-PRESENT – 001-PRESENT? – Pinging a Node

When a node (or person) wishes to verify that a node is still online a ‘001-PRESENT?’ request will be sent; the reply will either be 000-PRESENT(Supernode) or 000-PRESENT(Node) depending on the node’s type.

4.1.2 030-CONNECT? – The Connection Process

When a node wishes to connect to the network it must connect to a supernode, the first step is to send a ‘030-CONNECT?’ request to a supernode, a reply will be then sent accepting (‘040-CONNECT_OK’) or refusing (‘042-CONNECT_REFUSED’) the request. If the request is refused it will be followed by a ‘022-NEW_SUPERNODE= supernode’ message, the supernode will refer to the most recently created supernode; next the supernode will send an array containing the 26 top tier supernodes – this will ensure that the node has all the up to date addresses, the 022 message is sent because that supernode referenced will have the greatest chance of being open to new connections as it is youngest supernode available.

If the request is accepted the supernode will send a ‘050-LIST’ request which asked the node to send its file list, the node will then send a ‘060-LIST_READY’ message followed by its file list, the supernode will then reply with a ‘070-LIST_OK’ message or a ‘071-LIST_ERROR’ depending on whether the list was valid.

The next stage is for the node to inform the supernode of its performance rating, this is measured by a number between 0 and 100, and is sent in a ‘073-RATING=rating’ message. If the number is valid the supernode will reply with ‘073-RATING_OK’ else it will reply with ‘074-RATING_ERROR’.

The majority of supernodes that a newly connected node will contact will receive join requests opposed to connect requests. When a node joins a supernode it does not inform it of its performance rating and the supernode does not record the node in its node database instead it merely adds its file list to the file database. The same process is followed as for a connect request but a ‘031-JOIN?’ message is sent instead of a ‘030-CONNECT?’ message. Diagram 4.2 shows the interactions involved in a successful 030-CONNECT? request.

[pic]

Diagram 4.2 – Supernode-Node Interaction during the Connection Process

4.1.3 080-SEARCH=search string – The Searching Process

To search the network a node will first check the first letter of the search string, it will then look up the correct supernode to contact in its routing table, the node will then send a ‘080-SEARCH=search string’ to the supernode where ‘search string’ is a query that the user wants to search for. The supernode will then check that the search string is valid i.e. it begins with the correct letter and it doesn’t begin with a non-alpha character, if it is valid it will reply with a ‘090-SEARCH_OK’ message else it will reply with a ‘091-INVALID_SEARCH’ message. Once the supernode has searched the file database and compiled the results it will send a ‘100-SEARCH_LIST_READY’ message to the node to indicate the list is ready, it will then send the file list. If the list is valid the node will reply with ‘110-SEARCH_LIST_OK’.

4.1.4 120-DOWNLOAD=filename=x=y – The Download Process

When a node wishes to download a file it will send a ‘120-DOWNLOAD= filename=x=y’ request to the node that the file resides on. The filename refers to the full filename of the file, x refers to the offset that it wishes the download to begin at and y refers to the number of bytes to transfer from that offset; x and y are used to facilitate multi-source downloads. If the file resides on the node and the node is willing to commence a transfer it will reply with a ‘130-DOWNLOAD_OK’ message, if the file doesn’t reside on the node it will send a ‘131-FILE_NOT_PRESENT’ message. When the downloading node is ready to start the download it will send a ‘140-DOWNLOAD_READY’ then the uploading node will start sending the data over the same connection.

4.1.5 190-NEW_SUPERNODE – Supernode Creation

When a supernode wishes to split and create a new supernode it will first select the highest performance node from its node database and then send a ‘190-NEW_SUPERNODE’ message. The node will then reply with a ‘200-ACCEPT_SUPERNODE’ message, then the supernode will send a file list containing all the FileDescription object that the new supernode must add to its file database, if the file list is valid the new supernode will reply with a ‘070-LIST_OK’ message – the creation of the supernode will then be complete.

4.1.6 400-ALPHA_LOCATIONS_UPDATE

One of the most important requirements of the protocol is to keep the routing information on the network up to date, if one supernode becomes unsynchronized the whole network could start to collapse. To ensure this doesn’t happen supernodes must inform their parent supernodes of any changes in their routing table. Changes only occur when a supernode splits so whenever this happens a supernode will contact the supernode that created it and give it the new routing information. To do this a supernode sends its parent a ‘400-ALPHA_LOCATIONS_UPDATE’ message, the parent will then reply with a ‘410-UPDATE_OK’ message; the child supernode will send a Group containing the modified supernode addresses. Alternatively the parent supernode may reply with a ‘420-UPDATE_REFUSED’ message in which case the child will not be able to send the update.

4.2 The ProtocolSocket

The GazNet protocol makes use of a variety of different transport methods, the most frequently used method of communication is threw ASCII strings, however on top of this, the protocol uses object transmission and byte streams; because of this a class called ProtocolSocket was defined to perform all tasked involved in the transmitting the protocol.

|Method |Description |

|protocolFetch():String |Receives a protocol message e.g. 080-SEARCH |

|protocolSend(String s) |Sends a protocol message |

|getList(): Group |Receives a Group object (see section 4.3.2) |

|sendList(Group g) |Sends a Group object (see section 4.3.2) |

|getObject(): Object |Receives an Object |

|sendObject(Object o) |Sends an Object |

Table 4.2 Function of the ProtocolSocket class

The ProtocolSocket is a wrapper class, it contains an instance of a Socket class which is used to actually transmit data but before it is forwarded onto the Socket it is in some way modified, for example the protocolSend(String s) method accepts a String as a parameter but passes a byte array on to the Socket’s write method.

There are two constructors in the ProtocolSocket class, the first accepts a String and an integer, the String contains the IP address to connect to and the integer is the port to connect to; this constructor is used when a node wishes to initiate a connection with another node. The second constructor accepts a Socket as a parameter, the ProtocolSocket then sets this as the Socket to wrap; this constructor is used when a node has received a connection from another node.

The ProtocolSocket class also carries out the important function of maintaining a list of all the connections a node has, this is important because

a) The application is multi-threaded and therefore connections need to be managed to prevent them from interfering with each other

b) Some connections are persistent and kept open throughout the node’s life, it is therefore necessary to ensure that multiple connections to the same node aren’t opened to prevent inefficiencies and errors.

c) Sometimes a class that doesn’t have a reference to the correct ProtocolSocket might need to contact another already connected node, by keeping a connections list it allows any class to access any Socket that it requires.

The connections list is held in a static Vector in the ProtocolSocket class; whenever a new ProtocolSocket is constructed it will first check the connections list to ascertain whether or not a connection already exists to that IP address and port. If no open connection exists the Socket variable in the new ProtocolSocket is initiated and the new Socket is added to the activeConnections Vector. If the connection already exists no new Socket is initiated and the Socket variable inside the new ProtocolSocket is set to the existing Socket from the activeConnections Vector.

4.3 Data Structures

The whole premise of P2P file sharing is the distribution of data, therefore data structures play an important part in this project. The following section deals with the three important data structures the Range, the Group and the File Database.

4.3.1 Range

The Range class is a very important class in the implementation; it is used to represent the range of letters (A-M, N-Z etc) that a supernode is looking after. When a supernode splits it will modify its own Range in accordance of its new duties, it will also send a Range object to the new supernode informing it of the range of letters that it is looking after. The Range class stores ranges using two integers that have the values 0 – 25 representing the letters A – Z; these values are stored in the variables, first and last and are referred to as the first range and the last range. Due to GazNet’s tree like structure it is also necessary that the Range class has the ability to differentiate between the tiers of the network tree; to do this another integer called tier is used, the first tier is represented by 0 and each lower tier is represented by the number of hops from the first tier e.g. the second tier is represented by 1.

4.3.2 Group

The Group class is the main object of network transfer used; it is used to store arrays of other objects when they are being transferred over the network. It has two private variables, and Object array, contents[] and a Range object, range. The object is primarily used to transfer arrays holding FileDescriptions however it is also used to transfer arrays containing supernode addresses, because the Group class stores Objects any subclass of Object can be used.

4.3.3 The FileDatabase

The FileDatabase is the most important data structure in the project, it is responsible for holding looking after files and performing a number of different operations on them.

The type of element stored in the FileDatabase is the FileDescription; this class holds information specific to each file, the following table list the variables stored in it.

|Variable |Type |Description |

|fileName |String |This is the filename of the file it represents |

|fileSize |String |This is the size of the file it represents |

|owner |String |This is the IP address of the computer that the file resides on |

|hashKey |String |This is used to search for files. Every time the file is passed down to another a lower tier, the|

| | |first letter is removed from the hashKey (which originally equals the fileName). Similarly when a|

| | |search request is passed down to a lower tier the first letter is removed from the search string,|

| | |this recursive process makes the implementation far more elegant. |

|beenUsed |boolean |This is used in the search algorithm; because multiple copies of one FileDescription are |

| | |distributed throughout the network it is possible that one supernode may end up storing multiple |

| | |copies of the same FileDescription in its database. Therefore when searching its databases the |

| | |beenUsed variable will be set to true after it has been added to the search results; only |

| | |FileDescriptions with their beenUsed variable set to false will be checked, this is to prevent |

| | |multiple copies of one FileDescription being returned in search results. |

Table 4.3 – Overview of the FileDescription class

The FileDatabase serves two purposes, when being used in a node it stores all the files that reside on that node however, very little of the functionality is used and its main purpose is to split the files up into the appropriate hash space.

The main functions of the FileDatabase are used when the computer is running as a supernode, obviously its abstract purpose stays the same, to store files, but a supernode will be storing a much higher number of files from all over the network so therefore performance is a big issue.

How the File Database stores files

This is probably the most important aspect of the File Database, as this its primary function. Files are stored using the same hash function as is used on the overall network i.e. files the hashed by their first letter. Each instance of a File Database has an array containing 26 Vectors, each Vector holds FileDescriptions whose hashKey begins with a particular letter. When a FileDatabase object is constructed it is given a list of FileDescriptions which are then linearly scanned and placed in the appropriate vector. The FileDatabase also has a load method which allows extra FileDescriptions to be added to the database at any time. It is obvious that this is not the most efficient way of storing file references as faster lookup time could be obtained by using a different hashing algorithm however by using this technique much faster supernode splitting times can be obtained as little processing has to go on to find out which FileDescriptions to be sent to the new supernode, all that needs to be done is to pass on the Vectors that lie in the alpha range of the new supernode.

How the File Database is searched

There are two methods in the FileDatabase class used for searching, a public method, searchKey(String searchString) and a private method, search(String searchString, int vector, Vector searchFiles, Vector changed). searchKey receives a String as a parameter, this String is what the File Database is being searched for, it will split the String up into separate words and then call search methods passing in each separate word. The search method takes 4 different parameters, the first is the searchString, then second is the index of the Vector array that should be searched, the third is a Vector in which the search results should be put into, the fourth is a Vector in which all the searched FileDescriptions should be put into, this is so that the beenUsed variable (see Table 7.2) can be set back to false.

When the search method is called it will do a linear search of the appropriate Vector looking for matches, any search results will be placed in the searchFiles Vector. As one supernode store more than one Vector of FileDescriptions it is possible that multiple search methods will be called on different Vectors; the same searchFiles Vector will be passed into every call of the search method. Once every search method has completed the searchFiles vector will contain all the results from that supernode.

Other functions

The FileDatabase class also carries out a number of other functions such as the removal of files and the retrieval of Ranges of files. Probably the most notable other function carried out is the cutList() method. When a supernode creates a new tier, it cannot simply pass a portion of its File Database on to the new supernode as it would do if it was splitting on the same tier. Instead it must halve the database on the second letter rather than the first, it therefore uses the method cutList() which returns a new FileDatabase containing all the FileDescriptions in the supernode but arranged in the database based on their second letter, this FileDatabase can then be split just as if the supernode was splitting on the same tier.

Another important feature of the FileDatabase is that of storing the Range which the supernode is looking after, this is kept in a static variable that can be accessed by any other class.

4.4 Important Algorithms

The following section will outline three important algorithms in the implementation, the splitting algorithm, the multi-source download algorithm and aggregating search algorithm.

4.4.1 The Splitting Algorithm

Undoubtedly the most important algorithm in the entire system is the splitting algorithm; the modification of this algorithm can either improve or degrade performance dramatically, more so than any other element in the system. The splitting algorithm deals with removing files from a supernode’s file database and allocating responsibility for them to another supernode. The process of splitting is carried out over two methods, split() and createNewSuperNode(Range r), the former calculates the new range for the child supernode to look after and the later deals with removing the appropriate file references and updating routing tables.

The system works on the premise that supernodes will look after blocks of contiguous letters, as long as this is adhered the splitting algorithm can be changed in any way without affecting the rest of the system. The splitting algorithm used is the most simple available, every time the supernode splits its hash space is halved and the later segment is allocated to a new supernode. When a supernode reaches a state in which it is only looking after one letter (e.g A), it then must split to create a new tier, this tier will look after the next letter in the word (e.g. Ae), when a supernode splits to a new tier exactly the same splitting algorithm is used, the hash space is simply halved again but this time based on the second letter.

public Range split()

{

if(the supernode only looks after one letter)

{ //needs to create a new tier

set this supernode’s range to 0-13

return a Range of 14-25 with a tier of this supernode’s tier+1

}

else

{

int first = the first range of this supernode

int last = the last range of this supernode

int myLast = ((last-first)/2) + first

set this supernodes range to first – myLast

return a Range of myLast+1 – last

}

}

The second part of the splitting process is to actually carry out the split based on the Range supplied by the split() method.

public boolean createNewSuperNode()

{

Range childRange = split();

boolean sentOK = notify child supernode of its duties

if(sentOK==true)

{

remove childRange from file database

set child range in routing table to new supernode’s addr

return true;

}

else

{

return false;

}

}

4.4.2 Multi-source Downloads

An important part of the overall project is the downloading process as this constitutes the largest proportion time spent on the network. It is therefore necessary to make the process as fast as possible. The method used by GazNet and many other file share applications is that of multi-source downloads i.e. downloading different parts of the file from multiple different nodes therefore utilising as much bandwidth as possible.

Two classes are involved in the download process, the DownloadManager and the Download class. The DownloadManager implements the Singleton design pattern and manages all downloads.

When the user wishes to start a new download, the application calls the addDownload(FileDescription[] files, int selected) method, the array contains all the files that the user’s search returned and the integer is the index in the array of the file that they wish to download. The DownloadManager will then search the array to find matches with the selected file, matches are decided based on whether the filename and the file size match. From this process a maximum of 5 file matches will be returned, the next process is to divide the file into portions that can be downloaded from separate sources.

Splitting the file into download portions

There are multiple processes that take place to carry out a multi-source download, these processes are implemented in 4 different methods; these methods support the pausing and resuming of files.

This method generates the length of each segment depending on how much has been previously downloaded therefore allowing downloads to be resumed.

long calculateChunkSize(FileDescription file, int numDownloadPoints, int num)

{

chunkSize = size of file / number of sources

if (exists a data file containing part of the download)

{

remove this portion of the file from the chunk size

}

return chunkSize

}

The calculateStartPoint method finds the position in which the download should begin, this is calculated by adding the start position of the segment and how much of the file has already been downloaded, together.

long calculateStartPoint(int num, long chunkSize)

{

if (exists a data file containing part of the download)

{

return (how far the download has already progressed + the position

of start of the segment)

}

}

The calculateFinalChunk method calculates the size of the final segment of the file, it is calculated by taking the starting point of the download away from the total size of the file.

long calculateFinalChunk(long startPoint, int num)

{

return total size of file – starting point of final download

}

The final method uses the previous three methods to initiate a new download,

for(number of download sources-1) //loop through all but the last download

{

calculate chunk size;

calculate start point;

begin new download

}

calculate final chunk size

begin new download of remainder of file

When a point to point download is initiated, it is handled by the Download class, if it is a multi-source download it is given five parameters, public Download(FileDescription f, long first, long len, ProtectedInteger fin, File dataFile). The FileDescription is the file that is to be downloaded, this will also contain the IP address of the download point; first is the position in the file to start to download from; len is the number of bytes to download from the starting point; fin is used to coordinate all the downloads so the DownloadManager can tell when all the segments have been downloaded; dataFile is the file in which the bytes should be put into.

4.4.3 Aggregating Searches

Supernodes will frequently, especially on a small network, look after ranges of hash space rather than just one letter (hence the need for a Range data structure rather than merely one integer), it is therefore necessary to sometimes send multiple search words to one supernode. Without any extra processing a node would simply send multiple separate searches to the supernode, this would represent a waste of bandwidth, as every search query would also have extra information with it (i.e. the overhead of the protocol messages).

The search aggregation algorithm is split into two sections, the first breaks up the search string into separate tokens and selects the correct supernode to send them to; the second part aggregates the search words that are going to the same supernode into the same

query.

String[] strings = new String[26];

while( still more search words)

{

int index = value of first letter in search word //A=0, B=1…Z=25

strings[index].concat (next search word)

}

This algorithm will create an array of size 26 with the appropriate search words in each element, e.g. using the example Beach Boys, strings[1] will contain “Beach Boys”, if it was “The Beach Boys”, strings[1] will contain “Beach Boys” and strings[19] will contain “The”. This array is then passed onto the next algorithm which will decide which addresses to send the queries to.

Range range = new Range // range = 0 to 25

set last range to 0

for (int i = 0; i ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download