ALGORITHMS FOR DATA DISSEMINATION AND COLLECTION

Diploma

ABSTRACT

Broadcasting and gossiping are classical problems that have been widely studied for decades. In broadcasting, one source node wishes to send a message to every other node, while in gossiping, each node has a message that they wish to send to everyone else. Both are some of the most basic problems arising in communication networks. In this dissertation we study problems that generalize gossiping and broadcasting. For example, the source node may have several messages to broadcast or multicast. Many of the works on broadcasting in the literature are focused on homogeneous networks. The algorithms developed are more applicable to managing data on local-area networks. However, large-scale storage systems often consist of storage devices clustered over a wide-area network. Finding a suitable model and developing algorithms for broadcast that recognize the heterogeneous nature of the communication network is a significant part of this dissertation.

We also address the problem of data collection in a wide-area network, which has largely been neglected, and is likely to become more significant as the Internet becomes more embedded in everyday life. We consider a situation where large amounts of data have to be moved from several different locations to a destination. In this work, we focus on two key properties: the available bandwidth can fluctuate, and the network may not choose the best route to transfer the data between two hosts.

We focus on improving the task completion time by re-routing the data through intermediate hosts and show that under certain network conditions we can reduce the total completion time by a factor of two. This is done by developing an approach for computing coordinated data collection schedules using network flows.


Introduction

Broadcasting and gossiping are some of the most basic problems arising in communication networks. In this thesis, we are primarily concerned with problems related to broadcasting and gossiping that arise when managing large amounts of data. We study several different problems, all of which are related to data dissemination and collection on both local-area and wide-area networks.

The problems of broadcasting and gossiping have been widely studied for decades. The broadcasting problem is defined as follows: there are n nodes, and one source node needs to convey an item to every other node. In the gossiping problem, each node has an item that they wish to communicate to everyone else. We may treat the gossip problem as simultaneously performing n broadcasts. Communication is typically done in rounds, where in each round a node may send (or receive) an item to (or from) at most one other node. One typical objective function is to minimize the number of communication rounds. In this thesis, we use this objective function as the performance metric. Another typical objective function considered in the literature is to minimize the number of calls placed.