The Information Funnel: Exploiting Named Data for Information-maximizing Data Collection




by Shiguang Wang, Tarek Abdelzaher, Santhosh Gajendran, Ajith Herga, Sachin Kulkarni, Shen Li, Hengchang Liu, Chethan Suresh, Abhishek Sreenath, Hongwei Wang, William Dron, Alice Leung, Ramesh Govindan, John Hancock
In Proceedings of the 10th IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS), Marina Del Rey, CA, May 2014.

This paper describes how hierarchical data names can be exploited to achieve information-utility-maximizing data collection in social sensing applications. We describe a novel transport abstraction, called the information funnel. It encapsulates a data collection protocol for social sensing that maximizes a measure of delivered information utility; that is, it minimizes data redundancy by diversifying the data objects that are collected. The abstraction leverages named-data networking, a communication paradigm in which data objects, rather than hosts, are named. We argue that this paradigm is especially suited to utility-maximizing transport in resource-constrained environments, because hierarchical data names give rise to a notion of distance between named objects that is a function only of the topology of the name tree. This distance, in turn, can expose similarities between named objects, which can be leveraged to minimize redundancy among objects transmitted over bottlenecks, thereby maximizing their aggregate utility. With a proper hierarchical name space design, our protocol prioritizes the transmission of data objects over bottlenecks to maximize information utility, under very weak assumptions on the utility function. This prioritization is achieved merely by comparing data name prefixes, without knowledge of application-level name semantics, which makes it generalizable across a wide range of applications. Evaluation results show that the information funnel improves the utility of the collected data objects compared to other lossy protocols.
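
The two ideas the abstract relies on, a distance that depends only on the name tree and a prioritization that compares only name prefixes, can be illustrated with a minimal Python sketch. This is not the protocol from the paper. The example names, the function names (`shared_prefix_len`, `tree_distance`, `prioritize`), and the greedy selection rule are all assumptions made for illustration, and the sketch assumes a name space in which objects sharing a longer prefix are more redundant.

```python
def _components(name):
    """Split a hierarchical name such as '/a/b/c' into its components."""
    return name.strip("/").split("/")


def shared_prefix_len(a, b):
    """Number of leading name components that two names have in common."""
    n = 0
    for x, y in zip(_components(a), _components(b)):
        if x != y:
            break
        n += 1
    return n


def tree_distance(a, b):
    """Hops between two names in the name tree, via their deepest common ancestor.

    Depends only on the tree topology, not on what the components mean.
    """
    k = shared_prefix_len(a, b)
    return (len(_components(a)) - k) + (len(_components(b)) - k)


def prioritize(names, budget):
    """Hypothetical greedy rule: repeatedly send the object whose longest
    shared prefix with anything already chosen is shortest (most diverse).
    Ties are broken by input order.
    """
    remaining = list(names)
    chosen = []
    while remaining and len(chosen) < budget:
        best = min(
            remaining,
            key=lambda n: max((shared_prefix_len(n, c) for c in chosen), default=0),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Made-up names: city / district / camera / image.
names = [
    "/city/marina/cam1/img1",
    "/city/marina/cam1/img2",
    "/city/marina/cam2/img1",
    "/city/downtown/cam3/img1",
]
print(prioritize(names, 3))
```

With a budget of three objects, the sketch sends one image from each of the three cameras and defers the second image from `cam1`, which shares the longest prefix with an object already chosen. The selection never inspects what a component such as `marina` means; it only compares prefixes.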