An efficient and privacy-preserving framework for information dissemination among independent entities
Information dissemination is the very reason for the existence of the Internet. Within the community of independent entities that make up the Internet, the quality of openness that has contributed to scalability and connectivity has also introduced numerous security and privacy challenges. This is particularly the case when sensitive information is distributed among entities that do not have pre-existing trust relationships. In this thesis, we concentrate on several important problems that arise in constructing a framework for information transmission within an open environment while providing privacy. We first consider the procedure of establishing mutual trust by exchanging digital credentials, a process referred to as trust negotiation. Different from other existing work that focuses on how to establish trust safely and completely, we investigate the problem of minimizing the amount of credential information that is exchanged during a trust-negotiation process. We prove the NP-hardness of this minimization problem, and propose and evaluate efficient heuristic algorithms that are still safe and complete. We next investigate how to distribute information with a minimum cost among entities that have established trust relationships. Specifically, we study this minimization problem in a so-called publish/subscribe system. Publish/subscribe (pub/sub) is an emerging paradigm for information dissemination in which information published by publishers and interests submitted by subscribers are sent to the pub/sub system. The pub/sub system then matches events and interests and delivers to each user those events that satisfy that user's declared interests. We consider cases where information dissemination is restricted by policy constraints (e.g., due to security or confidentiality concerns), and where information can be combined at so-called brokers in the network, a process known as composition. Unsurprisingly, the minimization problem is shown to be NP-complete. We then propose and compare different approximation approaches, showing that the proposed heuristics found good solutions over a range of problem configurations, especially in a policy-constrained system. We then examine the problem of protecting private information in a stream processing system. We propose a Mulit-Set Attribute (MSA) model to address the need for formal evaluation and verification of the privacy and policy constraints that must be met by the system. The MSA model is designed to provide privacy protection to personally identifiable information under real time requirements and in the presence of untrustworthy processing elements. Under a MSA model, data requests not compliant with privacy policy constraints are denied. This binary-decision (i.e., either allowance or denial) model can be too rigid in practice and fail to balance data privacy and utility. To quantify the trade-off between privacy and utility, we propose a privacy model based on identifiability risk computation that estimates the risk of data access, and allows those requesting data access to decide whether the risk is justified. We present the definition and calculation of the identifiability risk. We further illustrate our approach using data published by the U. S. Census Bureau.