A Graph Theoretic Linkage Attack on Microdata in a Metric Space
The knowledge of (e.g. geographic) distances between the entities whose data are stored in a microdata table makes several methods of analysis only possible. For instance, such knowledge is necessary and sufficient to perform data mining tasks such as nearest neighbour search or clustering. However, the risk of identity disclosure has to be reconsidered once more when (approximate) inter-record distances are published in addition to the microdata for research purposes. In order to tackle this problem, we introduce a flexible graph model for microdata in a metric space and propose a linkage attack based on a realistic assumption on a data snooper's background knowledge. This attack is based on the idea of finding a maximum approximate common subgraph of two vertex-labelled and edge-weighted graphs. By adapting a standard argument from graph theory to our setup this task is transformed to the maximum clique detection problem in a corresponding product graph. A toy example and experimental results on simulated data show that publishing even approximate distances only might increase the risk of identity disclosure unreasonably