Automation of Strategic Data Prioritization in System Model Calibration: Sensor Placement
Model calibration is challenging for large-scale system models with many variables. Existing approaches to partitioning system models and prioritizing data acquisition rely on heuristics rather than formal treatments. The sensor placement problem on physical dynamic systems points to a promising avenue for formalizing strategic data prioritization and partial model calibration. It addresses the following question about system models: given the model at hand and pre-existing data availability on certain model variables, which (next) k model variables would bring the largest utility to model calibration once their data are acquired? In this study, we formalize this problem as combinatorial optimization and adapt two solutions from physical systems to system models in the social sciences: the information-entropy method and the miss-probability method. Then, based on the idea of Data Availability Partition, we develop a third method. The new method can be understood from the entropy perspective and is embedded in the theoretical framework for the evaluation of side information. Our solution applies to system models of different topologies: for binary and multi-ary trees, analytical results for optimal placement are derived; for general tree structures, an algorithm to determine the optimal placement is developed, whose complexity is upper-bounded by O(n log_2(n)) for an n-variable model; for arbitrary model topologies that contain loops, sequential-optimal and simulated-annealing solvers are formulated. The three methods are compared on a transparent validating model structure; our method outperforms the two translated methods, yielding practical and robust solutions across different usage scenarios. The stability of its recommendations is tied to the method's adequate accommodation of the conditional nature of the placement problem.
An application to a multi-compartment system model further showcases the toolkit's practical utility.
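To make the conditional nature of the placement question concrete, the following is a minimal sketch of a sequential (greedy) entropy-based selection. It is not the method developed in this study. It assumes, purely for illustration, that the model variables are jointly Gaussian with a known covariance matrix, in which case the conditional entropy of a variable increases monotonically with its conditional variance. The names `conditional_variance` and `greedy_placement` are hypothetical.

```python
import numpy as np

def conditional_variance(cov, i, observed):
    """Variance of variable i given the observed variables (Gaussian assumption)."""
    if not observed:
        return cov[i, i]
    obs = list(observed)
    s_oo = cov[np.ix_(obs, obs)]   # covariance among observed variables
    s_io = cov[i, obs]             # covariance between i and observed variables
    # Schur complement: var(i) - s_io * s_oo^{-1} * s_io^T
    return cov[i, i] - s_io @ np.linalg.solve(s_oo, s_io)

def greedy_placement(cov, observed, k):
    """Pick k further variables one at a time, each time taking the variable
    with the largest conditional variance (hence largest conditional entropy)
    given everything already observed or already chosen."""
    known = list(observed)
    chosen = []
    for _ in range(k):
        candidates = [i for i in range(cov.shape[0]) if i not in known]
        best = max(candidates, key=lambda i: conditional_variance(cov, i, known))
        chosen.append(best)
        known.append(best)
    return chosen

# Illustrative covariance of a 4-step random walk: cov[i, j] = min(i, j) + 1.
cov = np.minimum.outer(np.arange(4), np.arange(4)) + 1.0
next_two = greedy_placement(cov, observed=[0], k=1)
```

In this toy chain, data already available on variable 0 lowers the remaining uncertainty of its neighbors most, so the greedy step moves to the far end of the chain. A sequential choice of this kind is not guaranteed to be jointly optimal for k > 1.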