A flexible, scaleable approach to the international patent 'name game'
This paper reports a new approach to disambiguation of large patent databases. Available international patent databases do not identify unique innovators. Record disambiguation poses a significant barrier to subsequent research. Present methods for overcoming this barrier couple ad-hoc rules for name harmonisation with labour-intensive manual checking. We present instead a computational approach that requires minimal and easily automated data cleaning, learns appropriate record-matching criteria from minimal human coding, and dynamically addresses both computational and data-quality issues that have impeded progress. We show that these methods yield accurate results at rates comparable to outcomes from more resource-intensive hand coding.