Ludwig, Jens; Mullainathan, Sendhil; Rambachan, Ashesh - National Bureau of Economic Research - 2025
generation) is valid under one condition: no "leakage" between the LLM's training dataset and the researcher's sample. No leakage … can be ensured by using open-source LLMs with documented training data and published weights. Using LLM outputs for …) requires the researcher to collect at least some validation data: without such data, the errors of the LLM's automation cannot …
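
The role of validation data can be illustrated with a minimal sketch. It assumes a small subset of the sample carries both an LLM label and a human label, and it is not taken from the paper. The function names, the toy data, and the simple difference correction are all hypothetical choices made for illustration.

```python
from statistics import mean


def validation_error_rate(llm_labels, human_labels):
    """Share of validation items where the LLM label disagrees with the human label."""
    if len(llm_labels) != len(human_labels) or not llm_labels:
        raise ValueError("need two non-empty label lists of equal length")
    return mean(1.0 if a != b else 0.0 for a, b in zip(llm_labels, human_labels))


def bias_corrected_mean(llm_all, llm_val, human_val):
    """LLM-based mean over the full sample, shifted by the average
    (human - LLM) gap observed on the validation subset."""
    if len(llm_val) != len(human_val) or not llm_val:
        raise ValueError("need two non-empty validation lists of equal length")
    correction = mean(h - l for h, l in zip(human_val, llm_val))
    return mean(llm_all) + correction


# Hypothetical toy data: 1 = text has the attribute, 0 = it does not.
llm_all = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # LLM labels for the full sample
llm_val = [1, 0, 1, 1]                      # LLM labels on the validation subset
human_val = [1, 0, 0, 1]                    # human labels on the same subset

error_rate = validation_error_rate(llm_val, human_val)
corrected = bias_corrected_mean(llm_all, llm_val, human_val)
print(f"validation error rate: {error_rate:.2f}")
print(f"naive LLM mean: {mean(llm_all):.2f}, corrected mean: {corrected:.2f}")
```

Without the human-labeled subset, neither the error rate nor the correction term can be computed, so the LLM-based mean would have to be taken on trust.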