Serena Ng, National Science Foundation

Topics in Analysis of Big Data and Complex Models

The proposed research seeks to provide researchers with new methods for studying economic issues of current interest, making effective use of large datasets that have only recently become available. The projects deal with issues in data preprocessing, estimation, and hypothesis testing. The emphasis is on methods that are broadly applicable and can be put to practical use. The proposed research is also multidisciplinary, combining methodologies from statistics with those from computer science, while providing methods for empirical researchers in any field that applies statistical methods to massive datasets. The proposed research consists of three projects.

The first project develops methods for efficient and effective analysis of big data for the purpose of understanding micro- and macroeconomic phenomena. While datasets that are terabytes in size are increasingly available, resource constraints often make it necessary to study a smaller set of observations, which raises the question of how to form subsamples. The investigator will develop methods that use large datasets efficiently while preserving data features valuable to economic analysis.

The second project provides frequentist tools to assess the sensitivity of estimation results to model assumptions and features of the data, which will be particularly useful for assessing results from complex structural models.

The third project will assess whether uncertainty is a cause or a consequence of economic fluctuations. Because there is no ideal instrument to distinguish uncertainty shocks from real-activity shocks, the investigator will develop an iterative method that purges the unwarranted variation from a potentially invalid instrument in order to arrive at a valid one. This generated external IV procedure can be used more generally in applications where no valid instrument is available.
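To make the subsampling idea in the first project concrete: one simple baseline for drawing a smaller sample while preserving a data feature (here, the composition of observations across groups) is stratified subsampling with proportional allocation. The sketch below is illustrative only and is not the proposal's method; the function name and the use of group labels as the "feature" to preserve are assumptions.

```python
import numpy as np

def stratified_subsample(strata, n, seed=0):
    """Return indices of an n-observation subsample that keeps each
    stratum's share close to its share in the full dataset.

    strata : array of group labels, one per observation (illustrative
             stand-in for whatever data feature should be preserved).
    n      : target subsample size.
    """
    rng = np.random.default_rng(seed)
    groups, counts = np.unique(strata, return_counts=True)
    # proportional allocation, with at least one draw per stratum
    alloc = np.maximum(1, np.round(n * counts / counts.sum()).astype(int))
    idx = []
    for g, k in zip(groups, alloc):
        members = np.flatnonzero(strata == g)
        idx.append(rng.choice(members, size=min(k, members.size), replace=False))
    return np.sort(np.concatenate(idx))
```

A uniform random subsample would preserve group shares only in expectation; explicit stratification enforces it in every draw, which matters when some groups are small relative to the subsample size.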
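The generated-IV idea in the third project can be illustrated with a stylized sketch: start from a contaminated instrument, project off its loading on a proxy for the confounding shock, re-estimate the structural coefficient by IV with the purged instrument, and use the updated structural residual as the next proxy. This is a minimal toy version under strong assumptions (a single regressor, a linear contamination, an observable initial proxy), not the estimator the proposal will develop; in practice the proxy would itself have to be estimated.

```python
import numpy as np

def purge_instrument(z, x, y, proxy, iters=5):
    """Stylized generated-IV iteration (illustrative, not the proposal's method).

    z     : potentially invalid instrument (correlated with the structural error)
    x, y  : endogenous regressor and outcome in y = beta * x + u
    proxy : initial guess for the confounding variation in z
    """
    z, c = np.asarray(z, float), np.asarray(proxy, float)
    for _ in range(iters):
        gamma = (c @ z) / (c @ c)             # loading of z on the confound proxy
        z_clean = z - gamma * c               # purge that component from z
        beta = (z_clean @ y) / (z_clean @ x)  # IV estimate with the purged instrument
        c = y - beta * x                      # refresh proxy with structural residual
    return beta, z_clean
```

In a simulation where z equals a valid instrument plus a multiple of the structural error, naive IV with z is biased, while IV with the purged instrument recovers the structural coefficient.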