Working Papers & Projects

The Dark Side of the Moon: Searching for the Other Half of Seasonality

Drafted , 2020

Abstract

Seasonality is among the most visible properties of time series data, yet the many statistical tests devised over decades of research have achieved only limited success in detecting it. In this paper we examine eight existing tests of seasonality and show that there is significant variation in how they classify a series. We then show how this variation, combined with characteristics of the time series (e.g., autocorrelation, frequency, skewness, and kurtosis), can be exploited by a Random Forest framework to map the hypothesis test space and make more accurate predictions regarding the seasonal disposition of a series. Our proposed method reduces Type II errors by approximately sixty percentage points relative to the next best alternative.
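As a rough illustration of the inputs described in this abstract, the sketch below builds one feature vector from the decisions of several seasonality tests plus a few series characteristics. The function names, the 0/1 encoding of test decisions, and the example values are assumptions made for illustration; the classifier itself is omitted, and nothing here is taken from the paper's implementation.

```python
from statistics import mean


def central_moment(x, order):
    """Population central moment of the given order."""
    m = mean(x)
    return sum((v - m) ** order for v in x) / len(x)


def lag1_autocorrelation(x):
    """Sample autocorrelation at lag 1."""
    m = mean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def skewness(x):
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def excess_kurtosis(x):
    return central_moment(x, 4) / central_moment(x, 2) ** 2 - 3.0


def build_features(series, frequency, test_decisions):
    """Combine test decisions (hypothetical 0/1 flags, one per test)
    with series characteristics into a single feature vector."""
    return list(test_decisions) + [
        frequency,
        lag1_autocorrelation(series),
        skewness(series),
        excess_kurtosis(series),
    ]


# Hypothetical quarterly series and hypothetical decisions from eight tests.
series = [1, 2, 3, 4] * 6
features = build_features(series, frequency=4,
                          test_decisions=[1, 0, 1, 1, 0, 1, 0, 0])
print(features)
```

A vector like this, computed for many labeled series, is the kind of input a tree-based classifier could be trained on.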

Download Paper

Roots from Trees: A Machine Learning Approach to Unit Root Detection

Drafted , 2022

Abstract

In this paper we draw inspiration from the ensemble forecasting and model averaging literature and use a gradient descent boosting algorithm to exploit variation between test statistics used to determine if a series contains a unit root. The result is a pseudo-composite ML-based test for unit roots that is four to six percentage points more accurate than the next best traditional test. Through a train-validation framework, this method allows for control over Type I error rates, and the gains in power come with little variation in specificity (empirical size). Additionally, the proposed method is agnostic towards the deterministic elements traditionally needed in the established testing environment and thus closes off an additional error path for unit root testing: model misspecification. We illustrate this new testing procedure by applying it to an established benchmark data set and examining the state-level hypothesis of unemployment hysteresis.
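One way a train-validation split can give control over the Type I error rate is to calibrate the rejection cutoff on validation series known to satisfy the null. The sketch below assumes a classifier that outputs a score (higher meaning stronger evidence against a unit root); the function names and the score scale are hypothetical, and this is not presented as the paper's procedure.

```python
import math


def calibrate_threshold(null_scores, alpha=0.05):
    """Return a cutoff such that at most a share `alpha` of the
    validation null cases have a score strictly above it."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(null_scores)
    n = len(ordered)
    # Number of false rejections we are willing to tolerate.
    allowed = math.floor(alpha * n)
    # Only the `allowed` largest scores can exceed this value.
    return ordered[n - allowed - 1]


def reject_unit_root(score, threshold):
    """Reject the unit root null when the score exceeds the cutoff."""
    return score > threshold


# Hypothetical classifier scores for 100 validation series with a unit root.
validation_null_scores = [i / 100 for i in range(100)]
cutoff = calibrate_threshold(validation_null_scores, alpha=0.05)
empirical_size = sum(
    reject_unit_root(s, cutoff) for s in validation_null_scores
) / len(validation_null_scores)
print(cutoff, empirical_size)
```

Because the cutoff is set only from null cases, the nominal size is fixed in advance and any gain from a better classifier shows up as power.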

Download Paper

For What It’s Worth: Measuring Land Value in the Era of Big Data and Machine Learning

Drafted , 2023

Abstract

This paper develops a new method for valuing land, a key asset on a nation’s balance sheet. The method first employs an unsupervised machine learning method, k-means clustering, to discretize unobserved heterogeneity, which we then combine with a supervised learning algorithm, gradient boosted trees (GBT), to obtain property-level price predictions and estimates of the land component. Our initial results from a large national dataset show that this approach routinely outperforms hedonic regression methods (as used by the U.K.’s Office for National Statistics, for example) in out-of-sample price predictions. To exploit the best of both methods, we further explore a composite approach using model stacking, finding that it outperforms all other methods in out-of-sample tests and in a benchmark test against nearby vacant land sales. In an application, we value residential, commercial, industrial, and agricultural land for the entire contiguous U.S. from 2006 to 2015. The results offer new insights into valuation and demonstrate how a unified method can build national and subnational estimates of land value from detailed, parcel-level data. We discuss further applications to economic policy and the property valuation literature more generally.
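The first stage described here, using k-means to turn unobserved heterogeneity into a discrete label, can be sketched in a few lines. The example below is a plain-Python version of Lloyd's algorithm applied to hypothetical two-dimensional parcel coordinates; which variables the paper actually clusters on is not stated in the abstract, and the supervised GBT stage is omitted.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda j: squared_distance(pt, centroids[j]))
            for pt in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                updated.append(centroids[j])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical parcel coordinates forming two well-separated groups.
parcels = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(parcels, k=2)

# The cluster label becomes one more (categorical) feature per parcel,
# which a supervised model could then use alongside observed attributes.
rows = [list(pt) + [lab] for pt, lab in zip(parcels, labels)]
print(rows)
```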

Download Paper

Alloy Inference: Tests of a Single Null

Drafted , 2024

Abstract

This paper presents a new joint testing framework that fuses multiple test statistics into a single, more powerful inference tool. Using the probability integral transform to confine the support to the unit hypercube, I use simulated null cases and Archimedean copulas to approximate the underlying joint null distribution of two or more statistics. Analogous to an alloy in metallurgy, where the final product typically has stronger properties than its constituent parts, I show how two or more tests can be combined to outperform a single test statistic in finite samples. To illustrate the performance of this approach, I provide a stylized example using the game of craps so that trade-offs can be assessed in economic terms. Under potential uncertainty about the fairness of the game dice, the proposed method, a combination of the Student-t and $\chi^2$ statistics, provides increased power, producing a revenue distribution that second-order stochastically dominates those of its constituent parts.
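Two building blocks named in this abstract can be shown in isolation: an empirical probability integral transform based on simulated null draws, and the distribution function of an Archimedean copula. The sketch below uses the Clayton family purely as an example of an Archimedean copula; the family, the parameter value, and the function names are assumptions, and how the paper turns the fitted joint null distribution into a rejection rule is not reproduced here.

```python
def empirical_pit(null_draws, observed):
    """Map an observed statistic into (0, 1) using simulated null draws.
    The 0.5 offset keeps the result strictly inside the unit interval."""
    count = sum(1 for d in null_draws if d <= observed)
    return (count + 0.5) / (len(null_draws) + 1)


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta),
    valid here for theta > 0 and u, v in (0, 1]."""
    if theta <= 0:
        raise ValueError("this sketch assumes theta > 0")
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


# Hypothetical simulated null draws for two different test statistics.
null_stat_a = list(range(1, 100))
null_stat_b = [x / 10 for x in range(1, 100)]

u = empirical_pit(null_stat_a, observed=50.5)
v = empirical_pit(null_stat_b, observed=9.05)

# Joint null probability that both transformed statistics are at or
# below the observed values, under the assumed Clayton dependence.
print(u, v, clayton_cdf(u, v, theta=2.0))
```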

Download Paper

It’s About Time (Series): A Simple Correction for Difference-in-Difference Estimators

Drafted , 2025

Abstract

This paper reconsiders the difference-in-differences (DiD) research design for panel data, particularly when serial correlation stems from first-order model misspecification (i.e., dependence in $y_t$ rather than exclusively in $\epsilon_t$). When time-series issues like this are overlooked, the traditional parallel trends assumption is insufficient. In fact, for most panel applications ($T>2$ periods), DiD designs will misidentify and inflate a time-invariant treatment effect. To correct this, we show that DiD assumptions should be modified for dynamic panels and that explicitly accounting for temporal dependence in the design can recover the true, dynamically robust effect. We evaluate a simple modification to DiD designs through Monte Carlo simulations and then explore its implications with empirical examples. Two examples leverage a policy shock used in recent literature to reevaluate the impact of household credit constraints on outcomes such as state-level GDP growth and labor market participation. When we implement the proposed modification, which can be as simple as incorporating a lagged outcome and group interaction into a DiD model, the results illustrate the reduction in bias predicted by theory, yielding a more generalizable estimator for most applications. Finally, we find that synthetic DiD and synthetic control methods do not remedy this particular issue, as similar modifications (e.g., pre-whitening) are needed to address temporal dependence in the outcome.
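For orientation, one possible reading of "incorporating a lagged outcome and group interaction into a DiD model" is written out below. The notation ($G_i$ for the treated-group indicator, $P_t$ for the post-period indicator, and the coefficient symbols) is assumed for illustration and is not taken from the paper; the first line is the textbook two-group DiD regression and the second adds the lagged outcome and its interaction with the group indicator.

```latex
% Textbook DiD regression (delta is the treatment effect):
y_{it} = \alpha + \beta G_i + \gamma P_t + \delta \,(G_i \times P_t) + \epsilon_{it}

% Illustrative dynamic variant: lagged outcome plus a group interaction
% (an assumed specification, not necessarily the paper's):
y_{it} = \alpha + \beta G_i + \gamma P_t + \delta \,(G_i \times P_t)
       + \rho \, y_{i,t-1} + \phi \,(G_i \times y_{i,t-1}) + \epsilon_{it}
```

Under this reading, $\rho$ absorbs the dependence in $y_t$ that the static specification would otherwise push into $\delta$, and $\phi$ lets that dependence differ across groups.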

Download Paper

Gary Cornwall
