Working Papers & Projects

The Dark Side of the Moon: Searching for the Other Half of Seasonality

Abstract: Seasonality is among the most visible properties of time series data, yet the many statistical tests devised over decades of research have achieved only limited success in detecting it. In this paper we examine eight existing tests of seasonality and show that they vary significantly in how they classify a series. We then show how this variation, combined with characteristics of the time series (e.g. autocorrelation, frequency, skewness, and kurtosis), can be exploited by a Random Forest framework to map the hypothesis test space and make more accurate predictions regarding the seasonal disposition of a series. Our proposed method reduces Type II errors by approximately sixty percentage points relative to the next best alternative.
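As a rough illustration of the inputs such a classifier might receive, the sketch below builds one feature row from a series: a few of the characteristics named in the abstract plus the reject/fail-to-reject decisions of the existing tests. The function names, the feature ordering, and the `test_rejections` argument are hypothetical and are not taken from the paper; the Random Forest step itself is not shown.

```python
import math
from statistics import fmean


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    m = fmean(x)
    denom = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, len(x)))
    return num / denom


def skewness(x):
    """Moment-based skewness: third central moment over variance^1.5."""
    m = fmean(x)
    m2 = fmean([(v - m) ** 2 for v in x])
    m3 = fmean([(v - m) ** 3 for v in x])
    return m3 / m2 ** 1.5


def kurtosis(x):
    """Moment-based (non-excess) kurtosis: fourth central moment over variance^2."""
    m = fmean(x)
    m2 = fmean([(v - m) ** 2 for v in x])
    m4 = fmean([(v - m) ** 4 for v in x])
    return m4 / m2 ** 2


def feature_row(x, period, test_rejections):
    """Combine series characteristics with 0/1 decisions from existing tests.

    test_rejections is a hypothetical list with one entry per seasonality
    test (1 = the test flagged the series as seasonal, 0 = it did not).
    """
    characteristics = [
        autocorr(x, 1),
        autocorr(x, period),
        skewness(x),
        kurtosis(x),
        float(period),
    ]
    return characteristics + [float(r) for r in test_rejections]


# Toy monthly series: a pure sine wave with a 12-period cycle.
x = [math.sin(2 * math.pi * t / 12) for t in range(120)]
row = feature_row(x, 12, [1, 0, 1, 1, 0, 0, 1, 0])
```

Rows like this, one per labelled series, could then be passed to any tree-based classifier; which features the paper actually uses is not stated in the abstract.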

Roots from Trees: A Machine Learning Approach to Unit Root Detection

Abstract: In this paper we draw inspiration from the ensemble forecasting and model averaging literature and use a gradient descent boosting algorithm to exploit variation between the test statistics used to determine whether a series contains a unit root. The result is a pseudo-composite ML-based test for unit roots which is four to six percentage points more accurate than the next best traditional test. Through a train-validation framework, this method allows for control over Type I error rates, and the gains in power come with little variation in specificity (empirical size). Additionally, the proposed method is agnostic towards the deterministic elements traditionally needed in the established testing environment and thus closes off an additional error path for unit root testing: model misspecification. We illustrate this new testing procedure by applying it to an established benchmark data set and examining the state-level hypothesis of unemployment hysteresis.
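One way a validation set can give control over the Type I error rate is to pick the decision cutoff from model scores on validation series for which the null is known to hold. The sketch below shows that idea only; it is an assumption about the general mechanism, not the paper's procedure, and `null_scores` stands in for hypothetical classifier scores (higher meaning stronger evidence against a unit root).

```python
import math
import random


def calibrate_threshold(null_scores, alpha=0.05):
    """Return a cutoff that at most a fraction alpha of null scores exceed."""
    if not null_scores:
        raise ValueError("need at least one validation score")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(null_scores)
    n = len(ordered)
    # Number of null scores allowed to lie strictly above the cutoff.
    allowed = math.floor(alpha * n)
    return ordered[n - allowed - 1]


def empirical_size(null_scores, cutoff):
    """Share of null cases that would be (wrongly) rejected at this cutoff."""
    return sum(s > cutoff for s in null_scores) / len(null_scores)


# Hypothetical validation scores for series simulated under the null.
random.seed(0)
null_scores = [random.gauss(0.0, 1.0) for _ in range(1000)]
cutoff = calibrate_threshold(null_scores, alpha=0.05)
size = empirical_size(null_scores, cutoff)
```

A new series would then be declared stationary only if its score exceeds `cutoff`, so the rejection rate on the validation null cases stays at or below the chosen level.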

For What It’s Worth: Measuring Land Value in the Era of Big Data and Machine Learning

Abstract: This paper develops a new method for valuing land, a key asset on a nation’s balance sheet. The approach first employs an unsupervised machine learning method, k-means clustering, to discretize unobserved heterogeneity, which we then combine with a supervised learning algorithm, gradient boosted trees (GBT), to obtain property-level price predictions and estimates of the land component. Our initial results from a large national dataset show that this approach routinely outperforms hedonic regression methods (as used by the U.K.’s Office for National Statistics, for example) in out-of-sample price predictions. To exploit the best of both methods, we further explore a composite approach using model stacking, finding that it outperforms all other methods in out-of-sample tests and in a benchmark test against nearby vacant land sales. In an application, we value residential, commercial, industrial, and agricultural land for the entire contiguous U.S. from 2006 to 2015. The results offer new insights into valuation and demonstrate how a unified method can build national and subnational estimates of land value from detailed, parcel-level data. We discuss further applications to economic policy and the property valuation literature more generally.
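The first stage described above, turning continuous attributes into discrete cluster labels, can be sketched with a plain k-means (Lloyd's algorithm) loop. This is a minimal illustration on made-up two-dimensional points; the choice of inputs, the number of clusters, and the function names are assumptions, and the gradient boosted trees stage that would consume the labels is not shown.

```python
import random


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]
    return dists.index(min(dists))


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: returns (labels, centers) for a list of tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [nearest(p, centers) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical parcels described by two coordinates, forming two groups.
parcels = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11)]
labels, centers = kmeans(parcels, k=2)
```

Each label could then enter a supervised model as a categorical feature standing in for the unobserved heterogeneity.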

Alloy Inference: Tests of a Single Null

Abstract: This paper presents a new joint testing framework that fuses multiple test statistics into a single, more powerful inference tool. Using the probability integral transform to confine the support to the unit hypercube, I use simulated null cases and Archimedean copulas to approximate the underlying joint null distribution of two or more statistics. Analogous to an alloy in metallurgy, where the final product typically has stronger properties than its constituent parts, I show how two or more tests can be combined to outperform a single test statistic in finite samples. To illustrate the performance of this approach, I provide a stylized example using the game of craps so that trade-offs can be assessed in economic terms. Under potential uncertainty about the fairness of the game dice, the proposed method, a combination of the Student-t and $\chi^2$ statistics, provides increased power, producing a revenue distribution which second-order stochastically dominates those of its constituent parts.
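For reference, the two ingredients named above can be written out using their standard textbook definitions. The notation ($T_i$, $F_i$, $U_i$, $\psi$, $\theta$) is introduced here for illustration and is not taken from the paper, and the Clayton family is shown only as one example of an Archimedean copula, not necessarily the one the paper uses.

```latex
% Probability integral transform: each continuous statistic T_i with null
% CDF F_i is mapped to the unit interval, where it is uniform under H_0.
U_i = F_i(T_i), \qquad U_i \sim \mathrm{Uniform}(0,1) \ \text{under } H_0,
\qquad i = 1, \dots, k.

% Archimedean copula with generator \psi: a model for the joint null
% distribution of (U_1, \dots, U_k) on the unit hypercube.
C_\psi(u_1, \dots, u_k) = \psi^{-1}\bigl(\psi(u_1) + \cdots + \psi(u_k)\bigr).

% Example: the Clayton family, with generator
% \psi_\theta(u) = (u^{-\theta} - 1)/\theta for \theta > 0.
C_\theta(u_1, \dots, u_k)
  = \Bigl(\sum_{i=1}^{k} u_i^{-\theta} - k + 1\Bigr)^{-1/\theta}.
```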