Date: Friday, December 11, 2015
Location: Co-located with NIPS 2015, Montreal, Canada
Attendance: The workshop was open to all NIPS workshop registrants.
To perform inference after model selection, we propose controlling the selective type I error; i.e., the error rate of a test given that it was performed. By doing so, we recover long-run frequency properties among selected hypotheses analogous to those that apply in the classical (non-adaptive) context. Our proposal is closely related to data splitting and has a similar intuitive justification, but is more powerful. Exploiting the classical theory of Lehmann and Scheffé (1955), we derive most powerful unbiased selective tests and confidence intervals for inference in exponential family models after arbitrary selection procedures. For linear regression, we derive new selective z-tests that generalize recent proposals for inference after model selection and improve on their power, and new selective t-tests that do not require knowledge of the error variance sigma^2.
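The problem being solved here can be seen in a small simulation (a hypothetical sketch of my own, not the paper's procedure): under a global null, naively testing the coordinate that was selected for being largest inflates the type I error well above the nominal level, while data splitting — the baseline the abstract compares against — restores it.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n_sim, p = 0.05, 20000, 10
z_crit = 1.96  # two-sided N(0,1) critical value at alpha = 0.05

naive_rej = split_rej = 0
for _ in range(n_sim):
    # Global null: two independent N(0,1) observations for each of p coordinates.
    x = rng.standard_normal((2, p))
    # Naive analyst: select the coordinate with the largest |z|, then test it
    # on the very same data.
    zbar = x.mean(axis=0) * np.sqrt(2)   # still N(0,1) under the null
    j = int(np.argmax(np.abs(zbar)))
    naive_rej += abs(zbar[j]) > z_crit
    # Data splitting: select on the first observation, test on the second.
    j2 = int(np.argmax(np.abs(x[0])))
    split_rej += abs(x[1, j2]) > z_crit

print(f"naive error rate:     {naive_rej / n_sim:.3f}")  # far above 0.05
print(f"splitting error rate: {split_rej / n_sim:.3f}")  # close to 0.05
```

With p = 10 null coordinates, the naive rate is roughly 1 - 0.95^10 ≈ 0.40. The selective tests in the talk aim for the same validity guarantee as splitting while using the full data, hence the power gain.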
Controlling the false discovery rate (the overall proportion of false positives among all discoveries), as opposed to the family-wise error rate (zero false positives allowed), allows for an adaptive threshold for discovering signals, which will be less conservative in a data set where there are more signals overall. In some settings, we are able to go further by leveraging other sources of information. I will discuss two problems: first, ordered hypothesis testing, where prior information orders the hypotheses from most to least likely to contain true signals, and we adaptively decide how far down the ordered list to test; and second, multi-layer false discovery rate control, where we group the hypotheses being tested (for instance, arising from spatial structure) and develop a multiple testing procedure that respects the group structure.
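The adaptive threshold mentioned above is the defining feature of the standard Benjamini-Hochberg procedure, which these methods build on. A minimal sketch (the parameters and the z-test setup are my own illustration): the effective p-value cutoff (k/m)q grows with the number of discoveries k, so a signal-rich data set gets a more liberal threshold than a sparse one.

```python
import math
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up: boolean mask of discoveries at FDR level q."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Find the largest k with p_(k) <= (k/m) * q; reject the k smallest p-values.
    below = ranked <= (np.arange(1, m + 1) / m) * q
    k = int(np.max(np.nonzero(below)[0])) + 1 if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask

rng = np.random.default_rng(1)
m = 1000

def one_sided_pvals(n_signal, mu=3.5):
    """One-sided z-tests: n_signal coordinates have mean mu, the rest are null."""
    z = rng.standard_normal(m)
    z[:n_signal] += mu
    return np.array([0.5 * math.erfc(zi / math.sqrt(2)) for zi in z])

few = benjamini_hochberg(one_sided_pvals(20))
many = benjamini_hochberg(one_sided_pvals(500))
print(few.sum(), many.sum())  # the signal-rich data set admits far more discoveries
```

The ordered-testing and multi-layer procedures in the talk refine this baseline by exploiting side information; they are not implemented here.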
We show that, under a standard hardness assumption, there is no computationally efficient algorithm that given n samples from an unknown distribution can give valid answers to more than O(n^2) adaptively chosen statistical queries. A statistical query asks for the expectation of a predicate over the underlying distribution, and an answer to a statistical query is valid if it is "close" to the correct expectation over the distribution. We also show an analogous result when the algorithm may be computationally unbounded but the data is very high dimensional.
Our result stands in stark contrast to the well-known fact that exponentially many statistical queries can be answered validly and efficiently if the queries are chosen non-adaptively (no query may depend on the answers to previous queries). Moreover, Dwork et al. and Bassily et al. showed how to accurately answer exponentially many adaptively chosen statistical queries via a computationally inefficient algorithm. They also gave an efficient algorithm that can answer nearly n^2 adaptively chosen queries, which shows that our result is almost quantitatively tight.
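Why adaptivity is the crux can be seen in a toy attack (my own illustration, not the construction in the paper): answering each query with its empirical mean is perfectly valid for non-adaptively chosen queries, but a single query chosen after seeing earlier answers can already be wildly invalid.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 1000
X = rng.choice([-1.0, 1.0], size=(n, d))  # d binary features
y = rng.choice([-1.0, 1.0], size=n)       # label independent of every feature

# Round 1 (non-adaptive): d statistical queries E[x_i * y], each answered with
# its empirical mean. Every true answer is 0, and every empirical answer is
# valid, i.e. within ~1/sqrt(n) of it.
answers = (X * y[:, None]).mean(axis=0)

# Round 2 (adaptive): one query chosen AFTER seeing those answers -- correlate
# y with the sign-weighted vote of the features that looked correlated.
w = np.sign(answers)
f = np.sign(X @ w)
adaptive_answer = (f * y).mean()
print(adaptive_answer)  # near 1, although the true expectation is 0
```

Here the empirical mechanism stays valid for d non-adaptive queries but fails on the very next adaptive one; the theorem says that, under a standard hardness assumption, no efficient mechanism can do much better than O(n^2) adaptive queries.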
Conceptually, our result demonstrates that achieving statistical validity alone can be a source of computational intractability in adaptive settings. For example, in the modern large collaborative research environment, data analysts typically choose a particular approach based on previous findings. False discovery occurs if a research finding is supported by the data but not by the underlying distribution. While the study of preventing false discovery in statistics is decades old, to the best of our knowledge, our result is the first to demonstrate a computational barrier. In particular, our result suggests that the perceived difficulty of preventing false discovery in today's collaborative research environment may be inherent.
This talk is based on joint work with Moritz Hardt and Thomas Steinke and conversations with Adam Smith.