**Date:** Friday, December 9, 2016

**Location:** Part of NIPS 2016, Barcelona, Spain

**Attendance:** The workshop is open to all NIPS workshop registrants.

Morning Session (8:55am – 12:00pm)

8:55-9:00

Organizers

Introductory remarks

9:00-9:35

Ruth Heller (Tel Aviv U.)

9:35-10:10

Weijie Su (Penn)

10:10-10:20

Vitaly Feldman (IBM Research)

Discussion

10:20-10:50

Break

10:50-12:00

Ibrahim Alabdulmohsin, Joshua Loftus, Yu-Xiang Wang, Sam Elder, Aaditya Ramdas, Ryan Rogers

12:00-2:30

Lunch Break

Afternoon Session (2:30pm – 6pm)

3:05-3:40

Katrina Ligett (Caltech/Hebrew U.)

3:40-3:50

Aaditya Ramdas (Berkeley)

Discussion

3:50-4:35

Poster break

4:35-4:55

Lucas Janson (Stanford)

5:15-5:50

Peter Grunwald (CWI/Leiden U.)

5:50-6:00

Aaron Roth (Penn)

Wrap-up discussion

In many genomic applications, it is common to perform tests using aggregate-level statistics within naturally defined classes for powerful identification of signals. Following aggregate-level testing, it is naturally of interest to make inferences about the individual units within classes that contain signal. Failing to account for class selection will produce biased inference. We develop multiple testing procedures that allow rejection of individual-level null hypotheses while controlling conditional (familywise or false discovery) error rates. We use simulation studies to illustrate the validity and power of the proposed procedures in comparison to several possible alternatives. We illustrate the usefulness of our procedures in a natural application involving whole-genome expression quantitative trait loci (eQTL) analysis across 17 tissue types using data from The Cancer Genome Atlas (TCGA) Project.

Joint work with Nilanjan Chatterjee, Abba Krieger, and Jianxin Shi.

We provide the first differentially private algorithms for controlling the false discovery rate (FDR) in multiple hypothesis testing. Our general approach is to adapt a well-known variant of the Benjamini-Hochberg procedure (BHq), making each step differentially private. This destroys the classical proof of FDR control. To prove FDR control of our method, we develop a new proof of the original (non-private) BHq algorithm and its robust variants, one that requires only that the true null test statistics be independent, allowing arbitrary correlations between the true nulls and the false nulls. This assumption is fairly weak compared to those made previously in the extensive literature on this topic, and it explains in part the empirical robustness of BHq.
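For readers less familiar with BHq, the classic non-private step-up procedure that this work starts from can be sketched in a few lines of Python. This is a minimal illustration of the standard Benjamini-Hochberg rule only, not the differentially private variant described in the talk; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.1) -> List[int]:
    """Classic (non-private) BH step-up procedure at target FDR level q.

    Returns the indices of rejected hypotheses in ascending order.
    """
    m = len(p_values)
    # Sort hypothesis indices by p-value, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])


if __name__ == "__main__":
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals, q=0.05))  # -> [0, 1]
```

Because BH is a step-up rule, a p-value can be rejected even when it exceeds its own rank threshold, as long as some larger rank passes: with p-values `[0.04, 0.04]` and `q=0.05`, both hypotheses are rejected.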

The traditional notion of generalization (i.e., learning a hypothesis whose empirical error is close to its true error) is surprisingly brittle. As has recently been noted, even if several algorithms each have this guarantee in isolation, the guarantee need not hold if the algorithms are composed adaptively. In this paper, we study three notions of generalization, increasing in strength, that are robust to post-processing and amenable to adaptive composition, and we examine the relationships between them.

A common problem in modern statistical applications is to select, from a large set of candidates, a subset of variables which are important for determining an outcome of interest. For instance, the outcome may be disease status and the variables may be hundreds of thousands of single nucleotide polymorphisms on the genome. This talk introduces model-free knockoffs, a framework for finding the covariates on which the response depends while provably controlling the false discovery rate (FDR) in finite samples. FDR control holds no matter the form of the dependence between the response and the covariates, which does not need to be specified in any way. What is required is that we observe i.i.d. samples (X,Y) and know something about the distribution of the covariates, although we have shown that the method is robust to unknown or estimated covariate distributions. This framework builds on the knockoff filter of Foygel Barber and Candès introduced a couple of years ago, which was limited to linear models with fewer variables than observations (p < n). In contrast, model-free knockoffs deal with a range of problems far beyond the scope of the original knockoff paper (e.g., they provide valid selections in any generalized linear model, including logistic regression) while being more powerful than the original procedure when it applies. Finally, we apply our procedure to data from a case-control study of Crohn’s disease in the United Kingdom, making twice as many discoveries as the original analysis of the same data.
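The selection step shared by knockoff procedures can be sketched as follows, assuming the knockoff variables have already been constructed and a statistic W_j has been computed for each covariate, with large positive values meaning the original variable looks more important than its knockoff. The threshold below is the knockoff+ rule of Barber and Candès; how the knockoffs and the W_j are built in the model-free setting is not shown, and the function names and example statistics are hypothetical.

```python
from typing import List, Sequence


def knockoff_threshold(W: Sequence[float], q: float = 0.1, offset: int = 1) -> float:
    """Data-dependent threshold of the knockoff filter.

    offset=1 gives the knockoff+ rule; offset=0 gives the original knockoff rule.
    Returns float('inf') if no threshold meets the target (nothing is selected).
    """
    # Candidate thresholds are the distinct nonzero magnitudes |W_j|, ascending.
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        # Negative statistics estimate the number of false discoveries above t.
        n_neg = sum(1 for w in W if w <= -t)
        n_pos = sum(1 for w in W if w >= t)
        fdp_hat = (offset + n_neg) / max(1, n_pos)
        if fdp_hat <= q:
            return t  # smallest qualifying t, since candidates ascend
    return float("inf")


def knockoff_select(W: Sequence[float], q: float = 0.1, offset: int = 1) -> List[int]:
    """Indices j with W_j >= T, where T is the knockoff(+) threshold."""
    T = knockoff_threshold(W, q, offset)
    return [j for j, w in enumerate(W) if w >= T]


if __name__ == "__main__":
    W = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -0.5, 0.3, 0.1, -0.2]
    print(knockoff_select(W, q=0.2))  # -> [0, 1, 2, 3, 4, 5]
```

The sign symmetry of null statistics is what makes the count of negative W_j a usable estimate of false discoveries; the `+1` offset in knockoff+ is what yields exact finite-sample FDR control rather than control of a modified FDR.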

Recent developments in selective inference have provided a framework for valid inference after some information from the data has been used for model selection. However, most of the literature on selective inference requires practitioners to commit to a pre-specified procedure for model selection. This is rather stringent for applications. In many cases, multiple exploratory data analyses will be performed, and the outcome of each will be input to the final model selected by the practitioners. We therefore want to develop a framework that allows multiple queries to the data. In a framework similar to that of differential privacy, we allow valid inference after multiple queries to the database. We seek to address this problem from the perspective of “multiple views of the data,” and two concrete examples are considered below.

Joint work with Jonathan Taylor.

Standard p-value-based hypothesis testing is not at all adaptive: if our test result is promising but not conclusive (say, p = 0.07), we cannot simply decide to gather a few more data points. While this practice is ubiquitous in science, it invalidates p-values and their error guarantees.

Here we propose an alternative test based on supermartingales, which has both a gambling and a data-compression interpretation. This method allows us to freely combine results from different tests by multiplication (which would be a mortal sin for p-values!) and avoids many other pitfalls of traditional testing as well. If the null hypothesis is simple (a singleton), it also has a Bayesian interpretation and essentially coincides with a proposal by Vovk (1993) and Berger et al. (1994). Here we work out, for the first time, the case of composite null hypotheses, which allows us to formulate safe, nonasymptotic versions of the most popular tests, such as the t-test and the chi-square tests. Safe tests for composite H0 are not Bayesian, and initial experiments suggest that they can substantially outperform Bayesian tests (which, for composite nulls, are not adaptive in general).
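For the simple-null case, a test supermartingale can be illustrated with a running product of likelihood ratios. The sketch below assumes Bernoulli data, a null P(x = 1) = 0.5, and a single fixed alternative of 0.7 (all hypothetical choices, a special case of the Bayes-factor construction for simple nulls); it does not implement the composite-null safe tests developed in the talk. By Ville's inequality, the product ever reaches 1/alpha with probability at most alpha under the null, which is why one may keep sampling or stop at any point.

```python
from typing import Iterable, Optional, Tuple


def bernoulli_test_martingale(
    flips: Iterable[int],
    p_alt: float = 0.7,
    p_null: float = 0.5,
    alpha: float = 0.05,
) -> Tuple[float, Optional[int]]:
    """Likelihood-ratio test martingale for a simple null H0: P(x = 1) = p_null.

    Under H0 the running product of likelihood ratios is a nonnegative martingale
    starting at 1, so it reaches 1/alpha with probability at most alpha,
    no matter when sampling stops.
    Returns (wealth at stopping time, 1-based index of rejection or None).
    """
    wealth = 1.0
    for n, x in enumerate(flips, start=1):
        # One-observation likelihood ratio of the alternative over the null.
        if x == 1:
            wealth *= p_alt / p_null
        else:
            wealth *= (1.0 - p_alt) / (1.0 - p_null)
        if wealth >= 1.0 / alpha:
            return wealth, n  # reject H0 and stop
    return wealth, None  # no rejection yet; more data may be gathered later


if __name__ == "__main__":
    wealth, n = bernoulli_test_martingale([1] * 12)
    print(n)  # -> 9, since 1.4**8 < 20 <= 1.4**9
    # Wealth from independent studies of the same null combines by multiplication.
    w1, _ = bernoulli_test_martingale([1, 1, 0, 1, 1])
    w2, _ = bernoulli_test_martingale([1, 0, 1, 1])
    print(w1 * w2)
```

Multiplying `w1 * w2` is valid here because each factor has expectation at most 1 under the null and the studies are assumed independent; the product of two p-values has no comparable guarantee.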

10:50-11:00. *Ibrahim Alabdulmohsin*. On the Interplay between Information, Stability, and Generalization

11:00-11:10. *Joshua Loftus*. Significance testing after cross-validation

11:10-11:20. *Yu-Xiang Wang, Jing Lei and Stephen E. Fienberg*. A Minimax Theory for Adaptive Data Analysis

11:20-11:30. *Sam Elder*. Bayesian Adaptive Data Analysis: Challenges and Guarantees

11:30-11:40. *Rina Foygel Barber and Aaditya Ramdas*. p-filter: An internally consistent framework for FDR

11:40-11:50. *Ryan Rogers, Aaron Roth, Adam Smith and Om Thakkar*. Max-Information, Differential Privacy, and Post-Selection Hypothesis Testing