Combinatorial Inference in Geometric Data Analysis

Overview

Combinatorial Inference in Geometric Data Analysis presents a combinatorial approach to statistical inference adapted to the Euclidean clouds of points produced by Geometric Data Analysis (GDA). The book develops a unified framework for typicality tests (comparing a group with a reference population) and homogeneity tests (comparing several subclouds), relying on permutation procedures rather than on parametric distributional assumptions.

Authors and Publisher

Brigitte Le Roux, Solène Bienaise, Jean-Luc Durand — Chapman & Hall/CRC, Computer Science & Data Analysis Series, 2019.

Contents

	Chapter	Page
	Preface	vii
	Symbols	xi
1	Introduction	1
	1.1 On combinatorial inference	1
	1.2 On Geometric Data Analysis	4
	1.3 On Inductive Data Analysis	5
	1.4 Computational aspects	6
2	Cloud of Points in a Geometric Space	9
	2.1 Basic statistics	10
	2.2 Covariance structure of a cloud	14
	2.3 Mahalanobis distance and principal ellipsoids	20
	2.4 Partition of a cloud	25
3	Combinatorial Typicality Tests	29
	3.1 The typicality problem	29
	3.2 Combinatorial typicality test for mean point	32
	3.3 One-dimensional case: typicality test for mean	45
	3.4 Combinatorial typicality test for variance	49
	3.5 Combinatorial inference in GDA	51
	3.6 Computations with R and Coheris SPAD software	55
4	Geometric Typicality Test	65
	4.1 Principle of the test	65
	4.2 Geometric typicality test for mean point	69
	4.3 One-dimensional case: typicality for mean	86
	4.4 The case of a design with two repeated measures	90
	4.5 Other methods	92
	4.6 Computations with R and Coheris SPAD software	97
5	Homogeneity Permutation Tests	107
	5.1 The homogeneity problem	107
	5.2 Principle of combinatorial homogeneity tests	108
	5.3 Homogeneity of independent groups: general case	109
	5.4 Homogeneity of two independent groups	116
	5.5 The case of a repeated measures design	133
	5.6 Other methods	140
	5.7 Computations with R and Coheris SPAD software	141
6	Research Case Studies	153
	6.1 The Parkinson study	156
	6.2 The Members of French Parliament and Globalisation	170
	6.3 The European Central Bankers study	188
	6.4 Cognitive Tests and Education	200
	Bibliography	245
	Author Index	250
	Subject Index	252

Companion Materials

Data and Simplified R Scripts

The simplified R scripts below compute observed significance levels (p-values) and compatibility regions for Chapters 3, 4 and 5.

Chapter 3 — Combinatorial Typicality Tests

Typicality tests compare the observations of a group with those of a reference population, of which the group may or may not be a subset. Two test statistics are studied: (1) the Mahalanobis distance between the mean point of the group and the mean point of the reference cloud, with respect to the covariance structure of the reference cloud; (2) the variance of the group cloud.
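The logic of the two combinatorial typicality tests can be illustrated with a short Python sketch (numpy assumed). It is not the book's Combinatorial_Typicality.R script. It approximates the set of all n-element subsets of the reference population by random draws without replacement, which is a Monte Carlo approximation rather than full enumeration, and the function names cloud_variance and typicality_test are hypothetical. As further assumptions, the reference covariance uses the divisor N and the variance of a cloud is taken in the Euclidean metric of the space. For the mean point, the p-value is the proportion of subsets whose mean point lies at least as far, in Mahalanobis distance, from the reference mean point as the group's mean point does.

```python
import numpy as np


def cloud_variance(points):
    """Variance of a cloud: mean squared Euclidean distance of its points to their mean point."""
    centred = points - points.mean(axis=0)
    return float((centred ** 2).sum(axis=1).mean())


def typicality_test(reference, group, n_draws=10_000, seed=0):
    """Monte Carlo sketch of combinatorial typicality tests for the mean point and the variance.

    reference: (N, p) array (reference cloud); group: (n, p) array (group cloud).
    """
    ref = np.asarray(reference, dtype=float)
    grp = np.asarray(group, dtype=float)
    if ref.ndim == 1:  # a numerical variable is treated as a one-dimensional cloud
        ref, grp = ref[:, None], grp[:, None]
    N, n = len(ref), len(grp)
    ref_mean = ref.mean(axis=0)
    # Covariance structure of the reference cloud (divisor N, assumed GDA convention)
    inv_cov = np.linalg.inv(np.atleast_2d(np.cov(ref, rowvar=False, bias=True)))

    def m_dist2(sample):
        # Squared Mahalanobis distance between the sample mean point and the reference mean point
        d = sample.mean(axis=0) - ref_mean
        return float(d @ inv_cov @ d)

    d2_obs, var_obs = m_dist2(grp), cloud_variance(grp)
    rng = np.random.default_rng(seed)
    d2 = np.empty(n_draws)
    var = np.empty(n_draws)
    for k in range(n_draws):
        # Draw an n-subset of the reference population without replacement
        sample = ref[rng.choice(N, size=n, replace=False)]
        d2[k], var[k] = m_dist2(sample), cloud_variance(sample)
    return {
        "d2_obs": d2_obs,
        # Proportion of subsets at least as atypical (small tolerance for floating-point ties)
        "p_mean": float(np.mean(d2 >= d2_obs * (1 - 1e-9))),
        "var_obs": var_obs,
        "p_var_low": float(np.mean(var <= var_obs)),   # group variance atypically small
        "p_var_high": float(np.mean(var >= var_obs)),  # group variance atypically large
    }
```

For example, typicality_test(reference, group) with arrays of shapes (N, p) and (n, p) returns the observed statistics together with the proportions described above; a one-dimensional array is handled as a one-dimensional cloud.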

Multidimensional case (Euclidean clouds, see pp. 55–59) — reference data (“Target” example): Target_reference.txt; group data: Target_group.txt; R script: Combinatorial_Typicality.R.
One-dimensional case (numerical variable) — the previous R script applied to a one-dimensional cloud performs the test with the squared calibrated deviation between means as the test statistic; it does not provide the directional test based on the deviation between means.
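In one dimension, the squared Mahalanobis distance between two means reduces to the squared deviation divided by the reference variance. Assuming (our assumption) that the calibrated deviation is d = (group mean - reference mean) / reference standard deviation, the Python sketch below contrasts the non-directional test based on d², which corresponds to what the script computes, with the two directional tests based on d, which the script does not provide. The function one_dim_typicality is hypothetical and, as above, uses random subsets rather than full enumeration.

```python
import numpy as np


def one_dim_typicality(reference, group, n_draws=10_000, seed=0):
    """Monte Carlo sketch of the one-dimensional typicality test for the mean."""
    ref = np.asarray(reference, dtype=float)
    grp = np.asarray(group, dtype=float)
    mu, sigma = ref.mean(), ref.std()  # reference mean and standard deviation (divisor N)
    n = len(grp)
    d_obs = (grp.mean() - mu) / sigma  # calibrated deviation (assumed definition)
    rng = np.random.default_rng(seed)
    # Calibrated deviations of random n-subsets drawn without replacement
    d = np.array([(ref[rng.choice(len(ref), size=n, replace=False)].mean() - mu) / sigma
                  for _ in range(n_draws)])
    return {
        "d_obs": float(d_obs),
        "p_squared": float(np.mean(d ** 2 >= d_obs ** 2)),  # non-directional, statistic d**2
        "p_upper": float(np.mean(d >= d_obs)),  # directional: group mean too large
        "p_lower": float(np.mean(d <= d_obs)),  # directional: group mean too small
    }
```

Because d ≥ d_obs > 0 implies d² ≥ d_obs², the non-directional p-value is never smaller than the upper directional one when the observed deviation is positive.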

Chapter 4 — Geometric Typicality Test

The geometric typicality test compares the mean point of a Euclidean cloud with a reference point, using as test statistic the squared Mahalanobis distance between these two points. The test also applies to a design with two repeated measures; the basic data set then consists of the individual differences between the two measures.
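A randomization test of the hypothesis "mean point = reference point" can be sketched by flipping the signs of the deviations of the points from the reference point; this is also the classical approach for the individual differences of a two-repeated-measures design, with reference point 0. The sign-flip scheme is a stand-in that may differ from the book's geometric procedure and from Geometric_Typicality.R, and the function sign_flip_test and its parameters are hypothetical. As a further assumption, the Mahalanobis metric is defined by the second moments of the deviations about the reference point, a matrix that sign flips leave unchanged. All 2^n sign patterns are enumerated for small n; otherwise random patterns are drawn (numpy assumed).

```python
import itertools

import numpy as np


def sign_flip_test(points, ref_point, n_draws=10_000, seed=0, max_exact=16):
    """Sign-flip test of 'mean point = ref_point' with a squared Mahalanobis statistic."""
    x = np.asarray(points, dtype=float)
    if x.ndim == 1:  # a numerical variable is treated as a one-dimensional cloud
        x = x[:, None]
    n = len(x)
    dev = x - np.asarray(ref_point, dtype=float)  # deviations from the reference point
    # Assumed metric: second moments of the deviations about the reference point.
    # Sign flips leave this matrix unchanged, so only the mean point varies.
    inv_s = np.linalg.inv(dev.T @ dev / n)

    if n <= max_exact:
        # Exact: all 2**n sign patterns; the first row (all +1) is the observed arrangement
        signs = np.array(list(itertools.product([1.0, -1.0], repeat=n)))
    else:
        # Monte Carlo: random sign patterns, keeping the observed arrangement as row 0
        signs = np.random.default_rng(seed).choice([1.0, -1.0], size=(n_draws, n))
        signs[0] = 1.0

    means = signs @ dev / n  # mean deviation for each sign pattern
    stats = np.einsum("kp,pq,kq->k", means, inv_s, means)
    obs = stats[0]
    # Small relative tolerance so floating-point ties with the observed value are counted
    p_value = float(np.mean(stats >= obs * (1 - 1e-9)))
    return float(obs), p_value
```

For the two-repeated-measures case, pass the individual differences (second measure minus first) as points and 0 as the reference point.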

Multidimensional case (Euclidean clouds) — “Target” example data: Target.txt; R script (pp. 97–101): Geometric_Typicality.R.
One-dimensional case (numerical variable) — the previous R script applied to a one-dimensional cloud provides the results corresponding to a test with the calibrated deviation between the group mean and the reference mean as test statistic.
Design with two repeated measures — Student’s example: Student.txt.

Chapter 5 — Homogeneity Tests

The homogeneity tests presented in this chapter compare several subclouds, using as test statistic the M-variance of the mean points of the subclouds, that is, their variance calculated from the Mahalanobis distance between points. The book studies the case of several independent groups and the case of repeated measures. For several independent groups, several permutation schemes are studied, depending on whether the comparison is global, partial, or specific (see pp. 109–110).
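The global comparison of several independent groups can be sketched as a permutation test that reshuffles the group labels and recomputes the M-variance of the subcloud mean points. This is a minimal Python sketch (numpy assumed), not Homogeneity.R: the function names are hypothetical, and it assumes, as a choice made here and not necessarily the book's, that the Mahalanobis metric is given by the covariance matrix of the whole cloud, which does not change under permutation. With that metric the statistic equals the trace of V⁻¹B (V the total covariance matrix, B the between-group covariance matrix), so it lies between 0 and the dimension of the space. The partial and specific schemes restrict which labels are permuted and are not covered.

```python
import numpy as np


def m_variance_between(x, labels, inv_cov):
    """M-variance of the subcloud mean points, weighted by subcloud sizes."""
    g = x.mean(axis=0)  # mean point of the whole cloud
    total = 0.0
    for lab in np.unique(labels):
        sub = x[labels == lab]
        d = sub.mean(axis=0) - g
        total += len(sub) * float(d @ inv_cov @ d)
    return total / len(x)


def homogeneity_test(points, labels, n_perm=5_000, seed=0):
    """Permutation test of homogeneity for independent groups (global comparison)."""
    x = np.asarray(points, dtype=float)
    if x.ndim == 1:  # a numerical variable is treated as a one-dimensional cloud
        x = x[:, None]
    labels = np.asarray(labels)
    # Assumed metric: covariance of the whole cloud (divisor n), fixed across permutations
    inv_cov = np.linalg.inv(np.atleast_2d(np.cov(x, rowvar=False, bias=True)))
    obs = m_variance_between(x, labels, inv_cov)
    rng = np.random.default_rng(seed)
    perm = np.array([m_variance_between(x, rng.permutation(labels), inv_cov)
                     for _ in range(n_perm)])
    # Monte Carlo p-value that counts the observed labelling once
    p_value = (1 + np.sum(perm >= obs * (1 - 1e-9))) / (n_perm + 1)
    return obs, float(p_value)
```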

Data (p. 142): Target_4.txt
R script (pp. 142–147, partial or specific comparison of two independent groups): Homogeneity.R

Full and SPAD-Interfaced R Scripts

The full R scripts implement the methods described in the book. Each ZIP archive contains three scripts (“main”, “parameters”, “core”), data files and a user’s guide.

Combinatorial typicality tests (Chapter 3): CIGDA_combi.zip
Combinatorial typicality tests — SPAD-interfaced script: CIGDA_Comb-v1.1.R
Geometric typicality test (Chapter 4) — SPAD-interfaced script: CIGDA_Geo-v1.R
Homogeneity permutation tests (Chapter 5) — SPAD-interfaced script: CIGDA_Homog-v1.R

Research Case Studies (Chapter 6)

For each case study, data are provided in Excel format together with a SPAD project that reproduces the analyses presented in Chapter 6.

The Parkinson Study — data: Parkinson.xls; SPAD project: The Parkinson Study
Members of French Parliament and Globalisation — data: MPs&Globalisation.xls; SPAD project: MPs-Globalisation
The European Central Bankers Study — data and SPAD project available on request from Frédéric Lebaron.
Cognitive Tests and Education — data: CognitiveTests.xls; SPAD project: Cognitive Study