Title: | Log-Multiplicative Models, Including Association Models |
---|---|
Description: | Functions to fit log-multiplicative models using 'gnm', with support for convenient printing, plots, and jackknife/bootstrap standard errors. For complex survey data, models can be fitted from design objects from the 'survey' package. Currently supported models include UNIDIFF (Erikson & Goldthorpe, 1992), a.k.a. log-multiplicative layer effect model (Xie, 1992) <doi:10.2307/2096242>, and several association models: Goodman (1979) <doi:10.2307/2286971> row-column association models of the RC(M) and RC(M)-L families with one or several dimensions; two skew-symmetric association models proposed by Yamaguchi (1990) <doi:10.2307/271086> and by van der Heijden & Mooijaart (1995) <doi:10.1177/0049124195024001002>. Functions allow computing the intrinsic association coefficient (see Bouchet-Valat (2022) <doi:10.1177/0049124119852389>) and the Altham (1970) index <doi:10.1111/j.2517-6161.1970.tb00816.x>, including via the Bayes shrinkage estimator proposed by Zhou (2015) <doi:10.1177/0081175015570097>; and the RAS/IPF/Deming-Stephan algorithm. |
Authors: | Milan Bouchet-Valat [aut, cre], Heather Turner [ctb], Michael Friendly [ctb], Jim Lemon [cph], Gabor Csardi [cph] |
Maintainer: | Milan Bouchet-Valat <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.7.4 |
Built: | 2024-11-02 02:56:39 UTC |
Source: | https://github.com/nalimilan/logmult |
These functions make it straightforward and efficient to perform an analysis of association (ANOAS) consisting of successive RC(M) or RC(M)-L models from 1 to N dimensions. They fit the models efficiently by using scores from the previous model as starting values for the next one.
anoas(tab, nd = 3, symmetric = FALSE, diagonal = FALSE, ...)

anoasL(tab, nd = 3,
       layer.effect = c("homogeneous.scores", "heterogeneous", "none"),
       symmetric = FALSE,
       diagonal = c("none", "heterogeneous", "homogeneous"), ...)
tab |
a two-way table, or an object (such as a matrix) that can be coerced into a table; if present, dimensions above two will be collapsed as appropriate. |
nd |
the number of dimensions to include in the most complex model. Cannot exceed
|
symmetric |
|
layer.effect |
See |
diagonal |
|
... |
Contrary to most analyses of association in the literature, this function currently fits neither the uniform association model (“U”), nor the separate models with only row or column association (the “R” and “C” models), nor the log-linear row and column association models.
Currently, no significance test is performed on the models. Please note that it is not correct to test the one-dimensional association model against the independence model.
A list of gnm objects. The first element is the independence model; the remaining ones are rc (for anoas) or rcL (for anoasL) objects with dimensions from 1 to nd.
Milan Bouchet-Valat
Wong, R.S-K. (2010). Association models. SAGE: Quantitative Applications in the Social Sciences.
## Wong (2010), Table 2.6 data(gss8590) # The table used in Wong (2010) is not perfectly consistent # with that of Wong (2001) tab <- margin.table(gss8590[,,c(2,4)], 1:2) tab[2,4] <- 49 # Results correspond to lines 1, 6 and 11 results <- anoas(tab, nd=2) results
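The list returned by anoas can also be summarised by hand; a minimal hedged sketch (deviance and df.residual are standard accessors for gnm objects, so this only illustrates the structure of the result):

# Continuing the example above: deviance and residual d.f. of each model in the list
sapply(results, function(m) c(deviance = deviance(m), df = df.residual(m)))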
Identify log-multiplicative association scores from over-parameterized gnm models.
## S3 method for class 'rc'
assoc(model, weighting = c("marginal", "uniform", "none"),
      rowsup = NULL, colsup = NULL, ...)

## S3 method for class 'rc.symm'
assoc(model, weighting = c("marginal", "uniform", "none"),
      rowsup = NULL, colsup = NULL, ...)

## S3 method for class 'hmskew'
assoc(model, weighting = c("marginal", "uniform", "none"),
      rowsup = NULL, colsup = NULL, ...)

## S3 method for class 'yrcskew'
assoc(model, weighting = c("marginal", "uniform", "none"), ...)

## S3 method for class 'rcL'
assoc(model, weighting = c("marginal", "uniform", "none"), ...)

## S3 method for class 'rcL.symm'
assoc(model, weighting = c("marginal", "uniform", "none"), ...)

## S3 method for class 'rcL.trans'
assoc(model, weighting = c("marginal", "uniform", "none"), ...)

## S3 method for class 'hmskewL'
assoc(model, weighting = c("marginal", "uniform", "none"), ...)

## S3 method for class 'rcL.trans.symm'
assoc(model, weighting = c("marginal", "uniform", "none"), ...)
model |
a |
weighting |
the weights to be used when normalizing scores (see ‘Details’). |
rowsup |
a matrix with the same columns as the model data giving supplementary (passive) rows to include in the result. |
colsup |
a matrix with the same rows as the model data giving supplementary (passive) columns to include in the result. |
... |
currently unused. |
These functions extract parameters from gnm log-multiplicative models and make them identifiable by imposing the required constraints on them. The general pattern is that row and column scores are separately centered around 0 and scaled so that their (weighted) sum of squares equals 1 and their (weighted) cross-dimensional correlation is null. This operation yields two series of scores (rows and columns), plus an intrinsic association coefficient (phi) for each dimension.
Most users do not need to call these functions directly, but they are made public since they may be useful for advanced uses, notably when combining log-multiplicative association components with other model specifications. assoc can then be used to identify the scores, the rest of the coefficients being extracted manually by the caller.
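As an illustration, a minimal hedged sketch (not taken from the package examples) showing how assoc can be called on a model fitted with rc, and what the returned object contains:

data(criminal)
model <- rc(criminal, nd = 1, weighting = "marginal")
a <- assoc(model)   # identified scores and association parameters
a$phi               # intrinsic association coefficient(s), one per dimension
a$row               # normalized row scores
a$col               # normalized column scores
# The same information is also stored in the fitted model as model$assoc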
An assoc
object with the following components:
phi |
The intrinsic association parameters, one per dimension. |
row |
Row scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null. |
col |
Column scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null. |
weighting |
The name of the weighting method used, reflected by |
row.weights |
The row weights used for the identification of scores, as specified by the
|
col.weights |
The column weights used for the identification of scores, as specified by the
|
rc, hmskew, yrcskew, rcL, rcL.trans
Three-way table crossing eye color and hair color in two places, Caithness and Aberdeen. This table is used by Becker and Clogg (1989) to illustrate several log-multiplicative row-column association models, with and without layer effect.
data(color)
Becker, M.P., and Clogg, C.C. (1989). Analysis of Sets of Two-Way Contingency Tables Using Association Models. J. of the Am. Stat. Association 84(405), 142-151.
## see ?rc
Number of men aged 15-19 charged with a criminal case for whom charges were dropped: Denmark, 1955-1958. This two-way table is used by Goodman (1991) to illustrate a log-multiplicative row-column model with one dimension. It was used before by Rasch (1966), Christiansen and Stene (1969), and Andersen (1986, 1990).
data(criminal)
Goodman, L.A. (1991). Measures, Models, and Graphical Displays in the Analysis of Cross-Classified Data. J. of the Am. Stat. Association 86(416), 1086, Table 1.
## see ?rc
Three-way table crossing education and occupational attainment by sex and period among white men and women from the General Social Survey: United States, 1975-1980 and 1985-1990. This table is used by Wong (2010) to illustrate log-multiplicative row-column models with three dimensions.
data(gss7590)
Wong, R.S-K. (2010). Association models. SAGE. 32, Table 4.3.
## see ?rcL and ?plot.rcL
Two-way table crossing education and occupational attainment among women from the General Social Survey: United States, 1985-1990. This table is used by Wong (2001, 2010) to illustrate a log-multiplicative row-column model with two dimensions.
data(gss8590)
Wong, R.S-K. (2001). Multidimensional Association Models: A Multilinear Approach. Sociol. Methods & Research 30, 197-240, Table 2.
Wong, R.S-K. (2010). Association models. SAGE. 32, Table 2.3 B.
## see ?rc and ?plot.rc # The table reported in Wong (2010) has a cell inconsistent with # what was reported in Wong (2001). To fix this: data(gss8590) tab <- margin.table(gss8590[,,c(2,4)], 1:2) tab[2,4] <- 49
Two-way table crossing occupational attainment and years of education among persons aged 20 through 64 from the General Social Survey: United States, 1988. This table is used by Clogg and Shihadeh (1994) to illustrate a log-multiplicative row-column model with one dimension.
data(gss88)
Clogg, C.C., and Shihadeh, E.S. (1994). Statistical Models for Ordinal Variables. Sage: Advanced Quantitative Techniques in the Social Sciences (4), Table 3.1, p. 40.
## see ?rc
Three-way mobility table assembled by Hazelrigg and Garnier (1976), and used by Zhou (2015) to illustrate the shrinkage estimation of the log-odds ratios and of the Altham index.
data(hg16)
Hazelrigg, L.E., and Garnier, M.A. (1976). Occupational Mobility in Industrial Societies: A Comparative Analysis of Differential Access to Occupational Ranks in Seventeen Countries. American Sociological Review 41: 498-511.
Zhou, X. (2015). Shrinkage Estimation of Log-Odds Ratios for Comparing Mobility Tables. Sociological Methodology 45(1):33-63.
## see ?iac
Fits a skew-symmetric association model proposed in van der Heijden & Mooijaart (1995) to describe asymmetry of square tables. Skew-symmetric association can be combined with quasi-symmetry (the default), quasi-independence, or symmetric (homogeneous) RC(M) associations.
hmskew(tab, nd.symm = NA, diagonal = FALSE,
       weighting = c("marginal", "uniform", "none"),
       rowsup = NULL, colsup = NULL,
       se = c("none", "jackknife", "bootstrap"), nreplicates = 100,
       ncpus = getOption("boot.ncpus"),
       family = poisson, weights = NULL, start = NULL, etastart = NULL,
       tolerance = 1e-8, iterMax = 5000, trace = FALSE, verbose = TRUE, ...)
tab |
a square two-way table, or an object (such as a matrix) that can be coerced into a table; if present, dimensions above two will be collapsed. |
nd.symm |
the number of dimensions to include in the symmetric RC(M) association. Cannot exceed
|
diagonal |
should the model include parameters specific to each diagonal cell? This amounts to taking quasi-independence, rather than independence, as the baseline model. |
weighting |
what weights should be used when normalizing the scores. |
rowsup |
if present, a matrix with the same columns as |
colsup |
if present, a matrix with the same rows as |
se |
which method to use to compute standard errors for parameters. |
nreplicates |
the number of bootstrap replicates, if enabled. |
ncpus |
the number of processes to use for jackknife or bootstrap parallel computing. Defaults to
the number of cores (see |
family |
a specification of the error distribution and link function
to be used in the model. This can be a character string naming
a family function; a family function, or the result of a call
to a family function. See |
weights |
an optional vector of weights to be used in the fitting process. |
start |
either |
etastart |
starting values for the linear predictor; set to |
tolerance |
a positive numeric value specifying the tolerance level for convergence; higher values will speed up the fitting process, but beware of numerical instability of estimated scores! |
iterMax |
a positive integer specifying the maximum number of main iterations to perform; consider raising this value if your model does not converge. |
trace |
a logical value indicating whether the deviance should be printed after each iteration. |
verbose |
a logical value indicating whether progress indicators should be printed, including a diagnostic error message if the algorithm restarts. |
... |
more arguments to be passed to |
The original model presented by van der Heijden & Mooijaart (1995), called “quasi-symmetry plus skew-symmetry”, combines a skew-symmetric association with a quasi-symmetry baseline; it is the variant fitted by default by this function. If nd.symm is set to a positive integer value, though, variants using an RC(M) model to describe the symmetric association are used, with or without diagonal-specific parameters (depending on the value of the diagonal argument).
These models follow the equation:
log F_{ij} = q_{ij} + phi (nu_{2i} nu_{1j} - nu_{1i} nu_{2j})
where F_{ij} is the expected frequency for the cell at the intersection of row i and column j of tab, nu a set of scores with two dimensions, and q_{ij} a quasi-symmetric specification, with either full interaction parameters, or an RC(M) association. See reference for detailed information about the degrees of freedom and the identification constraints applied to the scores.
Another model presented in the paper, the “symmetry plus skew-symmetry model” is not currently supported
out of the box, but should be relatively straightforward to implement using the underlying assoc.hmskew
function combined with a symmetric association model.
Actual model fitting is performed using gnm
, which implements the Newton-Raphson algorithm.
This function simply ensures correct start values are used, in addition to allowing for identification
of scores even with several dimensions, computation of their jackknife or bootstrap standard errors, and plotting.
The default starting values for skew association parameters are computed using an eigenvalue decomposition from the results of the model without the skew association component (“base model”); if nd.symm is not NA and strictly positive, random starting values are used. In some complex cases, using start = NULL to start with random values can be more efficient, but it is also less stable and can converge to non-optimal solutions.
An hmskew object, which is a subclass of an rc.symm object (see rc) if nd.symm is strictly positive. In addition to this class, it contains an assoc.hmskew component holding information about the skew-symmetric association:
phi |
The intrinsic association parameters, one per dimension. |
row |
Row scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null. |
col |
Column scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null. |
row.weights |
The row weights used for the identification of scores, as specified by the
|
col.weights |
The column weights used for the identification of scores, as specified by the
|
covmat |
The variance-covariance matrix for phi coefficients and normalized row and column
scores. Only present if |
adj.covmats |
An array stacking on its third dimension one variance-covariance matrix for
the adjusted scores of each layer in the model (used for plotting). Only present if |
covtype |
The method used to compute the variance-covariance matrix (corresponding to the
|
Milan Bouchet-Valat
van der Heijden, P.G.M., and Mooijaart, A. (1995). Some new log bilinear models for the analysis of asymmetry in a square contingency table. Sociol. Methods and Research 24, 7-29.
## van der Heijden & Mooijaart (1995), Table 2c, p. 23 data(ocg1973) # 5:1 is here to take "Farmers" as reference category (angle 0) model <- hmskew(ocg1973[5:1, 5:1], weighting="uniform") model ass <- model$assoc.hmskew # First column of the table round(ass$row[,,1] * sqrt(ass$phi[1,1]), d=2)[5:1,] # Right part of the table round(ass$phi[1] * (ass$row[,2,1] %o% ass$row[,1,1] - ass$row[,1,1] %o% ass$row[,2,1]), d=3)[5:1, 5:1] # Plot plot(model, coords="cartesian")
A function of class "nonlin"
to specify a van der Heijden & Mooijaart (1995) skew-symmetric
association in the formula argument to gnm
.
HMSkew(row, col, inst = NULL)
row |
the levels of the row variable |
col |
the levels of the column variable |
inst |
a positive integer specifying the instance number of the term |
This function is used by hmskew
to fit skew-symmetric models proposed by
van der Heijden & Mooijaart (1995) and their variants. It can be used directly to fit custom
variants of the model not supported by hmskew
.
This function combines its arguments in the following way:
delta_{ij} = nu_{2i} nu_{1j} - nu_{1i} nu_{2j}
where delta_{ij} is the skew association for the cell at the intersection of row i and column j of the table. See reference for mathematical details.
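As a hedged sketch (not from the package examples), a custom variant combining a quasi-independence baseline with the skew-symmetric term can be fitted directly with gnm; hmskew wraps calls of this kind. The data preparation and column names below are illustrative assumptions:

library(gnm)
library(logmult)
data(ocg1973)
df <- as.data.frame(ocg1973)
names(df)[1:2] <- c("origin", "destination")  # rename the two factors for clarity
model <- gnm(Freq ~ origin + destination + Diag(origin, destination)
                  + HMSkew(origin, destination),
             family = poisson, data = df)
# gnm uses random starting values for nonlinear terms,
# so several runs may be needed before the model converges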
A list with the required components of a "nonlin"
function:
predictors |
the expressions passed to |
term |
a function to create a deparsed mathematical expression of the term, given labels for the predictors. |
call |
the call to use as a prefix for parameter labels. |
Milan Bouchet-Valat
van der Heijden, P.G.M., and Mooijaart, A. (1995). Some new log bilinear models for the analysis of asymmetry in a square contingency table. Sociol. Methods and Research 24, 7-29.
# See ?hmskew.
Fits an extension of the skew-symmetric association model proposed in van der Heijden & Mooijaart (1995) to describe asymmetry of square tables. This model introduces a layer effect by which the strength of the skew-symmetric association, and optionally the scores, can vary over the levels of the third dimension of the table. Skew-symmetric association can be combined with quasi-symmetry (the default), quasi-independence, or symmetric (homogeneous) RC(M) associations, with or without layer effect.
hmskewL(tab, nd.symm = NA,
        layer.effect.skew = c("homogeneous.scores", "heterogeneous", "none"),
        layer.effect.symm = c("heterogeneous", "uniform", "regression.type",
                              "homogeneous.scores", "none"),
        diagonal = c("none", "heterogeneous", "homogeneous"),
        weighting = c("marginal", "uniform", "none"),
        se = c("none", "jackknife", "bootstrap"), nreplicates = 100,
        ncpus = getOption("boot.ncpus"),
        family = poisson, weights = NULL, start = NULL, etastart = NULL,
        tolerance = 1e-8, iterMax = 5000, trace = FALSE, verbose = TRUE, ...)
tab |
a three-way table, or an object (such as a matrix) that can be coerced into a table; if present, dimensions above three will be collapsed. First two dimensions must be symmetric (i.e. of the same length). |
nd.symm |
the number of dimensions to include in the symmetric RC(M) association. Cannot exceed
|
layer.effect.skew |
determines the form of the interaction between skew-symmetric association and layers. See “Details” below. |
layer.effect.symm |
determines the form of the interaction between symmetric row-column association,
or quasi-symmetric association (if |
diagonal |
what type of diagonal-specific parameters to include in the model, if any. Only makes sense
when |
weighting |
what weights should be used when normalizing the scores. |
se |
which method to use to compute standard errors for parameters. |
nreplicates |
the number of bootstrap replicates, if enabled. |
ncpus |
the number of processes to use for jackknife or bootstrap parallel computing. Defaults to
the number of cores (see |
family |
a specification of the error distribution and link function
to be used in the model. This can be a character string naming
a family function; a family function, or the result of a call
to a family function. See |
weights |
an optional vector of weights to be used in the fitting process. |
start |
either |
etastart |
starting values for the linear predictor; set to |
tolerance |
a positive numeric value specifying the tolerance level for convergence; higher values will speed up the fitting process, but beware of numerical instability of estimated scores! |
iterMax |
a positive integer specifying the maximum number of main iterations to perform; consider raising this value if your model does not converge. |
trace |
a logical value indicating whether the deviance should be printed after each iteration. |
verbose |
a logical value indicating whether progress indicators should be printed, including a diagnostic error message if the algorithm restarts. |
... |
more arguments to be passed to |
This model follows an equation inspired by the one presented by van der Heijden & Mooijaart (1995) for two-way tables (see hmskew):
log F_{ijk} = q_{ijk} + phi_k (nu_{2ik} nu_{1jk} - nu_{1ik} nu_{2jk})
where F_{ijk} is the expected frequency for the cell at the intersection of row i, column j and layer k of tab, nu a set of scores with two dimensions (possibly varying across layers), and q_{ijk} a quasi-symmetric specification, with either full interaction parameters, or an RC(M) association. See references for detailed information about the degrees of freedom and the identification constraints applied to the scores.
If layer.effect.skew is set to “heterogeneous”, different scores will be computed for each layer, which is equivalent to fitting separate models using hmskew on the separate two-way tables for each layer. If it is set to “homogeneous.scores”, then nu_{1ik} = nu_{1i} and nu_{2ik} = nu_{2i} for all layers k: only the phi_k are allowed to vary across layers. If it is set to “none”, then in addition to the previous conditions all phi_k are forced to be equal for all layers k, which amounts to a stability of the association across layers.
When nd.symm is different from NA, the symmetric association works exactly like a call to rcL, with parameters nd.symm and layer.effect.symm translated respectively to nd and layer.effect. When nd.symm == NA, symmetric association parameters are stable across layers, are multiplied by a layer coefficient (UNIDIFF model, see unidiff), follow a regression-type (Goodman-Hout) specification, or are different for each layer, when layer.effect.symm is respectively “none”, “uniform”, “regression.type” and “heterogeneous”.
Actual model fitting is performed using gnm
, which implements the Newton-Raphson algorithm.
This function simply ensures correct start values are used, in addition to allowing for identification
of scores even with several dimensions, computation of their jackknife or bootstrap standard errors, and plotting.
The default starting values for skew association parameters are computed using an eigenvalue decomposition from the results of the model without the skew association component (“base model”); if nd.symm is not NA and strictly positive, random starting values are used. In some complex cases, using start = NULL to start with random values can be more efficient, but it is also less stable and can converge to non-optimal solutions.
An hmskewL object, which is a subclass of an rcL.symm object (see rcL) if nd.symm is strictly positive. In addition to this class, it contains an assoc.hmskew component holding information about the skew-symmetric association:
phi |
The intrinsic association parameters, one per dimension and per layer. |
row |
Row scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null. |
col |
Column scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null. |
weighting |
The name of the weighting method used, reflected by |
row.weights |
The row weights used for the identification of scores, as specified by the
|
col.weights |
The column weights used for the identification of scores, as specified by the
|
covmat |
The variance-covariance matrix for phi coefficients and normalized row and column
scores. Only present if |
adj.covmats |
An array stacking on its third dimension one variance-covariance matrix for
the adjusted scores of each layer in the model (used for plotting). Only present if |
covtype |
The method used to compute the variance-covariance matrix (corresponding to the
|
Milan Bouchet-Valat
van der Heijden, P.G.M., and Mooijaart, A. (1995). Some new log bilinear models for the analysis of asymmetry in a square contingency table. Sociol. Methods and Research 24, 7-29.
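As a hedged sketch, the layer.effect.skew variants described above can be compared on a hypothetical three-way table tab3 (square in its first two dimensions; the object name is illustrative only):

# tab3 is a hypothetical rows x columns x layers table
m.none  <- hmskewL(tab3, layer.effect.skew = "none")                # stable skew association
m.homog <- hmskewL(tab3, layer.effect.skew = "homogeneous.scores")  # phi varies by layer
m.heter <- hmskewL(tab3, layer.effect.skew = "heterogeneous")       # scores vary too
anova(m.none, m.homog, m.heter)  # nested models: compare deviances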
Compute the intrinsic association coefficient of a table. This coefficient was first devised by Goodman (1996) as the “generalized contingency” when a logarithm link is used, and it is equal to the standard deviation of the log-linear two-way interaction parameters lambda_{ij}.
To obtain the Altham index, multiply the result by sqrt(nrow(tab) * ncol(tab)) * 2 (see “Examples” below).
iac(tab, cell = FALSE,
    weighting = c("marginal", "uniform", "none"),
    component = c("total", "symmetric", "antisymmetric"),
    shrink = FALSE, normalize = FALSE,
    row.weights = NULL, col.weights = NULL)
tab |
a two- or three-way table without zero cells; for three-way tables, average marginal weighting is used when “weighting = "marginal"”, and the coefficient is computed for each layer (third dimension). |
cell |
if “TRUE”, return the per-cell contributions
(affected by the value of |
weighting |
what weights should be used when normalizing the scores. |
component |
whether to compute the total association, or from symmetric or antisymmetric interaction coefficients only. |
shrink |
whether to use the empirical Bayes shrinkage estimator proposed by Zhou (2015) rather than the direct estimator. |
normalize |
whether to return the normalized version of the index varying between 0 and 1 proposed by Bouchet-Valat (2022) rather than the classic index varying between 0 and positive infinity. |
row.weights |
optional custom weights to be used for rows, e.g. to compute
the phi coefficient for several tables using their overall marginal
distribution. If specified, |
col.weights |
see |
See Goodman (1996), Equation 52 for the (marginal or other) weighted version of the intrinsic association coefficient (phi); the unweighted version can be computed with unit weights. The coefficient should not be confused with Goodman and Kruskal's lambda coefficient.
The uniform-weighted version is defined as:
phi = sqrt( (1/(I*J)) * sum_{ij} lambda_{ij}^2 )
The (marginal or other) weighted version is defined as:
phi = sqrt( sum_{ij} w_i w_j lambda_{ij}^2 )
with sum_i w_i lambda_{ij} = 0 and sum_j w_j lambda_{ij} = 0, where w_i and w_j are the row and column weights, and I and J the numbers of rows and columns.
The normalized version of the index, varying between 0 and 1, is obtained from phi by a monotonic transformation; see Bouchet-Valat (2022) for its definition.
Per-cell contributions c_{ij} are defined so that their sum gives the squared coefficient, i.e. phi^2 = sum_{ij} c_{ij}: for the unweighted case, c_{ij} = lambda_{ij}^2 / (I*J); for the weighted case, c_{ij} = w_i w_j lambda_{ij}^2.
This index cannot be computed in the presence of zero cells, since it is based on the logarithm of proportions. In these cases, 0.5 is added to all cells of the table (Agresti 2002, sec. 9.8.7, p. 397; Berkson 1955), and a warning is printed. Make sure this correction does not affect the results too much (especially with small samples), for example by manually adding different values before calling this function.
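For instance, a hedged sketch of such a sensitivity check, using the hg16 data (which contains zero cells):

data(hg16)
iac(hg16 + 0.5)   # the correction iac() would apply automatically (with a warning)
iac(hg16 + 0.1)   # a smaller correction, to check how much the result changes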
The numeric value of the intrinsic association coefficient (if cell = FALSE
),
or the corresponding per-cell contributions (if cell = TRUE
).
Milan Bouchet-Valat
Agresti, A. 2002. Categorical Data Analysis. New York: Wiley.
Altham, P.M.E., and Ferrie, J.P. (2007). Comparing Contingency Tables: Tools for Analyzing Data from Two Groups Cross-Classified by Two Characteristics. Historical Methods 40(1):3-16.
Bouchet-Valat, M. (2022). General Marginal-free Association Indices for Contingency Tables: From the Altham Index to the Intrinsic Association Coefficient. Sociological Methods & Research 51(1): 203-236.
Berkson, J. (1955). Maximum Likelihood and Minimum chi2 Estimates of the Logistic Function. J. of the Am. Stat. Ass. 50(269):130-162.
Goodman, L. A. (1996). A Single General Method for the Analysis of Cross-Classified Data: Reconciliation and Synthesis of Some Methods of Pearson, Yule, and Fisher, and Also Some Methods of Correspondence Analysis and Association Analysis. J. of the Am. Stat. Ass. 91(433):408-428.
Zhou, X. (2015). Shrinkage Estimation of Log-Odds Ratios for Comparing Mobility Tables. Sociological Methodology 45(1):33-63.
# Altham index (Altham and Ferrie, 2007, Table 1, p. 3 and commentary p. 8) tab1 <- matrix(c(260, 195, 158, 70, 715, 3245, 874, 664, 424, 454, 751, 246, 142, 247, 327, 228), 4, 4) iac(tab1, weighting="n") * sqrt(nrow(tab1) * ncol(tab1)) * 2 # Zhou (2015) data(hg16) # Add 0.5 due to the presence of zero cells hg16 <- hg16 + 0.5 # Figure 3, p. 343: left column then right column # (reported values are actually twice the Altham index) iac(hg16, weighting="n") * sqrt(nrow(hg16) * ncol(hg16)) * 2 * 2 iac(hg16, weighting="n", shrink=TRUE) * sqrt(nrow(hg16) * ncol(hg16)) * 2 * 2 # Table 4, p. 347: values are not exactly the same u <- unidiff(hg16) # First row cor(u$unidiff$layer$qvframe$estimate, iac(hg16, weighting="n")) cor(u$unidiff$layer$qvframe$estimate, iac(hg16, weighting="n"), method="spearman") # Second row cor(u$unidiff$layer$qvframe$estimate, iac(hg16, shrink=TRUE, weighting="n")) cor(u$unidiff$layer$qvframe$estimate, iac(hg16, shrink=TRUE, weighting="n"), method="spearman")
Compute the mean absolute odds ratio of a table, i.e. the (possibly weighted) geometric mean of the odds ratios, inverting those below one so that all values are at least one; it is closely related to the intrinsic association coefficient. The latter coefficient was first devised by Goodman (1996) as the “generalized contingency” when a logarithm link is used, and it is equal to the mean of the absolute values of the log-linear two-way interaction parameters lambda_{ij} (in its original version it consists of the square root of the sum of the squared parameters).
maor(tab, phi = FALSE, cell = FALSE,
     weighting = c("marginal", "uniform", "none"), norm = 2,
     component = c("total", "symmetric", "antisymmetric"),
     row.weights = NULL, col.weights = NULL)
tab |
a two- or three-way table without zero cells; for three-way tables, average marginal weighting is used when “weighting = "marginal"”, and the MAOR is computed for each layer (third dimension). |
phi |
if “TRUE”, return the intrinsic association coefficient rather than the Mean absolute odds ratio. |
cell |
if “TRUE”, return the per-cell contributions
(affected by the value of |
weighting |
what weights should be used when normalizing the scores. |
norm |
the norm to use to compute the mean of |
component |
whether to compute the total association, or from symmetric or antisymmetric interaction coefficients only. |
row.weights |
optional custom weights to be used for rows, e.g. to compute
the phi coefficient for several tables using their overall marginal
distribution. If specified, |
col.weights |
see |
See Goodman (1996), Equation 52 for the (marginal or other) weighted version of the intrinsic association coefficient (phi); the unweighted version can be computed with unit weights. The coefficient is called lambda in the original article but, to avoid confusion with Goodman and Kruskal's lambda coefficient, it is here denoted as phi, as usual in row-column association models.
The uniform-weighted version is defined as:
phi = sqrt( (1/(I*J)) * sum_{ij} lambda_{ij}^2 )
The (marginal or other) weighted version is defined as:
phi = sqrt( sum_{ij} w_i w_j lambda_{ij}^2 )
with sum_i w_i lambda_{ij} = 0 and sum_j w_j lambda_{ij} = 0, where w_i and w_j are the row and column weights, and I and J the numbers of rows and columns.
The uniform-weighted and the (marginal or other) weighted versions of the mean absolute odds ratio, as well as the corresponding per-cell contributions, are defined from these quantities; see Bouchet-Valat (2022) for the exact formulas.
This index cannot be computed in the presence of zero cells, since it is based on the logarithm of proportions. In these cases, 0.5 is added to all cells of the table (Agresti 2002, sec. 9.8.7, p. 397; Berkson 1955), and a warning is printed. Make sure this correction does not affect the results too much (especially with small samples), for example by manually adding different values before calling this function.
The numeric value of the mean absolute odds ratio, or of the intrinsic association
coefficient (if phi = TRUE
), or the corresponding per-cell contributions
(if cell = TRUE
).
Milan Bouchet-Valat
Agresti, A. 2002. Categorical Data Analysis. New York: Wiley.
Goodman, L. A. (1996). A Single General Method for the Analysis of Cross-Classified Data: Reconciliation and Synthesis of Some Methods of Pearson, Yule, and Fisher, and Also Some Methods of Correspondence Analysis and Association Analysis. J. of the Am. Stat. Ass. 91(433):408-428.
Berkson, J. (1955). Maximum Likelihood and Minimum chi2 Estimates of the Logistic Function. J. of the Am. Stat. Ass. 50(269):130-162.
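As a hedged sketch (reusing the Altham & Ferrie table from the iac examples):

tab1 <- matrix(c(260, 195, 158, 70, 715, 3245, 874, 664,
                 424, 454, 751, 246, 142, 247, 327, 228), 4, 4)
maor(tab1)              # mean absolute odds ratio, with marginal weighting
maor(tab1, phi = TRUE)  # the corresponding intrinsic association coefficient
iac(tab1)               # compare with the direct computation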
Mobility table for the United States from the 1973 Occupational Changes in a Generation (OCG-II) survey. This table has been used by Yamaguchi (1987, 1990), Xie (1992) and van der Heijden & Mooijaart (1995).
data(ocg1973)
Yamaguchi, K. (1990). Some Models for the Analysis of Asymmetric Association in Square Contingency Tables with Ordered Categories. Sociological Methodology 20, 181-212.
van der Heijden, P.G.M., and Mooijaart, A. (1995). Some new log bilinear models for the analysis of asymmetry in a square contingency table. Sociol. Methods and Research 24, 7-29.
## see ?yrcskew, ?hmskew and ?plot.hmskew
Graphical display of category scores from association models.
## S3 method for class 'rc'
plot(x, dim = c(1, 2), what = c("both", "rows", "columns"), which = NULL,
     mass = TRUE, luminosity = length(x$assoc$diagonal > 0),
     conf.int = NA, replicates = FALSE,
     coords = c("cartesian", "polar"), rev.axes = c(FALSE, FALSE),
     cex = par("cex"), col = c("blue", "red"), col.conf.int = col,
     groups = NULL, add = FALSE, type, xlim, ylim, asp,
     xlab, ylab, main, pch, font, ...)

## S3 method for class 'rc.symm'
plot(x, dim = c(1, 2), which = NULL,
     mass = TRUE, luminosity = length(x$assoc$diagonal > 0),
     conf.int = NA, replicates = FALSE,
     coords = c("cartesian", "polar"), rev.axes = c(FALSE, FALSE),
     cex = par("cex"), col = "blue", col.conf.int = col,
     groups = NULL, add = FALSE, type, xlim, ylim, asp,
     xlab, ylab, main, pch, font, ...)

## S3 method for class 'hmskew'
plot(x, dim = c(1, 2), what = c("skew-symmetric", "symmetric"), which = NULL,
     mass = TRUE, luminosity = length(x$assoc.hmskew$diagonal > 0), arrow = 45,
     conf.int = NA, replicates = FALSE,
     coords = c("polar", "cartesian"), rev.axes = c(FALSE, FALSE),
     cex = par("cex"), col = "blue", col.conf.int = col,
     groups = NULL, add = FALSE, type, xlim, ylim, asp,
     xlab, ylab, main, pch, font, ...)

## S3 method for class 'yrcskew'
plot(x, dim = c(1, 2), what = c("skew-symmetric", "symmetric"), which = NULL,
     mass = TRUE, luminosity = length(x$assoc.yrcskew$diagonal > 0), arrow = 45,
     conf.int = NA, replicates = FALSE,
     coords = c("polar", "cartesian"), rev.axes = c(FALSE, FALSE),
     cex = par("cex"), col = "blue", col.conf.int = col,
     groups = NULL, add = FALSE, type, xlim, ylim, asp,
     xlab, ylab, main, pch, font, ...)

## S3 method for class 'rcL'
plot(x, dim = c(1, 2), layer = "average", what = c("both", "rows", "columns"),
     which = NULL, mass = TRUE, luminosity = length(x$assoc$diagonal > 0),
     conf.int = NA, replicates = FALSE,
     coords = c("cartesian", "polar"), rev.axes = c(FALSE, FALSE),
     cex = par("cex"), col = c("blue", "red"), col.conf.int = col,
     groups = NULL, add = FALSE, type, xlim, ylim, asp,
     xlab, ylab, main, pch, font, ...)

## S3 method for class 'rcL.symm'
plot(x, dim = c(1, 2), layer = "average", which = NULL,
     mass = TRUE, luminosity = length(x$assoc$diagonal > 0),
     conf.int = NA, replicates = FALSE,
     coords = c("cartesian", "polar"), rev.axes = c(FALSE, FALSE),
     cex = par("cex"), col = "blue", col.conf.int = col,
     groups = NULL, add = FALSE, type, xlim, ylim, asp,
     xlab, ylab, main, pch, font, ...)

## S3 method for class 'hmskewL'
plot(x, dim = c(1, 2), layer = "average", what = c("skew-symmetric", "symmetric"),
     which = NULL, mass = TRUE, luminosity = length(x$assoc.hmskew$diagonal > 0),
     arrow = 45, conf.int = NA, replicates = FALSE,
     coords = c("polar", "cartesian"), rev.axes = c(FALSE, FALSE),
     cex = par("cex"), col = "blue", col.conf.int = col,
     groups = NULL, add = FALSE, type, xlim, ylim, asp,
     xlab, ylab, main, pch, font, ...)

## S3 method for class 'assoc'
plot(x, dim = c(1, 2), layer = 1, what = c("both", "rows", "columns"),
     which = NULL, mass = TRUE, luminosity = length(x$diagonal > 0), arrow = NULL,
     conf.int = NA, replicates = FALSE,
     coords = c("cartesian", "polar"), rev.axes = c(FALSE, FALSE),
     cex = par("cex"), col = c("blue", "red"), col.conf.int = col,
     groups = NULL, add = FALSE, type, xlim, ylim, asp,
     xlab, ylab, main, pch, font, ...)
x |
an association model, or an object inheriting from class |
dim |
numerical vector of length 2 indicating the dimensions to plot on horizontal and vertical axes respectively; default is first dimension horizontal and second dimension vertical. |
layer |
integer indicating which layer should be represented, or “average” or “average.rotate” when scores are homogeneous (see “Details” below). |
what |
for |
which |
an optional subset of points to be plotted, specified via a logical, integer or character
vector indexing the row or column component of the association object; if |
mass |
whether the size of the point symbols should reflect the mass of the categories;
this only makes sense when marginal weights were used when fitting the model. The precise
formula is that the |
luminosity |
whether the luminosity of the symbols should vary according to the diagonal-specific
parameters (if |
arrow |
Numeric value indicating the angle at which the polar coordinates system arrow should be plotted; use NULL for no arrow. |
conf.int |
a value between 0 and 1 giving the confidence level to use when drawing error bars/ellipses around the points (see “Details” below); by default nothing is plotted. Only possible if jackknife or bootstrap was enabled when fitting the model. |
replicates |
whether to plot points representing the values of the scores obtained for all of the jackknife or bootstrap replicates, when applicable (see “Details” below). |
coords |
whether to use a Cartesian or a polar coordinate system; the former makes sense when the axes
offer an interpretation (like in RC(M) models), while the latter is more appropriate when
only the angle and the distance to the origin are of interest (like in |
rev.axes |
a logical vector of length 1 or 2 indicating whether the sign of scores on the axes should be changed; as this sign is arbitrary in RC(M) models, changing it has no impact on the results and may be more intuitive or consistent with other presentations. |
cex |
a numeric vector indicating the size of the point symbols, recycled as necessary; the most common
choice is probably to pass only one value and use |
col |
a vector indicating the color of the point symbols, recycled as necessary; as a special case, a
vector of length 2 can be passed, to indicate the color of row and column points, respectively.
See also |
col.conf.int |
a vector indicating the color of the confidence bars/ellipses, when these are plotted; see
|
groups |
a vector indicating what symbol should be used for each point, recycled as necessary; groups will
use |
add |
whether to draw over an existing plot instead of creating a new one. |
type |
set to "n" to avoid actually plotting the points and labels; useful for customization based on the returned coordinates, see “Value” below. |
xlim |
numeric vector of length 2, giving the x coordinates range. |
ylim |
numeric vector of length 2, giving the y coordinates range. |
asp |
the y/x aspect ratio, see |
xlab |
a title for the x axis: see |
ylab |
a title for the y axis: see |
main |
an overall title for the plot: see |
pch |
a vector of plotting ‘character’, i.e., symbol to use for each point, recycled as necessary;
see |
font |
an integer vector indicating the font to use for each label, recycled as necessary;
see |
... |
Further arguments passed to |
The functions documented here represent in a one- or two-dimensional space the category scores obtained from a log-multiplicative association model. They produce symmetric biplots in which the coordinates of points on both axes are the product of normalized scores and of the square root of the intrinsic association coefficient corresponding to each dimension: thus, row and column points share the same “unit” on all axes (Goodman, 1991, Appendix 2; Wong, 2010, eq. 2.38; Clogg & Shihadeh, 1994, p. 91-92). As a special case, models with only one dimension are presented as a dotchart of the scores.
Various convenience options are provided, with reasonable defaults for each model family. In particular, you may find it necessary to adapt the cex, mass, luminosity and groups arguments depending on the number of categories to be plotted and on their respective weights. When plotting an RC(2) model, a polar coordinate system can be of substantive interest, making it possible to interpret at a glance the distance to the origin as the general strength of the association for a category on both axes (a property that is lost for higher-dimensional models).
Confidence bars/ellipses are computed from the scores' variances and covariances, based on the assumption that they follow a normal distribution, even if standard errors are computed using jackknife or bootstrap. When bootstrap (not jackknife) was used, this normality assumption can be assessed visually using the replicates argument, to check whether points globally follow the shape of the ellipses. See se.assoc for details about checking the validity of jackknife or bootstrap results.
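A hedged sketch of this check, mirroring the Wong (2010) example below but with bootstrap rather than jackknife standard errors (the model is refitted many times, so this is slow):

data(gss8590)
model <- rc(margin.table(gss8590[,,c(2,4)], 1:2), nd = 2,
            weighting = "none", se = "bootstrap", nreplicates = 100)
plot(model, what = "rows", conf.int = 0.95, replicates = TRUE)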
When layer
is set to “average” for models with layer effect and homogeneous scores, intrinsic association
coefficients are weighted across all layers. In addition, if layer
is set to “average.rotate”, scores are
rotated so that axes of the plot are those with the highest variance; oblique axes represent the original dimensions
in the new space.
The plot.assoc function is called internally by all others, and may be leveraged for advanced use cases, like plotting custom models that do not correspond strictly to the supported types.
An invisible list with components row
and col
, two matrices containing the coordinates of the
plotted points (NULL
when not plotted).
For RC(M) models:
Goodman, L.A. (1991). Measures, Models, and Graphical Displays in the Analysis of Cross-Classified Data. J. of the Am. Stat. Association 86(416), 1085-1111.
Clogg, C.C., and Shihadeh, E.S. (1994). Statistical Models for Ordinal Variables. Sage: Advanced Quantitative Techniques in the Social Sciences (4).
Wong, R.S-K. (2010). Association models. Sage: Quantitative Applications in the Social Sciences (164).
For van der Heijden & Mooijaart models:
van der Heijden, P.G.M., and Mooijaart, A. (1995). Some new log bilinear models for the analysis of asymmetry in a square contingency table. Sociol. Methods and Research 24, 7-29.
rc, rcL, rcL.trans, hmskew, hmskewL, yrcskew
## Wong (2010), Figures 2.2 and 2.3 (p. 50-51) data(gss8590) ## Not run: model <- rc(margin.table(gss8590[,,c(2,4)], 1:2), nd=2, weighting="none", se="jackknife") plot(model, what="row", rev.axes=c(TRUE, FALSE), conf.int=0.95) plot(model, what="col", rev.axes=c(TRUE, FALSE), conf.int=0.95) ## End(Not run) ## Wong (2010), Figures 4.1 and 4.2 (p. 108-109) data(gss7590) model <- rcL(gss7590, nd=2, weighting="none") opar <- par(mfrow=c(2, 2)) for(i in 1:4) plot(model, layer=i, what="rows", rev.axes=c(TRUE, FALSE), main=rownames(model$assoc$phi)[i], xlim=c(-1.2, 1.2), ylim=c(-1.2, 1.2)) par(mfrow=c(2, 2)) for(i in 1:4) plot(model, layer=i, what="col", rev.axes=c(TRUE, FALSE), main=rownames(model$assoc$phi)[i], xlim=c(-1.4, 1.4), ylim=c(-1.2, 1.2)) par(opar) ## van der Heijden & Mooijaart (1995), Figure 1c (p. 23) data(ocg1973) # 5:1 is here to take "Farmers" as reference category (angle 0) model <- hmskew(ocg1973[5:1, 5:1], weighting="uniform") # Reproduce the plot from the original article plot(model, coords="cartesian") # Use a polar coordinates system, which makes more sense in this setting plot(model)
Plots the layer coefficient estimates from a UNIDIFF model, together with confidence bars based on quasi-standard errors or “traditional” standard errors.
## S3 method for class 'unidiff'
plot(x, what = c("layer.coef", "phi", "maor"),
     se.type = c("quasi.se", "se"),
     conf.int = 0.95, numeric.auto = TRUE, type = "p",
     xlab = names(dimnames(x$data))[3], ylab = NULL,
     add = FALSE, ylim, ...)
x |
an object resulting from a call to |
what |
“layer.coef” to plot the layer coefficients in the log odds ratio
scale, with a reference of 1 for the first layer; “phi” to plot the intrinsic
association coefficient (on the log odds ratio scale); “maor” to plot the
mean absolute odds ratio (see |
se.type |
whether to use quasi-standard errors or “traditional” standard errors to compute confidence intervals. |
conf.int |
the confidence level to retain for confidence bars. |
numeric.auto |
whether layer names should be converted to numeric values when possible (see “Details” below). |
type |
what type of plot should be drawn: see |
xlab |
a title for the x axis: see |
ylab |
a title for the y axis: see |
add |
whether to create a new plot using |
ylim |
the y limits of the plot. |
... |
Further arguments passed to |
If numeric.auto = TRUE
and layer names (issued from the dimnames
of the third
dimension of the original table) can be converted to numeric values (i.e. they consist only of digits),
the position of points on the x axis will be determined by the value of the name. This makes most
sense when layers represent years, especially when they are not regularly spaced. If this behaviour
is disabled, layers will be placed regularly on the x axis, disregarding their possible interpretation
as numeric values.
Milan Bouchet-Valat
# See ?unidiff
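In addition, a hedged sketch using the hg16 data from the iac examples:

data(hg16)
u <- unidiff(hg16 + 0.5)  # 0.5 added because of zero cells
plot(u, what = "layer.coef", se.type = "quasi.se")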
Adjust a table by multiplying rows and columns in order to reproduce the provided margins, preserving all the odds ratios. This procedure is known as the RAS or Deming-Stephan algorithm, as iterative proportional fitting (IPF), or as biproportional fitting.
ras(tab, row, col, tolerance = .Machine$double.eps)
tab |
a two-way table with only positive or zero entries |
row |
a vector of strictly positive elements containing the wanted row margins (sums). |
col |
a vector of strictly positive elements containing the wanted column margins (sums). |
tolerance |
the convergence criterion to stop iterating. |
Note that sum(row)
must be equal to sum(col)
for the algorithm to make sense.
The adjusted table with row sums equal to row
and column sums equal to col
.
Milan Bouchet-Valat
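A hedged sketch with made-up margins (note that the target row and column sums are equal, as required):

tab <- matrix(c(10, 20, 30, 40, 50, 60), 2, 3)
adj <- ras(tab, row = c(100, 110), col = c(60, 70, 80))
rowSums(adj)  # 100 110
colSums(adj)  # 60 70 80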
Fit log-multiplicative row-column association models, also called RC(M) models or Goodman's (1979) Model II, with one or several dimensions. Supported variants (for square tables) include symmetric (homogeneous) row and column scores, possibly combined with separate diagonal parameters.
rc(tab, nd = 1, symmetric = FALSE, diagonal = FALSE,
   weighting = c("marginal", "uniform", "none"),
   rowsup = NULL, colsup = NULL,
   se = c("none", "jackknife", "bootstrap"), nreplicates = 100,
   ncpus = getOption("boot.ncpus"),
   family = poisson, weights = NULL, start = NULL, etastart = NULL,
   tolerance = 1e-8, iterMax = 5000, trace = FALSE, verbose = TRUE, ...)
tab |
a two-way table, or an object (such as a matrix) that can be coerced into a table; if present, dimensions above two will be collapsed. |
nd |
the number of dimensions to include in the model. Cannot exceed
|
symmetric |
should row and column scores be constrained to be equal? Valid only for square tables. |
diagonal |
should the model include parameters specific to each diagonal cell? This amounts to taking quasi-independence, rather than independence, as the baseline model. Valid only for square tables. |
weighting |
what weights should be used when normalizing the scores. |
rowsup |
if present, a matrix with the same columns as |
colsup |
if present, a matrix with the same rows as |
se |
which method to use to compute standard errors for parameters (see |
nreplicates |
the number of bootstrap replicates, if enabled. |
ncpus |
the number of processes to use for jackknife or bootstrap parallel computing. Defaults to
the number of cores (see |
family |
a specification of the error distribution and link function
to be used in the model. This can be a character string naming
a family function; a family function, or the result of a call
to a family function. See |
weights |
an optional vector of weights to be used in the fitting process. |
start |
either |
etastart |
starting values for the linear predictor; set to |
tolerance |
a positive numeric value specifying the tolerance level for convergence; higher values will speed up the fitting process, but beware of numerical instability of estimated scores! |
iterMax |
a positive integer specifying the maximum number of main iterations to perform; consider raising this value if your model does not converge. |
trace |
a logical value indicating whether the deviance should be printed after each iteration. |
verbose |
a logical value indicating whether progress indicators should be printed, including a diagnostic error message if the algorithm restarts. |
... |
more arguments to be passed to |
This function fits log-multiplicative row-column association models, usually called (after Goodman) RC(M) models, typically following the equation:
$$\log F_{ij} = \lambda + \lambda^I_i + \lambda^J_j + \sum_{m=1}^M \phi_m \mu_{im} \nu_{jm}$$
where $F_{ij}$ is the expected frequency for the cell at the intersection of row i and column j of tab, and M the number of dimensions. See references for detailed information about the variants of the model, the degrees of freedom and the identification constraints applied to the scores.
Actual model fitting is performed using gnm
, which implements the Newton-Raphson algorithm.
This function simply ensures correct start values are used, in addition to allowing for identification
of scores even with several dimensions, computation of their jackknife or bootstrap standard errors, and plotting.
The default starting values for association parameters are computed using a singular/eigen value decomposition
from the results of the model without association component (“base model”). In some complex cases, using
start = NULL
to start with random values can be more efficient, but it is also less stable and can converge
to non-optimal solutions.
A rc
object, with all the components of a gnm
object, plus an
assoc.rc
component holding the most relevant association information:
phi |
The intrinsic association parameters, one per dimension. |
row |
Row scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null. |
col |
Column scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null. |
weighting |
The name of the weighting method used, reflected by |
row.weights |
The row weights used for the identification of scores, as specified by the
|
col.weights |
The column weights used for the identification of scores, as specified by the
|
covmat |
The variance-covariance matrix for phi coefficients and normalized row and column
scores. Only present if |
adj.covmats |
An array stacking on its third dimension one variance-covariance matrix for
the adjusted scores of each layer in the model (used for plotting). Only present if |
covtype |
The method used to compute the variance-covariance matrix (corresponding to the
|
Milan Bouchet-Valat
Goodman, L.A. (1979). Simple Models for the Analysis of Association in Cross-Classifications having Ordered Categories. J. of the Am. Stat. Association 74(367), 537-552.
Becker, M.P., and Clogg, C.C. (1989). Analysis of Sets of Two-Way Contingency Tables Using Association Models. Journal of the American Statistical Association 84(405), 142-151.
Goodman, L.A. (1985). The Analysis of Cross-Classified Data Having Ordered and/or Unordered Categories: Association Models, Correlation Models, and Asymmetry Models for Contingency Tables With or Without Missing Entries. The Annals of Statistics 13(1), 10-69.
Goodman, L.A. (1991). Measures, Models, and Graphical Displays in the Analysis of Cross-Classified Data. J. of the Am. Stat. Association 86(416), 1085-1111.
Clogg, C.C., and Shihadeh, E.S. (1994). Statistical Models for Ordinal Variables. Sage: Advanced Quantitative Techniques in the Social Sciences (4).
Wong, R.S-K. (2010). Association models. SAGE: Quantitative Applications in the Social Sciences.
## Goodman (1991), Table 17.1 (p. 1097)
data(criminal)
model <- rc(criminal)
model$assoc # These are the phi (.07), mu and nu
model$assoc$row[,1,1] * model$assoc$phi[1,1] # These are the mu'
model$assoc$col[,1,1] * model$assoc$phi[1,1] # These are the nu'

## Becker & Clogg (1989), Table 5 (p. 145)
# See also ?rcL to run all models in one call
## Not run:
data(color)
# "Uniform weights" in the authors' terms mean "no weighting" for us
# See ?rcL for average marginals
caithness.unweighted <- rc(color[,,1], nd=2, weighting="none", se="jackknife")
caithness.marginal <- rc(color[,,1], nd=2, weighting="marginal", se="jackknife")
aberdeen.unweighted <- rc(color[,,2], nd=2, weighting="none", se="jackknife")
aberdeen.marginal <- rc(color[,,2], nd=2, weighting="marginal", se="jackknife")
caithness.unweighted
caithness.marginal
aberdeen.unweighted
aberdeen.marginal
# To see standard errors, either:
se(caithness.unweighted) # and so on...
# (ours are much smaller for the marginal-weighted case)
# Or:
summary(caithness.unweighted)
## End(Not run)

## Clogg & Shihadeh (1994), Tables 3.5a and b (p. 55-61)
data(gss88)
model <- rc(gss88)
# Unweighted scores
summary(model, weighting="none")
# Marginally weighted scores
summary(model, weighting="marginal")
# Uniformly weighted scores
summary(model, weighting="uniform")

## Wong (2010), Table 2.7 (p. 48-49)
## Not run:
data(gss8590)
# The table used in Wong (2001) is not perfectly consistent
# with that of Wong (2010)
tab <- margin.table(gss8590[,,c(2,4)], 1:2)
tab[2,4] <- 49
model <- rc(tab, nd=2, weighting="none", se="jackknife")
model
summary(model) # Jackknife standard errors are slightly different
               # from their asymptotic counterparts
# Compare with bootstrap standard errors
model2 <- rc(tab, nd=2, weighting="none", se="bootstrap")
plot(model, conf.int=0.95)
summary(model2)
## End(Not run)
Fit log-multiplicative row-column association models with layer effect, also called RC(M)-L models, with one or several dimensions. Supported variants include homogeneous or heterogeneous scores over the layer variable, and (for square tables) symmetric (homogeneous) row and column scores, possibly combined with separate diagonal parameters.
rcL(tab, nd = 1, layer.effect = c("homogeneous.scores", "heterogeneous", "none"), symmetric = FALSE, diagonal = c("none", "heterogeneous", "homogeneous"), weighting = c("marginal", "uniform", "none"), se = c("none", "jackknife", "bootstrap"), nreplicates = 100, ncpus = getOption("boot.ncpus"), family = poisson, weights = NULL, start = NULL, etastart = NULL, tolerance = 1e-8, iterMax = 5000, eliminate=NULL, trace = FALSE, verbose = TRUE, ...)
tab |
a three-way table, or an object (such as a matrix) that can be coerced into a table; if present, dimensions above three will be collapsed. |
nd |
the number of dimensions to include in the model. Cannot exceed
|
layer.effect |
determines the form of the interaction between row-column association and layers. See “Details” below. |
symmetric |
should row and column scores be constrained to be equal? Valid only for square tables. |
diagonal |
what type of diagonal-specific parameters to include in the model, if any. This amounts to taking quasi-conditional independence, rather than conditional independence, as the baseline model. Valid only for square tables. |
weighting |
what weights should be used when normalizing the scores. |
se |
which method to use to compute standard errors for parameters. |
nreplicates |
the number of bootstrap replicates, if enabled. |
ncpus |
the number of processes to use for jackknife or bootstrap parallel computing. Defaults to
the number of cores (see |
family |
a specification of the error distribution and link function
to be used in the model. This can be a character string naming
a family function; a family function, or the result of a call
to a family function. See |
weights |
an optional vector of weights to be used in the fitting process. |
start |
either |
etastart |
starting values for the linear predictor; set to |
tolerance |
a positive numeric value specifying the tolerance level for convergence; higher values will speed up the fitting process, but beware of numerical instability of estimated scores! |
iterMax |
a positive integer specifying the maximum number of main iterations to perform; consider raising this value if your model does not converge. |
eliminate |
either |
trace |
a logical value indicating whether the deviance should be printed after each iteration. |
verbose |
a logical value indicating whether progress indicators should be printed, including a diagnostic error message if the algorithm restarts. |
... |
more arguments to be passed to |
This function fits log-multiplicative row-column association models with layer effect, usually called (after Wong) RC(M)-L models, typically following the equation:
$$\log F_{ijk} = \lambda + \lambda^I_i + \lambda^J_j + \lambda^K_k + \lambda^{IK}_{ik} + \lambda^{JK}_{jk} + \sum_{m=1}^M \phi_{mk} \mu_{imk} \nu_{jmk}$$
where $F_{ijk}$ is the expected frequency for the cell at the intersection of row i, column j and layer k of tab, and M the number of dimensions. If layer.effect is set to ‘heterogeneous’, different scores will be computed for each level, which is equivalent to fitting separate RC(M) models on the k two-way tables. If it is set to ‘homogeneous.scores’, then $\mu_{imk} = \mu_{im}$ and $\nu_{jmk} = \nu_{jm}$ for all layers k: only the $\phi_{mk}$ are allowed to vary across layers. If it is set to ‘none’, then in addition to the previous conditions all $\phi_{mk}$ are forced to be equal for all layers k, which amounts to a stability of the association across layers. See references for detailed information about the variants of the model, the degrees of freedom and the identification constraints applied to the scores.
Actual model fitting is performed using gnm
, which implements the Newton-Raphson algorithm.
This function simply ensures correct start values are used, in addition to allowing for identification
of scores even with several dimensions, computation of their jackknife or bootstrap standard errors, and plotting.
The default starting values for association parameters are computed using a singular/eigen value decomposition
from the results of the model without association component (“base model”). In some complex cases, using
start = NULL
to start with random values can be more efficient, but it is also less stable and can converge
to non-optimal solutions.
A rcL
object, with all the components of a gnm
object, plus an
assoc
component holding the most relevant association information:
phi |
The intrinsic association parameters, one per dimension and per layer. |
row |
Row scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null. |
col |
Column scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null. |
weighting |
The name of the weighting method used, reflected by |
row.weights |
The row weights used for the identification of scores, as specified by the
|
col.weights |
The column weights used for the identification of scores, as specified by the
|
covmat |
The variance-covariance matrix for phi coefficients and normalized row and column
scores. Only present if |
adj.covmats |
An array stacking on its third dimension one variance-covariance matrix for
the adjusted scores of each layer in the model (used for plotting). Only present if |
covtype |
The method used to compute the variance-covariance matrix (corresponding to the
|
Milan Bouchet-Valat
Wong, R.S-K. (2010). Association models. SAGE: Quantitative Applications in the Social Sciences.
## Becker & Clogg (1989), Table 5 (p. 145)
# See also ?rc for more details
## Not run:
data(color)
# "Uniform weights" in the authors' terms mean "no weighting" for us,
# and "average marginals" means "marginal" with rcL
# See ?rc for "marginals"
unweighted <- rcL(color, nd=2, weighting="none",
                  layer.effect="heterogeneous", se="jackknife")
marginal <- rcL(color, nd=2, weighting="marginal",
                layer.effect="heterogeneous", se="jackknife")
unweighted
marginal
# (our standard errors are much smaller for the marginal-weighted case)
summary(unweighted)
summary(marginal)
opar <- par(mfrow=c(1, 2))
plot(marginal, layer="Caithness", conf.int=0.95)
plot(marginal, layer="Aberdeen", conf.int=0.95)
par(opar)
## End(Not run)

## Wong (2010), Table 4.6 (p. 103), model 9
## Not run:
data(gss7590)
model <- rcL(gss7590, nd=2, weighting="none", se="jackknife")
model
summary(model) # Jackknife standard errors are slightly different
               # from their asymptotic counterparts
# See ?plot.rcL for plotting
## End(Not run)
Fit log-multiplicative row-column association models with transitional layer effect, which are related to the RC(M)-L model, with one or several dimensions. Supported variants include (for square tables) symmetric (homogeneous) row and column scores, possibly combined with separate diagonal parameters.
rcL.trans(tab, nd = 1, symmetric = FALSE, diagonal = c("none", "heterogeneous", "homogeneous"), weighting = c("marginal", "uniform", "none"), se = c("none", "jackknife", "bootstrap"), nreplicates = 100, ncpus = getOption("boot.ncpus"), family = poisson, weights = NULL, start = NULL, etastart = NULL, tolerance = 1e-8, iterMax = 5000, trace = FALSE, verbose = TRUE, ...)
tab |
a three-way table, or an object (such as a matrix) that can be coerced into a table; if present, dimensions above three will be collapsed. |
nd |
the number of dimensions to include in the model. Cannot exceed
|
symmetric |
should row and column scores be constrained to be equal? Valid only for square tables. |
diagonal |
what type of diagonal-specific parameters to include in the model, if any. This amounts to taking quasi-conditional independence, rather than conditional independence, as the baseline model. Valid only for square tables. |
weighting |
what weights should be used when normalizing the scores. |
se |
which method to use to compute standard errors for parameters. |
nreplicates |
the number of bootstrap replicates, if enabled. |
ncpus |
the number of processes to use for jackknife or bootstrap parallel computing. Defaults to
the number of cores (see |
family |
a specification of the error distribution and link function
to be used in the model. This can be a character string naming
a family function; a family function, or the result of a call
to a family function. See |
weights |
an optional vector of weights to be used in the fitting process. |
start |
either |
etastart |
starting values for the linear predictor; set to |
tolerance |
a positive numeric value specifying the tolerance level for convergence; higher values will speed up the fitting process, but beware of numerical instability of estimated scores! |
iterMax |
a positive integer specifying the maximum number of main iterations to perform; consider raising this value if your model does not converge. |
trace |
a logical value indicating whether the deviance should be printed after each iteration. |
verbose |
a logical value indicating whether progress indicators should be printed, including a diagnostic error message if the algorithm restarts. |
... |
more arguments to be passed to |
This function fits log-multiplicative row-column association models with regression-type layer effect which are
experimental models combining the principles behind RC(M)-L (Wong, 2010; see rcL
) and regression-type
models (Goodman & Hout, 1998). More specifically, like RC(M)-L models, row and column scores are allowed to vary across
a layer variable, and the pattern of this variation follows the regression-type inspiration: for each dimension, a set of
scores describes the first layer, another set describes the total variation of these scores needed to describe the
association observed for the last layer, and one parameter per layer describes the position of the layer between the
first and the last layer. Compared with the RC(M)-L model with homogeneous scores across layers, this model allows
for a finer description of changes since the ordering and distances of categories on a dimension are allowed to vary,
and not only the general strength of the association. It is designed to describe transitions from one state to another,
and is best suited for ordered layer variables like time (though the model is not sensitive to reordering of the layers).
The general equation of the model is:
$$\log F_{ijk} = \lambda + \lambda^I_i + \lambda^J_j + \lambda^K_k + \lambda^{IK}_{ik} + \lambda^{JK}_{jk} + \sum_{m=1}^M (\mu_{im} + \psi_k \mu^*_{im})(\nu_{jm} + \psi_k \nu^*_{jm})$$
where $F_{ijk}$ is the expected frequency for the cell at the intersection of row i, column j and layer k of tab, and M the number of dimensions. The parameter $\psi_k$ is constrained to be positive, equal to 0 for the first layer ($\psi_1 = 0$), and equal to 1 for the last layer.
This model should not be confused with another combination of RC(M) models with the regression-type approach, presented by Goodman & Hout (1998:180), in which two separate RC(M) associations are used to describe respectively the stable and the varying components. In the present model, row and column scores for both components are summed before entering the multiplicative interaction, which means only one RC(M) association exists.
The returned object is a generic rcL
association model describing the fitted scores for each layer. To analyze
more specifically the variation of each (normalized) score from the first to the last layer, use:
model$assoc$row[,,dim(model$assoc$row)[3]] - model$assoc$row[,,1]
(and similarly for column scores).
Actual model fitting is performed using gnm
, which implements the Newton-Raphson algorithm.
This function simply ensures correct start values are used, in addition to allowing for identification
of scores even with several dimensions, computation of their jackknife or bootstrap standard errors, and plotting.
The default starting values are taken from a model with a stable RC(M) association (“base model”). In some
complex cases, using start = NULL
to get random starting values can be more efficient, but it is also
less stable and can converge to non-optimal solutions.
A rcL
object, with all the components of a gnm
object, plus an
assoc
component holding the most relevant association information:
phi |
The intrinsic association parameters, one per dimension and per layer. |
row |
Row scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null. |
col |
Column scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null. |
weighting |
The name of the weighting method used, reflected by |
row.weights |
The row weights used for the identification of scores, as specified by the
|
col.weights |
The column weights used for the identification of scores, as specified by the
|
covmat |
The variance-covariance matrix for phi coefficients and normalized row and column
scores. Only present if |
adj.covmats |
An array stacking on its third dimension one variance-covariance matrix for
the adjusted scores of each layer in the model (used for plotting). Only present if |
covtype |
The method used to compute the variance-covariance matrix (corresponding to the
|
Milan Bouchet-Valat
Goodman, L.A., and Hout, M. (1998). Statistical Methods and Graphical Displays for Analyzing How the Association Between Two Qualitative Variables Differs Among Countries, Among Groups, Or Over Time: A Modified Regression-Type Approach. Sociological Methodology 28(1), 175-230.
Wong, R.S-K. (2010). Association models. SAGE: Quantitative Applications in the Social Sciences.
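A hedged fitting sketch, using the gss7590 dataset purely for illustration (convergence behaviour and substantive relevance are not guaranteed):
## Not run:
data(gss7590)
model <- rcL.trans(gss7590, nd=1, weighting="none")
model
# Change in normalized row scores between the first and the last layer
model$assoc$row[,,dim(model$assoc$row)[3]] - model$assoc$row[,,1]
## End(Not run)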
A function of class "nonlin"
to specify a log-multiplicative row-column association models with transitional
layer effect with one or several dimensions in the formula argument to gnm
. RCTransSymm
allows specifying a variant with symmetric (homogeneous) row and column scores.
RCTrans(row, col, layer, inst = NULL)
RCTransSymm(row, col, layer, inst = NULL)
row |
the levels of the row variable |
col |
the levels of the column variable |
layer |
the levels of the layer variable |
inst |
a positive integer specifying the instance number of the term |
This function is used by rcL.trans
to fit an experimental model.
RCTrans
combines its arguments in the following way:
$$RCTrans(i,j,k) = \sum_{m=1}^M (\mu_{im} + \psi_k \mu^*_{im})(\nu_{jm} + \psi_k \nu^*_{jm})$$
where $RCTrans(i,j,k)$ is the association term for the cell at the intersection of row i, column j and layer k of the table.
RCTransSymm
is similar, but forces $\mu_{im}$ and $\nu_{im}$ (respectively $\mu^*_{im}$ and $\nu^*_{im}$) to be equal for identical values of i (diagonal cells).
A list with the required components of a "nonlin"
function:
predictors |
the expressions passed to |
term |
a function to create a deparsed mathematical expression of the term, given labels for the predictors. |
call |
the call to use as a prefix for parameter labels. |
Milan Bouchet-Valat
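As a rough, hedged sketch of how such a "nonlin" term can appear directly in a gnm formula (the data frame tab.df and the factors Row, Col and Layer are hypothetical placeholders; rcL.trans builds the complete formula and identification constraints internally):
## Not run:
library(gnm)
# Counts Freq classified by Row, Col and Layer in long format
model <- gnm(Freq ~ Row + Col + Layer + Row:Layer + Col:Layer +
               RCTrans(Row, Col, Layer),
             family = poisson, data = tab.df)
## End(Not run)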
Get standard errors for log-multiplicative association scores and intrinsic association coefficients.
se(x, ...)
## S3 method for class 'assoc'
se(x, type = c("se", "quasi.se"), ...)
## S3 method for class 'rc'
se(x, type = c("se", "quasi.se"), ...)
## S3 method for class 'hmskew'
se(x, type = c("se", "quasi.se"), ...)
## S3 method for class 'yrcskew'
se(x, type = c("se", "quasi.se"), ...)
## S3 method for class 'rcL'
se(x, type = c("se", "quasi.se"), ...)
x |
an |
type |
the type of standard errors to be computed (see “Details” below). |
... |
currently unused. |
Currently, only jackknife or bootstrap standard errors are supported, depending on the se
argument
passed when fitting the model. Some care is needed before using such standard errors and confidence
intervals. First one must ensure all model replicates converged to a correct solution, especially for bootstrap;
second, when relying on normal confidence intervals computed from these standard errors, one must ensure that the
coefficient estimators follow a normal distribution. Both checks can be performed by calling
plot.boot
on the boot.results
component of the assoc
object of the models (not
supported for jackknife), with the index
argument identifying the coefficient of interest (call colnames
on the t
member of the boot.results
object to find out the index you need).
If outliers are present, standard errors and confidence intervals will be artificially large; to fix this, the
tolerance
argument must be set to a smaller value when fitting the models (which may in turn require
increasing the value of the iterMax
argument if convergence is too slow). Once outliers are removed, if
coefficient estimates are still not normally distributed, robust bootstrap confidence intervals can be computed
using boot.ci
on the same object, provided a large number of replicates (> 1000) were computed.
For each replicate, stable scores and intrinsic association coefficients are identified using an orthogonal
Procrustes analysis to suppress meaningless variations due to random reflections, permutations and rotations of
dimensions (Milan & Whittaker, 1995). For hmskew
and hmskewL
models, a rotation within
each pair of dimensions and a permutation of pairs of dimensions is performed, but no reflection as it would change
the sign of intrinsic association coefficients.
Quasi-standard errors are computed using qvcalc
. See the help page for this function for
details and references about them.
An object of the same form as the assoc
component of the model, but with standard errors rather than
the corresponding coefficients.
Milan Bouchet-Valat
Milan, L., and J. Whittaker (1995). Application of the Parametric Bootstrap to Models that Incorporate a Singular Value Decomposition. Journal of the Royal Statistical Society. Series C (Applied Statistics) 44(1), 31-49.
assoc
, rc
, hmskew
, hmskewL
, yrcskew
,
rcL
, rcL.trans
# See ?rc about Wong (2010)
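A hedged sketch of the checks described above, assuming a model fitted with se="bootstrap"; the dataset and the coefficient index are illustrative choices only:
## Not run:
library(boot)
data(ocg1973)
model <- rc(ocg1973, nd=1, se="bootstrap", nreplicates=1000)
se(model)
se(model, type="quasi.se")
# Find the index of the coefficient of interest, then inspect its replicates
colnames(model$assoc$boot.results$t)
plot(model$assoc$boot.results, index=1)
# Robust confidence interval if estimates are not normally distributed
boot.ci(model$assoc$boot.results, index=1, type="perc")
## End(Not run)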
These functions print the summary of a list of models fitted using the anoas
function.
## S3 method for class 'anoas'
summary(object, ...)
## S3 method for class 'anoas'
print(x, ...)
## S3 method for class 'summary.anoas'
print(x, digits = 1, nsmall = 2, scientific = FALSE, ...)
object |
an |
x |
an |
digits |
See ?format. |
nsmall |
See ?format. |
scientific |
See ?format. |
... |
more arguments to be passed to further methods (ignored by |
Contrary to most analyses of association in the literature, this function currently does not fit the uniform association model (“U”), nor separate models with only row and column association (“R” and “C” models), nor log-linear row and column association models.
Currently, no significance test is performed on the models. Please note that it is not correct to test the one-dimension association model against the independence model.
A data.frame
with the following columns:
Res. Df |
the residual number of degrees of freedom of the model. |
Res. Dev |
the residual deviance of the model (likelihood ratio Chi-squared statistic, or L-squared). |
Dev. Indep. (%) |
the ratio of the residual deviance of the model over that of the independence model, times 100. This measures the share of departure from independence that cannot be explained using the number of dimensions of the model. |
Dissim. (%) |
the dissimilarity index of the model's fitted values with regard to the observed data. |
BIC |
the Bayesian Information Criterion for the model. |
AIC |
Akaike's An Information Criterion for the model. |
Deviance |
the reduction in deviance of the model compared to the previous one |
Df |
the reduction in the number of degrees of freedom of the model compared to the previous one. |
Milan Bouchet-Valat
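A brief hedged illustration, using the criminal dataset purely as example input:
## Not run:
data(criminal)
results <- anoas(criminal, nd=2)
# One row per model: independence, then RC(1) and RC(2)
summary(results)
## End(Not run)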
summary
method for objects of class assocmod
, including rc
,
rcL
, rcL.trans
, hmskew
, hmskewL
and yrcskew
models.
## S3 method for class 'assocmod'
summary(object, weighting, ...)
## S3 method for class 'summary.assocmod'
print(x, digits = max(3, getOption("digits") - 4), ...)
object |
an association model of class |
x |
an object of class |
weighting |
what weights should be used when normalizing the scores. |
digits |
the number of significant digits to use when printing. |
... |
further arguments passed to |
print.summary.assocmod
prints the original call to assoc
; a
summary of the deviance residuals from the model fit; the coefficients of
interest of the model; the residual deviance; the residual degrees of freedom;
the Schwarz Bayesian Information Criterion value; and the Akaike An Information Criterion value.
Association coefficients are printed with their standard errors, p-values and significance
stars. The “Normalized” columns contain normalized scores, i.e. their (weighted)
sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional
correlation is null. For models with only one layer (rc
, hmskew
,
yrcskew
), adjusted scores are printed in the “Adjusted” column:
these correspond to normalized scores times the square root of the corresponding intrinsic
association parameter (phi).
p-values correspond to normalized scores, and are computed under the assumption that estimators
of coefficients are normally distributed, even if jackknife or bootstrap is used. See
se.assoc
for details about checking this assumption and the validity of jackknife
or bootstrap results.
Note that setting the weighting
argument to a value different from that used at the
time of the fit discards the computed standard errors, if any.
An object of class summary.assoc
, with the following components:
call |
the |
diagonal |
the |
deviance.resid |
the deviance residuals, see |
coefficients |
a matrix holding the association coefficients estimates, standard errors and p-values. |
diagonal |
a matrix holding the diagonal coefficients, if any. |
weighting |
the weighting method used when normalizing the scores. |
deviance |
the |
chisq |
the Pearson Chi-squared statistic for the model fit. |
dissim |
the dissimilarity index for the model fit. |
df.residual |
the |
bic |
the value of the BIC for the model fit (contrary to the value reported by
|
aic |
the value of the AIC for the model fit (contrary to the value reported by
|
family |
the |
dispersion |
the estimated dispersion |
df |
a 3-vector of the rank of the model; the number of residual degrees of freedom; and number of unconstrained coefficients. |
Milan Bouchet-Valat
assoc
, plot.assoc
, rc
, rcL
,
rcL.trans
, hmskew
, hmskewL
, yrcskew
summary
method for objects of class unidiff
.
## S3 method for class 'unidiff'
summary(object, ...)
## S3 method for class 'summary.unidiff'
print(x, digits = max(3, getOption("digits") - 4), ...)
object |
an object resulting from a call to |
x |
an object of class |
digits |
the number of significant digits to use when printing. |
... |
further arguments passed to |
print.summary.unidiff
prints the original call to unidiff
; a
summary of the deviance residuals from the model fit; the coefficients of
interest of the model; the residual deviance; the residual degrees of freedom;
the Schwarz Bayesian Information Criterion value; and the Akaike An Information Criterion value.
Layer and two-way interaction coefficients are printed with their standard errors,
quasi-standard errors (see qvcalc
), p-values (based on standard
errors) and significance stars. Constrained coefficients have a value of 0 (by default),
and 0 standard errors, but still have quasi-standard errors.
An object of class summary.unidiff
, with the following components:
call |
the |
deviance.resid |
the deviance residuals, see |
layer |
a |
phi.layer |
a |
interaction |
a |
deviance |
the |
diagonal |
the |
weighting |
the |
chisq |
the Pearson Chi-squared statistic for the model fit. |
dissim |
the dissimilarity index for the model fit. |
df.residual |
the |
bic |
the value of the BIC for the model fit (contrary to the value reported by
|
aic |
the value of the AIC for the model fit (contrary to the value reported by
|
family |
the |
dispersion |
the estimated dispersion |
df |
a 3-vector of the rank of the model; the number of residual degrees of freedom; and number of unconstrained coefficients. |
Milan Bouchet-Valat
Fit association models to data from a complex survey design, with inverse-probability weighting and (optionally) standard errors based on replicate weights.
svyrc(formula, design, nd = 1, symmetric = FALSE, diagonal = FALSE, weighting = c("marginal", "uniform", "none"), rowsup = NULL, colsup = NULL, Ntotal = nrow(design), exclude = c(NA, NaN), se = c("none", "replicate"), ncpus = getOption("boot.ncpus"), family = quasipoisson, weights = NULL, start = NULL, etastart = NULL, tolerance = 1e-8, iterMax = 5000, trace = FALSE, verbose = TRUE, ...)
svyhmskew(formula, design, nd.symm = NA, diagonal = FALSE, weighting = c("marginal", "uniform", "none"), rowsup = NULL, colsup = NULL, Ntotal = nrow(design), exclude = c(NA, NaN), se = c("none", "replicate"), ncpus = getOption("boot.ncpus"), family = quasipoisson, weights = NULL, start = NULL, etastart = NULL, tolerance = 1e-8, iterMax = 5000, trace = FALSE, verbose = TRUE, ...)
svyyrcskew(formula, design, nd.symm = NA, nd.skew = 1, diagonal = FALSE, weighting = c("marginal", "uniform", "none"), rowsup = NULL, colsup = NULL, Ntotal = nrow(design), exclude = c(NA, NaN), se = c("none", "replicate"), ncpus = getOption("boot.ncpus"), family = quasipoisson, weights = NULL, start = NA, etastart = NULL, tolerance = 1e-8, iterMax = 15000, trace = FALSE, verbose = TRUE, ...)
svyrcL(formula, design, nd = 1, layer.effect = c("homogeneous.scores", "heterogeneous", "none"), symmetric = FALSE, diagonal = c("none", "heterogeneous", "homogeneous"), weighting = c("marginal", "uniform", "none"), Ntotal = nrow(design), exclude = c(NA, NaN), se = c("none", "replicate"), ncpus = getOption("boot.ncpus"), family = quasipoisson, weights = NULL, start = NULL, etastart = NULL, tolerance = 1e-8, iterMax = 5000, trace = FALSE, verbose = TRUE, ...)
svyrcL.trans(formula, design, nd = 1, symmetric = FALSE, diagonal = c("none", "heterogeneous", "homogeneous"), weighting = c("marginal", "uniform", "none"), Ntotal = nrow(design), exclude = c(NA, NaN), se = c("none", "replicate"), ncpus = getOption("boot.ncpus"), family = quasipoisson, weights = NULL, start = NULL, etastart = NULL, tolerance = 1e-8, iterMax = 5000, trace = FALSE, verbose = TRUE, ...)
svyhmskewL(formula, design, nd.symm = NA, layer.effect.skew = c("homogeneous.scores", "heterogeneous", "none"), layer.effect.symm = c("heterogeneous", "uniform", "homogeneous.scores", "none"), diagonal = c("none", "heterogeneous", "homogeneous"), weighting = c("marginal", "uniform", "none"), Ntotal = nrow(design), exclude = c(NA, NaN), se = c("none", "replicate"), ncpus = getOption("boot.ncpus"), family = quasipoisson, weights = NULL, start = NULL, etastart = NULL, tolerance = 1e-8, iterMax = 5000, trace = FALSE, verbose = TRUE, ...)
formula |
a formula specifying margins for the table (using ‘+’ only)
on which the model will be fitted (passed to
|
design |
a survey object; if |
nd |
the number of dimensions to include in the model. Cannot exceed
|
nd.symm |
the number of dimensions to include in the symmetric RC(M) association. Cannot exceed
|
nd.skew |
the number of dimensions to include in the skew-symmetric RC(M) association. |
layer.effect |
determines the form of the interaction between row-column association and layers. See “Details” below. |
layer.effect.skew |
determines the form of the interaction between skew-symmetric association and layers. See “Details” below. |
layer.effect.symm |
determines the form of the interaction between symmetric row-column association,
or quasi-symmetric association (if |
symmetric |
should row and column scores be constrained to be equal? Valid only for square tables. |
diagonal |
what type of diagonal-specific parameters to include in the model, if any. Only makes sense
when |
weighting |
what weights should be used when normalizing the scores. |
Ntotal |
sum of counts to normalize the table to (passed to |
exclude |
a vector of values to be exclude when building the table, passed to |
rowsup |
if present, a matrix with the same columns as |
colsup |
if present, a matrix with the same rows as |
se |
whether to compute replicate standard errors or not (only supported for |
ncpus |
the number of processes to use for jackknife or bootstrap parallel computing. Defaults to
the number of cores (see |
family |
a specification of the error distribution and link function
to be used in the model. This can be a character string naming
a family function; a family function, or the result of a call
to a family function. See |
weights |
an optional vector of weights to be used in the fitting process. |
start |
either |
etastart |
starting values for the linear predictor; set to |
tolerance |
a positive numeric value specifying the tolerance level for convergence; higher values will speed up the fitting process, but beware of numerical instability of estimated scores! |
iterMax |
a positive integer specifying the maximum number of main iterations to perform; consider raising this value if your model does not converge. |
trace |
a logical value indicating whether the deviance should be printed after each iteration. |
verbose |
a logical value indicating whether progress indicators should be printed, including a diagnostic error message if the algorithm restarts. |
... |
more arguments to be passed to |
The model is fitted to a table with probabilities estimated by
svytable
and (when Ntotal = nrow(design)
)
with the sample size equal to the observed sample size, treating the
resulting table as if it came from iid multinomial sampling, as described
by Rao and Scott. This assumption affects the fit statistics but not parameter
point estimates.
Standard errors that do not rely on this assumption can be computed
by fitting the model using each series of replicate weights. If your
data does not come with replicate weights, use
as.svrepdesign
to create them first, and pass
the resulting svrepdesign
object via the
design
argument.
An assocmod
object whose exact class depends on the function called.
Note that printed fit statistics and degrees of freedom rely on the
iid assumption. This is also the case of the variance-covariance matrix
returned by the vcov.gnm
function.
Milan Bouchet-Valat
Rao, J.N.K., Scott, A.J. (1984). On Chi-squared Tests For Multiway Contingency Tables with Proportions Estimated From Survey Data. Annals of Statistics 12, 46-60.
rc
, hmskew
, yrcskew
, rcL
,
rcL.trans
, hmskewL
svytable
, svyloglin
,
svyglm
, as.svrepdesign
Fit generalized nonlinear models to data from a complex survey design, with inverse-probability weighting and (optionally) standard errors based on replicate weights.
svygnm(formula, design, ...)
## S3 method for class 'svyrep.design'
svygnm(formula, design, subset = NULL, data.fun = NULL, rescale = NULL, rho = NULL, return.replicates = FALSE, keep.weights = FALSE, na.action, eliminate, ncpus = getOption("boot.ncpus"), ...)
formula |
a symbolic description of the nonlinear predictor. |
design |
a survey object; if |
subset |
expression to select a subpopulation |
data.fun |
function called on each replicate to generate the |
rescale |
Rescaling of weights, to improve numerical stability. The default rescales weights to sum to the sample size. Use |
rho |
For replicate BRR designs, to specify the parameter for
Fay's variance method, giving weights of |
return.replicates |
return the replicates as a component of the result? |
keep.weights |
whether to save the weights in the |
na.action |
handling of NAs |
eliminate |
a factor to be included as the first term in the model.
|
ncpus |
the number of CPU cores to use to run replicates. Pass |
... |
more arguments to be passed to |
This function can be used in a similar way as svyglm
,
but for generalized nonlinear models. It computes standard errors
using replicates only (i.e. no asymptotic standard errors). If your
data does not come with replicate weights, use
as.svrepdesign
to create them first, and pass
the resulting svrepdesign
object via the
design
argument.
An svygnm
object.
Note that printed fit statistics and degrees of freedom rely on the
iid assumption. This is also the case of the variance-covariance matrix
returned by the vcov.gnm
function.
Milan Bouchet-Valat, based on the svyglm
function by Thomas Lumley
Rao, J.N.K., Scott, A.J. (1984). On Chi-squared Tests For Multiway Contingency Tables with Proportions Estimated From Survey Data. Annals of Statistics 12, 46-60.
library(survey)
data(api)
dstrat <- svydesign(id=~1, strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
rstrat <- as.svrepdesign(dstrat)
glm.mod <- svyglm(api00~ell+meals+mobility, design=rstrat)
gnm.mod <- svygnm(api00~ell+meals+mobility, design=rstrat, ncpus=1)
# Both functions give the same result for GLMs
summary(glm.mod)
summary(gnm.mod)
# GNM, can only be fitted with svygnm()
summary(svygnm(api00~ell+meals+mobility, design=rstrat, family=poisson, ncpus=1))
Fit the log-multiplicative uniform difference model (UNIDIFF, see Erikson & Goldthorpe, 1992), also called the log-multiplicative layer effect model (Xie, 1992). For square tables, diagonal cells can be handled separately.
unidiff(tab, diagonal = c("included", "excluded", "only"), constrain = "auto", weighting = c("marginal", "uniform", "none"), norm = 2, family = poisson, tolerance = 1e-8, iterMax = 5000, eliminate=NULL, trace = FALSE, verbose = TRUE, checkEstimability = TRUE, ...)
tab |
a three-way table, or an object (such as a matrix) that can be coerced into a table; if present, dimensions above three will be collapsed as appropriate. |
diagonal |
|
constrain |
(non-eliminated) coefficients to constrain, specified by a regular expression, a numeric vector of indices, a logical vector, a character vector of names, or "[?]" to select from a Tk dialog. The default constrains to 0 the first layer parameter and interaction coefficients for the first row and column of the table. |
weighting |
what weights should be used when normalizing coefficients. This does not affect
layer coefficients, which are set to 1 for the first layer, but only two-way interaction
coefficients and layer association levels, which are layer coefficients times the intrinsic
association coefficient (see |
norm |
the norm to use to compute the mean absolute odds ratio (see |
family |
a specification of the error distribution and link function to be used in the model.
This can be a character string naming a family function; a family function, or the result of
a call to a family function. See |
tolerance |
a positive numeric value specifying the tolerance level for convergence; higher values will speed up the fitting process, but beware of numerical instability of estimated scores! |
iterMax |
a positive integer specifying the maximum number of main iterations to perform; consider raising this value if your model does not converge. |
eliminate |
either |
trace |
a logical value indicating whether the deviance should be printed after each iteration. |
verbose |
a logical value indicating whether progress indicators should be printed, including a diagnostic error message if the algorithm restarts. |
checkEstimability |
a logical value indicating whether the estimability of the contrasts should
be checked via |
... |
more arguments to be passed to |
The equation of the fitted model is:
$$\log F_{ijk} = \lambda + \lambda^I_i + \lambda^J_j + \lambda^K_k + \lambda^{IK}_{ik} + \lambda^{JK}_{jk} + \phi_k \psi_{ij}$$
where $F_{ijk}$ is the expected frequency for the cell at the intersection of row i, column j and layer k of tab. When diagonal = "excluded", diagonal-specific parameters $\lambda^{IJ}_{ij}$ are added but set to 0 when $i \neq j$ (off-diagonal). When diagonal = "only", $\psi_{ij}$ is set to 0 when $i \neq j$.
Note that by default weighting="marginal"
, meaning that reported
interaction coefficients do not correspond to what is usually expected
in log-linear modeling. Use weighting="none"
or weighting="uniform"
to use more classic identification constraints (effects coding).
Layer coefficients are internally exponentiated in the gnm formula, which means the reported
values are on a log scale, with reference 0 for the first layer. Interaction coefficients use the
“sum” contrast, also known as “effect” coding, except when
diagonal
is different from
included
, in which case “treatment” contrast (a.k.a. “reference” or “dummy”
coding) is used.
Actual model fitting is performed using gnm
, which implements the Newton-Raphson algorithm.
This function simply allows for direct identification of the log-multiplicative parameters by setting the
appropriate constraints, and improves performance by eliminating less interesting coefficients.
A unidiff
object, with all the components of a gnm
object, plus an
unidiff
component holding the most relevant information:
layer |
a |
phi |
the value of the intrinsic association coefficient (see |
maor |
the value of the Mean absolute odds ratio (see |
interaction |
a data frame object holding the two-way interaction coefficients, and their standard errors. |
diagonal |
the value of the |
weighting |
the value of the |
Milan Bouchet-Valat
Erikson, R., and Goldthorpe, J.H. (1992). The Constant Flux: A Study of Class Mobility in Industrial Societies. Oxford: Clarendon Press. Ch. 3.
Xie, Yu (1992). The Log-Multiplicative Layer Effect Model for Comparing Mobility Tables. Am. Sociol. Rev. 57(3):380-395.
Yaish, M. (1998). Opportunities, Little Change. Class Mobility in Israeli Society, 1974-1991. Ph.D. thesis, Nuffield College, University of Oxford.
Yaish, M. (2004). Class Mobility Trends in Israeli Society, 1974-1991. Lewiston: Edwin Mellen Press.
## Yaish (1998, 2004)
data(yaish)
# Last layer omitted because of low frequencies
yaish <- yaish[,,-7]
# Layer (education) must be the third dimension
yaish <- aperm(yaish, 3:1)
model <- unidiff(yaish)
model
summary(model)
plot(model)
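The components documented in the “Value” section above can then be inspected directly, for instance (a hedged sketch based on the model fitted in this example):
model$unidiff$layer        # layer coefficients (log scale)
model$unidiff$phi          # intrinsic association coefficient by layer
model$unidiff$maor         # mean absolute odds ratio by layer
model$unidiff$interaction  # two-way interaction coefficients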
Fit a skew-symmetric association model proposed in Yamaguchi (1990) to describe asymmetry of square tables. This model can be combined with symmetric association models like a quasi-symmetry (the default) or symmetric (homogeneous) RC(M) models.
yrcskew(tab, nd.symm = NA, nd.skew = 1, diagonal = FALSE, weighting = c("marginal", "uniform", "none"), se = c("none", "jackknife", "bootstrap"), nreplicates = 100, ncpus = getOption("boot.ncpus"), family = poisson, weights = NULL, start = NA, etastart = NULL, tolerance = 1e-8, iterMax = 15000, trace = FALSE, verbose = TRUE, ...)
tab |
a two-way table, or an object (such as a matrix) that can be coerced into a table; if present, dimensions above two will be collapsed. |
nd.symm |
the number of dimensions to include in the symmetric RC(M) association. Cannot exceed
|
nd.skew |
the number of dimensions to include in the skew-symmetric RC(M) association. |
diagonal |
should the model include parameters specific to each diagonal cell? This amounts to taking quasi-independence, rather than independence, as the baseline model. |
weighting |
what weights should be used when normalizing the scores. |
se |
which method to use to compute standard errors for parameters. |
nreplicates |
the number of bootstrap replicates, if enabled. |
ncpus |
the number of processes to use for jackknife or bootstrap parallel computing. Defaults to
the number of cores (see |
family |
a specification of the error distribution and link function
to be used in the model. This can be a character string naming
a family function; a family function, or the result of a call
to a family function. See |
weights |
an optional vector of weights to be used in the fitting process. |
start |
either |
etastart |
starting values for the linear predictor; set to |
tolerance |
a positive numeric value specifying the tolerance level for convergence; higher values will speed up the fitting process, but beware of numerical instability of estimated scores! |
iterMax |
a positive integer specifying the maximum number of main iterations to perform; consider raising this value if your model does not converge. |
trace |
a logical value indicating whether the deviance should be printed after each iteration. |
verbose |
a logical value indicating whether progress indicators should be printed, including a diagnostic error message if the algorithm restarts. |
... |
more arguments to be passed to |
The original model presented by Yamaguchi (1990), called “row-column-effect skew-symmetric
association (logbilinear) model with full quasi-symmetry (QS+RC_SK)”, combines a skew-symmetric association
with a quasi-symmetry baseline; it is the variant fitted by default by this function. If nd.symm
is
set to a positive integer value, though, variants using a RC(M) model to describe the symmetric association
are used, with or without diagonal-specific parameters (depending on the value of the diagonal
argument);
among them is the HM_RC+RC_SK variant, when nd.symm
is 1.
In these models, the expected frequency $F_{ij}$ for the cell at the intersection of row i and column j of tab combines a quasi-symmetric or an RC(M) association with a skew-symmetric association term. See the reference for the exact equation, detailed information about the degrees of freedom, and the identification constraints applied to the scores.
Please note that, contrary to other association models, this model is sensitive to the ordering of rows and columns. Take care to pass a table whose categories follow a hierarchical order with a substantive meaning.
Another model presented in the paper, the homogeneous symmetric and skew-symmetric association model (HM_(S+SK)), is not currently supported.
Actual model fitting is performed using gnm
, which implements the Newton-Raphson algorithm.
This function simply ensures correct start values are used, in addition to allowing for identification
of scores even with several dimensions, computation of their jackknife or bootstrap standard errors, and plotting.
The default starting values for main parameters are taken from the model without association parameters
(“base model”); association parameters start with random starting values. In some complex
cases, using start = NULL
to get completely random starting values can be more efficient, but it is also
less stable and can converge to non-optimal solutions.
A yrcskew
object, which is a subclass of an rc.symm
object (see rc
) unless
nd.symm
is NA
. In addition to this class, it contains an assoc.yrcskew
component
holding information about the skew-symmetric association:
phi |
The intrinsic association parameters, one per dimension. |
row |
Row scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null. |
col |
Column scores, normalized so that their (weighted) sum is 0, their (weighted) sum of squares is 1, and their (weighted) cross-dimensional correlation is null. |
row.weights |
The row weights used for the identification of scores, as specified by the weighting argument. |
col.weights |
The column weights used for the identification of scores, as specified by the weighting argument. |
covmat |
The variance-covariance matrix for phi coefficients and normalized row and column scores. Only present if se was not “none”. |
adj.covmats |
An array stacking on its third dimension one variance-covariance matrix for the adjusted scores of each layer in the model (used for plotting). Only present if se was not “none”. |
covtype |
The method used to compute the variance-covariance matrix (corresponding to the se argument). |
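As an illustration, these components can be extracted from a fitted model (called model, as in the Examples below):

model$assoc.yrcskew$phi  # intrinsic skew-association parameter(s), one per dimension
model$assoc.yrcskew$row  # normalized row scores
model$assoc.yrcskew$col  # normalized column scores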
This family of models sometimes converges to a non-optimal solution, in which case the reported scores are wrong. To protect yourself from this problem, you are advised to run the model several times to find out which convergence point is the true one. Furthermore, when the model converges slowly, restarting the fitting procedure (which draws new random starting values) may lead to a better solution.
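A rough sketch of this strategy: refit the model under several seeds and keep the solution with the smallest deviance (the number of attempts below is arbitrary):

data(ocg1973)
fits <- lapply(1:5, function(s) {
    set.seed(s)
    yrcskew(ocg1973, nd.symm=1, nd.skew=1, diagonal=TRUE, weighting="none", verbose=FALSE)
})
devs <- sapply(fits, deviance)
best <- fits[[which.min(devs)]]
devs  # runs reaching the same minimal deviance likely found the same convergence point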
Milan Bouchet-Valat
Yamaguchi, K. (1990). Some Models for the Analysis of Asymmetric Association in Square Contingency Tables with Ordered Categories. Sociol. Methodology 20, 181-212.
## Yamaguchi (1990), Table 5, p. 202, and Table 6B, p. 205
data(ocg1973)

# Simple symmetric RC(1) model ("Null skew-symmetry")
rc.model <- rc(ocg1973, diagonal=TRUE, symmetric=TRUE, weighting="none")
# Reported phi is slightly different, coefficients agree
rc.model

# Note model does not always converge, several attempts may be needed
# Here we set known starting values to be sure it works
set.seed(5)
model <- yrcskew(ocg1973, nd.symm=1, nd.skew=1, diagonal=TRUE, weighting="none")
# We do not get the same results as the author, but the smaller deviance
# indicates a better fit in our version (!)
model
A function of class "nonlin"
to specify a Yamaguchi (1990) skew-symmetric association in
the formula argument to gnm
.
YRCSkew(row, col, rowinf, rowsup, inst = NULL)
row |
for each cell in the table, the row category. |
col |
for each cell in the table, the column category. |
rowinf |
must be 1 for cells above the diagonal, 0 for cells below and on the diagonal. |
rowsup |
must be 1 for cells below the diagonal, 0 for cells above and on the diagonal. |
inst |
a positive integer specifying the instance number of the term. |
This function is used by yrcskew
to fit the “row-column-effect skew-symmetric
association (logbilinear) model with full quasi-symmetry (QS+RC_SK)” proposed by Yamaguchi (1990).
It can be used directly to fit custom variants of the model not supported by yrcskew
.
This function combines its arguments into a log-multiplicative skew-symmetric term. When the arguments are set according to what is suggested above, and the skew parameter is constrained to 1, the term reduces to the skew-symmetric association for the cell at the intersection of row i and column j of the table, with opposite signs above and below the diagonal and no contribution on the diagonal. See the reference for mathematical details, and the code of yrcskew for real-world usage.
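Below is a rough, hedged sketch of calling YRCSkew directly inside a gnm formula. The rowinf and rowsup indicators are built exactly as described above; the baseline terms, identification constraints and starting values used by the package should be taken from the code of yrcskew itself, so the custom variant shown here is purely illustrative and may be hard to fit:

library(gnm)
library(logmult)
data(ocg1973)
d <- as.data.frame(as.table(ocg1973))
names(d) <- c("row", "col", "Freq")
i <- as.integer(d$row)
j <- as.integer(d$col)
d$rowinf <- as.numeric(i < j)  # 1 for cells above the diagonal, 0 elsewhere
d$rowsup <- as.numeric(i > j)  # 1 for cells below the diagonal, 0 elsewhere
# Quasi-symmetry-like baseline (main effects plus symmetric interaction) and the skew term
set.seed(1)
m <- gnm(Freq ~ row + col + Symm(row, col) + YRCSkew(row, col, rowinf, rowsup),
         data=d, family=poisson)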
A list with the required components of a "nonlin"
function:
predictors |
the expressions passed to YRCSkew. |
term |
a function to create a deparsed mathematical expression of the term, given labels for the predictors. |
call |
the call to use as a prefix for parameter labels. |
Milan Bouchet-Valat
Yamaguchi, K. (1990). Some Models for the Analysis of Asymmetric Association in Square Contingency Tables with Ordered Categories. Sociol. Methodology 20, 181-212.
# See ?yrcskew.