---
title: "Short Reference for logmult"
author: "Milan Bouchet-Valat"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Short Reference for logmult}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Supported Models and Options

The logmult package currently supports these model families via separate functions:

* UNIDIFF (a.k.a. log-multiplicative layer effect model): `unidiff` function.
* RC(M) (a.k.a. Goodman Type II) row-column association models: `rc` function.
* RC(M)-L row-column association models with layer effect: `rcL` function.
* Skew-symmetric row-column association model (van der Heijden & Mooijaart): `hmskew` function.
* Skew-symmetric row-column association model with layer effect (extension of van der Heijden & Mooijaart): `hmskewL` function.
* Skew-symmetric row-column association model (Yamaguchi RC-SK): `yrcskew` function.

Please refer to the inline documentation for each function (e.g. `?unidiff`) for more details and classic examples. These functions take as their first argument a table, typically obtained via the `table` or `xtabs` function. Arrays of counts without row, column and layer names will have letters attributed automatically; use `rownames`, `colnames` and/or `dimnames` to change these names.

Main options common to several models include:

* No weighting, uniform weighting or marginal weighting when normalizing scores: `weighting` argument.
* Symmetric (a.k.a. homogeneous) scores for rows and columns: `symmetric` argument.
* Homogeneous scores and association coefficients for all layers, homogeneous scores only (a.k.a. "simple homogeneous"), or heterogeneous scores and association coefficients: `layer.effect`, `layer.effect.symm` and `layer.effect.skew` arguments.
* Number of dimensions: `nd`, `nd.symm` and `nd.skew` arguments.
* Diagonal-specific parameters ("quasi-" models), either stable or varying over layers: `diagonal` argument.
* Jackknife and bootstrap standard errors: `se` and `nreplicates` arguments.
* Supplementary rows and columns: `rowsup` and `colsup` arguments.
* Fully random or precomputed (semi-random) starting values: `start` argument.
* Fitting control via arguments passed to `gnm`: tolerance criterion (`tolerance`), maximum number of iterations (`iterMax`), progress output (`trace` and `verbose`), faster fitting by not estimating uninteresting parameters (`eliminate`).

Custom models which cannot be obtained via the standard options can be fitted manually by calling `gnm` directly. Association coefficients can then be extracted by calling one of the `assoc.*` functions on the model: `assoc.rc`, `assoc.rcL`, `assoc.rcL.symm`, `assoc.hmskew`, `assoc.hmskewL`, `assoc.rc.symm` or `assoc.yrcskew`. Since these functions are not exported, you need to fully qualify them to call them, e.g. `logmult:::assoc.rc(model)`. The resulting objects (of class `assoc`) can be passed to `plot` and support the same options as models.

Models of the "quasi-" type, i.e. excluding some cells of a table, can be fitted by setting the corresponding cells of the input table to `NA`. Reported degrees of freedom will be correct (contrary to what often happens when setting zero weights for these cells).

# Plotting

The package supports rich plotting features for each model family. For the UNIDIFF model, the layer coefficient can be plotted by simply calling `plot` on the fitted model. See `?plot.unidiff` for details and examples.
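For instance, here is a minimal self-contained sketch; the 3×3×2 table below is artificial and made up purely for illustration:

```r
library(logmult)

## Artificial origin x destination x period table of counts (made up for
## illustration): the diagonal association is weaker in the second period
tab <- as.table(array(c(40, 15, 10,  15, 40, 15,  10, 15, 40,
                        30, 20, 15,  20, 30, 20,  15, 20, 30),
                      dim = c(3, 3, 2),
                      dimnames = list(origin = c("O1", "O2", "O3"),
                                      destination = c("D1", "D2", "D3"),
                                      period = c("t1", "t2"))))

## UNIDIFF layer effect model: the third dimension of the table is the layer
u <- unidiff(tab)

## Plot the estimated layer coefficients
plot(u)
```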
For association models, one- and multi-dimensional scores plots can be drawn, again by calling `plot` on the fitted model. For models with a layer effect, a given layer can be chosen via the `layer` argument, or an average of association coefficients can be used (for models with homogeneous scores only). Several arguments allow tweaking the display, including:

* Which dimensions to plot: `dim` argument.
* Whether to plot the symmetric or skew-symmetric part of the association (when applicable): `what` argument.
* Whether to show rows, columns or both: `what` argument.
* Which specific rows/columns to represent: `which` argument.
* Whether to draw confidence intervals/ellipses (when jackknife/bootstrap were enabled for fitting): `conf.int` and `replicates` arguments.
* Whether the size of symbols should vary according to their frequencies: `mass` argument.
* Whether the luminosity of symbols should vary according to the strength of the association: `luminosity` argument.
* Whether to reverse the axes: `rev.axes` argument.
* Standard arguments allow choosing the title (`main`), axis labels (`xlab`, `ylab`), axis limits (`xlim`, `ylim`), symbol size (`cex`) and type (`pch`), and drawing onto an existing plot (`add`).

See `?plot.assoc` for the full reference.

# Notes About LEM

Results provided by logmult should generally be consistent with LEM, and have been checked against it when possible. Some models are known not to work correctly in LEM, though.

* UNIDIFF layer coefficients are consistent with those computed by LEM, including when diagonal cells are excluded (using the `wei` command or diagonal-specific parameters). Row-column interaction coefficients obtained with `weighting="none"` or `weighting="uniform"` are consistent with LEM (coefficients reported by LEM exclude the last row and column).
* RC(1) scores and intrinsic association coefficients are consistent with logmult; some sign changes can happen but do not affect results.
* Multidimensional RC(M) models can be fitted in LEM, but their association parameters are not identified; fit statistics nevertheless agree with logmult.
* RC(M)-L model scores and intrinsic association coefficients are consistent with logmult; some sign changes can happen but do not affect results.

Even when models are supposed to be consistent between LEM and logmult, different results are sometimes obtained. There are several possible reasons for this:

* Several local optima may exist. Since logmult uses random starting values, running the model many times allows checking whether another solution with a lower deviance exists. This can be achieved with LEM by adding `ran` at the end of the `mod` line.
* Convergence may appear to have been reached even though it has not. This is a particularly common risk with LEM, since the default tolerance criterion is not very strict. Add a `cri 0.00000001` line (or use an even lower value if time permits) to use a stricter criterion. Even then, check that changing the criterion does not affect the estimated coefficients too much: if it does, they may not be reliable.

When unsure whether parameters of a model are identified in LEM, add `ran` at the end of the `mod` line to use random starting values. Unidentified coefficients will then be different at every run; only identified coefficients will remain the same. logmult only reports identifiable parameters.
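The re-running check described above can be carried out with logmult along these lines (a minimal sketch; the `occupationalStatus` table shipped with gnm is used purely as an example):

```r
library(logmult)
data(occupationalStatus, package = "gnm")

## Fit the same RC(1) model several times: each run draws new starting
## values, so a run reaching a lower deviance reveals that the other
## runs stopped at a local optimum
fits <- replicate(5,
                  rc(occupationalStatus, nd = 1, weighting = "marginal",
                     verbose = FALSE),
                  simplify = FALSE)
sapply(fits, deviance)
```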
gnm, on the other hand, returns unidentified parameters from `coef`, but these have `NA` standard errors when calling `summary(asGnm(model))`; since random starting values are used by default, unidentified parameters will also change when a model is re-fitted.

When using null weights, LEM reports incorrect degrees of freedom, as zero-weight cells are still counted as free. With logmult, instead of using null weights, set the corresponding cells of the input table to `NA`; this gives the same results as LEM, but with correct degrees of freedom.

# logmult/gnm Limitations Compared With LEM

gnm and logmult do not always work well with effects coding (`"contr.sum"`). Models may fail to converge and parameter extraction will not always work. Using dummy coding (`"contr.treatment"`) is recommended; it gives the same log-multiplicative parameters as effects coding, which only affects linear parameters.
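For instance, dummy coding can be enforced for the current session before fitting (a minimal sketch using base R options; `contr.poly` is simply the usual default for ordered factors):

```r
## Make sure dummy (treatment) coding is used for unordered factors,
## rather than effects coding, before fitting models with gnm/logmult
options(contrasts = c("contr.treatment", "contr.poly"))
```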