Introduction
Linear mixed-effects models
(LMMs) have become a standard tool for analyzing correlated or hierarchical
data across many applied fields, including biostatistics, psychology, ecology,
and toxicology. Their appeal lies in the ability to model both population-level
effects (fixed effects) and sources of variability associated with grouping or
experimental structure (random effects) within a single coherent framework.
However, practitioners
frequently encounter warning messages when fitting LMMs, such as “failed to
converge,” “boundary (singular) fit,” or “Hessian not positive definite.” These
warnings are often treated as technical nuisances or software-specific quirks,
and it is tempting to either ignore them or attempt minor numerical fixes
(e.g., changing optimizers or increasing iteration limits). Yet such warnings
usually signal deeper statistical issues related to model identifiability and
the degree of support the data provide.
Convergence problems are
especially common when LMMs are fitted to small or sparse data sets, or when
the random-effects structure is ambitious relative to the available
information. In these situations, the model may attempt to estimate more
variance–covariance parameters than the data can reliably inform. As a result,
the estimation procedure can struggle to identify a unique and stable optimum
of the likelihood function.
Theoretically, applying
linear mixed models to smaller data sets increases the risk of encountering
convergence issues, particularly the problem of a singular Hessian matrix
during restricted maximum likelihood (REML) estimation. This situation can lead
to undesirable outcomes, such as one or more variance components having an
asymptotic standard error of zero. When the affected component is the residual
(error) term, this indicates zero residual degrees of freedom, meaning the
error variance is not defined. Either way, it points to insufficient data or
excessive model complexity (overparameterization).
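The boundary phenomenon is easy to reproduce. In the sketch below (a toy
simulation, not a real data set; group sizes and seed are arbitrary), the
classical one-way ANOVA estimator of the between-group variance can come out
negative when groups are few and there is no true group effect; likelihood-based
fitters instead pin the estimate to the zero boundary, which is what a
“boundary (singular) fit” warning reports:

```python
import numpy as np

# Toy simulation (assumption: no true group effect, so the between-group
# variance should be estimated at or near zero).
rng = np.random.default_rng(1)
n_groups, n_per = 4, 3
y = rng.normal(0.0, 1.0, size=(n_groups, n_per))

group_means = y.mean(axis=1)
msb = n_per * ((group_means - y.mean()) ** 2).sum() / (n_groups - 1)
msw = ((y - group_means[:, None]) ** 2).sum() / (n_groups * (n_per - 1))

# Method-of-moments (ANOVA) estimator of the between-group variance.
sigma_b2 = (msb - msw) / n_per
# It can come out negative; likelihood-based estimation instead truncates
# it at the zero boundary, producing a boundary (singular) fit.
sigma_b2_boundary = max(sigma_b2, 0.0)
print(sigma_b2, sigma_b2_boundary)
```

With so few groups, repeated simulations land on the boundary a substantial
fraction of the time even when the model is correctly specified.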
Discussion
In
principle, fitting linear mixed-effects models to relatively small or
sparse data sets is statistically delicate, because these models rely on
estimating variance–covariance components from limited information. Estimation
is typically carried out via maximum likelihood (ML) or, more commonly,
restricted maximum likelihood (REML), which involves optimizing a likelihood
surface in a high-dimensional parameter space that includes both fixed effects
and random-effects variance components.
When the
amount of data is small relative to the complexity of the random-effects
structure, the likelihood surface can become flat or ill-conditioned in certain
directions. As a consequence, the observed (or expected) information matrix—whose
inverse is used to approximate the covariance matrix of the parameter
estimates—may be singular or nearly singular. In REML estimation, this
manifests as a singular Hessian matrix at the solution.
A
singular Hessian indicates that one or more parameters are not identifiable
from the data. In practical terms, the data do not contain enough independent
information to support the estimation of all specified variance components. This
lack of identifiability can arise for several, often overlapping, reasons:
a) insufficient
sample size, particularly a small number of grouping levels for random
effects;
b) near-redundancy among random effects, such as attempting to
estimate both random intercepts and random slopes with little within-group
replication;
c) boundary solutions, where one or more variance
components are estimated as zero (or extremely close to zero).
Interpretation of Zero or Near-Zero Standard Errors
One
symptomatic outcome of a singular Hessian is the appearance of zero (or
numerically negligible) asymptotic standard errors for certain parameters,
especially variance components. This is not a meaningful indication of infinite
precision; rather, it is a numerical artifact reflecting non-identifiability.
In the
specific case of the residual (error) variance, an estimated standard error of
zero suggests that the residual variance cannot be distinguished from zero
given the fitted model. Conceptually, this corresponds to having zero
effective degrees of freedom for the residual error term. Put differently,
after accounting for fixed and random effects, the model leaves no independent
information with which to estimate unexplained variability.
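The zero-effective-degrees-of-freedom situation has a simple fixed-effects
analogue: once the number of estimated coefficients matches the number of
observations, the residual sum of squares is exactly zero and the usual
variance estimate RSS/(n − p) becomes 0/0. A minimal numpy sketch with
simulated data (the design here is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
X = rng.normal(size=(n, n))      # as many coefficients as observations
y = rng.normal(size=n)

beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
df_resid = n - rank              # zero degrees of freedom for the error
print(df_resid, float(resid @ resid))  # residuals are (numerically) zero
```

With df_resid = 0, the estimate RSS/df_resid is undefined, which is exactly
what a zero standard error on the residual variance is signalling.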
This
situation implies that the model has effectively saturated the data: each
observation is being explained by the combination of fixed effects and random
effects with no remaining stochastic noise. From a statistical perspective,
this is problematic because a) the residual variance is no longer defined in a
meaningful way, b) the standard
inferential procedures (confidence intervals, hypothesis tests) become invalid,
and c) small perturbations of the data can lead to large changes in parameter
estimates (ill-conditioned problem).
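Point c), sensitivity to small perturbations, can be demonstrated with an
ill-conditioned least-squares problem. The two nearly collinear predictors
below are a hypothetical design, chosen only to make the conditioning extreme;
a change of 0.001 in one response value moves the estimates by several units:

```python
import numpy as np

# Hypothetical ill-conditioned design: two nearly collinear predictors.
X = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])
y = np.array([1.0, 2.0, 0.0])

beta = np.linalg.lstsq(X, y, rcond=None)[0]
# Perturb a single response value by 0.001 and refit.
beta_pert = np.linalg.lstsq(X, y + np.array([0.0, 1e-3, 0.0]), rcond=None)[0]
print(beta, beta_pert)  # the tiny perturbation is amplified enormously
```

The amplification factor is governed by the condition number of the design, the
same quantity that makes a near-singular Hessian numerically treacherous.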
Overparameterization and Model–Data Mismatch
The
fundamental issue underlying these problems is overparameterization: the
specified model is too complex for the available data. In mixed models, this
typically occurs when the random-effects structure is overly rich relative to
the number of observations and grouping units. For example, attempting to
estimate multiple random slopes per group with only a few observations per
group almost guarantees identifiability problems.
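A quick way to see why rich random-slope structures fail is to count
parameters: q correlated random effects per group require q(q + 1)/2
variance–covariance parameters, and these must be informed by only as many
group-level effect vectors as there are groups. A small helper (the numbers
below are hypothetical):

```python
def n_cov_params(q: int) -> int:
    """Free variance-covariance parameters for q correlated random
    effects per group (a symmetric q x q covariance matrix)."""
    return q * (q + 1) // 2

# Hypothetical design: random intercept plus 3 random slopes (q = 4),
# but only 6 groups to inform the group-level covariance matrix.
q, n_groups = 4, 6
print(n_cov_params(q), "covariance parameters vs.", n_groups, "group-level vectors")
```

Estimating a 4 × 4 covariance matrix (10 free parameters) from 6 group-level
vectors is hopeless in roughly the same way that estimating a variance from a
handful of observations is, and identifiability problems follow.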
From a
modeling perspective, a singular Hessian or zero residual variance should not
be interpreted as an intrinsic property of the data-generating process, but
rather as evidence of a mismatch between model complexity and data support.
The data are insufficient to separate the contributions of all variance
components, leading the estimation procedure to collapse some parameters to
boundary values.
Practical Implications
These
issues underscore the importance of exercising caution when applying linear
mixed models to small data sets. In such situations, it is often necessary to simplify
the random-effects structure by eliminating random slopes or correlations, or,
if feasible, to increase either the sample size or the number of grouping
levels. As a final option, one might consider alternative modeling strategies,
such as fixed-effects models or penalized/regularized mixed models.
Ultimately,
convergence diagnostics, singularity warnings, and zero standard errors
should be viewed as key signals that the model may be statistically ill-posed,
rather than as purely technical inconveniences. They suggest that the
inferential conclusions drawn from such a model are likely unreliable unless
the model specification is revisited.