pymer4.models.glmer#
Tutorial
Check out the LMMs and GLMMs tutorial for usage examples
GLMMs#
Generalized Linear Mixed Models fit using Restricted-Maximum-Likelihood-Estimation (REML) or Maximum-Likelihood-Estimation (MLE)
GLMMs generalize LMMs like GLMs generalize LMs. They are well suited for non-independent data and non-gaussian outcome variables such as binary outcomes or counts. This includes models like mixed-effects logistic-regression and multi-level poisson models.
Like LMMs, they are particularly useful when observations are non-independent (e.g. repeated-measures designs, hierarchical data, panel-data, time-series, clustered data). To account for this, GLMMs estimate additional random effects that reflect how a cluster of observations deviates from the fixed-effects estimates (e.g. random-intercepts and/or random-slopes).
For some models like mixed logistic-regression, it can be helpful to use .fit(exponentiate=True) to transform estimates to the odds scale to aid interpretability. By default the 'fitted' column in model.data and the output of model.predict() uses type_predict = 'response' so that model predictions are on the response scale, i.e. probabilities for mixed logistic-regression.
from pymer4 import load_dataset
from pymer4.models import glmer
titanic = load_dataset('titanic')
# Logistic regression accounting for repeated observations within pclass
# by estimating a random intercept per level of pclass
log_reg = glmer('survived ~ fare + (1|pclass)', family='binomial', data=titanic)
log_reg.set_factors('pclass')
# See parameter estimates on odds scale
log_reg.fit(exponentiate=True)
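To make the odds scale and the response scale concrete, here is a minimal standalone sketch that does not call pymer4. The coefficient values are hypothetical, chosen only for illustration, and `inverse_logit` is a helper defined here rather than part of the library.

```python
import math

def inverse_logit(eta):
    """Map a value on the link (log-odds) scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical fixed-effect estimates on the log-odds scale (not real model output)
intercept = -1.0
slope_fare = 0.02

# Exponentiating a slope gives the odds ratio for a one-unit increase in fare
odds_ratio = math.exp(slope_fare)

# Linear predictor (link scale) for fare = 50, then the probability (response scale)
eta = intercept + slope_fare * 50
prob = inverse_logit(eta)

print(f"odds ratio per unit of fare: {odds_ratio:.4f}")
print(f"link-scale prediction: {eta:.2f}, response-scale prediction: {prob:.2f}")
```

An odds ratio above 1 means the odds of the outcome rise with the predictor; the response-scale value is the quantity reported when type_predict = 'response'.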
- class pymer4.models.glmer.glmer(formula, data, family='gaussian', link='default', **kwargs)[source]#
Generalized linear mixed effects model estimated via ML/REML. Inherits from lmer.
This class implements generalized linear mixed effects models using Maximum Likelihood or Restricted Maximum Likelihood estimation. It extends the linear mixed effects model class to handle different response distributions and link functions while accounting for random effects.
- Parameters:
formula (str) – R-style formula specifying the model, including random effects
data (DataFrame) – Input data for the model
family (str) – Response distribution family (e.g. “gaussian”, “binomial”). Defaults to “gaussian”
link (str) – Link function to use. Defaults to “default” which uses the canonical link for each family
Estimation Methods#
Estimation methods are the methods you will use most routinely: estimating model parameters, omnibus tests, marginal estimates and comparisons, predictions, and simulations.
- pymer4.models.glmer.glmer.fit(self, exponentiate=False, summary=False, conf_method='wald', nboot=1000, save_boots=True, type_predict='response', parallel='multicore', ncpus=4, conf_type='perc', **kwargs)#
Fit a generalized linear mixed effects model using glmer() in R.
- Parameters:
summary (bool, optional) – Whether to return the model summary. Defaults to False
conf_method (str, optional) – Method for confidence interval calculation. Defaults to “wald”
conf_type (str, optional) – Type of bootstrap confidence intervals. Defaults to “perc”
ddf_method (str, optional) – Method for computing denominator degrees of freedom. Defaults to “Satterthwaite”
nboot (int, optional) – Number of bootstrap samples. Defaults to 1000
conf_level (float, optional) – Confidence level for intervals. Defaults to 0.95
type_predict (str, optional) – Type of prediction to compute (“response” or “link”). Defaults to “response”
- Returns:
Model summary if summary=True
- Return type:
GT, optional
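As an illustration of what a Wald-type interval and the exponentiate option amount to, the following sketch computes a textbook Wald interval by hand. It is not pymer4's implementation; `wald_ci` is a hypothetical helper and the estimate and standard error are made-up numbers.

```python
import math
from statistics import NormalDist

def wald_ci(estimate, std_error, conf_level=0.95):
    """Textbook Wald interval: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + conf_level / 2)
    return estimate - z * std_error, estimate + z * std_error

# Hypothetical log-odds estimate and standard error (not real model output)
lower, upper = wald_ci(0.5, 0.2)

# Exponentiating the estimate and both bounds moves everything to the odds scale
or_lower, or_upper = math.exp(lower), math.exp(upper)

print(f"log-odds CI: ({lower:.3f}, {upper:.3f})")
print(f"odds-ratio CI: ({or_lower:.3f}, {or_upper:.3f})")
```

Because the exponential function is monotonic, transforming the interval bounds preserves coverage, which is why estimates and intervals can be exponentiated together.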
- pymer4.models.lmer.lmer.anova(self, summary=False, auto_ss_3=True, jointtest_kwargs={'lmer_df': 'satterthwaite', 'mode': 'satterthwaite'}, anova_kwargs={})#
Calculate a Type-III ANOVA table for the model using joint_tests() in R.
- Parameters:
summary (bool) – whether to return the ANOVA summary. Defaults to False
auto_ss_3 (bool) – whether to automatically use balanced contrasts when calculating the result via joint_tests(). When False, uses the contrasts specified with set_contrasts() (which default to “contr.treatment”) together with R’s anova() function. Defaults to True.
jointtest_kwargs (dict) – additional arguments to pass to joint_tests(). Defaults to using Satterthwaite degrees of freedom
anova_kwargs (dict) – additional arguments to pass to anova()
- pymer4.models.lmer.lmer.emmeans(self, marginal_var, by=None, p_adjust='sidak', **kwargs)#
Compute marginal means and/or contrasts between factor levels. marginal_var is the predictor whose levels will have means or contrasts computed. by is an optional predictor to marginalize over. If contrasts is not specified, only marginal means are returned.
- Parameters:
marginal_var (str) – name of predictor to compute means or contrasts for
by (str/list) – additional predictors to marginalize over
contrasts (str | 'pairwise' | 'poly' | dict | None, optional) – how to specify comparison within marginal_var. Defaults to None.
p_adjust (str) – multiple comparisons adjustment method. One of: none, tukey, bonf, sidak (default), fdr, holm, dunnet, mvt (monte-carlo multi-variate T, aka exact tukey/dunnet).
- Returns:
Table of marginal means or contrasts
- Return type:
DataFrame
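For intuition about the 'sidak' adjustment, here is a small standalone sketch of the standard Šidák formula applied to a family of p-values. It is a textbook illustration rather than the code path the underlying R package uses; `sidak_adjust` is a hypothetical helper and the p-values are invented.

```python
def sidak_adjust(p_values):
    """Sidak adjustment: p_adj = 1 - (1 - p) ** m for a family of m tests."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]

# Hypothetical unadjusted p-values from three pairwise comparisons
raw = [0.01, 0.04, 0.20]
adjusted = sidak_adjust(raw)

for p, p_adj in zip(raw, adjusted):
    print(f"raw p = {p:.2f} -> Sidak-adjusted p = {p_adj:.4f}")
```

Adjusted values are never smaller than the raw ones, and the penalty grows with the number of comparisons in the family.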
- pymer4.models.base.model.empredict(self, at: dict, apply_transforms=True, type='response', **kwargs)#
Compute marginal predictions at arbitrary levels of predictors by passing in a dictionary of predictor names and values. If the string ‘data’ is used for a predictor, then all observed values for that predictor will be used. If a predictor is omitted, then its marginal value will be used (e.g. mean for continuous predictors, grand-mean for factors).
- Parameters:
at (dict) – Dictionary mapping predictor names to values at which to compute predictions. Use “data” as the value to use all observed values for that predictor.
apply_transforms (bool, optional) – Whether to apply any transformations (center/scale/zscore) that were applied to predictors. Doesn’t currently handle .over() transforms. Defaults to True.
- Returns:
A DataFrame containing the predicted values and their uncertainty.
- Return type:
predictions (DataFrame)
Examples
>>> # Assuming model is y ~ x * group and x has been mean-centered
>>> model.empredict({'x': [1, 2, 3]}) # Predictions at x=1,2,3 for each level of group
>>> model.empredict({'x': [1, 2, 3], 'group': 'data'}) # Predictions at x=1,2,3 using all group level assignment of each observation
>>> model.empredict({'x': [-1, 0, 1]}, apply_transforms=False) # Pass-in values on the mean-centered scale
- pymer4.models.glmer.glmer.predict(self, data: DataFrame, use_rfx=True, type_predict='response', **kwargs)#
Make predictions using new data accounting for the link function.
- Parameters:
data (DataFrame) – Input data for predictions
use_rfx (bool, optional) – Whether to include random effects in predictions. Defaults to True. Equivalent to re.form = NULL in R if True, re.form = NA if False
type_predict (str, optional) – Type of prediction to compute (“response” or “link”). Defaults to “response”
**kwargs – Additional arguments passed to predict function
- Returns:
Predicted values
- Return type:
ndarray
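The interaction between use_rfx and type_predict can be sketched for a random-intercept logistic model as follows. This is a conceptual illustration only: `predict_logit`, the coefficient dictionary, and the random-intercept values are all hypothetical and do not come from pymer4.

```python
import math

def predict_logit(fare, group, fixed, random_intercepts,
                  use_rfx=True, type_predict="response"):
    """Predict from a random-intercept logistic model (conceptual sketch)."""
    # Fixed-effects part of the linear predictor
    eta = fixed["intercept"] + fixed["fare"] * fare
    # Optionally add the cluster-specific deviation (random intercept)
    if use_rfx:
        eta += random_intercepts.get(group, 0.0)
    if type_predict == "link":
        return eta  # log-odds
    return 1.0 / (1.0 + math.exp(-eta))  # probability

# Hypothetical estimates, not real model output
fixed = {"intercept": -1.0, "fare": 0.02}
random_intercepts = {"1": 0.8, "3": -0.5}

print(predict_logit(50, "1", fixed, random_intercepts, type_predict="link"))
print(predict_logit(50, "1", fixed, random_intercepts))
print(predict_logit(50, "1", fixed, random_intercepts, use_rfx=False))
```

With use_rfx=False every cluster receives the same population-level prediction; with use_rfx=True predictions shift by each cluster's estimated deviation.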
- pymer4.models.lmer.lmer.simulate(self, nsim: int = 1, use_rfx=True, **kwargs)#
Simulate values from the fitted model.
- Parameters:
nsim (int, optional) – Number of simulations to run. Defaults to 1
use_rfx (bool, optional) – Whether to include random effects in simulations. Defaults to True. Equivalent to re.form = NULL in R if True, re.form = NA if False
**kwargs – Additional arguments passed to simulate function
- Returns:
Simulated values with the same number of rows as the original data and columns equal to nsim
- Return type:
DataFrame
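To picture the shape of the simulated output for a binomial model, the sketch below draws Bernoulli outcomes from a list of fitted probabilities. It only illustrates the rows-by-nsim layout; the actual simulation is delegated to R, and `simulate_binary` and the probabilities are hypothetical.

```python
import random

def simulate_binary(fitted_probs, nsim=1, seed=0):
    """Draw nsim Bernoulli outcomes per observation from fitted probabilities."""
    rng = random.Random(seed)
    # One row per observation, one column per simulation
    return [[1 if rng.random() < p else 0 for _ in range(nsim)]
            for p in fitted_probs]

# Hypothetical response-scale predictions for four observations
fitted_probs = [0.1, 0.5, 0.9, 0.3]
sims = simulate_binary(fitted_probs, nsim=3)

print(f"{len(sims)} rows x {len(sims[0])} columns")
```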
- pymer4.models.base.model.vif(self)#
Calculate the variance inflation factor (VIF) and confidence interval increase factor (CI) (square root of VIF) for each predictor in the model.
- Returns:
A DataFrame containing the VIF and CI for each predictor.
- Return type:
DataFrame
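The relationship between the VIF and the CI increase factor can be shown with the textbook definition VIF = 1 / (1 - R²) for two predictors. This sketch may differ from how the value is computed for a fitted mixed model; the helper names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """With two predictors, R^2 of one on the other is the squared correlation."""
    r_squared = pearson_r(x1, x2) ** 2
    vif = 1.0 / (1.0 - r_squared)
    ci_increase = math.sqrt(vif)  # factor by which the CI widens
    return vif, ci_increase

# Hypothetical predictor values
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vif, ci_increase = vif_two_predictors(x1, x2)

print(f"VIF = {vif:.3f}, CI increase factor = {ci_increase:.3f}")
```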
Summary Methods#
Summary methods return nicely formatted outputs of the .result_* attributes of a fitted model
- pymer4.models.glmer.glmer.summary(self, pretty=True, decimals=3)#
Print a nicely formatted summary table that contains .result_fit. Uses the great_tables package, which can be exported in a variety of formats.
- Parameters:
decimals (int) – number of decimal places to round to; p-values are rounded to decimals + 1 places
- pymer4.models.base.model.summary_anova(self, decimals=3)#
Print a nicely formatted summary table that contains .result_anova. Uses the great_tables package, which can be exported in a variety of formats.
- Parameters:
decimals (int) – number of decimal places to round to; p-values are rounded to decimals + 1 places
Transformation & Factor Methods#
These methods are essential for working with categorical predictors (factors), customizing specific linear hypotheses, and transforming continuous predictors (e.g. mean-centering).
- pymer4.models.base.model.set_factors(self, factors_and_levels: str | dict | list)#
Turn 1 or more variables into factors or change the levels of existing factors. Provide either a list of column names or a dictionary where keys are column names and values are lists of levels in the requested order. Relies on the fact that rpy2 will convert pandas categorical types to R factors. Any existing factors can be seen with .show_factors().
- Parameters:
factors_and_levels (str | dict | list) – factors and their levels
- pymer4.models.base.model.unset_factors(self, factors: str | list | None = None)#
Convert factors back to their original data types (e.g. strings, integers, or floats)
- pymer4.models.base.model.show_factors(self)#
Print any current factors and their levels. The order of factor levels determines what parameter estimates represent and how post-hoc contrasts are specified.
- pymer4.models.base.model.set_contrasts(self, contrasts: dict, normalize=False)#
Change the default contrast coding scheme used by R for factors or specify a set of custom contrasts between factor levels. Unlike base R, custom contrasts should be provided in terms of a human-readable contrast matrix representing differences across factor levels. This is similar to the make.contrasts function from the gmodels package. Custom contrasts will be automatically converted to a coding matrix, which is what R expects. This allows you to specify fewer than k-1 contrasts for a factor with k levels, and we will solve for the remaining orthogonal contrasts just like R.
Note: setting contrasts will not affect the results of anova() when used with the default auto_ss_3=True
- Parameters:
contrasts (dict) – a dictionary where keys are variables that are factors and value is a string specifying the contrast type, e.g. "contr.treatment", "contr.poly", or "contr.sum" or numeric contrast codes to compare across factor levels
normalize (bool) – whether to normalize contrasts by dividing by their vector norm to put them in standard-deviation units similar to contr.poly; only applies for custom contrasts
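The normalization step described for normalize=True can be sketched in isolation. The example below shows only division by the vector norm, not the conversion from a contrast matrix to a coding matrix; `normalize_contrast` is a hypothetical helper, and the contrast compares the first level against the average of the other two.

```python
import math

def normalize_contrast(weights):
    """Divide contrast weights by their Euclidean (vector) norm."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]

# Hypothetical custom contrast for a 3-level factor:
# level 1 versus the mean of levels 2 and 3
contrast = [-1.0, 0.5, 0.5]
unit = normalize_contrast(contrast)

print([round(w, 4) for w in unit])
```

Normalizing changes the scale of the resulting estimate but not the hypothesis being tested: the weights still sum to zero and keep their relative sizes.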
- pymer4.models.base.model.show_contrasts(self)#
Show the contrasts that have been set
- pymer4.models.base.model.set_transforms(self, cols_and_transforms: dict, group=None)#
Scale numeric columns by centering and/or scaling
- Parameters:
cols_and_transforms (dict) – a dictionary where keys are column names and values are transform functions as strings, e.g. “center”, “scale”, “zscore”, “rank”
group (str; optional) – column name to group by before scaling
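The effect of the group argument can be illustrated with a standalone sketch of centering and z-scoring within groups. This is not pymer4's implementation; the helper names are hypothetical, and the use of the sample standard deviation is an assumption made for this example.

```python
from statistics import mean, stdev

def center(values):
    """Subtract the mean from each value."""
    m = mean(values)
    return [v - m for v in values]

def zscore(values):
    """Center, then divide by the sample standard deviation (assumed here)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def transform_by_group(values, groups, func):
    """Apply func separately within each group, preserving row order."""
    out = [None] * len(values)
    for g in set(groups):
        idx = [i for i, label in enumerate(groups) if label == g]
        for i, t in zip(idx, func([values[i] for i in idx])):
            out[i] = t
    return out

# Hypothetical data: one numeric column and a grouping column
values = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
groups = ["a", "a", "a", "b", "b", "b"]

print(center(values))                              # grand-mean centering
print(transform_by_group(values, groups, center))  # within-group centering
```

Grand-mean centering keeps between-group differences in the predictor, while within-group centering removes them so that only within-cluster variation remains.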
- pymer4.models.base.model.unset_transforms(self, cols=None)#
Undo the effect of calling .set_transforms()
- Parameters:
cols (str | list; optional) – column name(s) to unscale; if None, all scaled columns will be unscaled
- pymer4.models.base.model.show_transforms(self)#
Show the columns that have been scaled
Auxiliary Methods#
Helper methods for more advanced functionality and debugging
- pymer4.models.base.model.report(self)#
Generate a natural language report of the model results.
Uses R’s report package to generate a text description of the model, its parameters, and fit statistics.
- Returns:
A natural language description of the model results
- Return type:
str
- pymer4.models.base.model.show_logs(self)#
Show any captured messages and warnings from R.
Prints all messages and warnings that have been captured from R during model fitting and analysis.
- pymer4.models.base.model.clear_logs(self)#
Clear any captured messages and warnings from R.
Resets the R console message buffer to empty.