pymer4.models.glm

`pymer4.models.glm`#

Tutorial

Check out the linear regression and GLMs tutorial for usage examples

GLM#

Generalized Linear Models fit using Maximum-Likelihood-Estimation (MLE)

GLMs are useful for estimating models with non-Gaussian outcome variables, such as logistic regression for binary data and Poisson regression for count data. The outcome distribution and link function are controlled with the family and link arguments, which are passed along with a formula and data when initializing a model.

For some models, such as logistic regression, it can be helpful to use .fit(exponentiate=True) to transform estimates to the odds scale to aid interpretability. By default, the 'fitted' column in model.data and the output of model.predict() use type_predict='response', so model predictions are on the response scale, i.e. probabilities for logistic regression.

from pymer4 import load_dataset
from pymer4.models import glm

titanic = load_dataset('titanic')

# Logistic regression with logit link
log_reg = glm('survived ~ fare', family='binomial', data=titanic)

# See parameter estimates on odds scale
log_reg.fit(exponentiate=True)

# Logistic regression with probit link
probit_reg = glm('survived ~ fare', family='binomial', link='probit', data=titanic)

# Now estimates on the link scale
# which is z-score of p(survived)
probit_reg.fit()
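
To make the link, odds, and response scales mentioned above concrete, here is a minimal sketch in plain Python. It does not call pymer4; the coefficient values are hypothetical and are not estimates from the titanic data. It only illustrates how a linear predictor maps to a probability under the logit and probit links, and why exponentiating a logistic slope gives an odds ratio.

```python
import math
from statistics import NormalDist


def inv_logit(eta):
    """Inverse of the logit link: log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Inverse of the probit link: z-score -> probability."""
    return NormalDist().cdf(eta)


# Hypothetical coefficients (assumed for illustration only)
intercept, slope = -0.9, 0.015
fare = 50.0

# Linear predictor on the link scale
eta = intercept + slope * fare

# Response scale: what type_predict='response' refers to
p_logit = inv_logit(eta)
p_probit = inv_probit(eta)

# Odds scale: exponentiating a logistic slope gives the multiplicative
# change in the odds for a one-unit increase in the predictor
odds_ratio = math.exp(slope)
odds_at_fare = p_logit / (1.0 - p_logit)
p_next = inv_logit(intercept + slope * (fare + 1.0))
odds_at_next = p_next / (1.0 - p_next)
```

In this sketch odds_at_next / odds_at_fare equals odds_ratio, which is the interpretation that exponentiated logistic estimates support.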

class pymer4.models.glm.glm(formula, data, family='gaussian', link='default', **kwargs)#

Generalized linear model estimated via MLE. Inherits from lm.

This class implements generalized linear models using Maximum Likelihood Estimation. It extends the base linear model class to handle different response distributions and link functions.

Parameters:

formula (str) – R-style formula specifying the model
data (DataFrame) – Input data for the model
family (str) – Response distribution family (e.g. “gaussian”, “binomial”). Defaults to “gaussian”
link (str) – Link function to use. Defaults to “default” which uses the canonical link for each family

Estimation Methods#

Estimation methods are the methods you will use most often: they estimate model parameters and compute omnibus tests, marginal estimates and comparisons, predictions, and simulations.

pymer4.models.glm.glm.fit(self, exponentiate=False, summary=False, conf_method='wald', nboot=1000, save_boots=True, type_predict='response', parallel='multicore', ncpus=4, conf_type='perc', **kwargs)#

Fit a GLM using glm() in R.

Parameters:

exponentiate (bool, optional) – Whether to exponentiate the parameter estimates, e.g. to the odds scale for logistic regression. Defaults to False
summary (bool, optional) – Whether to return the model summary. Defaults to False
conf_method (str, optional) – Method for confidence interval calculation, either “wald” or “boot” (bootstrap CIs). Defaults to “wald”
nboot (int, optional) – Number of bootstrap samples. Defaults to 1000
save_boots (bool, optional) – Whether to save bootstrap samples. Defaults to True
type_predict (str, optional) – Type of prediction to compute (“response” or “link”). Defaults to “response”
parallel (str, optional) – Parallelization for bootstrapping. Defaults to “multicore”
ncpus (int, optional) – Number of cores to use for parallelization. Defaults to 4
conf_type (str, optional) – Type of confidence interval to calculate. Defaults to “perc”
**kwargs – Additional arguments passed to the R GLM function

Returns:

Model summary if summary=True

Return type:

GT, optional
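
The difference between Wald intervals and percentile bootstrap intervals can be sketched in plain Python. This is an illustration of the two general techniques applied to a sample mean, not pymer4's implementation, which delegates fitting to R. The helper names wald_ci and percentile_ci are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def wald_ci(estimate, std_error, level=0.95):
    """Symmetric interval: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


def percentile_ci(boot_estimates, level=0.95):
    """Interval from empirical quantiles of bootstrap estimates.

    Uses a simple order-statistic approximation to the quantiles.
    """
    ordered = sorted(boot_estimates)
    alpha = (1 - level) / 2
    lo_idx = int(alpha * len(ordered))
    hi_idx = int((1 - alpha) * len(ordered)) - 1
    return ordered[lo_idx], ordered[hi_idx]


random.seed(0)
sample = [random.gauss(2.0, 1.0) for _ in range(200)]

# Point estimate and its standard error
est = mean(sample)
se = stdev(sample) / math.sqrt(len(sample))

# Resample the data with replacement and re-estimate each time
nboot = 1000
boots = [mean(random.choices(sample, k=len(sample))) for _ in range(nboot)]

wald = wald_ci(est, se)
perc = percentile_ci(boots)
```

The Wald interval needs only the estimate and its standard error, while the percentile interval needs nboot refits, which is why bootstrapping is the case where options such as parallel and ncpus matter.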

pymer4.models.base.model.anova(self, auto_ss_3=True, summary=False, jointtest_kwargs={}, anova_kwargs={})#

Calculate a Type-III ANOVA table for the model using joint_tests() in R.

Parameters:

summary (bool) – whether to return the ANOVA summary. Defaults to False
auto_ss_3 (bool) – whether to automatically use balanced contrasts when calculating the result via joint_tests(). When False, uses the contrasts specified with set_contrasts() (which default to “contr.treatment”) together with R’s anova() function. Defaults to True
jointtest_kwargs (dict) – additional arguments to pass to joint_tests()
anova_kwargs (dict) – additional arguments to pass to anova()

pymer4.models.base.model.emmeans(self, marginal_var: str | list, by: str | list | None = None, p_adjust='sidak', type='response', normalize=False, apply_transforms=True, **kwargs)#

Compute marginal means/trends and, optionally, contrasts between those means/trends at different factor levels. marginal_var is the predictor whose levels will have means or trends computed. by is an optional factor predictor for which separate means or trends are calculated. If contrasts is provided, the contrasts are computed with respect to the calculated marginal means or trends.

Parameters:

marginal_var (str | list) – name of predictor to compute means or contrasts for
by (str | list | None, optional) – additional factor predictors to compute separate means or trends for. Defaults to None.
contrasts (str | 'pairwise' | 'poly' | dict | None, optional) – how to specify comparison within marginal_var. Defaults to None.
interaction (str | dict | None, optional) – how to specify any contrasts between levels of by. Defaults to None.
normalize (bool) – normalize numerical contrasts to generate orthogonal polynomials similar to R; preferable for contrasts across more than 2 factor levels; Default False
type (str) – compute marginal means and contrasts on the ‘response’ or ‘link’ scale; Default ‘response’ (e.g. probabilities for logistic regression)

Returns:

Table of marginal means or contrasts

Return type:

DataFrame
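
The default p_adjust='sidak' refers to the Šidák correction for multiple comparisons. As a rough sketch of the standard formula only (how the correction is applied to a particular family of contrasts is handled in R and is not shown here), the adjusted p-value for m tests is 1 - (1 - p)^m. The function name sidak_adjust is hypothetical.

```python
def sidak_adjust(p_values):
    """Apply the Sidak correction to a family of raw p-values."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


# Hypothetical raw p-values from three pairwise comparisons
raw = [0.01, 0.04, 0.20]
adjusted = sidak_adjust(raw)
```

Each adjusted value is at least as large as its raw counterpart and never exceeds 1.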

pymer4.models.base.model.empredict(self, at: dict, apply_transforms=True, type='response', **kwargs)#

Compute marginal predictions at arbitrary levels of predictors by passing in a dictionary of predictor names and values. If the string ‘data’ is used for a predictor, then all observed values for that predictor will be used. If a predictor is omitted, then its marginal value will be used (e.g. mean for continuous predictors, grand-mean for factors).

Parameters:

at (dict) – Dictionary mapping predictor names to values at which to compute predictions. Use “data” as the value to use all observed values for that predictor.
apply_transforms (bool, optional) – Whether to apply any transformations (center/scale/zscore) that were applied to predictors. Doesn’t currently handle .over() transforms. Defaults to True.

Returns:

A DataFrame containing the predicted values and their uncertainty.

Return type:

predictions (DataFrame)

Examples

>>> # Assuming model is y ~ x * group and x has been mean-centered
>>> model.empredict({'x': [1, 2, 3]})  # Predictions at x=1,2,3 for each level of group
>>> model.empredict({'x': [1, 2, 3], 'group': 'data'})  # Predictions at x=1,2,3 using the observed group level of each observation
>>> model.empredict({'x': [-1, 0, 1]}, apply_transforms=False)  # Pass-in values on the mean-centered scale

pymer4.models.glm.glm.predict(self, data: DataFrame, type_predict='response', **kwargs)#

Make predictions from the model accounting for the link function.

Parameters:

data (DataFrame) – Data to make predictions on
type_predict (str, optional) – Type of prediction to compute (“response” or “link”). Defaults to “response”
**kwargs – Additional keyword arguments passed to predict function

Returns:

Predicted values

Return type:

ndarray

pymer4.models.base.model.simulate(self, nsim: int = 1, **kwargs)#

Simulate values from the model

Parameters:

nsim (int) – number of simulations to run

Returns:

simulated values with the same number of rows as the original data and columns equal to nsim

Return type:

simulations (DataFrame)
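
As a conceptual sketch of what simulating from a fitted model means, the plain-Python example below draws new binary outcomes from a set of fitted probabilities, producing one column per simulation with one row per observation. It is not pymer4's implementation; the function simulate_binomial, the column names, and the fitted probabilities are all hypothetical.

```python
import random


def simulate_binomial(fitted_probs, nsim=1, seed=None):
    """Draw nsim columns of 0/1 outcomes from fitted probabilities."""
    rng = random.Random(seed)
    return {
        f"sim_{i + 1}": [1 if rng.random() < p else 0 for p in fitted_probs]
        for i in range(nsim)
    }


# Hypothetical fitted probabilities for four observations
fitted = [0.1, 0.5, 0.9, 0.3]
sims = simulate_binomial(fitted, nsim=3, seed=42)
```

The result has as many rows per column as there are observations and as many columns as nsim, matching the shape described above.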

pymer4.models.base.model.vif(self)#

Calculate the variance inflation factor (VIF) and confidence interval increase factor (CI) (square root of VIF) for each predictor in the model.

Returns:

A DataFrame containing the VIF and CI for each predictor.

Return type:

DataFrame
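
For intuition, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others, and the CI increase factor is its square root. The sketch below covers only the special case of two predictors, where R² is the squared Pearson correlation. The data and helper names are hypothetical, and this is not the code pymer4 runs.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF and CI increase factor when the model has exactly two predictors."""
    r_squared = pearson_r(x1, x2) ** 2
    vif = 1.0 / (1.0 - r_squared)
    return vif, math.sqrt(vif)


# Hypothetical correlated predictors (r = 0.8)
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
vif, ci_factor = vif_two_predictors(x1, x2)
```

Here a correlation of 0.8 gives a VIF of about 2.78, meaning confidence intervals are about 1.67 times wider than they would be with uncorrelated predictors.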

Summary Methods#

Summary methods return nicely formatted outputs of the .result_* attributes of a fitted model

pymer4.models.glm.glm.summary(self, pretty=True, decimals=3)#

Print a nicely formatted summary table that contains .result_fit. Uses the great_tables package, so the table can be exported in a variety of formats.

Parameters:

decimals (int) – number of decimal places to round to; p-values are rounded to decimals + 1 places

pymer4.models.base.model.summary_anova(self, decimals=3)#

Print a nicely formatted summary table that contains .result_anova. Uses the great_tables package, so the table can be exported in a variety of formats.

Parameters:

decimals (int) – number of decimal places to round to; p-values are rounded to decimals + 1 places

Transformation & Factor Methods#

These methods are essential for working with categorical predictors (factors), customizing specific linear hypotheses, and transforming continuous predictors (e.g. mean-centering).

pymer4.models.base.model.set_factors(self, factors_and_levels: str | dict | list)#

Turn one or more variables into factors or change the levels of existing factors. Provide either a list of column names or a dictionary where keys are column names and values are lists of levels in the requested order. Relies on the fact that rpy2 will convert pandas categorical types to R factors.

Any existing factors can be seen with .show_factors().

Parameters:

factors_and_levels (str | dict | list) – factors and their levels

pymer4.models.base.model.unset_factors(self, factors: str | list | None = None)#

Convert factors back to their original data types (e.g. strings, integers, or floats)

pymer4.models.base.model.show_factors(self)#

Print any current factors and their levels. The order of factor levels determines what parameter estimates represent and how post-hoc contrasts are specified.

pymer4.models.base.model.set_contrasts(self, contrasts: dict, normalize=False)#

Change the default contrast coding scheme used by R for factors or specify a set of custom contrasts between factor levels. Unlike base R, custom contrasts should be provided in terms of a human-readable contrast matrix representing differences across factor levels. This is similar to the make.contrasts function from the gmodels package. Custom contrasts are automatically converted to a coding matrix, which is what R expects. This allows you to specify fewer than k-1 contrasts for a factor with k levels, and the remaining orthogonal contrasts are solved for, just like in R.

Note: setting contrasts will not affect the results of anova() when used with the default auto_ss_3=True

Parameters:

contrasts (dict) – a dictionary where keys are variables that are factors and value is a string specifying the contrast type, e.g. "contr.treatment", "contr.poly", or "contr.sum" or numeric contrast codes to compare across factor levels
normalize (bool) – whether to normalize contrasts by dividing by their vector norm to put them in standard-deviation units similar to contr.poly; only applies for custom contrasts
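
The effect of normalize=True on a custom contrast can be sketched as dividing the contrast weights by their vector norm. The plain-Python example below is an illustration of that arithmetic only, under the assumption that the norm is the Euclidean norm; the helper normalize_contrast is hypothetical and the conversion from contrast matrix to coding matrix is not shown.

```python
import math


def normalize_contrast(weights):
    """Divide contrast weights by their Euclidean norm."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]


# A linear contrast across three ordered factor levels
linear = [-1, 0, 1]
normalized = normalize_contrast(linear)
```

For three levels the normalized weights are roughly -0.707, 0, and 0.707, which are the same values as the linear column that R's contr.poly produces for a three-level factor.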

pymer4.models.base.model.show_contrasts(self)#

Show the contrasts that have been set

pymer4.models.base.model.set_transforms(self, cols_and_transforms: dict, group=None)#

Scale numeric columns by centering and/or scaling

Parameters:

cols_and_transforms (dict) – a dictionary where keys are column names and values are transform functions as strings, e.g. “center”, “scale”, “zscore”, “rank”
group (str; optional) – column name to group by before scaling
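
The sketch below shows, in plain Python, what the “center”, “scale”, and “zscore” transforms are assumed to mean here: subtracting the mean, dividing by the standard deviation, and doing both. These definitions, the use of the sample standard deviation, and the helper transform_by_group are assumptions for illustration rather than pymer4's own code; “rank” is not shown.

```python
from statistics import mean, stdev


def center(values):
    """Subtract the mean."""
    m = mean(values)
    return [v - m for v in values]


def scale(values):
    """Divide by the sample standard deviation (assumed definition)."""
    s = stdev(values)
    return [v / s for v in values]


def zscore(values):
    """Center, then divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def transform_by_group(values, groups, func):
    """Apply func separately within each group, preserving row order."""
    out = [None] * len(values)
    for label in set(groups):
        idx = [j for j, g in enumerate(groups) if g == label]
        transformed = func([values[j] for j in idx])
        for j, new_value in zip(idx, transformed):
            out[j] = new_value
    return out


# Hypothetical column and grouping variable
x = [2.0, 4.0, 6.0, 10.0]
g = ["a", "a", "b", "b"]

centered = center(x)                               # grand-mean centering
centered_within = transform_by_group(x, g, center)  # centering within group
```

Centering within group removes each group's own mean, so centered_within is [-1.0, 1.0, -2.0, 2.0] here, whereas grand-mean centering subtracts 5.5 from every value.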

pymer4.models.base.model.unset_transforms(self, cols=None)#

Undo the effect of calling .set_transforms()

Parameters:

cols (str | list; optional) – column name(s) to unscale; if None, all scaled columns will be unscaled

pymer4.models.base.model.show_transforms(self)#

Show the columns that have been scaled

Auxiliary Methods#

Helper methods for more advanced functionality and debugging

pymer4.models.base.model.report(self)#

Generate a natural language report of the model results.

Uses R’s report package to generate a text description of the model, its parameters, and fit statistics.

Returns:

A natural language description of the model results

Return type:

str

pymer4.models.base.model.show_logs(self)#

Show any captured messages and warnings from R.

Prints all messages and warnings that have been captured from R during model fitting and analysis.

pymer4.models.base.model.clear_logs(self)#

Clear any captured messages and warnings from R.

Resets the R console message buffer to empty.
