pymer4.tidystats#

The tidystats module contains all the functions that support the features of pymer4’s models. It can also be used on its own as a functional alternative to the object-oriented approach of pymer4.models:

import pymer4.tidystats as ts
from pymer4 import load_dataset

df = load_dataset('sleep')
model = ts.lm('Reaction ~ Days', data=df)

# Like calling coef() in R
ts.coef(model)

# Like calling tidy() in R
ts.tidy(model)

base#

Wraps functionality from the base R library

pymer4.tidystats.base.summary(arg)[source]#

Produce a summary of the results. Currently unused.

Parameters:

arg (object) – The object to summarize

Returns:

An R-type of the summarized object

Return type:

object

broom#

Wraps functionality from the broom and broom.mixed libraries

pymer4.tidystats.broom.augment(model, /, **kwargs)[source]#

Add observation-level information (such as fitted values and residuals) to the dataset. Uses broom.mixed:::augment.merMod for linear-mixed-models, broom::augment.lm for linear models, and broom::augment.glm for generalized linear models.

pymer4.tidystats.broom.glance(model, /, **kwargs)[source]#

Report information about the entire model. Uses broom.mixed:::glance.merMod for linear-mixed-models, broom::glance.lm for linear models, and broom::glance.glm for generalized linear models.

pymer4.tidystats.broom.tidy(model, **kwargs)[source]#

Summarize information about model components. Uses broom.mixed::tidy.merMod for linear-mixed-models, broom::tidy.lm for linear models, and broom::tidy.glm for generalized linear models.

easystats#

Wraps functionality from various sub-libraries in the easystats ecosystem

pymer4.tidystats.easystats.bootstrap_model(r_model, nboot=1000, parallel='multicore', n_cpus=4, **kwargs)[source]#

Generate bootstrap samples for model fixed effects coefficients using the implementation in parameters::bootstrap_model

Parameters:
  • r_model (R model) – lm, glm, lmer, or glmer model

  • nboot (int, optional) – Number of bootstrap samples. Defaults to 1000.

  • parallel (str, optional) – Parallelization method. Defaults to “multicore”.

  • n_cpus (int, optional) – Number of CPUs to use. Defaults to 4.

pymer4.tidystats.easystats.get_fixed_params(r_model)[source]#

Get the fixed-effects parameters for a model using the implementation in insight::get_parameters

Parameters:

r_model (R model) – lm, glm, lmer, or glmer model

pymer4.tidystats.easystats.get_param_names(r_model)[source]#

Get the parameter names for a model using the implementation in insight::find_parameters

Parameters:

r_model (R model) – lm, glm, lmer, or glmer model

Returns:

Fixed and random parameter names

Return type:

tuple

pymer4.tidystats.easystats.is_converged(r_model)[source]#

Check if a model is converged using the implementation in insight::is_converged

Parameters:

r_model (R model) – lm, glm, lmer, or glmer model

Returns:

Whether the model converged and the convergence message

Return type:

tuple

pymer4.tidystats.easystats.is_mixed_model(r_model)[source]#

Check if a model is a mixed model using the implementation in insight::is_mixed_model

Parameters:

r_model (R model) – lm, glm, lmer, or glmer model

pymer4.tidystats.easystats.model_icc(r_model, by_group=True, **kwargs)[source]#

Calculate the intraclass correlation coefficient (ICC) for a model using the implementation in performance::icc

Parameters:
  • r_model (R model) – lm, glm, lmer, or glmer model

  • by_group (bool, optional) – Whether to calculate the ICC for each group. Defaults to True.

Returns:

Table of ICCs

Return type:

DataFrame
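For a simple random-intercept model, the ICC is the share of total variance attributable to the grouping factor. The sketch below shows only that ratio; the function name and the variance values are hypothetical, and performance::icc supports more model types and variants than this illustrates.

```python
def icc_sketch(var_group, var_residual):
    """Share of total variance attributable to the grouping factor."""
    return var_group / (var_group + var_residual)


# Hypothetical variance components from a random-intercept model
icc_value = icc_sketch(var_group=1378.2, var_residual=960.5)
```

A value near 1 means most of the variability lies between groups; a value near 0 means most of it lies within groups.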

pymer4.tidystats.easystats.model_params(r_model, **kwargs)[source]#

Get model parameters using the implementation in parameters::model_parameters and standardize names using the implementation in insight::standardize_names

Parameters:
  • r_model (R model) – lm, glm, lmer, or glmer model

  • effects (str, optional) – Whether to include fixed or random effects. Defaults to “fixed”.

  • exponentiate (bool, optional) – Whether to exponentiate the parameters. Defaults to False.

pymer4.tidystats.easystats.model_performance(r_model, **kwargs)[source]#

Calculate model performance using the implementation in performance::model_performance

Parameters:

r_model (R model) – lm, glm, lmer, or glmer model

pymer4.tidystats.easystats.model_performance_cv(r_model, method='k_fold', stack=False, **kwargs)[source]#

Calculate cross-validated model performance using the implementation in performance::performance_cv

Parameters:
  • r_model (R model) – lm, glm, lmer, or glmer model

  • method (str, optional) – Method for cross-validation. Defaults to “k_fold”.

  • stack (bool, optional) – Whether to stack the results. Defaults to False.
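As a rough illustration of what the “k_fold” method refers to, the sketch below splits row indices into k folds and yields train/test pairs. It is a plain-Python illustration with a hypothetical helper name, not the splitting logic used by performance::performance_cv.

```python
import random


def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i, test_idx in enumerate(folds):
        # Every fold except the held-out one forms the training set
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(n_rows=10, k=5))
```

Each row lands in exactly one test fold, so every observation is predicted once by a model that never saw it.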

pymer4.tidystats.easystats.report(model, **kwargs)[source]#

Generate a report for a model using the implementation in easystats

Parameters:

model (R model) – lm, glm, lmer, or glmer model

emmeans_lib#

Wraps functionality from the emmeans library

pymer4.tidystats.emmeans_lib.emmeans(model, specs, contrasts: str | dict | None = None, **kwargs)[source]#

This function combines functionality from emmeans::emmeans and emmeans::contrast, by first generating a grid and then optionally computing contrasts over it if contrasts is not None.

Parameters:
  • model (R model) – lm, glm, lmer model

  • specs (str) – name of predictor

  • by (str/list) – additional predictors to subset by

  • contrasts (str | 'pairwise' | 'poly' | dict | None, optional) – how to specify comparisons within specs. Defaults to None.

Returns:

Table of marginal effects and/or means

Return type:

DataFrame
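The two steps described above (build a grid of predictions, then optionally contrast the resulting means) can be sketched in plain Python. The cell predictions and helper names here are hypothetical, and the sketch uses equal-weight averaging only; it does not call emmeans and omits standard errors and tests.

```python
from itertools import combinations
from statistics import mean

# Hypothetical model predictions for every cell of a 3 x 2 design
cell_preds = {
    ("low", "a"): 10.0, ("low", "b"): 14.0,
    ("mid", "a"): 12.0, ("mid", "b"): 18.0,
    ("high", "a"): 20.0, ("high", "b"): 22.0,
}


def marginal_means(preds, position):
    """Average cell predictions over all factors except the one at `position`."""
    levels = {}
    for cell, value in preds.items():
        levels.setdefault(cell[position], []).append(value)
    return {level: mean(values) for level, values in levels.items()}


def pairwise(means):
    """Difference between every pair of marginal means."""
    return {f"{a} - {b}": means[a] - means[b] for a, b in combinations(means, 2)}


dose_means = marginal_means(cell_preds, position=0)
dose_contrasts = pairwise(dose_means)
```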

pymer4.tidystats.emmeans_lib.emtrends(model, contrasts: str | dict | None = None, **kwargs)[source]#

This function combines functionality from emmeans::emtrends and emmeans::contrast, by first generating a grid and then optionally computing contrasts over it if contrasts is not None.

Parameters:
  • model (R model) – lm, glm, lmer model

  • specs (str) – name of predictor

  • by (str/list) – additional predictors to subset by

  • contrasts (str | 'pairwise' | 'poly' | dict | None, optional) – how to specify comparisons within specs. Defaults to None.

Returns:

Table of marginal effects and/or means

Return type:

DataFrame

pymer4.tidystats.emmeans_lib.joint_tests(model, **kwargs)[source]#

Compute ANOVA-style F-tests using emmeans::joint_tests

Parameters:

model (R model) – lm, glm, lmer model

Returns:

F-statistics table of main effects/interactions

Return type:

DataFrame

pymer4.tidystats.emmeans_lib.ref_grid(model, *args, **kwargs)[source]#

Create a reference grid of model predictions. Uses emmeans::ref_grid and emmeans::summary_emmGrid.

lmerTest#

Wraps functionality from the lme4 and lmerTest libraries

pymer4.tidystats.lmerTest.bootMer(model, nsim=1000, parallel='multicore', ncpus=4, conf_level=0.95, conf_method='perc', exponentiate=False, save_boots=True, **kwargs)[source]#

Bootstrap model parameters using bootMer. Extracts fixed effects using fixef() and random effects using broom.mixed::tidy().

Parameters:
  • model (R model) – lmer or glmer model

  • nsim (int, optional) – Number of bootstrap samples. Defaults to 1000.

  • parallel (str, optional) – Parallelization method. Defaults to “multicore”.

  • ncpus (int, optional) – Number of cores to use. Defaults to 4.

  • conf_level (float, optional) – Confidence level. Defaults to 0.95.

  • conf_method (str, optional) – Confidence interval method. Defaults to “perc”.

  • exponentiate (bool, optional) – Whether to exponentiate the results. Defaults to False.
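To show how a percentile (“perc”) interval is read off a set of bootstrap estimates, here is a minimal plain-Python sketch. It resamples observations and bootstraps a mean, whereas lme4's bootMer by default simulates new responses from the fitted model, so treat the helper name, data, and resampling scheme as illustrative assumptions.

```python
import random
import statistics


def percentile_ci(values, stat=statistics.mean, nsim=1000, conf_level=0.95, seed=0):
    """Percentile bootstrap interval for `stat` (illustrative helper)."""
    rng = random.Random(seed)
    n = len(values)
    # Recompute the statistic on nsim resamples drawn with replacement
    boots = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)]) for _ in range(nsim)
    )
    alpha = 1 - conf_level
    lower_idx = int((alpha / 2) * (nsim - 1))
    upper_idx = int((1 - alpha / 2) * (nsim - 1))
    return boots[lower_idx], boots[upper_idx]


data = [249.6, 258.7, 250.8, 321.4, 356.9, 414.7, 382.2, 290.1, 430.6, 466.4]
lower, upper = percentile_ci(data)
```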

pymer4.tidystats.lmerTest.fixef(model, *args, **kwargs) → ndarray[source]#

Extract model fixed-effects using fixef

pymer4.tidystats.lmerTest.glmer(*args, **kwargs)[source]#

Fit a generalized linear-mixed-model using glmer

Parameters:
  • formula (str) – model formula

  • family (str) – glm family

  • data (pl.DataFrame) – polars dataframe

Returns:

R model object

Return type:

model (R RS4)

pymer4.tidystats.lmerTest.is_singular(model)[source]#

Check if a model is singular using the implementation in lmerTest

Parameters:

model (R model) – lmer or glmer model

pymer4.tidystats.lmerTest.lmer(*args, **kwargs)[source]#

Fit a linear-mixed-model using lmer and get inferential stats using lmerTest

Parameters:
  • formula (str) – model formula

  • data (pl.DataFrame) – polars dataframe

Returns:

R model object

Return type:

model (R RS4)

pymer4.tidystats.lmerTest.ranef(model, *args, **kwargs) → ndarray[source]#

Extract model random-effects/conditional-modes using ranef

multimodel#

Functions that switch their behavior based on whether they receive an lm, glm, lmer, or glmer model as input, mimicking “function overloading” in R
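The dispatch-on-model-type idea can be sketched in plain Python with functools.singledispatch. This only illustrates the pattern: the LinearModel and MixedModel classes and the coef_sketch function are hypothetical stand-ins, not pymer4 objects, and pymer4's own switching logic may work differently.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class LinearModel:  # hypothetical stand-in for an lm/glm fit
    coefs: list


@dataclass
class MixedModel:  # hypothetical stand-in for an lmer/glmer fit
    fixed: list
    by_group: dict


@singledispatch
def coef_sketch(model):
    raise TypeError(f"unsupported model type: {type(model).__name__}")


@coef_sketch.register
def _(model: LinearModel):
    # Fixed-effects-only models return a flat list of coefficients
    return model.coefs


@coef_sketch.register
def _(model: MixedModel):
    # Mixed models return per-group coefficients: fixed effect + group offset
    return {
        group: [f + o for f, o in zip(model.fixed, offsets)]
        for group, offsets in model.by_group.items()
    }


lm_result = coef_sketch(LinearModel(coefs=[250.0, 10.0]))
lmm_result = coef_sketch(
    MixedModel(fixed=[250.0, 10.0], by_group={"s1": [5.0, -1.0], "s2": [-5.0, 1.0]})
)
```

The caller uses one function name throughout, and the return shape depends on the model type, which mirrors how coef below returns an array for lm/glm and a DataFrame for lmer/glmer.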

pymer4.tidystats.multimodel.boot(data, model, formula, R, family=None, link=None, conf_method='perc', conf_level=0.95, return_boots=False, **kwargs)[source]#

NOTE: Experimental; may not reliably handle glm models. Currently unused.

Generate bootstrapped confidence intervals for a model using boot::boot and broom::tidy.boot, or for lme4 models using lme4::confint.merMod

Parameters:
  • data (DataFrame) – polars DataFrame to resample

  • model (R model) – model object

  • formula (str) – model formula

  • R (int) – number of bootstrap samples

  • family (str, optional) – family for glm models. Defaults to None.

  • conf_method (str, optional) – how to calculate intervals: “perc”, “bca”, “basic”, “norm”. Defaults to “perc”.

  • conf_level (float, optional) – confidence level for the intervals. Defaults to 0.95.

Returns:

bootstrap results

Return type:

summary (DataFrame)

pymer4.tidystats.multimodel.coef(model, *args, **kwargs)[source]#

Extract coefficients from lm, glm, lmer or glmer models. Uses lme4::coef.merMod for linear-mixed-models and stats::coef.lm for linear models.

Parameters:

model (R ListVector or RS4) – R model

Returns:

numpy array of coefficients (lm and glm) or polars DataFrame of BLUPs (lmer and glmer)

Return type:

coefficients (ndarray or DataFrame)

pymer4.tidystats.multimodel.confint(model, *args, as_df=True, **kwargs)[source]#

Compute confidence intervals, including via bootstrapping, using stats::confint or lme4::confint.merMod

pymer4.tidystats.multimodel.predict(model, *args, **kwargs)[source]#

Generate predictions from a model given existing or new data. Uses lme4::predict.merMod for linear-mixed-models, stats::predict.lm for linear models, and stats::predict.glm for generalized linear models.

Parameters:

model (R ListVector or RS4) – R model

Returns:

numpy array of predictions same length as the model’s data or input data

Return type:

predictions (ndarray)

pymer4.tidystats.multimodel.simulate(model, *args, **kwargs)[source]#

Simulate a new dataset from a model. Uses lme4::simulate.merMod for linear-mixed-models, stats::simulate.lm for linear models, and stats::simulate.glm for generalized linear models.

Parameters:

model (R ListVector or RS4) – R model

Returns:

polars DataFrame with number of columns = nsim (1 by default)

Return type:

dataset (DataFrame)

stats#

Wraps functionality from the stats library

pymer4.tidystats.stats.anova(*args, **kwargs)[source]#

Compare one or more models using stats::anova.glm (https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/anova.glm).

Can also calculate stats::anova from a fitted model, but prefer ts.joint_tests from emmeans to ensure balanced Type-III SS inferences

pymer4.tidystats.stats.glm(*args, **kwargs)[source]#

Fit a generalized linear model using stats::glm

Parameters:
  • formula (str) – model formula

  • family (str) – glm family

  • data (pl.DataFrame) – polars dataframe

Returns:

R model object

Return type:

model (R ListVector)

pymer4.tidystats.stats.lm(*args, **kwargs)[source]#

Fit a linear-model using stats::lm

Parameters:
  • formula (str) – model formula

  • data (pl.DataFrame) – polars dataframe

Returns:

R model object

Return type:

model (R ListVector)

pymer4.tidystats.stats.model_matrix(model, unique=True)[source]#

Extract model design matrix

Parameters:
  • model (R model) – lm, glm, lmer, or glmer model

  • unique (bool, optional) – controls whether the returned dataframe is the size of the model’s data. The signature default is True.

pymer4.tidystats.stats.resid(model, *args, **kwargs)[source]#

Extract model residuals

tibble#

Wraps functionality from the tibble library

pymer4.tidystats.tibble.as_tibble(*args, **kwargs)[source]#

Coerce input to a tibble

Returns:

polars DataFrame

Return type:

dataframe (DataFrame)

bridge#

This is a special module that helps with converting between R and Python datatypes. It’s particularly useful if you want to try to add additional features to pymer4

pymer4.tidystats.bridge.R2numpy(rarr)[source]#

Local conversion of R array to numpy as recommended by rpy2

pymer4.tidystats.bridge.R2polars(rdf)[source]#

Local conversion of R dataframe to polars as recommended by rpy2

pymer4.tidystats.bridge.con2R(arr)[source]#

Convert human-readable contrasts into a form that R requires. Works like the make.contrasts() function from the gmodels package, in that it will auto-solve for the remaining orthogonal k-1 contrasts if fewer than k-1 contrasts are specified.

Parameters:

arr (np.ndarray) – 1d or 2d numpy array with each row reflecting a unique contrast and each column a factor level

Returns:

A 2d numpy array useable with the contrasts argument of R models
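The “auto-solve” step can be pictured as completing a partial set of contrasts with Gram-Schmidt orthogonalization. The sketch below is a plain-Python illustration under that assumption; the helper name is hypothetical, it covers only the completion step, and it is not the algorithm used by con2R or gmodels.

```python
def complete_contrasts(contrasts, tol=1e-10):
    """Extend `contrasts` (rows) to k-1 rows orthogonal to each other and the intercept."""
    k = len(contrasts[0])

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Start from the constant (intercept) vector plus the user-supplied rows,
    # which are assumed to already be mutually orthogonal and to sum to zero
    basis = [[1.0] * k] + [[float(x) for x in row] for row in contrasts]

    # Gram-Schmidt: strip from each unit vector whatever the basis already spans
    for i in range(k):
        if len(basis) == k:
            break
        candidate = [1.0 if j == i else 0.0 for j in range(k)]
        for b in basis:
            scale = dot(candidate, b) / dot(b, b)
            candidate = [c - scale * x for c, x in zip(candidate, b)]
        if dot(candidate, candidate) > tol:
            basis.append(candidate)

    return basis[1:]  # drop the intercept row


# One contrast over three levels; the second is solved for
full = complete_contrasts([[-1, 1, 0]])
```

Here the added row is proportional to (1, 1, -2): it compares the third level against the average of the first two.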

pymer4.tidystats.bridge.convert_argkwarg_dataframe(arg)[source]#

Convert args/kwargs that are Python DataFrames to proper R type(s)

pymer4.tidystats.bridge.convert_argkwarg_dict(arg)[source]#

Convert args/kwargs that are Python dicts to proper R type(s)

pymer4.tidystats.bridge.convert_argkwarg_list(arg)[source]#

Convert args/kwargs that are Python lists to proper R type(s)

pymer4.tidystats.bridge.convert_argkwarg_model(arg)[source]#

Convert arg/kwargs that are pymer4 model objects to access their r_model attribute

pymer4.tidystats.bridge.convert_argkwarg_none(arg)[source]#

Convert args/kwargs that are Python None to proper R type(s)

pymer4.tidystats.bridge.ensure_py_output(func)[source]#

Decorator that converts R outputs to Python equivalents. Currently this includes:

  • R FloatVector -> numpy array

  • R StrVector -> list

  • R dataframe/tibble -> polars dataframe

  • R ListVector of Dataframes -> list of polars dataframes
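The general shape of such an output-converting decorator might look like the following. FakeRVector is a hypothetical stand-in for an rpy2 vector so the sketch runs without R, and only one of the conversions listed above is shown; the real decorator's internals are not reproduced here.

```python
from functools import wraps


class FakeRVector(tuple):  # hypothetical stand-in for an rpy2 StrVector
    pass


def ensure_list_output(func):
    """Convert a FakeRVector return value into a plain Python list."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        if isinstance(result, FakeRVector):
            return list(result)
        return result  # anything else passes through unchanged
    return wrapper


@ensure_list_output
def fake_r_call():
    return FakeRVector(("(Intercept)", "Days"))


names = fake_r_call()
```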

pymer4.tidystats.bridge.ensure_r_input(func)[source]#

Decorator that converts function arguments that are Python types into corresponding R types. Currently this includes:

  • polars DataFrames

  • python lists

  • numpy arrays

  • python dictionaries

  • python None types

  • pymer4 model objects
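A decorator of this kind can be sketched as a type-keyed lookup applied to every positional and keyword argument. The converters below are hypothetical placeholders (tuples and a "NULL" string rather than real R objects) so the example runs without rpy2; they are not the conversions pymer4 performs.

```python
from functools import wraps

# Hypothetical converters keyed by Python type; real code would build R objects
CONVERTERS = {
    list: tuple,
    dict: lambda d: tuple(sorted(d.items())),
    type(None): lambda _: "NULL",
}


def convert(value):
    converter = CONVERTERS.get(type(value))
    return converter(value) if converter else value


def ensure_converted_input(func):
    """Run every argument through `convert` before calling `func`."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        new_args = [convert(a) for a in args]
        new_kwargs = {k: convert(v) for k, v in kwargs.items()}
        return func(*new_args, **new_kwargs)
    return wrapper


@ensure_converted_input
def echo(*args, **kwargs):
    return args, kwargs


received_args, received_kwargs = echo([1, 2], weights=None, n=3)
```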

pymer4.tidystats.bridge.numpy2R(arr)[source]#

Local conversion of numpy array to R array as recommended by rpy2

pymer4.tidystats.bridge.polars2R(df)[source]#

Local conversion of polars dataframe to R dataframe as recommended by rpy2

pymer4.tidystats.bridge.sanitize_polars_columns(result)[source]#

Clean up polars columns using auxiliary functions

pymer4.tidystats.bridge.to_dict(listVector)[source]#

Recursively convert an R ListVector into a Python dict with all Python types. Ignores R ‘call’ and ‘terms’. Useful for seeing an lm() or lmer() model object or the output of summary() as a Python dict.
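The recursion can be illustrated on ordinary Python containers. In this sketch a nested dict stands in for an R ListVector, and the helper name and sample values are hypothetical; the real function additionally converts R types to Python ones.

```python
def to_plain_dict(obj, skip=("call", "terms")):
    """Recursively copy nested containers into dicts/lists, dropping `skip` keys."""
    if isinstance(obj, dict):
        return {k: to_plain_dict(v, skip) for k, v in obj.items() if k not in skip}
    if isinstance(obj, (list, tuple)):
        return [to_plain_dict(v, skip) for v in obj]
    return obj


# Hypothetical stand-in for a fitted model's nested list structure
fake_model = {
    "call": "lm(Reaction ~ Days)",
    "coefficients": (251.4, 10.5),
    "model": {"terms": "Reaction ~ Days", "rank": 2},
}
plain = to_plain_dict(fake_model)
```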

plutils#

Utility functions for working with polars dataframes

class pymer4.tidystats.plutils.RandomExpr(expr: Expr)[source]#

Polars expression namespace for random number generation, accessible as .random.* as part of a polars expression.

Examples

>>> import polars as pl
>>> import numpy as np
>>> np.random.seed(0)
>>>
>>> # Create a DataFrame with random numbers from different distributions
>>> example = pl.DataFrame().with_columns(
...     # Normal distribution
...     pl.col('*').random.norm(1000, .5, .1).alias('x'),
...
...     # Binomial distribution
...     pl.col('*').random.binom(1000, 1, .5).alias('y'),
...
...     # Random selection from a group of 3
...     pl.col('*').random.group(1000, 3).alias('group'),
...
...     # Repeated value
...     pl.repeat('site_1', 1000).alias('dataset')
... )
>>> example.head()

beta(n: int, _alpha: float, _beta: float)[source]#

Generate random numbers from a beta distribution

binom(n: int, size: int, prob: float)[source]#

Generate random numbers from a binomial distribution

chisq(n: int, df: float)[source]#

Generate random numbers from a chi-squared distribution

gamma(n: int, shape: float, scale: float = 1)[source]#

Generate random numbers from a gamma distribution

group(n: int, ngroups: int, replace: bool = True)[source]#

Generate random group assignments by sampling from ngroups groups

norm(n: int, mean: float = 0, std: float = 1)[source]#

Generate random numbers from a normal distribution

poisson(n: int, lamb: float)[source]#

Generate random numbers from a Poisson distribution

uniform(n: int, min: float = 0, max: float = 1)[source]#

Generate random numbers from a uniform distribution

pymer4.tidystats.plutils.expand_grid(*args, column_names=None)[source]#

Expand a list of lists into a dataframe of all combinations
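“All combinations” here means the Cartesian product of the inputs. The sketch below shows that idea with itertools.product, returning plain dicts instead of a polars DataFrame; the helper name and the default column names are assumptions, and the row order of the real function may differ.

```python
from itertools import product


def expand_grid_sketch(*args, column_names=None):
    """Return one dict per combination of the input lists."""
    if column_names is None:
        # Hypothetical default naming, used only in this sketch
        column_names = [f"column_{i}" for i in range(len(args))]
    return [dict(zip(column_names, combo)) for combo in product(*args)]


grid = expand_grid_sketch([1, 2], ["a", "b", "c"], column_names=["x", "y"])
```

Two values crossed with three values give six rows, one for each pairing.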

pymer4.tidystats.plutils.join_on_common_cols(df1, df2)[source]#

Join two polars DataFrames on common columns

pymer4.tidystats.plutils.make_factors(df, factors_and_levels: str | dict | list, return_factor_dict: bool = False)[source]#

Convert specified polars columns to categorical types ‘enums’ which are correctly converted to R factors

Parameters:
  • df (DataFrame) – The DataFrame to convert

  • factors_and_levels (str | dict | list) – The column(s) to convert to factors and their levels

  • return_factor_dict (bool, optional) – Whether to return the factor dictionary. Defaults to False.

Returns:

The DataFrame with the specified columns converted to factors

Return type:

DataFrame

pymer4.tidystats.plutils.unmake_factors(df, factors: dict | None)[source]#

Convert specified polars columns from categorical types ‘enums’ to float types

Parameters:
  • df (DataFrame) – The DataFrame to convert

  • factors (dict | None, optional) – The factor dictionary to use for conversion. Defaults to None.

Returns:

The DataFrame with the specified columns converted from factors to their original types

Return type:

DataFrame

tables#

Functions to generate summary tables for models, formatted with Great Tables

pymer4.tidystats.tables.summary_glmm_table(model, show_odds=False, decimals=2)[source]#

Create a summary table for a generalized linear mixed model.

pymer4.tidystats.tables.summary_lm_table(model, decimals=2)[source]#

Create a summary table for a linear model.

pymer4.tidystats.tables.summary_lmm_table(model, decimals=2)[source]#

Create a summary table for a linear mixed model.