pymer4.tidystats#
The tidystats module contains all the functions that support the features of pymer4’s models. It can also be used as a functional alternative to the object-oriented approach of pymer4.models.
import pymer4.tidystats as ts
from pymer4 import load_dataset
df = load_dataset('sleep')
model = ts.lm('Reaction ~ Days', data=df)
# Like calling coef() in R
ts.coef(model)
# Like calling tidy() in R
ts.tidy(model)
base#
Wraps functionality from the base R library
broom#
Wraps functionality from the broom and broom.mixed libraries
- pymer4.tidystats.broom.augment(model, /, **kwargs)[source]#
Add information as observations to dataset. Uses broom.mixed:::augment.merMod for linear-mixed-models, broom::augment.lm for linear models, and broom::augment.glm for generalized linear models.
- pymer4.tidystats.broom.glance(model, /, **kwargs)[source]#
Report information about the entire model. Uses broom.mixed:::glance.merMod for linear-mixed-models, broom::glance.lm for linear models, and broom::glance.glm for generalized linear models.
- pymer4.tidystats.broom.tidy(model, **kwargs)[source]#
Summarize information about model components. Uses broom.mixed::tidy.merMod for linear-mixed-models, broom::tidy.lm for linear models, and broom::tidy.glm for generalized linear models.
easystats#
Wraps functionality from various sub-libraries in the easystats ecosystem
- pymer4.tidystats.easystats.bootstrap_model(r_model, nboot=1000, parallel='multicore', n_cpus=4, **kwargs)[source]#
Generate bootstrap samples for model fixed effects coefficients using the implementation in parameters::bootstrap_model
- Parameters:
r_model (R model) – lm, glm, lmer, or glmer model
nboot (int, optional) – Number of bootstrap samples. Defaults to 1000.
parallel (str, optional) – Parallelization method. Defaults to “multicore”.
n_cpus (int, optional) – Number of CPUs to use. Defaults to 4.
- pymer4.tidystats.easystats.get_fixed_params(r_model)[source]#
Get the fixed-effects parameters for a model using the implementation in insight::get_parameters
- Parameters:
r_model (R model) – lm, glm, lmer, or glmer model
- pymer4.tidystats.easystats.get_param_names(r_model)[source]#
Get the parameter names for a model using the implementation in insight::find_parameters
- Parameters:
r_model (R model) – lm, glm, lmer, or glmer model
- Returns:
Fixed and random parameter names
- Return type:
tuple
- pymer4.tidystats.easystats.is_converged(r_model)[source]#
Check if a model is converged using the implementation in insight::is_converged
- Parameters:
r_model (R model) – lm, glm, lmer, or glmer model
- Returns:
Whether the model converged and the convergence message
- Return type:
tuple
- pymer4.tidystats.easystats.is_mixed_model(r_model)[source]#
Check if a model is a mixed model using the implementation in insight::is_mixed_model
- Parameters:
r_model (R model) – lm, glm, lmer, or glmer model
- pymer4.tidystats.easystats.model_icc(r_model, by_group=True, **kwargs)[source]#
Calculate the intraclass correlation coefficient (ICC) for a model using the implementation in performance::icc
- Parameters:
r_model (R model) – lm, glm, lmer, or glmer model
by_group (bool, optional) – Whether to calculate the ICC for each group. Defaults to True.
- Returns:
Table of ICCs
- Return type:
DataFrame
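As a rough illustration of what an ICC expresses (not of how performance::icc computes it), the sketch below takes hypothetical variance components from a random-intercept model and returns the share of total variance attributable to the grouping factor. The function name and the numbers are invented for illustration.

```python
def icc_from_variances(group_var: float, residual_var: float) -> float:
    """Share of total variance attributable to the grouping factor."""
    total = group_var + residual_var
    if total <= 0:
        raise ValueError("variance components must sum to a positive value")
    return group_var / total


# Hypothetical variance components (made-up values, not from a fitted model)
icc = icc_from_variances(group_var=600.0, residual_var=400.0)
print(round(icc, 3))
```

An ICC near 0 means observations within a group are no more alike than observations from different groups; values near 1 mean most of the variance lies between groups.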
- pymer4.tidystats.easystats.model_params(r_model, **kwargs)[source]#
Get model parameters using the implementation in parameters::model_parameters and standardize names using the implementation in insight::standardize_names
- Parameters:
r_model (R model) – lm, glm, lmer, or glmer model
effects (str, optional) – Whether to include fixed or random effects. Defaults to “fixed”.
exponentiate (bool, optional) – Whether to exponentiate the parameters. Defaults to False.
- pymer4.tidystats.easystats.model_performance(r_model, **kwargs)[source]#
Calculate model performance using the implementation in performance::model_performance
- Parameters:
r_model (R model) – lm, glm, lmer, or glmer model
- pymer4.tidystats.easystats.model_performance_cv(r_model, method='k_fold', stack=False, **kwargs)[source]#
Calculate cross-validated model performance using the implementation in performance::performance_cv
- Parameters:
r_model (R model) – lm, glm, lmer, or glmer model
method (str, optional) – Method for cross-validation. Defaults to “k_fold”.
stack (bool, optional) – Whether to stack the results. Defaults to False.
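To make the “k_fold” method of model_performance_cv more concrete, here is a minimal, generic sketch of how k-fold cross-validation partitions row indices. It is an assumption-laden illustration only: performance::performance_cv has its own defaults and fold-assignment logic, and the helper name k_fold_indices is hypothetical.

```python
import random


def k_fold_indices(n_rows: int, k: int = 5, seed: int = 0):
    """Yield (train, test) index lists for k roughly equal folds."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Training rows are all rows that are not in the held-out fold
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(n_rows=10, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Each row is held out exactly once, so a model refit on every training split yields one out-of-sample prediction per row.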
emmeans_lib#
Wraps functionality from the emmeans library
- pymer4.tidystats.emmeans_lib.emmeans(model, specs, contrasts: str | dict | None = None, **kwargs)[source]#
This function combines functionality from emmeans::emmeans and emmeans::contrast by first generating a grid and then optionally computing contrasts over it if contrasts is not None.
- Parameters:
model (R model) – lm, glm, lmer model
specs (str) – name of predictor
by (str/list) – additional predictors to subset by
contrasts (str | 'pairwise' | 'poly' | dict | None, optional) – how to specify comparisons within specs. Defaults to None.
- Returns:
Table of marginal effects and/or means
- Return type:
DataFrame
- pymer4.tidystats.emmeans_lib.emtrends(model, contrasts: str | dict | None = None, **kwargs)[source]#
This function combines functionality from emmeans::emtrends and emmeans::contrast by first generating a grid and then optionally computing contrasts over it if contrasts is not None.
- Parameters:
model (R model) – lm, glm, lmer model
specs (str) – name of predictor
by (str/list) – additional predictors to subset by
contrasts (str | 'pairwise' | 'poly' | dict | None, optional) – how to specify comparisons within specs. Defaults to None.
- Returns:
Table of marginal effects and/or means
- Return type:
DataFrame
- pymer4.tidystats.emmeans_lib.joint_tests(model, **kwargs)[source]#
Compute ANOVA-style F-tests using emmeans::joint_tests
- Parameters:
model (R model) – lm, glm, lmer model
- Returns:
F-statistics table of main effects/interactions
- Return type:
DataFrame
- pymer4.tidystats.emmeans_lib.ref_grid(model, *args, **kwargs)[source]#
Create a reference grid of model predictions. Uses emmeans::ref_grid and emmeans::summary_emmGrid.
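For intuition about what contrasts='pairwise' in emmeans asks for, the sketch below enumerates every pairwise difference between a set of hypothetical marginal means. It only shows which comparisons are formed; the real emmeans::contrast also derives standard errors and tests from the fitted model. The function name, labels, and values are assumptions made for illustration.

```python
from itertools import combinations


def pairwise_differences(means: dict[str, float]) -> list[tuple[str, float]]:
    """Return (label, difference) for every unordered pair of levels."""
    return [
        (f"{a} - {b}", means[a] - means[b])
        for a, b in combinations(means, 2)
    ]


# Hypothetical marginal means for a three-level factor (made-up values)
cell_means = {"A": 10.0, "B": 12.5, "C": 9.0}
diffs = pairwise_differences(cell_means)
for label, diff in diffs:
    print(label, diff)
```

A factor with k levels produces k(k-1)/2 pairwise comparisons, which is why multiplicity adjustments are usually applied to such tables.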
lmerTest#
Wraps functionality from the lme4 and lmerTest libraries
- pymer4.tidystats.lmerTest.bootMer(model, nsim=1000, parallel='multicore', ncpus=4, conf_level=0.95, conf_method='perc', exponentiate=False, save_boots=True, **kwargs)[source]#
Bootstrap model parameters using bootMer. Extracts fixed effects using fixef() and random-effects using broom.mixed::tidy()
- Parameters:
model (R model) – lmer or glmer model
nsim (int, optional) – Number of bootstrap samples. Defaults to 1000.
parallel (str, optional) – Parallelization method. Defaults to “multicore”.
ncpus (int, optional) – Number of cores to use. Defaults to 4.
conf_level (float, optional) – Confidence level. Defaults to 0.95.
conf_method (str, optional) – Confidence interval method. Defaults to “perc”.
exponentiate (bool, optional) – Whether to exponentiate the results. Defaults to False.
- pymer4.tidystats.lmerTest.fixef(model, *args, **kwargs) → ndarray[source]#
Extract model fixed-effects using fixef
- pymer4.tidystats.lmerTest.glmer(*args, **kwargs)[source]#
Fit a generalized linear-mixed-model using glmer
- Parameters:
formula (str) – model formula
family (str) – glm family
data (pl.DataFrame) – polars dataframe
- Returns:
R model object
- Return type:
model (R RS4)
- pymer4.tidystats.lmerTest.is_singular(model)[source]#
Check if a model is singular using the implementation in lmerTest
- Parameters:
model (R model) – lmer or glmer model
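The “perc” option of conf_method in bootMer above refers to percentile confidence intervals. The sketch below shows the percentile idea on a plain sample mean using case resampling. This is a simplification and an assumption: lme4's bootMer generates its replicates differently (by default by simulating from the fitted model), and the helper percentile_ci and the data are hypothetical.

```python
import random
import statistics


def percentile_ci(values, stat=statistics.mean, nsim=1000, conf_level=0.95, seed=0):
    """Percentile bootstrap interval for a statistic of a 1-d sample."""
    rng = random.Random(seed)
    n = len(values)
    # Recompute the statistic on nsim resamples drawn with replacement
    boots = sorted(stat(rng.choices(values, k=n)) for _ in range(nsim))
    alpha = 1 - conf_level
    # Read the interval bounds off the sorted bootstrap distribution
    lower = boots[int((alpha / 2) * (nsim - 1))]
    upper = boots[int((1 - alpha / 2) * (nsim - 1))]
    return lower, upper


# Made-up sample used only to exercise the function
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9]
lower, upper = percentile_ci(sample)
print(lower, upper)
```

Raising nsim makes the interval bounds more stable at the cost of more refits, which is the trade-off behind the nsim and ncpus arguments.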
multimodel#
Functions that intelligently switch their functionality based on whether they received an lm, glm, lmer or glmer model as input, mimicking “function overloading” in R
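One way to picture this kind of type-based switching in Python is functools.singledispatch. The sketch below is not pymer4's implementation, which these docs do not show; the two model classes are hypothetical stand-ins for R model objects, and coef_backend simply reports which R function would be chosen.

```python
from functools import singledispatch


class LinearModel:
    """Hypothetical stand-in for an R lm object."""


class MixedModel:
    """Hypothetical stand-in for an R lmer object."""


@singledispatch
def coef_backend(model) -> str:
    # Fallback for types with no registered handler
    raise TypeError(f"unsupported model type: {type(model).__name__}")


@coef_backend.register
def _(model: LinearModel) -> str:
    return "stats::coef.lm"


@coef_backend.register
def _(model: MixedModel) -> str:
    return "lme4::coef.merMod"


print(coef_backend(LinearModel()))
print(coef_backend(MixedModel()))
```

The caller uses a single function name and the handler is selected from the type of the first argument, which is the behavior the functions in this module aim to provide.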
- pymer4.tidystats.multimodel.boot(data, model, formula, R, family=None, link=None, conf_method='perc', conf_level=0.95, return_boots=False, **kwargs)[source]#
NOTE: Experimental - may not reliably handle glm models. Currently unused. Generate bootstrapped confidence intervals for a model using boot::boot and broom::tidy.boot, or for an lme4 model using lme4::confint.merMod
- Parameters:
data (DataFrame) – polars DataFrame to resample
model (R model) – model object
formula (str) – model formula
R (int) – number of bootstrap samples
family (str, optional) – family for glm models. Defaults to None.
conf_method (str, optional) – how to calculate intervals: “perc”, “bca”, “basic”, “norm”. Defaults to “perc”.
conf_level (float, optional) – confidence level for the intervals. Defaults to 0.95.
- Returns:
bootstrap results
- Return type:
summary (DataFrame)
- pymer4.tidystats.multimodel.coef(model, *args, **kwargs)[source]#
Extract coefficients from lm, glm, lmer or glmer models. Uses lme4::coef.merMod for linear-mixed-models and stats::coef.lm for linear models.
- Parameters:
model (R ListVector or RS4) – R model
- Returns:
numpy array of coefficients (lm and glm) or polars DataFrame of BLUPs (lmer and glmer)
- Return type:
coefficients (ndarray or DataFrame)
- pymer4.tidystats.multimodel.confint(model, *args, as_df=True, **kwargs)[source]#
Confidence intervals including via bootstrapping using stats::confint or lme4::confint.merMod
- pymer4.tidystats.multimodel.predict(model, *args, **kwargs)[source]#
Generate predictions from a model given existing or new data. Uses lme4::predict.merMod for linear-mixed-models, stats::predict.lm for linear models, and stats::predict.glm for generalized linear models.
- Parameters:
model (R ListVector or RS4) – R model
- Returns:
numpy array of predictions same length as the model’s data or input data
- Return type:
predictions (ndarray)
- pymer4.tidystats.multimodel.simulate(model, *args, **kwargs)[source]#
Simulate a new dataset from a model. Uses lme4::simulate.merMod for linear-mixed-models, stats::simulate.lm for linear models, and stats::simulate.glm for generalized linear models.
- Parameters:
model (R ListVector or RS4) – R model
- Returns:
polars DataFrame with number of columns = nsim (1 by default)
- Return type:
dataset (DataFrame)
stats#
Wraps functionality from the stats library
- pymer4.tidystats.stats.anova(*args, **kwargs)[source]#
Compare one or more models using stats::anova.glm (https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/anova.glm).
Can also calculate stats::anova from a fitted model, but prefer ts.joint_tests from emmeans to ensure balanced Type-III SS inferences.
- pymer4.tidystats.stats.glm(*args, **kwargs)[source]#
Fit a generalized linear model using stats::glm
- Parameters:
formula (str) – model formula
family (str) – glm family
data (pl.DataFrame) – polars dataframe
- Returns:
R model object
- Return type:
model (R ListVector)
- pymer4.tidystats.stats.lm(*args, **kwargs)[source]#
Fit a linear-model using stats::lm
- Parameters:
formula (str) – model formula
data (pl.DataFrame) – polars dataframe
- Returns:
R model object
- Return type:
model (R ListVector)
- pymer4.tidystats.stats.model_matrix(model, unique=True)[source]#
Extract model design matrix
- Parameters:
model (R model) – lm, glm, lmer, or glmer model
unique (bool, optional) – controls whether the returned dataframe is reduced to unique rows or is the size of the model’s data. Defaults to True.
tibble#
Wraps functionality from the tibble library
bridge#
This is a special module that helps with converting between R and Python datatypes. It’s particularly useful if you want to try to add additional features to pymer4
- pymer4.tidystats.bridge.R2numpy(rarr)[source]#
Local conversion of R array to numpy as recommended by rpy2
- pymer4.tidystats.bridge.R2polars(rdf)[source]#
Local conversion of R dataframe to polars as recommended by rpy2
- pymer4.tidystats.bridge.con2R(arr)[source]#
Convert human-readable contrasts into a form that R requires. Works like the make.contrasts() function from the gmodels package, in that it will auto-solve for the remaining orthogonal k-1 contrasts if fewer than k-1 contrasts are specified.
- Parameters:
arr (np.ndarray) – 1d or 2d numpy array with each row reflecting a unique contrast and each column a factor level
- Returns:
A 2d numpy array useable with the contrasts argument of R models
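The “auto-solve for the remaining orthogonal contrasts” idea can be sketched with plain Gram-Schmidt orthogonalization. This is a simplified illustration, not a reproduction of con2R or make.contrasts, which may transform the matrix further before handing it to R. It assumes the supplied rows already sum to zero and are mutually orthogonal; the helper names dot and complete_contrasts are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def complete_contrasts(contrasts, tol=1e-10):
    """Extend contrast rows to k-1 mutually orthogonal, sum-to-zero rows."""
    k = len(contrasts[0])
    # Orthogonalizing against the constant vector forces rows to sum to zero
    basis = [[1.0] * k]
    rows = []
    candidates = [[float(x) for x in c] for c in contrasts]
    # Unit vectors supply directions for the contrasts that were not given
    candidates += [[1.0 if i == j else 0.0 for i in range(k)] for j in range(k)]
    for cand in candidates:
        if len(rows) == k - 1:
            break
        resid = cand
        for b in basis:
            # Gram-Schmidt step: remove the component along an earlier vector
            scale = dot(resid, b) / dot(b, b)
            resid = [r - scale * x for r, x in zip(resid, b)]
        if dot(resid, resid) > tol:
            basis.append(resid)
            rows.append(resid)
    return rows


# One user-specified contrast over three factor levels: level 1 vs level 2
full = complete_contrasts([[1, -1, 0]])
for row in full:
    print(row)
```

With three levels and one supplied contrast, the sketch adds a second row that sums to zero and is orthogonal to the first.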
- pymer4.tidystats.bridge.convert_argkwarg_dataframe(arg)[source]#
Convert args/kwargs that are Python DataFrames to proper R type(s)
- pymer4.tidystats.bridge.convert_argkwarg_dict(arg)[source]#
Convert args/kwargs that are Python dicts to proper R type(s)
- pymer4.tidystats.bridge.convert_argkwarg_list(arg)[source]#
Convert args/kwargs that are Python lists to proper R type(s)
- pymer4.tidystats.bridge.convert_argkwarg_model(arg)[source]#
Convert arg/kwargs that are pymer4 model objects to access their r_model attribute
- pymer4.tidystats.bridge.convert_argkwarg_none(arg)[source]#
Convert args/kwargs that are Python None to proper R type(s)
- pymer4.tidystats.bridge.ensure_py_output(func)[source]#
Decorator that converts R outputs to Python equivalents. Currently this includes:
R FloatVector -> numpy array
R StrVector -> list
R dataframe/tibble -> polars dataframe
R ListVector of Dataframes -> list of polars dataframes
- pymer4.tidystats.bridge.ensure_r_input(func)[source]#
Decorator that converts function arguments that are Python types into corresponding R types. Currently this includes:
polars DataFrames
python lists
numpy arrays
python dictionaries
python None types
pymer4 model objects
- pymer4.tidystats.bridge.numpy2R(arr)[source]#
Local conversion of numpy array to R array as recommended by rpy2
- pymer4.tidystats.bridge.polars2R(df)[source]#
Local conversion of polars dataframe to R dataframe as recommended by rpy2
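The decorator pattern behind ensure_r_input can be sketched without R at all. In the illustration below, the converters are hypothetical placeholders that return tagged tuples where the real ones would build rpy2 objects, and ensure_converted_input is an invented name; the sketch only shows how every positional and keyword argument can be routed through a type-based conversion before the wrapped function runs.

```python
from functools import wraps

# Hypothetical converters keyed by Python type; real ones would build R objects
CONVERTERS = {
    list: lambda x: ("r_vector", tuple(x)),
    dict: lambda x: ("r_list", tuple(sorted(x.items()))),
    type(None): lambda x: ("r_null",),
}


def convert(value):
    """Convert a value if a converter is registered for its type."""
    for py_type, converter in CONVERTERS.items():
        if isinstance(value, py_type):
            return converter(value)
    return value  # Unknown types pass through untouched


def ensure_converted_input(func):
    """Decorator that converts all args and kwargs before calling func."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        new_args = [convert(a) for a in args]
        new_kwargs = {k: convert(v) for k, v in kwargs.items()}
        return func(*new_args, **new_kwargs)
    return wrapper


@ensure_converted_input
def show(*args, **kwargs):
    return args, kwargs


result = show([1, 2], 3, by=None)
print(result)
```

An output-side decorator such as ensure_py_output would follow the same shape, applying the conversion to the return value instead of the arguments.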
plutils#
Utility functions for working with polars dataframes
- class pymer4.tidystats.plutils.RandomExpr(expr: Expr)[source]#
Polars expression namespace for random number generation accessible as .random.* as part of a polars expression.
Examples
>>> import polars as pl
>>> import numpy as np
>>> np.random.seed(0)
>>>
>>> # Create a DataFrame with random numbers from different distributions
>>> example = pl.DataFrame().with_columns(
>>>     # Normal distribution
>>>     pl.col('*').random.norm(1000, .5, .1).alias('x'),
>>>
>>>     # Binomial distribution
>>>     pl.col('*').random.binom(1000, 1, .5).alias('y'),
>>>
>>>     # Random selection from a group of 3
>>>     pl.col('*').random.group(1000, 3).alias('group'),
>>>
>>>     # Repeated value
>>>     pl.repeat('site_1', 1000).alias('dataset')
>>> )
>>> example.head()
- gamma(n: int, shape: float, scale: float = 1)[source]#
Generate random numbers from a gamma distribution
- group(n: int, ngroups: int, replace: bool = True)[source]#
Generate random numbers from a group distribution
- pymer4.tidystats.plutils.expand_grid(*args, column_names=None)[source]#
Expand a list of lists into a dataframe of all combinations
- pymer4.tidystats.plutils.join_on_common_cols(df1, df2)[source]#
Join two polars DataFrames on common columns
- pymer4.tidystats.plutils.make_factors(df, factors_and_levels: str | dict | list, return_factor_dict: bool = False)[source]#
Convert specified polars columns to categorical types ‘enums’ which are correctly converted to R factors
- Parameters:
df (DataFrame) – The DataFrame to convert
factors_and_levels (str | dict | list) – The column(s) to convert to factors and their levels
return_factor_dict (bool, optional) – Whether to return the factor dictionary. Defaults to False.
- Returns:
The DataFrame with the specified columns converted to factors
- Return type:
DataFrame
- pymer4.tidystats.plutils.unmake_factors(df, factors: dict | None)[source]#
Convert specified polars columns from categorical types ‘enums’ to float types
- Parameters:
df (DataFrame) – The DataFrame to convert
factors (dict | None, optional) – The factor dictionary to use for conversion. Defaults to None.
- Returns:
The DataFrame with the specified columns converted from factors to their original types
- Return type:
DataFrame
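The “all combinations” behavior described for expand_grid above corresponds to a Cartesian product. The sketch below illustrates that idea with the standard library only; it returns a list of row dictionaries rather than a polars DataFrame, and the helper name expand_grid_rows and its default column names are assumptions made for this example.

```python
from itertools import product


def expand_grid_rows(*columns, column_names=None):
    """Return one dict per combination of the input column values."""
    if column_names is None:
        # Placeholder names, invented for this sketch
        column_names = [f"column_{i}" for i in range(len(columns))]
    if len(column_names) != len(columns):
        raise ValueError("need exactly one name per input column")
    return [dict(zip(column_names, combo)) for combo in product(*columns)]


grid = expand_grid_rows([1, 2], ["a", "b", "c"], column_names=["x", "y"])
for row in grid:
    print(row)
```

Two values crossed with three values give six rows, with the last input varying fastest.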
tables#
Functions to generate summary tables for models, formatted with the great_tables library
- pymer4.tidystats.tables.summary_glmm_table(model, show_odds=False, decimals=2)[source]#
Create a summary table for a model.