Help & FAQs#

If you run into technical issues, feel free to raise an issue on GitHub using the button at the top of this page.
However, please keep in mind that we are unable to provide general statistical guidance via GitHub issues. We recommend consulting the many resources available on multi-level models and the underlying R packages that power pymer4.

We try to keep the FAQ below updated with common questions and workflows.

Help, my kernel crashes in Jupyter Notebooks/Lab!#

Sometimes using pymer4 interactively can cause the Python kernel to crash. This is more likely to happen if you have multiple interactive sessions running simultaneously. One way around this is to put the following at the top of your notebook/code:

# Allow multiple copies of the OpenMP runtime to load without crashing
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'

Or set the following environment variable prior to launching your interactive sessions:

export KMP_DUPLICATE_LIB_OK=TRUE

I’m confused about how to specify random-effects in LMMs/GLMMs#

# Random intercepts only
(1 | Group)

# Random slopes only
(0 + Variable | Group)

# Random intercepts and slopes (and their correlation)
(Variable | Group)

# Random intercepts and slopes (without their correlation)
(1 | Group) + (0 + Variable | Group)

# Same as above, but note that || does not handle factors correctly (see: https://rdrr.io/cran/lme4/man/expandDoubleVerts.html)
(Variable || Group)

# Random intercept and slope for more than one variable (and their correlations)
(Variable_1 + Variable_2 | Group)

# Random intercepts, slopes, correlations for multiple random-effects
(Variable | Subject) + (Variable | Item)
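
For example, here is how one of these formulas is passed to lmer and fit. This is a minimal sketch using the sleep dataset that ships with pymer4 (the same model is fit in the make_rfunc example at the bottom of this page):

from pymer4 import load_dataset
from pymer4.models import lmer

# Random intercept and slope of Days for each Subject, plus their correlation
sleep = load_dataset("sleep")
model = lmer("Reaction ~ Days + (Days | Subject)", data=sleep)
model.fit()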

I want traditional ANOVA results and post-hoc tests#

  • Before calling .fit(), use .set_factors('categorical_var') to tell the model which of your predictors are categorical

  • Call .anova(), which will automatically return correctly coded Type-III Sum-of-Squares omnibus F tests

  • You can explore “post-hoc” comparisons (marginal estimates and contrasts) using .emmeans() with contrasts='pairwise'

from pymer4.models import lm
from pymer4 import load_dataset

poker = load_dataset("poker")
model = lm('balance ~ hand * skill', data=poker)
model.set_factors(['hand', 'skill'])
model.anova(summary=True)
ANOVA (Type III tests)

model term | df1 | df2 | F_ratio | p_value
hand       |   2 | 294 |  79.169 | <.001 ***
skill      |   1 | 294 |   2.434 | 0.1198
hand:skill |   2 | 294 |   7.083 | <.001 ***

Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1

Compare levels of hand at each level of skill

model.emmeans('hand', by='skill', contrasts='pairwise')
shape: (6, 9)
contrast       | skill   | estimate | SE       | df    | lower_CL   | upper_CL  | t_ratio    | p_value
cat            | cat     | f64      | f64      | f64   | f64        | f64       | f64        | f64
bad - good     | average | -9.211   | 0.804093 | 294.0 | -11.141963 | -7.280037 | -11.455148 | 0.0
bad - neutral  | average | -5.2572  | 0.804093 | 294.0 | -7.188163  | -3.326237 | -6.538053  | 8.2552e-10
good - neutral | average | 3.9538   | 0.804093 | 294.0 | 2.022837   | 5.884763  | 4.917095   | 0.000004
bad - good     | expert  | -4.9588  | 0.804093 | 294.0 | -6.889763  | -3.027837 | -6.166951  | 6.8880e-9
bad - neutral  | expert  | -3.553   | 0.804093 | 294.0 | -5.483963  | -1.622037 | -4.418645  | 0.000042
good - neutral | expert  | 1.4058   | 0.804093 | 294.0 | -0.525163  | 3.336763  | 1.748306   | 0.225

Compare levels of skill at each level of hand

model.emmeans('skill', by='hand', contrasts='pairwise')
shape: (3, 9)
contrast         | hand    | estimate | SE       | df    | lower_CL  | upper_CL  | t_ratio   | p_value
cat              | cat     | f64      | f64      | f64   | f64       | f64       | f64       | f64
average - expert | bad     | -2.7098  | 0.804093 | 294.0 | -4.292307 | -1.127293 | -3.37001  | 0.000852
average - expert | good    | 1.5424   | 0.804093 | 294.0 | -0.040107 | 3.124907  | 1.918187  | 0.056056
average - expert | neutral | -1.0056  | 0.804093 | 294.0 | -2.588107 | 0.576907  | -1.250602 | 0.212074

Perform pairwise comparisons between all combinations of hand and skill

# 15 total comparisons; just print the first 5
model.emmeans(['hand', 'skill'], contrasts='pairwise').head()
shape: (5, 8)
contrast                      | estimate | SE       | df    | lower_CL   | upper_CL  | t_ratio    | p_value
str                           | f64      | f64      | f64   | f64        | f64       | f64        | f64
bad average - good average    | -9.211   | 0.804093 | 294.0 | -11.584528 | -6.837472 | -11.455148 | 0.0
bad average - neutral average | -5.2572  | 0.804093 | 294.0 | -7.630728  | -2.883672 | -6.538053  | 4.1276e-9
bad average - bad expert      | -2.7098  | 0.804093 | 294.0 | -5.083328  | -0.336272 | -3.37001   | 0.012705
bad average - good expert     | -7.6686  | 0.804093 | 294.0 | -10.042128 | -5.295072 | -9.536961  | 0.0
bad average - neutral expert  | -6.2628  | 0.804093 | 294.0 | -8.636328  | -3.889272 | -7.788655  | 1.7553e-12

I want to perform an unequal-variances t-test#

When group sizes differ, you may want to account for unequal variances. You can do this using the weights argument to a model.

from pymer4.models import lm
from pymer4 import load_dataset
import seaborn as sns
import numpy as np
import polars as pl
from polars import col
from scipy.stats import ttest_ind

df = load_dataset("credit")

# Visualize the number of observations in each group
grid = sns.catplot(
    x="Student",
    y="Balance",
    kind="bar",
    hue="Student",
    order=["Yes", "No"],
    aspect=0.75,
    height=4,
    estimator=np.count_nonzero,
    data=df.to_pandas(),
)
grid.set_ylabels('Number of Observations');

We can do this in scipy using an unequal variance (“Welch’s”) t-test:

student = df.filter(col('Student') == 'Yes').select('Balance')
non_student = df.filter(col('Student') == 'No').select('Balance')

# Regular independent samples t-test
results = ttest_ind(student, non_student)

# equal_var = False is the important part
welch_results = ttest_ind(student, non_student, equal_var=False)

print(f"t-test = {results.statistic[0]:.3f}")
print(f"Welch's t-test = {welch_results.statistic[0]:.3f}")
t-test = 5.350
Welch's t-test = 4.903

In regression terms, this is a weighted-least-squares (WLS) model in which the weights are the inverse of each group's variance:

# Create a column called 'student_weights' equal to 1 / variance of each group
df = df.with_columns(
    pl.when(col("Student") == "No")
    .then(1 / non_student.var(ddof=1).item())
    .otherwise(1 / student.var(ddof=1).item())
    .alias('student_weights')
)

# Normal regression/t-test
ols = lm('Balance ~ Student', data=df)
ols.set_factors('Student')
ols.fit()

# Weighted regression/unequal var t-test
wls = lm('Balance ~ Student', weights='student_weights', data=df)
wls.set_factors('Student')
wls.fit()
print(f"OLS (t-test): {ols.result_fit[-1, 't_stat']:.3f}")
print(f"WLS (t-test): {wls.result_fit[-1, 't_stat']:.3f}")
OLS (t-test): 5.350
WLS (t-test): 4.903

I know the R code. How do I do it in pymer4?#

In general, you can use make_rfunc() with a Python string as input to wrap an R function in a callable Python function.
Here's an example from a previous GitHub issue requesting the output of VarCorr(mixed_model).
While this information is available in the .ranef_var attribute of a model, here's how we can get it directly from R:

from pymer4 import make_rfunc, load_dataset
from pymer4.models import lmer

# Load dataset and fit the model
sleep = load_dataset("sleep")
model = lmer("Reaction ~ Days + (Days | Subject)", data=sleep)
model.fit()

# Make a function that returns the random-effects variance-covariance (VarCorr) of a model
varcorr = make_rfunc("""
function(model) {
    output <- VarCorr(model, condVar=TRUE)
    return(output)
}
""")

# Use the function passing in a model's .r_model attribute
out = varcorr(model.r_model)

Calling the function returns an R data type:

out
ListVector with 1 elements.
Subject [14]

But you can use any of the functions in the pymer4.tidystats.bridge module to convert them to Python types.

The most general-purpose of these is to_dict():

import pymer4.tidystats as ts
ts.to_dict(out)
{'Subject': array([[612.10015806,   9.60440895],
        [  9.60440895,  35.07171445]])}

Since out is just a variance-covariance matrix, we can use R2numpy() instead:

ts.R2numpy(out)
array([[[612.10015806,   9.60440895],
        [  9.60440895,  35.07171445]]])
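
Note the extra leading dimension, which comes from converting the one-element ListVector into an array. If you just want the 2 x 2 matrix itself, squeezing the result with standard NumPy (not a pymer4-specific helper) drops that dimension:

# Plain NumPy squeeze; removes the length-1 leading axis
ts.R2numpy(out).squeeze()  # shape (2, 2)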