pymer4.models.skmer#

Scikit-learn compatible estimators#

skmer() models adhere to the scikit-learn API, which makes them compatible with scikit-learn's model validation, estimation, and prediction tools. After being initialized with a formula, they are used through the .fit() / .predict() API, passing in numpy arrays of features (X) and observations (y).

Note

Multilevel models currently support just 1 random-effect term, and .fit() expects the features matrix X to include an extra column at the end with values for this term. .predict() accepts an X with or without that column. If it is present, random-effects are used to make predictions; otherwise predictions are made using fixed-effects only.
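
The column layout described in this note can be sketched with plain numpy. This is only an illustration with made-up values (not the penguins data); the variable names are hypothetical, and casting to object dtype is just one way to keep numeric features and string group labels in the same array.

```python
import numpy as np

# Made-up values for illustration only
bill_length = np.array([39.1, 46.5, 50.0, 38.2])
species = np.array(["Adelie", "Chinstrap", "Gentoo", "Adelie"])

# Features first, random-effects (group) column last.
# Casting to object keeps floats and strings side by side.
X_with_group = np.column_stack(
    [bill_length.astype(object), species.astype(object)]
)

# Same rows without the group column
X_fixed_only = X_with_group[:, :-1]

print(X_with_group.shape)  # (4, 2)
print(X_fixed_only.shape)  # (4, 1)
```

Per the note, an array shaped like X_with_group is what .fit() expects for a multilevel model, while .predict() can take either form; given X_fixed_only it would predict from fixed-effects only.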

Linear Models#

from pymer4 import load_dataset
from pymer4.models import skmer
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Prepare data sklearn style
penguins = load_dataset('penguins')
penguins = penguins.drop_nulls()

# Features
X = penguins[["bill_length_mm", "bill_depth_mm", "body_mass_g"]].to_numpy()

# Labels
y = penguins["flipper_length_mm"].to_numpy()

# Split up
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Linear Regression initialized with a formula
ols = skmer("flipper_length_mm ~ bill_length_mm + bill_depth_mm + body_mass_g")

# Fit & predict
ols.fit(X_train, y_train)
preds = ols.predict(X_test)

# Evaluate
r2_score(y_test, preds)
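
For readers unfamiliar with the metric on the last line, r2_score returns the coefficient of determination. A minimal sketch, using made-up numbers rather than the penguins data, reproduces it by hand:

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up observed and predicted values, for illustration only
y_true = np.array([190.0, 200.0, 210.0, 220.0])
y_pred = np.array([192.0, 198.0, 211.0, 217.0])

# R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
manual_r2 = 1.0 - ss_res / ss_tot

# Matches scikit-learn's implementation for a single target
assert np.isclose(manual_r2, r2_score(y_true, y_pred))
```

A value of 1.0 means perfect predictions, and the score can be negative when a model predicts worse than the mean of y_true.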

Multi-level models#

from pymer4 import load_dataset
from pymer4.models import skmer
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Prepare data sklearn style
penguins = load_dataset('penguins')
penguins = penguins.drop_nulls()

# We pass in the rfx column as the last column of X
X_with_group = penguins[["bill_length_mm", "species"]].to_numpy()
y = penguins["flipper_length_mm"].to_numpy()

# This is for the cross-validator to know how to split up the data
# (a 1D array with one group label per row)
groups = penguins["species"].to_numpy()

lmm = skmer("flipper_length_mm ~ bill_length_mm + (bill_length_mm | species)")

# Out-of-sample r2 per species
scores = cross_val_score(lmm, X_with_group, y, cv=LeaveOneGroupOut(), groups=groups)
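
To see what the cross-validator does with the group labels, the sketch below runs LeaveOneGroupOut on a toy array. The rows are made up and stand in for the species column; no model is fitted here.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Toy stand-in for the species labels (not the penguins data)
groups = np.array(
    ["Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo", "Gentoo"]
)
X = np.arange(len(groups), dtype=float).reshape(-1, 1)
y = np.arange(len(groups), dtype=float)

cv = LeaveOneGroupOut()

# One fold per unique group label
n_folds = cv.get_n_splits(X, y, groups)

# Each fold holds out every row belonging to exactly one group
held_out = []
for train_idx, test_idx in cv.split(X, y, groups):
    held_out.append(np.unique(groups[test_idx]).tolist())

print(n_folds)   # 3
print(held_out)  # [['Adelie'], ['Chinstrap'], ['Gentoo']]
```

With three species in groups, cross_val_score therefore returns three scores, each computed on a species that was excluded from training in that fold.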

API#

class pymer4.models.skmer.skmer(formula, model_class='auto', family=None, link=None)#

Scikit-learn compatible wrapper for pymer4 models.

This class provides a scikit-learn compatible interface to pymer4’s statistical models, allowing them to be used in scikit-learn pipelines, cross-validation, and other workflows.

Parameters:
  • formula (str) – R-style formula string (e.g., “y ~ x1 + x2”)

  • model_class – pymer4 model class (lm, glm, lmer, or glmer); defaults to 'auto'

  • family (str, optional) – Distribution family for GLM models

  • link (str, optional) – Link function for GLM models

  • weights (array-like, optional) – Sample weights

  • **kwargs – Additional arguments passed to the model constructor

coef_#

Model coefficients

Type:

ndarray

coef_rfx_#

Model coefficients (random effects for mixed models)

Type:

ndarray

model_#

Fitted pymer4 model instance

n_features_in_#

Number of features seen during fit

Type:

int

feature_names_#

Names of features seen during fit

Type:

list

model_terms_#

Model terms object

Type:

ModelTerms

term_response_#

Response term object

Type:

Term

term_ffx_#

Fixed effects terms

Type:

list

term_rfx_#

Random effects terms

Type:

list

pymer4.models.skmer.skmer.fit(self, X, y, **kwargs)#

Fit the model to training data.

For mixed-effects models (lmer/glmer), the group variable should be passed as the last column of X.

Parameters:
  • X (array-like) – Feature matrix of shape (n_samples, n_features). For mixed models, the last column should contain group labels.

  • y (array-like) – Target values of shape (n_samples,)

Returns:

Returns the instance itself

Return type:

self

pymer4.models.skmer.skmer.predict(self, X)#

Generate predictions from the fitted model.

For mixed-effects models (lmer/glmer), the group variable should be passed as the last column of X. If it is omitted, predictions will be made using only fixed-effects.

Parameters:

X (array-like) – Feature matrix of shape (n_samples, n_features). For mixed models, the last column may contain group labels; without it, only fixed-effects are used.

Returns:

Predicted values of shape (n_samples,)

Return type:

ndarray