pymer4.models.skmer#
Scikit-learn compatible estimators#
skmer() models adhere to the scikit-learn estimator API, which makes them compatible with scikit-learn's model validation, estimation, and prediction tools. After being initialized with a formula, they are used through the .fit() / .predict() API, passing in numpy arrays of features (X) and observations (y).
Note
Multilevel models currently support only one random-effects term, and .fit() expects the feature matrix X to include an extra final column holding the values for this term. .predict() accepts an X with or without that column. If it is present, random effects are used to make predictions; otherwise predictions are made using fixed effects only.
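The column layout described in the note can be sketched with plain numpy. The helper name with_group_column is hypothetical (it is not part of pymer4), and the sketch assumes that an object-dtype array mixing numeric features with string group labels is acceptable input, as in the multi-level example further down this page.

```python
import numpy as np


def with_group_column(features, groups):
    """Return an object array with `groups` appended as the last column.

    Hypothetical helper for illustration only; not part of pymer4.
    """
    features = np.asarray(features)
    if features.ndim == 1:
        # Treat a single feature vector as one column
        features = features.reshape(-1, 1)
    groups = np.asarray(groups).reshape(-1)
    if features.shape[0] != groups.shape[0]:
        raise ValueError("features and groups must have the same number of rows")
    # Object dtype keeps numeric features and string labels side by side
    out = np.empty((features.shape[0], features.shape[1] + 1), dtype=object)
    out[:, :-1] = features
    out[:, -1] = groups
    return out


# Toy values, made up for illustration
X_with_group = with_group_column([[39.1], [46.5], [50.0]], ["a", "b", "c"])
```

The resulting array has one row per observation, the fixed-effects features first, and the grouping values in the final column.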
Linear Models#
from pymer4 import load_dataset
from pymer4.models import skmer
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
# Prepare data sklearn style
penguins = load_dataset('penguins')
penguins = penguins.drop_nulls()
# Features
X = penguins[["bill_length_mm", "bill_depth_mm", "body_mass_g"]].to_numpy()
# Labels
y = penguins["flipper_length_mm"].to_numpy()
# Split up
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=42
)
# Linear Regression initialized with a formula
ols = skmer("flipper_length_mm ~ bill_length_mm + bill_depth_mm + body_mass_g")
# Fit & predict
ols.fit(X_train, y_train)
preds = ols.predict(X_test)
# Evaluate
r2_score(y_test, preds)
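For readers unfamiliar with the metric in the final step, the following is a minimal sketch of the coefficient of determination, R² = 1 - SS_res / SS_tot. The function name r2_manual is hypothetical, and the sketch ignores the edge cases (such as a constant y_true) that scikit-learn's r2_score handles.

```python
import numpy as np


def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (illustrative sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Residual sum of squares: error left over after prediction
    ss_res = np.sum((y_true - y_pred) ** 2)
    # Total sum of squares: error of always predicting the mean of y_true
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Toy values, made up for illustration
score = r2_manual([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
```

A score of 1 means perfect predictions, and a score of 0 means the model does no better than predicting the mean of the held-out observations.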
Multi-level models#
from pymer4 import load_dataset
from pymer4.models import skmer
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
# Prepare data sklearn style
penguins = load_dataset('penguins')
penguins = penguins.drop_nulls()
# We pass in the rfx column as the last column of X
X_with_group = penguins[["bill_length_mm", "species"]].to_numpy()
y = penguins["flipper_length_mm"].to_numpy()
# The cross-validator uses these labels (one per row, 1-D) to split up the data
groups = penguins["species"].to_numpy()
lmm = skmer("flipper_length_mm ~ bill_length_mm + (bill_length_mm | species)")
# Out-of-sample r2 per species: each fold holds out one species
scores = cross_val_score(lmm, X_with_group, y, cv=LeaveOneGroupOut(), groups=groups)
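To make the splitting behavior concrete, here is a small sketch of what LeaveOneGroupOut does with the groups argument, independent of any model. The toy arrays and group labels are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Toy data: 6 rows in 3 groups (values are made up for illustration)
X = np.arange(12, dtype=float).reshape(6, 2)
y = np.arange(6, dtype=float)
groups = np.array(["a", "a", "b", "b", "c", "c"])

logo = LeaveOneGroupOut()
# One split per unique group label
n_splits = logo.get_n_splits(groups=groups)

held_out = []
for train_idx, test_idx in logo.split(X, y, groups=groups):
    # Each test fold contains every row of exactly one group;
    # all remaining rows form the training fold
    held_out.append(set(groups[test_idx]))
```

With the penguins data, which contains three species, the same logic yields three folds, so scores holds one value per held-out species.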
API#
- class pymer4.models.skmer.skmer(formula, model_class='auto', family=None, link=None)[source]#
Scikit-learn compatible wrapper for pymer4 models.
This class provides a scikit-learn compatible interface to pymer4’s statistical models, allowing them to be used in scikit-learn pipelines, cross-validation, and other workflows.
- Parameters:
formula (str) – R-style formula string (e.g., “y ~ x1 + x2”)
model_class – pymer4 model class (lm, glm, lmer, or glmer); defaults to 'auto'
family (str, optional) – Distribution family for GLM models
link (str, optional) – Link function for GLM models
weights (array-like, optional) – Sample weights
**kwargs – Additional arguments passed to the model constructor
- coef_#
Model coefficients
- Type:
ndarray
- coef_rfx_#
Random-effects coefficients (mixed models)
- Type:
ndarray
- model_#
Fitted pymer4 model instance
- n_features_in_#
Number of features seen during fit
- Type:
int
- feature_names_#
Names of features seen during fit
- Type:
list
- model_terms_#
Model terms object
- Type:
ModelTerms
- term_response_#
Response term object
- Type:
Term
- term_ffx_#
Fixed effects terms
- Type:
list
- term_rfx_#
Random effects terms
- Type:
list
- pymer4.models.skmer.skmer.fit(self, X, y, **kwargs)#
Fit the model to training data.
For mixed-effects models (lmer/glmer), the group variable should be passed as the last column of X.
- Parameters:
X (array-like) – Feature matrix of shape (n_samples, n_features). For mixed models, the last column should contain group labels.
y (array-like) – Target values of shape (n_samples,)
- Returns:
Returns the instance itself
- Return type:
self
- pymer4.models.skmer.skmer.predict(self, X)#
Generate predictions from the fitted model.
For mixed-effects models (lmer/glmer), the group variable should be passed as the last column of X. If it is omitted, predictions will be made using only fixed effects.
- Parameters:
X (array-like) – Feature matrix of shape (n_samples, n_features). For mixed models, the last column should contain group labels if random effects are to be used in the predictions.
- Returns:
Predicted values of shape (n_samples,)
- Return type:
ndarray
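The two prediction modes described for predict() can be compared side by side. The sketch below is duck-typed: it works with any object exposing .predict(). Both predict_with_and_without_rfx and _StubModel are hypothetical names introduced here, and the stub only mimics the documented behavior so that the snippet runs without pymer4 installed. In practice a fitted skmer mixed model would take its place.

```python
import numpy as np


def predict_with_and_without_rfx(model, X_with_group):
    """Return (predictions using the group column, fixed-effects-only predictions)."""
    X_with_group = np.asarray(X_with_group, dtype=object)
    preds_full = model.predict(X_with_group)
    # Dropping the last column leaves only the fixed-effects features
    preds_fixed = model.predict(X_with_group[:, :-1])
    return preds_full, preds_fixed


class _StubModel:
    """Hypothetical stand-in for a fitted mixed model; NOT part of pymer4."""

    n_fixed = 1  # number of fixed-effects feature columns

    def predict(self, X):
        X = np.asarray(X, dtype=object)
        fixed = X[:, 0].astype(float) * 2.0
        if X.shape[1] > self.n_fixed:
            # Group column present: add a made-up per-group offset
            offsets = {"a": 1.0, "b": -1.0}
            return fixed + np.array([offsets[g] for g in X[:, -1]])
        return fixed


full, fixed_only = predict_with_and_without_rfx(
    _StubModel(), [[1.0, "a"], [2.0, "b"]]
)
```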