Adding more R libraries
This guide shows you how to add new functionality to pymer4 from additional R libraries. Feel free to open a pull request so we can integrate your changes!
Step 1
Use the R Documentation Search to find the functionality you want, taking care to note what library it comes from.
For example, let’s add the tidy() function from the broom library.
Then verify that the package is available in the conda-forge repository. R packages there are prefixed with r-, so we look for r-broom and confirm that it exists.
Let’s add it as a new dependency to pymer4:
pixi add r-broom
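As a small illustration of the naming convention, the sketch below derives the conda-forge name to search for from an R package name. The helper `conda_forge_name` is hypothetical (it is not part of pymer4 or pixi), and it assumes the usual conda-forge convention of an `r-` prefix followed by the lowercased package name. It does not check that the package actually exists, so you still need to look it up.

```python
def conda_forge_name(r_package: str) -> str:
    """Return the conventional conda-forge name for an R package.

    Hypothetical helper for illustration only; it builds a name to
    search for and does not verify that the package exists.
    """
    # Assumed convention: "r-" prefix plus the lowercased package name
    return "r-" + r_package.strip().lower()


print(conda_forge_name("broom"))  # r-broom
```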
Step 2
Then we can use importr from rpy2 to load the package in Python and inspect what the Python-converted name of that function is. The easiest way to do this is to type a . after the imported package in your code editor and trigger auto-completion (e.g. by pressing <tab>) to see which functions are available.
Doing so, we can see that broom has a .tidy_lm() function. Let’s check it out:
from rpy2.robjects.packages import importr
broom = importr("broom")
help(broom.tidy_lm)
Help on DocumentedSTFunction in module rpy2.robjects.functions:
<rpy2.robjects.functions.DocumentedSTFunction object at 0x117a77790> [3]
R classes: ('function',)
Wrapper around an R function.
The docstring below is built from the R documentation.
description
-----------
Tidy summarizes information about the components of a model.
A model component might be a single term in a regression, a single
hypothesis, a cluster, or a class. Exactly what tidy considers to be a
model component varies across models but is usually self-evident.
If a model has several distinct types of components, you will need to
specify which components to return.
tidy.lm(
x,
conf_int = False,
conf_level = 0.95,
exponentiate = False,
___ = (was "..."). R ellipsis (any number of parameters),
)
Args:
x : An ‘lm’ object created by ‘stats::lm()’.
conf.int : Logical indicating whether or not to include a confidence
interval in the tidied output. Defaults to ‘FALSE’.
conf.level : The confidence level to use for the confidence interval if
‘conf.int = TRUE’. Must be strictly greater than 0 and less
than 1. Defaults to 0.95, which corresponds to a 95 percent
confidence interval.
exponentiate : Logical indicating whether or not to exponentiate the the
coefficient estimates. This is typical for logistic and
multinomial regressions, but a bad idea if there is no log or
logit link. Defaults to ‘FALSE’.
... : Additional arguments. Not used. Needed to match generic
signature only. *Cautionary note:* Misspelled arguments will
be absorbed in ‘...’, where they will be ignored. If the
misspelled argument has a default value, the default value
will be used. For example, if you pass ‘conf.lvel = 0.9’, all
computation will proceed using ‘conf.level = 0.95’. Two
exceptions here are:
details
-------
If the linear model is an mlm object (multiple linear model),
there is an additional column response . See tidy.mlm() .
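Notice that the help output above lists the R argument names with dots (conf.int, conf.level) while the Python signature uses underscores (conf_int, conf_level), just as tidy.lm became tidy_lm. The sketch below mimics that renaming with a plain string replacement. It is a simplified stand-in for illustration, not rpy2's actual translation code, and `r_to_python_name` is a hypothetical helper.

```python
def r_to_python_name(r_name: str) -> str:
    """Approximate how a dotted R name shows up on the Python side.

    Simplified assumption: dots are not valid in Python identifiers,
    so each one is replaced by an underscore.
    """
    return r_name.replace(".", "_")


# Names taken from the help output above
for r_name in ["tidy.lm", "conf.int", "conf.level"]:
    print(f"{r_name} -> {r_to_python_name(r_name)}")
```

In practice this means that keyword arguments should be written in their underscore form when calling the function from Python, following the signature shown in the help text.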
Step 3
Let’s try out the function to determine its input and output types. We recommend building on the existing functions in the pymer4.tidystats module, because they already handle converting between R and Python data types.
For example, if we use the lm() function already implemented in tidystats to create a model, it will automatically convert a Python DataFrame to an R DataFrame, saving us the trouble.
Since the broom.tidy_lm() function expects a model as input, let’s try it with lm():
import pymer4.tidystats as ts
import polars as pl
df = pl.DataFrame({"x": [1, 2, 3, 4, 5], "y": [10, 30, 20, 50, 40]})
model = ts.lm("y ~ x", data=df)
tidy_summary = broom.tidy_lm(model)
tidy_summary
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| ... | ... | ... | ... | ... |
The result looks like an R DataFrame, but we want a Python polars DataFrame.
pymer4 offers several functions that figure out how to do this conversion for you.
You can check them out in the tidystats.bridge module.
A very handy one is R2polars():
ts.R2polars(tidy_summary)
| term | estimate | std_error | statistic | p_value |
|---|---|---|---|---|
| str | f64 | f64 | f64 | f64 |
| "(Intercept)" | 6.0 | 11.489125 | 0.522233 | 0.637618 |
| "x" | 8.0 | 3.464102 | 2.309401 | 0.104088 |
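When wrapping a new R function it can be reassuring to confirm the numbers independently. The sketch below recomputes the estimate, std_error, and statistic columns for the same five data points using the closed-form formulas for simple linear regression, with only the Python standard library. It is a sanity check, not part of pymer4, and it skips the p-values because they require the t distribution.

```python
import math

# Same data as the polars DataFrame above
x = [1, 2, 3, 4, 5]
y = [10, 30, 20, 50, 40]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products around the means
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Ordinary least squares estimates
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual variance with n - 2 degrees of freedom
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
sigma2 = sum(r ** 2 for r in residuals) / (n - 2)

# Standard errors and t statistics
se_slope = math.sqrt(sigma2 / sxx)
se_intercept = math.sqrt(sigma2 * (1 / n + x_bar ** 2 / sxx))
t_slope = slope / se_slope
t_intercept = intercept / se_intercept

print(f"(Intercept) {intercept:.1f} {se_intercept:.6f} {t_intercept:.6f}")
print(f"x           {slope:.1f} {se_slope:.6f} {t_slope:.6f}")
```

The printed values agree with the first three numeric columns of the table returned by R2polars().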
Step 4
That looks great! To finish up, we can wrap this in a new function. In fact, the bridge module offers a special function decorator, ensure_py_output, that converts the output of any new function you write to Python types, without you having to call things like R2polars() yourself.
# In broom.py
from rpy2.robjects.packages import importr
from pymer4.tidystats.bridge import ensure_py_output

# Import the R library
broom = importr("broom")

# The decorator makes sure the output is a polars DataFrame
@ensure_py_output
def tidy(model):
    return broom.tidy_lm(model)
If we try our function out with the same model as before, we get back a nicely usable polars DataFrame, with all the calculations happening in R!
tidy(model)
| term | estimate | std_error | statistic | p_value |
|---|---|---|---|---|
| str | f64 | f64 | f64 | f64 |
| "(Intercept)" | 6.0 | 11.489125 | 0.522233 | 0.637618 |
| "x" | 8.0 | 3.464102 | 2.309401 | 0.104088 |
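To make the decorator idea concrete, here is a minimal, self-contained sketch of the general pattern: wrap a function and pass its return value through a converter. This is not pymer4's implementation of ensure_py_output. The names `to_python`, `ensure_converted`, and `fake_r_call` are hypothetical, and a tuple stands in for an R object so the example runs without R installed.

```python
from functools import wraps


def to_python(value):
    """Stand-in converter for this sketch.

    Assumption: tuples play the role of "R" output and are turned into
    lists; anything else is returned unchanged.
    """
    if isinstance(value, tuple):
        return list(value)
    return value


def ensure_converted(func):
    """Call func, then pass its return value through to_python()."""

    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        return to_python(func(*args, **kwargs))

    return wrapper


@ensure_converted
def fake_r_call(n):
    """Pretend this tuple came back from an R function."""
    return tuple(range(n))


print(fake_r_call(3))  # [0, 1, 2]
```

The appeal of this design is that each wrapper function only has to call into R, while the conversion logic lives in one place.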
Step 5
After adding a test by following the Contribution Guide, you can open a pull request on GitHub for review!
For more complicated functions, or for automatically handling different types of models (e.g. lm and lmer), check out how the various functions in the tidystats.multimodel module are written.
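One generic way to handle several model types is to pick a handler based on the model's R class. The sketch below shows that pattern with stand-in objects and is not taken from tidystats.multimodel, which may be organized differently. The handlers, the `HANDLERS` table, and `tidy_any` are all hypothetical, and the sketch assumes each model exposes its R classes as a tuple of strings in an `rclass` attribute.

```python
from types import SimpleNamespace


def tidy_lm_stub(model):
    """Hypothetical handler for models of R class 'lm'."""
    return "handled as lm"


def tidy_lmer_stub(model):
    """Hypothetical handler for models of R class 'lmerMod'."""
    return "handled as lmerMod"


# Map R class names to the function that knows how to handle them
HANDLERS = {"lm": tidy_lm_stub, "lmerMod": tidy_lmer_stub}


def tidy_any(model):
    """Dispatch on the first R class that has a registered handler."""
    for r_class in model.rclass:
        if r_class in HANDLERS:
            return HANDLERS[r_class](model)
    raise TypeError(f"No handler for R classes {tuple(model.rclass)}")


# Stand-in objects so the sketch runs without R installed
lm_model = SimpleNamespace(rclass=("lm",))
lmer_model = SimpleNamespace(rclass=("lmerMod",))

print(tidy_any(lm_model))
print(tidy_any(lmer_model))
```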