Adding more R libraries

This guide shows you how to add new functionality to pymer4 from additional R libraries. Feel free to open a pull request so we can integrate your changes!

Step 1

Use the R Documentation Search to find the functionality you want, taking care to note what library it comes from.

For example, let’s add the tidy() function from the broom library.

Then verify that this package is available in the conda-forge repository. R packages on conda-forge are prefixed with r-, so we check that r-broom exists.

Let’s add it as a new dependency to pymer4:

pixi add r-broom
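
The naming convention above can be sketched as a small helper. This is an illustration only: conda_forge_name is a hypothetical function, not part of pymer4, and it assumes the usual conda-forge convention of an r- prefix followed by the lowercased R library name. It does not check that the package actually exists on conda-forge.

```python
def conda_forge_name(r_library: str) -> str:
    """Return the conda-forge package name conventionally used for an R library.

    Hypothetical helper for illustration; assumes the r-<lowercase name> convention.
    """
    return f"r-{r_library.strip().lower()}"


# Build the command you would run in the pymer4 project directory
print(f"pixi add {conda_forge_name('broom')}")
```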

Step 2

Then we can use importr from rpy2 to load the package in Python and inspect what the Python-converted name of that function is. The easiest way to do this is to let your code editor auto-complete after typing a . (e.g. by pressing <tab>) to see what functions are available.
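
If tab completion is not available, the same information can be listed programmatically. The sketch below is an illustration under stated assumptions, not part of pymer4: candidate_names is a hypothetical helper that filters dir() output, and r_name_to_python only approximates rpy2's default behaviour of replacing . with _ in R names. A stand-in object is used so the snippet runs without R; with rpy2 installed you would pass the object returned by importr("broom") instead.

```python
from types import SimpleNamespace


def r_name_to_python(r_name: str) -> str:
    """Approximate the Python-side name of an R function (dots become underscores)."""
    return r_name.replace(".", "_")


def candidate_names(package, prefix: str) -> list[str]:
    """List public attributes of `package` whose names start with `prefix`."""
    return sorted(
        name
        for name in dir(package)
        if name.startswith(prefix) and not name.startswith("_")
    )


# Stand-in for an imported R package (hypothetical contents, for illustration only)
fake_broom = SimpleNamespace(tidy_lm=None, tidy_glm=None, glance_lm=None)

print(r_name_to_python("tidy.lm"))
print(candidate_names(fake_broom, "tidy"))
```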

Doing so, we can see that broom has a .tidy_lm() function. Let’s check it out:

from rpy2.robjects.packages import importr

broom = importr("broom")

help(broom.tidy_lm)
Help on DocumentedSTFunction in module rpy2.robjects.functions:

<rpy2.robjects.functions.DocumentedSTFunction object at 0x117a77790> [3]
R classes: ('function',)
    Wrapper around an R function.

    The docstring below is built from the R documentation.

    description
    -----------


     Tidy summarizes information about the components of a model.
     A model component might be a single term in a regression, a single
     hypothesis, a cluster, or a class. Exactly what tidy considers to be a
     model component varies across models but is usually self-evident.
     If a model has several distinct types of components, you will need to
     specify which components to return.



    tidy.lm(
        x,
        conf_int = False,
        conf_level = 0.95,
        exponentiate = False,
        ___ = (was "..."). R ellipsis (any number of parameters),
    )

    Args:
       x :  An ‘lm’ object created by ‘stats::lm()’.

       conf.int :  Logical indicating whether or not to include a confidence
      interval in the tidied output. Defaults to ‘FALSE’.

       conf.level :  The confidence level to use for the confidence interval if
      ‘conf.int = TRUE’. Must be strictly greater than 0 and less
      than 1. Defaults to 0.95, which corresponds to a 95 percent
      confidence interval.

       exponentiate :  Logical indicating whether or not to exponentiate the the
      coefficient estimates. This is typical for logistic and
      multinomial regressions, but a bad idea if there is no log or
      logit link. Defaults to ‘FALSE’.

       ... :  Additional arguments. Not used. Needed to match generic
      signature only. *Cautionary note:* Misspelled arguments will
      be absorbed in ‘...’, where they will be ignored. If the
      misspelled argument has a default value, the default value
      will be used. For example, if you pass ‘conf.lvel = 0.9’, all
      computation will proceed using ‘conf.level = 0.95’. Two
      exceptions here are:

    details
    -------


     If the linear model is an  mlm  object (multiple linear model),
     there is an additional column  response . See  tidy.mlm() .

Step 3

Let’s try out the function to determine its input and output types. We recommend doing this by building on existing functions in the pymer4.tidystats module, because they already handle converting between R and Python data types.

For example, if we use the lm() function already implemented in tidystats to create a model, it will automatically convert a Python DataFrame to an R DataFrame, saving us the trouble.

Since the broom.tidy_lm() function expects a model as input, let’s try it with lm():

import pymer4.tidystats as ts
import polars as pl

df = pl.DataFrame({"x": [1, 2, 3, 4, 5], "y": [10, 30, 20, 50, 40]})

model = ts.lm("y ~ x", data=df)

tidy_summary = broom.tidy_lm(model)
tidy_summary
R/rpy2 DataFrame (2 x 5)
term estimate std.error statistic p.value
... ... ... ... ...

Ah, it looks like an R DataFrame, but we want a Python polars DataFrame.
pymer4 offers several functions that work out how to do this conversion for you.
You can check them out in the tidystats.bridge module.

A very handy one is R2polars():

ts.R2polars(tidy_summary)
shape: (2, 5)
term           estimate  std_error  statistic  p_value
str            f64       f64        f64        f64
"(Intercept)"  6.0       11.489125  0.522233   0.637618
"x"            8.0       3.464102   2.309401   0.104088

Step 4

That looks great! To finish up, we can wrap this in a new function. In fact, the bridge module offers a special function decorator, ensure_py_output, that converts the output of any new function you write to Python types, without you having to call things like R2polars() yourself.

# In broom.py
from rpy2.robjects.packages import importr
from pymer4.tidystats.bridge import ensure_py_output

# Import the R library
broom = importr("broom")

# The decorator makes sure the output is a polars DataFrame
@ensure_py_output
def tidy(model):
    return broom.tidy_lm(model)

If we try our function out with the same model as before, we get back a nicely usable polars DataFrame, with all the calculations happening in R!

tidy(model)
shape: (2, 5)
term           estimate  std_error  statistic  p_value
str            f64       f64        f64        f64
"(Intercept)"  6.0       11.489125  0.522233   0.637618
"x"            8.0       3.464102   2.309401   0.104088

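To build intuition for what a decorator such as ensure_py_output does, here is a simplified sketch of the general pattern: wrap a function and pass its return value through a converter. This is not pymer4's implementation; convert_output, FakeRFrame, to_python and tidy_stub are hypothetical stand-ins so the snippet runs without R.

```python
import functools


class FakeRFrame:
    """Hypothetical stand-in for an R data frame returned by rpy2."""

    def __init__(self, columns):
        self.columns = columns


def to_python(obj):
    """Convert stand-in R objects to plain Python; pass anything else through."""
    if isinstance(obj, FakeRFrame):
        return dict(obj.columns)
    return obj


def convert_output(func):
    """Decorator that converts the wrapped function's return value."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return to_python(func(*args, **kwargs))

    return wrapper


@convert_output
def tidy_stub(model):
    # Pretend this called into R and got an R data frame back
    return FakeRFrame({"term": ["(Intercept)", "x"], "estimate": [6.0, 8.0]})


result = tidy_stub(model=None)
print(type(result).__name__, result["estimate"])
```

The wrapped function body stays focused on the R call, while the conversion logic lives in one place and can be reused by every new wrapper.
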
Step 5

After adding a test by following the Contribution Guide, you can open a pull request on GitHub for review!
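
When writing such a test, it helps to have expected values that do not depend on R. For the small dataset used in Step 3, the estimates and standard errors can be reproduced with the closed-form formulas for simple linear regression. This is a sketch in plain Python; the helper name simple_ols is hypothetical and it only covers the one-predictor case.

```python
import math


def simple_ols(x, y):
    """Closed-form OLS for y = a + b*x; returns estimates and standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance uses n - 2 degrees of freedom (two estimated coefficients)
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": math.sqrt(sigma2 * (1 / n + x_bar**2 / sxx)),
        "se_slope": math.sqrt(sigma2 / sxx),
    }


fit = simple_ols([1, 2, 3, 4, 5], [10, 30, 20, 50, 40])
print(fit)
```

The results agree with the estimate and std_error columns returned by tidy(model), so they could serve as expected values in a test that compares with a small numerical tolerance.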

For more complicated functions, or for automatically handling different types of models (e.g. lm and lmer), check out how the various functions in the tidystats.multimodel module are written.
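
One possible way to route different model types to different R functions is to dispatch on the model's R class. The sketch below is an assumption about the general approach, not a description of how tidystats.multimodel is written. It assumes the model object exposes its R class names as a tuple in an rclass attribute, as rpy2 objects do; FakeModel, pick_handler and the handlers themselves are hypothetical.

```python
class FakeModel:
    """Hypothetical stand-in exposing R class names like an rpy2 object."""

    def __init__(self, *rclass):
        self.rclass = tuple(rclass)


def pick_handler(model, handlers):
    """Return the handler for the first R class of `model` that has one."""
    for r_class in model.rclass:
        if r_class in handlers:
            return handlers[r_class]
    raise TypeError(f"No handler registered for R classes {model.rclass}")


# Hypothetical handlers; real ones would call the matching R function
handlers = {
    "lm": lambda model: "handled as lm",
    "lmerMod": lambda model: "handled as lmer",
}

lm_model = FakeModel("lm")
lmer_model = FakeModel("lmerMod")

print(pick_handler(lm_model, handlers)(lm_model))
print(pick_handler(lmer_model, handlers)(lmer_model))
```

Walking the class tuple in order mirrors how R resolves methods: a model with classes ("glm", "lm") would fall back to the lm handler when no glm handler is registered.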