Expose df in Rubin pooling + support varying d.f. #562
Open
munoztd0 wants to merge 1 commit into
Author
In accordance with our previous conversation in #560 with @danielinteractive and @luwidmer.
Author
Collaborator
Hi @luwidmer, what are your thoughts on this one? 😃
Collaborator
Thank you for flagging this again @danielinteractive, I was out of office. Will take a look over the next few days @munoztd0.
It would be good to see a reference for the statement "median fallback is the standard pragmatic choice". Such a decision would need to be thoroughly documented and also highlighted in the methods vignette.
luwidmer requested changes on Jun 24, 2026
luwidmer (Collaborator) left a comment
- I agree with @tobiasmuetze here regarding the statement "median fallback is the standard pragmatic choice". I would also like to see references for this added to the documentation.
In addition:
- The old code threw a clear error when `dfs` varied. Now it silently proceeds with `median(dfs)`. This can silently introduce unexpected behavior in case a user relied on this error. From a software engineering standpoint I don't think this is desirable.
- This PR introduces test failures, which would need to be addressed.
- If the new median `df` is indeed desirable, the behavior there should have tests as well.
- `pool_internal.jackknife()`, `pool_internal.bootstrap()`, and `pool_internal.bmlmi()` all return the `parametric_ci()` list without `$df`. Only `pool_internal.rubin()` now appends it. This inconsistency means downstream code cannot reliably access `$df` without first checking which method was used. If `df` should be exposed, one should consider doing this as consistently as possible (and/or `as_data_frame_internal()` should be updated to include it).
- `as.data.frame.pool()` won't surface the new `df`. The `as_data_frame_internal()` function extracts `est`, `se`, `ci`, `pvalue` but not `df`. If the goal is to expose `df` to downstream callers, it should appear in the data frame representation too, which is the primary user-facing output.
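To make the behaviour change in the second point concrete, here is a minimal sketch of the two policies as they are described in this thread. It is written in Python purely for illustration and is not the package's R code; the function names `v_com_strict` and `v_com_median` and the error message are hypothetical.

```python
from statistics import median


def v_com_strict(dfs):
    """Old policy as described above: refuse to pool when d.f. differ."""
    if len(set(dfs)) != 1:
        # Hypothetical message; the real error text is not shown in this thread.
        raise ValueError("degrees of freedom vary across imputations")
    return dfs[0]


def v_com_median(dfs):
    """New policy as described above: use the median, with no signal to the caller."""
    return median(dfs)


constant = [50, 50, 50]
varying = [48.2, 50.0, 49.1]

# Both policies agree when the d.f. are constant.
same_on_constant = v_com_strict(constant) == v_com_median(constant)

# They diverge on varying d.f.: one raises, the other returns a value.
try:
    v_com_strict(varying)
    strict_raised = False
except ValueError:
    strict_raised = True
fallback_value = v_com_median(varying)
```

With varying inputs the strict version stops the analysis, while the fallback returns a number that looks no different from the constant case. That is the situation a caller relying on the old error would no longer be told about.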
pool_internal.rubin() now appends df to each per-parameter result, making it accessible to downstream callers without re-computing it.
The strict requirement that all per-imputation degrees of freedom be identical is replaced with a median fallback: when d.f. are constant the behaviour is unchanged; when they vary (as occurs with MMRM analysis functions, where each imputed dataset yields slightly different residual d.f.) the median is used as v_com in Rubin's rules rather than throwing an error.
Both changes should be low-risk: the constant-d.f. path is unaffected, and the median fallback is the standard pragmatic choice.
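For readers who want to see where `v_com` enters, the following is a rough sketch of Rubin's rules with the Barnard-Rubin small-sample degrees of freedom, using the median of the per-imputation d.f. as `v_com` as described above. It is an illustrative Python sketch of the textbook formulas, not the code in this PR; the function name `pool_rubin` and the returned keys are assumptions, and it assumes at least two imputations and strictly positive standard errors.

```python
import math
from statistics import mean, median, variance


def pool_rubin(ests, ses, dfs):
    """Pool M per-imputation estimates (hypothetical helper, textbook formulas)."""
    m = len(ests)
    if m < 2:
        raise ValueError("need at least two imputations")

    # Median fallback: equals the common value when all d.f. are identical.
    v_com = median(dfs)

    q_bar = mean(ests)                      # pooled point estimate
    u_bar = mean([se ** 2 for se in ses])   # within-imputation variance
    b = variance(ests)                      # between-imputation variance
    t = u_bar + (1 + 1 / m) * b             # total variance

    lam = (1 + 1 / m) * b / t               # share of variance due to missingness
    v_obs = (v_com + 1) / (v_com + 3) * v_com * (1 - lam)
    if lam == 0:
        # No between-imputation variance: the classical d.f. term is infinite.
        df = v_obs
    else:
        v_old = (m - 1) / lam ** 2
        df = 1 / (1 / v_old + 1 / v_obs)

    return {"est": q_bar, "se": math.sqrt(t), "df": df}


res_varying = pool_rubin([1.0, 1.2, 0.8], [0.5, 0.5, 0.5], [98, 100, 99])
res_constant = pool_rubin([1.0, 1.2, 0.8], [0.5, 0.5, 0.5], [99, 99, 99])
```

In this sketch the two calls give identical results because the varying d.f. have the same median as the constant ones. In general the pooled `df` depends on `v_com` only through `v_obs`, so small differences between per-imputation d.f. move the result only slightly, but whether the median is the right summary is the point the reviewers ask to have referenced.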