feat: Experimental transpilation of unannotated python callables by TrevorBergeron · Pull Request #17419 · googleapis/google-cloud-python

TrevorBergeron · 2026-06-11T00:14:33Z

gemini-code-assist

Code Review

This pull request introduces an experimental Python transpiler feature, allowing standard Python callables to be transpiled and executed within DataFrame.map, DataFrame.apply, Series.apply, and Series.combine. It replaces the func_to_op helper with func_to_expr, which generates a new CallableExpression to manage argument binding. Feedback on the changes highlights critical validation gaps in both DataFrame.apply and CallableExpression.apply where missing checks for unexpected keyword arguments, duplicate arguments, or extra positional arguments could lead to silent correctness bugs or cryptic errors.
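The validation gaps named in the review (unexpected keyword arguments, duplicate arguments, extra positional arguments) are the cases Python itself rejects at call time. A minimal sketch of that check, assuming the callable's signature is readable with the standard library's `inspect.signature`; `bind_call_args` and `scale` are hypothetical names, and this is not the PR's `CallableExpression.apply`:

```python
import inspect


def bind_call_args(func, args=(), kwargs=None):
    """Hypothetical helper: map call arguments to parameter names."""
    kwargs = dict(kwargs or {})
    # Signature.bind applies Python's normal call rules, so it raises TypeError
    # for extra positional arguments, unexpected keywords, and duplicate values.
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    return dict(bound.arguments)


def scale(x, factor=2):
    return x * factor


print(bind_call_args(scale, args=(10,)))  # {'x': 10, 'factor': 2}
```

Failing loudly with `TypeError` at binding time is one way to avoid the silent mismatches the review describes; the PR may handle these cases differently.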

tswast · 2026-06-16T19:21:43Z

System test looks like it might indicate a real error, but probably unrelated to this change:

___________________ test_read_parquet_gcs[bigquery_wildcard] ___________________
[gw19] linux -- Python 3.12.12 /tmpfs/src/github/google-cloud-python/packages/bigframes/.nox/system/bin/python

session = 
scalars_dfs = (          bool_col                                          bytes_col  \
rowindex                                    ...7             True  ...  0 days 00:00:00.000004
8            False  ...         5 days 00:00:00

[9 rows x 14 columns])
gcs_folder = 'gs://bigframes-dev-testing/bigframes_tests_system_20260616000152_914011/'
engine = 'bigquery', filename = '*.parquet'

    @pytest.mark.parametrize(
        ("engine", "filename"),
        (
            pytest.param(
                "auto",
                "000000000000.parquet",
                id="auto",
            ),
            pytest.param(
                "pyarrow",
                "000000000000.parquet",
                id="pyarrow",
            ),
            pytest.param(
                "bigquery",
                "000000000000.parquet",
                id="bigquery",
            ),
            pytest.param(
                "bigquery",
                "*.parquet",
                id="bigquery_wildcard",
            ),
            pytest.param(
                "auto",
                "*.parquet",
                id="auto_wildcard",
                marks=pytest.mark.xfail(
                    raises=ValueError,
                ),
            ),
        ),
    )
    def test_read_parquet_gcs(
        session: bigframes.Session, scalars_dfs, gcs_folder, engine, filename
    ):
        scalars_df, _ = scalars_dfs
        # Include wildcard so that multiple files can be written/read if > 1 GB.
        # https://cloud.google.com/bigquery/docs/exporting-data#exporting_data_into_one_or_more_files
        write_path = gcs_folder + test_read_parquet_gcs.__name__ + "*.parquet"
        read_path = gcs_folder + test_read_parquet_gcs.__name__ + filename
    
        df_in: bigframes.dataframe.DataFrame = scalars_df.copy()
        # GEOGRAPHY not supported in parquet export.
        df_in = df_in.drop(columns="geography_col")
        # Make sure we can also serialize the order.
        df_write = df_in.reset_index(drop=False)
        df_write.index.name = f"ordering_id_{random.randrange(1_000_000)}"
        df_write.to_parquet(write_path, index=True)
    
        df_out = (
            session.read_parquet(read_path, engine=engine)
            # Restore order.
>           .set_index(df_write.index.name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            .sort_index()
            # Restore index.
            .set_index(typing.cast(str, df_in.index.name))
        )

tests/system/small/test_session.py:1916: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
bigframes/core/logging/log_adapter.py:183: in wrapper
    return method(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self =     ordering_id_430420  rowindex  bool_col  \
0                          4     False   
2                     ...                  432000000000  
10                                  432000000000  
...

[36 rows x 15 columns]
keys = ('ordering_id_463089',), append = False, drop = True

    def set_index(
        self,
        keys: typing.Union[blocks.Label, typing.Sequence[blocks.Label]],
        append: bool = False,
        drop: bool = True,
    ) -> DataFrame:
        if not utils.is_list_like(keys):
            keys = typing.cast(typing.Sequence[blocks.Label], (keys,))
        else:
            keys = typing.cast(typing.Sequence[blocks.Label], tuple(keys))
        col_ids = [self._resolve_label_exact(key) for key in keys]
        missing = [keys[i] for i in range(len(col_ids)) if col_ids[i] is None]
        if len(missing) > 0:
>           raise KeyError(f"None of {missing} are in the columns")
E           KeyError: "None of ['ordering_id_463089'] are in the columns"

bigframes/dataframe.py:2419: KeyError
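Two details in the log stand out: the frame being re-indexed has 36 rows and a column named `ordering_id_430420`, while the test wrote 9 rows under `ordering_id_463089`. One possible reading, offered as an assumption and not a diagnosis, is that the `*.parquet` read pattern matched files from more than one write. The sketch below only illustrates how a wildcard can match more objects than an exact name; the object names are hypothetical, and stdlib `fnmatch` stands in for the storage service's own wildcard matching:

```python
import fnmatch

# Hypothetical object names; the real bucket contents are not shown in the log.
objects = [
    "test_read_parquet_gcs000000000000.parquet",
    "test_read_parquet_gcs000000000001.parquet",
    "unrelated_test000000000000.parquet",
]

exact = fnmatch.filter(objects, "test_read_parquet_gcs000000000000.parquet")
wildcard = fnmatch.filter(objects, "test_read_parquet_gcs*.parquet")

# The exact name selects one object; the wildcard selects every object that
# shares the prefix, whichever write produced it.
print(len(exact), len(wildcard))
```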

CC oncall @sycai for visibility

tswast

Very cool! A few comments.

tswast · 2026-06-16T19:23:11Z

+                "Python transpiler is an unstable, experimental feature, and not yet fully "
+                "validated, use at your own risk."
+            )
+            warnings.warn(msg, category=bfe.PreviewWarning)


Nit: I like to make custom exceptions that subclass PreviewWarning for more explicit opt-in, but that is probably overkill in retrospect.
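A sketch of what the subclass suggestion could look like. `PreviewWarning` here is a local stand-in for `bfe.PreviewWarning` so the example runs on its own, and `TranspilerPreviewWarning` and `transpile` are hypothetical names that do not appear in the PR:

```python
import warnings


class PreviewWarning(Warning):
    """Stand-in for bigframes.exceptions.PreviewWarning."""


class TranspilerPreviewWarning(PreviewWarning):
    """Hypothetical subclass specific to the transpiler feature."""


def transpile(func):
    # Placeholder for the real work; only the warning matters in this sketch.
    warnings.warn(
        "Python transpiler is an unstable, experimental feature.",
        category=TranspilerPreviewWarning,
        stacklevel=2,
    )
    return func


with warnings.catch_warnings():
    # Opt in to this one feature: silence its warning while leaving any other
    # PreviewWarning visible.
    warnings.simplefilter("ignore", category=TranspilerPreviewWarning)
    transpile(abs)
```

Because the subclass still derives from `PreviewWarning`, existing filters on the parent category keep working.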

tswast · 2026-06-16T19:24:04Z

+
+    name: str
+    default_value: typing.Any
+    is_varargs: bool


Do we want to track keyword-only and/or kwargs dictionary separately? Or maybe that's not really inferrable from the Python AST?
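On whether this is inferable: for an ordinary Python function, the standard library's `inspect.signature` reports a `kind` for each parameter that distinguishes keyword-only parameters and the `**kwargs` dictionary. A minimal sketch, assuming the callable's signature is readable that way; `ParamInfo`, `describe_params`, and the two extra fields are hypothetical, since the PR's class shows only `name`, `default_value`, and `is_varargs`:

```python
import dataclasses
import inspect
import typing


@dataclasses.dataclass(frozen=True)
class ParamInfo:
    """Hypothetical record extending the three fields shown in the diff."""

    name: str
    default_value: typing.Any
    is_varargs: bool
    is_keyword_only: bool  # assumption: not in the PR
    is_varkwargs: bool  # assumption: not in the PR


def describe_params(func):
    params = []
    for p in inspect.signature(func).parameters.values():
        params.append(
            ParamInfo(
                name=p.name,
                # inspect.Parameter.empty marks a parameter without a default.
                default_value=p.default,
                is_varargs=p.kind is inspect.Parameter.VAR_POSITIONAL,
                is_keyword_only=p.kind is inspect.Parameter.KEYWORD_ONLY,
                is_varkwargs=p.kind is inspect.Parameter.VAR_KEYWORD,
            )
        )
    return params


def example(a, *rest, scale=1, **options):
    return a


for info in describe_params(example):
    print(info.name, info.is_varargs, info.is_keyword_only, info.is_varkwargs)
```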

tswast · 2026-06-16T19:27:43Z

+                )
+            )
+
+        from bigframes.core.bytecode import dis_to_expr


"dis" is hard for me to understand without context. Could we use a more descriptive name?

Renamed to py_to_expression.
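For context on the original name: `dis` is the standard library's bytecode disassembler module, which is presumably where `dis_to_expr` got its prefix. A short sketch of what the module exposes for a plain function (`add_one` is a hypothetical example; how the PR's transpiler consumes these instructions is not shown in this thread):

```python
import dis


def add_one(x):
    return x + 1


# Opcode names differ between Python versions, so none are assumed here.
for instruction in dis.get_instructions(add_one):
    print(instruction.opname, instruction.argrepr)
```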

tswast · 2026-06-16T19:29:26Z

+        bindings: typing.Mapping[ids.ColumnId, ex.Expression],
+        allow_partial_bindings: bool = False,
+    ) -> CallableExpression:
+        return dataclasses.replace(


These bind and transform functions are pretty similar to the other implementations, right? Maybe we can do some sort of mixin class to implement these?

IDK about a specific impl (mixin or otherwise), but yeah, now that we have expanded from 3 expression subclasses (op, literal, reference) to several more, things are getting redundant.

The logic is actually a bit custom per expression type, but it could maybe just be derived from dataclass field metadata, as most of it amounts to descending through fields (known statically) that subclass expression and returning/replacing them.

Created internal issue 525095335 to track a refactor here.
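The field-metadata idea can be sketched roughly as follows. Every class and function here is hypothetical and much simpler than the bigframes expression types; the point is only that a base class can find child expressions by walking `dataclasses.fields` and rebuild the node with `dataclasses.replace`:

```python
import dataclasses
from typing import Callable


@dataclasses.dataclass(frozen=True)
class Expression:
    """Hypothetical base class; not the bigframes Expression."""

    def transform_children(
        self, fn: "Callable[[Expression], Expression]"
    ) -> "Expression":
        changes = {}
        for field in dataclasses.fields(self):
            value = getattr(self, field.name)
            if isinstance(value, Expression):
                changes[field.name] = fn(value)
            elif isinstance(value, tuple) and all(
                isinstance(v, Expression) for v in value
            ):
                changes[field.name] = tuple(fn(v) for v in value)
        return dataclasses.replace(self, **changes) if changes else self


@dataclasses.dataclass(frozen=True)
class Ref(Expression):
    name: str


@dataclasses.dataclass(frozen=True)
class Lit(Expression):
    value: object


@dataclasses.dataclass(frozen=True)
class Add(Expression):
    left: Expression
    right: Expression


def bind(expr, bindings):
    """Replace bound references; recurse through everything else generically."""
    if isinstance(expr, Ref) and expr.name in bindings:
        return bindings[expr.name]
    return expr.transform_children(lambda child: bind(child, bindings))


print(bind(Add(Ref("x"), Lit(1)), {"x": Lit(2)}))
```

With this shape, only the leaf-specific rule (here, substituting a `Ref`) is written per operation; the descent through fields is shared.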

tswast · 2026-06-16T19:32:21Z

+                for col in self.columns:
+                    if col in expr.free_variables:
+                        col_id = block.resolve_label_exact(col)


Would we expect any trouble from mixing the high-level (dataframe columns) representation and lower level (block column labels) representation?

Also, this looks relatively familiar, such as in our bbq.sql_scalar implementation. Perhaps there are some shared utilities we can refactor for this column mapping logic?

The idea here is to cross the label-to-id boundary. The Python function references series attributes, which are labels, but we want to map those to unambiguous block ids.

The implementation here was a bit messy, though, so I refactored it in the new revision as apply_to_block_rows.
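A rough illustration of crossing that boundary, reading "unambiguous" to mean that a label may match several columns while an id identifies exactly one. The helper name, the data layout, and the error message are all hypothetical; this is not `apply_to_block_rows` or the block's own resolver:

```python
def resolve_labels_exact(label_to_ids, wanted):
    """Hypothetical helper: map each wanted label to exactly one column id."""
    resolved = {}
    for label in wanted:
        ids = label_to_ids.get(label, [])
        if len(ids) != 1:
            raise KeyError(
                f"Label {label!r} matches {len(ids)} columns; expected exactly 1"
            )
        resolved[label] = ids[0]
    return resolved


# Hypothetical block layout: labels may repeat, ids may not.
label_to_ids = {"price": ["col_0"], "qty": ["col_1"], "dup": ["col_2", "col_3"]}

# Only labels the function actually references need to be resolved.
free_variables = {"price", "qty"}
wanted = [label for label in label_to_ids if label in free_variables]

print(resolve_labels_exact(label_to_ids, wanted))  # {'price': 'col_0', 'qty': 'col_1'}
```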

tswast · 2026-06-16T19:33:42Z

@@ -4692,13 +4692,17 @@ def _prepare_export(
        return array_value, id_overrides

    def map(self, func, na_action: Optional[str] = None) -> DataFrame:


I'd love to see an example in the docstrings, especially if this is compatible with the polars engine and thus would make for a relatively speedy flakeless doctest.

Added doctests

feat: Experimental transpilation of unannotated python callables

8287f97

gemini-code-assist Bot reviewed Jun 11, 2026

Comment thread packages/bigframes/bigframes/dataframe.py Outdated

Comment thread packages/bigframes/bigframes/operations/to_op.py Outdated

parthea assigned TrevorBergeron Jun 11, 2026

TrevorBergeron added 3 commits June 15, 2026 23:32

refactor, more tests

b442f18

ruff

573f0a2

fix test for python versions

9271e12

TrevorBergeron requested a review from tswast June 16, 2026 00:22

TrevorBergeron marked this pull request as ready for review June 16, 2026 00:22

TrevorBergeron requested review from a team as code owners June 16, 2026 00:22

tswast reviewed Jun 16, 2026

TrevorBergeron added 4 commits June 17, 2026 23:43

fixes

e4c0857

rename dis_to_expr

7bb178d

fix names

8629db3

doctests

94ff174
