Strict type-checking + unit tests across the foundry shared library #326

Open: lyskov-ai wants to merge 11 commits into RosettaCommons:production from lyskov-ai:0049-foundry-xpu-testing-mypy-strict-and-tests

@lyskov-ai

Continues opting the Foundry shared library (src/foundry + src/foundry_cli) into strict type-checking — disallow_untyped_defs + check_untyped_defs, module by module — and fills unit-test gaps in the code brought under the checker. After this change the entire shared layer is strictly type-checked except the large trainers/fabric module.

Modules brought under strict checking (annotations added as needed)

The rest of utils/ (instantiators, ddp, squashfs, logging, datasets), training/ (schedulers, EMA, checkpoint), the top-level common/constants, metrics/ (metric, losses), the whole callbacks/ package, inference_engines/, hydra/resolvers, model/layers/blocks, the foundry_cli checkpoint CLI, the Intel-XPU plugins (utils/xpu/), and the testing/ helpers.

New unit tests

Cover the previously-untested logic in those modules — dataset/sampler fallback wiring, schedulers/EMA/checkpoint, the metric manager and loss aggregation, callback CSV merging, checkpoint-directory resolution, the Hydra resolvers, and the XPU plugin contracts.

Notable changes beyond annotations

  • Behaviour fix in utils/datasets. The weighted-sampler fallback was gated on "weights" in sampler, but Sampler has no __contains__, so that branch was dead and the fallback silently used uniform weights. Switched to hasattr(...). This is behaviour-neutral for the distributed-loader path, but models/mpnn/src/mpnn/train.py passes a WeightedRandomSampler built from computed PDB weights, so its fallback sampling now uses those weights (the documented intent). mpnn training is cluster-only and not exercised in CI, so the change is unverified here and flagged for mpnn owners.
  • Removed dead code. rf3.data.paired_msa subclassed an atomworks class that is now a factory function, so it raised TypeError on import and was reachable only through a disabled config; removed it along with its orphaned dataset config.
  • Small latent fixes surfaced by the stricter checking — e.g. a float total typed as int in the checkpoint CLI, and a None-initialised start_time later used as a float in a training-loss callback.

lyskov and others added 11 commits June 16, 2026 19:43
instantiators is already fully annotated, so this adds it to the
per-module strict mypy override (a lock-in for future defs) and fills the
test gap with tests/test_instantiators.py covering the callback/logger
instantiation control flow.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
…st logging

Adds the three remaining small foundry.utils modules to the per-module strict
mypy override. squashfs was already fully annotated (pure lock-in); ddp needed
one annotation (RankedLogger.log varargs); logging needed five. New
tests/test_logging.py covers the two pure helpers (CachedDataFilter.filter and
condense_count_columns_of_grouped_df).

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
rf3.data.paired_msa no longer imports against the installed atomworks:
MultiInputDatasetWrapper subclasses StructuralDatasetWrapper, which atomworks
turned into a deprecated factory function, so subclassing it raises TypeError
at import. The module was reachable only through domain_distillation.yaml,
which is itself referenced only in commented-out lines of
pdb_and_distillation.yaml, and its LoadPairedMSAs class was used nowhere.

Remove the module, its orphaned domain_distillation.yaml config, the dangling
commented references in pdb_and_distillation.yaml, and the stale pyproject
mypy-exemption comment. rf3 now type-checks fully with no in-file suppressions.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
…+ tests

In wrap_dataset_and_sampler_with_fallbacks the fallback-weights branch was
gated on `"weights" in sampler_to_fallback_to`. Sampler defines no
__contains__, so `in` iterates the sampler's integer indices and never
matches the string — the documented "use the sampler's weights" branch was
dead and the membership test needlessly drew samples. Switch to
`hasattr(sampler_to_fallback_to, "weights")` (the idiomatic form atomworks
itself uses); mypy then narrows the type so the `# type: ignore[attr-defined]`
on `.weights` is dropped.
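
The dead branch can be reproduced without torch. The sketch below uses a hypothetical stand-in class (`IndexSampler`, not part of foundry or torch) that, like the sampler described above, is iterable over integer indices and defines no `__contains__`. It only illustrates the Python semantics involved, not the actual foundry code path.

```python
from collections.abc import Iterator


class IndexSampler:
    """Hypothetical stand-in: iterable over int indices, no __contains__."""

    def __init__(self, weights: list[float]) -> None:
        self.weights = weights
        self.draws = 0  # counts how many indices iteration has yielded

    def __iter__(self) -> Iterator[int]:
        for index in range(len(self.weights)):
            self.draws += 1
            yield index


sampler = IndexSampler([0.1, 0.7, 0.2])

# With no __contains__, `in` falls back to iteration and compares each
# yielded index against the string, so it can never match.
found_by_in = "weights" in sampler
draws_spent = sampler.draws

# hasattr inspects the attribute directly and consumes nothing.
found_by_hasattr = hasattr(sampler, "weights")
```

Here the membership test walks every index before returning `False`, while `hasattr` answers the question that was actually intended.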

Behaviour change: assemble_distributed_loader's reachable samplers
(DistributedSampler / DistributedMixedSampler) have no `.weights`, so it stays
uniform (behaviour-neutral). But models/mpnn/src/mpnn/train.py passes a
WeightedRandomSampler built from computed PDB weights as the fallback sampler,
so its training fallback sampling now uses those weights instead of
silently-uniform ones — the function's documented intent. mpnn train.py is a
cluster-coupled script (not in CI), so this is unverified here; flagged for
mpnn owners to validate.

Also add foundry.utils.datasets to the strict mypy override (already fully
annotated — a pure lock-in) and add tests/test_datasets.py (18 tests, no prior
coverage) covering the fallback-weights fix, the config-key sampler dispatch,
and the distributed-loader sampler conversions + validation guards.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
…t mypy + tests

Add foundry.training.{schedulers,EMA,checkpoint} to the per-module direction-(b)
strict mypy override (disallow_untyped_defs/check_untyped_defs). schedulers was
already fully annotated (pure lock-in); EMA and checkpoint get annotation-only
type hints on their update/forward and decorator/wrapper signatures.

Add unit tests for each module's real logic: the EMA update formula, frozen-
param skip, verbatim buffer copy, training guard, and train/eval dispatch; the
AF3 warmup/decay LR schedule and SchedulerConfig round-trip; and the activation-
checkpointing wrappers (kwarg binding, both grad branches, gradient flow).

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
…on helpers

Add foundry.common and foundry.constants to the per-module direction-(b) strict
mypy override. No source change: common is already fully annotated (pure lock-in)
and constants is data-only (the override is a forward-looking no-op there).
version.py is VCS-generated/untracked and is not a target.

Add tests/test_common.py covering the 10 previously-untested pure helpers,
including the easy-to-miss edges: exists/default treat only None as absent,
run_once fires exactly once, listmap consumes iterables, and ensure_dtype is a
same-object no-op when the dtype already matches.
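
As a rough illustration of the edges listed above, here is a minimal sketch of what helpers with these contracts could look like. The bodies and signatures are assumptions written for this example; they are not copied from foundry.common.

```python
from functools import wraps
from typing import Any, Callable, Optional, TypeVar

T = TypeVar("T")


def exists(value: object) -> bool:
    # Only None counts as absent; 0, "" and [] all "exist".
    return value is not None


def default(value: Optional[T], fallback: T) -> T:
    # Falls back only on None, never on other falsy values.
    return fallback if value is None else value


def run_once(fn: Callable[..., Any]) -> Callable[..., Any]:
    # Assumed behaviour: the first call runs fn, later calls return None.
    called = False

    @wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        nonlocal called
        if called:
            return None
        called = True
        return fn(*args, **kwargs)

    return wrapper
```

The falsy-but-present cases (`0`, `""`) are the ones a truthiness-based implementation would get wrong, which is why they are worth pinning in tests.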

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Add foundry.metrics.metric and foundry.metrics.losses to the per-module
direction-(b) strict mypy override. metric.py is already annotated (pure
lock-in). losses.py gets honest annotations on Loss.__init__/forward and a
documented cast(torch.Tensor, loss) for the int-zero accumulator (kept as int 0
so the running sum adopts the child losses' device/dtype via scalar promotion;
a 0-d CPU tensor would break GPU training).

Add tests for the MetricManager/Metric introspection machinery (tag filtering,
result key-prefixing, compute_from_kwargs remapping + optional handling,
tag-conflict validation, from_metrics) and the Loss aggregator forward (sum +
dict merge, detached total_loss vs grad-carrying return).

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Add the whole src/foundry/callbacks/ package (the BaseCallback base plus
timing_logging, metrics_logging, train_logging, health_logging) to the
direction-(b) disallow_untyped_defs/check_untyped_defs override and annotate
the fallout.

Hook annotation is mostly mechanical. Four hooks carry a documented
# type: ignore[override]: subclasses override the base's positional-param hooks
with **kwargs, which is safe because the trainer dispatches every hook by
keyword (fabric.call(name, trainer=..., batch=..., ...)). check_untyped_defs
also surfaced a real bug in LogAF3TrainingLossesCallback.start_time (None
initialised, then used as a float) — typed float | None with an assert
documenting that on_train_epoch_start always precedes on_train_epoch_end.
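
The `start_time` pattern can be sketched in isolation. `EpochTimer` below is a hypothetical class made up for this example; only the `float | None` attribute and the narrowing assert mirror what the commit message describes.

```python
import time


class EpochTimer:
    """Hypothetical callback-like class illustrating the start_time fix."""

    def __init__(self) -> None:
        # None until the epoch starts, so the declared type must include it.
        self.start_time: float | None = None

    def on_train_epoch_start(self) -> None:
        self.start_time = time.perf_counter()

    def on_train_epoch_end(self) -> float:
        # The assert documents the hook ordering and narrows
        # float | None to float for the type checker.
        assert self.start_time is not None, "on_train_epoch_start must run first"
        return time.perf_counter() - self.start_time
```

Without the assert, a strict checker would reject the subtraction because `None` does not support it.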

Add tests/test_callbacks.py covering the one non-obvious CPU-portable helper,
StoreValidationMetricsInDFCallback._load_and_concatenate_csvs (cross-rank CSV
de-duplication keyed on example_id|dataset).
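
For readers unfamiliar with the idea, cross-rank de-duplication on an (example_id, dataset) key can be sketched with the standard library alone. The function name, the in-memory CSV strings, and the keep-first policy are all assumptions for illustration; the real helper may use a different implementation and tie-breaking rule.

```python
import csv
import io


def merge_rank_csvs(csv_texts: list[str]) -> list[dict[str, str]]:
    """Concatenate per-rank CSVs, keeping the first row per (example_id, dataset)."""
    seen: set[tuple[str, str]] = set()
    merged: list[dict[str, str]] = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            key = (row["example_id"], row["dataset"])
            if key in seen:
                continue  # same example already written by another rank
            seen.add(key)
            merged.append(row)
    return merged


rank0 = "example_id,dataset,score\na,pdb,0.9\nb,pdb,0.8\n"
rank1 = "example_id,dataset,score\nb,pdb,0.8\na,distill,0.7\n"
rows = merge_rank_csvs([rank0, rank1])
```

In this toy input, `b` from `pdb` appears on both ranks and is kept once, while `a` survives twice because it belongs to two different datasets.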

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
… + tests

Add the small remaining src/foundry modules (inference_engines.base,
inference_engines.checkpoint_registry, hydra.resolvers, model.layers.blocks) to
the direction-(b) disallow_untyped_defs/check_untyped_defs override and annotate
the fallout (annotation-only on source).

check_untyped_defs surfaced one inference quirk in resolvers: a dict literal
whose values are two functions with different signatures was inferred with the
bare 'function' value type; annotating it dict[str, Callable[..., Any]] fixes it.
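
The shape of that fix looks roughly like the following. The two functions and the `RESOLVERS` name are hypothetical placeholders, not the actual resolvers; only the explicit `dict[str, Callable[..., Any]]` annotation comes from the commit message.

```python
from typing import Any, Callable


def to_upper(text: str) -> str:
    return text.upper()


def add(a: int, b: int) -> int:
    return a + b


# The values have unrelated signatures, so the dict's value type is declared
# explicitly instead of being left to inference.
RESOLVERS: dict[str, Callable[..., Any]] = {
    "upper": to_upper,
    "add": add,
}
```

`Callable[..., Any]` deliberately gives up argument checking at the lookup site, which is a reasonable trade when the registry is dispatched by name.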

Add tests/test_checkpoint_registry.py, tests/test_resolvers.py and
tests/test_blocks.py for the previously-uncovered pure helpers (checkpoint dir
search order, resolver attribute walking, FourierEmbedding/Dropout behaviour).

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Add foundry_cli.download_checkpoints to the direction-(b)
disallow_untyped_defs/check_untyped_defs override. Annotate the four typer
commands with -> None and fix list_installed's total_size accumulator
(int 0 -> 0.0); this also clears the long-standing download_checkpoints.py:201
annotation-unchecked note.
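
A minimal sketch of the accumulator issue, assuming sizes are summed as fractional megabytes (the function name and units are made up for this example): once the function is annotated and checked, an accumulator initialised as `0` is inferred as `int`, and adding a float to it is reported as an incompatible assignment.

```python
def total_size_mb(sizes_bytes: list[int]) -> float:
    # Start from 0.0 so the accumulator is a float from the first line;
    # `total_size = 0` would be inferred as int by the type checker.
    total_size = 0.0
    for size in sizes_bytes:
        total_size += size / (1024 * 1024)
    return total_size
```

The runtime result is the same for non-empty input either way; the change mainly makes the declared and actual types agree, including for an empty list.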

Add tests/test_download_checkpoints.py covering the checkpoint-dir priority
logic and the list-available / list-installed commands via typer's CliRunner.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
Add the Intel-XPU Lightning plugins (utils/xpu, already annotated -> pure
lock-ins) and the testing/ pytest helpers to the direction-(b)
disallow_untyped_defs/check_untyped_defs override. testing/ needed two
annotations: gpu() -> bool and configure_pytest(config: pytest.Config, ...).

Add tests/test_xpu.py (device-independent XPU accelerator/precision/strategy
contracts, CPU-reachable only) and tests/test_testing_helpers.py
(get_test_data_dir). This leaves trainers/fabric as the only shared-layer
module not yet under the strict override.

Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>
lyskov-ai requested a review from woodsh17 on June 17, 2026 23:18