Strict type-checking + unit tests across the foundry shared library by lyskov-ai · Pull Request #326 · RosettaCommons/foundry

lyskov-ai · 2026-06-17T23:18:36Z

Continues opting the Foundry shared library (src/foundry + src/foundry_cli) into strict type-checking — disallow_untyped_defs + check_untyped_defs, module by module — and fills unit-test gaps in the code brought under the checker. After this change the entire shared layer is strictly type-checked except the large trainers/fabric module.

Modules brought under strict checking (annotations added as needed)

The rest of utils/ (instantiators, ddp, squashfs, logging, datasets), training/ (schedulers, EMA, checkpoint), the top-level common/constants, metrics/ (metric, losses), the whole callbacks/ package, inference_engines/, hydra/resolvers, model/layers/blocks, the foundry_cli checkpoint CLI, the Intel-XPU plugins (utils/xpu/), and the testing/ helpers.

New unit tests

Cover the previously-untested logic in those modules — dataset/sampler fallback wiring, schedulers/EMA/checkpoint, the metric manager and loss aggregation, callback CSV merging, checkpoint-directory resolution, the Hydra resolvers, and the XPU plugin contracts.

Notable changes beyond annotations

Behaviour fix in utils/datasets. The weighted-sampler fallback was gated on "weights" in sampler, but Sampler has no __contains__, so that branch was dead and the fallback silently used uniform weights. Switched to hasattr(...). Behaviour-neutral for the distributed-loader path, but models/mpnn/train.py passes a WeightedRandomSampler built from computed PDB weights, so its fallback sampling now uses those weights (the documented intent). mpnn training is cluster-only and unverified in CI — flagged for mpnn owners.
Removed dead code. rf3.data.paired_msa subclassed an atomworks class that is now a factory function, so it raised TypeError on import and was reachable only through a disabled config; removed it along with its orphaned dataset config.
Small latent fixes surfaced by the stricter checking — e.g. a float total typed as int in the checkpoint CLI, and a None-initialised start_time later used as a float in a training-loss callback.

instantiators is already fully annotated, so this adds it to the per-module strict mypy override (a lock-in for future defs) and fills the test gap with tests/test_instantiators.py covering the callback/logger instantiation control flow. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

…st logging Adds the three remaining small foundry.utils modules to the per-module strict mypy override. squashfs was already fully annotated (pure lock-in); ddp needed one annotation (RankedLogger.log varargs); logging needed five. New tests/test_logging.py covers the two pure helpers (CachedDataFilter.filter and condense_count_columns_of_grouped_df). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

rf3.data.paired_msa no longer imports against the installed atomworks: MultiInputDatasetWrapper subclasses StructuralDatasetWrapper, which atomworks turned into a deprecated factory function, so subclassing it raises TypeError at import. The module was reachable only through domain_distillation.yaml, which is itself referenced only in commented-out lines of pdb_and_distillation.yaml, and its LoadPairedMSAs class was used nowhere. Remove the module, its orphaned domain_distillation.yaml config, the dangling commented references in pdb_and_distillation.yaml, and the stale pyproject mypy-exemption comment. rf3 now type-checks fully with no in-file suppressions. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

…+ tests In wrap_dataset_and_sampler_with_fallbacks the fallback-weights branch was gated on `"weights" in sampler_to_fallback_to`. Sampler defines no __contains__, so `in` iterates the sampler's integer indices and never matches the string — the documented "use the sampler's weights" branch was dead and the membership test needlessly drew samples. Switch to `hasattr(sampler_to_fallback_to, "weights")` (the idiomatic form atomworks itself uses); mypy then narrows the type so the `# type: ignore[attr-defined]` on `.weights` is dropped. Behaviour change: assemble_distributed_loader's reachable samplers (DistributedSampler / DistributedMixedSampler) have no `.weights`, so it stays uniform (behaviour-neutral). But models/mpnn/src/mpnn/train.py passes a WeightedRandomSampler built from computed PDB weights as the fallback sampler, so its training fallback sampling now uses those weights instead of silently-uniform ones — the function's documented intent. mpnn train.py is a cluster-coupled script (not in CI), so this is unverified here; flagged for mpnn owners to validate. Also add foundry.utils.datasets to the strict mypy override (already fully annotated — a pure lock-in) and add tests/test_datasets.py (18 tests, no prior coverage) covering the fallback-weights fix, the config-key sampler dispatch, and the distributed-loader sampler conversions + validation guards. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

…t mypy + tests Add foundry.training.{schedulers,EMA,checkpoint} to the per-module direction-(b) strict mypy override (disallow_untyped_defs/check_untyped_defs). schedulers was already fully annotated (pure lock-in); EMA and checkpoint get annotation-only type hints on their update/forward and decorator/wrapper signatures. Add unit tests for each module's real logic: the EMA update formula, frozen- param skip, verbatim buffer copy, training guard, and train/eval dispatch; the AF3 warmup/decay LR schedule and SchedulerConfig round-trip; and the activation- checkpointing wrappers (kwarg binding, both grad branches, gradient flow). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

…on helpers Add foundry.common and foundry.constants to the per-module direction-(b) strict mypy override. No source change: common is already fully annotated (pure lock-in) and constants is data-only (the override is a forward-looking no-op there). version.py is VCS-generated/untracked and is not a target. Add tests/test_common.py covering the 10 previously-untested pure helpers, including the easy-to-miss edges: exists/default treat only None as absent, run_once fires exactly once, listmap consumes iterables, and ensure_dtype is a same-object no-op when the dtype already matches. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Add foundry.metrics.metric and foundry.metrics.losses to the per-module direction-(b) strict mypy override. metric.py is already annotated (pure lock-in). losses.py gets honest annotations on Loss.__init__/forward and a documented cast(torch.Tensor, loss) for the int-zero accumulator (kept as int 0 so the running sum adopts the child losses' device/dtype via scalar promotion; a 0-d CPU tensor would break GPU training). Add tests for the MetricManager/Metric introspection machinery (tag filtering, result key-prefixing, compute_from_kwargs remapping + optional handling, tag-conflict validation, from_metrics) and the Loss aggregator forward (sum + dict merge, detached total_loss vs grad-carrying return). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Add the whole src/foundry/callbacks/ package (the BaseCallback base plus timing_logging, metrics_logging, train_logging, health_logging) to the direction-(b) disallow_untyped_defs/check_untyped_defs override and annotate the fallout. Hook annotation is mostly mechanical. Four hooks carry a documented # type: ignore[override]: subclasses override the base's positional-param hooks with **kwargs, which is safe because the trainer dispatches every hook by keyword (fabric.call(name, trainer=..., batch=..., ...)). check_untyped_defs also surfaced a real bug in LogAF3TrainingLossesCallback.start_time (None initialised, then used as a float) — typed float | None with an assert documenting that on_train_epoch_start always precedes on_train_epoch_end. Add tests/test_callbacks.py covering the one non-obvious CPU-portable helper, StoreValidationMetricsInDFCallback._load_and_concatenate_csvs (cross-rank CSV de-duplication keyed on example_id|dataset). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

… + tests Add the small remaining src/foundry modules (inference_engines.base, inference_engines.checkpoint_registry, hydra.resolvers, model.layers.blocks) to the direction-(b) disallow_untyped_defs/check_untyped_defs override and annotate the fallout (annotation-only on source). check_untyped_defs surfaced one inference quirk in resolvers: a dict literal mixing two differently-signed functions joined to the bare 'function' type; annotating it dict[str, Callable[..., Any]] fixes it. Add tests/test_checkpoint_registry.py, tests/test_resolvers.py and tests/test_blocks.py for the previously-uncovered pure helpers (checkpoint dir search order, resolver attribute walking, FourierEmbedding/Dropout behaviour). Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Add foundry_cli.download_checkpoints to the direction-(b) disallow_untyped_defs/check_untyped_defs override. Annotate the four typer commands with -> None and fix list_installed's total_size accumulator (int 0 -> 0.0); this also clears the long-standing download_checkpoints.py:201 annotation-unchecked note. Add tests/test_download_checkpoints.py covering the checkpoint-dir priority logic and the list-available / list-installed commands via typer's CliRunner. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

Add the Intel-XPU Lightning plugins (utils/xpu, already annotated -> pure lock-ins) and the testing/ pytest helpers to the direction-(b) disallow_untyped_defs/check_untyped_defs override. testing/ needed two annotations: gpu() -> bool and configure_pytest(config: pytest.Config, ...). Add tests/test_xpu.py (device-independent XPU accelerator/precision/strategy contracts, CPU-reachable only) and tests/test_testing_helpers.py (get_test_data_dir). This leaves trainers/fabric as the only shared-layer module not yet under the strict override. Co-authored-by: lyskov-ai <277346777+lyskov-ai@users.noreply.github.com>

lyskov and others added 11 commits June 16, 2026 19:43

lyskov-ai requested a review from woodsh17 June 17, 2026 23:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Strict type-checking + unit tests across the foundry shared library#326

Strict type-checking + unit tests across the foundry shared library#326
lyskov-ai wants to merge 11 commits into
RosettaCommons:productionfrom
lyskov-ai:0049-foundry-xpu-testing-mypy-strict-and-tests

lyskov-ai commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

lyskov-ai commented Jun 17, 2026

Modules brought under strict checking (annotations added as needed)

New unit tests

Notable changes beyond annotations

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants