This repository was archived by the owner on May 14, 2026. It is now read-only.

Use pnpm's @pnpm/cli.default-reporter for terminal output (via NDJSON) #344

@zkochan

Summary

Introduce a typed Reporter trait inside pacquet whose call sites emit
strongly-typed log events. Ship one implementation that serializes those
events to newline-delimited JSON in pnpm's @pnpm/core-loggers schema and
pipes them through pnpm's reporter (@pnpm/cli.default-reporter) to render
terminal output. This gets us pnpm-identical progress, lifecycle, stats,
deprecation, peer-dependency, and summary output without reimplementing a
non-trivial reporter in Rust — and leaves a clean seam for a future native
Rust reporter that consumes the same events with no JSON round-trip.

The roadmap (#299) already lists "Implement progress reporting to the
terminal that looks the same as the one printed by pnpm" under Stage 1. This
issue proposes the cheapest path to satisfying that line item.

Why pnpm's reporter, rather than a native Rust one

  • Rendering parity with pnpm is the cardinal rule of this project. Reusing
    pnpm's reporter makes parity automatic instead of a perpetual chase. Any
    visual change pnpm makes lands in pacquet's output for free.
  • The reporter is not small: ~20 log channels (pnpm:progress,
    pnpm:fetching-progress, pnpm:stage, pnpm:lifecycle, pnpm:stats,
    pnpm:deprecation, pnpm:peer-dependency-issues, pnpm:summary,
    pnpm:request-retry, pnpm:execution-time, etc.), driven through rxjs
    with throttling, an ansi-diff repaint loop, and chalk/boxen formatting.
    Reimplementing it just to throw it away when pacquet integrates into the
    pnpm CLI directly would be wasteful.
  • pnpm already supports --reporter=ndjson, so the NDJSON schema (defined by
    @pnpm/core-loggers) is a stable, documented contract. Anything that emits
    records in that schema is automatically consumable by
    @pnpm/cli.default-reporter. We don't have to invent anything; we have to
    match.
  • Once pacquet is integrated into the pnpm CLI as an install backend (the
    Integration Milestone in the roadmap), pnpm's reporter is what will be
    consuming pacquet's log events anyway. Aligning the schema now is a
    prerequisite for that work, not extra effort.

Architecture: typed events, pluggable sinks

The crucial point is that emission is a typed-event API and NDJSON is just
one sink. Call sites never construct JSON; they hand a Rust enum to a
Reporter trait. The wire format lives in the sink, not in the call sites.

Reporter follows the repo's established dependency-injection pattern
(#339): capability methods take no &self and are associated functions,
production sinks are unit structs, and call sites bind the implementation
through a generic parameter. This gives zero-cost dispatch (monomorphised
and inlined) and matches the shape already used in crates/modules-yaml
and crates/cmd-shim. Reporter-internal state — throttle map, MPSC sender
to the writer task, etc. — lives in module-level statics, not in self.

// Mirrors `@pnpm/core-loggers` 1:1, one variant per pnpm log channel.
#[derive(Serialize)]
#[serde(tag = "name", rename_all = "kebab-case")]
pub enum LogEvent {
    #[serde(rename = "pnpm:progress")]          Progress(ProgressLog),
    #[serde(rename = "pnpm:fetching-progress")] FetchingProgress(FetchingProgressLog),
    #[serde(rename = "pnpm:stage")]             Stage(StageLog),
    #[serde(rename = "pnpm:lifecycle")]         Lifecycle(LifecycleLog),
    // ... ~20 variants total
}

// Capability trait per the pnpm/pacquet#339 pattern: associated function, no `&self`.
pub trait Reporter {
    fn emit(event: &LogEvent);
}

// Today's sink. Unit struct; throttle state and the writer-task MPSC sender
// live in module-level `static`s initialised at startup. See Implementation
// notes for the concurrency model and error handling.
pub struct NdjsonReporter;
impl Reporter for NdjsonReporter {
    fn emit(e: &LogEvent) {
        // Serialize into a thread-local buffer, try_send onto a static MPSC
        // sender. Failures are swallowed (optionally `tracing::debug!`-d).
    }
}

// Tomorrow's sink: same shape, no JSON path. Renderer state (ansi-diff,
// throttling) also lives in module-level `static`s.
pub struct NativeReporter;
impl Reporter for NativeReporter {
    fn emit(e: &LogEvent) {
        match e {
            LogEvent::Progress(p)  => render_progress(p),
            LogEvent::Lifecycle(l) => render_lifecycle(l),
            // ...
        }
    }
}

// `--reporter=silent` is a no-op impl.
pub struct SilentReporter;
impl Reporter for SilentReporter {
    fn emit(_: &LogEvent) {}
}

// Call sites bind the generic; `R::emit` monomorphises away.
pub fn install<R: Reporter>(opts: InstallOpts) -> Result<()> {
    R::emit(&LogEvent::Stage(StageLog { /* ... */ }));
    // ...
}

// Production entry point turbofishes the chosen sink.
match reporter_type {
    ReporterType::Default | ReporterType::Ndjson => install::<NdjsonReporter>(opts),
    ReporterType::Silent                          => install::<SilentReporter>(opts),
}
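
Because sinks are unit structs bound through a generic parameter, a test can swap in a recording fake without touching the function under test. The following is a minimal sketch of that idea, not pacquet's actual test code: the two-variant LogEvent, the string status values, and the helper name record_install_events are simplified stand-ins invented for illustration.

```rust
use std::sync::Mutex;

// Hypothetical, trimmed-down event type; the real enum would carry ~20 variants.
#[derive(Debug, Clone, PartialEq)]
pub enum LogEvent {
    Stage { prefix: String, stage: &'static str },
    Progress { package_id: String, status: &'static str },
}

// Capability trait: associated function, no `&self`.
pub trait Reporter {
    fn emit(event: &LogEvent);
}

// Stand-in for a function under test; it only knows the generic `R`.
pub fn install<R: Reporter>(prefix: &str, packages: &[&str]) {
    R::emit(&LogEvent::Stage { prefix: prefix.to_string(), stage: "resolution_started" });
    for pkg in packages {
        R::emit(&LogEvent::Progress { package_id: pkg.to_string(), status: "resolved" });
    }
    R::emit(&LogEvent::Stage { prefix: prefix.to_string(), stage: "resolution_done" });
}

// In a real suite this body would sit inside a `#[test] fn`.
pub fn record_install_events() -> Vec<LogEvent> {
    // The static and the fake live in the same body, so each test owns its log.
    static EVENTS: Mutex<Vec<LogEvent>> = Mutex::new(Vec::new());

    struct RecordingReporter;
    impl Reporter for RecordingReporter {
        fn emit(event: &LogEvent) {
            EVENTS.lock().unwrap().push(event.clone());
        }
    }

    install::<RecordingReporter>("/repo", &["foo@1.2.3"]);

    // Take the recorded events out, leaving the static empty again.
    let recorded = std::mem::take(&mut *EVENTS.lock().unwrap());
    recorded
}
```

The fake needs no constructor and no shared handle; the turbofish alone selects it, which is the same mechanism the production entry point uses.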

Design rules that make this replacement-friendly:

  • Define LogEvent to mirror @pnpm/core-loggers 1:1. Both sinks share
    the same vocabulary; a future Rust reporter doesn't have to invent a new
    event taxonomy, it just renders the same events differently.
  • Don't let the wire format leak into call sites. camelCase field names,
    the "pnpm:..." name strings, the bunyan-style envelope (level, time,
    prefix) — all of that lives in serde attributes, not in LogEvent's
    in-memory shape. Call sites work with idiomatic Rust enums.
  • Carry data, not pre-formatted strings. If a call site emits
    msg: "Resolving foo@1.2.3...", both sinks are stuck with that wording
    forever. Emit a package_id plus a structured stage and let the reporter
    format. (pnpm's existing logs are mostly already shaped this way; follow
    upstream's lead.)
  • Throttling, repainting, and aggregation belong in the sink, not the
    emission layer. A future native sink will reimplement those (rxjs-style
    throttle, ansi-diff-equivalent repaint loop), but that is
    reporter-internal work: it touches no call site and requires no JSON
    round-trip.
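
To make the last rule concrete, here is a rough sketch of sink-internal throttling keyed by channel and package, so that suppressing one channel never hides events on another. The Throttle type and its method names are hypothetical, the leading-edge policy is an assumption rather than a description of pnpm's rxjs pipeline, and the per-call String key is kept only for brevity.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical sink-internal throttle: at most one event per
/// (channel, package) pair within `window`.
pub struct Throttle {
    window: Duration,
    last_emitted: HashMap<(&'static str, String), Instant>,
}

impl Throttle {
    pub fn new(window: Duration) -> Self {
        Self { window, last_emitted: HashMap::new() }
    }

    /// Returns true if an event for this key should be forwarded at `now`.
    /// `now` is a parameter so the policy can be tested without sleeping.
    pub fn should_emit(&mut self, channel: &'static str, package_id: &str, now: Instant) -> bool {
        let key = (channel, package_id.to_string());
        let throttled = self
            .last_emitted
            .get(&key)
            .map_or(false, |&last| now.duration_since(last) < self.window);
        if throttled {
            return false;
        }
        // Leading-edge policy: remember when this key last got through.
        self.last_emitted.insert(key, now);
        true
    }
}
```

A sink would consult this before serializing; call sites keep emitting every event and never see the throttle.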

Implementation notes (Rust idioms)

Concerns a Rust-native reviewer would raise; folding the answers in up front
so the design captured here is what we'd build.

  • Capability shape (no &self, generic threading). Follows the DI
    pattern documented in #339 ("Refactor and document the dependency
    injection pattern for tests") and shipped in crates/modules-yaml and
    crates/cmd-shim. Reporter::emit is an associated function, not a
    method; functions that emit take a generic R: Reporter and call
    R::emit(...); the production entry point turbofishes the concrete
    sink (install::<NdjsonReporter>(opts)). Result: dispatch monomorphises
    away, no dyn vtable, no Arc clone, and the shape stays consistent
    with the rest of the workspace. Reporter-internal state (throttle map,
    MPSC sender) lives in module-level statics; there is one production
    sink active per process, so global state is appropriate. Tests follow
    #339's static-per-test pattern: a unit-struct fake declared inside the
    #[test] fn, recording into a static Mutex<Vec<LogEvent>> declared in
    the same body.
  • Relationship to tracing. pacquet already uses tracing for
    developer-facing diagnostics (crates/diagnostics/src/local_tracing.rs),
    and that stays. The community-reflex question is "why not a
    tracing::Subscriber?" — pnpm's schema is a closed channel-keyed union
    with rich payloads, while tracing events are open key-value bags whose
    Visit API forces every sink to re-parse the same fields. A typed enum
    dispatching through a capability trait is a better fit for user-facing
    output. tracing continues to handle developer-facing logs, with no
    overlap.
  • Error handling. emit must not panic or propagate errors. A
    serialization or pipe-write failure is swallowed (optionally surfaced
    via tracing::debug!) rather than crashing an install.
  • Sync emit, async-friendly offload. emit is sync but called from
    async contexts. The recommended NDJSON path is a bounded MPSC channel
    feeding one dedicated writer task: emitters pre-serialize into a
    thread-local buffer and try_send onto a static OnceLock<Sender<_>>,
    and the writer drains and write_alls to stderr in batches. This
    decouples emit latency from pipe-write latency and avoids serializing
    tokio tasks on a stderr lock. (std::io::stderr().lock() per emit is
    the simpler fallback if benchmarks show contention isn't a problem.)
  • No async fn emit. Making emit async would force .await at every
    call site for a fundamentally fire-and-forget operation, and would also
    conflict with the no-self capability shape. The MPSC offload above
    gets the same decoupling without coloring every emission point.
  • emit(&LogEvent) vs. per-channel methods. Cargo's Shell exposes
    per-purpose methods (status(), note(), warn()). We're constrained
    to mirror pnpm's ~20 channels, so a single emit(&LogEvent) is the
    right call: smaller surface area, simpler to fake in tests, and the
    construct-an-enum cost is sub-millisecond on a multi-second install.
    Per-channel methods would shave nanoseconds per emit at the cost of
    ~20 trait methods and a more rigid surface — not worth it.
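
The sync-emit, offloaded-write model from the notes above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the proposed implementation: it uses a std thread and std::sync::mpsc::sync_channel in place of a tokio task, skips the thread-local serialization buffer, and the names start_writer, emit_line, shutdown, and Msg are invented here.

```rust
use std::io::Write;
use std::sync::mpsc::{sync_channel, SyncSender};
use std::sync::OnceLock;
use std::thread::{self, JoinHandle};

enum Msg {
    Line(Vec<u8>),
    Shutdown,
}

// Module-level state: one production sink per process.
static SENDER: OnceLock<SyncSender<Msg>> = OnceLock::new();

/// Starts the single writer thread. Returns None if already started.
pub fn start_writer<W>(mut out: W, capacity: usize) -> Option<JoinHandle<W>>
where
    W: Write + Send + 'static,
{
    let (tx, rx) = sync_channel::<Msg>(capacity);
    SENDER.set(tx).ok()?;
    Some(thread::spawn(move || {
        let mut batch: Vec<u8> = Vec::new();
        let mut running = true;
        while running {
            // Block for one message, then drain whatever else is queued.
            let Ok(first) = rx.recv() else { break };
            let mut next = Some(first);
            while let Some(msg) = next {
                match msg {
                    Msg::Line(line) => {
                        batch.extend_from_slice(&line);
                        batch.push(b'\n');
                        next = rx.try_recv().ok();
                    }
                    Msg::Shutdown => {
                        running = false;
                        next = None;
                    }
                }
            }
            // One write per batch; failures are swallowed, never propagated.
            let _ = out.write_all(&batch);
            batch.clear();
        }
        let _ = out.flush();
        out
    }))
}

/// Fire-and-forget: a full or missing channel drops the record silently.
pub fn emit_line(line: &[u8]) {
    if let Some(tx) = SENDER.get() {
        let _ = tx.try_send(Msg::Line(line.to_vec()));
    }
}

/// Asks the writer to flush what it has and stop.
pub fn shutdown() {
    if let Some(tx) = SENDER.get() {
        let _ = tx.send(Msg::Shutdown);
    }
}
```

In production the writer would be handed std::io::stderr(); passing any Write implementation keeps the sketch testable. Note that try_send makes this variant lossy under overflow, which is one of the two policies the issue leaves open.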

Proposed shape

  1. Define LogEvent mirroring @pnpm/core-loggers' Log union, plus
    serde attributes that produce pnpm's wire format (bunyan envelope:
    level, time, name, plus channel-specific payload).
  2. Define trait Reporter per #339: associated fn emit(event: &LogEvent)
    with no &self. Functions that emit take a generic R: Reporter and
    call R::emit(...). The production entry point turbofishes the chosen
    sink (install::<NdjsonReporter>(opts)).
  3. Wire the existing call sites (tarball fetching, store linking, lifecycle
    scripts, install summary, etc.) to call R::emit(...).
  4. Implement NdjsonReporter as a unit struct that writes to stderr, one
    record per line. The writer-task MPSC sender, throttle map, and any
    other reporter-internal state live in module-level statics,
    initialised at startup (see Implementation notes).
  5. By default, spawn @pnpm/cli.default-reporter (or pnpm itself in
    --reporter=default consumer mode) as a child process and pipe
    pacquet's stderr to its stdin. Open question — see below.
  6. Support --reporter=ndjson (raw passthrough, no child process) and
    --reporter=silent (a SilentReporter unit struct with a no-op
    emit) to match pnpm's surface. Reporter selection happens at the
    entry point and threads through as the generic parameter — a small,
    bounded number of monomorphised copies of the install pipeline.
  7. A native Rust reporter is explicitly out of scope for this issue. The
    capability shape makes it a drop-in addition (another unit-struct impl)
    later if and when the Integration Milestone is delayed or shelved.
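
Steps 5 and 6 amount to parsing a --reporter value once and letting it pick the generic parameter. A minimal sketch of that selection follows; the FromStr impl, the spawns_js_reporter helper, the placeholder LogEvent, and the run function are assumptions made for illustration, and only the three reporter names mentioned in this issue are handled.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReporterType {
    Default,
    Ndjson,
    Silent,
}

impl FromStr for ReporterType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "default" => Ok(Self::Default),
            "ndjson" => Ok(Self::Ndjson),
            "silent" => Ok(Self::Silent),
            other => Err(format!("unknown reporter: {other}")),
        }
    }
}

impl ReporterType {
    /// Only the default mode pipes NDJSON into the JS reporter;
    /// `ndjson` is raw passthrough and `silent` emits nothing.
    pub fn spawns_js_reporter(self) -> bool {
        matches!(self, Self::Default)
    }
}

// Placeholder event: just a pre-rendered line, to keep the sketch short.
pub struct LogEvent(pub &'static str);

pub trait Reporter {
    fn emit(event: &LogEvent);
}

pub struct NdjsonReporter;
impl Reporter for NdjsonReporter {
    fn emit(event: &LogEvent) {
        eprintln!("{}", event.0);
    }
}

pub struct SilentReporter;
impl Reporter for SilentReporter {
    fn emit(_: &LogEvent) {}
}

fn install<R: Reporter>() -> Result<(), String> {
    R::emit(&LogEvent("{\"name\":\"pnpm:stage\"}"));
    Ok(())
}

/// The only place that knows about concrete sinks.
pub fn run(kind: ReporterType) -> Result<(), String> {
    match kind {
        ReporterType::Default | ReporterType::Ndjson => install::<NdjsonReporter>(),
        ReporterType::Silent => install::<SilentReporter>(),
    }
}
```

Default and Ndjson share the same sink; the difference between them is only whether the entry point also spawns the child process, which is why that decision is kept on ReporterType rather than in the trait.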

Performance

Cost decomposition per emit: enum construction + (NDJSON path) serialization
+ writer handoff. Dispatch is monomorphised away by the generic capability
shape, so there is no vtable cost. At install scale (~5K–10K events for a
1300-package install), the dominant Rust-side cost is allocation in the
event payloads, not serialization.

Hygiene that keeps the Rust path near-optimal in both phases:

  1. Intern identifiers; don't allocate per event. A naive
    LogEvent::Progress(ProgressLog { name: pkg.to_string(), version: ver.to_string() })
    allocates twice per emit. Intern package names/versions to Arc<str>
    (or via lasso / ustr) once when a package enters the resolver, then
    pass references through every event. This is the single largest
    Rust-side win and pays off again after the native swap.
  2. Serialize to a thread-local Vec<u8> then one write_all.
    serde_json::to_writer issues many small writes; buffering matches what
    tracing-subscriber::fmt does and avoids amplifying writer-lock
    contention.
  3. MPSC channel for backpressure decoupling. See Implementation notes.
    Bounded channel; on overflow, decide between drop (lossy, no stalls) and
    apply-backpressure (matches pnpm's in-process behavior).
  4. Throttle on the emit side, not just downstream. pnpm's reporter
    throttles per-package progress to 200ms; if pacquet emits every event,
    serialization + pipe cost is paid for events the JS side will drop.
    Dedup pnpm:fetching-progress per package within a small window before
    serializing. Care: throttle within a channel only, not across channels,
    so the JS reporter's diff/repaint expectations stay intact.
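
As an illustration of the first rule, interning can be done with nothing more than a HashSet<Arc<str>>. This is a sketch under assumptions: the Interner type is invented here (the issue also mentions lasso and ustr as alternatives), and the ProgressLog fields simply follow the name/version example above rather than pnpm's actual payload.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical interner: each distinct string is allocated once.
#[derive(Default)]
pub struct Interner {
    seen: HashSet<Arc<str>>,
}

impl Interner {
    /// Returns a shared handle; repeated calls with equal text
    /// return clones of the same allocation.
    pub fn intern(&mut self, s: &str) -> Arc<str> {
        if let Some(existing) = self.seen.get(s) {
            return Arc::clone(existing);
        }
        let fresh: Arc<str> = Arc::from(s);
        self.seen.insert(Arc::clone(&fresh));
        fresh
    }
}

/// Event payload holding interned handles instead of owned Strings.
#[derive(Debug, Clone)]
pub struct ProgressLog {
    pub name: Arc<str>,
    pub version: Arc<str>,
}

/// Building an event is two reference-count bumps, no heap allocation.
pub fn progress_for(name: &Arc<str>, version: &Arc<str>) -> ProgressLog {
    ProgressLog {
        name: Arc::clone(name),
        version: Arc::clone(version),
    }
}
```

The interning call happens once when a package enters the resolver; every later event for that package reuses the same handles.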

Ceiling, today (NDJSON sink): with the four rules above, Rust-side cost
is expected to stay below 1% of install wall time. The dominant cost is the
JS reporter and the pipe, which is structural and not fixable on the Rust
side.

Ceiling, after native swap: the JS-side mask disappears, so Rust-side
cost matters more in absolute terms. The same hygiene continues to pay; the
renderer (ansi-diff-equivalent diff and repaint) becomes the dominant
cost and is reporter-internal work, unaffected by the trait shape. The
unified emit(&LogEvent) design loses on the order of nanoseconds per
event vs. per-channel methods — below the noise floor of an actual
install.

Measurement. This needs to be measured, not asserted.
tasks/integrated-benchmark is the vehicle. Benchmarks worth running once
an implementation lands:

  • Cold install with --reporter=silent vs. --reporter=ndjson (raw, no
    Node) vs. --reporter=default (piped to Node) — isolates Rust-side cost
    from JS-side cost.
  • The same three on a 1300-package install to expose any per-event
    linearity issue.
  • Post-native-swap: native vs. ndjson-piped-to-Node, same install —
    quantifies how much of the gap is structural (Node) vs. fixable
    (renderer).

Open questions

  • How is the JS reporter delivered? Options:
    • Require a system Node + run npx @pnpm/cli.default-reporter (zero
      bundling, but adds a runtime dependency users have to satisfy).
    • Bundle a single-file reporter.js alongside the pacquet binary and
      require Node on PATH only.
    • Embed via an N-API addon once the Integration Milestone lands, removing
      the spawn entirely.
  • Schema version pinning. @pnpm/core-loggers evolves. We should pin a
    specific pnpm version per pacquet release and add a CI check that the
    emitted schema still parses.
  • Error stream separation. pnpm's reporter expects to control both
    stdout and stderr. We need to decide whether pacquet's own diagnostics
    (currently tracing-based) keep flowing to the user's stderr alongside
    rendered output, or get folded into the log stream as pnpm: events.
