Skip to content

feat: add import profiler tool with dynamic loaded lines code volume …#17467

Draft
hebaalazzeh wants to merge 15 commits into
mainfrom
feat/import-profiler-tool
Draft

feat: add import profiler tool with dynamic loaded lines code volume …#17467
hebaalazzeh wants to merge 15 commits into
mainfrom
feat/import-profiler-tool

Conversation

@hebaalazzeh

Copy link
Copy Markdown
Contributor

This PR introduces profiler.py, a standalone, process-isolated import profiling tool designed to benchmark Python modules without dynamic cache interference.

To evaluate accurate import load times without observer effect, the profiler runs using a Master/Worker architecture to execute each import in a completely isolated sub-process. This ensures a 100% "cold start" evaluation.

Key Features / Telemetry Vectors:

Initialization Latency (ms): Measured at high resolution natively.
Peak Memory Usage (MB): Measured via tracemalloc.
Dynamic Code Volume (LOC): Tracks the deterministic delta in sys.modules to aggregate the exact physical Lines of Code loaded by the interpreter.
Includes detailed breakdown in scripts/import_profiler/documentation.md.

Related Links:

Design Doc: go/sdk-performance-design
For googleapis/python-aiplatform#4749

@hebaalazzeh hebaalazzeh self-assigned this Jun 15, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a process-isolated Python SDK Import Profiler tool (profiler.py) along with its comprehensive documentation (documentation.md). The profiler measures initialization latency, peak memory usage, and dynamic code volume across multiple cold-start iterations. Feedback on the implementation highlights several key improvements: adding a guard to prevent crashes during single-iteration runs, replacing absolute local file paths in the documentation with relative links, handling .pyc files and logging exceptions during line counting, and exposing the worker process's stderr to facilitate debugging when subprocesses fail.

Comment thread scripts/import_profiler/profiler.py Outdated
Comment thread scripts/import_profiler/documentation.md Outdated
Comment thread scripts/import_profiler/profiler.py Outdated
Comment thread scripts/import_profiler/profiler.py Outdated
- Add fallback to handle single-iteration quantile calculation crashes
- Normalize .pyc file paths during line counting and log exceptions
- Expose worker process stderr to facilitate debugging CalledProcessError
- Fix absolute paths in documentation.md to use relative directory paths

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a Python SDK import profiler tool along with its documentation. The profiler is designed to run process-isolated benchmarks to measure import latency, peak memory usage, and dynamic code volume. The feedback provided aims to improve the robustness of the tool. Key recommendations include using importlib.util.source_from_cache to correctly resolve source paths from compiled .pyc files, moving imports to the module level, parsing only the last line of subprocess stdout to handle noisy module initializations safely, and explicitly specifying encoding="utf-8" when writing files to ensure platform-independent behavior.

Comment on lines +34 to +36
if file_path.endswith('.pyc'):
file_path = file_path[:-1]
if file_path.endswith('.py'):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

In Python 3, compiled .pyc files are stored in __pycache__ directories with interpreter-specific suffixes (e.g., module.cpython-312.pyc). Simply stripping the last character (file_path[:-1]) results in a non-existent path like __pycache__/module.cpython-312.py, causing the line counter to fail for almost all cached imports.

Use importlib.util.source_from_cache to correctly resolve the source .py file path.

            if file_path.endswith('.pyc'):
                try:
                    file_path = importlib.util.source_from_cache(file_path)
                except ValueError:
                    pass
            if file_path.endswith('.py'):

Comment on lines +7 to +9
import importlib
import csv
import os

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Import importlib.util and logging at the module level to support robust .pyc path resolution and avoid importing logging inside a loop.

import importlib
import importlib.util
import csv
import os
import logging

Comment thread scripts/import_profiler/profiler.py Outdated
Comment thread scripts/import_profiler/profiler.py Outdated
Comment thread scripts/import_profiler/profiler.py Outdated
Comment thread scripts/import_profiler/profiler.py Outdated
Comment thread scripts/import_profiler/profiler.py Outdated
…g, and encodings

- Use importlib.util.source_from_cache for robust .pyc resolution
- Move importlib.util and logging imports to module level
- Refine json.loads to parse only the last line of stdout
- Specify UTF-8 encoding when opening files for writing
@hebaalazzeh

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a Python SDK import profiling tool (profiler.py) along with its comprehensive documentation (documentation.md). The profiler uses a process-isolated master/worker architecture to accurately measure cold-start import latency, peak memory usage, and dynamic code volume. The review feedback highlights two key areas for improvement: robustly parsing the worker's stdout in run_master to prevent crashes from unexpected output (such as warnings) and validating the iteration count, and checking the subprocess return code in run_trace to warn users if an import fails instead of silently writing the error log.

Comment on lines +56 to +104
def run_master(iterations, target_module, cpu="0", csv_path=None):
"""Orchestrates the benchmark."""
times, memories = [], []
loaded_modules_val, loaded_lines_val = 0, 0

print(f"Profiling start... Running {iterations} cold-start iterations for {target_module}.")
if cpu.lower() != "none":
print(f"CPU Pinning enabled: Pinning processes to core {cpu} using taskset.")
else:
print("CPU Pinning disabled.")

for i in range(iterations):
# Build command line
cmd = []
if cpu.lower() != "none":
cmd += ["taskset", "-c", cpu]

cmd += [sys.executable, __file__, "--worker", f"--module={target_module}"]

try:
result = subprocess.run(
cmd, capture_output=True, text=True, check=True
)
data = json.loads(result.stdout.strip().splitlines()[-1])
times.append(data["time_ms"])
memories.append(data["peak_ram_mb"])
loaded_modules_val = data.get("loaded_modules", 0)
loaded_lines_val = data.get("loaded_lines", 0)
except (subprocess.CalledProcessError, FileNotFoundError) as e:
# Fallback if taskset is not found or fails
if cpu.lower() != "none" and i == 0:
print("WARNING: taskset CPU pinning failed or is not available. Falling back to unpinned execution...")
# Try running without taskset
cmd = [sys.executable, __file__, "--worker", f"--module={target_module}"]
try:
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
data = json.loads(result.stdout.strip().splitlines()[-1])
times.append(data["time_ms"])
memories.append(data["peak_ram_mb"])
loaded_modules_val = data.get("loaded_modules", 0)
loaded_lines_val = data.get("loaded_lines", 0)
cpu = "none" # Disable cpu pinning for remaining iterations
except subprocess.CalledProcessError as err:
print(f"Error in worker process:\n{err.stderr}", file=sys.stderr)
raise err
else:
if isinstance(e, subprocess.CalledProcessError):
print(f"Error in worker process:\n{e.stderr}", file=sys.stderr)
raise e

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation of run_master parses the worker's stdout by splitting lines and taking the last line. However, if the worker process prints any warnings, debug messages, or if the imported module itself prints to stdout, the last line might not be valid JSON. This will cause the master process to crash with an unhandled json.JSONDecodeError or IndexError without displaying the worker's actual output or traceback, making debugging extremely difficult.

Additionally, there is no validation that iterations is at least 1, which can lead to division-by-zero or statistics errors later in the function.

To address these issues, we can extract the worker execution and robust parsing logic into a helper function _run_worker_and_parse that catches parsing errors and prints the raw stdout/stderr of the worker before raising, and validate iterations at the start of run_master.

def _run_worker_and_parse(cmd):
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    try:
        lines = result.stdout.strip().splitlines()
        if not lines:
            raise ValueError("Worker produced no output on stdout.")
        data = json.loads(lines[-1])
        for key in ("time_ms", "peak_ram_mb", "loaded_modules", "loaded_lines"):
            if key not in data:
                raise KeyError(f"Missing key '{key}' in worker output")
        return data
    except (json.JSONDecodeError, IndexError, KeyError, ValueError) as parse_err:
        print(f"Error parsing worker output: {parse_err}", file=sys.stderr)
        print(f"Worker stdout:\\n{result.stdout}", file=sys.stderr)
        print(f"Worker stderr:\\n{result.stderr}", file=sys.stderr)
        raise parse_err

def run_master(iterations, target_module, cpu="0", csv_path=None):
    """Orchestrates the benchmark."""
    if iterations < 1:
        raise ValueError("Number of iterations must be at least 1.")
    times, memories = [], []
    loaded_modules_val, loaded_lines_val = 0, 0
    
    print(f"Profiling start... Running {iterations} cold-start iterations for {target_module}.")
    if cpu.lower() != "none":
        print(f"CPU Pinning enabled: Pinning processes to core {cpu} using taskset.")
    else:
        print("CPU Pinning disabled.")
    
    for i in range(iterations):
        cmd = []
        if cpu.lower() != "none":
            cmd += ["taskset", "-c", cpu]
        
        cmd += [sys.executable, __file__, "--worker", f"--module={target_module}"]
        
        try:
            data = _run_worker_and_parse(cmd)
            times.append(data["time_ms"])
            memories.append(data["peak_ram_mb"])
            loaded_modules_val = data["loaded_modules"]
            loaded_lines_val = data["loaded_lines"]
        except (subprocess.CalledProcessError, FileNotFoundError) as e:
            if cpu.lower() != "none" and i == 0:
                print("WARNING: taskset CPU pinning failed or is not available. Falling back to unpinned execution...")
                cmd = [sys.executable, __file__, "--worker", f"--module={target_module}"]
                try:
                    data = _run_worker_and_parse(cmd)
                    times.append(data["time_ms"])
                    memories.append(data["peak_ram_mb"])
                    loaded_modules_val = data["loaded_modules"]
                    loaded_lines_val = data["loaded_lines"]
                    cpu = "none"
                except subprocess.CalledProcessError as err:
                    print(f"Error in worker process:\n{err.stderr}", file=sys.stderr)
                    raise err
            else:
                if isinstance(e, subprocess.CalledProcessError):
                    print(f"Error in worker process:\n{e.stderr}", file=sys.stderr)
                raise e

Comment on lines +153 to +167
def run_trace(target_module):
"""Generates importtime trace log and writes it to a file."""
trace_file = f"import_trace_{target_module.replace('.', '_')}.log"
print(f"Generating importtime trace log for {target_module} -> {trace_file}...")

# We run: python -X importtime -c "import <module>"
result = subprocess.run(
[sys.executable, "-X", "importtime", "-c", f"import {target_module}"],
capture_output=True, text=True
)

with open(trace_file, "w", encoding="utf-8") as f:
f.write(result.stderr)

print(f"Trace log successfully written to {trace_file}")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In run_trace, if the import fails, subprocess.run will complete with a non-zero exit code, but the script will silently write the stderr to the trace file and print a success message without warning the user that the import actually failed. Adding a check for result.returncode and printing a warning with the worker's stdout/stderr will significantly improve usability and prevent users from analyzing incomplete or failed traces.

def run_trace(target_module):
    """Generates importtime trace log and writes it to a file."""
    trace_file = f"import_trace_{target_module.replace('.', '_')}.log"
    print(f"Generating importtime trace log for {target_module} -> {trace_file}...")
    
    # We run: python -X importtime -c "import <module>"
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {target_module}"],
        capture_output=True, text=True
    )
    
    if result.returncode != 0:
        print(f"WARNING: Import failed with exit code {result.returncode}. The trace log may be incomplete or contain errors.", file=sys.stderr)
        if result.stdout:
            print(f"Worker stdout:\\n{result.stdout}", file=sys.stderr)
        if result.stderr:
            print(f"Worker stderr:\\n{result.stderr}", file=sys.stderr)
            
    with open(trace_file, "w", encoding="utf-8") as f:
        f.write(result.stderr)
        
    print(f"Trace log successfully written to {trace_file}")

- Extract worker logic to _run_worker_and_parse for robust JSON parsing
- Add early validation for iterations parameter in run_master
- Add non-zero exit code check and warning for run_trace failures
@hebaalazzeh

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a Python SDK Import Profiler, which includes a comprehensive documentation guide and a self-spawning profiling script (profiler.py) designed to benchmark import latency, peak memory, and dynamic code volume using a process-isolated master/worker architecture. The feedback recommends enhancing the robustness of the worker's stdout parsing by prefixing metrics with a unique identifier to prevent interference from module initialization logs, and replacing the manual CLI argument parsing with the standard argparse library for improved usability and maintainability.

Comment thread scripts/import_profiler/profiler.py Outdated
Comment thread scripts/import_profiler/profiler.py Outdated
Comment thread scripts/import_profiler/profiler.py Outdated
Comment on lines +227 to +262
if __name__ == "__main__":
# Parse CLI arguments
target_module = "google.cloud.compute"
iterations = 50
trace = False
cprofile = False
mprofile = False
cpu = "0"
csv_path = None

for arg in sys.argv[1:]:
if arg.startswith("--module="):
target_module = arg.split("=")[1]
elif arg.startswith("--iterations="):
iterations = int(arg.split("=")[1])
elif arg.startswith("--cpu="):
cpu = arg.split("=")[1]
elif arg.startswith("--csv="):
csv_path = arg.split("=")[1]
elif arg == "--trace":
trace = True
elif arg == "--cprofile":
cprofile = True
elif arg == "--mprofile":
mprofile = True

if "--worker" in sys.argv:
run_worker(target_module)
elif trace:
run_trace(target_module)
elif cprofile:
run_cprofile(target_module)
elif mprofile:
run_mprofile(target_module)
else:
run_master(iterations, target_module, cpu, csv_path) No newline at end of file

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using manual argument parsing is fragile, does not support standard help flags (-h/--help), and silently ignores invalid arguments. Replacing it with the standard library argparse module improves maintainability, usability, and robustness.

if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="Python SDK Import Profiler")
    parser.add_argument("--module", default="google.cloud.compute", help="Target module to profile")
    parser.add_argument("--iterations", type=int, default=50, help="Number of iterations")
    parser.add_argument("--cpu", default="0", help="CPU core to pin to (or 'none')")
    parser.add_argument("--csv", help="Path to export CSV results")
    parser.add_argument("--trace", action="store_true", help="Generate importtime trace log")
    parser.add_argument("--cprofile", action="store_true", help="Run cProfile")
    parser.add_argument("--mprofile", action="store_true", help="Run tracemalloc memory snapshot")
    parser.add_argument("--worker", action="store_true", help=argparse.SUPPRESS)
    
    args = parser.parse_args()
    
    if args.worker:
        run_worker(args.module)
    elif args.trace:
        run_trace(args.module)
    elif args.cprofile:
        run_cprofile(args.module)
    elif args.mprofile:
        run_mprofile(args.module)
    else:
        run_master(args.iterations, args.module, args.cpu, args.csv)

hebaalazzeh and others added 2 commits June 15, 2026 15:18
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@hebaalazzeh

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a Python SDK Import Profiler tool, consisting of a detailed documentation guide and a self-spawning, process-isolated benchmarking script (profiler.py) designed to measure cold-start import latency, peak memory usage, and dynamic code volume. The reviewer's feedback focuses on improving the robustness and usability of the script. Key recommendations include converting the cpu parameter to a string to prevent potential runtime errors, automatically creating parent directories for the CSV export path to avoid FileNotFoundError, and replacing manual CLI argument parsing with Python's standard argparse module for a more robust and user-friendly interface.

I am having trouble creating individual review comments. Click here to see my feedback.

scripts/import_profiler/profiler.py (78-81)

medium

If cpu is passed as an integer (e.g., cpu=0), calling cpu.lower() will raise an AttributeError. Additionally, passing an integer to the subprocess command list will cause a TypeError during execution. Converting cpu to a string at the beginning of run_master ensures robust handling of both integer and string inputs.

def run_master(iterations, target_module, cpu="0", csv_path=None):
    cpu = str(cpu) if cpu is not None else "none"
    if iterations < 1:
        raise ValueError("Number of iterations must be at least 1.")

scripts/import_profiler/profiler.py (127-128)

medium

If the provided csv_path contains nested directories that do not exist yet, open(csv_path, "w") will raise a FileNotFoundError. Automatically creating the parent directories using os.makedirs makes the tool more robust and utilizes the otherwise unused os import.

    if csv_path:
        os.makedirs(os.path.dirname(os.path.abspath(csv_path)), exist_ok=True)
        with open(csv_path, "w", newline="", encoding="utf-8") as f:

scripts/import_profiler/profiler.py (232-267)

medium

Using manual argument parsing is error-prone, doesn't support standard --help / -h flags (which currently falls back to running the full benchmark), and only supports specific formats like --module=value instead of --module value. Replacing this with Python's standard argparse module provides robust validation, automatic help documentation, and a much better user experience.

if __name__ == "__main__":
    import argparse
    
    parser = argparse.ArgumentParser(
        description="Python SDK Import Profiler: benchmark and profile module imports."
    )
    parser.add_argument(
        "--module",
        default="google.cloud.compute",
        help="Target module to profile (e.g., google.cloud.aiplatform)"
    )
    parser.add_argument(
        "--iterations",
        type=int,
        default=50,
        help="Number of cold-start iterations to run"
    )
    parser.add_argument(
        "--cpu",
        default="0",
        help="CPU core to pin processes to using taskset (set to 'none' to disable)"
    )
    parser.add_argument(
        "--csv",
        help="Path to export raw iteration metrics as a CSV file"
    )
    parser.add_argument(
        "--trace",
        action="store_true",
        help="Generate an importtime trace log file"
    )
    parser.add_argument(
        "--cprofile",
        action="store_true",
        help="Run cProfile to capture latency stack traces"
    )
    parser.add_argument(
        "--mprofile",
        action="store_true",
        help="Run tracemalloc memory snapshot"
    )
    parser.add_argument(
        "--worker",
        action="store_true",
        help="Run in worker mode (internal use)"
    )
    
    args = parser.parse_args()
            
    if args.worker:
        run_worker(args.module)
    elif args.trace:
        run_trace(args.module)
    elif args.cprofile:
        run_cprofile(args.module)
    elif args.mprofile:
        run_mprofile(args.module)
    else: 
        run_master(args.iterations, args.module, args.cpu, args.csv)

@hebaalazzeh

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a Python SDK Import Profiler tool, consisting of a master/worker profiling script (profiler.py) and its comprehensive documentation (documentation.md). The profiler measures import latency, peak memory, and dynamic code volume in a process-isolated manner to avoid caching issues. The feedback suggests a platform-aware improvement to default the CPU pinning option to 'none' on non-Linux systems, preventing unnecessary warnings and fallback attempts on macOS or Windows.

Comment thread scripts/import_profiler/profiler.py Outdated
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@hebaalazzeh

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a Python SDK Import Profiler tool, consisting of a comprehensive documentation guide and a master/worker execution script (profiler.py) designed to benchmark module import performance under cold-start conditions. Feedback on the implementation highlights opportunities to preserve the cold-start design by running both cProfile and tracemalloc profiling in isolated subprocesses rather than directly within the master process. Additionally, it is recommended to separate FileNotFoundError and subprocess.CalledProcessError handling in the master process to avoid misleading warnings about taskset availability when a worker process crashes.

Comment on lines +195 to +216
def run_cprofile(target_module):
"""Runs cProfile to capture stack traces for latency."""
import cProfile
import pstats

prof_file = f"cprofile_{target_module.replace('.', '_')}.prof"
print(f"Generating cProfile data for {target_module} -> {prof_file}...")

# Run profiling
pr = cProfile.Profile()
pr.enable()
importlib.import_module(target_module)
pr.disable()

# Save for flame charts (e.g. via snakeviz)
pr.dump_stats(prof_file)
print(f"cProfile stats successfully written to {prof_file}")

# Print top bottlenecks
print("\n--- Top 15 functions by cumulative time ---")
ps = pstats.Stats(pr).sort_stats(pstats.SortKey.CUMULATIVE)
ps.print_stats(15)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Running cProfile directly in the master process defeats the "cold start" design of the profiler. Since profiler.py already imports heavy standard library modules (like subprocess, json, csv, logging, tracemalloc, statistics), any imports of these modules by the target module will be resolved instantly from Python's sys.modules cache and won't be captured in the profile.

To ensure a true cold-start profile, cProfile should be executed in an isolated subprocess.

def run_cprofile(target_module):
    """Runs cProfile in a clean subprocess to capture stack traces for latency."""
    import pstats
    
    prof_file = f"cprofile_{target_module.replace('.', '_')}.prof"
    print(f"Generating cProfile data for {target_module} -> {prof_file}...")
    
    # Run profiling in a clean subprocess to ensure cold-start
    result = subprocess.run(
        [sys.executable, "-m", "cProfile", "-o", prof_file, "-c", f"import importlib; importlib.import_module('{target_module}')"],
        capture_output=True, text=True
    )
    if result.returncode != 0:
        print(f"Error generating cProfile data:\n{result.stderr}", file=sys.stderr)
        return

    print(f"cProfile stats successfully written to {prof_file}")
    
    # Print top bottlenecks
    print("\n--- Top 15 functions by cumulative time ---")
    ps = pstats.Stats(prof_file).sort_stats(pstats.SortKey.CUMULATIVE)
    ps.print_stats(15)

Comment thread scripts/import_profiler/profiler.py Outdated
Comment thread scripts/import_profiler/profiler.py Outdated
- Execute cProfile in a fresh subprocess to ensure cold-start accuracy
- Execute tracemalloc in a fresh subprocess to capture all memory allocations
- Clean up master process exception handling for missing taskset vs crashed workers
@hebaalazzeh

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a Python SDK Import Profiler tool, which includes a comprehensive documentation file (documentation.md) and a core profiling script (profiler.py) designed to measure import latency, peak memory, and dynamic code volume using process-isolated worker runs. The review feedback highlights several critical security and robustness improvements: addressing potential code injection vulnerabilities in run_trace, run_cprofile, and run_mprofile by safely escaping the target_module parameter using json.dumps(), and preventing potential UnicodeDecodeError failures in run_worker when reading source files by adding errors='ignore' to the file-opening logic.

Comment thread scripts/import_profiler/profiler.py
Comment thread scripts/import_profiler/profiler.py
Comment on lines +222 to +232
"import tracemalloc\n"
"import importlib\n"
"tracemalloc.start()\n"
f"importlib.import_module('{target_module}')\n"
"snapshot = tracemalloc.take_snapshot()\n"
"tracemalloc.stop()\n"
"top_stats = snapshot.statistics('lineno')\n"
"for stat in top_stats[:15]:\n"
" print(stat)\n"
)
result = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

Directly interpolating target_module into the code string is vulnerable to code injection. Use json.dumps() to safely escape the module name.

    code = (
        "import tracemalloc\n"
        "import importlib\n"
        "tracemalloc.start()\n"
        f"importlib.import_module({json.dumps(target_module)})\n"
        "snapshot = tracemalloc.take_snapshot()\n"
        "tracemalloc.stop()\n"
        "top_stats = snapshot.statistics('lineno')\n"
        "for stat in top_stats[:15]:\n"
        "    print(stat)\n"
    )

Comment thread scripts/import_profiler/profiler.py
hebaalazzeh and others added 4 commits June 15, 2026 16:00
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- Use json.dumps to safely escape target_module string in all subprocess commands (mprofile, cprofile, trace)
@hebaalazzeh

Copy link
Copy Markdown
Contributor Author

/gemini review

Comment on lines +11 to +12
* **[plan.md](./plan.md)**: The current project phases and roadmap checklist.
* **[status.md](./status.md)**: Tracks the active task state and hosts recorded baseline performance metrics.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are these files?

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a Python SDK Import Profiler tool, including a detailed documentation guide and a self-spawning script (profiler.py) that benchmarks module imports in process-isolated environments. The reviewer feedback highlights two main improvement opportunities: preventing a potential AttributeError by verifying that the module's __file__ attribute is a string before checking its extension, and refactoring the error handling block in run_master to eliminate duplicate metric collection logic.

Comment on lines +34 to +35
if mod and getattr(mod, '__file__', None):
file_path = mod.__file__

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The __file__ attribute of a module can sometimes be None (e.g., for namespace packages or dynamically created modules) or potentially a non-string object in custom environments. Checking if it is a string before performing string operations like .endswith() prevents potential AttributeErrors.

Suggested change
if mod and getattr(mod, '__file__', None):
file_path = mod.__file__
if mod and isinstance(getattr(mod, '__file__', None), str):
file_path = mod.__file__

Comment on lines +99 to +123
try:
data = _run_worker_and_parse(cmd)
times.append(data["time_ms"])
memories.append(data["peak_ram_mb"])
loaded_modules_val = data["loaded_modules"]
loaded_lines_val = data["loaded_lines"]
except FileNotFoundError as e:
if cpu.lower() != "none" and i == 0:
print("WARNING: taskset CPU pinning is not available. Falling back to unpinned execution...")
cpu = "none"
cmd = [sys.executable, __file__, "--worker", f"--module={target_module}"]
try:
data = _run_worker_and_parse(cmd)
times.append(data["time_ms"])
memories.append(data["peak_ram_mb"])
loaded_modules_val = data["loaded_modules"]
loaded_lines_val = data["loaded_lines"]
except subprocess.CalledProcessError as err:
print(f"Error in worker process:\n{err.stderr}", file=sys.stderr)
raise err
else:
raise e
except subprocess.CalledProcessError as e:
print(f"Error in worker process:\n{e.stderr}", file=sys.stderr)
raise e

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is significant code duplication in the error handling block where metrics are appended to lists and assigned to variables. Refactoring this block to perform the list appending and variable assignment once after the try-except block simplifies the logic and improves maintainability.

        try:
            data = _run_worker_and_parse(cmd)
        except FileNotFoundError as e:
            if cpu.lower() != "none" and i == 0:
                print("WARNING: taskset CPU pinning is not available. Falling back to unpinned execution...")
                cpu = "none"
                cmd = [sys.executable, __file__, "--worker", f"--module={target_module}"]
                try:
                    data = _run_worker_and_parse(cmd)
                except subprocess.CalledProcessError as err:
                    print(f"Error in worker process:\\n{err.stderr}", file=sys.stderr)
                    raise err
            else:
                raise e
        except subprocess.CalledProcessError as e:
            print(f"Error in worker process:\\n{e.stderr}", file=sys.stderr)
            raise e

        times.append(data["time_ms"])
        memories.append(data["peak_ram_mb"])
        loaded_modules_val = data["loaded_modules"]
        loaded_lines_val = data["loaded_lines"]
References
  1. Remove duplicate lines of code, especially duplicate assertions in tests, to keep the codebase clean and avoid redundancy.

Comment thread scripts/import_profiler/documentation.md Outdated
- Replace detailed implementation architecture and alternative solutions with a concise high-level usage and expected output format per PR review feedback
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants