
refactor: decouple GO enrichment logic and improve data flow safety #10

Open
hjn0415a wants to merge 7 commits into OpenMS:main from hjn0415a:feature/go-term-in-execution

Conversation


hjn0415a commented Feb 10, 2026

Key Changes

  • Extracted the complex GO enrichment logic from the execution method into a dedicated internal method: _run_go_enrichment.
  • Replaced st.session_state["workspace"] with a local variable workspace_path.
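
A minimal sketch of the motivation behind the second bullet (the before/after shapes are illustrative; only the name workspace_path comes from this PR). st.session_state exists only in the Streamlit server process, so workflow code that may run in a subprocess should not read it directly:

from pathlib import Path
import streamlit as st

# Before (sketch): reading session state from workflow code breaks in a child
# process, where st.session_state is unavailable.
results_dir = Path(st.session_state["workspace"]) / "results"

# After (sketch): resolve the workspace once, in the Streamlit process, and
# pass the plain Path around instead.
workspace_path = Path(st.session_state["workspace"])
results_dir = workspace_path / "results"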

Summary by CodeRabbit

  • New Features

    • Added a Proteomics LFQ results page with protein abundance tables and integrated GO enrichment (BP/CC/MF) visualizations and tables.
  • Improvements

    • Volcano and PCA views now use FDR-adjusted p-values (p-adj) for ranking, filtering, labels, and plotting (see the sketch after this list).
    • Improved Windows compatibility and more robust local workflow cleanup.
  • Chores

    • Updated project dependencies.
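
Illustrative sketch of what the p-adj switch means for the volcano filtering (the p-adj and log2FC column names match the pivot output described in this review; the cutoffs mirror the p_cutoff/fc_cutoff defaults quoted later, and the label column is hypothetical):

import numpy as np

# pivot_df: stats table produced by results_helpers, with "p-adj" and "log2FC"
p_cutoff, fc_cutoff = 0.05, 1.0
significant = (pivot_df["p-adj"] < p_cutoff) & (pivot_df["log2FC"].abs() >= fc_cutoff)
pivot_df["hit"] = np.where(significant, "significant", "not significant")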


coderabbitai bot commented Feb 10, 2026

📝 Walkthrough

Adds a Proteomics LFQ results page and Streamlit module; implements GO enrichment in WorkflowTest (new _run_go_enrichment) using mygene and saves GO results as JSON; updates results helpers to compute p‑adj; makes WorkflowManager stop routine Windows‑compatible; adds dependencies and small UI/logic tweaks.

Changes

  • App registration & new page (app.py, content/results_proteomicslfq.py): Registers a "Proteomics LFQ" results page and adds a Streamlit page that loads abundance data, shows protein-level tables, persists pivot_df in session, and reads/renders go_results.json (BP/CC/MF) with conditional messaging.
  • Workflow & GO enrichment (src/WorkflowTest.py): Adds _run_go_enrichment(self, pivot_df, results_dir) and invokes it after ProteomicsLFQ quantification: UniProt mapping (mygene), GO term retrieval, Fisher exact tests per ontology, Plotly figure generation, and persistence of go_results.json and the figures. See the statistics sketch after this list.
  • Results processing / stats (src/common/results_helpers.py): Computes Benjamini–Hochberg adjusted p-values (p-adj) via statsmodels.stats.multitest and includes p-adj in stats and pivot outputs for downstream use.
  • Volcano / PCA UI tweaks (content/results_volcano.py, content/results_pca.py): Switches significance/filtering logic from p-value to p-adj across volcano and PCA displays; updates labels, metrics, plot hover data, and colors accordingly.
  • Process management (src/workflow/WorkflowManager.py): Adds Windows compatibility to the local workflow stop routine (uses taskkill on Windows), expands exception handling for PID cleanup, and guards pid-file unlinking.
  • Dependencies & minor edits (requirements.txt, content/workflow_run.py): Adds mygene and statsmodels to requirements and removes a stray blank line in content/workflow_run.py.
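
The statistics sketch referenced above — a minimal, self-contained illustration of a per-term Fisher exact test followed by Benjamini–Hochberg adjustment. The function name and count layout are illustrative, not the PR's actual code; only fisher_exact, multipletests, and the "fdr_bh" method appear in the review itself:

from scipy.stats import fisher_exact
from statsmodels.stats.multitest import multipletests

def enrich_terms(term_counts, fg_size, bg_size):
    """term_counts maps a GO term to (hits in foreground, hits in background)."""
    terms, pvals = [], []
    for term, (fg_hits, bg_hits) in term_counts.items():
        # 2x2 table: foreground vs. rest of background, with vs. without the term
        table = [
            [fg_hits, fg_size - fg_hits],
            [bg_hits - fg_hits, (bg_size - fg_size) - (bg_hits - fg_hits)],
        ]
        _, p = fisher_exact(table, alternative="greater")  # over-representation
        terms.append(term)
        pvals.append(p)
    # FDR correction across all tested terms (the same call results_helpers uses)
    _, p_adj, _, _ = multipletests(pvals, method="fdr_bh")
    return dict(zip(terms, p_adj))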

Sequence Diagram(s)

sequenceDiagram
    participant User as "User"
    participant UI as "Streamlit UI\n(results_proteomicslfq)"
    participant WF as "WorkflowTest"
    participant LFQ as "ProteomicsLFQ Module"
    participant MG as "mygene (External API)"
    participant Store as "Results Storage\n(go_results.json, figures)"

    User->>UI: open Proteomics LFQ page
    UI->>Store: request go_results.json
    alt go_results.json exists
        Store-->>UI: return go_results.json + figures
        UI->>UI: render BP/CC/MF tabs (plots + tables)
        UI-->>User: display GO results
    else no go_results.json
        UI-->>User: show info (no GO results)
    end

    Note over WF,LFQ: (separate workflow execution)
    WF->>LFQ: run quantification -> pivot_df
    WF->>WF: _run_go_enrichment(pivot_df, results_dir)
    WF->>MG: query UniProt / GO annotations
    MG-->>WF: return GO annotations
    WF->>WF: compute Fisher tests, build figures
    WF->>Store: write go_results.json and figures

Poem

🐰 I hopped through LFQ fields so bright,
Counted proteins by day and night,
BP, CC, MF in tidy rows,
Fisher found what garden shows,
Saved the treasures — what a sight! 🧪✨

🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 warnings
❌ Failed checks (2 warnings)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 45.45%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
  • Merge Conflict Detection — ⚠️ Warning: Merge conflicts detected (15 files):

⚔️ .github/workflows/build-windows-executable-app.yaml (content)
⚔️ .streamlit/config.toml (content)
⚔️ Dockerfile (content)
⚔️ app.py (content)
⚔️ content/results_pca.py (content)
⚔️ content/results_volcano.py (content)
⚔️ content/workflow_run.py (content)
⚔️ docker-compose.yml (content)
⚔️ requirements.txt (content)
⚔️ src/WorkflowTest.py (content)
⚔️ src/common/common.py (content)
⚔️ src/common/results_helpers.py (content)
⚔️ src/workflow/CommandExecutor.py (content)
⚔️ src/workflow/StreamlitUI.py (content)
⚔️ src/workflow/WorkflowManager.py (content)

These conflicts must be resolved before merging into main.
Resolve conflicts locally and push changes to this branch.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title accurately summarizes the main changes: refactoring GO enrichment logic into a dedicated _run_go_enrichment method and improving data-flow safety by replacing session-state reliance with local variables.


coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/WorkflowTest.py (1)

356-362: ⚠️ Potential issue | 🔴 Critical

results_dir is silently overwritten a few lines below its creation, causing downstream confusion.

Line 356 sets results_dir = Path(self.workflow_dir, "results"), then line 362 overwrites it with Path(self.workflow_dir, "input-files"). This is the root cause of the GO output path bug (line 825) and also makes the final report (line 834) show the wrong directory. Consider using a distinct variable name for the input-files path.

🐛 Proposed fix
-        results_dir = Path(self.workflow_dir, "input-files")
+        input_files_dir = Path(self.workflow_dir, "input-files")

Then update any downstream references that actually need the input-files path accordingly.
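
Sketched in context (the surrounding statements are inferred from the line references in this comment, not quoted from the file):

results_dir = Path(self.workflow_dir, "results")          # line 356: results path, kept intact
# ...
input_files_dir = Path(self.workflow_dir, "input-files")  # line 362: renamed; no longer clobbers results_dir
# ...
self._run_go_enrichment(pivot_df, results_dir)            # line 825: now receives the intended results path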

🤖 Fix all issues with AI agents
In `@content/results_proteomicslfq.py`:
- Around line 64-65: Replace the hardcoded construction of results_dir using
"topp-workflow" with the shared helper: import get_workflow_dir from
src.common.results_helpers (alongside get_abundance_data if present) and call
workflow_dir = get_workflow_dir(st.session_state["workspace"]); then build
results_dir = workflow_dir / "results" / "go-terms" (keep go_json_file =
results_dir / "go_results.json") and remove the hardcoded Path(...) use so the
file uses get_workflow_dir consistently.

In `@src/WorkflowTest.py`:
- Around line 984-985: The subprocess function execution() is attempting to
write to st.session_state (setting "go_results" and "go_ready") while running
under WorkflowManager.workflow_process, but Streamlit's session_state isn't
available in child processes; remove these dead writes and rely on the existing
JSON export that already persists go_results/go_data (the persistence logic
surrounding go_results, go_data and the JSON export) so the subprocess only
writes to the JSON and does not touch st.session_state.
- Around line 817-825: The bug is that results_dir (intended to point at the
"results" folder) is overwritten with "input-files" before being passed to
_run_go_enrichment, causing GO output to be written to the wrong location; fix
by ensuring results_dir is set to self.workflow_dir / "results" (and not
overwritten) before calling self._run_go_enrichment(pivot_df, results_dir) —
locate where results_dir is assigned and where it is reassigned (near uses of
self.workflow_dir and the call site of _run_go_enrichment) and remove or change
the reassignment to preserve the "results" path so the GO output is written to
workspace/.../results/go-terms/go_results.json where the UI expects it.
- Around line 872-874: mg.querymany(bg_ids, ...) performs an external
MyGene.info HTTP call and must be protected from network hangs; wrap the call to
MyGeneInfo.querymany in a try/except and implement a retry loop with exponential
backoff (or use a retry helper) that catches network-related exceptions (e.g.,
requests exceptions, ConnectionError, timeout) and handles timeouts by passing a
timeout to the request if supported or by failing after N retries—update the
code around the mg.querymany call (identifying mg.querymany and bg_ids) to log
the error and either return a sensible fallback (empty res_list) or re-raise a
controlled exception so the workflow doesn't hang indefinitely.
🧹 Nitpick comments (10)
requirements.txt (1)

148-148: Consider pinning mygene to a specific version.

The auto-generated section above pins every dependency, but the manually-added block (lines 139–148) leaves versions open. For reproducible builds, consider pinning mygene (e.g., mygene>=3.2.2,<4). This is a pre-existing pattern for the other manual entries too, so not blocking.

src/workflow/WorkflowManager.py (1)

207-212: Use subprocess.run instead of os.system for process termination.

While pid is always an integer here (validated by int() on line 206), os.system launches a shell unnecessarily. subprocess.run avoids the shell and the static analysis warning (S605).

♻️ Proposed fix
-                # Windows
-                if platform.system() == "Windows":
-                    os.system(f"taskkill /F /T /PID {pid}")
-                else:
-                    # Linux/macOS
-                    os.kill(pid, signal.SIGTERM)
+                # Windows
+                if platform.system() == "Windows":
+                    import subprocess
+                    subprocess.run(
+                        ["taskkill", "/F", "/T", "/PID", str(pid)],
+                        check=False,
+                    )
+                else:
+                    # Linux/macOS
+                    os.kill(pid, signal.SIGTERM)
content/results_proteomicslfq.py (2)

4-4: Unused import: numpy.

numpy is imported but not used anywhere in this file.

♻️ Proposed fix
-import numpy as np

67-74: No error handling for malformed JSON.

If go_results.json is corrupted or has an unexpected structure, json.load or the subsequent key accesses (lines 88-89) will raise unhandled exceptions, crashing the page. Wrap in a try/except to degrade gracefully.

🛡️ Proposed fix
 if not go_json_file.exists():
     st.info("GO Enrichment results are not available yet. Please run the analysis first.")
 else:
     import json
     import plotly.io as pio
-    
-    with open(go_json_file, "r") as f:
-        go_data = json.load(f)
+
+    try:
+        with open(go_json_file, "r") as f:
+            go_data = json.load(f)
+    except (json.JSONDecodeError, OSError) as e:
+        st.error(f"Failed to load GO results: {e}")
+        st.stop()
src/WorkflowTest.py (6)

829-834: Duplicate step numbering: two "5️⃣" sections.

Lines 818–819 label the GO enrichment as step 5️⃣, and lines 829–830 also label the final report as step 5️⃣. Renumber one of them (e.g., final report → 6️⃣).


875-877: Use ~ operator instead of != True for pandas boolean filtering.

res_go["notfound"] != True works but is flagged by linters (E712) and is non-idiomatic for pandas. The notfound column may contain True or NaN; using ~ with fillna handles both correctly.

♻️ Proposed fix
-                    if "notfound" in res_go.columns:
-                        res_go = res_go[res_go["notfound"] != True]
+                    if "notfound" in res_go.columns:
+                        res_go = res_go[~res_go["notfound"].fillna(False).astype(bool)]

838-846: _run_go_enrichment returns None on the error path without logging success — consider early return pattern.

When analysis_df is empty (line 844), the method logs an error and falls through. The deep nesting (4+ levels of indentation) throughout this method makes the logic hard to follow. Consider using early returns to flatten the structure.

♻️ Sketch of flattened structure
     def _run_go_enrichment(self, pivot_df: pd.DataFrame, results_dir: Path):
         p_cutoff = 0.05
         fc_cutoff = 1.0
 
         analysis_df = pivot_df.dropna(subset=["p-value", "log2FC"]).copy()
 
-        if analysis_df.empty:
-            st.error("No valid statistical data found for GO enrichment.")
-            self.logger.log("❗ analysis_df is empty")
-        else:
-            with st.spinner("Fetching GO terms from MyGene.info API..."):
-                # ... 140+ lines deeply nested ...
+        if analysis_df.empty:
+            st.error("No valid statistical data found for GO enrichment.")
+            self.logger.log("❗ analysis_df is empty")
+            return
+
+        with st.spinner("Fetching GO terms from MyGene.info API..."):
+            # ... rest of logic at one less indentation level ...

893-895: Remove extraneous f prefixes from strings without placeholders.

Lines 895, 944, and 963 use f-strings but contain no {} expressions. This is flagged by Ruff (F541).

♻️ Proposed fix
-                    self.logger.log(f"✅ fg_set bg_set are set")
+                    self.logger.log("✅ fg_set bg_set are set")
-                        self.logger.log(f"✅ Plotly Figure generated")
+                        self.logger.log("✅ Plotly Figure generated")
-                    self.logger.log(f"✅ go_type generated")
+                    self.logger.log("✅ go_type generated")

Also applies to: 943-944, 962-963


9-9: page_setup is imported (line 14) but only referenced in commented-out code (line 19).

Either remove the import or uncomment line 19 if it's needed.

♻️ Proposed fix
-from src.common.common import page_setup

Also applies to: 14-14, 19-19


932-932: Replacing exact zeros in p-values with 1e-10 may mask data issues.

df["p_value"].replace(0, 1e-10) silently changes exact-zero p-values before the -log10 transform. A p-value of exactly 0.0 from Fisher's exact test can indicate a numerical underflow for very significant terms. Consider using np.finfo(float).tiny or sys.float_info.min for a more principled floor, or at minimum documenting this choice.

Comment on lines +64 to +65
results_dir = Path(st.session_state["workspace"]) / "topp-workflow" / "results" / "go-terms"
go_json_file = results_dir / "go_results.json"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Scripts executed (repository: OpenMS/quantms-web):

# Search for all references to "topp-workflow" in the codebase
rg -n "topp-workflow" --type=py

# Check the actual file and surrounding context
find . -name "results_proteomicslfq.py" -type f

# Find and examine WorkflowManager and WorkflowTest to understand the naming convention
fd -e py -type f | xargs grep -l "class WorkflowManager\|class WorkflowTest" | head -5

# Examine the results_helpers.py file to see the helper function
cat -n src/common/results_helpers.py

# Search for WorkflowManager and WorkflowTest class definitions
rg -n "class WorkflowManager|class WorkflowTest" --type py -A 10

# Check if there are other imports or usages of results_helpers in content files
rg -n "from.*results_helpers|import.*results_helpers" --type py


Use the centralized get_workflow_dir() helper instead of hardcoding the path.

Line 64 hardcodes "topp-workflow", but src/common/results_helpers.py provides get_workflow_dir(workspace) for this purpose. Other result files (e.g., results_abundance.py, results_library.py, results_filtered.py) already import and use this helper. For consistency and maintainability, import get_workflow_dir and replace the hardcoded path:

from src.common.results_helpers import get_abundance_data, get_workflow_dir

Then on line 64:

workflow_dir = get_workflow_dir(st.session_state["workspace"])
results_dir = workflow_dir / "results" / "go-terms"
🤖 Prompt for AI Agents
In `@content/results_proteomicslfq.py` around lines 64 - 65, Replace the hardcoded
construction of results_dir using "topp-workflow" with the shared helper: import
get_workflow_dir from src.common.results_helpers (alongside get_abundance_data
if present) and call workflow_dir =
get_workflow_dir(st.session_state["workspace"]); then build results_dir =
workflow_dir / "results" / "go-terms" (keep go_json_file = results_dir /
"go_results.json") and remove the hardcoded Path(...) use so the file uses
get_workflow_dir consistently.

Comment on lines +872 to +874

res_list = mg.querymany(
    bg_ids, scopes="uniprot", fields="go", as_dataframe=False
)

⚠️ Potential issue | 🟠 Major

mg.querymany makes an external API call with no timeout or retry — network failure will crash the workflow.

mygene.MyGeneInfo().querymany() calls the MyGene.info REST API. If the service is slow or unreachable, this will block indefinitely and potentially leave the workflow hung. Consider wrapping this in a try/except with a timeout or retry strategy.

🛡️ Proposed fix — add error handling
-                    res_list = mg.querymany(
-                        bg_ids, scopes="uniprot", fields="go", as_dataframe=False
-                    )
+                    try:
+                        res_list = mg.querymany(
+                            bg_ids, scopes="uniprot", fields="go", as_dataframe=False
+                        )
+                    except Exception as e:
+                        self.logger.log(f"❗ MyGene API call failed: {e}")
+                        st.warning("GO enrichment skipped: failed to fetch GO terms from MyGene.info.")
+                        return
🤖 Prompt for AI Agents
In `@src/WorkflowTest.py` around lines 872 - 874, mg.querymany(bg_ids, ...)
performs an external MyGene.info HTTP call and must be protected from network
hangs; wrap the call to MyGeneInfo.querymany in a try/except and implement a
retry loop with exponential backoff (or use a retry helper) that catches
network-related exceptions (e.g., requests exceptions, ConnectionError, timeout)
and handles timeouts by passing a timeout to the request if supported or by
failing after N retries—update the code around the mg.querymany call
(identifying mg.querymany and bg_ids) to log the error and either return a
sensible fallback (empty res_list) or re-raise a controlled exception so the
workflow doesn't hang indefinitely.
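
A minimal retry-with-backoff sketch along the lines the agent prompt describes (the wrapper name, retry count, and delays are illustrative; mygene raises ordinary exceptions on network failure):

import time

def querymany_with_retry(mg, bg_ids, max_retries=3, base_delay=2.0):
    # Hypothetical wrapper: retry mg.querymany with exponential backoff.
    for attempt in range(1, max_retries + 1):
        try:
            return mg.querymany(bg_ids, scopes="uniprot", fields="go", as_dataframe=False)
        except Exception as exc:
            if attempt == max_retries:
                # Fail loudly after N attempts instead of hanging the workflow.
                raise RuntimeError(f"MyGene.info query failed after {max_retries} attempts") from exc
            time.sleep(base_delay * 2 ** (attempt - 1))  # 2 s, 4 s, 8 s, ...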

Comment on lines +984 to +985
st.session_state["go_results"] = go_results
st.session_state["go_ready"] = True if go_data else False

⚠️ Potential issue | 🟡 Minor

Storing Plotly figures and flags in st.session_state from a workflow subprocess.

execution() runs inside multiprocessing.Process (see WorkflowManager.workflow_process). Streamlit's st.session_state is not available in child processes — these writes will either silently fail or raise an error. The JSON file export (lines 981-983) is the correct persistence mechanism. These session state writes are dead code in the subprocess context.

♻️ Proposed fix — remove dead session state writes
                     with open(go_json_file, "w") as f:
                         json.dump(go_data, f)
-                    st.session_state["go_results"] = go_results
-                    st.session_state["go_ready"] = True if go_data else False
                     self.logger.log("✅ GO enrichment analysis complete")
🤖 Prompt for AI Agents
In `@src/WorkflowTest.py` around lines 984 - 985, The subprocess function
execution() is attempting to write to st.session_state (setting "go_results" and
"go_ready") while running under WorkflowManager.workflow_process, but
Streamlit's session_state isn't available in child processes; remove these dead
writes and rely on the existing JSON export that already persists
go_results/go_data (the persistence logic surrounding go_results, go_data and
the JSON export) so the subprocess only writes to the JSON and does not touch
st.session_state.

coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/WorkflowTest.py`:
- Line 942: In the calls to self.logger.log that currently use f-strings with no
interpolation (e.g., self.logger.log(f"✅ Plotly Figure generated") and the
similar call around line 961), remove the unnecessary f prefix so the string is
a plain literal (self.logger.log("✅ Plotly Figure generated")), ensuring no
functional change and eliminating the F541 lint warning; search for other
self.logger.log(...) occurrences in WorkflowTest.py and convert any other
non-interpolated f-strings the same way.
- Around line 875-888: Replace the boolean Series comparison and lambda capture
to silence the linter: change the filtering res_go = res_go[res_go["notfound"]
!= True] to use the bitwise not on the boolean Series (e.g., res_go =
res_go[~res_go["notfound"]]) and update the loop that assigns GO term columns so
extract_go_terms is bound directly in a small helper function or use a defaulted
lambda to capture go_type (e.g., define a local function get_terms(x,
gt=go_type): return extract_go_terms(x, gt) and use that in
res_go["{go_type}_terms".format(go_type=go_type)] =
res_go["go"].apply(get_terms)); also remove the unnecessary f-prefix for string
literals when building column names (use normal string formatting or .format
instead of f-strings with no placeholders).
🧹 Nitpick comments (2)
src/WorkflowTest.py (2)

966-966: Move import json to the top of the file.

Inline imports hurt readability and are non-idiomatic. json is a stdlib module with no import cost concern.

♻️ Proposed fix

Add at the top of the file with other imports:

import json

Then remove line 966.


836-985: Large method with deeply nested logic — consider flattening.

_run_go_enrichment is ~150 lines with 4+ levels of nesting (if/else → with spinner → if/else → for loop). The inner run_go function (lines 895-950) is well-extracted, but the overall method would benefit from early returns to reduce nesting depth. For example, the else block starting at line 845 could be inverted:

if analysis_df.empty:
    st.error("No valid statistical data found for GO enrichment.")
    self.logger.log("❗ analysis_df is empty")
    return
# ... rest of method at top level

This is a readability improvement and not urgent.

Comment on lines +875 to +888

res_go = res_go[res_go["notfound"] != True]

def extract_go_terms(go_data, go_type):
    if not isinstance(go_data, dict) or go_type not in go_data:
        return []
    terms = go_data[go_type]
    if isinstance(terms, dict):
        terms = [terms]
    return list({t.get("term") for t in terms if "term" in t})

for go_type in ["BP", "CC", "MF"]:
    res_go[f"{go_type}_terms"] = res_go["go"].apply(
        lambda x: extract_go_terms(x, go_type)
    )

⚠️ Potential issue | 🟡 Minor

Fix lint issues flagged by static analysis.

Three issues in this block:

  1. Line 875 (E712): != True on a pandas Series — use ~ operator instead.
  2. Line 887 (B023): Lambda captures loop variable go_type by reference. It's safe here since .apply() consumes it immediately, but the fix is trivial and silences the warning.
  3. Lines 893 (F541): f-string without placeholders — remove the f prefix.
🔧 Proposed fix
                     if "notfound" in res_go.columns:
-                        res_go = res_go[res_go["notfound"] != True]
+                        res_go = res_go[~res_go["notfound"].fillna(False)]
 
                     def extract_go_terms(go_data, go_type):
                         if not isinstance(go_data, dict) or go_type not in go_data:
                             return []
                         terms = go_data[go_type]
                         if isinstance(terms, dict):
                             terms = [terms]
                         return list({t.get("term") for t in terms if "term" in t})
 
                     for go_type in ["BP", "CC", "MF"]:
                         res_go[f"{go_type}_terms"] = res_go["go"].apply(
-                            lambda x: extract_go_terms(x, go_type)
+                            lambda x, _gt=go_type: extract_go_terms(x, _gt)
                         )
 
                     annotated_ids = set(res_go["query"].astype(str))
                     fg_set = annotated_ids.intersection(fg_ids)
                     bg_set = annotated_ids
-                    self.logger.log(f"✅ fg_set bg_set are set")
+                    self.logger.log("✅ fg_set bg_set are set")
🧰 Tools
🪛 Ruff (0.14.14)

[error] 875-875: Avoid inequality comparisons to True; use not res_go["notfound"]: for false checks

Replace with not res_go["notfound"]

(E712)


[warning] 887-887: Function definition does not bind loop variable go_type

(B023)

🤖 Prompt for AI Agents
In `@src/WorkflowTest.py` around lines 875 - 888, Replace the boolean Series
comparison and lambda capture to silence the linter: change the filtering res_go
= res_go[res_go["notfound"] != True] to use the bitwise not on the boolean
Series (e.g., res_go = res_go[~res_go["notfound"]]) and update the loop that
assigns GO term columns so extract_go_terms is bound directly in a small helper
function or use a defaulted lambda to capture go_type (e.g., define a local
function get_terms(x, gt=go_type): return extract_go_terms(x, gt) and use that
in res_go["{go_type}_terms".format(go_type=go_type)] =
res_go["go"].apply(get_terms)); also remove the unnecessary f-prefix for string
literals when building column names (use normal string formatting or .format
instead of f-strings with no placeholders).

title=f"GO Enrichment ({go_type})",
)

self.logger.log(f"✅ Plotly Figure generated")

⚠️ Potential issue | 🟡 Minor

Remove extraneous f prefixes (F541).

These f-strings have no interpolation placeholders.

🔧 Proposed fix
-                        self.logger.log(f"✅ Plotly Figure generated")
+                        self.logger.log("✅ Plotly Figure generated")
-                    self.logger.log(f"✅ go_type generated")
+                    self.logger.log("✅ go_type generated")

Also applies to: 961-961

🧰 Tools
🪛 Ruff (0.14.14)

[error] 942-942: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In `@src/WorkflowTest.py` at line 942, In the calls to self.logger.log that
currently use f-strings with no interpolation (e.g., self.logger.log(f"✅ Plotly
Figure generated") and the similar call around line 961), remove the unnecessary
f prefix so the string is a plain literal (self.logger.log("✅ Plotly Figure
generated")), ensuring no functional change and eliminating the F541 lint
warning; search for other self.logger.log(...) occurrences in WorkflowTest.py
and convert any other non-interpolated f-strings the same way.

coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
content/results_pca.py (1)

46-46: ⚠️ Potential issue | 🟡 Minor

Stale message text: still references "p-value" instead of "p-adj".

Line 46 says "Not enough proteins after p-value filtering" but the filtering now uses p-adj.

Proposed fix
-    st.info("Not enough proteins after p-value filtering for PCA.")
+    st.info("Not enough proteins after p-adj filtering for PCA.")
🤖 Fix all issues with AI agents
In `@src/common/results_helpers.py`:
- Around line 264-270: The current block that computes adjusted p-values can
skip creating the "p-adj" column when stats_df is empty, causing a KeyError
later; ensure "p-adj" always exists by initializing stats_df["p-adj"] = np.nan
when stats_df is empty (or pre-create it before the not-empty check), then keep
the existing logic that uses mask, multipletests and assigns p_adj back to
stats_df.loc[mask, "p-adj"] so the column is present whether or not any rows
exist.
🧹 Nitpick comments (1)
src/common/results_helpers.py (1)

296-296: Nit: prefer iterable unpacking over concatenation (Ruff RUF005).

Proposed fix
-    pivot_df = pivot_df[["ProteinName", "log2FC", "p-value", "p-adj"] + all_samples + ["PeptideSequence"]]
+    pivot_df = pivot_df[["ProteinName", "log2FC", "p-value", "p-adj", *all_samples, "PeptideSequence"]]

Comment on lines +264 to +270

if not stats_df.empty:
    mask = stats_df["p-value"].notna()
    if mask.any():
        _, p_adj, _, _ = multipletests(stats_df.loc[mask, "p-value"], method="fdr_bh")
        stats_df.loc[mask, "p-adj"] = p_adj
    else:
        stats_df["p-adj"] = np.nan

⚠️ Potential issue | 🟡 Minor

Missing p-adj column when stats_df is empty.

If stats_df is empty (no proteins after groupby), the if not stats_df.empty block is skipped entirely, so the "p-adj" column is never created. The subsequent reference on line 296 would then raise a KeyError.

While unlikely given the upstream guards, it's a simple defensive fix:

Proposed fix
     if not stats_df.empty:
         mask = stats_df["p-value"].notna()
         if mask.any():
             _, p_adj, _, _ = multipletests(stats_df.loc[mask, "p-value"], method="fdr_bh")
             stats_df.loc[mask, "p-adj"] = p_adj
         else:
             stats_df["p-adj"] = np.nan
+    else:
+        stats_df["p-adj"] = np.nan
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    if not stats_df.empty:
-        mask = stats_df["p-value"].notna()
-        if mask.any():
-            _, p_adj, _, _ = multipletests(stats_df.loc[mask, "p-value"], method="fdr_bh")
-            stats_df.loc[mask, "p-adj"] = p_adj
-        else:
-            stats_df["p-adj"] = np.nan
+    if not stats_df.empty:
+        mask = stats_df["p-value"].notna()
+        if mask.any():
+            _, p_adj, _, _ = multipletests(stats_df.loc[mask, "p-value"], method="fdr_bh")
+            stats_df.loc[mask, "p-adj"] = p_adj
+        else:
+            stats_df["p-adj"] = np.nan
+    else:
+        stats_df["p-adj"] = np.nan
🤖 Prompt for AI Agents
In `@src/common/results_helpers.py` around lines 264 - 270, The current block that
computes adjusted p-values can skip creating the "p-adj" column when stats_df is
empty, causing a KeyError later; ensure "p-adj" always exists by initializing
stats_df["p-adj"] = np.nan when stats_df is empty (or pre-create it before the
not-empty check), then keep the existing logic that uses mask, multipletests and
assigns p_adj back to stats_df.loc[mask, "p-adj"] so the column is present
whether or not any rows exist.
