
refactor: decouple GO enrichment logic and improve data flow safety #10

Open
hjn0415a wants to merge 7 commits into OpenMS:main from hjn0415a:feature/go-term-in-execution

Conversation


hjn0415a commented Feb 10, 2026

Key Changes

  • Extracted the complex GO enrichment logic from the execution method into a dedicated internal method: _run_go_enrichment.
  • Replaced st.session_state["workspace"] with a local variable workspace_path.
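
A minimal sketch of the motivation behind the second bullet (the before/after shapes are illustrative; only the name workspace_path comes from this PR). st.session_state exists only in the Streamlit server process, so workflow code that may run in a subprocess should not read it directly:

from pathlib import Path
import streamlit as st

# Before (sketch): reading session state from workflow code breaks in a child
# process, where st.session_state is unavailable.
results_dir = Path(st.session_state["workspace"]) / "results"

# After (sketch): resolve the workspace once, in the Streamlit process, and
# pass the plain Path around instead.
workspace_path = Path(st.session_state["workspace"])
results_dir = workspace_path / "results"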

Summary by CodeRabbit

  • New Features

    • Added a Proteomics LFQ results page with protein abundance tables and integrated GO enrichment (BP/CC/MF) visualizations and tables.
  • Improvements

    • Volcano and PCA views now use FDR-adjusted p-values (p-adj) for ranking, filtering, labels, and plotting (see the sketch after this list).
    • Improved Windows compatibility and more robust local workflow cleanup.
  • Chores

    • Updated project dependencies.
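
Illustrative sketch of what the p-adj switch means for the volcano filtering (the p-adj and log2FC column names match the pivot output described in this review; the cutoffs mirror the p_cutoff/fc_cutoff defaults quoted later, and the label column is hypothetical):

import numpy as np

# pivot_df: stats table produced by results_helpers, with "p-adj" and "log2FC"
p_cutoff, fc_cutoff = 0.05, 1.0
significant = (pivot_df["p-adj"] < p_cutoff) & (pivot_df["log2FC"].abs() >= fc_cutoff)
pivot_df["hit"] = np.where(significant, "significant", "not significant")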


coderabbitai bot commented Feb 10, 2026

📝 Walkthrough

Adds a Proteomics LFQ results page and Streamlit module; implements GO enrichment in WorkflowTest (new _run_go_enrichment) using mygene and saves GO results as JSON; updates results helpers to compute p‑adj; makes WorkflowManager stop routine Windows‑compatible; adds dependencies and small UI/logic tweaks.

Changes

  • App registration & new page (app.py, content/results_proteomicslfq.py): Registers a "Proteomics LFQ" results page and adds a Streamlit page that loads abundance data, shows protein-level tables, persists pivot_df in session, and reads/renders go_results.json (BP/CC/MF) with conditional messaging.
  • Workflow & GO enrichment (src/WorkflowTest.py): Adds _run_go_enrichment(self, pivot_df, results_dir) and invokes it after ProteomicsLFQ quantification: UniProt mapping (mygene), GO term retrieval, Fisher exact tests per ontology, Plotly figure generation, and persistence of go_results.json and the figures. See the statistics sketch after this list.
  • Results processing / stats (src/common/results_helpers.py): Computes Benjamini–Hochberg adjusted p-values (p-adj) via statsmodels.stats.multitest and includes p-adj in stats and pivot outputs for downstream use.
  • Volcano / PCA UI tweaks (content/results_volcano.py, content/results_pca.py): Switches significance/filtering logic from p-value to p-adj across volcano and PCA displays; updates labels, metrics, plot hover data, and colors accordingly.
  • Process management (src/workflow/WorkflowManager.py): Adds Windows compatibility to the local workflow stop routine (uses taskkill on Windows), expands exception handling for PID cleanup, and guards pid-file unlinking.
  • Dependencies & minor edits (requirements.txt, content/workflow_run.py): Adds mygene and statsmodels to requirements and removes a stray blank line in content/workflow_run.py.
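
The statistics sketch referenced above — a minimal, self-contained illustration of a per-term Fisher exact test followed by Benjamini–Hochberg adjustment. The function name and count layout are illustrative, not the PR's actual code; only fisher_exact, multipletests, and the "fdr_bh" method appear in the review itself:

from scipy.stats import fisher_exact
from statsmodels.stats.multitest import multipletests

def enrich_terms(term_counts, fg_size, bg_size):
    """term_counts maps a GO term to (hits in foreground, hits in background)."""
    terms, pvals = [], []
    for term, (fg_hits, bg_hits) in term_counts.items():
        # 2x2 table: foreground vs. rest of background, with vs. without the term
        table = [
            [fg_hits, fg_size - fg_hits],
            [bg_hits - fg_hits, (bg_size - fg_size) - (bg_hits - fg_hits)],
        ]
        _, p = fisher_exact(table, alternative="greater")  # over-representation
        terms.append(term)
        pvals.append(p)
    # FDR correction across all tested terms (the same call results_helpers uses)
    _, p_adj, _, _ = multipletests(pvals, method="fdr_bh")
    return dict(zip(terms, p_adj))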

Sequence Diagram(s)

sequenceDiagram
    participant User as "User"
    participant UI as "Streamlit UI\n(results_proteomicslfq)"
    participant WF as "WorkflowTest"
    participant LFQ as "ProteomicsLFQ Module"
    participant MG as "mygene (External API)"
    participant Store as "Results Storage\n(go_results.json, figures)"

    User->>UI: open Proteomics LFQ page
    UI->>Store: request go_results.json
    alt go_results.json exists
        Store-->>UI: return go_results.json + figures
        UI->>UI: render BP/CC/MF tabs (plots + tables)
        UI-->>User: display GO results
    else no go_results.json
        UI-->>User: show info (no GO results)
    end

    Note over WF,LFQ: (separate workflow execution)
    WF->>LFQ: run quantification -> pivot_df
    WF->>WF: _run_go_enrichment(pivot_df, results_dir)
    WF->>MG: query UniProt / GO annotations
    MG-->>WF: return GO annotations
    WF->>WF: compute Fisher tests, build figures
    WF->>Store: write go_results.json and figures

Poem

🐰 I hopped through LFQ fields so bright,
Counted proteins by day and night,
BP, CC, MF in tidy rows,
Fisher found what garden shows,
Saved the treasures — what a sight! 🧪✨

🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 warnings
❌ Failed checks (2 warnings)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 45.45%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
  • Merge Conflict Detection — ⚠️ Warning: Merge conflicts detected (15 files):

⚔️ .github/workflows/build-windows-executable-app.yaml (content)
⚔️ .streamlit/config.toml (content)
⚔️ Dockerfile (content)
⚔️ app.py (content)
⚔️ content/results_pca.py (content)
⚔️ content/results_volcano.py (content)
⚔️ content/workflow_run.py (content)
⚔️ docker-compose.yml (content)
⚔️ requirements.txt (content)
⚔️ src/WorkflowTest.py (content)
⚔️ src/common/common.py (content)
⚔️ src/common/results_helpers.py (content)
⚔️ src/workflow/CommandExecutor.py (content)
⚔️ src/workflow/StreamlitUI.py (content)
⚔️ src/workflow/WorkflowManager.py (content)

These conflicts must be resolved before merging into main.
Resolve conflicts locally and push changes to this branch.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title accurately summarizes the main changes: refactoring GO enrichment logic into a dedicated _run_go_enrichment method and improving data-flow safety by replacing session-state reliance with local variables.


coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/WorkflowTest.py (1)

356-362: ⚠️ Potential issue | 🔴 Critical

results_dir is silently overwritten a few lines below its creation, causing downstream confusion.

Line 356 sets results_dir = Path(self.workflow_dir, "results"), then line 362 overwrites it with Path(self.workflow_dir, "input-files"). This is the root cause of the GO output path bug (line 825) and also makes the final report (line 834) show the wrong directory. Consider using a distinct variable name for the input-files path.

🐛 Proposed fix
-        results_dir = Path(self.workflow_dir, "input-files")
+        input_files_dir = Path(self.workflow_dir, "input-files")

Then update any downstream references that actually need the input-files path accordingly.
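
Sketched in context (the surrounding statements are inferred from the line references in this comment, not quoted from the file):

results_dir = Path(self.workflow_dir, "results")          # line 356: results path, kept intact
# ...
input_files_dir = Path(self.workflow_dir, "input-files")  # line 362: renamed; no longer clobbers results_dir
# ...
self._run_go_enrichment(pivot_df, results_dir)            # line 825: now receives the intended results path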

🤖 Fix all issues with AI agents
In `@content/results_proteomicslfq.py`:
- Around line 64-65: Replace the hardcoded construction of results_dir using
"topp-workflow" with the shared helper: import get_workflow_dir from
src.common.results_helpers (alongside get_abundance_data if present) and call
workflow_dir = get_workflow_dir(st.session_state["workspace"]); then build
results_dir = workflow_dir / "results" / "go-terms" (keep go_json_file =
results_dir / "go_results.json") and remove the hardcoded Path(...) use so the
file uses get_workflow_dir consistently.

In `@src/WorkflowTest.py`:
- Around line 984-985: The subprocess function execution() is attempting to
write to st.session_state (setting "go_results" and "go_ready") while running
under WorkflowManager.workflow_process, but Streamlit's session_state isn't
available in child processes; remove these dead writes and rely on the existing
JSON export that already persists go_results/go_data (the persistence logic
surrounding go_results, go_data and the JSON export) so the subprocess only
writes to the JSON and does not touch st.session_state.
- Around line 817-825: The bug is that results_dir (intended to point at the
"results" folder) is overwritten with "input-files" before being passed to
_run_go_enrichment, causing GO output to be written to the wrong location; fix
by ensuring results_dir is set to self.workflow_dir / "results" (and not
overwritten) before calling self._run_go_enrichment(pivot_df, results_dir) —
locate where results_dir is assigned and where it is reassigned (near uses of
self.workflow_dir and the call site of _run_go_enrichment) and remove or change
the reassignment to preserve the "results" path so the GO output is written to
workspace/.../results/go-terms/go_results.json where the UI expects it.
- Around line 872-874: mg.querymany(bg_ids, ...) performs an external
MyGene.info HTTP call and must be protected from network hangs; wrap the call to
MyGeneInfo.querymany in a try/except and implement a retry loop with exponential
backoff (or use a retry helper) that catches network-related exceptions (e.g.,
requests exceptions, ConnectionError, timeout) and handles timeouts by passing a
timeout to the request if supported or by failing after N retries—update the
code around the mg.querymany call (identifying mg.querymany and bg_ids) to log
the error and either return a sensible fallback (empty res_list) or re-raise a
controlled exception so the workflow doesn't hang indefinitely.
🧹 Nitpick comments (10)
requirements.txt (1)

148-148: Consider pinning mygene to a specific version.

The auto-generated section above pins every dependency, but the manually-added block (lines 139–148) leaves versions open. For reproducible builds, consider pinning mygene (e.g., mygene>=3.2.2,<4). This is a pre-existing pattern for the other manual entries too, so not blocking.

src/workflow/WorkflowManager.py (1)

207-212: Use subprocess.run instead of os.system for process termination.

While pid is always an integer here (validated by int() on line 206), os.system launches a shell unnecessarily. subprocess.run avoids the shell and the static analysis warning (S605).

♻️ Proposed fix
-                # Windows
-                if platform.system() == "Windows":
-                    os.system(f"taskkill /F /T /PID {pid}")
-                else:
-                    # Linux/macOS
-                    os.kill(pid, signal.SIGTERM)
+                # Windows
+                if platform.system() == "Windows":
+                    import subprocess
+                    subprocess.run(
+                        ["taskkill", "/F", "/T", "/PID", str(pid)],
+                        check=False,
+                    )
+                else:
+                    # Linux/macOS
+                    os.kill(pid, signal.SIGTERM)
content/results_proteomicslfq.py (2)

4-4: Unused import: numpy.

numpy is imported but not used anywhere in this file.

♻️ Proposed fix
-import numpy as np

67-74: No error handling for malformed JSON.

If go_results.json is corrupted or has an unexpected structure, json.load or the subsequent key accesses (lines 88-89) will raise unhandled exceptions, crashing the page. Wrap in a try/except to degrade gracefully.

🛡️ Proposed fix
 if not go_json_file.exists():
     st.info("GO Enrichment results are not available yet. Please run the analysis first.")
 else:
     import json
     import plotly.io as pio
-    
-    with open(go_json_file, "r") as f:
-        go_data = json.load(f)
+
+    try:
+        with open(go_json_file, "r") as f:
+            go_data = json.load(f)
+    except (json.JSONDecodeError, OSError) as e:
+        st.error(f"Failed to load GO results: {e}")
+        st.stop()
src/WorkflowTest.py (6)

829-834: Duplicate step numbering: two "5️⃣" sections.

Lines 818–819 label the GO enrichment as step 5️⃣, and lines 829–830 also label the final report as step 5️⃣. Renumber one of them (e.g., final report → 6️⃣).


875-877: Use ~ operator instead of != True for pandas boolean filtering.

res_go["notfound"] != True works but is flagged by linters (E712) and is non-idiomatic for pandas. The notfound column may contain True or NaN; using ~ with fillna handles both correctly.

♻️ Proposed fix
-                    if "notfound" in res_go.columns:
-                        res_go = res_go[res_go["notfound"] != True]
+                    if "notfound" in res_go.columns:
+                        res_go = res_go[~res_go["notfound"].fillna(False).astype(bool)]

838-846: _run_go_enrichment returns None on the error path without logging success — consider early return pattern.

When analysis_df is empty (line 844), the method logs an error and falls through. The deep nesting (4+ levels of indentation) throughout this method makes the logic hard to follow. Consider using early returns to flatten the structure.

♻️ Sketch of flattened structure
     def _run_go_enrichment(self, pivot_df: pd.DataFrame, results_dir: Path):
         p_cutoff = 0.05
         fc_cutoff = 1.0
 
         analysis_df = pivot_df.dropna(subset=["p-value", "log2FC"]).copy()
 
-        if analysis_df.empty:
-            st.error("No valid statistical data found for GO enrichment.")
-            self.logger.log("❗ analysis_df is empty")
-        else:
-            with st.spinner("Fetching GO terms from MyGene.info API..."):
-                # ... 140+ lines deeply nested ...
+        if analysis_df.empty:
+            st.error("No valid statistical data found for GO enrichment.")
+            self.logger.log("❗ analysis_df is empty")
+            return
+
+        with st.spinner("Fetching GO terms from MyGene.info API..."):
+            # ... rest of logic at one less indentation level ...

893-895: Remove extraneous f prefixes from strings without placeholders.

Lines 895, 944, and 963 use f-strings but contain no {} expressions. This is flagged by Ruff (F541).

♻️ Proposed fix
-                    self.logger.log(f"✅ fg_set bg_set are set")
+                    self.logger.log("✅ fg_set bg_set are set")
-                        self.logger.log(f"✅ Plotly Figure generated")
+                        self.logger.log("✅ Plotly Figure generated")
-                    self.logger.log(f"✅ go_type generated")
+                    self.logger.log("✅ go_type generated")

Also applies to: 943-944, 962-963


9-9: page_setup is imported (line 14) but only referenced in commented-out code (line 19).

Either remove the import or uncomment line 19 if it's needed.

♻️ Proposed fix
-from src.common.common import page_setup

Also applies to: 14-14, 19-19


932-932: Replacing exact zeros in p-values with 1e-10 may mask data issues.

df["p_value"].replace(0, 1e-10) silently changes exact-zero p-values before the -log10 transform. A p-value of exactly 0.0 from Fisher's exact test can indicate a numerical underflow for very significant terms. Consider using np.finfo(float).tiny or sys.float_info.min for a more principled floor, or at minimum documenting this choice.

Comment on lines +64 to +65
results_dir = Path(st.session_state["workspace"]) / "topp-workflow" / "results" / "go-terms"
go_json_file = results_dir / "go_results.json"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Scripts executed (repository: OpenMS/quantms-web):

# Search for all references to "topp-workflow" in the codebase
rg -n "topp-workflow" --type=py

# Check the actual file and surrounding context
find . -name "results_proteomicslfq.py" -type f

# Find and examine WorkflowManager and WorkflowTest to understand the naming convention
fd -e py -type f | xargs grep -l "class WorkflowManager\|class WorkflowTest" | head -5

# Examine the results_helpers.py file to see the helper function
cat -n src/common/results_helpers.py

# Search for WorkflowManager and WorkflowTest class definitions
rg -n "class WorkflowManager|class WorkflowTest" --type py -A 10

# Check if there are other imports or usages of results_helpers in content files
rg -n "from.*results_helpers|import.*results_helpers" --type py


Use the centralized get_workflow_dir() helper instead of hardcoding the path.

Line 64 hardcodes "topp-workflow", but src/common/results_helpers.py provides get_workflow_dir(workspace) for this purpose. Other result files (e.g., results_abundance.py, results_library.py, results_filtered.py) already import and use this helper. For consistency and maintainability, import get_workflow_dir and replace the hardcoded path:

from src.common.results_helpers import get_abundance_data, get_workflow_dir

Then on line 64:

workflow_dir = get_workflow_dir(st.session_state["workspace"])
results_dir = workflow_dir / "results" / "go-terms"
🤖 Prompt for AI Agents
In `@content/results_proteomicslfq.py` around lines 64 - 65, Replace the hardcoded
construction of results_dir using "topp-workflow" with the shared helper: import
get_workflow_dir from src.common.results_helpers (alongside get_abundance_data
if present) and call workflow_dir =
get_workflow_dir(st.session_state["workspace"]); then build results_dir =
workflow_dir / "results" / "go-terms" (keep go_json_file = results_dir /
"go_results.json") and remove the hardcoded Path(...) use so the file uses
get_workflow_dir consistently.

Comment on lines +872 to +874

res_list = mg.querymany(
    bg_ids, scopes="uniprot", fields="go", as_dataframe=False
)

⚠️ Potential issue | 🟠 Major

mg.querymany makes an external API call with no timeout or retry — network failure will crash the workflow.

mygene.MyGeneInfo().querymany() calls the MyGene.info REST API. If the service is slow or unreachable, this will block indefinitely and potentially leave the workflow hung. Consider wrapping this in a try/except with a timeout or retry strategy.

🛡️ Proposed fix — add error handling
-                    res_list = mg.querymany(
-                        bg_ids, scopes="uniprot", fields="go", as_dataframe=False
-                    )
+                    try:
+                        res_list = mg.querymany(
+                            bg_ids, scopes="uniprot", fields="go", as_dataframe=False
+                        )
+                    except Exception as e:
+                        self.logger.log(f"❗ MyGene API call failed: {e}")
+                        st.warning("GO enrichment skipped: failed to fetch GO terms from MyGene.info.")
+                        return
🤖 Prompt for AI Agents
In `@src/WorkflowTest.py` around lines 872 - 874, mg.querymany(bg_ids, ...)
performs an external MyGene.info HTTP call and must be protected from network
hangs; wrap the call to MyGeneInfo.querymany in a try/except and implement a
retry loop with exponential backoff (or use a retry helper) that catches
network-related exceptions (e.g., requests exceptions, ConnectionError, timeout)
and handles timeouts by passing a timeout to the request if supported or by
failing after N retries—update the code around the mg.querymany call
(identifying mg.querymany and bg_ids) to log the error and either return a
sensible fallback (empty res_list) or re-raise a controlled exception so the
workflow doesn't hang indefinitely.
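
A minimal retry-with-backoff sketch along the lines the agent prompt describes (the wrapper name, retry count, and delays are illustrative; mygene raises ordinary exceptions on network failure):

import time

def querymany_with_retry(mg, bg_ids, max_retries=3, base_delay=2.0):
    # Hypothetical wrapper: retry mg.querymany with exponential backoff.
    for attempt in range(1, max_retries + 1):
        try:
            return mg.querymany(bg_ids, scopes="uniprot", fields="go", as_dataframe=False)
        except Exception as exc:
            if attempt == max_retries:
                # Fail loudly after N attempts instead of hanging the workflow.
                raise RuntimeError(f"MyGene.info query failed after {max_retries} attempts") from exc
            time.sleep(base_delay * 2 ** (attempt - 1))  # 2 s, 4 s, 8 s, ...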

Comment on lines +984 to +985
st.session_state["go_results"] = go_results
st.session_state["go_ready"] = True if go_data else False

⚠️ Potential issue | 🟡 Minor

Storing Plotly figures and flags in st.session_state from a workflow subprocess.

execution() runs inside multiprocessing.Process (see WorkflowManager.workflow_process). Streamlit's st.session_state is not available in child processes — these writes will either silently fail or raise an error. The JSON file export (lines 981-983) is the correct persistence mechanism. These session state writes are dead code in the subprocess context.

♻️ Proposed fix — remove dead session state writes
                     with open(go_json_file, "w") as f:
                         json.dump(go_data, f)
-                    st.session_state["go_results"] = go_results
-                    st.session_state["go_ready"] = True if go_data else False
                     self.logger.log("✅ GO enrichment analysis complete")
🤖 Prompt for AI Agents
In `@src/WorkflowTest.py` around lines 984 - 985, The subprocess function
execution() is attempting to write to st.session_state (setting "go_results" and
"go_ready") while running under WorkflowManager.workflow_process, but
Streamlit's session_state isn't available in child processes; remove these dead
writes and rely on the existing JSON export that already persists
go_results/go_data (the persistence logic surrounding go_results, go_data and
the JSON export) so the subprocess only writes to the JSON and does not touch
st.session_state.

coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/WorkflowTest.py`:
- Line 942: In the calls to self.logger.log that currently use f-strings with no
interpolation (e.g., self.logger.log(f"✅ Plotly Figure generated") and the
similar call around line 961), remove the unnecessary f prefix so the string is
a plain literal (self.logger.log("✅ Plotly Figure generated")), ensuring no
functional change and eliminating the F541 lint warning; search for other
self.logger.log(...) occurrences in WorkflowTest.py and convert any other
non-interpolated f-strings the same way.
- Around line 875-888: Replace the boolean Series comparison and lambda capture
to silence the linter: change the filtering res_go = res_go[res_go["notfound"]
!= True] to use the bitwise not on the boolean Series (e.g., res_go =
res_go[~res_go["notfound"]]) and update the loop that assigns GO term columns so
extract_go_terms is bound directly in a small helper function or use a defaulted
lambda to capture go_type (e.g., define a local function get_terms(x,
gt=go_type): return extract_go_terms(x, gt) and use that in
res_go["{go_type}_terms".format(go_type=go_type)] =
res_go["go"].apply(get_terms)); also remove the unnecessary f-prefix for string
literals when building column names (use normal string formatting or .format
instead of f-strings with no placeholders).
🧹 Nitpick comments (2)
src/WorkflowTest.py (2)

966-966: Move import json to the top of the file.

Inline imports hurt readability and are non-idiomatic. json is a stdlib module with no import cost concern.

♻️ Proposed fix

Add at the top of the file with other imports:

import json

Then remove line 966.


836-985: Large method with deeply nested logic — consider flattening.

_run_go_enrichment is ~150 lines with 4+ levels of nesting (if/else → with spinner → if/else → for loop). The inner run_go function (lines 895-950) is well-extracted, but the overall method would benefit from early returns to reduce nesting depth. For example, the else block starting at line 845 could be inverted:

if analysis_df.empty:
    st.error("No valid statistical data found for GO enrichment.")
    self.logger.log("❗ analysis_df is empty")
    return
# ... rest of method at top level

This is a readability improvement and not urgent.

Comment on lines +875 to +888

res_go = res_go[res_go["notfound"] != True]

def extract_go_terms(go_data, go_type):
    if not isinstance(go_data, dict) or go_type not in go_data:
        return []
    terms = go_data[go_type]
    if isinstance(terms, dict):
        terms = [terms]
    return list({t.get("term") for t in terms if "term" in t})

for go_type in ["BP", "CC", "MF"]:
    res_go[f"{go_type}_terms"] = res_go["go"].apply(
        lambda x: extract_go_terms(x, go_type)
    )

⚠️ Potential issue | 🟡 Minor

Fix lint issues flagged by static analysis.

Three issues in this block:

  1. Line 875 (E712): != True on a pandas Series — use ~ operator instead.
  2. Line 887 (B023): Lambda captures loop variable go_type by reference. It's safe here since .apply() consumes it immediately, but the fix is trivial and silences the warning.
  3. Lines 893 (F541): f-string without placeholders — remove the f prefix.
🔧 Proposed fix
                     if "notfound" in res_go.columns:
-                        res_go = res_go[res_go["notfound"] != True]
+                        res_go = res_go[~res_go["notfound"].fillna(False)]
 
                     def extract_go_terms(go_data, go_type):
                         if not isinstance(go_data, dict) or go_type not in go_data:
                             return []
                         terms = go_data[go_type]
                         if isinstance(terms, dict):
                             terms = [terms]
                         return list({t.get("term") for t in terms if "term" in t})
 
                     for go_type in ["BP", "CC", "MF"]:
                         res_go[f"{go_type}_terms"] = res_go["go"].apply(
-                            lambda x: extract_go_terms(x, go_type)
+                            lambda x, _gt=go_type: extract_go_terms(x, _gt)
                         )
 
                     annotated_ids = set(res_go["query"].astype(str))
                     fg_set = annotated_ids.intersection(fg_ids)
                     bg_set = annotated_ids
-                    self.logger.log(f"✅ fg_set bg_set are set")
+                    self.logger.log("✅ fg_set bg_set are set")
🧰 Tools
🪛 Ruff (0.14.14)

[error] 875-875: Avoid inequality comparisons to True; use not res_go["notfound"]: for false checks

Replace with not res_go["notfound"]

(E712)


[warning] 887-887: Function definition does not bind loop variable go_type

(B023)

🤖 Prompt for AI Agents
In `@src/WorkflowTest.py` around lines 875 - 888, Replace the boolean Series
comparison and lambda capture to silence the linter: change the filtering res_go
= res_go[res_go["notfound"] != True] to use the bitwise not on the boolean
Series (e.g., res_go = res_go[~res_go["notfound"]]) and update the loop that
assigns GO term columns so extract_go_terms is bound directly in a small helper
function or use a defaulted lambda to capture go_type (e.g., define a local
function get_terms(x, gt=go_type): return extract_go_terms(x, gt) and use that
in res_go["{go_type}_terms".format(go_type=go_type)] =
res_go["go"].apply(get_terms)); also remove the unnecessary f-prefix for string
literals when building column names (use normal string formatting or .format
instead of f-strings with no placeholders).

title=f"GO Enrichment ({go_type})",
)

self.logger.log(f"✅ Plotly Figure generated")

⚠️ Potential issue | 🟡 Minor

Remove extraneous f prefixes (F541).

These f-strings have no interpolation placeholders.

🔧 Proposed fix
-                        self.logger.log(f"✅ Plotly Figure generated")
+                        self.logger.log("✅ Plotly Figure generated")
-                    self.logger.log(f"✅ go_type generated")
+                    self.logger.log("✅ go_type generated")

Also applies to: 961-961

🧰 Tools
🪛 Ruff (0.14.14)

[error] 942-942: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents
In `@src/WorkflowTest.py` at line 942, In the calls to self.logger.log that
currently use f-strings with no interpolation (e.g., self.logger.log(f"✅ Plotly
Figure generated") and the similar call around line 961), remove the unnecessary
f prefix so the string is a plain literal (self.logger.log("✅ Plotly Figure
generated")), ensuring no functional change and eliminating the F541 lint
warning; search for other self.logger.log(...) occurrences in WorkflowTest.py
and convert any other non-interpolated f-strings the same way.

coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
content/results_pca.py (1)

46-46: ⚠️ Potential issue | 🟡 Minor

Stale message text: still references "p-value" instead of "p-adj".

Line 46 says "Not enough proteins after p-value filtering" but the filtering now uses p-adj.

Proposed fix
-    st.info("Not enough proteins after p-value filtering for PCA.")
+    st.info("Not enough proteins after p-adj filtering for PCA.")
🤖 Fix all issues with AI agents
In `@src/common/results_helpers.py`:
- Around line 264-270: The current block that computes adjusted p-values can
skip creating the "p-adj" column when stats_df is empty, causing a KeyError
later; ensure "p-adj" always exists by initializing stats_df["p-adj"] = np.nan
when stats_df is empty (or pre-create it before the not-empty check), then keep
the existing logic that uses mask, multipletests and assigns p_adj back to
stats_df.loc[mask, "p-adj"] so the column is present whether or not any rows
exist.
🧹 Nitpick comments (1)
src/common/results_helpers.py (1)

296-296: Nit: prefer iterable unpacking over concatenation (Ruff RUF005).

Proposed fix
-    pivot_df = pivot_df[["ProteinName", "log2FC", "p-value", "p-adj"] + all_samples + ["PeptideSequence"]]
+    pivot_df = pivot_df[["ProteinName", "log2FC", "p-value", "p-adj", *all_samples, "PeptideSequence"]]

Comment on lines +264 to +270

if not stats_df.empty:
    mask = stats_df["p-value"].notna()
    if mask.any():
        _, p_adj, _, _ = multipletests(stats_df.loc[mask, "p-value"], method="fdr_bh")
        stats_df.loc[mask, "p-adj"] = p_adj
    else:
        stats_df["p-adj"] = np.nan

⚠️ Potential issue | 🟡 Minor

Missing p-adj column when stats_df is empty.

If stats_df is empty (no proteins after groupby), the if not stats_df.empty block is skipped entirely, so the "p-adj" column is never created. The subsequent reference on line 296 would then raise a KeyError.

While unlikely given the upstream guards, it's a simple defensive fix:

Proposed fix
     if not stats_df.empty:
         mask = stats_df["p-value"].notna()
         if mask.any():
             _, p_adj, _, _ = multipletests(stats_df.loc[mask, "p-value"], method="fdr_bh")
             stats_df.loc[mask, "p-adj"] = p_adj
         else:
             stats_df["p-adj"] = np.nan
+    else:
+        stats_df["p-adj"] = np.nan
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    if not stats_df.empty:
-        mask = stats_df["p-value"].notna()
-        if mask.any():
-            _, p_adj, _, _ = multipletests(stats_df.loc[mask, "p-value"], method="fdr_bh")
-            stats_df.loc[mask, "p-adj"] = p_adj
-        else:
-            stats_df["p-adj"] = np.nan
+    if not stats_df.empty:
+        mask = stats_df["p-value"].notna()
+        if mask.any():
+            _, p_adj, _, _ = multipletests(stats_df.loc[mask, "p-value"], method="fdr_bh")
+            stats_df.loc[mask, "p-adj"] = p_adj
+        else:
+            stats_df["p-adj"] = np.nan
+    else:
+        stats_df["p-adj"] = np.nan
🤖 Prompt for AI Agents
In `@src/common/results_helpers.py` around lines 264 - 270, The current block that
computes adjusted p-values can skip creating the "p-adj" column when stats_df is
empty, causing a KeyError later; ensure "p-adj" always exists by initializing
stats_df["p-adj"] = np.nan when stats_df is empty (or pre-create it before the
not-empty check), then keep the existing logic that uses mask, multipletests and
assigns p_adj back to stats_df.loc[mask, "p-adj"] so the column is present
whether or not any rows exist.
