Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds an ALL_MODE scanner constant and propagates it through scanner selection, KB activation, merging, and reporting; implements KB reachability retries and HTTPError handling; introduces KB reference assembly/injection into reports; extends merge_results with excluded_files; adds kb_origin_url and kb_evidence to SourceItem. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/fosslight_source/cli.py (1)
265-287: Unreachable code at line 287.The
return Falseat line 287 is dead code. All control flow paths within the loop either returnTrue(on success or HTTPError) or returnFalseinside the except blocks whenattempt == 2. The loop can never complete naturally without returning.🧹 Remove unreachable return statement
except Exception as e: logger.debug(f"Unexpected error checking KB server: {e}") if attempt < 2: time.sleep(1) else: return False - return False🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fosslight_source/cli.py` around lines 265 - 287, The final "return False" in check_kb_server_reachable is unreachable because the except blocks return on the last attempt; change those inner "else: return False" branches to "break" (or set a flag) so the loop can exit naturally, then keep a single "return False" after the loop to handle failure; update references within check_kb_server_reachable (the except blocks handling urllib.error.URLError and generic Exception, and the attempt retry logic) accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/fosslight_source/cli.py`:
- Around line 265-287: The final "return False" in check_kb_server_reachable is
unreachable because the except blocks return on the last attempt; change those
inner "else: return False" branches to "break" (or set a flag) so the loop can
exit naturally, then keep a single "return False" after the loop to handle
failure; update references within check_kb_server_reachable (the except blocks
handling urllib.error.URLError and generic Exception, and the attempt retry
logic) accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ce86a3dc-c565-4d15-b7bb-0940d95597b4
📒 Files selected for processing (1)
src/fosslight_source/cli.py
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/fosslight_source/cli.py (1)
229-241:⚠️ Potential issue | 🟠 Major
kb_referenceis incorrectly gated by-m.The new sheet is built only under
if need_license, butneed_licenseis tied to matched-text output. A default--scanner kb/--scanner allrun without-mwill never emit the KB reference sheet even whenmerged_resultcontains KB URLs.Proposed fix
if need_license: if selected_scanner == 'scancode': sheet_list["scancode_reference"] = get_license_list_to_print(license_list) elif selected_scanner == 'scanoss': sheet_list["scanoss_reference"] = get_scanoss_extra_info(scanoss_result) - elif selected_scanner == 'kb': - kb_ref = get_kb_reference_to_print(merged_result) - sheet_list["kb_reference"] = kb_ref else: sheet_list["scancode_reference"] = get_license_list_to_print(license_list) sheet_list["scanoss_reference"] = get_scanoss_extra_info(scanoss_result) - kb_ref = get_kb_reference_to_print(merged_result) - sheet_list["kb_reference"] = kb_ref + + if selected_scanner in ['kb', ALL_MODE]: + sheet_list["kb_reference"] = get_kb_reference_to_print(merged_result)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fosslight_source/cli.py` around lines 229 - 241, The KB reference sheet is only being created inside the need_license branch, so runs with --scanner kb or --scanner all without -m never get kb_reference even when merged_result has KB URLs; update the logic around selected_scanner so that get_kb_reference_to_print(merged_result) is called and sheet_list["kb_reference"] is set whenever selected_scanner == 'kb' or selected_scanner == 'all' (or the default branch for all scanners), independent of need_license, while leaving the existing license/scanoss logic under need_license unchanged.
🧹 Nitpick comments (1)
src/fosslight_source/cli.py (1)
311-315: Avoid mutable defaults inmerge_results().The function declares mutable defaults (
[]and{}), which is a Python anti-pattern. Although the current call site at line 469 explicitly passes all arguments, these defaults should be replaced withNoneto follow best practices and prevent accidental state bleeding if the function is called differently in the future.Proposed fix
def merge_results( - scancode_result: list = [], scanoss_result: list = [], spdx_downloads: dict = {}, + scancode_result: list = None, scanoss_result: list = None, spdx_downloads: dict = None, - path_to_scan: str = "", run_kb: bool = False, manifest_licenses: dict = {}, + path_to_scan: str = "", run_kb: bool = False, manifest_licenses: dict = None, excluded_files: set = None ) -> list: + if scancode_result is None: + scancode_result = [] + if scanoss_result is None: + scanoss_result = [] + if spdx_downloads is None: + spdx_downloads = {} + if manifest_licenses is None: + manifest_licenses = {} if excluded_files is None: excluded_files = set()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fosslight_source/cli.py` around lines 311 - 315, Replace the mutable default arguments in merge_results (scancode_result, scanoss_result, spdx_downloads, manifest_licenses) with None and initialize them inside the function (e.g., if scancode_result is None: scancode_result = []) to avoid shared-state bugs; keep path_to_scan and run_kb as-is and preserve the excluded_files default if intended, but ensure all mutable parameters are reliably initialized inside merge_results before use.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/fosslight_source/_scan_item.py`:
- Around line 187-190: The code currently overwrites a missing API score by
setting kb_match_pct to "100"; instead, preserve the API's original response by
assigning kb_match_pct directly from _get_origin_url_from_md5_hash without
defaulting to "100". Update the branch that sets self.kb_origin_url and
self.kb_match_pct (references: _get_origin_url_from_md5_hash, kb_origin_url,
kb_match_pct) so that kb_match_pct is set to match_pct as-is (allowing
empty/None) or left unset, rather than substituting "100".
In `@src/fosslight_source/cli.py`:
- Around line 212-214: The cover sheet is using a fresh
check_kb_server_reachable() call instead of the actual decision made earlier (so
it can mismatch merge_results()); stop re-pinging and instead thread the boolean
that indicates whether KB lookup actually ran (e.g., kb_used) from the code that
resolved KB reachability and call site that invokes create_report_file(), then
inside create_report_file() use that kb_used value to set
scan_item.set_cover_comment("KB Enabled" if kb_used else "KB Unreachable")
rather than calling check_kb_server_reachable(); update the call sites of
create_report_file() to accept the new kb_used parameter and remove the
redundant reachability call near run_kb/scan_item.set_cover_comment.
---
Outside diff comments:
In `@src/fosslight_source/cli.py`:
- Around line 229-241: The KB reference sheet is only being created inside the
need_license branch, so runs with --scanner kb or --scanner all without -m never
get kb_reference even when merged_result has KB URLs; update the logic around
selected_scanner so that get_kb_reference_to_print(merged_result) is called and
sheet_list["kb_reference"] is set whenever selected_scanner == 'kb' or
selected_scanner == 'all' (or the default branch for all scanners), independent
of need_license, while leaving the existing license/scanoss logic under
need_license unchanged.
---
Nitpick comments:
In `@src/fosslight_source/cli.py`:
- Around line 311-315: Replace the mutable default arguments in merge_results
(scancode_result, scanoss_result, spdx_downloads, manifest_licenses) with None
and initialize them inside the function (e.g., if scancode_result is None:
scancode_result = []) to avoid shared-state bugs; keep path_to_scan and run_kb
as-is and preserve the excluded_files default if intended, but ensure all
mutable parameters are reliably initialized inside merge_results before use.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b8092cdf-4bff-4aa8-9f03-9ca3d11ddc93
📒 Files selected for processing (2)
src/fosslight_source/_scan_item.pysrc/fosslight_source/cli.py
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (3)
src/fosslight_source/cli.py (3)
318-322: Avoid mutable default arguments.Per static analysis (B006), using mutable defaults like
[]and{}can lead to unexpected behavior if the defaults are modified. While the current code appears safe, this is a maintenance hazard.Proposed fix
def merge_results( - scancode_result: list = [], scanoss_result: list = [], spdx_downloads: dict = {}, - path_to_scan: str = "", run_kb: bool = False, manifest_licenses: dict = {}, - excluded_files: set = None + scancode_result: list = None, scanoss_result: list = None, spdx_downloads: dict = None, + path_to_scan: str = "", run_kb: bool = False, manifest_licenses: dict = None, + excluded_files: set = None ) -> list: + if scancode_result is None: + scancode_result = [] + if scanoss_result is None: + scanoss_result = [] + if spdx_downloads is None: + spdx_downloads = {} + if manifest_licenses is None: + manifest_licenses = {} if excluded_files is None: excluded_files = set()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fosslight_source/cli.py` around lines 318 - 322, The function merge_results currently uses mutable default arguments (scancode_result, scanoss_result, spdx_downloads, manifest_licenses) which can cause state leakage; change their signatures to use None defaults and inside merge_results initialize them to empty structures if None (e.g., if scancode_result is None: scancode_result = []; same for scanoss_result and manifest_licenses with dicts for spdx_downloads and manifest_licenses), leaving excluded_files as None or converting to an empty set inside if needed; update references to these variables in merge_results accordingly to avoid mutating shared defaults.
400-400: Mutable default argumentpath_to_exclude=[].Same concern as
merge_results()- mutable default arguments should be avoided.Proposed fix
- selected_scanner: str = ALL_MODE, path_to_exclude: list = [], + selected_scanner: str = ALL_MODE, path_to_exclude: list = None,Then add at the start of the function:
if path_to_exclude is None: path_to_exclude = []🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fosslight_source/cli.py` at line 400, The function signature currently uses a mutable default for path_to_exclude (path_to_exclude: list = []); change the default to None (e.g., path_to_exclude: Optional[list] = None) and at the start of the function (the same function that declares selected_scanner and path_to_exclude) add a guard: if path_to_exclude is None: path_to_exclude = [] to avoid shared mutable state; follow the same pattern used for merge_results and update any type hints/imports as needed.
287-292: Catching bareExceptionmay hide unexpected errors.Per static analysis (BLE001), catching blind
Exceptioncan mask programming errors. Consider catching more specific exceptions or at minimum logging the exception type.Proposed improvement
- except Exception as e: - logger.debug(f"Unexpected error checking KB server: {e}") + except (OSError, ValueError) as e: + logger.debug(f"Error checking KB server: {type(e).__name__}: {e}")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fosslight_source/cli.py` around lines 287 - 292, Replace the broad "except Exception as e" in the KB server check block with more specific exception handling: catch network/IO related exceptions (e.g., requests.exceptions.RequestException, socket.error, TimeoutError) and handle them the same way (use logger to record full exception type/message), and only fall back to a generic except Exception as e that re-raises after logging if truly needed; update the import list to include the specific exception classes and use logger.exception or logger.debug to include exception type and stack trace while preserving the existing retry logic that uses attempt, time.sleep, and return False.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/fosslight_source/cli.py`:
- Around line 216-217: When selected_scanner == ALL_MODE the display string is
built from SCANNER_TYPE and incorrectly includes the 'all' entry; update the
logic that sets display_mode (the block using selected_scanner, ALL_MODE,
display_mode, and SCANNER_TYPE) to exclude the ALL_MODE value from the list
before joining (e.g., filter SCANNER_TYPE to skip entries equal to ALL_MODE) so
the output lists only actual scanner names.
- Around line 296-315: get_kb_reference_to_print builds rows with 3 columns but
KB_REFERENCE_HEADER has 4 columns, causing misalignment; update
get_kb_reference_to_print to include the missing ID column for each data row
(e.g., use an explicit identifier from the SourceItem like item.id or generate a
stable index with enumerate) so each row matches KB_REFERENCE_HEADER's ['ID',
'Source Path', 'KB Origin URL', 'Match %']; ensure the ID value is added as the
first element in the list comprehension that builds data and maintain sorting by
item.source_name_or_path.
- Around line 271-293: The final "return False" at the end of
check_kb_server_reachable is dead code because every path inside the
three-attempt loop already returns; remove that unreachable return (or
alternatively convert the loop to while/for and place a single return False
after the loop), updating only the check_kb_server_reachable function and
leaving KB_URL/logging logic unchanged.
---
Nitpick comments:
In `@src/fosslight_source/cli.py`:
- Around line 318-322: The function merge_results currently uses mutable default
arguments (scancode_result, scanoss_result, spdx_downloads, manifest_licenses)
which can cause state leakage; change their signatures to use None defaults and
inside merge_results initialize them to empty structures if None (e.g., if
scancode_result is None: scancode_result = []; same for scanoss_result and
manifest_licenses with dicts for spdx_downloads and manifest_licenses), leaving
excluded_files as None or converting to an empty set inside if needed; update
references to these variables in merge_results accordingly to avoid mutating
shared defaults.
- Line 400: The function signature currently uses a mutable default for
path_to_exclude (path_to_exclude: list = []); change the default to None (e.g.,
path_to_exclude: Optional[list] = None) and at the start of the function (the
same function that declares selected_scanner and path_to_exclude) add a guard:
if path_to_exclude is None: path_to_exclude = [] to avoid shared mutable state;
follow the same pattern used for merge_results and update any type hints/imports
as needed.
- Around line 287-292: Replace the broad "except Exception as e" in the KB
server check block with more specific exception handling: catch network/IO
related exceptions (e.g., requests.exceptions.RequestException, socket.error,
TimeoutError) and handle them the same way (use logger to record full exception
type/message), and only fall back to a generic except Exception as e that
re-raises after logging if truly needed; update the import list to include the
specific exception classes and use logger.exception or logger.debug to include
exception type and stack trace while preserving the existing retry logic that
uses attempt, time.sleep, and return False.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d996ecbd-675a-4ba9-b3f5-352adedab326
📒 Files selected for processing (2)
src/fosslight_source/_scan_item.pysrc/fosslight_source/cli.py
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/fosslight_source/_scan_item.py (1)
108-125:⚠️ Potential issue | 🟠 MajorReturn KB evidence with the URL.
This helper only returns the origin URL, but the new
kb_referencesheet also exportskb_evidence. Since the request can includewfp_base64,set_oss_item()currently has to hardcode"exact_match"for every hit, which mislabels snippet/WFP-based matches. Please return the evidence from the KB response here and propagate that value instead of synthesizing it later.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fosslight_source/_scan_item.py` around lines 108 - 125, The helper _get_origin_url_from_md5_hash currently only extracts and returns the origin URL from the KB response; update it to also extract the KB evidence field from the response (e.g., from data['output'] or the returned object) and return both values (e.g., as a tuple or small dict) so callers can use the actual evidence instead of synthesizing "exact_match"; then update set_oss_item() to accept and propagate the returned evidence value into the kb_evidence/kb_reference export path rather than hardcoding "exact_match".src/fosslight_source/cli.py (1)
229-244:⚠️ Potential issue | 🟠 MajorDon't hide
kb_referencebehind-m.
kb_referenceis unrelated to matched-license text, but this block only adds it whenneed_licenseis true. In--scanner kborallruns without-m, the new sheet never gets emitted. Move the KB sheet generation outside theneed_licensegate and keep that flag only forscancode_reference/scanoss_reference.💡 Possible split
if need_license: if selected_scanner == 'scancode': sheet_list["scancode_reference"] = get_license_list_to_print(license_list) elif selected_scanner == 'scanoss': sheet_list["scanoss_reference"] = get_scanoss_extra_info(scanoss_result) - elif selected_scanner == 'kb': - kb_ref = get_kb_reference_to_print(merged_result) - sheet_list["kb_reference"] = kb_ref else: sheet_list["scancode_reference"] = get_license_list_to_print(license_list) sheet_list["scanoss_reference"] = get_scanoss_extra_info(scanoss_result) - kb_ref = get_kb_reference_to_print(merged_result) - sheet_list["kb_reference"] = kb_ref + + if selected_scanner in ['kb', ALL_MODE]: + kb_ref = get_kb_reference_to_print(merged_result) + if len(kb_ref) > 1: + sheet_list["kb_reference"] = kb_ref🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fosslight_source/cli.py` around lines 229 - 244, The current logic only builds kb_reference when need_license is true, causing KB-only runs (selected_scanner == 'kb' or 'all' without -m) to omit the KB sheet; change the block so that get_kb_reference_to_print(merged_result) and sheet_list["kb_reference"] = kb_ref are executed outside/after the need_license conditional while keeping get_license_list_to_print(license_list) and get_scanoss_extra_info(scanoss_result) generation inside the need_license branch; ensure you still respect selected_scanner checks (selected_scanner == 'kb' should produce only kb_reference when appropriate), and leave the final if sheet_list: scan_item.external_sheets = sheet_list check as-is.
♻️ Duplicate comments (2)
src/fosslight_source/cli.py (2)
212-214:⚠️ Potential issue | 🟡 MinorUse the resolved KB state here, not a second health check.
merge_results()already decides whether KB lookups ran. Re-probing here can make the cover sheet sayKB Enabled/KB Unreachableindependently of what was actually used to buildmerged_result. Please thread that finalkb_usedboolean intocreate_report_file()and use it here.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fosslight_source/cli.py` around lines 212 - 214, Thread the authoritative kb_used boolean (determined by merge_results()) into create_report_file() and use that value when setting the cover comment instead of re-checking the server; specifically, stop calling check_kb_server_reachable() in the block that currently computes run_kb and calls scan_item.set_cover_comment, accept a kb_used parameter in create_report_file (or add it to the call site that produces the report), and replace the cover comment logic with scan_item.set_cover_comment("KB Enabled" if kb_used else "KB Unreachable") so the cover sheet reflects the actual merged_result usage.
306-314:⚠️ Potential issue | 🟠 MajorAdd the missing
IDvalue to each KB row.
KB_REFERENCE_HEADERhas four columns, but each row here has only three values. That shifts every exported column left and produces a malformedkb_referencesheet.🧩 Minimal fix
- data = [ - [ - item.source_name_or_path, - item.kb_origin_url, - str(getattr(item, 'kb_evidence', '') or '') - ] - for item in rows - ] + data = [ + [ + idx, + item.source_name_or_path, + item.kb_origin_url, + str(getattr(item, 'kb_evidence', '') or '') + ] + for idx, item in enumerate(rows, start=1) + ]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fosslight_source/cli.py` around lines 306 - 314, The KB rows list is missing the ID column so each row has three values while KB_REFERENCE_HEADER has four; update the list comprehension that builds data (the block using rows and fields item.source_name_or_path, item.kb_origin_url, getattr(item, 'kb_evidence', '')) to include the ID value as the first (or appropriate) element for each item—e.g., use getattr(item, 'id', '') or the correct identifier attribute on the item—so each row contains four entries matching KB_REFERENCE_HEADER before inserting the header.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/fosslight_source/_scan_item.py`:
- Around line 108-125: The helper _get_origin_url_from_md5_hash currently only
extracts and returns the origin URL from the KB response; update it to also
extract the KB evidence field from the response (e.g., from data['output'] or
the returned object) and return both values (e.g., as a tuple or small dict) so
callers can use the actual evidence instead of synthesizing "exact_match"; then
update set_oss_item() to accept and propagate the returned evidence value into
the kb_evidence/kb_reference export path rather than hardcoding "exact_match".
In `@src/fosslight_source/cli.py`:
- Around line 229-244: The current logic only builds kb_reference when
need_license is true, causing KB-only runs (selected_scanner == 'kb' or 'all'
without -m) to omit the KB sheet; change the block so that
get_kb_reference_to_print(merged_result) and sheet_list["kb_reference"] = kb_ref
are executed outside/after the need_license conditional while keeping
get_license_list_to_print(license_list) and
get_scanoss_extra_info(scanoss_result) generation inside the need_license
branch; ensure you still respect selected_scanner checks (selected_scanner ==
'kb' should produce only kb_reference when appropriate), and leave the final if
sheet_list: scan_item.external_sheets = sheet_list check as-is.
---
Duplicate comments:
In `@src/fosslight_source/cli.py`:
- Around line 212-214: Thread the authoritative kb_used boolean (determined by
merge_results()) into create_report_file() and use that value when setting the
cover comment instead of re-checking the server; specifically, stop calling
check_kb_server_reachable() in the block that currently computes run_kb and
calls scan_item.set_cover_comment, accept a kb_used parameter in
create_report_file (or add it to the call site that produces the report), and
replace the cover comment logic with scan_item.set_cover_comment("KB Enabled" if
kb_used else "KB Unreachable") so the cover sheet reflects the actual
merged_result usage.
- Around line 306-314: The KB rows list is missing the ID column so each row has
three values while KB_REFERENCE_HEADER has four; update the list comprehension
that builds data (the block using rows and fields item.source_name_or_path,
item.kb_origin_url, getattr(item, 'kb_evidence', '')) to include the ID value as
the first (or appropriate) element for each item—e.g., use getattr(item, 'id',
'') or the correct identifier attribute on the item—so each row contains four
entries matching KB_REFERENCE_HEADER before inserting the header.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 77a083ad-9036-4a1b-b9f5-0ed66750a9ac
📒 Files selected for processing (2)
src/fosslight_source/_scan_item.pysrc/fosslight_source/cli.py
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/fosslight_source/cli.py (1)
318-336:⚠️ Potential issue | 🟠 MajorMutable default arguments are mutated in function body - this is a bug.
The function uses mutable defaults (
scancode_result: list = [], etc.) and then mutates them via.extend()(line 337) and.append()(lines 349, 362, 381). This means subsequent calls without explicit arguments will see leftover data from previous calls, causing data corruption.Proposed fix
def merge_results( - scancode_result: list = [], scanoss_result: list = [], spdx_downloads: dict = {}, - path_to_scan: str = "", run_kb: bool = False, manifest_licenses: dict = {}, - excluded_files: set = None + scancode_result: list = None, scanoss_result: list = None, spdx_downloads: dict = None, + path_to_scan: str = "", run_kb: bool = False, manifest_licenses: dict = None, + excluded_files: set | None = None ) -> list: ... if excluded_files is None: excluded_files = set() + if scancode_result is None: + scancode_result = [] + if scanoss_result is None: + scanoss_result = [] + if spdx_downloads is None: + spdx_downloads = {} + if manifest_licenses is None: + manifest_licenses = {}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fosslight_source/cli.py` around lines 318 - 336, The merge_results function currently uses mutable default arguments (scancode_result, scanoss_result, spdx_downloads, manifest_licenses) which are mutated inside (via .extend()/.append()), causing state leakage across calls; change each of these default params to None and initialize them at the top of merge_results (e.g., if scancode_result is None: scancode_result = []; same for scanoss_result, spdx_downloads = {}, manifest_licenses = {}) so every call gets fresh containers, keeping the existing logic that already handles excluded_files being None.
♻️ Duplicate comments (1)
src/fosslight_source/cli.py (1)
296-315:⚠️ Potential issue | 🟠 MajorColumn count mismatch between header and data rows.
KB_REFERENCE_HEADERhas 4 columns (['ID', 'Source Path', 'KB Origin URL', 'Evidence']) but the data rows only have 3 elements ([source_name_or_path, kb_origin_url, kb_evidence]). This will cause misaligned columns in the output spreadsheet.Proposed fix - add ID column
data = [ [ + idx + 1, # ID column item.source_name_or_path, item.kb_origin_url, str(getattr(item, 'kb_evidence', '') or '') ] - for item in rows + for idx, item in enumerate(rows) ]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fosslight_source/cli.py` around lines 296 - 315, get_kb_reference_to_print builds rows with only 3 values but KB_REFERENCE_HEADER expects 4 columns; update the data row construction to include an ID as the first column so each row matches KB_REFERENCE_HEADER. In get_kb_reference_to_print add an ID value (prefer item.id via getattr(item, 'id', None) if present, otherwise a generated sequential index) and include it before item.source_name_or_path; keep using item.source_name_or_path, item.kb_origin_url and getattr(item, 'kb_evidence', '') for the remaining columns so all rows align with KB_REFERENCE_HEADER.
🧹 Nitpick comments (3)
src/fosslight_source/cli.py (3)
271-293: Consider removing unreachable code at line 293.The
return Falseon line 293 is unreachable. The loop runs forattempt in [0, 1, 2], and on the final attempt (2), both exception handlers explicitly returnFalse. The loop cannot complete without returning.Proposed fix
except Exception as e: logger.debug(f"Unexpected error checking KB server: {e}") if attempt < 2: time.sleep(1) else: return False - return False🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fosslight_source/cli.py` around lines 271 - 293, The final `return False` in check_kb_server_reachable is unreachable because the URLError and generic Exception handlers return False on the last attempt; instead remove the inner `else: return False` branches in the urllib.error.URLError and generic `except Exception as e` blocks so the loop can naturally retry and fall through to the single final `return False` at the end of check_kb_server_reachable; keep the debug logs and the delay logic (time.sleep(1)) intact and reference the function name check_kb_server_reachable and the exception handlers for URLError and Exception when making the change.
391-394: Mutable default arguments in function signature.
formats: list = []andpath_to_exclude: list = []use mutable defaults. While these aren't mutated directly in this function, it's still a Python anti-pattern that can lead to subtle bugs.Proposed fix
print_matched_text: bool = False, - formats: list = [], time_out: int = 120, + formats: list = None, time_out: int = 120, correct_mode: bool = True, correct_filepath: str = "", - selected_scanner: str = ALL_MODE, path_to_exclude: list = [], + selected_scanner: str = ALL_MODE, path_to_exclude: list = None, all_exclude_mode: tuple = ()Then at the start of the function:
if formats is None: formats = [] if path_to_exclude is None: path_to_exclude = []🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fosslight_source/cli.py` around lines 391 - 394, The function signature currently uses mutable defaults formats: list = [] and path_to_exclude: list = [], which is an anti-pattern; change those defaults to None (e.g., formats: Optional[list] = None, path_to_exclude: Optional[list] = None) and at the start of the same function initialize them with empty lists if None (if formats is None: formats = []; if path_to_exclude is None: path_to_exclude = []); keep the rest of the parameters (time_out, correct_mode, correct_filepath, selected_scanner/ALL_MODE, all_exclude_mode) unchanged and ensure any type hints/imports are updated accordingly.
143-143: Avoid mutable default argument.Using
formats: list = []as a default can cause subtle bugs if the list is ever mutated. Although the current implementation doesn't modify it in place, it's safer to useNoneas the default.Proposed fix
correct_filepath: str = "", path_to_scan: str = "", path_to_exclude: list = [], - formats: list = [], api_limit_exceed: bool = False, files_count: int = 0, final_output_path: str = "", + formats: list = None, api_limit_exceed: bool = False, files_count: int = 0, final_output_path: str = "", run_kb_msg: str = "" ) -> 'ScannerItem':Then at the start of the function body:
if formats is None: formats = []🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fosslight_source/cli.py` at line 143, Change the function parameter default for formats from a mutable list to None (i.e., replace "formats: list = []" with "formats: list | None = None") and at the start of the function body add a guard that checks if formats is None and assigns an empty list if so (e.g., if formats is None: formats = []). This ensures the parameter "formats" is not a shared mutable default; update any type hints or usages in the function accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/fosslight_source/cli.py`:
- Around line 318-336: The merge_results function currently uses mutable default
arguments (scancode_result, scanoss_result, spdx_downloads, manifest_licenses)
which are mutated inside (via .extend()/.append()), causing state leakage across
calls; change each of these default params to None and initialize them at the
top of merge_results (e.g., if scancode_result is None: scancode_result = [];
same for scanoss_result, spdx_downloads = {}, manifest_licenses = {}) so every
call gets fresh containers, keeping the existing logic that already handles
excluded_files being None.
---
Duplicate comments:
In `@src/fosslight_source/cli.py`:
- Around line 296-315: get_kb_reference_to_print builds rows with only 3 values
but KB_REFERENCE_HEADER expects 4 columns; update the data row construction to
include an ID as the first column so each row matches KB_REFERENCE_HEADER. In
get_kb_reference_to_print add an ID value (prefer item.id via getattr(item,
'id', None) if present, otherwise a generated sequential index) and include it
before item.source_name_or_path; keep using item.source_name_or_path,
item.kb_origin_url and getattr(item, 'kb_evidence', '') for the remaining
columns so all rows align with KB_REFERENCE_HEADER.
---
Nitpick comments:
In `@src/fosslight_source/cli.py`:
- Around line 271-293: The final `return False` in check_kb_server_reachable is
unreachable because the URLError and generic Exception handlers return False on
the last attempt; instead remove the inner `else: return False` branches in the
urllib.error.URLError and generic `except Exception as e` blocks so the loop can
naturally retry and fall through to the single final `return False` at the end
of check_kb_server_reachable; keep the debug logs and the delay logic
(time.sleep(1)) intact and reference the function name check_kb_server_reachable
and the exception handlers for URLError and Exception when making the change.
- Around line 391-394: The function signature currently uses mutable defaults
formats: list = [] and path_to_exclude: list = [], which is an anti-pattern;
change those defaults to None (e.g., formats: Optional[list] = None,
path_to_exclude: Optional[list] = None) and at the start of the same function
initialize them with empty lists if None (if formats is None: formats = []; if
path_to_exclude is None: path_to_exclude = []); keep the rest of the parameters
(time_out, correct_mode, correct_filepath, selected_scanner/ALL_MODE,
all_exclude_mode) unchanged and ensure any type hints/imports are updated
accordingly.
- Line 143: Change the function parameter default for formats from a mutable
list to None (i.e., replace "formats: list = []" with "formats: list | None =
None") and at the start of the function body add a guard that checks if formats
is None and assigns an empty list if so (e.g., if formats is None: formats =
[]). This ensures the parameter "formats" is not a shared mutable default;
update any type hints or usages in the function accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 26b9de3e-c64f-4336-a54d-8a1677e123c5
📒 Files selected for processing (2)
src/fosslight_source/_scan_item.pysrc/fosslight_source/cli.py
Description
New Features
Refactor