
Add department notice crawler with parsers and integration to controller/scheduler #200

Open
hyunbin1 wants to merge 1 commit into main from codex/crawl-diverse-department-notices

@hyunbin1
Member

Motivation

  • Provide automated crawling and ingestion of department notice lists from various site types to populate department notices.
  • Support multiple source formats (MJU subview, Gnuboard/PHP board, WordPress) with fallback parsing to maximize successful extraction.
  • Expose an on-demand crawl API and run department crawling as part of the existing scheduler.

Description

  • Added DepartmentNoticeCrawlingService, which fetches and parses notice lists, deduplicates them by link, persists new DepartmentNotice entries, and produces a CrawlReport summary.
  • Introduced a parsing subsystem under service.notice.crawl including DepartmentNoticeSource, DepartmentNoticeSourceRegistry (preconfigured sources), DepartmentNoticeSourceType, DepartmentNoticeListParser interface, AbstractDepartmentNoticeParser helper, concrete parsers (MjuSubViewDepartmentNoticeParser, GnuboardDepartmentNoticeParser, WordpressDepartmentNoticeParser) and CrawledDepartmentNotice record.
  • Integrated the crawler into the REST API by adding POST /api/v1/departments/notices/crawl in DepartmentController, and wired the service into SchedulerService so it runs alongside the existing notice crawling tasks.
  • Added the repository helper existsByDepartmentAndLink to DepartmentNoticeRepository to avoid inserting duplicates.
  • Implemented parsing fallback logic: when the preferred parser yields no items, the parser that produces the most items is used instead.
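
For reviewers, the fallback and deduplication behavior described above can be sketched roughly as follows. This is only an illustration under assumptions, not the code in this PR: ParserFallbackSketch, Notice, ListParser, parseWithFallback, and keepNew are hypothetical names, Notice stands in for CrawledDepartmentNotice, and an in-memory set of links stands in for the existsByDepartmentAndLink lookup.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class ParserFallbackSketch {

    // Hypothetical stand-in for the CrawledDepartmentNotice record.
    record Notice(String title, String link) {}

    // Hypothetical stand-in for DepartmentNoticeListParser: HTML in, notices out.
    interface ListParser extends Function<String, List<Notice>> {}

    // Use the preferred parser if it finds anything; otherwise try the
    // remaining parsers and keep whichever result has the most items.
    static List<Notice> parseWithFallback(String html,
                                          ListParser preferred,
                                          List<ListParser> others) {
        List<Notice> best = preferred.apply(html);
        if (!best.isEmpty()) {
            return best;
        }
        for (ListParser parser : others) {
            List<Notice> candidate = parser.apply(html);
            if (candidate.size() > best.size()) {
                best = candidate;
            }
        }
        return best;
    }

    // Keep only notices whose link is not already known. The set plays the
    // role of the repository check, and also drops duplicates within one crawl.
    static List<Notice> keepNew(List<Notice> crawled, Set<String> knownLinks) {
        Set<String> seen = new HashSet<>(knownLinks);
        List<Notice> fresh = new ArrayList<>();
        for (Notice notice : crawled) {
            if (seen.add(notice.link())) {
                fresh.add(notice);
            }
        }
        return fresh;
    }
}
```

In this sketch the fallback runs only when the preferred parser returns an empty list, so a source whose preferred parser extracts a partial list is never re-parsed; whether the actual service behaves the same way in that case is not stated in the description.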

Testing

  • Ran the project's automated test suite with ./mvnw test; all tests completed successfully.
  • Verified that the application boots and the Spring context initializes the new beans by starting the application; no startup failures were observed.

Codex Task

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.
