webfocusedscrape

Focused scraping component for the Statistical Scraping concept in Official Statistics.

WEB-FOSS-NL

This repo is part of the WEB-FOSS-NL project on statistical scraping. More info on statistical scraping can be found here.

Getting started

  • Activate the environment
  • Install all required packages using

    pip install -r requirements.txt

  • Run the following command to install the modules in src as an editable package, so they can be imported properly

    pip install -e .

  • Create a config.yaml file based on config_template.yaml
  • In the config file, specify the input files:
    • urls: the name of the file containing the input URLs, see also urls_template.txt
    • keywords: the name of the file containing the target keywords, see also keywords_template.txt
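The sketch below shows one way the two input files named in the config might be read. It assumes that config.yaml has top-level `urls` and `keywords` keys holding filenames, and that each file lists one entry per line. The `#` comment convention and the helper names `read_list_file` and `load_inputs` are hypothetical and are not taken from this repo.

```python
from pathlib import Path


def read_list_file(path: Path) -> list[str]:
    """Return one entry per line, skipping blank lines and '#' comments (assumed file format)."""
    entries = []
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries


def load_inputs(config: dict, base_dir: Path = Path(".")) -> tuple[list[str], list[str]]:
    """Resolve the hypothetical 'urls' and 'keywords' filenames from a parsed config and read both files."""
    urls = read_list_file(base_dir / config["urls"])
    keywords = read_list_file(base_dir / config["keywords"])
    return urls, keywords
```

In practice, the `config` dict would come from parsing config.yaml, for example with PyYAML's `yaml.safe_load`. Keeping the parsing step outside `load_inputs` makes the helper easy to test with a plain dict.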

Known bugs and work in progress

  • No support yet for extracting page content rendered by JavaScript