Skip to content

SNStatComp/webfocusedscrape

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

49 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

webfocusedscrape

Focused scraping component for the Statistical Scraping concept in Official Statistics.

WEB-FOSS-NL

This repo is part of the WEB-FOSS-NL project on statistical scraping. More info on statistical scraping here

Getting started

  • Install all required packages using

    pip install -r requirements.txt

  • Activate the environment
  • run the following command to install modules in src as packages for proper import

    pip install -e .

  • Create a config.yaml file using config_template.yaml
  • In the config file specify the input files:
    • urls: the filename with the given urls, see also urls_template.txt
    • keywords: the filename with the target keywords, see also keywords_template.txt

Known bugs and work in progress

  • no support yet for js page content extraction

About

Focused scraping component for the Statistical Scraping concept

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages