A searchable archive of BBC Food Recipes.
Homepage: https://www.auntiesrecipes.co.uk
Features I would like help with:
- Write a scraper for other recipe sites ;)
- Paginated / infinite-scrollable results
- Download all the images and add them to the repo so I don't have to hotlink them
- Better isVegetarian detection in the scraper
- Search option based on prep/cooking/total time
- More dietary requirements (vegan, gluten-free, etc)
The download can be done through a shell script and wget or through a Node.js download app.
- Sitemap.xml is pulled (this file is huge and contains each recipe)
- Each HTML file is pulled at a rate of 3-10 at a time depending on config
npm install
Note: this will take a long time and creates over 1GB of stored text.
node index.js
First run the download tool to gather all the HTML files from the BBC website.
Use npm to install dependencies such as Cheerio.
cd scraper
npm install
cd scraper
node scrape.js
Optional: environmental variable RECIPE_DIR can be set.