This is a Scrapy-based web scraper that extracts book titles, prices, and URLs from Books to Scrape. The extracted data is stored in a JSON file for further analysis.
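Ensure you have Python 3.6+ installed. Check with:
python --version
If not installed, download it from python.org.
Install Scrapy:
pip install scrapy
Clone the repository and move into the project directory:
git clone https://github.com/your-username/books_scraper.git
cd books_scraper
Run the spider and write the results to books.json:
scrapy crawl books -o books.json
Open books.json to see the extracted data: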
[
    {
        "title": "A Light in the Attic",
        "price": "£51.77",
        "url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"
    }
]
- Make sure you are inside the Scrapy project directory before running the command.
- Check that books_spider.py exists in books_scraper/spiders/.
- Ensure Scrapy is installed correctly using pip install scrapy.
- Try running with Python:
python -m scrapy crawl books -o books.json
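
The troubleshooting checks above can also be run as a small script. This is a minimal sketch, not part of the repository: the function name `check_setup` is hypothetical, and it assumes the standard Scrapy layout, with `scrapy.cfg` at the project root and the spider at `books_scraper/spiders/books_spider.py` beneath it.

```python
import importlib.util
from pathlib import Path


def check_setup(project_root="."):
    """Return a list of problems found; an empty list means all checks passed.

    Hypothetical helper: the layout below is an assumption based on the
    standard Scrapy project structure, not something read from the repo.
    """
    root = Path(project_root)
    problems = []

    # A Scrapy project root normally contains scrapy.cfg.
    if not (root / "scrapy.cfg").is_file():
        problems.append("scrapy.cfg not found: run from inside the project directory")

    # The spider file named in the checklist above.
    spider = root / "books_scraper" / "spiders" / "books_spider.py"
    if not spider.is_file():
        problems.append(f"{spider} is missing")

    # find_spec returns None when the package cannot be imported.
    if importlib.util.find_spec("scrapy") is None:
        problems.append("Scrapy is not importable: try pip install scrapy")

    return problems


if __name__ == "__main__":
    found = check_setup()
    if found:
        for problem in found:
            print("-", problem)
    else:
        print("Setup looks fine.")
```

Run it from the directory where you would run the crawl command. It only inspects files and the Python environment; it does not start a crawl.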
- Scrapy Documentation
- Books to Scrape for providing sample data.
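
As a starting point for the further analysis mentioned at the top, the sketch below loads books.json and summarizes the prices. It assumes every record has the three fields shown in the sample output and that each price is a string with a leading £ sign; the function names (`load_books`, `price_to_decimal`, `summarize`) are illustrative and not part of the project.

```python
import json
from decimal import Decimal


def load_books(path="books.json"):
    """Load the list of book records written by the crawl."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def price_to_decimal(price):
    """Convert a price string such as '£51.77' to Decimal('51.77')."""
    # Assumption: the only non-numeric part is a leading pound sign.
    return Decimal(price.lstrip("£"))


def summarize(books):
    """Return the record count and the cheapest and most expensive prices."""
    prices = [price_to_decimal(book["price"]) for book in books]
    if not prices:
        return {"count": 0, "cheapest": None, "most_expensive": None}
    return {
        "count": len(prices),
        "cheapest": min(prices),
        "most_expensive": max(prices),
    }


if __name__ == "__main__":
    print(summarize(load_books()))
```

`Decimal` is used instead of `float` so that prices keep their exact two-decimal values when compared or summed.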