Google News Scraper for Python

Real-time Google News scraping via API. Extract headlines, sources, and dates instantly.
Powered by Thordata's high-speed SERP infrastructure.

🎯 Quick Start: AI News Briefing

Get the latest AI industry news with one command!

# One command to get latest AI news
python main.py --ai-brief

# Get AI breakthroughs only
python main.py --ai-breakthroughs --limit 10

# Export to CSV
python main.py --ai-brief --format csv --limit 30

This feature automatically searches multiple AI-related keywords and combines the results into a comprehensive briefing. Perfect for staying updated on the latest AI developments!
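The multi-keyword briefing described above can be pictured roughly as follows. This is a minimal sketch, not the project's actual implementation: `build_briefing` and the `search` callable are hypothetical names, and de-duplicating by `link` (falling back to `title`) is an assumption about how overlapping results might be merged.

```python
from typing import Callable, Dict, Iterable, List

Article = Dict[str, str]


def build_briefing(
    keywords: Iterable[str],
    search: Callable[[str], List[Article]],  # hypothetical: one search per keyword
    limit: int = 20,
) -> List[Article]:
    """Run one search per keyword and merge the results without duplicates."""
    seen = set()
    briefing: List[Article] = []
    for keyword in keywords:
        for article in search(keyword):
            # Assumption: the same story found under two keywords shares a link.
            key = article.get("link") or article.get("title")
            if not key or key in seen:
                continue
            seen.add(key)
            briefing.append(article)
            if len(briefing) >= limit:
                return briefing
    return briefing
```

In practice `search` could be something like `lambda kw: scraper.search(kw, num=10)`, using the `GoogleNewsScraper` shown under Programmatic Usage.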

⚡ Features

🤖 AI News Briefing: One-command feature to get the latest AI industry news and breakthroughs.
📰 Real-Time Data: Get the latest news as it happens; pass --no-cache to bypass the cache when freshness matters.
⚡ Smart Caching: Automatic response caching (5min TTL) for instant repeated queries
🔄 Auto Retry: Exponential backoff retry mechanism for reliable requests
📊 Progress Indicators: Visual feedback for long-running operations
🌍 Global Coverage: Support for any country (us, uk, jp, cn, etc.) and language.
🚀 High Speed: Synchronous API response (<3s average), cached responses <0.1s
🧹 Clean Output: Automatically parses complex JSON into simple lists (JSON/CSV).
🛡️ No Bans: Full proxy rotation and anti-bot handling managed by Thordata.
🔧 Advanced API: Uses latest SerpRequest and serp_search_advanced for better control.
📱 Device Support: Specify device type (desktop, mobile, tablet) for different results.
🌐 Language Control: Fine-tune language settings for localized results.

📦 Sample Output

[
  {
    "title": "OpenAI Announces GPT-5 with Revolutionary Capabilities",
    "source": "TechCrunch",
    "date": "2 hours ago",
    "snippet": "OpenAI has unveiled GPT-5, featuring unprecedented reasoning capabilities...",
    "link": "https://techcrunch.com/...",
    "thumbnail": "data:image/png;base64,..."
  },
  {
    "title": "Google DeepMind Breakthrough in Protein Folding",
    "source": "Nature",
    "date": "5 hours ago",
    "snippet": "New AI model predicts protein structures with 95% accuracy...",
    "link": "https://nature.com/...",
    "thumbnail": "data:image/png;base64,..."
  }
]
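The `date` field in the sample above is a relative string such as "2 hours ago". If you need sortable timestamps, a small helper along these lines may help. It is a sketch under assumptions: `parse_relative_date` is a hypothetical helper, not part of this project, and it only understands the "N minutes/hours/days/weeks ago" pattern; any other format yields `None`.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Maps the unit captured from the text to the matching timedelta keyword.
_UNITS = {"minute": "minutes", "hour": "hours", "day": "days", "week": "weeks"}
_PATTERN = re.compile(
    r"^\s*(\d+)\s+(minute|hour|day|week)s?\s+ago\s*$", re.IGNORECASE
)


def parse_relative_date(text: str, now: datetime) -> Optional[datetime]:
    """Convert strings like '2 hours ago' into an absolute datetime."""
    match = _PATTERN.match(text)
    if match is None:
        return None  # unrecognized format: leave it to the caller
    amount = int(match.group(1))
    unit = _UNITS[match.group(2).lower()]
    return now - timedelta(**{unit: amount})
```

Passing `now` explicitly keeps the function deterministic; callers would typically pass `datetime.now()`.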

🚀 Installation & Setup

1. Get Your Token

Get your free scraping token from the Thordata Dashboard.

2. Install Dependencies

git clone https://github.com/Thordata/google-news-scraper-python.git
cd google-news-scraper-python
pip install -r requirements.txt

3. Configure

Copy .env.example to .env and fill in your token:

THORDATA_SCRAPER_TOKEN=your_token_here
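If you are wiring the token into your own script, a check like the one below can catch a missing or placeholder value early. This is a sketch: `load_token` is a hypothetical helper, and it assumes the variables from `.env` have already been loaded into the process environment (how this project loads them is not stated here).

```python
import os
from typing import Mapping, Optional


def load_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return THORDATA_SCRAPER_TOKEN or raise if it is unset or a placeholder."""
    env = os.environ if env is None else env
    token = env.get("THORDATA_SCRAPER_TOKEN", "").strip()
    # "your_token_here" is the placeholder shown in the example above.
    if not token or token == "your_token_here":
        raise RuntimeError(
            "THORDATA_SCRAPER_TOKEN is not set; copy .env.example to .env "
            "and fill in your token."
        )
    return token
```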

💡 Usage Examples

AI News Briefing (Featured!)

# Get comprehensive AI news briefing
python main.py --ai-brief

# Get AI breakthroughs only
python main.py --ai-breakthroughs --limit 15

# AI news with custom settings
python main.py --ai-brief --limit 50 --country uk --format csv

Basic Search

# Simple search
python main.py "Artificial Intelligence"

# Search with custom limit
python main.py "Crypto Market" --limit 50

Advanced Search

# Search with country and language
python main.py "Tesla News" --country uk --language en

# Search with device type
python main.py "AI Updates" --device mobile --no-cache

# Full example with all options
python main.py "Bitcoin Price" \
  --limit 100 \
  --country jp \
  --language ja \
  --device desktop \
  --format csv \
  --no-cache

📋 Command Line Arguments

| Argument | Description | Default |
| --- | --- | --- |
| `query` | Search topic (required unless using `--ai-brief` or `--ai-breakthroughs`) | - |
| `--ai-brief` | Get the latest AI industry news (one-command feature) | False |
| `--ai-breakthroughs` | Get the latest AI breakthroughs only | False |
| `--limit` | Maximum number of results | 20 |
| `--country` | Country code (`us`, `uk`, `jp`, `cn`, etc.) | `us` |
| `--language` | Language code (`en`, `zh`, `ja`, etc.) | Auto |
| `--device` | Device type (`desktop`, `mobile`, `tablet`) | Auto |
| `--format` | Output format (`json`, `csv`) | `json` |
| `--no-cache` | Bypass cache for fresh results | False |
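For reference, the arguments in the table could be declared with `argparse` roughly as below. This is a sketch that mirrors the documented flags and defaults, not a copy of `main.py`; `build_parser` and `parse_args` are hypothetical names, and representing the "Auto" defaults as `None` is an assumption.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py", description="Google News scraper")
    parser.add_argument("query", nargs="?", help="search topic")
    parser.add_argument("--ai-brief", action="store_true")
    parser.add_argument("--ai-breakthroughs", action="store_true")
    parser.add_argument("--limit", type=int, default=20)
    parser.add_argument("--country", default="us")
    # Assumption: None stands for the "Auto" default listed in the table.
    parser.add_argument("--language", default=None)
    parser.add_argument("--device", choices=["desktop", "mobile", "tablet"], default=None)
    parser.add_argument("--format", choices=["json", "csv"], default="json")
    parser.add_argument("--no-cache", action="store_true")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # The query is only optional when one of the AI briefing modes is used.
    if not (args.query or args.ai_brief or args.ai_breakthroughs):
        parser.error("a query is required unless --ai-brief or --ai-breakthroughs is given")
    return args
```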

🎨 Use Cases

1. Daily AI News Monitoring

# Run this daily to stay updated
python main.py --ai-brief --limit 30 --format csv

2. Research & Analysis

# Collect news for specific research topics
python main.py "machine learning research" --limit 100 --format csv

3. Market Intelligence

# Track industry news by country
python main.py "tech industry" --country us --limit 50
python main.py "tech industry" --country uk --limit 50

4. Content Aggregation

# Aggregate news from multiple sources
python main.py "climate change" --limit 50 --format json

5. Competitive Intelligence

# Monitor competitor news
python main.py "competitor name" --no-cache --limit 20

📁 Output Format

Results are saved to the output/ directory in your chosen format:

JSON: Structured data with all fields
CSV: Spreadsheet-friendly format

Each file is named based on your query: news_{query}.{format}
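To predict where a given query will land on disk, the naming pattern above can be approximated like this. It is only a sketch: `output_path` is a hypothetical helper, and the exact way the scraper sanitizes the query (here: lowercase, with runs of non-alphanumeric characters replaced by underscores) is an assumption, so check the real file names in `output/`.

```python
import re
from pathlib import Path


def output_path(query: str, fmt: str, output_dir: str = "output") -> Path:
    """Build a path following the news_{query}.{format} pattern."""
    # Assumed sanitization: collapse anything non-alphanumeric into "_".
    slug = re.sub(r"[^A-Za-z0-9]+", "_", query).strip("_").lower() or "results"
    return Path(output_dir) / f"news_{slug}.{fmt}"
```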

🔧 Advanced Configuration

Environment Variables

THORDATA_SCRAPER_TOKEN=your_token_here

Programmatic Usage

from src.scraper import GoogleNewsScraper
from src.ai_news import AINewsBriefing

# Basic search (with automatic caching)
scraper = GoogleNewsScraper()
results = scraper.search("AI", num=20, country="us")  # Cached for 5 minutes

# Bypass cache for fresh results
results = scraper.search("AI", num=20, no_cache=True)

# Clear cache manually
scraper.clear_cache()

# AI news briefing
ai_briefing = AINewsBriefing()
briefing = ai_briefing.get_latest_ai_news(num=30)

Performance Features

Caching:

Automatic caching of API responses
Default TTL: 5 minutes
Instant response for cached queries (<0.1s)
Manual cache control available
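The caching behavior listed above can be illustrated with a small in-memory TTL cache. This is a sketch of the general technique, not the scraper's own cache: `TTLCache` is a hypothetical class, and the injectable `clock` parameter exists only to make the expiry logic easy to test.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Keep values for a fixed number of seconds (default 300 = 5 minutes)."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._store[key] = (self._clock(), value)

    def clear(self) -> None:
        self._store.clear()
```

A cache key would typically combine everything that changes the response, for example `(query, num, country)`.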

Retry Mechanism:

Automatic retry on transient failures
Exponential backoff (1s, 2s, 4s delays)
Up to 3 retry attempts
Prevents cascading failures
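The retry schedule above (delays of 1s, 2s, then 4s across up to 3 retries) corresponds to a loop like the following. This is a generic sketch of exponential backoff rather than the project's code: `retry_with_backoff` is a hypothetical helper, and which exceptions count as transient is an assumption you should narrow for real use.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),  # assumption: narrow this
) -> T:
    """Call func; on failure wait base_delay * 2**attempt and try again."""
    for attempt in range(retries + 1):  # one initial call plus `retries` retries
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s with the defaults
    raise AssertionError("unreachable")
```

Usage might look like `retry_with_backoff(lambda: scraper.search("AI", num=20), retry_on=(ConnectionError, TimeoutError))`.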

🌟 Why This Scraper?

Compared to Other Solutions

| Feature | This Scraper | Others |
| --- | --- | --- |
| AI News Briefing | ✅ One-command feature | ❌ Manual keyword setup |
| Smart Caching | ✅ Automatic (5min TTL) | ❌ No caching |
| Auto Retry | ✅ Exponential backoff | ⚠️ Single attempt |
| Progress Indicators | ✅ Visual feedback | ❌ No feedback |
| Real-time Data | ✅ <3s response, <0.1s cached | ⚠️ Varies |
| No Bans | ✅ Managed by Thordata | ⚠️ Risk of blocking |
| Global Coverage | ✅ 195+ countries | ⚠️ Limited |
| Easy Setup | ✅ 2 minutes | ⚠️ Complex |
| Output Formats | ✅ JSON + CSV | ⚠️ Limited |
| Error Handling | ✅ Robust with retries | ⚠️ Basic |

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

📄 License

MIT License. See LICENSE for details.

🙏 Acknowledgments

Powered by Thordata SERP API
Built with ❤️ by the Thordata Developer Team

📞 Support

Documentation: Check this README
Issues: GitHub Issues
Email: support@thordata.com

📚 Additional Documentation

CHANGELOG.md - Version history and changes

