Real-time Google News scraping via API. Extract headlines, sources, and dates instantly.
Powered by Thordata's high-speed SERP infrastructure.
Get the latest AI industry news with one command!
# One command to get latest AI news
python main.py --ai-brief
# Get AI breakthroughs only
python main.py --ai-breakthroughs --limit 10
# Export to CSV
python main.py --ai-brief --format csv --limit 30

This feature automatically searches multiple AI-related keywords and combines the results into a comprehensive briefing. Perfect for staying updated on the latest AI developments!
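
The keyword-combining step can be pictured roughly as follows. This is a minimal sketch, not the project's actual implementation: the keyword list, the `fetch` callable, and deduplication by `link` are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List

Article = Dict[str, str]

# Hypothetical keyword list; the real feature may search different terms.
AI_KEYWORDS = ("artificial intelligence", "machine learning", "large language models")


def build_briefing(
    fetch: Callable[[str], List[Article]],
    keywords: Iterable[str] = AI_KEYWORDS,
    limit: int = 20,
) -> List[Article]:
    """Query each keyword, merge the results, and drop duplicate links."""
    seen_links = set()
    briefing: List[Article] = []
    for keyword in keywords:
        for article in fetch(keyword):
            link = article.get("link")
            if link is not None and link in seen_links:
                continue  # same story already returned for an earlier keyword
            if link is not None:
                seen_links.add(link)
            briefing.append(article)
            if len(briefing) >= limit:
                return briefing
    return briefing
```

With the scraper class from the Python API example, `fetch` could be something like `lambda kw: scraper.search(kw, num=20)`; any callable that returns a list of article dictionaries works.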
- AI News Briefing: One-command feature to get the latest AI industry news and breakthroughs.
- Real-Time Data: Get the latest news as it happens (no cache lag when needed).
- Smart Caching: Automatic response caching (5min TTL) for instant repeated queries.
- Auto Retry: Exponential backoff retry mechanism for reliable requests.
- Progress Indicators: Visual feedback for long-running operations.
- Global Coverage: Support for any country (us, uk, jp, cn, etc.) and language.
- High Speed: Synchronous API response (<3s average), cached responses <0.1s.
- Clean Output: Automatically parses complex JSON into simple lists (JSON/CSV).
- No Bans: Full proxy rotation and anti-bot handling managed by Thordata.
- Advanced API: Uses the latest SerpRequest and serp_search_advanced for better control.
- Device Support: Specify device type (desktop, mobile, tablet) for different results.
- Language Control: Fine-tune language settings for localized results.
[
  {
    "title": "OpenAI Announces GPT-5 with Revolutionary Capabilities",
    "source": "TechCrunch",
    "date": "2 hours ago",
    "snippet": "OpenAI has unveiled GPT-5, featuring unprecedented reasoning capabilities...",
    "link": "https://techcrunch.com/...",
    "thumbnail": "data:image/png;base64,..."
  },
  {
    "title": "Google DeepMind Breakthrough in Protein Folding",
    "source": "Nature",
    "date": "5 hours ago",
    "snippet": "New AI model predicts protein structures with 95% accuracy...",
    "link": "https://nature.com/...",
    "thumbnail": "data:image/png;base64,..."
  }
]

Get your free scraping token from the Thordata Dashboard.
git clone https://github.com/Thordata/google-news-scraper-python.git
cd google-news-scraper-python
pip install -r requirements.txt

Copy .env.example to .env and fill in your token:

THORDATA_SCRAPER_TOKEN=your_token_here

# Get comprehensive AI news briefing
python main.py --ai-brief
# Get AI breakthroughs only
python main.py --ai-breakthroughs --limit 15
# AI news with custom settings
python main.py --ai-brief --limit 50 --country uk --format csv

# Simple search
python main.py "Artificial Intelligence"
# Search with custom limit
python main.py "Crypto Market" --limit 50

# Search with country and language
python main.py "Tesla News" --country uk --language en
# Search with device type
python main.py "AI Updates" --device mobile --no-cache
# Full example with all options
python main.py "Bitcoin Price" \
  --limit 100 \
  --country jp \
  --language ja \
  --device desktop \
  --format csv \
  --no-cache

| Argument | Description | Default |
|---|---|---|
| query | Search topic (required unless using --ai-brief) | - |
| --ai-brief | Get latest AI industry news (one-command feature) | False |
| --ai-breakthroughs | Get latest AI breakthroughs only | False |
| --limit | Maximum number of results | 20 |
| --country | Country code (us, uk, jp, cn, etc.) | us |
| --language | Language code (en, zh, ja, etc.) | Auto |
| --device | Device type (desktop, mobile, tablet) | Auto |
| --format | Output format (json, csv) | json |
| --no-cache | Bypass cache for fresh results | False |
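
For readers who want to see how the arguments in the table fit together, here is one way they could be declared with the standard library's `argparse`. It is an illustrative sketch only: the real `main.py` may be structured differently, `None` stands in for the "Auto" defaults, and treating `--ai-breakthroughs` as another way to omit the query is an assumption based on the usage examples.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI arguments listed in the table (illustrative only)."""
    parser = argparse.ArgumentParser(description="Google News scraper CLI (sketch)")
    parser.add_argument("query", nargs="?", help="Search topic")
    parser.add_argument("--ai-brief", action="store_true",
                        help="Get latest AI industry news")
    parser.add_argument("--ai-breakthroughs", action="store_true",
                        help="Get latest AI breakthroughs only")
    parser.add_argument("--limit", type=int, default=20,
                        help="Maximum number of results")
    parser.add_argument("--country", default="us", help="Country code")
    # None represents "Auto": let the backend pick language and device.
    parser.add_argument("--language", default=None, help="Language code")
    parser.add_argument("--device", choices=["desktop", "mobile", "tablet"],
                        default=None, help="Device type")
    parser.add_argument("--format", choices=["json", "csv"], default="json",
                        help="Output format")
    parser.add_argument("--no-cache", action="store_true",
                        help="Bypass cache for fresh results")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse arguments and enforce that a query or an AI mode is given."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not (args.query or args.ai_brief or args.ai_breakthroughs):
        parser.error("a query is required unless --ai-brief or --ai-breakthroughs is used")
    return args
```

Making `query` optional with `nargs="?"` and validating afterwards keeps `python main.py --ai-brief` valid while still rejecting a bare `python main.py`.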
# Run this daily to stay updated
python main.py --ai-brief --limit 30 --format csv

# Collect news for specific research topics
python main.py "machine learning research" --limit 100 --format csv

# Track industry news by country
python main.py "tech industry" --country us --limit 50
python main.py "tech industry" --country uk --limit 50

# Aggregate news from multiple sources
python main.py "climate change" --limit 50 --format json

# Monitor competitor news
python main.py "competitor name" --no-cache --limit 20

Results are saved to the output/ directory in your chosen format:
- JSON: Structured data with all fields
- CSV: Spreadsheet-friendly format
Each file is named based on your query: news_{query}.{format}
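
As a rough illustration of the naming scheme, the sketch below writes a result list to `output/news_{query}.{format}`. The `save_results` helper and the way the query is turned into a file-safe name (lowercased, non-alphanumeric runs replaced with underscores) are assumptions; the project may sanitize names differently. The CSV columns follow the fields shown in the sample JSON output.

```python
import csv
import json
import re
from pathlib import Path
from typing import Dict, List

# Column order taken from the sample output; extra keys are ignored.
FIELDS = ["title", "source", "date", "snippet", "link", "thumbnail"]


def save_results(results: List[Dict[str, str]], query: str,
                 fmt: str = "json", out_dir: str = "output") -> Path:
    """Write results to <out_dir>/news_<query>.<fmt> and return the path."""
    # Assumption: the query is lowercased and non-alphanumerics become "_".
    safe_query = re.sub(r"[^A-Za-z0-9]+", "_", query).strip("_").lower()
    path = Path(out_dir) / f"news_{safe_query}.{fmt}"
    path.parent.mkdir(parents=True, exist_ok=True)

    if fmt == "json":
        text = json.dumps(results, indent=2, ensure_ascii=False)
        path.write_text(text, encoding="utf-8")
    elif fmt == "csv":
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(results)
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return path
```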
THORDATA_SCRAPER_TOKEN=your_token_here

from src.scraper import GoogleNewsScraper
from src.ai_news import AINewsBriefing
# Basic search (with automatic caching)
scraper = GoogleNewsScraper()
results = scraper.search("AI", num=20, country="us") # Cached for 5 minutes
# Bypass cache for fresh results
results = scraper.search("AI", num=20, no_cache=True)
# Clear cache manually
scraper.clear_cache()
# AI news briefing
ai_briefing = AINewsBriefing()
briefing = ai_briefing.get_latest_ai_news(num=30)

Caching:
- Automatic caching of API responses
- Default TTL: 5 minutes
- Instant response for cached queries (<0.1s)
- Manual cache control available
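
The caching behaviour listed above can be approximated with a small in-memory TTL cache. This is a sketch under assumptions, not the scraper's internal code: the `TTLCache` class, its method names, and the choice of cache key are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        """Return the cached value, or None if it is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # drop the stale entry
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        """Store a value together with the time it was cached."""
        self._entries[key] = (self._clock(), value)

    def clear(self) -> None:
        """Remove every entry (manual cache control)."""
        self._entries.clear()
```

A natural key is the tuple of request parameters, for example `("AI", "us", 20)`, so that a repeated query with the same settings is served from memory while a different country or limit triggers a fresh request.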
Retry Mechanism:
- Automatic retry on transient failures
- Exponential backoff (1s, 2s, 4s delays)
- Up to 3 retry attempts
- Prevents cascading failures
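
The retry policy described above (one initial attempt, then up to 3 retries with 1s, 2s, and 4s delays) might look like the following. It is a minimal sketch: the `retry_with_backoff` helper and the set of exceptions treated as transient are assumptions, not the project's actual code.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on transient errors with doubling delays."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            # Delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
    raise ValueError("max_retries must be non-negative")
```

Passing `sleep` as a parameter keeps the helper easy to test, and limiting `retry_on` to transient error types avoids retrying failures that will never succeed, such as an invalid token.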

| Feature | This Scraper | Others |
|---|---|---|
| AI News Briefing | One-command feature | Manual keyword setup |
| Smart Caching | Automatic (5min TTL) | No caching |
| Auto Retry | Exponential backoff | |
| Progress Indicators | Visual feedback | No feedback |
| Real-time Data | <3s response, <0.1s cached | |
| No Bans | Managed by Thordata | |
| Global Coverage | 195+ countries | |
| Easy Setup | 2 minutes | |
| Output Formats | JSON + CSV | |
| Error Handling | Robust with retries | |

Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
MIT License. See LICENSE for details.
- Powered by Thordata SERP API
- Built with ❤️ by the Thordata Developer Team
- Documentation: Check this README
- Issues: GitHub Issues
- Email: support@thordata.com
- CHANGELOG.md - Version history and changes