🔒 IPBlocklist

Threat intelligence aggregator that collects, processes, and serves IP reputation data from 127 security feeds into an optimized binary format for fast lookups.

🚀 Key Features

✅ Fast IP lookups in <1ms using binary search
✅ 5.0M+ IPs and CIDR ranges from 127 threat intelligence feeds
✅ Malware C&C servers, botnets, spam networks, compromised hosts
✅ VPN providers, Tor nodes, datacenter/hosting ASNs
✅ Optimized integer storage for minimal memory footprint
✅ Support for both IPv4 and IPv6
✅ Automated daily updates via GitHub Actions

📥 Download & Extract

The dataset is available as a downloadable binary file.

Threat Intelligence Data

The threat intelligence dataset is approximately 12MB.

# Download the file
wget https://github.com/tn3w/IPBlocklist/releases/latest/download/blocklist.bin

# Verify the file
ls -lh blocklist.bin

📊 Architecture

feeds.json ──────────> aggregator.py ──────────> blocklist.bin
  (config)              (processor)              (threat intel)

📖 Overview

IPBlocklist downloads threat intelligence from multiple sources (malware C&C servers, botnets, spam networks, VPN providers, Tor nodes, etc.) and converts it into a compact, searchable binary format. IP addresses and CIDR ranges are converted to integers and stored as sorted, delta-encoded ranges; once decoded, each feed can be queried with a binary search.

The system uses open-source security feeds configured in feeds.json, which are processed by aggregator.py into a unified blocklist.bin file.

📁 Data Models

feeds.json

Configuration file defining all threat intelligence sources. Each feed is an independent object with complete metadata.

Structure: Array of feed objects

[
    {
        "name": "feodotracker",
        "url": "https://feodotracker.abuse.ch/downloads/ipblocklist.txt",
        "description": "Feodo Tracker - Botnet C&C",
        "regex": "^(?![#;/])([0-9a-fA-F:.]+(?:/\\d+)?)",
        "base_score": 1.0,
        "confidence": 0.95,
        "flags": ["is_malware", "is_botnet", "is_c2_server"],
        "categories": ["malware", "botnet"]
    }
]

Required Fields:

name: Unique identifier for the feed
url: Download URL for the threat list
description: Human-readable description
regex: Pattern to extract IPs/CIDRs from feed content
base_score: Threat severity (0.0-1.0)
confidence: Data reliability (0.0-1.0)
flags: Boolean indicators (is_anycast, is_botnet, is_brute_force, is_c2_server, is_cdn, is_cloud, is_compromised, is_datacenter, is_forum_spammer, is_isp, is_malware, is_mobile, is_phishing, is_proxy, is_scanner, is_spammer, is_tor, is_vpn, is_web_attacker)
categories: Categories for scoring (anonymizer, attacks, botnet, compromised, infrastructure, malware, spam)

Optional Fields:

provider_name: VPN/hosting provider name
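
To illustrate how the regex field might be applied, the sketch below runs the feodotracker pattern over a small made-up feed body. The `extract_entries` helper and the use of `re.MULTILINE` are assumptions for illustration; aggregator.py may apply the pattern differently (for example, line by line).

```python
import re

# Pattern copied from the feodotracker entry above (JSON escaping removed).
FEED_REGEX = r"^(?![#;/])([0-9a-fA-F:.]+(?:/\d+)?)"


def extract_entries(text: str, pattern: str = FEED_REGEX) -> list:
    """Return the first capture group for every line the pattern matches."""
    compiled = re.compile(pattern, re.MULTILINE)
    return [match.group(1) for match in compiled.finditer(text)]


# Made-up feed content using documentation address ranges.
sample = """# comment line
192.0.2.10
198.51.100.0/24
; another comment
2001:db8::1
"""

entries = extract_entries(sample)
print(entries)
```

Lines starting with `#`, `;` or `/` are skipped by the negative lookahead, so comment lines never produce an entry.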

datacenter_asns.json

List of Autonomous System Numbers (ASNs) associated with datacenter and hosting providers.

Structure: Array of ASN strings

["15169", "16509", "13335", "8075", "14061"]

This file is automatically generated when processing the datacenter_asns feed and can be used for O(1) ASN lookups to identify datacenter traffic.

blocklist.bin

Processed binary output with delta-encoded IP ranges for fast lookups.

Structure: Binary format with varint encoding

[4 bytes: timestamp (u32)]
[2 bytes: feed count (u16)]
For each feed:
  [1 byte: name length (u8)]
  [N bytes: feed name (utf-8)]
  [4 bytes: range count (u32)]
  For each range:
    [varint: from_delta]
    [varint: range_size]

Encoding:

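Timestamp: Unix timestamp as 32-bit unsigned integer (little-endian, as are the other fixed-width fields, per the struct formats used in the loader below)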
Feed names: Length-prefixed UTF-8 strings
Ranges: Delta-encoded start positions with varint compression
Range size: End - start encoded as varint
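
The layout above can be sketched from the writer's side. This is a minimal encoder under stated assumptions: little-endian fixed-width fields and 7-bit varints with a continuation bit, both inferred from the loader in the Python examples. The function names `write_varint`, `encode_feed` and `encode_blocklist` are hypothetical and are not taken from aggregator.py.

```python
import struct


def write_varint(value: int) -> bytes:
    """Encode a non-negative integer as 7-bit groups, low group first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # final byte has the high bit clear
            return bytes(out)


def encode_feed(name: str, ranges: list) -> bytes:
    """Encode one feed: name, range count, then (from_delta, range_size) pairs."""
    name_bytes = name.encode("utf-8")
    buf = bytearray(struct.pack("<B", len(name_bytes)))
    buf += name_bytes
    buf += struct.pack("<I", len(ranges))
    previous_start = 0
    for start, end in sorted(ranges):
        buf += write_varint(start - previous_start)  # delta from previous start
        buf += write_varint(end - start)  # range size (0 for a single IP)
        previous_start = start
    return bytes(buf)


def encode_blocklist(timestamp: int, feeds: dict) -> bytes:
    """Encode the header followed by every feed."""
    buf = bytearray(struct.pack("<IH", timestamp, len(feeds)))
    for name, ranges in feeds.items():
        buf += encode_feed(name, ranges)
    return bytes(buf)


# Example: one feed holding 10.0.0.0/27 and the single address 10.0.0.1.
blob = encode_blocklist(
    1700000000,
    {"example_feed": [(167772160, 167772191), (167772161, 167772161)]},
)
print(len(blob), blob.hex())
```

Because each start is stored as a delta from the previous start, sorted feeds with closely spaced entries mostly need one or two bytes per value.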

Integer Conversion:

IPv4: 10.0.0.1 → 167772161
IPv6: 2001:db8::1 → 42540766411282592856903984951653826561
CIDR: 10.0.0.0/27 → (167772160, 167772191) (network to broadcast)
Single IP: Stored as range with size 0
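
These conversions can be reproduced with the standard `ipaddress` module. The helper names `ip_to_int` and `cidr_to_range` are hypothetical; the sketch only shows one way to arrive at the integers listed above.

```python
import ipaddress


def ip_to_int(address: str) -> int:
    """Convert an IPv4 or IPv6 address string to its integer value."""
    return int(ipaddress.ip_address(address))


def cidr_to_range(cidr: str) -> tuple:
    """Convert a CIDR block to an inclusive (start, end) integer pair."""
    network = ipaddress.ip_network(cidr, strict=False)
    return int(network.network_address), int(network.broadcast_address)


print(ip_to_int("10.0.0.1"))
print(ip_to_int("2001:db8::1"))
print(cidr_to_range("10.0.0.0/27"))

# A single IP is a range whose size (end - start) is 0.
start, end = cidr_to_range("10.0.0.1/32")
print(end - start)
```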

⚙️ aggregator.py

Downloads and processes all feeds in parallel, handling multiple formats and edge cases.

Features:

Parallel downloads with ThreadPoolExecutor (10 workers)
IPv4/IPv6 support with embedded address extraction
CIDR range expansion to [start, end] pairs
ASN resolution for datacenter and Tor networks
Deduplication and sorting for binary search
Regex-based parsing for diverse feed formats
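
As a rough illustration of the parallel download step, the sketch below fans feed URLs out over a `ThreadPoolExecutor` with 10 workers. The `fetch` and `download_all` names, the use of `urllib`, and the error handling are assumptions; the HTTP client and retry behavior of aggregator.py are not described here. The demo uses a stub fetcher so it runs without network access.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(url: str, timeout: float = 30.0) -> str:
    """Download one feed body as text (assumed helper, stdlib only)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def download_all(feeds: list, fetch_fn=fetch, workers: int = 10) -> dict:
    """Fetch every feed concurrently; failed feeds map to None."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_fn, feed["url"]): feed["name"] for feed in feeds}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # one bad feed should not stop the rest
                print(f"{name}: download failed ({exc})")
                results[name] = None
    return results


def fake_fetch(url: str) -> str:
    """Stub fetcher used for the demo instead of a real HTTP request."""
    return f"# body of {url}\n192.0.2.1\n"


demo_feeds = [
    {"name": "feed_a", "url": "https://example.invalid/a.txt"},
    {"name": "feed_b", "url": "https://example.invalid/b.txt"},
]
bodies = download_all(demo_feeds, fetch_fn=fake_fetch)
print(sorted(bodies))
```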

Special Handling:

datacenter_asns: Resolves ASN numbers to IP ranges via RIPE API
tor_onionoo: Combines Tor relay list with known Tor ASNs
IPv4-mapped IPv6 addresses: Extracts the embedded IPv4 address (e.g. ::ffff:192.0.2.1 → 192.0.2.1)
6to4 tunnels: Extracts IPv4 from 2002::/16 addresses
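
The embedded-address handling can be approximated with the `ipv4_mapped` and `sixtofour` properties of `ipaddress.IPv6Address`. The `normalize` helper below is hypothetical and only sketches the idea; aggregator.py may implement the extraction differently.

```python
import ipaddress


def normalize(address: str):
    """Return the embedded IPv4 address if present, else the parsed address."""
    ip = ipaddress.ip_address(address)
    if isinstance(ip, ipaddress.IPv6Address):
        # ipv4_mapped covers ::ffff:a.b.c.d, sixtofour covers 2002::/16.
        embedded = ip.ipv4_mapped or ip.sixtofour
        if embedded is not None:
            return embedded
    return ip


for raw in ["::ffff:192.0.2.1", "2002:c000:201::1", "2001:db8::1", "10.0.0.1"]:
    print(raw, "->", normalize(raw))
```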

Usage:

python aggregator.py

Output: Creates or updates blocklist.bin (all processed feeds) and datacenter_asns.json (the datacenter ASN list).

🐍 Python Lookup Examples

Database Loader

import struct
import ipaddress
from typing import Dict, List, Tuple, Optional


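def read_varint(f) -> int:
    """Read one base-128 varint (7 data bits per byte, low group first)."""
    result = shift = 0
    while True:
        chunk = f.read(1)
        if not chunk:
            raise EOFError("unexpected end of file while reading varint")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        # A clear high bit marks the final byte of the varint.
        if not (byte & 0x80):
            return result
        shift += 7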


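def binary_search(ranges: List[Tuple], target: int) -> Optional[int]:
    """Return the index of a range containing target, or None.

    Assumes ranges are sorted by start. If ranges overlap, a containing
    range that the search never visits can be missed.
    """
    left, right = 0, len(ranges) - 1
    best_match = None
    best_size = float('inf')

    while left <= right:
        mid = (left + right) // 2
        start, end = ranges[mid]

        if start <= target <= end:
            # Remember the narrowest match seen, then keep searching right.
            size = end - start
            if size < best_size:
                best_size = size
                best_match = mid
            left = mid + 1
        elif target < start:
            right = mid - 1
        else:
            left = mid + 1

    return best_match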


class BlocklistLoader:
    def __init__(self, path: str = "blocklist.bin"):
        self.feeds: Dict[str, List[Tuple[int, int]]] = {}
        self.timestamp: int = 0
        self._load(path)

    def _load(self, path: str):
        with open(path, "rb") as f:
            self.timestamp = struct.unpack("<I", f.read(4))[0]
            feed_count = struct.unpack("<H", f.read(2))[0]

            for _ in range(feed_count):
                name_len = struct.unpack("<B", f.read(1))[0]
                feed_name = f.read(name_len).decode("utf-8")
                range_count = struct.unpack("<I", f.read(4))[0]

                ranges = []
                current = 0
                for _ in range(range_count):
                    current += read_varint(f)
                    size = read_varint(f)
                    ranges.append((current, current + size))

                self.feeds[feed_name] = ranges

    def check_ip(self, ip: str) -> List[str]:
        target = int(ipaddress.ip_address(ip))
        matches = []

        for feed_name, ranges in self.feeds.items():
            if binary_search(ranges, target) is not None:
                matches.append(feed_name)

        return matches


blocklist = BlocklistLoader()
result = blocklist.check_ip("8.8.8.8")
print(result)
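
The binary search above inspects only the ranges it happens to visit, so it can miss a match if a feed contains overlapping ranges (for example a wide CIDR followed by narrower entries inside it). Whether aggregator.py already merges overlaps is not stated here. The sketch below is one way to normalize ranges after loading so that a standard `bisect` lookup is exact; `merge_ranges` and `contains` are hypothetical helper names.

```python
from bisect import bisect_right


def merge_ranges(ranges: list) -> list:
    """Merge overlapping or adjacent inclusive (start, end) ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous range instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def contains(merged: list, starts: list, target: int) -> bool:
    """Check membership in merged, non-overlapping ranges via bisect."""
    index = bisect_right(starts, target) - 1
    return index >= 0 and merged[index][1] >= target


raw = [(0, 100), (5, 6), (7, 8), (200, 300)]
merged = merge_ranges(raw)
starts = [start for start, _ in merged]

print(merged)
print(contains(merged, starts, 50))
print(contains(merged, starts, 150))
```

Precomputing `starts` once per feed keeps each lookup at O(log n), at the cost of one extra list per feed.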

Batch Lookup

def check_batch(blocklist: BlocklistLoader, ip_list: List[str]) -> Dict[str, List[str]]:
    results = {}
    for ip in ip_list:
        results[ip] = blocklist.check_ip(ip)
    return results


ips = ["10.0.0.1", "192.168.1.1", "8.8.8.8"]
results = check_batch(blocklist, ips)
for ip, feeds in results.items():
    print(f"{ip}: {feeds}")

Datacenter ASN Lookup

import json

def load_datacenter_asns(asn_file="datacenter_asns.json"):
    """Load datacenter ASNs into a set for O(1) lookups."""
    try:
        with open(asn_file) as f:
            return set(json.load(f))
    except Exception as e:
        print(f"Error loading ASNs: {e}")
        return set()

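def is_datacenter_asn(asn, asns=None):
    """Check if ASN belongs to a datacenter."""
    # Reload only when no set was supplied; an empty set is a valid argument.
    if asns is None:
        asns = load_datacenter_asns()
    # Accept "AS16509", "as16509" or "16509".
    return asn.upper().replace("AS", "").strip() in asns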

asns = load_datacenter_asns()
for asn in ["AS16509", "AS13335", "AS15169"]:
    result = "is" if is_datacenter_asn(asn, asns) else "is not"
    print(f"{asn} {result} a datacenter ASN")

Reputation Scoring

import json


with open("feeds.json") as f:
    feeds_config = json.load(f)

sources = {feed["name"]: feed for feed in feeds_config}


def check_ip_with_reputation(blocklist: BlocklistLoader, ip: str) -> Dict:
    """Return a score in [0.0, 1.0], the matching feeds, and their flags."""
    matches = blocklist.check_ip(ip)

    if not matches:
        return {"ip": ip, "score": 0.0, "feeds": []}

    flags = {}
    scores = {
        "anonymizer": [], "attacks": [], "botnet": [],
        "compromised": [], "infrastructure": [], "malware": [], "spam": []
    }

    for list_name in matches:
        source = sources.get(list_name)
        if not source:
            continue

        for flag in source.get("flags", []):
            flags[flag] = True

        provider = source.get("provider_name")
        if provider:
            flags["vpn_provider"] = provider

        base_score = source.get("base_score", 0.5)
        for category in source.get("categories", []):
            if category in scores:
                scores[category].append(base_score)

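    # Within each category, combine feed scores as 1 - prod(1 - score),
    # so extra feeds raise the category score without exceeding 1.0.
    total = 0.0
    for category_scores in scores.values():
        if not category_scores:
            continue
        combined = 1.0
        for score in sorted(category_scores, reverse=True):
            combined *= 1.0 - score
        total += 1.0 - combined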

    return {
        "ip": ip,
        "score": min(total / 1.5, 1.0),
        "feeds": matches,
        **flags
    }


result = check_ip_with_reputation(blocklist, "8.8.8.8")
print(json.dumps(result, indent=2))
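
The arithmetic in the scoring function may be easier to follow in isolation. The sketch below reimplements only the combination step with made-up base scores; `combine_category` and `overall_score` are hypothetical names, and the divisor 1.5 is taken from the example above.

```python
def combine_category(scores: list) -> float:
    """Combine scores within one category as 1 - product(1 - score)."""
    remaining = 1.0
    for score in scores:
        remaining *= 1.0 - score
    return 1.0 - remaining


def overall_score(per_category: dict) -> float:
    """Sum the per-category results, divide by 1.5, and cap at 1.0."""
    total = sum(combine_category(s) for s in per_category.values() if s)
    return min(total / 1.5, 1.0)


# Made-up example: two malware feeds (0.8 and 0.5) and one botnet feed (0.8).
example = {"malware": [0.8, 0.5], "botnet": [0.8], "spam": []}

# malware: 1 - (0.2 * 0.5) = 0.9; botnet: 0.8; total 1.7 / 1.5 is capped.
print(round(combine_category(example["malware"]), 3))
print(overall_score(example))
```

With this scheme a single feed with base score 0.6 yields about 0.4, so reaching the cap takes strong signals in more than one category or several feeds agreeing.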

⚡ Performance Characteristics

Dataset Statistics:

Total feeds: 127
Individual IPs: 4.4M (4.4M IPv4, 6K IPv6)
CIDR ranges: 552K (545K IPv4, 7K IPv6)
Total entries: 5.0M
File size: 12MB (compressed with varint encoding)

Lookup Complexity:

Binary search: O(log n) per feed
Typical lookup: <1ms for 127 feeds with 5.0M entries

Memory Usage:

Delta encoding: ~2-3 bytes per range (varint compressed)
Feed names: Length-prefixed UTF-8 strings
Total memory: ~12MB for the encoded file; the decoded Python lists built by the loader above occupy considerably more

💡 Use Cases

API Rate Limiting: Block known malicious IPs
Fraud Detection: Flag VPN/proxy/datacenter traffic
Security Analytics: Enrich logs with threat intelligence
Access Control: Restrict Tor exit nodes or anonymizers
Compliance: Block traffic from sanctioned networks

📜 License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
