This code is designed to be executed on Google Colab.
- Before running the code, upload the raw dataset folder named `kquake_dataset`.
- Create an empty folder named `preprocessed_csv`, where the preprocessed `.csv` files will be saved automatically.

⚠️ Note: The raw dataset `kquake_dataset` is stored separately in our Git repository under the path `jiyun/raw_data` (due to its large size). Please ensure the entire dataset is placed inside the `kquake_dataset` folder before execution.
Since the dataset originates from South Korea, all timestamps have been converted from UTC to KST (Korea Standard Time).
The preprocessing consists of the following three steps:
- DC offset removal
- Cosine tapering (20%)
- Bandpass filtering (0.1 Hz – 10 Hz)
The cutoff frequencies are based on the following study, “Analysis of Frequency-Specific Sources of Background Noise in Seismic Observatories”:

“Signals recorded in ground vibration data typically fall within characteristic frequency bands. For example, P-waves and S-waves from local earthquakes, as well as anthropogenic noise, are most prominent in the 0.1–1 second period range, while surface waves generated by earthquakes are generally dominant in the 1–20 second range.”

In frequency terms, the 0.1–1 second period range corresponds to 1–10 Hz, and the 0.1 Hz lower cutoff corresponds to a period of 10 seconds.
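The three preprocessing steps listed above can be sketched as follows. This is a minimal illustration, not the code in this repository: it assumes NumPy and SciPy are available, a 4th-order zero-phase Butterworth filter, and that "20%" means the total tapered fraction of the trace (some libraries define the percentage per end instead). The function name `preprocess` and its parameters are hypothetical.

```python
import numpy as np
from scipy.signal import butter, detrend, sosfiltfilt
from scipy.signal.windows import tukey

FS = 100.0  # sampling frequency in Hz, as stated in the dataset summary


def preprocess(trace, fs=FS, taper_fraction=0.2, fmin=0.1, fmax=10.0, order=4):
    """Apply DC offset removal, cosine tapering, and bandpass filtering."""
    x = np.asarray(trace, dtype=float)

    # 1) DC offset removal: subtract the mean of the trace.
    x = detrend(x, type="constant")

    # 2) Cosine taper: a Tukey window with alpha=0.2 tapers 20% of the
    #    samples in total (10% at each end). This split is an assumption.
    x = x * tukey(x.size, alpha=taper_fraction)

    # 3) Bandpass filter between fmin and fmax. The Butterworth design,
    #    the order, and the zero-phase application are assumptions.
    sos = butter(order, [fmin, fmax], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)
```

Calling `preprocess(channel)` on each of the three channels separately would return an array of the same length as its input.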
- Preprocessed signals are saved as `.csv` files per sample.
- Lines 17 and 18 of the code must be edited to reflect the correct file paths on your system (Colab or local).
This script demonstrates the preprocessing pipeline using a sample dataset:
- File: `KMA20230026_KG.BOG..HG.raw.mseed`
To preprocess all 651 data files, use pre-processing_final.py.
This folder contains the final preprocessed data, organized into training, validation, and test sets.
Due to file size limitations, the data has been split into multiple compressed .zip files:
- Train: 9 zip files
- Validation: 2 zip files
- Test: 3 zip files
Each subset contains the following number of files:
- Train: 1,368 files (70%)
- Validation: 195 files (10%)
- Test: 390 files (20%)
⚠️ Note: The division into multiple zip files per split is solely due to upload size restrictions. Please extract and merge the zip files within each split (train/val/test) before using them as input for your deep learning model.
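One way to extract and merge the archives of a split is sketched below, using only the Python standard library. The function name `extract_split` and the directory names in the usage note are hypothetical; the actual archive names in this folder may differ.

```python
import zipfile
from pathlib import Path


def extract_split(zip_dir, out_dir, pattern="*.zip"):
    """Extract every archive found in zip_dir into a single out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Extract the archives in a stable order into the same target folder.
    for archive in sorted(Path(zip_dir).glob(pattern)):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out_dir)

    # Return all extracted files so the caller can check the count.
    return sorted(p for p in out_dir.rglob("*") if p.is_file())
```

For example, `len(extract_split("train_zips", "train"))` should equal 1,368 once all nine train archives are present, assuming the archives contain only the data files. The same check applies to validation (195) and test (390).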
This folder contains visualization images showing the before and after of the preprocessing steps, using the sample data:
KMA20230026_KG.BOG..HG.raw.mseed
| File Name | Description |
|---|---|
| sample.png | Raw 3-channel (HGZ, HGN, HGE) waveform before preprocessing |
| sample_fft.png | Frequency-power spectrum graph after DC offset removal (used to inspect frequency distribution) |
| sample_preprocessed.png | Preprocessed 3-channel waveform. Raw and processed waveforms are overlaid for clear comparison |
📌 These visualizations help validate and understand the effect of each preprocessing step.
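A frequency-power spectrum like the one in `sample_fft.png` can be computed roughly as follows. This is a sketch using NumPy only; the function name `power_spectrum` and the power normalization are assumptions and may not match how the image in this folder was produced.

```python
import numpy as np


def power_spectrum(trace, fs):
    """Return (frequencies in Hz, power) of a trace after DC offset removal."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()  # DC offset removal

    # One-sided FFT of the real-valued signal and its frequency axis.
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)

    # Squared magnitude, scaled by the number of samples (assumed scaling).
    power = np.abs(spectrum) ** 2 / x.size
    return freqs, power
```

Plotting `power` against `freqs` on a logarithmic axis makes it easier to see how much energy lies inside the 0.1–10 Hz passband.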
- The dataset is sourced from the K-ESM (KIGAM Engineering Strong Motion) DB on the Geo Big Data Open Platform (https://data.kigam.re.kr/quake/data/kesmdb).
- Each file contains raw acceleration waveform data recorded at an individual seismic station.
- Raw, unprocessed format
- Time segments selected using normalized Arias intensity (Arias, 1970)
- Based on the Korea Meteorological Administration earthquake catalog (https://www.weather.go.kr/)
- Extracted: 600 seconds of continuous waveform data after the earthquake origin time
- Signal segments defined as 1–99% range of normalized Arias intensity
- Pre- and post-noise included to ensure full P-wave and coda coverage
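The 1–99% normalized Arias intensity window mentioned above can be illustrated with a short sketch. Arias intensity is proportional to the time integral of squared acceleration, with the constant factor π/(2g) cancelling after normalization. The function name `arias_window` is hypothetical, and the sketch does not reproduce the pre- and post-noise padding applied by the data provider.

```python
import numpy as np


def arias_window(acc, fs, lower=0.01, upper=0.99):
    """Return (start, end) sample indices of the lower..upper range
    of normalized Arias intensity for an acceleration trace."""
    acc = np.asarray(acc, dtype=float)

    # Cumulative integral of squared acceleration (rectangle rule).
    cumulative = np.cumsum(acc ** 2) / fs

    # Normalize to the 0..1 range; assumes the trace is not all zeros.
    normalized = cumulative / cumulative[-1]

    # First samples at which the normalized intensity reaches each level.
    start = int(np.searchsorted(normalized, lower))
    end = int(np.searchsorted(normalized, upper))
    return start, end
```

Dividing the returned indices by the sampling frequency (100 Hz for this dataset) gives the window boundaries in seconds from the start of the trace.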
- File: `KMA20230026_KG.BOG..HG.raw.mseed`
- Channels: 3
- Sampling Frequency: 100 Hz
- Data Length: 8940 samples
- Start Time (UTC): 2023-07-29T10:08:11.258390Z
- End Time (UTC): 2023-07-29T10:09:40.648390Z
Note: Timestamps are converted to KST during preprocessing.
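The UTC to KST conversion for the sample timestamps above can be checked with the standard library. This is an illustrative sketch, with `utc_to_kst` as a hypothetical helper name. KST is modeled as a fixed UTC+9 offset.

```python
from datetime import datetime, timedelta, timezone

# Korea Standard Time as a fixed UTC+9 offset.
KST = timezone(timedelta(hours=9), name="KST")


def utc_to_kst(stamp):
    """Parse an ISO 8601 UTC timestamp ending in 'Z' and return it in KST."""
    utc_time = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    return utc_time.astimezone(KST)


start = utc_to_kst("2023-07-29T10:08:11.258390Z")
end = utc_to_kst("2023-07-29T10:09:40.648390Z")
print(start.isoformat())  # 2023-07-29T19:08:11.258390+09:00
print(end.isoformat())    # 2023-07-29T19:09:40.648390+09:00

# Consistency check: 8940 samples at 100 Hz span (8940 - 1) / 100 = 89.39 s.
duration = (end - start).total_seconds()
print(duration)  # 89.39
```

The duration between the two timestamps matches the stated data length of 8940 samples at 100 Hz.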
| Attribute | Value |
|---|---|
| Number of Data Files | 651 |
| Sampling Frequency | 100 Hz |
| Data Length Range | 38.4 – 462.1 sec |
| Channels per File | 3 |
| Number of Seismic Stations | 31 |
| Station Code | Location | Station Code | Location |
|---|---|---|---|
| AJD | 안좌도 | GHR | 가학리 |
| BBK | 방방골 | GKP1 | 경북대 |
| BGD | 보길도 | GKP2 | 경북대 |
| BOG | 봉계 | GRE | 구례 |
| BRN | 북백령도 | GSU | 경상대 |
| BRS | 남백령도 | HAK | 학계리 |
| CGD | 청도 | HCH | 학천 |
| CGU | 천군 | HDB | 효동리 |
| CHNB | 철원 | HKU | 교원대 |
| CHS | 청송 | HSB | 홍성 |
| CRB | 원주KSRS | HWSB | 화순 |
| DES | 덕성 | IBA | 입암산 |
| DKJ | 덕정리 | JJB | 제주도 |
| DKJ2 | 덕정리 | JRB | 지리산 |
| DOKDO | 독도 | JSB | 정선 |
| DUC | 덕천 | JUC | 죽천 |
| GCN | 건천 | KIP | 김포 |
| KJM | 거제 | SNU | 서울대 |
| KMC | 김천 | SIG | 신계 |
| KNUC | 강원대 | SND | 상동 |
| KNUD | 도계 | TJN | 대전 |
| KSA | 간성 | UNI | 울산과기원 |
| MAK | 매곡리 | WDL | 원달리 |
| MGB | 문경 | WID | 위도 |
| MKL | 명계리 | YIN | 용인 |
| MRD | 마라도 | YKB | 양구 |
| MUN | 무안 | YPD | 연평도 |
| NPR | 나포리 | YSB | 양산 |
| OJR | 옥정리 | YSUK | 연세대 국제 |
| PCH | 포천 | YSUM | 연세대 미래 |
| PKNU | 부경대 | | |
| POHB | 포항 | | |
| POSB | 포항공대 | | |