Jie Wu*, Haoling Li*, Xin Zhang*†‡, Jiani Guo, Jane Luo, Steven Liu, Yangyu Huang, Ruihang Chu‡, Scarlett Li, Yujiu Yang
*Equal Contribution †Project Lead ‡Corresponding Author
Tsinghua University | Microsoft
Left: SynthSmith generates high-quality synthetic tasks, solutions, and test cases to support both SFT and RL training. Right: Avg@8 results on LiveCodeBench. X-Coder achieves significant performance gains on competitive programming using fully synthetic data. For the full data synthesis workflow (question/answer/test generation), see data-recipe/README.md.
Framework of SynthSmith. SynthSmith first extracts and evolves features related to competitive programming from small-scale code-instruction data and merges them into tree structures. It then samples subtrees from the feature tree, selects a compatible feature set, and formulates a scenario that naturally integrates these consistent features. Novel tasks are generated from the proposed scenario according to specific styles. Advanced reasoning models synthesize solutions and tests for the generated tasks, which are then cross-verified with the proposed dual-verification strategy to yield reliable test outputs and the top solution.
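To make the cross-verification step concrete, here is an illustrative sketch (not the released implementation): candidate solutions vote on each generated test input, the majority output becomes the reliable expected output, and the solution agreeing with the majority most often is kept as the top solution. The function name `dual_verify` and the majority-vote rule are assumptions for this example.

```python
from collections import Counter

def dual_verify(solutions, test_inputs):
    """Cross-verify candidate solutions against generated tests (sketch).

    For each test input, run every candidate solution and keep the
    majority output as the reliable expected output. The solution that
    agrees with the majority on the most tests is the top solution.
    """
    reliable_tests = []
    for x in test_inputs:
        outputs = [sol(x) for sol in solutions]
        majority_out, _ = Counter(outputs).most_common(1)[0]
        reliable_tests.append((x, majority_out))

    def score(sol):
        # Number of reliable tests this solution passes.
        return sum(sol(x) == y for x, y in reliable_tests)

    top_solution = max(solutions, key=score)
    return reliable_tests, top_solution

# Toy usage: two correct candidates out-vote one buggy candidate.
correct_a = lambda n: n * n
correct_b = lambda n: n ** 2
buggy = lambda n: n + n
tests, top = dual_verify([correct_a, correct_b, buggy], [2, 3, 4])
```

In the real pipeline the "solutions" are generated programs executed in a sandbox rather than Python callables, but the agreement logic is the same in spirit.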
Performance on LiveCodeBench v5. X-Coder shows strong coding expertise with fewer, fully synthetic tasks, and achieves additional gains through subsequent RL stages.
SFT training can be performed using various frameworks such as ms-swift, LLaMA-Factory, or Megatron-LM. Below we provide a simple example using ms-swift.
pip install ms-swift -U

Download and convert the SFT training data from HuggingFace:
cd sft-recipe
python download_and_convert_data.py

This downloads IIGroup/X-Coder-SFT-376k and converts it to hybrid_376k.jsonl format with query and response fields.
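If you want to inspect the converted file, each line is a JSON object with query and response fields. A minimal stdlib reader might look like the following; the sample file and record contents are illustrative stand-ins, not real training data.

```python
import json

def iter_sft_records(path):
    """Yield (query, response) pairs from a converted JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            yield record["query"], record["response"]

# Example with a tiny stand-in file instead of the real hybrid_376k.jsonl:
with open("sample.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps({"query": "Write FizzBuzz.", "response": "..."}) + "\n")

pairs = list(iter_sft_records("sample.jsonl"))
```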
For multi-node training (8 nodes x 8 GPUs):
# On each node, set the appropriate environment variables:
export NODE_RANK=<node_rank> # 0, 1, 2, ..., 7
export MASTER_ADDR=<master_node_ip>
export MASTER_PORT=29500
cd sft-recipe
bash train_sft.sh

For single-node training, modify train_sft.sh:
- Set NNODES=1
- Adjust CUDA_VISIBLE_DEVICES as needed
# 1. Clone the repository
git clone https://github.com/JieWu02/X-Coder.git
cd X-Coder
# 2. Start Docker container
sudo docker run -it --rm \
--gpus all \
--ipc=host \
-v $(pwd):/workspace \
whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6.post5-mcore0.12.1-te2.3-deepseekv3 \
/bin/bash
# 3. Install dependencies
pip install sandbox_fusion pyext
cd rl-recipe
# 4. Download RL training data
python download_data.py
# 5. Start training
bash train_scripts/install.sh
bash train_scripts/xcoder-rl-train.sh

The RL training data (~17GB total) is hosted on HuggingFace: IIGroup/X-Coder-RL-40k
cd rl-recipe
# Download all data (~17GB)
python download_data.py
# Or download only synthetic data (~8.5GB)
python download_data.py --syn-only

After downloading, the data will be organized as:
rl-recipe/
├── syn_rl_data/
│ └── xcoder_data/
│ └── sorted_by_passrate/
│ ├── part_0000.parquet
│ ├── part_0001.parquet
│ ├── part_0002.parquet
│ ├── part_0003.parquet
│ └── rl_tasks_easy.parquet
└── real_rl_data/
└── non_sys_prompt/
├── codeforces_9763.parquet
├── klear_code.parquet
├── leetcode_2772.parquet
├── taco_13064.parquet
└── test_wo_prompt.parquet
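As a quick sanity check after downloading, you can verify the layout with a few lines of stdlib Python. The directory names follow the tree above; the root path and the helper name `check_rl_layout` are just for this example.

```python
from pathlib import Path

EXPECTED_DIRS = [
    "syn_rl_data/xcoder_data/sorted_by_passrate",
    "real_rl_data/non_sys_prompt",
]

def check_rl_layout(root):
    """Return a {relative_dir: parquet_count} map for the RL data tree."""
    root = Path(root)
    counts = {}
    for rel in EXPECTED_DIRS:
        d = root / rel
        # Count parquet shards; 0 means the directory is missing or empty.
        counts[rel] = len(list(d.glob("*.parquet"))) if d.is_dir() else 0
    return counts

# e.g. check_rl_layout("rl-recipe") after running download_data.py
```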
A code execution and evaluation service is included in rl-recipe/code-judge/.
If you find this work helpful, please cite:
@misc{wu2026xcoderadvancingcompetitiveprogramming,
title={X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests},
author={Jie Wu and Haoling Li and Xin Zhang and Jiani Guo and Jane Luo and Steven Liu and Yangyu Huang and Ruihang Chu and Scarlett Li and Yujiu Yang},
year={2026},
eprint={2601.06953},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2601.06953},
}

This project is licensed under the Apache License 2.0.



