Updated 19 Apr 2025
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
Data Overview
OpenCodeReasoning is the largest reasoning-based synthetic dataset to date for coding, comprising 735,255 Python samples across 28,319 unique competitive programming questions. It is designed for supervised fine-tuning (SFT).
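
To give a sense of scale, the two figures above imply roughly 26 samples per question on average. The short calculation below is only a sketch of that arithmetic; the variable names are illustrative, and the real per-question distribution is not stated here and may be uneven.

```python
# Figures quoted in the overview above.
total_samples = 735_255
unique_questions = 28_319

# Average number of samples per unique question (a simple mean;
# the actual per-question counts are not given in this overview).
samples_per_question = total_samples / unique_questions
print(f"{samples_per_question:.2f} samples per question on average")
```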

Technical Report - Discover the methodology and technical details behind OpenCodeReasoning.
GitHub Repo - Access the complete pipeline used to perform SFT.
This dataset is ready for commercial/non-commercial use.

Data distribution
The CodeForces problems are sourced from http://codeforces.com.
The question collections are gathered from TACO (https://huggingface.co/datasets/BAAI/TACO), APPS (https://huggingface.co/datasets/codeparrot/apps), CodeContests (https://huggingface.co/datasets/deepmind/code_contests), and open-r1/codeforces (https://huggingface.co/datasets/open-r1/codeforces).
We do not include the test split of CodeContests and open-r1/codeforces.
The output responses are generated by R1 (DeepSeek-R1).
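
The test-split exclusion described above can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the records are made up, and the field names `source` and `split` are assumptions rather than the dataset's documented schema.

```python
from typing import Dict, List

# Hypothetical question records; "source" and "split" are assumed field
# names chosen for this sketch, not the dataset's actual columns.
questions: List[Dict[str, str]] = [
    {"id": "q1", "source": "TACO", "split": "train"},
    {"id": "q2", "source": "APPS", "split": "train"},
    {"id": "q3", "source": "CodeContests", "split": "test"},
    {"id": "q4", "source": "open-r1/codeforces", "split": "test"},
    {"id": "q5", "source": "CodeContests", "split": "train"},
]

# Collections whose test split is left out, per the list above.
EXCLUDE_TEST_FROM = {"CodeContests", "open-r1/codeforces"}


def keep_question(question: Dict[str, str]) -> bool:
    """Return False only for test-split questions from the excluded collections."""
    is_excluded_test = (
        question["source"] in EXCLUDE_TEST_FROM and question["split"] == "test"
    )
    return not is_excluded_test


kept = [q for q in questions if keep_question(q)]
print([q["id"] for q in kept])
```

Leaving these test splits out keeps the corresponding benchmark problems unseen during SFT, so they remain usable for evaluation.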
