files Zero

Updated 6 Aug 2025

939 views

11 stars

19 opens

OpenCodeReasoning: Advancing Data Distillation for Competitive Coding
Data Overview
OpenCodeReasoning is the largest reasoning-based synthetic dataset for coding to date. It comprises 735,255 Python samples across 28,319 unique competitive programming questions and is designed for supervised fine-tuning (SFT).
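To make the SFT use concrete, here is a minimal sketch of how one question/response record might be mapped to a prompt/completion pair for fine-tuning. The record layout (keys `input` for the problem statement and `output` for the generated response) and the helper name `to_sft_pair` are hypothetical assumptions for illustration, not the dataset's confirmed schema. The sketch also works out the average number of responses per question implied by the headline figures (735,255 / 28,319, roughly 26).

```python
from typing import Dict

# Headline figures quoted in the Data Overview.
TOTAL_SAMPLES = 735_255
UNIQUE_QUESTIONS = 28_319

# Average number of responses per unique question (about 25.96).
AVG_RESPONSES_PER_QUESTION = TOTAL_SAMPLES / UNIQUE_QUESTIONS


def to_sft_pair(record: Dict[str, str]) -> Dict[str, str]:
    """Map one record to a prompt/completion pair for SFT.

    ASSUMPTION: the keys "input" (problem statement) and "output"
    (model response) are hypothetical; adjust them to the real schema.
    """
    return {
        "prompt": record["input"].strip(),
        "completion": record["output"].strip(),
    }


if __name__ == "__main__":
    # Toy record for illustration only; not taken from the dataset.
    example = {
        "input": "Given two integers a and b, print a + b.\n",
        "output": "Read both numbers and add them.\n"
                  "a, b = map(int, input().split())\nprint(a + b)\n",
    }
    pair = to_sft_pair(example)
    print(pair["prompt"])
    print(f"{AVG_RESPONSES_PER_QUESTION:.2f} responses per question on average")
```

Because each question appears with many responses, a real SFT split would typically keep all responses for a question on the same side of any train/validation boundary.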

Technical Report - Discover the methodology and technical details behind OpenCodeReasoning.
GitHub Repo - Access the complete pipeline used to perform SFT.
This dataset is ready for commercial and non-commercial use.

Data distribution
The Codeforces problems are sourced from http://codeforces.com.
The question collections are gathered from TACO (https://huggingface.co/datasets/BAAI/TACO), APPS (https://huggingface.co/datasets/codeparrot/apps), CodeContests (https://huggingface.co/datasets/deepmind/code_contests), and open-r1/codeforces (https://huggingface.co/datasets/open-r1/codeforces).
We do not include the test splits of CodeContests or open-r1/codeforces.
The output responses are generated by DeepSeek-R1.
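The exclusion of held-out test splits can be illustrated with a small filter. This is a minimal sketch under stated assumptions: each candidate question is modeled as a dict with hypothetical `source` and `split` keys, and the source labels simply reuse the collection names listed above. It is not the authors' actual pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Collections whose test splits are excluded, per the Data distribution notes.
# ASSUMPTION: source labels reuse the collection names; the real pipeline may differ.
# Only these two collections are filtered here; handling of other
# collections' splits is not specified by the description.
SOURCES_WITHOUT_TEST_SPLIT = {"CodeContests", "open-r1/codeforces"}


def keep_question(question: Dict[str, str]) -> bool:
    """Return False only for test-split questions from the excluded collections."""
    return not (
        question["source"] in SOURCES_WITHOUT_TEST_SPLIT
        and question["split"] == "test"
    )


def filter_questions(questions: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop held-out test-split questions and keep everything else."""
    return [q for q in questions if keep_question(q)]


def count_by_source(questions: Iterable[Dict[str, str]]) -> Counter:
    """Tally the remaining questions per source collection."""
    return Counter(q["source"] for q in questions)


if __name__ == "__main__":
    # Toy records for illustration only.
    candidates = [
        {"source": "TACO", "split": "train"},
        {"source": "APPS", "split": "train"},
        {"source": "CodeContests", "split": "train"},
        {"source": "CodeContests", "split": "test"},
        {"source": "open-r1/codeforces", "split": "test"},
    ]
    kept = filter_questions(candidates)
    print(count_by_source(kept))
```

Keeping the exclusion rule in its own predicate makes it easy to test separately from the per-source counting.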

Information

Category AI
Added 16 Apr 2025
