Simon ARC Lab

My experiments with ARC-AGI.

Inside the datasets that I'm using for LLM training

My datasets are public on Hugging Face: https://huggingface.co/neoneye

My dataset generators are public on GitHub (this repo): https://github.com/neoneye/simon-arc-lab

I use RLE (run-length encoding) compression, so the samples can be a bit hard to decipher.

In the sample below, the job is to identify the colors that are present in a histogram by discarding the color counters.

{
    "instruction": "SIMONS-HISTOGRAM, unique colors", 
    "input": "0:3626,2:3280,3:2819,8:677", 
    "output": "0,2,3,8", 
    "benchmark": "dataset=histogram_one group=unique_colors histogram_size=e"
}
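The sample above can be reproduced with a few lines of Python. This is a minimal sketch, not the generator code from this repo: the function name `unique_colors` is hypothetical, and it assumes the histogram is a comma-separated list of `color:count` pairs. The sample does not reveal whether the expected output is sorted by color or keeps the input order, so the sketch keeps the input order.

```python
def unique_colors(histogram: str) -> str:
    """Drop the counters from 'color:count' pairs, keeping the input order.

    Hypothetical helper; the format is assumed from the sample above.
    """
    colors = [pair.split(":")[0] for pair in histogram.split(",")]
    return ",".join(colors)


print(unique_colors("0:3626,2:3280,3:2819,8:677"))  # 0,2,3,8
```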

In the sample below, the job is to subtract two histograms: remove every color that occurs in histogram b from histogram a, and return the remaining colors with their counters.

{
    "instruction": "simons-Arc-Histogram, remove histogram b colors from histogram a", 
    "input": "6:1549,7:1428,2:1325,5:1166,8:1120,0:926,1:734,3:733,9:633\n4:1524,6:97", 
    "output": "7:1428,2:1325,5:1166,8:1120,0:926,1:734,3:733,9:633", 
    "benchmark": "dataset=histogram_two group=a_remove_b_colors histogram_size=e"
}
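A minimal sketch of the subtraction in the sample above, assuming the input holds histogram a on the first line and histogram b on the second, and that the surviving pairs keep the order they have in histogram a. The function name `remove_b_colors` is hypothetical and is not taken from this repo.

```python
def remove_b_colors(text: str) -> str:
    """Remove from histogram a every color that occurs in histogram b.

    Hypothetical helper; assumes two lines of 'color:count' pairs.
    """
    a_line, b_line = text.split("\n")
    # Only the colors of histogram b matter here, not its counters.
    b_colors = {pair.split(":")[0] for pair in b_line.split(",")}
    kept = [pair for pair in a_line.split(",") if pair.split(":")[0] not in b_colors]
    return ",".join(kept)


sample = "6:1549,7:1428,2:1325,5:1166,8:1120,0:926,1:734,3:733,9:633\n4:1524,6:97"
print(remove_b_colors(sample))
# 7:1428,2:1325,5:1166,8:1120,0:926,1:734,3:733,9:633
```

Color 6 is dropped because it occurs in histogram b. Color 4 only occurs in b, so it has no effect on the result.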

In the sample below, the job is to recognize which cellular automaton transformation is applied between the input and output images.

{
    "instruction": "SimonCellularAutomata, Recognize the transformation. gameoflife_nowrap,gameoflife_wrap,serviettes_wrap,maze_wrap", 
    "input": "12 21 b262d6a2,a2h62,2626b2d6,b6f2a6,a6h26,b6g26,c6g2,2c6b2b62,a2b6b2b62,a2b6a2a6262,62b6a26b26,c6f26,6j2,i262,d26b2a62,b2g62,b2c62a6a2,b2b6b26a2,a2b6c2a62,i262,h26a2\n12 21 d6d2a6,a6d2b626,6262b6d2,6a2f6a2,j62,a62g62,2a62g6,62a62c62a6,a6262b62626,a6262b62b6,26262e62,c2f62,6,,h6a26,c6f26,b62a626a2a6,b6262b62a6,b6a2f6,6,", 
    "output": "gameoflife_nowrap=0,gameoflife_wrap=0,serviettes_wrap=1,maze_wrap=0", 
    "benchmark": "dataset=cellular_automaton group=recognize_transformation ca_step=1 image_width=medium image_height=large"
}
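The RLE strings in the sample above can be unpacked back into pixel rows. The rules in this sketch are inferred from that one sample and are not taken from the repo's code, so treat them as assumptions: the string starts with `width height`, rows are separated by commas, a letter gives the repeat count for the digit that follows (`a` = 2, `b` = 3, and so on), a row holding a single digit fills the whole row with that color, and an empty row repeats the previous row. The function name `decode_rle_image` is hypothetical.

```python
def decode_rle_image(text: str) -> list[list[int]]:
    """Decode 'width height row,row,...' into a list of pixel rows.

    Hypothetical decoder; the rules are inferred from the sample above.
    """
    width_str, height_str, body = text.split(" ", 2)
    width, height = int(width_str), int(height_str)
    rows: list[list[int]] = []
    for encoded in body.split(","):
        if encoded == "":
            # Assumption: an empty row repeats the previous row.
            if not rows:
                raise ValueError("first row cannot be empty")
            rows.append(list(rows[-1]))
            continue
        if len(encoded) == 1 and encoded.isdigit():
            # Assumption: a lone digit fills the entire row.
            rows.append([int(encoded)] * width)
            continue
        row: list[int] = []
        repeat = 1
        for ch in encoded:
            if ch.isdigit():
                row.extend([int(ch)] * repeat)
                repeat = 1
            else:
                # Assumption: 'a' means 2 repeats, 'b' means 3, and so on.
                repeat = ord(ch) - ord("a") + 2
        if len(row) != width:
            raise ValueError(f"row {encoded!r} decodes to {len(row)} pixels, expected {width}")
        rows.append(row)
    if len(rows) != height:
        raise ValueError(f"decoded {len(rows)} rows, expected {height}")
    return rows


# Small made-up image: 3 pixels wide, 3 rows high.
print(decode_rle_image("3 3 a01,2,"))  # [[0, 0, 1], [2, 2, 2], [2, 2, 2]]
```

In the sample, the "input" field holds two such images separated by a newline, so each line would be decoded separately. Under these rules every row of both images decodes to 12 pixels and each image has 21 rows, matching the `12 21` header.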

The "benchmark" field is not used when training the LLM. I use it afterwards to identify areas where the model fails to predict the correct output.
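One way to use the "benchmark" field after training is to split it into key/value pairs and count the failures per dataset and group. This is only a sketch of the idea, not the evaluation code from this repo: the helper names `parse_benchmark` and `failures_by_group` and the `predictions` list are hypothetical, and the field is assumed to be a space-separated list of `key=value` items as in the samples above.

```python
from collections import Counter


def parse_benchmark(benchmark: str) -> dict[str, str]:
    """Split 'key=value key=value ...' into a dict."""
    return dict(item.split("=", 1) for item in benchmark.split())


def failures_by_group(records: list[dict], predictions: list[str]) -> Counter:
    """Count wrong predictions per (dataset, group) pair."""
    counter: Counter = Counter()
    for record, predicted in zip(records, predictions):
        if predicted != record["output"]:
            fields = parse_benchmark(record["benchmark"])
            counter[(fields["dataset"], fields["group"])] += 1
    return counter


records = [
    {
        "output": "0,2,3,8",
        "benchmark": "dataset=histogram_one group=unique_colors histogram_size=e",
    },
]
# Made-up model prediction that misses color 8.
print(failures_by_group(records, ["0,2,3"]))
```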

