This is a Draft Specification for a proof of concept for specifying how humans solve problems, restricted at the moment to ARC-AGI-2 puzzles. Implementation to begin soon™. I've got a couple of projects that require less runway. Also, I'd like to have potential solutions to all the major problems that I've identified (current progress: 2/3).
Discord: Join me on a dedicated Discord server to discuss this project.
GitHub: Join me on GitHub to actually begin the process of implementing the puzzle solver and test suite.
Google Docs: If you would like to just quickly view the Specification Document and send me some suggestions, you can do so through our Google Document.
Current artificial intelligence (AI) reasoning systems score low on ARC-AGI-2 because the challenge is specifically designed to expose their weaknesses in novel visual reasoning, compositional reasoning, and contextual rule application, whereas earlier versions of the benchmark were susceptible to brute-force methods and pattern matching from training data. Unlike ARC-AGI-1, every task in the newer benchmark is unique, demanding a deeper, human-like understanding of concepts rather than just recognizing or adapting pre-learned patterns.
Specific challenges of ARC-AGI-2:
- Novelty: Every ARC-AGI-2 task is entirely unique, and models cannot solve them by finding similar patterns in their training data, a weakness that was exploited in ARC-AGI-1.
- Complex composition: Tasks require applying multiple rules simultaneously, and even rules that interact with each other, which is something current systems struggle with.
- Symbolic interpretation: AI models often fail to understand the symbolic meaning behind visual elements, only analyzing superficial patterns, as seen in tasks requiring the interpretation of symbols with semantic significance.
- Contextual application: Systems have difficulty applying rules differently based on the context of the task, instead tending to apply a single rule in a fixed way.
- Inability to adapt: The benchmark is designed to prevent models from overcoming problems with sheer computational power or brute-force approaches, forcing them to solve tasks efficiently and adapt to new situations in a way that mimics human problem-solving strategies.
- Go through a number of the ARC-AGI-2 training set problems & document my personal organic solution process until the solution process can be abstracted into a competitive problem solver.
- Implement the abstracted solution process & test it on the remaining problems & ARC-AGI-2 evaluation set (and other similar significant benchmarks, such as those listed here: https://news.ycombinator.com/item?id=45964748)
There are several different possible ways to contribute:
- The main way to contribute right now is by asking questions or making suggestions regarding documentation improvements. Each question asked provides insight into possible ways to optimize the structure of this documentation for onboarding new contributors, so all questions are welcome.
- Another way to contribute is by creating a solution specification for an additional puzzle. Creating a solution specification requires the heuristics & other concepts described in the solution specification for puzzle 1ae2feb7 and in the Syntax, Glossary, & Heuristics sections below. Once again, all questions related to this task are welcome, as even the simplest questions provide motivating feedback. Any solution specification should start with an identification step, followed by a pattern recognition step, followed by a Significance Hypothesis search.
- A third way to contribute is by examining current open issues & offering possible solutions either by commenting on the issue or opening a pull request.
When trying to pull a stuck sock out of a vacuum hose, I searched for various tools after being unable to think of an accessible version of the first, most obvious tool that came to mind (a hook). After trying other tools (an umbrella to push the sock through the hose, a chopstick-like measuring device, a serrated knife, a two-pronged meat skewer, pliers that proved too wide for the hole, & tweezers that were far too small), I expended more cognitive effort on possible hooks & realized that I had some appropriately shaped dental hygiene tools. (My search heuristics repeatedly swapped between searching specific locations for possible tools & searching possible locations for specific tools.)
- Significance Hypothesis: a hypothesis that a relationship between two entities is significant to solving the puzzle
- ~~~X~~~ represents a meta TODO comment for improving this document
- Pattern Recognition (X): Basically a stubbed tool call for pattern recognition logic that is essential, but beyond me at the moment
- Naivety Check Aspects: number, directionality, color, proximity, preference for relationships not involving the puzzle borders, constants (If not for the guarantee of the puzzle graph itself or the guarantee of the density of the squares, I wonder how abstract we could get? 😂)
- Constant Colors: a constant color is defined by the lack of any change in number or position of squares of the same value between input & output graphs
- Variable Colors: a variable color is every color that is not constant, i.e., every color whose squares change in number or position between input & output graphs
- Inputs: squares of a variable color that exist in the input graph
- Outputs: squares of a variable color that exist in the output graph
- Constants: Inputs that match the general location & shape or relational constraints of a constant color piece in another example ~~~Still needs work~~~
- Experimental Constraint: a relationship that remains true across all examples or all example and test input graphs that may be encoded in our rules, reducing possible abstraction
- Belief: a hypothesis with high significance, which is awarded to hypotheses that satisfy one or more predictions
- Isolate: the predicted result of applying all beliefs to a minimal group of inputs
- Piece: a group of squares of the same color that are near each other, as defined by the relevant starting significance hypothesis
- Search: From simple to more complex. For graph search specifically, top to bottom, left to right
- Relationship Complexity:
- Order from simplest to most complex is aspects of a piece, equation of the aspect of a piece to the same aspect of another piece, etc…
- Determining the order from simplest to most complex will be an ongoing task.
- To avoid combinatorial explosion, search limits must be placed on each level, as well as on the number of levels visited before prior levels are revisited
- Problem Choice:
- When presented with a choice of seemingly isolated problems, start with the problem with the fewest inputs and constants; tie-breakers are number of inputs, number of squares in said inputs, then the position & value of the first problem output diff found by our search heuristic
- My actual organic heuristic is to choose the simplest problem, the one with the fewest factors involved, after identifying likely patterns
- Significance Hypothesis: relationships between squares with the same color are highly significant
- Significance Hypothesis: squares can be grouped with adjacent & diagonally adjacent squares of the same color as a single “piece”
- At some point in a belief search, squares of the same color at greater distances from each other than immediate adjacency can be considered parts of the same piece
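The color, piece, and Problem Choice definitions above are mechanical enough to sketch. The following is a minimal, hypothetical Python sketch, not the project's implementation. It assumes grids are lists of rows of integers (the ARC task JSON convention), that 0 is the background color and is skipped, and that "near" means adjacent or diagonally adjacent, per the starting Significance Hypothesis. `choice_key` and its dictionary fields are invented for illustration and read "fewest inputs and constants" as their sum.

```python
from collections import deque

Grid = list[list[int]]
Square = tuple[int, int]
BACKGROUND = 0  # assumption: 0 (black) is the background and never forms a piece


def color_positions(grid: Grid) -> dict[int, set[Square]]:
    """Map each non-background color to the (row, col) squares it occupies."""
    positions: dict[int, set[Square]] = {}
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value != BACKGROUND:
                positions.setdefault(value, set()).add((r, c))
    return positions


def classify_colors(inp: Grid, out: Grid) -> tuple[set[int], set[int]]:
    """Constant colors keep exactly the same squares from input to output;
    every other color is variable."""
    before, after = color_positions(inp), color_positions(out)
    colors = set(before) | set(after)
    constant = {c for c in colors if before.get(c) == after.get(c)}
    return constant, colors - constant


def group_pieces(grid: Grid, colors: set[int]) -> list[tuple[int, list[Square]]]:
    """Group squares of the given colors into pieces of same-colored squares that
    touch orthogonally or diagonally, scanning top to bottom, left to right."""
    seen: set[Square] = set()
    pieces: list[tuple[int, list[Square]]] = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value not in colors or (r, c) in seen:
                continue
            seen.add((r, c))
            piece: list[Square] = []
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over the 8 neighbours
                y, x = queue.popleft()
                piece.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < len(grid) and 0 <= nx < len(grid[ny])
                                and (ny, nx) not in seen and grid[ny][nx] == value):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            pieces.append((value, piece))
    return pieces


def choice_key(problem: dict) -> tuple:
    """Hypothetical Problem Choice sort key: fewest inputs + constants, then fewest
    inputs, fewest input squares, then the first output diff as (row, col, value)."""
    return (problem["inputs"] + problem["constants"], problem["inputs"],
            problem["input_squares"], problem["first_diff"])
```

Inputs are then `group_pieces(input_grid, variable_colors)`, and for a list of candidate problems `min(problems, key=choice_key)` picks the one to work on first.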
There are at least three serious problems that require solving:
- Puzzle categorization (preferably by a model that can admit ignorance rather than hallucinate. Hierarchical temporal memory might work best, but requires specific training)
- Possible Solution: Monty of the Thousand Brains Project
- Hypothesis generation / search heuristic (Actually might have a solid draft that just needs to be fleshed out: breadth first with revisiting to dig deeper if solution not found at dynamic thresholds)
- Possible Solution: Still looking for potential solutions here
- Hypothesis abstract syntax tree (at the very least, replacing prepositional clauses with symbols)
- Possible Solution: OpenCyc's CycL
- Hypothesis abstraction: The implementation of this will depend on the chosen rule syntax & general implementation details, but will involve the detection & removal of experimental constraints from hypotheses
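The draft search heuristic above (breadth first, revisiting to dig deeper at dynamic thresholds), combined with the per-level limits from Relationship Complexity, could look roughly like the following. This is a hypothetical sketch: `candidates(level)` is assumed to yield the hypotheses of one complexity level from simplest to most complex, `passes` stands in for testing a hypothesis against the examples, and doubling the width while adding one level per round is an arbitrary placeholder for the dynamic thresholds.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional, TypeVar

H = TypeVar("H")


def deepening_search(
    candidates: Callable[[int], Iterable[H]],
    passes: Callable[[H], bool],
    width: int = 3,
    depth: int = 2,
    max_rounds: int = 5,
) -> Optional[H]:
    """Breadth-first search over complexity levels with per-level and per-round limits.

    Each round tests at most `width` untested hypotheses on each of the first
    `depth` levels, simplest level first. If nothing passes, both limits grow and
    the search revisits earlier levels, resuming each level's iterator where it
    stopped so no hypothesis is tested twice.
    """
    streams: dict[int, Iterator[H]] = {}
    for _ in range(max_rounds):
        for level in range(depth):
            if level not in streams:
                streams[level] = iter(candidates(level))
            for hypothesis in islice(streams[level], width):
                if passes(hypothesis):
                    return hypothesis
        # Placeholder "dynamic thresholds": dig twice as wide and one level deeper.
        width, depth = width * 2, depth + 1
    return None
```

Capping both the width per level and the number of levels per round is what keeps simple hypotheses on shallow levels from being starved by a single deep level, at the cost of some repeated bookkeeping per round.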
1 of 120 Public Evaluation Set V2 - 1ae2feb7
- Human Solution: Figure out how the inputs for each line determine the repeating pattern of outputs
- Identify constant colors (Example 1: red & pink, Example 3: red), constants (the red vertical piece in Example 2), inputs (everything left of the constant), & outputs (all the new blocks)
- Test relationships between everything (I mean everything, including the sizes of the example graphs) for significance, starting with the simplest relationships first
- Problem Choice Heuristic: Example 3 is selected for its minimum of 3 inputs and 1 constant
- Pattern Recognition (line): the restriction of same colored variables to a line from the input suggests a line pattern manipulation
- Pattern Recognition (composition): each example contains multiple isolated patterns
- Problem Choice Heuristic: The grey line is selected for its minimum of 1 square
- Problem Considered Solved: The grey line outputs can be predicted by the Line Pattern definition alone
- Problem Choice Heuristic: The blue line is selected for its minimum of 2 squares
- Pattern Recognition (repetition): the fact that the six blue outputs are separated by the same number of empty squares suggests a repeating pattern
- Significance Hypothesis: the number of squares separating the blue output squares is 1; therefore, some equation of relationships between the input & constant pieces in this line likely equals 1.
- Check relationships between constant & input squares, starting with the simplest relationships first
- The length of the blue input (2) - 1 = 1
- Significance Hypothesis: For each line, the length of an input - 1 will equal the number of empty squares separating same colored outputs
- A test of this hypothesis will pass examples 1 & 3, but fail on example 2
- Significance Indicator: Passing two examples suggests that this hypothesis is in the right general ballpark. This hypothesis is now a “belief”.
- Additional Rule: The fact that Example 2 fails (Output state is not completely predicted) means that there is at least one additional rule
- Significance Indicator: Any variables that were unused in the prediction of passing examples are prioritized for determination of additional rules as every variable likely serves a purpose
- Pattern Recognition (line): Each line pattern in Example 2 contains an unused variable
- Significance Hypothesis: An additional rule can be found in the relationship between the unused variable and either the used variable or the constant
- Extrapolation: If the unused variable is significant, then we must identify the confounding relationship by applying all beliefs in as much isolation as each rule allows. Each application is known as an “isolate”.
- Isolate Overlay: After creating isolates, we combine them 1:1 to identify collision patterns and to confirm that an overlay accounts for all output variables
- Significance Indicator: The overlay does account for all output variables
- Pattern Recognition (repetition): For each line, one input value wins each collision
- Determine a hypothesis based on the first line pattern & test that hypothesis by predicting the winning input value of contests in the other line pattern. As always, start the search for acceptable candidates from top to bottom, left to right
- Significance Hypothesis: the input furthest to the right determines the winning input value
- This hypothesis, in combination with the first belief, successfully satisfies Example 2
- Naivety Check: With all examples solved, we examine all example input graphs for experimental constraints (see glossary), prioritizing the Naivety Check Aspects defined elsewhere in this document
- Numbers: There are never more than two inputs
- Directionality: All inputs are to the left of the constant
- Constants: There is only one constant group and only one constant value
- Human Reasoning Fail: While AI should be able to do an exhaustive search of experimental constraints, I definitely did not do so consciously the first time I walked through this puzzle. Instead, I examined the test input graphs for things that surprised me (The fact that I could be surprised suggests I did note some constraints subconsciously, however)
- Abstraction of the second belief:
- Numbers: “inputs are given priority in order of their relative distance from the left border”
- Directionality: “inputs are given priority in order of their relative distance from the closest border”
- Borders: A preference for relationships not involving the puzzle borders requires examining relationships more complex than just those between inputs (we start with the simplest explanation and increase complexity as necessary): “inputs are given priority in order of their relative distance from the constant value”
- Constants: “inputs are given priority in order of how close they are to the nearest constant value (in their line?). Lowest constant value serves as tie breaker for equally near constants”
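The first belief and the collision rule from the 1ae2feb7 walkthrough can be sketched for a single line. This is a minimal sketch under stated assumptions, not the author's implementation: a line is a list of color values, `wall` is the column of the constant square, 0 is assumed to be the empty color, and the first output of each input is assumed to land in the square immediately right of the constant. The function names and the toy rows in the tests are hypothetical, not data from the puzzle.

```python
from typing import Callable

BACKGROUND = 0  # assumption: 0 marks an empty square
Run = tuple[int, int, int]  # (start column, length, color) of one input piece


def input_runs(row: list[int], wall: int) -> list[Run]:
    """Collect same-colored runs (input pieces) left of the constant at column `wall`."""
    runs: list[Run] = []
    col = 0
    while col < wall:
        end = col
        while end < wall and row[end] == row[col]:
            end += 1
        if row[col] != BACKGROUND:
            runs.append((col, end - col, row[col]))
        col = end
    return runs


def isolate(run: Run, wall: int, width: int) -> dict[int, int]:
    """First belief applied to one input alone: outputs separated by (length - 1) empty squares."""
    _, length, color = run
    return {col: color for col in range(wall + 1, width, length)}


def predict_line(row: list[int], wall: int,
                 priority: Callable[[Run], int] = lambda run: run[0]) -> list[int]:
    """Overlay the isolates; the highest-priority input wins each collision.

    The default priority is the start column, i.e. the input furthest to the right wins.
    """
    out = list(row)
    for run in sorted(input_runs(row, wall), key=priority):
        for col, color in isolate(run, wall, len(row)).items():
            out[col] = color  # later (higher-priority) isolates overwrite earlier ones
    return out


def passing(examples: list[tuple[list[int], int, list[int]]]) -> list[int]:
    """Indices of (input row, wall, expected row) examples that the beliefs predict exactly."""
    return [i for i, (row, wall, expected) in enumerate(examples)
            if predict_line(row, wall) == expected]
```

For the toy row `[0, 4, 4, 4, 1, 1, 2, 0, 0, 0, 0, 0, 0]` with the constant at column 6, the length-3 and length-2 isolates both claim column 7, and the rightmost input (color 1) wins it. Passing a different `priority` is one way to try the alternative abstractions of the second belief without touching the first.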
For more details on the solution to any puzzle, check https://human-arc.gptpluspro.com/
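The naivety check from the 1ae2feb7 walkthrough can be sketched in the same spirit. In this hypothetical encoding, a line is `(wall, runs)`, with `wall` the column of the constant square and each run a `(start, length, color)` input piece; the constraint names, scoring functions, and toy lines are illustrative assumptions, not the project's code. Each scoring function is one candidate abstraction of the second belief, and the highest score wins a collision.

```python
Run = tuple[int, int, int]  # (start column, length, color) of one input piece
Line = tuple[int, list[Run]]  # (column of the constant square, input pieces)


def constraints(lines: list[Line]) -> dict[str, bool]:
    """Check candidate experimental constraints across every example line."""
    return {
        "at most two inputs per line": all(len(runs) <= 2 for _, runs in lines),
        "all inputs left of the constant": all(
            start + length <= wall for wall, runs in lines for start, length, _ in runs
        ),
    }


def gap_to_constant(run: Run, wall: int) -> int:
    """Number of squares between an input piece and the constant square."""
    start, length, _ = run
    end = start + length  # one past the piece's last square
    return wall - end if end <= wall else start - (wall + 1)


# Candidate abstractions of the second belief; the highest score wins a collision.
PRIORITIES = {
    "distance from left border": lambda run, wall, width: run[0],
    "distance from closest border": lambda run, wall, width: min(run[0], width - run[0] - run[1]),
    "closeness to the constant": lambda run, wall, width: -gap_to_constant(run, wall),
}


def winners(line: Line, width: int) -> dict[str, Run]:
    """Which input each candidate abstraction says wins a collision on this line."""
    wall, runs = line
    return {name: max(runs, key=lambda run: score(run, wall, width))
            for name, score in PRIORITIES.items()}
```

On lines that satisfy both constraints, the three candidates pick the same winner, so the examples alone cannot tell them apart. A line that places inputs right of the constant does separate them, which is why experimental constraints are checked before choosing an abstraction.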
35 of 400 Public Evaluation Set V1 - 16b78196
3 of 120 Public Evaluation Set V2 - 16b78196
- Human Solution: Recognize that inputs get “plugged” into constant to fill rectangular “limbs”
8 of 120 Public Evaluation Set V2 - 13e47133
- Human Solution: Serious abstraction exercise
13 of 120 Public Evaluation Set V2 - 21897d95
- Human Solution: input & output graphs have inverted sizes
14 of 120 Public Evaluation Set V2 - 221dfab4
- Human Solution: making a trail? Line pattern for sure.
16 of 120 Public Evaluation Set V2 - 269e22fb
- Human Solution: This one is fucking crazy. Got to figure out that the output graph is always the same except for color & orientation, which means comparing examples!
20 of 120 Public Evaluation Set V2 - 2b83f449
- Human Solution: This puzzle contains a chronological element. No idea how to code that! 😂
23 of 120 Public Evaluation Set V2 - 2d0172a1
- Human Solution: Approximate a square? Fuck me.
29 of 120 Public Evaluation Set V2 - 3a25b0d8
- Human Solution: color fill
32 of 120 Public Evaluation Set V2 - 446ef5d2
- Human Solution: the symbol that communicates the outer corner will be interesting to figure out
36 of 120 Public Evaluation Set V2 - 4c416de3
- Human Solution: find the odd thing out
37 of 120 Public Evaluation Set V2 - 4c7dc4dd
- Human Solution: Puzzle #1 looks to be filtering out the noise
38 of 120 Public Evaluation Set V2 - 4e34c42c
- Human Solution: another linking puzzle
40 of 120 Public Evaluation Set V2 - 5545f144
- Human Solution: Here, fishy, fishy, fishy
46 of 120 Public Evaluation Set V2 - 62593bfd
- Human Solution: something about gravity
52 of 120 Public Evaluation Set V2 - 6ffbe589
- Human Solution: each block in the legend causes a 90 degree clockwise rotation for blocks of that color
56 of 120 Public Evaluation Set V2 - 78332cb0
- Human Solution: linking path
57 of 120 Public Evaluation Set V2 - 7b0280bc
- Human Solution: pathfinding puzzle
60 of 120 Public Evaluation Set V2 - 7b80bb43
- Human Solution: fix the broken path
66 of 120 Public Evaluation Set V2 - 88bcf3b4
- Human Solution: whip slap
67 of 120 Public Evaluation Set V2 - 88e364bc
- Human Solution: levers & switches
70 of 120 Public Evaluation Set V2 - 8b7bacbf
- Human Solution: flower the path
79 of 120 Public Evaluation Set V2 - 9bbf930d
- Human Solution: two lines = path
82 of 120 Public Evaluation Set V2 - a32d8b75
- Human Solution: combining four different symbols
94 of 120 Public Evaluation Set V2 - b9e38dc0
- Human Solution: pour out the bag
100 of 120 Public Evaluation Set V2 - d35bdbdc
- Human Solution: only flower the end of the path
108 of 120 Public Evaluation Set V2 - de809cff
- Human Solution: packet loss identification
110 of 120 Public Evaluation Set V2 - e12f9a14
- Human Solution: colliding beams
116 of 120 Public Evaluation Set V2 - eee78d87
- Human Solution: XOR combined with static output
117 of 120 Public Evaluation Set V2 - f560132c
- Human Solution: another color swirl
119 of 120 Public Evaluation Set V2 - faa9f03d
- Human Solution: fix the paths
58 of 400 Public Evaluation Set V1 - 212895b5
- Human Solution: Energy Beams & Lightning
120 of 400 Public Evaluation Set V1 - 4ff4c9da
- Human Solution: either a two-step check of rows & columns for the same shape, or a one-step check of rows, columns, & diagonals
125 of 400 Public Evaluation Set V1 - 50f325b5
- Human Solution: shapes of a specific color spread whenever they fit
192 of 400 Public Evaluation Set V1 - 7d419a02
- Human Solution: Block casts a shadow on other columns
209 of 400 Public Evaluation Set V1 - 8b28cd80
- Human Solution: combine all the examples to determine how the spiral shape has to be applied
281 of 400 Public Evaluation Set V1 - b9630600
- Human Solution: connect the squares
302 of 400 Public Evaluation Set V1 - c6e1b8da
- Human Solution: slide the blocks down the line
381 of 400 Public Evaluation Set V1 - f3b10344
- Human Solution: connect same colored pieces
2 of 120 Public Evaluation Set V2 - 3e6067c3
- Human Solution: Recognize that the outputs link the inputs in the order of the single inputs that are not wrapped in rings of constants (symbol key)
4 of 120 Public Evaluation Set V2 - 142ca369
- Human Solution: I actually had trouble with this puzzle because if the beams grow one square at a time, the examples suggest the beams could deflect off each other and then collide (same block), which would lead to a scenario that isn’t addressed by any of the examples
5 of 120 Public Evaluation Set V2 - 136b0064
- Human Solution: it’s a map puzzle!
6 of 120 Public Evaluation Set V2 - 0934a4d8
- Human Solution: The size of the output graph is highly relevant here
7 of 120 Public Evaluation Set V2 - 135a2760
- Human Solution: simple? Pattern recognition
9 of 120 Public Evaluation Set V2 - 1818057f
- Human Solution: It's all about the plus symbol.
10 of 120 Public Evaluation Set V2 - 195c6913
- Human Solution: another symbolism map
11 of 120 Public Evaluation Set V2 - 20270e3b
- Human Solution: linkage puzzle
12 of 120 Public Evaluation Set V2 - 20a9e565
- Human Solution: output graph size matters again
15 of 120 Public Evaluation Set V2 - 247ef758
- Human Solution: Does it fit?!
17 of 120 Public Evaluation Set V2 - 271d71e2
- Human Solution: Lots of complex relationships in this one
18 of 120 Public Evaluation Set V2 - 28a6681f
- Human Solution: A water/liquid based puzzle
19 of 120 Public Evaluation Set V2 - 291dc1e1
- Human Solution: graph sizes are relevant again
21 of 120 Public Evaluation Set V2 - 2ba387bc
- Human Solution: Seems like a simple? ordering puzzle
22 of 120 Public Evaluation Set V2 - 2c181942
- Human Solution: Seems like a simple? linking puzzle
24 of 120 Public Evaluation Set V2 - 31f7f899
- Human Solution: stacking is simple. Ordering maybe less so.
25 of 120 Public Evaluation Set V2 - 332f06d7
- Human Solution: basic maze puzzle
26 of 120 Public Evaluation Set V2 - 35ab12c3
- Human Solution: basic symbolism puzzle
27 of 120 Public Evaluation Set V2 - 36a08778
- Human Solution: basic liquid puzzle
28 of 120 Public Evaluation Set V2 - 38007db0
- Human Solution: basic liquid puzzle
30 of 120 Public Evaluation Set V2 - 3dc255db
- Human Solution: This puzzle is simple, but I don’t have a simple description for it
31 of 120 Public Evaluation Set V2 - 409aa875
- Human Solution: bullet collisions
33 of 120 Public Evaluation Set V2 - 45a5af55
- Human Solution: lines to square
34 of 120 Public Evaluation Set V2 - 4a21e3da
- Human Solution: shape dissection
35 of 120 Public Evaluation Set V2 - 4c3d4a41
- Human Solution: piece displacement
39 of 120 Public Evaluation Set V2 - 53fb4810
- Human Solution: Beam cannons!!!
41 of 120 Public Evaluation Set V2 - 581f7754
- Human Solution: center line
42 of 120 Public Evaluation Set V2 - 58490d8a
- Human Solution: combining symbols
43 of 120 Public Evaluation Set V2 - 58f5dbd5
- Human Solution: combining symbols
44 of 120 Public Evaluation Set V2 - 5961cc34
- Human Solution: a two-step process
45 of 120 Public Evaluation Set V2 - 5dbc8537
- Human Solution: applying colors
47 of 120 Public Evaluation Set V2 - 64efde09
- Human Solution: combining applied knowledge
48 of 120 Public Evaluation Set V2 - 65b59efc
- Human Solution: tic-tac-toe
49 of 120 Public Evaluation Set V2 - 67e490f4
- Human Solution: pick the right one & insert it
50 of 120 Public Evaluation Set V2 - 6e453dd6
- Human Solution: I love the color scheme for this puzzle
51 of 120 Public Evaluation Set V2 - 6e4f6532
- Human Solution: definitely can expect changing colors between tests & examples
53 of 120 Public Evaluation Set V2 - 71e489b6
- Human Solution: Identify packet loss
54 of 120 Public Evaluation Set V2 - 7491f3cf
- Human Solution: pattern fusion!
55 of 120 Public Evaluation Set V2 - 7666fa5d
- Human Solution: fill in between the lines
58 of 120 Public Evaluation Set V2 - 7b3084d4
- Human Solution: Go Go, Power Rangers!
59 of 120 Public Evaluation Set V2 - 7b5033c1
- Human Solution: Focus on the straightened worm
61 of 120 Public Evaluation Set V2 - 7c66cb00
- Human Solution: breakdown & shelve the parts
62 of 120 Public Evaluation Set V2 - 7ed72f31
- Human Solution: regenerate missing half
63 of 120 Public Evaluation Set V2 - 800d221b
- Human Solution: ant nest puzzle
64 of 120 Public Evaluation Set V2 - 80a900e0
- Human Solution: pull signal from noise
65 of 120 Public Evaluation Set V2 - 8698868d
- Human Solution: holey, holey, holey
68 of 120 Public Evaluation Set V2 - 89565ca0
- Human Solution: one hell of a simplification
69 of 120 Public Evaluation Set V2 - 898e7135
- Human Solution: combine parts, ignore rest
71 of 120 Public Evaluation Set V2 - 8b9c3697
- Human Solution: attach engines?
72 of 120 Public Evaluation Set V2 - 8e5c0c38
- Human Solution: remove asymmetries
73 of 120 Public Evaluation Set V2 - 8f215267
- Human Solution: count the small pieces
74 of 120 Public Evaluation Set V2 - 8f3a5a89
- Human Solution: draw the border
75 of 120 Public Evaluation Set V2 - 9385bd28
- Human Solution: with our powers combined!
76 of 120 Public Evaluation Set V2 - 97d7923e
- Human Solution: pick sticks by length
77 of 120 Public Evaluation Set V2 - 981571dc
- Human Solution: fill in the missing blocks through symmetry
78 of 120 Public Evaluation Set V2 - 9aaea919
- Human Solution: stacking pots?
80 of 120 Public Evaluation Set V2 - a251c730
- Human Solution: why the fields of flowers?
81 of 120 Public Evaluation Set V2 - a25697e4
- Human Solution: sticking the right end in the hole
83 of 120 Public Evaluation Set V2 - a395ee82
- Human Solution: clone a template
84 of 120 Public Evaluation Set V2 - a47bf94d
- Human Solution: separating interlocked, same value pieces
85 of 120 Public Evaluation Set V2 - a6f40cea
- Human Solution: predict what is hidden
86 of 120 Public Evaluation Set V2 - aa4ec2a5
- Human Solution: make a face!
87 of 120 Public Evaluation Set V2 - abc82100
- Human Solution: encoded templates
88 of 120 Public Evaluation Set V2 - b0039139
- Human Solution: Template, quantity, template color, background color
89 of 120 Public Evaluation Set V2 - b10624e5
- Human Solution: generalize features
90 of 120 Public Evaluation Set V2 - b5ca7ac4
- Human Solution: separate by category
91 of 120 Public Evaluation Set V2 - b6f77b65
- Human Solution: removing colors
92 of 120 Public Evaluation Set V2 - 16de56c4
- Human Solution: basic line patterns
93 of 120 Public Evaluation Set V2 - b99e7126
- Human Solution: blanket pattern?
95 of 120 Public Evaluation Set V2 - bf45cf4b
- Human Solution: clone the template according to the symbol
96 of 120 Public Evaluation Set V2 - c4d067a0
- Human Solution: extending a template according to symbols
97 of 120 Public Evaluation Set V2 - c7f57c3e
- Human Solution: abstract templates (ignore size) or link pieces by color
98 of 120 Public Evaluation Set V2 - cb2d8a2c
- Human Solution: decoding traffic signal
99 of 120 Public Evaluation Set V2 - cbebaa4b
- Human Solution: linking puzzle
101 of 120 Public Evaluation Set V2 - d59b0160
- Human Solution: filter sets by key
102 of 120 Public Evaluation Set V2 - d8e07eb2
- Human Solution: areSymbolsInLine?
103 of 120 Public Evaluation Set V2 - da515329
- Human Solution: the cross bends back until it reaches one block from another source branch. Requires growing cross beyond grid boundaries & then cropping. Reverse rotation a possible solution, but never shown in examples.
104 of 120 Public Evaluation Set V2 - db0c5428
- Human Solution: inverse explosion
105 of 120 Public Evaluation Set V2 - db695cfb
- Human Solution: crossing lines
106 of 120 Public Evaluation Set V2 - dbff022c
- Human Solution: Solve the color swirl
107 of 120 Public Evaluation Set V2 - dd6b8c4b
- Human Solution: prioritization
109 of 120 Public Evaluation Set V2 - dfadab01
- Human Solution: looks like the real test intent here is avoiding contamination from the output pieces that are in the input graph
111 of 120 Public Evaluation Set V2 - e3721c99
- Human Solution: color by hole
112 of 120 Public Evaluation Set V2 - e376de54
- Human Solution: average length
113 of 120 Public Evaluation Set V2 - e8686506
- Human Solution: another combination puzzle
114 of 120 Public Evaluation Set V2 - e87109e9
- Human Solution: symbols to directions
115 of 120 Public Evaluation Set V2 - edb79dae
- Human Solution: shape & color from different symbols
118 of 120 Public Evaluation Set V2 - f931b4a8
- Human Solution: grid size & symbols
120 of 120 Public Evaluation Set V2 - fc7cae8d
- Human Solution: rotate or flip according to color