Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
51 commits
Select commit Hold shift + click to select a range
a6ec7c5
Introduction of translation
JulesViennotFranca Dec 17, 2024
9de6f21
Cleaning
JulesViennotFranca Dec 17, 2024
e40d15e
Some explanations and instructions in the README.md
JulesViennotFranca Dec 17, 2024
58d406d
Removing the miniF2F folder to replace it by a smaller version
JulesViennotFranca Dec 17, 2024
b1e7fc4
Extracting the resulting theorem in Coq
JulesViennotFranca Dec 18, 2024
1a99652
Renaming
JulesViennotFranca Dec 18, 2024
c8a2c5b
Using petanque in the translation
JulesViennotFranca Dec 18, 2024
68dd401
Better prompt and adding historic of previous translations
JulesViennotFranca Dec 19, 2024
252fb83
Finishing adapting translation to search mechanism and petanque
JulesViennotFranca Dec 19, 2024
e114d17
Some fixing
JulesViennotFranca Dec 19, 2024
c846365
Adding _CoqProject to the .gitignore
JulesViennotFranca Dec 19, 2024
e31d304
Remove useless last agent query in naive search
JulesViennotFranca Jan 15, 2025
8f2dc59
Merge branch 'LLM4Coq:main' into main
JulesViennotFranca Jan 15, 2025
2bdce02
New agent config
JulesViennotFranca Jan 14, 2025
d88f7c6
Support for human agent and anthropic agents
JulesViennotFranca Jan 14, 2025
c719b98
Adding human interactions
JulesViennotFranca Jan 14, 2025
2312b56
Using get state and run tac for translation
JulesViennotFranca Jan 14, 2025
ee005c7
Improving prompt: hint for complex numbers
JulesViennotFranca Jan 15, 2025
ebe7311
Loc of the result file
JulesViennotFranca Jan 23, 2025
ba3b1fb
More agent configs
JulesViennotFranca Jan 23, 2025
81d4de7
Benchmark added
JulesViennotFranca Jan 23, 2025
700b38b
new prompts
mlelarge Dec 17, 2024
317925e
new prompts
mlelarge Dec 17, 2024
8921fc1
anthropic
mlelarge Dec 17, 2024
b4cebce
added log_dir
mlelarge Dec 17, 2024
8145033
context + replay bench
mlelarge Dec 19, 2024
8fd8d0e
context + replay bench
mlelarge Dec 19, 2024
894792f
delete log file in init + remove env with too many holes in bs
mlelarge Dec 20, 2024
68caaef
remove pb ths
mlelarge Dec 20, 2024
7450bcc
bug search
mlelarge Dec 20, 2024
a0fe8d4
temp cut search too many holes
mlelarge Dec 23, 2024
997f06a
ghost weave
mlelarge Dec 23, 2024
f7476a7
ascii
mlelarge Jan 8, 2025
f8b5745
Add anthropic and mistral deps
gbdrt Jan 15, 2025
17f3f1f
timeout + utf-8
mlelarge Jan 19, 2025
d98815f
Remove useless last agent query in naive search
JulesViennotFranca Jan 15, 2025
f023738
Adding configs
JulesViennotFranca Jan 16, 2025
43514af
Fixing parenthesis issue
JulesViennotFranca Jan 20, 2025
9f1ec41
Removing my_config
JulesViennotFranca Jan 20, 2025
1b90eb3
Improve template useless steps bound
JulesViennotFranca Jan 20, 2025
30f3185
Updating to upstream
JulesViennotFranca Jan 23, 2025
432216e
Better pattern matching
JulesViennotFranca Jan 23, 2025
88b97fe
Improving prompt
JulesViennotFranca Jan 28, 2025
7fc5faa
Simplifying writing the solution in a file
JulesViennotFranca Jan 28, 2025
7863177
Improve the handeling of complex numbers in the prompt
JulesViennotFranca Jan 29, 2025
e99e108
Support for supervision
JulesViennotFranca Jan 29, 2025
2e40581
Better handeling of recurrent errors
JulesViennotFranca Jan 30, 2025
0806997
Merge on branch 'upstream/main'
JulesViennotFranca Feb 7, 2025
78a671a
Prepare to pull request
JulesViennotFranca Feb 7, 2025
e51fc11
Merge on branch 'upstream/main'
JulesViennotFranca Feb 7, 2025
1106153
Ready for pull request
JulesViennotFranca Feb 7, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,8 +1,12 @@
.vscode
.envrc
.python-version
miniF2F
examples/small_miniF2F/coq
__pycache__
poetry.lock
outputs
logs
*.jsonl
*.jsonl
.DS_Store
_CoqProject
53 changes: 49 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
# NLIR: Natural Language Intermediate Representation for Mechanized Theorem Proving

NLIR leverage LLMs natural language reasoning ability for theorem proving with the Rocq interactive theorem prover (ITP).
We propose two interactive proof protocols both leveraging natural language reasoning:
We propose two interactive proof protocols both leveraging natural language reasoning:

- Tactic-by-tactic proof construction mimics the typical behavior of a standard Coq user: given the current goals, the agent generates one or several tactics that updates the goals and repeats this process until the proof is complete.
- Hierarchical proof templating tries to generate full proofs directly. Failed tactics are then replaced with holes to obtain a proof template. The agent repeats the process of filling each hole until the proof is complete.
- Tactic-by-tactic proof construction mimics the typical behavior of a standard Coq user: given the current goals, the agent generates one or several tactics that updates the goals and repeats this process until the proof is complete.
- Hierarchical proof templating tries to generate full proofs directly. Failed tactics are then replaced with holes to obtain a proof template. The agent repeats the process of filling each hole until the proof is complete.

Our approach’s originality is that although both protocols’ inputs (goals) and outputs (tactics) are Coq code, the agent internally uses natural language as an intermediate representation to analyze the input and guide the code generation.
We couple both protocols with standard search algorithms leveraging feedback from the ITP and using natural language to rerank proof candidates.
Expand Down Expand Up @@ -39,6 +39,7 @@ export OPENAI_PROJECT="your project id"
The default configuration can be found in `conf/config.yaml`.
You can override every field (see below).
E.g., to try the tactics agent without beam search on theorem `foo` defined in `examples/foo.v`:

```
$ python nlir-cli.py workspace=examples file=foo.v theorem=foo search.kind=tactics search.mode=naive
```
Expand All @@ -50,7 +51,7 @@ You should see each iteration of the proof in stdout.
We use [hydra](https://hydra.cc/docs/intro/) to manage the configurations.

```bash
$ python nlir-cli.py --help
$ python nlir-cli.py --help
nlri-cli is powered by Hydra.

There are two possible modes:
Expand Down Expand Up @@ -123,6 +124,50 @@ $ python nlir-cli.py benchmark=example

The conversation logs and the configuration will be stored in `./outputs/date/time/`

### Translating

This respository also provides a translation mechanism. Given a natural language description of a theorem along with its Lean and Isabelle codes, the agent will try to translate the theorem in Rocq.

This add-on main purpose is to translate the set of theorems *miniF2F* found [here](https://github.com/facebookresearch/miniF2F/tree/main). So if used on another set of theorems, the set should have the same structure as the [small_miniF2F](./examples/small_miniF2F/) folder.

There are two modes for this program, one can translate a single theorem by specifying its name or translate the whole set if no theorem is specified.

Additionally, you can also supervise the model by giving him remarks or hints after each error.

The default configuration can be found in `conf/translate_config.yaml`. Every field can be overriden using the [hydra](https://hydra.cc/docs/intro/) syntax:

```bash
$ python translate-cli.py --help
translate-cli is powered by Hydra.

There are two possible modes:
- Use the option `theorem=my_theorem` to translate one theorem.
- With no theorem specified, all theorems are translated.

To enable supervision of the model, use the option `supervise=true`.

Alternatively you can use your own config file with the option `--config-name my_translate_config.yaml`.
Config files should be in the `conf` directory.

== Config ==
Override anything in the config (foo.bar=value)

workspace: examples/small_miniF2F # Path to the set of theorems
theorem: null # Name of the single theorem to be translated
supervise: false # Supervised translation or not
log_dir: "logs/translate" # Directory to store the log files

agent:
model_id: qwencode7b # LLM id
temperature: 1.0 # Temperature used
local: true # Is the model local or not
provider: ollama # Provider of the model


Powered by Hydra (https://hydra.cc)
Use --hydra-help to view Hydra specific help
```

### Using custom config files

You can also try one of the pre-defined benchmark configuration files (or write your own), and override some parameters, e.g.:
Expand Down
51 changes: 0 additions & 51 deletions conf/myconfig.yaml

This file was deleted.

25 changes: 9 additions & 16 deletions conf/my_config.yaml → conf/translate_config.yaml
Original file line number Diff line number Diff line change
@@ -1,40 +1,33 @@
workspace: "../my_nlir/fminiF2F/coq/"
file: null
workspace: "examples/small_miniF2F"
theorem: null
replay: false
benchmark: null
start_theorem: 0
end_theorem: 300
num_theorems: 300
log_dir: "logs"
supervise: false
log_dir: "logs/translate"
weave: false

petanque:
address: 127.0.0.1
port: 8765
timeout: 10
run_opts: null
context: true

search:
kind: template
mode: naive
max_steps: 10
beam_size: 3
n_responses: 4

hydra:
help:
app_name: nlri-cli
app_name: translate-cli

template: |
${hydra.help.header}
There are two possible modes:
- Use options `file=my_file.v` and `theorem=my_thm` to prove one theorem.
- Use option `benchmark=my_bench.yaml` to test a full benchmark.
- Use options `theorem=my_thm` to translate one theorem.
- With no theorem specified, all theorems are translated.

Alternatively you can use your own config file with the option `--config-name myconf.yaml`.
Config files should be in the `conf` directory.
Config files should be in the `conf/translate` directory.

== Config ==
Override anything in the config (foo.bar=value)
Expand All @@ -43,9 +36,9 @@ hydra:

${hydra.help.footer}

# @package _global_
defaults:
- _self_
- agent: gpt-mini
- benchmark: null
- agent: qwencode7b
- override hydra/job_logging: none
- override hydra/hydra_logging: none
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"problem_name": "mathd_algebra_478", "informal_statement": "The volume of a cone is given by the formula $V = \\frac{1}{3}Bh$, where $B$ is the area of the base and $h$ is the height. The area of the base of a cone is 30 square units, and its height is 6.5 units. What is the number of cubic units in its volume? Show that it is 65.", "informal_proof": "We are given that $B = 30$ and $h = 6.5$ and asked to find $\\frac{1}{3}Bh$. We find that \\[\\frac{1}{3}Bh = \\frac{1}{3}(30)(6.5) = (10)(6.5) = 65.\\]"}
1 change: 1 addition & 0 deletions examples/small_miniF2F/informal/valid/amc12a_2015_p10.json
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"problem_name": "amc12a_2015_p10", "informal_statement": "Integers $x$ and $y$ with $x>y>0$ satisfy $x+y+xy=80$. What is $x$?\n\n$ \\textbf{(A)}\\ 8 \\qquad\\textbf{(B)}\\ 10 \\qquad\\textbf{(C)}\\ 15 \\qquad\\textbf{(D)}\\ 18 \\qquad\\textbf{(E)}\\ 26$ Show that it is \\textbf{(E)}\\ 26.", "informal_proof": "Use [[SFFT]] to get $(x+1)(y+1)=81$. The terms $(x+1)$ and $(y+1)$ must be factors of $81$, which include $1, 3, 9, 27, 81$. Because $x > y$, $x+1$ is equal to $27$ or $81$. But if $x+1=81$, then $y=0$ and so $x=\\textbf{(E)}\\ 26$."}
17 changes: 17 additions & 0 deletions examples/small_miniF2F/isabelle/test/mathd_algebra_478.thy
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
(*
Authors: Wenda Li
*)

theory mathd_algebra_478 imports Complex_Main
begin

theorem mathd_algebra_478:
fixes b h v ::real
assumes "0 < b \<and> 0 < h \<and> 0 < v"
and "v = 1 / 3 * (b * h)"
and "b = 30"
and "h = 13 / 2"
shows "v = 65"
using assms by auto

end
18 changes: 18 additions & 0 deletions examples/small_miniF2F/isabelle/valid/amc12a_2015_p10.thy
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
(*
Authors: Albert Qiaochu Jiang
*)


theory amc12a_2015_p10 imports
Complex_Main
begin

theorem amc12a_2015_p10:
fixes x y:: nat
assumes h0: "0<y"
and h1: "y<x"
and h2: "x+y + (x*y) = 80"
shows "x=26"
sorry

end
87 changes: 87 additions & 0 deletions examples/small_miniF2F/lean/src/minif2f_import.lean
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
import algebra.algebra.basic
import algebra.order.floor
import algebra.associated
import algebra.big_operators.basic
import algebra.big_operators.enat
import algebra.big_operators.order
import algebra.big_operators.pi
import algebra.geom_sum
import algebra.group.pi
import algebra.group.commute
import algebra.group_power.basic
import algebra.group_power.identities
import algebra.order.floor
import algebra.quadratic_discriminant
import algebra.ring.basic
import analysis.asymptotics.asymptotic_equivalent
import analysis.mean_inequalities
import analysis.normed_space.basic
import analysis.inner_product_space.basic
import analysis.inner_product_space.euclidean_dist
import analysis.normed_space.pi_Lp
import analysis.special_functions.exp
import analysis.special_functions.exp_deriv
import analysis.special_functions.log
import analysis.special_functions.logb
import analysis.special_functions.log_deriv
import analysis.special_functions.pow
import analysis.special_functions.sqrt
import analysis.special_functions.trigonometric.basic
import analysis.special_functions.trigonometric.complex
import combinatorics.simple_graph.basic
import data.complex.basic
import data.complex.exponential
import data.finset.basic
import data.fintype.card
import data.int.basic
import data.int.gcd
import data.int.modeq
import data.int.parity
import data.list.intervals
import data.list.palindrome
import data.multiset.basic
import data.nat.basic
import data.nat.choose.basic
import data.nat.digits
import data.nat.factorial.basic
import data.nat.fib
import data.nat.modeq
import data.nat.multiplicity
import data.nat.parity
import data.nat.prime
import data.pnat.basic
import data.pnat.prime
import data.polynomial
import data.polynomial.basic
import data.polynomial.eval
import data.rat.basic
import data.real.basic
import data.real.ennreal
import data.real.irrational
import data.real.nnreal
import data.real.sqrt
import data.real.golden_ratio
import data.set.finite
import data.sym.sym2
import data.zmod.basic
import dynamics.fixed_points.basic
import field_theory.finite.basic
import geometry.euclidean.basic
import geometry.euclidean.circumcenter
import geometry.euclidean.monge_point
import geometry.euclidean.sphere
import init.data.nat.gcd
import linear_algebra.affine_space.affine_map
import linear_algebra.affine_space.independent
import linear_algebra.affine_space.ordered
import linear_algebra.finite_dimensional
import logic.equiv.basic
import measure_theory.integral.interval_integral
import number_theory.arithmetic_function
import number_theory.legendre_symbol.quadratic_reciprocity
import number_theory.primes_congruent_one
import order.bounds
import order.filter.basic
import order.well_founded
import topology.basic
import topology.instances.nnreal
24 changes: 24 additions & 0 deletions examples/small_miniF2F/lean/src/test.lean
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
/-
Copyright (c) 2021 OpenAI. All rights reserved.
Released under Apache 2.0 license as described in the file LICENSE.
Authors: Kunhao Zheng, Kudzo Ahegbebu, Stanislas Polu, David Renshaw, OpenAI GPT-f
-/
import minif2f_import

open_locale big_operators
open_locale nat
open_locale real
open_locale rat

theorem mathd_algebra_478
(b h v : ℝ)
(h₀ : 0 < b ∧ 0 < h ∧ 0 < v)
(h₁ : v = 1 / 3 * (b * h))
(h₂ : b = 30)
(h₃ : h = 13 / 2) :
v = 65 :=
begin
rw [h₂, h₃] at h₁,
rw h₁,
norm_num,
end
21 changes: 21 additions & 0 deletions examples/small_miniF2F/lean/src/valid.lean
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
/-
Copyright (c) 2021 OpenAI. All rights reserved.
Released under Apache 2.0 license as described in the file LICENSE.
Authors: Kunhao Zheng, Stanislas Polu, David Renshaw, OpenAI GPT-f
-/
import minif2f_import

open_locale big_operators
open_locale real
open_locale nat
open_locale topological_space

theorem amc12a_2015_p10
(x y : ℤ)
(h₀ : 0 < y)
(h₁ : y < x)
(h₂ : x + y + (x * y) = 80) :
x = 26 :=
begin
sorry
end
1 change: 1 addition & 0 deletions nlir-cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -157,6 +157,7 @@ def main(cfg: DictConfig):
results["success"].append(status.success)
results["steps"].append(status.steps)
with open(res_path, "w") as rf:
print(f"Result writen in {res_path}")
json.dump(results, rf, indent=2)

print(f"\n\n--- Summary ---")
Expand Down
Loading