WIP: Update Java bindings and add cstest_java implementation#2904
WIP: Update Java bindings and add cstest_java implementation#2904peace-maker wants to merge 4 commits into
Conversation
Bring the existing bindings in sync with the C implementation and add all other architectures that got added to Capstone over the years. Java doesn't support unsigned integers, so some values might be represented as a negative number when it is supposed to be unsigned. Some more thought is required to work around this. There is no compatibility layer for the arm64 -> aarch64 rename yet.
Parses the yaml test cases and compares the disassembly the same as cstool does. Copilot helped with the busy work.
The cstest Java implementation was a bit stricter about lists in the yaml and uncovered some bad tests.
Rot127
left a comment
There was a problem hiding this comment.
How important is backwards compatibility?
At this point not important anymore I think. The Java bindings were not updated for a long time. And are possibly no longer in use.
But if we update them now, this API will be the new base line and should not change afterwards. So we need to make those decisions now and make them properly.
I wouldn't want to have people change their API again, just because we added the generator.
API stability is one of the things we wanted to annoy people only once with v6 (because if they switch from v4->v6 there are way more breaking changes anyways, unrelated to Auto-Sync).
Maybe use JNA's IntegerType for unsigned types.
I think it depends how available this specific dependency is on embedded systems.
Currently we don't have diet mode, but in theory at least Capstone should be able to run on a embedded device with a very minimal JVM. Apparently, JavaCard only supports the native types, not even Integer. So I think having IntegerType might be too much.
But maybe my whole "Capstone on a minimal JVM" is not good to begin with.
But we should ask someone with Java knowledge. Better not free style that one.
This was just an exercise in burning tokens and maybe doing something useful on the way.
How much of the changes here are AI generated?
Generally it is fine to use AI. But you absolutely must ensure that the testing code of the java bindings is rock solid. No compromises there.
Same for the API. It must be properly done. For the stuff in between we can then get a test coverage of close to 100%.
The tester requires Java 21 and having capstone built/installed.
Why is Java 21 needed?
Do you think there is a way to also support old Java versions?
Having support for old systems is one of the features of Capstone.
(also related to the embedded argument from above).
@Skyerhw @pabx06 @ByThePowerOfScience @PAX523 I hope Github notifies you. Would you mind giving your feedback here?
| @@ -0,0 +1,182 @@ | |||
| // Copyright © 2025 Peace-Maker <[email protected]> | |||
There was a problem hiding this comment.
Please use SPDX style copyright header
|
Will take a more closer look in the coming days, but regarding those ones:
Shouldn't be too complicated. And it is a nice feature to make the dev headache less intense.
Yes, please. |
We are already using the Java Native Access library for easy FFI, the old bindings used that library too. They claim Java 1.4 compatibility. So using their abstraction for unsigned types doesn't cause breakage on embedded systems I'd assume.
I've pasted the python code in the file and told the AI to transpile it to Java. That didn't work 100% so I had to review all of the code manually, thus I'm confident the end result is good. I cross referenced the cstest C implementation too and even noticed some discrepancies there. I didn't want to include those changes in this branch since they might require separate discussion and can be merged independently. But the test fixes are the same territory. diff --git a/bindings/python/capstone/sparc.py b/bindings/python/capstone/sparc.py
index 3ccbcf68..15cf1928 100644
--- a/bindings/python/capstone/sparc.py
+++ b/bindings/python/capstone/sparc.py
@@ -56,7 +56,7 @@ class CsSparc(ctypes.Structure):
('hint', ctypes.c_uint),
('format', ctypes.c_uint),
('op_count', ctypes.c_uint8),
- ('operands', SparcOp * 4),
+ ('operands', SparcOp * 6),
)
def get_arch_info(a):
diff --git a/bindings/python/cstest_py/src/cstest_py/details.py b/bindings/python/cstest_py/src/cstest_py/details.py
index 34a04a9b..79353318 100644
--- a/bindings/python/cstest_py/src/cstest_py/details.py
+++ b/bindings/python/cstest_py/src/cstest_py/details.py
@@ -1348,7 +1348,7 @@ def test_expected_riscv(actual: CsInsn, expected: dict) -> bool:
if not compare_int64(aop.mem.disp, eop.get("mem_disp"), "mem_disp"):
return False
elif aop.type == RISCV_OP_FP:
- if not compare_fp(aop.dimm, eop.get("dimm"), "dimm"):
+ if not compare_dp(aop.dimm, eop.get("dimm"), "dimm"):
return False
elif aop.type == RISCV_OP_CSR:
if not compare_uint16(aop.csr, capstone.riscv.SYSREG_NAME_TO_VAL[eop.get("csr")], "csr"):
@@ -1399,6 +1399,10 @@ def test_expected_SystemZ(actual: CsInsn, expected: dict) -> bool:
len(actual.operands), len(expected.get("operands")), "operands_count"
):
return False
+ elif not compare_enum(
+ actual.cc, expected.get("cc"), "cc"
+ ):
+ return False
elif not compare_enum(
actual.format, expected.get("format"), "format"
):
@@ -1416,7 +1420,7 @@ def test_expected_SystemZ(actual: CsInsn, expected: dict) -> bool:
elif aop.type == SYSTEMZ_OP_IMM:
if not compare_int64(aop.imm, eop.get("imm"), "imm"):
return False
- if not compare_int64(aop.imm_width, eop.get("imm_width"), "imm_width"):
+ if not compare_uint8(aop.imm_width, eop.get("imm_width"), "imm_width"):
return False
elif aop.type == SYSTEMZ_OP_MEM:
if not compare_enum(aop.mem.am, eop.get("mem_am"), "mem_am"):
@@ -1581,7 +1585,7 @@ def test_expected_arc(actual: CsInsn, expected: dict) -> bool:
if not compare_reg(actual, aop.reg, eop.get("reg"), "reg"):
return False
elif aop.type == ARC_OP_IMM:
- if not compare_int32(aop.imm, eop.get("imm"), "imm"):
+ if not compare_int64(aop.imm, eop.get("imm"), "imm"):
return False
else:
raise ValueError("Operand type not handled.")
diff --git a/suite/cstest/include/test_detail_systemz.h b/suite/cstest/include/test_detail_systemz.h
index 13f4f3ad..792d8f4e 100644
--- a/suite/cstest/include/test_detail_systemz.h
+++ b/suite/cstest/include/test_detail_systemz.h
@@ -55,12 +55,15 @@ static const cyaml_schema_value_t test_detail_systemz_op_schema = {
};
typedef struct {
+ char *cc;
char *format;
TestDetailSystemZOp **operands;
uint32_t operands_count;
} TestDetailSystemZ;
static const cyaml_schema_field_t test_detail_systemz_mapping_schema[] = {
+ CYAML_FIELD_STRING_PTR("cc", CYAML_FLAG_POINTER | CYAML_FLAG_OPTIONAL,
+ TestDetailSystemZ, cc, 0, CYAML_UNLIMITED),
CYAML_FIELD_STRING_PTR("format",
CYAML_FLAG_POINTER | CYAML_FLAG_OPTIONAL,
TestDetailSystemZ, format, 0, CYAML_UNLIMITED),
diff --git a/suite/cstest/src/test_detail.c b/suite/cstest/src/test_detail.c
index beba988e..eb811b1b 100644
--- a/suite/cstest/src/test_detail.c
+++ b/suite/cstest/src/test_detail.c
@@ -259,22 +259,18 @@ static bool test_reg_rw_access(csh *handle, const cs_insn *insn,
return false;
}
- if (expected->regs_read_count > 0) {
- compare_uint32_ret(regs_read_count, expected->regs_read_count,
- false);
- for (size_t i = 0; i < regs_read_count; ++i) {
- compare_reg_ret(*handle, regs_read[i],
- expected->regs_read[i], false);
- }
+ compare_uint32_ret(regs_read_count, expected->regs_read_count,
+ false);
+ for (size_t i = 0; i < regs_read_count; ++i) {
+ compare_reg_ret(*handle, regs_read[i],
+ expected->regs_read[i], false);
}
- if (expected->regs_write_count > 0) {
- compare_uint32_ret(regs_write_count, expected->regs_write_count,
- false);
- for (size_t i = 0; i < regs_write_count; ++i) {
- compare_reg_ret(*handle, regs_write[i],
- expected->regs_write[i], false);
- }
+ compare_uint32_ret(regs_write_count, expected->regs_write_count,
+ false);
+ for (size_t i = 0; i < regs_write_count; ++i) {
+ compare_reg_ret(*handle, regs_write[i],
+ expected->regs_write[i], false);
}
return true;
}
diff --git a/suite/cstest/src/test_detail_systemz.c b/suite/cstest/src/test_detail_systemz.c
index 578531ca..85777bdb 100644
--- a/suite/cstest/src/test_detail_systemz.c
+++ b/suite/cstest/src/test_detail_systemz.c
@@ -85,6 +85,7 @@ bool test_expected_systemz(csh *handle, const cs_systemz *actual,
{
assert(handle && actual && expected);
+ compare_enum_ret(actual->cc, expected->cc, false);
compare_enum_ret(actual->format, expected->format, false);
compare_uint8_ret(actual->op_count, expected->operands_count, false);
for (size_t i = 0; i < actual->op_count; ++i) {
Only the cstest_java project requires Java 21 since that's been the default when generating a new Gradle project. I don't think that's important though since the actual capstone bindings are still Java 1.4 I guess. |
|
Nice catch with the test tools! |
There was a problem hiding this comment.
Looks really good.
One thing which is missing are the docs. Once in the bindings dir itself, and once in the v6 release guide.
I couldn't check yet the binding files but checked all the test code.
So pretty confident it works.
We will need some testing anyways from users. Its too big to catch all in one PR.
Thanks a lot for the initiative! Really appreciate it!
|
|
||
| # - name: Setup MSVC | ||
| # if: runner.os == 'Windows' | ||
| # uses: ilammy/msvc-dev-cmd@v1 |
There was a problem hiding this comment.
| # - name: Setup MSVC | |
| # if: runner.os == 'Windows' | |
| # uses: ilammy/msvc-dev-cmd@v1 |
|
|
||
| // String repoRoot = getRepoRoot(); | ||
| // if (repoRoot != null) { | ||
| // options.addOption(Option.builder() | ||
| // .desc("Directory to search for .yaml test files.") | ||
| // .hasArg() | ||
| // .argName("search_dir") | ||
| // .get()); | ||
| // } else { | ||
| // options.addOption(Option.builder() | ||
| // .desc("Directory to search for .yaml test files.") | ||
| // .hasArg() | ||
| // .argName("search_dir") | ||
| // .required() | ||
| // .get()); | ||
| // } |
| } | ||
| return cmd; | ||
| } | ||
| public static void main(String[] args) { |
|
|
||
| for (String flag : expected) { | ||
| long enumVal = cs_const_getattr(flag); | ||
| if ((actual & enumVal) == 0L) { |
There was a problem hiding this comment.
| if ((actual & enumVal) == 0L) { | |
| if ((actual & enumVal) != enumVal) { |
Just in case we have enum values with multiple bits set, but actual is 1+ missing.
| if (address instanceof Integer) { | ||
| this.address = (Integer) address; | ||
| } else if (address instanceof Long) { | ||
| this.address = (Long) address; |
There was a problem hiding this comment.
Shouldn't we always cast to Long? Above the field is also defined as primitive long
| if (expected.containsKey("x86")) { | ||
| return X86Details.testExpected(insn, (Map<String, Object>)expected.get("x86")); | ||
| } | ||
| if (expected.containsKey("aarch64")) { | ||
| return Aarch64Details.testExpected(insn, (Map<String, Object>)expected.get("aarch64")); | ||
| } | ||
| if (expected.containsKey("ppc")) { | ||
| return PpcDetails.testExpected(insn, (Map<String, Object>)expected.get("ppc")); | ||
| } | ||
| if (expected.containsKey("arm")) { | ||
| return ArmDetails.testExpected(insn, (Map<String, Object>)expected.get("arm")); | ||
| } | ||
| if (expected.containsKey("m680x")) { | ||
| return M680xDetails.testExpected(insn, (Map<String, Object>)expected.get("m680x")); | ||
| } | ||
| if (expected.containsKey("sparc")) { | ||
| return SparcDetails.testExpected(insn, (Map<String, Object>)expected.get("sparc")); | ||
| } | ||
| if (expected.containsKey("tricore")) { |
| } | ||
|
|
||
| for (int i = 0; i < insn.groups.length; i++) { | ||
| int group = insn.groups[i] & 0xff; |
There was a problem hiding this comment.
Move the 0xff to a constant with name and docs (only the lower 8bits are general groups. Above 8bits we have arch specific groups.). Otherwise it is hard to tell what it does.
| Ppc.Operand i = (Ppc.Operand) operands.op[c]; | ||
| if (operands.operands.length != 0) { | ||
| System.out.printf("\top_count: %d\n", operands.operands.length); | ||
| for (int c=0; c<operands.operands.length; c++) { |
There was a problem hiding this comment.
What is the default linting for Java?
We should probably add it. Or at least run it once here.
| // if (operands.writeback) | ||
| // System.out.println("\tWrite-back: True"); |
There was a problem hiding this comment.
| // if (operands.writeback) | |
| // System.out.println("\tWrite-back: True"); |
There was a problem hiding this comment.
How did you generate this one? I remember the const_generator was broken for RISCV.
Your checklist for this pull request
Detailed description
This brings the existing Java bindings in sync with the C implementation and adds all other architectures that got added to Capstone over the years. I've started
generatingwriting the cstest port last year for fun and I'm not invested in Java at all. This was just an exercise in burning tokens and maybe doing something useful on the way. The cstest_java implementation is based on the Python version cstest_py.@Rot127 mentioned several times, that the golden future would be to write a binding generator (#2468) which keeps the C implementation in sync with the bindings. This PR helps by having a known good baseline to compare against when someone actually starts writing such a generator. But it adds another binding that gets out of sync before the generator exists too. :/
There are some implementation problems where feedback by other developers actually using Capstone in their Java projects is useful. Java doesn't support unsigned integers, so some values might be represented as a negative number when it is supposed to be unsigned. Maybe use JNA's
IntegerTypefor unsigned types. How important is backwards compatibility?This is work in progress because I'd like some feedback about handling unsigned integers and packaging. The binding itself is fully up-to-date and working.
Todo:
Test plan
You can test the bindings by running the cstest_java tool. The tester requires Java 21 and having capstone built/installed.
apt install libjna-java cd bindings/java make cstestFixes #2641
Fixes #1641
Fixes #2470
Refs #2291