Skip to content

WIP: Update Java bindings and add cstest_java implementation#2904

Draft
peace-maker wants to merge 4 commits into
capstone-engine:nextfrom
peace-maker:java_bindings
Draft

WIP: Update Java bindings and add cstest_java implementation#2904
peace-maker wants to merge 4 commits into
capstone-engine:nextfrom
peace-maker:java_bindings

Conversation

@peace-maker
Copy link
Copy Markdown
Contributor

@peace-maker peace-maker commented Apr 24, 2026

Your checklist for this pull request

  • I've documented or updated the documentation of every API function and struct this PR changes.
  • I've added tests that prove my fix is effective or that my feature works (if possible)

Detailed description

This brings the existing Java bindings in sync with the C implementation and adds all other architectures that got added to Capstone over the years. I've started generating writing the cstest port last year for fun and I'm not invested in Java at all. This was just an exercise in burning tokens and maybe doing something useful on the way. The cstest_java implementation is based on the Python version cstest_py.

@Rot127 mentioned several times, that the golden future would be to write a binding generator (#2468) which keeps the C implementation in sync with the bindings. This PR helps by having a known good baseline to compare against when someone actually starts writing such a generator. But it adds another binding that gets out of sync before the generator exists too. :/

There are some implementation problems where feedback by other developers actually using Capstone in their Java projects is useful. Java doesn't support unsigned integers, so some values might be represented as a negative number when it is supposed to be unsigned. Maybe use JNA's IntegerType for unsigned types. How important is backwards compatibility?

This is work in progress because I'd like some feedback about handling unsigned integers and packaging. The binding itself is fully up-to-date and working.

Todo:

  • Compatibility layer for the arm64 -> aarch64 rename?
  • Package libcapstone.so with JNA and publish on Maven?
  • Handle unsigned integers
  • Documentation
  • Implement skipdata callback?

Test plan

You can test the bindings by running the cstest_java tool. The tester requires Java 21 and having capstone built/installed.

apt install libjna-java
cd bindings/java
make cstest

Fixes #2641
Fixes #1641
Fixes #2470
Refs #2291

Bring the existing bindings in sync with the C implementation and add all other architectures that got added to Capstone over the years.

Java doesn't support unsigned integers, so some values might be represented as a negative number when it is supposed to be unsigned. Some more thought is required to work around this.

There is no compatibility layer for the arm64 -> aarch64 rename yet.
Parses the yaml test cases and compares the disassembly the same as cstool does.

Copilot helped with the busy work.
The cstest Java implementation was a bit stricter about lists in the yaml and uncovered some bad tests.
@github-actions github-actions Bot added X86 Arch java bindings Github-files Github related files labels Apr 24, 2026
@peace-maker peace-maker marked this pull request as draft April 24, 2026 22:19
Copy link
Copy Markdown
Collaborator

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How important is backwards compatibility?

At this point not important anymore I think. The Java bindings were not updated for a long time. And are possibly no longer in use.

But if we update them now, this API will be the new base line and should not change afterwards. So we need to make those decisions now and make them properly.

I wouldn't want to have people change their API again, just because we added the generator.
API stability is one of the things we wanted to annoy people only once with v6 (because if they switch from v4->v6 there are way more breaking changes anyways, unrelated to Auto-Sync).

Maybe use JNA's IntegerType for unsigned types.

I think it depends how available this specific dependency is on embedded systems.
Currently we don't have diet mode, but in theory at least Capstone should be able to run on a embedded device with a very minimal JVM. Apparently, JavaCard only supports the native types, not even Integer. So I think having IntegerType might be too much.
But maybe my whole "Capstone on a minimal JVM" is not good to begin with.

But we should ask someone with Java knowledge. Better not free style that one.

This was just an exercise in burning tokens and maybe doing something useful on the way.

How much of the changes here are AI generated?
Generally it is fine to use AI. But you absolutely must ensure that the testing code of the java bindings is rock solid. No compromises there.
Same for the API. It must be properly done. For the stuff in between we can then get a test coverage of close to 100%.

The tester requires Java 21 and having capstone built/installed.

Why is Java 21 needed?
Do you think there is a way to also support old Java versions?
Having support for old systems is one of the features of Capstone.
(also related to the embedded argument from above).

@Skyerhw @pabx06 @ByThePowerOfScience @PAX523 I hope Github notifies you. Would you mind giving your feedback here?

@@ -0,0 +1,182 @@
// Copyright © 2025 Peace-Maker <[email protected]>
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use SPDX style copyright header

@Rot127
Copy link
Copy Markdown
Collaborator

Rot127 commented Apr 27, 2026

Will take a more closer look in the coming days, but regarding those ones:

Compatibility layer for the arm64 -> aarch64 rename?

Shouldn't be too complicated. And it is a nice feature to make the dev headache less intense.

Implement skipdata callback?

Yes, please.

@peace-maker
Copy link
Copy Markdown
Contributor Author

Maybe use JNA's IntegerType for unsigned types.

I think it depends how available this specific dependency is on embedded systems. Currently we don't have diet mode, but in theory at least Capstone should be able to run on a embedded device with a very minimal JVM. Apparently, JavaCard only supports the native types, not even Integer. So I think having IntegerType might be too much. But maybe my whole "Capstone on a minimal JVM" is not good to begin with.

We are already using the Java Native Access library for easy FFI, the old bindings used that library too. They claim Java 1.4 compatibility. So using their abstraction for unsigned types doesn't cause breakage on embedded systems I'd assume.
They just require you to work with the object instead of literal numbers, which might be annoying. You can represent the unsigned integers with the corresponding larger integer type in Java (uint16_t -> int, uint32_t -> long). But uint64_t wouldn't work. That could be abstracted away to some extend using getter/setter methods maybe.

How much of the changes here are AI generated? Generally it is fine to use AI. But you absolutely must ensure that the testing code of the java bindings is rock solid. No compromises there. Same for the API. It must be properly done. For the stuff in between we can then get a test coverage of close to 100%.

I've pasted the python code in the file and told the AI to transpile it to Java. That didn't work 100% so I had to review all of the code manually, thus I'm confident the end result is good. I cross referenced the cstest C implementation too and even noticed some discrepancies there. I didn't want to include those changes in this branch since they might require separate discussion and can be merged independently. But the test fixes are the same territory.

diff --git a/bindings/python/capstone/sparc.py b/bindings/python/capstone/sparc.py
index 3ccbcf68..15cf1928 100644
--- a/bindings/python/capstone/sparc.py
+++ b/bindings/python/capstone/sparc.py
@@ -56,7 +56,7 @@ class CsSparc(ctypes.Structure):
         ('hint', ctypes.c_uint),
         ('format', ctypes.c_uint),
         ('op_count', ctypes.c_uint8),
-        ('operands', SparcOp * 4),
+        ('operands', SparcOp * 6),
     )

 def get_arch_info(a):
diff --git a/bindings/python/cstest_py/src/cstest_py/details.py b/bindings/python/cstest_py/src/cstest_py/details.py
index 34a04a9b..79353318 100644
--- a/bindings/python/cstest_py/src/cstest_py/details.py
+++ b/bindings/python/cstest_py/src/cstest_py/details.py
@@ -1348,7 +1348,7 @@ def test_expected_riscv(actual: CsInsn, expected: dict) -> bool:
             if not compare_int64(aop.mem.disp, eop.get("mem_disp"), "mem_disp"):
                 return False
         elif aop.type == RISCV_OP_FP:
-            if not compare_fp(aop.dimm, eop.get("dimm"), "dimm"):
+            if not compare_dp(aop.dimm, eop.get("dimm"), "dimm"):
                 return False
         elif aop.type == RISCV_OP_CSR:
             if not compare_uint16(aop.csr, capstone.riscv.SYSREG_NAME_TO_VAL[eop.get("csr")], "csr"):
@@ -1399,6 +1399,10 @@ def test_expected_SystemZ(actual: CsInsn, expected: dict) -> bool:
         len(actual.operands), len(expected.get("operands")), "operands_count"
     ):
         return False
+    elif not compare_enum(
+        actual.cc, expected.get("cc"), "cc"
+    ):
+        return False
     elif not compare_enum(
         actual.format, expected.get("format"), "format"
     ):
@@ -1416,7 +1420,7 @@ def test_expected_SystemZ(actual: CsInsn, expected: dict) -> bool:
         elif aop.type == SYSTEMZ_OP_IMM:
             if not compare_int64(aop.imm, eop.get("imm"), "imm"):
                 return False
-            if not compare_int64(aop.imm_width, eop.get("imm_width"), "imm_width"):
+            if not compare_uint8(aop.imm_width, eop.get("imm_width"), "imm_width"):
                 return False
         elif aop.type == SYSTEMZ_OP_MEM:
             if not compare_enum(aop.mem.am, eop.get("mem_am"), "mem_am"):
@@ -1581,7 +1585,7 @@ def test_expected_arc(actual: CsInsn, expected: dict) -> bool:
             if not compare_reg(actual, aop.reg, eop.get("reg"), "reg"):
                 return False
         elif aop.type == ARC_OP_IMM:
-            if not compare_int32(aop.imm, eop.get("imm"), "imm"):
+            if not compare_int64(aop.imm, eop.get("imm"), "imm"):
                 return False
         else:
             raise ValueError("Operand type not handled.")
diff --git a/suite/cstest/include/test_detail_systemz.h b/suite/cstest/include/test_detail_systemz.h
index 13f4f3ad..792d8f4e 100644
--- a/suite/cstest/include/test_detail_systemz.h
+++ b/suite/cstest/include/test_detail_systemz.h
@@ -55,12 +55,15 @@ static const cyaml_schema_value_t test_detail_systemz_op_schema = {
 };

 typedef struct {
+       char *cc;
        char *format;
        TestDetailSystemZOp **operands;
        uint32_t operands_count;
 } TestDetailSystemZ;

 static const cyaml_schema_field_t test_detail_systemz_mapping_schema[] = {
+       CYAML_FIELD_STRING_PTR("cc", CYAML_FLAG_POINTER | CYAML_FLAG_OPTIONAL,
+                              TestDetailSystemZ, cc, 0, CYAML_UNLIMITED),
        CYAML_FIELD_STRING_PTR("format",
                               CYAML_FLAG_POINTER | CYAML_FLAG_OPTIONAL,
                               TestDetailSystemZ, format, 0, CYAML_UNLIMITED),
diff --git a/suite/cstest/src/test_detail.c b/suite/cstest/src/test_detail.c
index beba988e..eb811b1b 100644
--- a/suite/cstest/src/test_detail.c
+++ b/suite/cstest/src/test_detail.c
@@ -259,22 +259,18 @@ static bool test_reg_rw_access(csh *handle, const cs_insn *insn,
                return false;
        }

-       if (expected->regs_read_count > 0) {
-               compare_uint32_ret(regs_read_count, expected->regs_read_count,
-                                  false);
-               for (size_t i = 0; i < regs_read_count; ++i) {
-                       compare_reg_ret(*handle, regs_read[i],
-                                       expected->regs_read[i], false);
-               }
+       compare_uint32_ret(regs_read_count, expected->regs_read_count,
+                               false);
+       for (size_t i = 0; i < regs_read_count; ++i) {
+               compare_reg_ret(*handle, regs_read[i],
+                               expected->regs_read[i], false);
        }

-       if (expected->regs_write_count > 0) {
-               compare_uint32_ret(regs_write_count, expected->regs_write_count,
-                                  false);
-               for (size_t i = 0; i < regs_write_count; ++i) {
-                       compare_reg_ret(*handle, regs_write[i],
-                                       expected->regs_write[i], false);
-               }
+       compare_uint32_ret(regs_write_count, expected->regs_write_count,
+                               false);
+       for (size_t i = 0; i < regs_write_count; ++i) {
+               compare_reg_ret(*handle, regs_write[i],
+                               expected->regs_write[i], false);
        }
        return true;
 }
diff --git a/suite/cstest/src/test_detail_systemz.c b/suite/cstest/src/test_detail_systemz.c
index 578531ca..85777bdb 100644
--- a/suite/cstest/src/test_detail_systemz.c
+++ b/suite/cstest/src/test_detail_systemz.c
@@ -85,6 +85,7 @@ bool test_expected_systemz(csh *handle, const cs_systemz *actual,
 {
        assert(handle && actual && expected);

+       compare_enum_ret(actual->cc, expected->cc, false);
        compare_enum_ret(actual->format, expected->format, false);
        compare_uint8_ret(actual->op_count, expected->operands_count, false);
        for (size_t i = 0; i < actual->op_count; ++i) {

Why is Java 21 needed?

Only the cstest_java project requires Java 21 since that's been the default when generating a new Gradle project. I don't think that's important though since the actual capstone bindings are still Java 1.4 I guess.

@Rot127
Copy link
Copy Markdown
Collaborator

Rot127 commented Apr 28, 2026

Nice catch with the test tools!
Please open a separate PR.

Copy link
Copy Markdown
Collaborator

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good.

One thing which is missing are the docs. Once in the bindings dir itself, and once in the v6 release guide.

I couldn't check yet the binding files but checked all the test code.
So pretty confident it works.

We will need some testing anyways from users. Its too big to catch all in one PR.

Thanks a lot for the initiative! Really appreciate it!

Comment on lines +27 to +30

# - name: Setup MSVC
# if: runner.os == 'Windows'
# uses: ilammy/msvc-dev-cmd@v1
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# - name: Setup MSVC
# if: runner.os == 'Windows'
# uses: ilammy/msvc-dev-cmd@v1

Comment on lines +177 to +192

// String repoRoot = getRepoRoot();
// if (repoRoot != null) {
// options.addOption(Option.builder()
// .desc("Directory to search for .yaml test files.")
// .hasArg()
// .argName("search_dir")
// .get());
// } else {
// options.addOption(Option.builder()
// .desc("Directory to search for .yaml test files.")
// .hasArg()
// .argName("search_dir")
// .required()
// .get());
// }
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete

}
return cmd;
}
public static void main(String[] args) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new line


for (String flag : expected) {
long enumVal = cs_const_getattr(flag);
if ((actual & enumVal) == 0L) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if ((actual & enumVal) == 0L) {
if ((actual & enumVal) != enumVal) {

Just in case we have enum values with multiple bits set, but actual is 1+ missing.

Comment on lines +46 to +49
if (address instanceof Integer) {
this.address = (Integer) address;
} else if (address instanceof Long) {
this.address = (Long) address;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we always cast to Long? Above the field is also defined as primitive long

Comment on lines +124 to +142
if (expected.containsKey("x86")) {
return X86Details.testExpected(insn, (Map<String, Object>)expected.get("x86"));
}
if (expected.containsKey("aarch64")) {
return Aarch64Details.testExpected(insn, (Map<String, Object>)expected.get("aarch64"));
}
if (expected.containsKey("ppc")) {
return PpcDetails.testExpected(insn, (Map<String, Object>)expected.get("ppc"));
}
if (expected.containsKey("arm")) {
return ArmDetails.testExpected(insn, (Map<String, Object>)expected.get("arm"));
}
if (expected.containsKey("m680x")) {
return M680xDetails.testExpected(insn, (Map<String, Object>)expected.get("m680x"));
}
if (expected.containsKey("sparc")) {
return SparcDetails.testExpected(insn, (Map<String, Object>)expected.get("sparc"));
}
if (expected.containsKey("tricore")) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those can be else if

}

for (int i = 0; i < insn.groups.length; i++) {
int group = insn.groups[i] & 0xff;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the 0xff to a constant with name and docs (only the lower 8bits are general groups. Above 8bits we have arch specific groups.). Otherwise it is hard to tell what it does.

Ppc.Operand i = (Ppc.Operand) operands.op[c];
if (operands.operands.length != 0) {
System.out.printf("\top_count: %d\n", operands.operands.length);
for (int c=0; c<operands.operands.length; c++) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the default linting for Java?
We should probably add it. Or at least run it once here.

Comment on lines +84 to +85
// if (operands.writeback)
// System.out.println("\tWrite-back: True");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// if (operands.writeback)
// System.out.println("\tWrite-back: True");

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you generate this one? I remember the const_generator was broken for RISCV.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Github-files Github related files java bindings X86 Arch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Java Bindings: New Constants Not In Jar Why are the groups of print instructions all 0? java binding not maintained and doesn't work

2 participants