Skip to content

Conversation

@fabiomadge
Copy link

Java Code Generator for DDM Dialects

Generates Java source files from DDM dialect definitions for Java tooling to build ASTs that Lean can consume.

Generated:

  • Sealed interfaces for categories
  • Record classes for operators
  • Static factory methods (num(1) vs new Num(SourceRange.NONE, ...))
  • Ion serializer for Lean interop

Usage:

lake exe strata javaGen dialect.st com.example.pkg output/

Tests: 12 tests including javac compilation and Java→Ion→Lean roundtrip.

@fabiomadge fabiomadge marked this pull request as ready for review December 23, 2025 04:01
let files := generateDialect simple "com.test"

let dir := "/tmp/strata-java-test"
IO.FS.createDirAll (dir ++ "/com/test")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this use writeJavaFiles instead of generateDialect, to prevent needing this awkward part?


#testCompile

elab "#testRoundtrip" : command => do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain this test relies on comprehensive.ion and point to the README for updating that file? I can even imagine inlining that README.md here, because it only seems relevant for this test.

#testBoogie

-- Test 11: Generated Java compiles (requires javac)
-- Test 12: Roundtrip - verify Lean can read Java-generated Ion
Copy link
Contributor

@keyboardDrummer keyboardDrummer Dec 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't -- Test 12: Roundtrip - verify Lean can read Java-generated Ion occur after the body of test 11 ? The tests are independent right?

@@ -0,0 +1,302 @@
/-
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test 11 seems problematic since it uses javac. Is that even available in CI? I don't see it being installed in the CI script. How it is possible that this test passes in CI?

I think it would be better to have generated .java files checked into source and have a test compare those against what's currently generated.

| ⟨"Init", "Option"⟩ =>
match cat.args[0]? with
| some inner => .optional (syntaxCatToJavaType inner)
| none => .optional (.simple "java.lang.Object")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The none case here (and with "Seq") would be an malformed SyntaxCat. I'd recommend using panic!.

"

/-- Assign unique Java names to all generated types -/
def assignAllNames (d : Dialect) : NameAssignments :=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this isn't (yet) supporting functions or types. I think that's ok if they aren't needed now, but I'd recommend adding a check that they are empty in the dialect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Laurel grammar is using them at the moment, for which this PR is intended. I think we can refactor the Laurel grammar not to use functions or types. Despite reading the DDM documentation their purpose isn't entirely clear to me. I don't think we need any type checking features for Laurel.

- Use writeJavaFiles helper in Test 11 (keyboardDrummer)
- Document Test 12's dependency on comprehensive.ion (keyboardDrummer)
- Fail if javac not found instead of skip (keyboardDrummer)
- Use panic! for malformed SyntaxCat (joehendrix)
- Add doc comments to GeneratedFiles and NameAssignments (joehendrix)
- Collect warnings for unsupported type/function declarations (joehendrix)
@fabiomadge fabiomadge force-pushed the feat/java-codegen branch 3 times, most recently from f0fbfb2 to 0a43cda Compare December 24, 2025 21:23
Comment on lines +242 to +245
let name := name.stripPrefix "«" |>.stripSuffix "»"
match name.splitOn "." with
| [dialect, rest] => .qid { dialect, name := rest }
| _ => .name name
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MikaelMayer your PR (#293) broke my Simple.dialect.st‎. Does this fix look reasonable to you?

PR strata-org#293 added dots to valid identifier characters, which broke qualified
name resolution (e.g., 'Simple.Expr' was parsed as one identifier instead
of dialect 'Simple' + name 'Expr').

Fix: In translateQualifiedIdent, split dotted identifiers on the first
dot to extract dialect and name components. Also strip guillemets that
the parser adds around special identifiers.
@keyboardDrummer
Copy link
Contributor

As a follow-up, could you enable the Laurel grammer, which is currently in a .lean file, to be consumed by the command that this PR adds? I think it would mean moving that grammar into a .dialect.st file, with some of the surrounding code stripped, and updating references to LaurelGrammar so they load that dialect file instead of loading a lean module.

@fabiomadge
Copy link
Author

fabiomadge commented Jan 2, 2026

@keyboardDrummer Would it just take the first grammar it finds? Could you use generateDialect in the .lean instead?

@keyboardDrummer
Copy link
Contributor

Could you add a test related to source locations? Note that the original program will have sources divided across different files, and each AST node except for the root will have a start and an end location, where each location is a line and column. My impression is that this PR doesn't support the above.

| some inner => .list (syntaxCatToJavaType inner)
| none => panic! "Init.Seq/CommaSepBy requires a type argument"
| ⟨"Init", "Expr"⟩ => .simple "Expr"
| ⟨"Init", "TypeExpr"⟩ => .simple "TypeExpr"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been treating Init.TypeExpr as a implementation detail that I don't think dialects should directly use. It essentially exists for the parser, and they should use Init.Type. It essentially is a type or category expression and I think dialects should be Init.Type or Init.TypeP.

I'd like to better codify this, but the Lean generator code has a notion of forbidden categories.

In short, I think if the generator sees references to Init.TypeExpr, it should issue a warning.

| ⟨"Init", "TypeExpr"⟩ => .simple "TypeExpr"
| ⟨"Init", "Type"⟩ => .simple "Type_"
| ⟨"Init", "TypeP"⟩ => .simple "TypeP"
| ⟨_, name⟩ => .simple (escapeJavaName (toPascalCase name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now the DDM would allow two dialects to have categories with the same name (e.g., A.Num and B.Num) but would need them to be qualified in the source. I think this code won't handle that case. The Lean generator has a some additional code for this here.

(cd "$STRATA_ROOT" && lake exe strata javaGen "$TESTDATA/Simple.dialect.st" com.strata.simple "$STRATA_ROOT/Tools/Java/src/main/java")

echo "=== Building and running test data generator ==="
./gradlew run -PmainClass=com.strata.test.GenerateTestData --args="$TESTDATA/comprehensive.ion" --quiet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which gradlew is this referring to? Is that checked in somewhere? This script is not run in CI right? How do we know that it runs?

fabiomadge and others added 2 commits January 7, 2026 19:30
- Fail on Init.TypeExpr (internal DDM machinery)
- Fail on type/function declarations instead of warning
- Remove warnings mechanism from GeneratedFiles
- Fix cross-dialect category collision (A.Num vs B.Num)
  - Track QualifiedIdent instead of String for refs
  - Prefix with dialect name when names collide
- Replace gradle with plain javac/java in regenerate script
- Add test for cross-dialect name collision
match cat.name with
-- Primitives map to Java types, no interface needed
| ⟨"Init", "Ident"⟩ | ⟨"Init", "Num"⟩ | ⟨"Init", "Decimal"
| ⟨"Init", "Str"⟩ | ⟨"Init", "ByteArray"⟩ | ⟨"Init", "Bool"⟩ => none
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Lean generator, I call these "declaredCategories" and define a set here.

Maybe it should be a separate PR, but I suspect this and some of the other Init category handling could be shared.

return sexp;
}
}
"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if some of these functions could be moved into a fixed file with a Java extension (e.g., for syntax highlighting)?

If it needs to ship in the Strata executable we could have a custom elaborator read it in from a file.

joehendrix
joehendrix previously approved these changes Jan 8, 2026
Copy link
Contributor

@joehendrix joehendrix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a couple comments, but didn't see anything that would block merging.

I didn't look at the tests in as much detail as Remy, but I'd recommend making sure they run in CI in this or a future PR.

- Add Strata/DDM/Integration/Categories.lean with shared:
  - primitiveCategories (Init types mapping to primitives)
  - forbiddenCategories (internal DDM machinery)
  - abstractCategories (dialect extension points)

- Extract Java templates to .java files for syntax highlighting:
  - templates/SourceRange.java
  - templates/Node.java
  - templates/IonSerializer.java

- Update both generators to use shared categories:
  - Java Gen.lean uses all three directly
  - Lean Gen.lean uses forbiddenCategories and abstractCategories,
    with guard to verify declaredCategories matches primitiveCategories
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants