#777 Add support for writing numeric data types having binary format (comp, comp-4, comp-9) #779
Walkthrough

Adds binary COMP numeric encoding to the writer: new BinaryEncoders, encoder selection for COMP-4/COMP-9, tests and test utilities, writer output-file naming and output-directory checks, README updates listing supported EBCDIC types, removal of an unused variable in BinaryUtils, and a change of Append save-mode behavior to a no-op.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User as Spark Job
    participant DS as DefaultSource
    participant OF as RawBinaryOutputFormat
    participant ES as EncoderSelector
    participant BE as BinaryEncoders
    participant FS as Filesystem
    User->>DS: save(DataFrame, SaveMode, options)
    DS->>OF: configure job (output path, writeJobUUID)
    OF->>OF: checkOutputSpecs()
    note right of OF #DDEBF7: validate output directory
    par Per task
        OF->>OF: getDefaultWorkFile(taskId, jobUUID, attemptId)
        DS->>ES: select encoder (field metadata: COMP4/COMP9, precision, scale, sign, endian)
        ES->>BE: encodeBinaryNumber(BigDecimal, isSigned, size, bigEndian, precision, scale, scaleFactor)
        BE-->>ES: Array[Byte] (encoded)
        ES-->>DS: encoded bytes
        DS->>FS: write bytes to part file
    end
    FS-->>User: per-task part files
```
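For concreteness, a hypothetical invocation at the ES→BE step is sketched below. The argument list follows the diagram (value, isSigned, size, bigEndian, precision, scale, scaleFactor) and may not match the real `BinaryEncoders.encodeBinaryNumber` signature exactly; treat it as an illustration only.

```scala
// Hypothetical call mirroring the ES->BE arrow above; parameter order and
// meanings are taken from the diagram, not from the actual method signature.
val encoded: Array[Byte] = BinaryEncoders.encodeBinaryNumber(
  new java.math.BigDecimal("123.45"), // value from the DataFrame row
  true,                               // isSigned
  4,                                  // size of the target field in bytes
  true,                               // bigEndian: COMP/COMP-4; false for COMP-9
  5,                                  // precision
  2,                                  // scale
  0                                   // scaleFactor
)
```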
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 2
🧹 Nitpick comments (7)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala (1)

46-49: Harden output path validation.

Also assert it's a directory and accessible; fail fast with a clearer message.

```diff
 override def checkOutputSpecs(job: JobContext): Unit = {
-  val outDir = getOutputPath(job)
-  if (outDir == null) throw new IllegalStateException("Output directory not set.")
+  val outDir = getOutputPath(job)
+  if (outDir == null) throw new IllegalStateException("Output directory not set.")
+  val fs = outDir.getFileSystem(job.getConfiguration)
+  if (fs.exists(outDir) && !fs.getFileStatus(outDir).isDirectory)
+    throw new IllegalStateException(s"Output path '$outDir' is not a directory.")
 }
```

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/BinaryUtils.scala (1)
101-113: Unify the 1–2 digit byte-size rule across COMP variants.

IBM mapping (1–2 digits → 1 byte) applies to COMP/COMP-4/COMP-5/COMP-9. The `comp == COMP9()` guard is unnecessary and obscures intent.

```diff
-      case Some(comp) if comp == COMP4() || comp == COMP5() || comp == COMP9() => // || comp == binary2()
+      case Some(comp) if comp == COMP4() || comp == COMP5() || comp == COMP9() => // if native binary follow IBM guide to digit binary length
         precision match {
-          case p if p >= 1 && p <= 2 && comp == COMP9() => 1 // byte
+          case p if p >= 1 && p <= 2 => 1 // byte
           case p if p >= minShortPrecision && p <= maxShortPrecision => binaryShortSizeBytes
           case p if p >= minIntegerPrecision && p <= maxIntegerPrecision => binaryIntSizeBytes
           case p if p >= minLongPrecision && p <= maxLongPrecision => binaryLongSizeBytes
```

README.md (1)
1687-1690: Clarify COMP endianness note.

Since COMP is big-endian and COMP-9 is little-endian (Cobrix extension), add a parenthetical to avoid confusion.

```diff
-  - `PIC S9(n)` numeric (integral and decimal) with `COMP`, `COMP-3`, `COMP-4`, `COMP-9` (little-endian).
+  - `PIC S9(n)` numeric (integral and decimal) with `COMP`/`COMP-4`/`COMP-5` (big-endian), `COMP-3`, and `COMP-9` (Cobrix little-endian).
```

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala (2)
25-28: Avoid `new BigDecimal(double)` in tests.

Construct from a string or use `BigDecimal.valueOf` to prevent binary-float rounding artifacts.

Example pattern:

```diff
- val actual = BCDNumberEncoders.encodeBCDNumber(new java.math.BigDecimal(123.45), 5, 2, 0, signed = true, mandatorySignNibble = true)
+ val actual = BCDNumberEncoders.encodeBCDNumber(new java.math.BigDecimal("123.45"), 5, 2, 0, signed = true, mandatorySignNibble = true)
```

(Apply similarly across cases.)

Also applies to: 34-35, 41-42, 48-49, 55-56, 62-63, 69-70, 76-77, 83-84, 90-91, 97-98, 104-105, 111-112, 118-119, 125-126, 137-139, 145-146, 152-153, 159-160, 166-167, 173-174, 180-181, 187-188, 194-195, 201-202, 208-209
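To see why this matters, a small standalone snippet (not from the PR) showing the difference:

```scala
import java.math.BigDecimal

// new BigDecimal(double) records the exact binary floating-point value,
// which is generally not the decimal literal you wrote:
val fromDouble = new BigDecimal(123.45)   // a long expansion close to, but not equal to, 123.45
val fromString = new BigDecimal("123.45") // exactly 123.45

println(fromDouble.compareTo(fromString) == 0) // false
println(BigDecimal.valueOf(123.45))            // 123.45 (uses Double.toString's shortest form)
```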
95-99: Fix typos in test names.

"nbegative" → "negative"; "prexision" → "precision".

```diff
- "encode a number with nbegative scale" in {
+ "encode a number with negative scale" in {

- "attempt to encode a number with zero prexision" in {
+ "attempt to encode a number with zero precision" in {
```

Also applies to: 130-132
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (1)
249-249: Consider clarifying the null handling comment.

The comment could be more specific about which field and why.

```diff
- 0x00, 0x00 // null, because -20 cannot fix the unsigned type
+ 0x00, 0x00 // null for field F (COMP-9 unsigned): -20 cannot be encoded as unsigned
```

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncoders.scala (1)
40-41: Consider adding a comment about BigInteger's byte array format.

It would be helpful to document that `BigInteger.toByteArray()` returns bytes in big-endian format with the sign bit.

```diff
+ // Note: BigInteger.toByteArray() returns bytes in big-endian two's complement format
  val intValue = bigInt.toByteArray
  val intValueLen = intValue.length
```
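For reference, a quick standalone illustration of that behavior (not project code):

```scala
import java.math.BigInteger

// toByteArray() yields the minimal two's complement representation, big-endian:
BigInteger.valueOf(300).toByteArray  // Array(0x01, 0x2C)
BigInteger.valueOf(-1).toByteArray   // Array(0xFF)       -- a single sign byte
BigInteger.valueOf(128).toByteArray  // Array(0x00, 0x80) -- leading zero keeps the value positive

// For a little-endian target (COMP-9) these bytes must be reversed after
// sign-extension padding to the field size.
```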
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (10)
- README.md (1 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/BinaryUtils.scala (1 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncoders.scala (1 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/EncoderSelector.scala (3 hunks)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala (3 hunks)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncodersSuite.scala (1 hunks)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/testutils/ComparisonUtils.scala (1 hunks)
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala (1 hunks)
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala (2 hunks)
- spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncoders.scala (1)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/FieldSizeSpec.scala (1)
  - scale (63-74)

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncodersSuite.scala (2)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/testutils/ComparisonUtils.scala (2)
  - ComparisonUtils (22-33)
  - assertArraysEqual (23-31)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncoders.scala (2)
  - BinaryEncoders (21-71)
  - encodeBinaryNumber (22-70)

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala (1)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/testutils/ComparisonUtils.scala (2)
  - ComparisonUtils (22-33)
  - assertArraysEqual (23-31)

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/EncoderSelector.scala (6)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/BinaryUtils.scala (3)
  - cobol (164-208)
  - BinaryUtils (25-247)
  - getBytesCount (98-125)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/datatype/AlphaNumeric.scala (1)
  - AlphaNumeric (28-36)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/datatype/Usage.scala (4)
  - COMP3 (30-32)
  - COMP3U (34-36)
  - COMP4 (38-40)
  - COMP9 (44-46)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/datatype/Decimal.scala (1)
  - Decimal (32-63)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/datatype/Integral.scala (1)
  - Integral (30-40)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncoders.scala (2)
  - BinaryEncoders (21-71)
  - encodeBinaryNumber (22-70)

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (2)
- spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/fixtures/BinaryFileFixture.scala (1)
  - withTempDirectory (71-78)
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala (2)
  - write (71-75)
  - close (76-78)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Spark 3.4.4 on Scala 2.12.20
- GitHub Check: Spark 3.5.5 on Scala 2.13.16
- GitHub Check: Spark 3.5.5 on Scala 2.12.20
- GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
- GitHub Check: Spark 2.4.8 on Scala 2.11.12
🔇 Additional comments (10)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala (1)
44-44: Per-job UUID fallback is fine.

Good fallback for environments that don't set `spark.sql.sources.writeJobUUID`.

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala (1)
87-87: Append semantics: confirm committer-staged writes.

With Append now a no-op, ensure `RawBinaryOutputFormat` writes under the committer work path (see separate comment) so speculative attempts don't leak extra files.

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala (1)
19-21: Nice: shared byte-array assertion utility.

The switch to `assertArraysEqual` improves failure diagnostics and removes duplication.

Also applies to: 29-29, 36-36, 43-43, 50-50, 57-57, 64-64, 71-71, 78-78, 85-85, 92-92, 99-99, 106-106, 113-113, 120-120, 127-127, 140-140, 147-147, 154-154, 161-161, 168-168, 175-175, 182-182, 189-189, 196-196, 203-203, 210-210
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/testutils/ComparisonUtils.scala (1)
22-33: LGTM! Well-designed test utility for byte array comparisons.

The utility provides clear hex-formatted output for debugging test failures, making it easy to spot differences between expected and actual byte arrays.
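As an illustration of the pattern (a minimal sketch, not the actual ComparisonUtils code):

```scala
// Minimal sketch of a hex-printing byte-array assertion; the real utility may differ.
object ComparisonUtilsSketch {
  def assertArraysEqual(actual: Array[Byte], expected: Array[Byte]): Unit = {
    def hex(bytes: Array[Byte]): String = bytes.map(b => f"0x$b%02X").mkString(", ")
    if (!actual.sameElements(expected)) {
      throw new AssertionError(s"Byte arrays differ.\nexpected: ${hex(expected)}\nactual:   ${hex(actual)}")
    }
  }
}
```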
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncodersSuite.scala (1)
23-89: Test coverage looks comprehensive!

The test suite thoroughly covers positive integers, negative integers, big-endian/little-endian formats, boundary cases, and zero values.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/EncoderSelector.scala (2)
33-50: Binary encoder integration looks correct!

The pattern matching for COMP4 (big-endian) and COMP9 (little-endian) binary types follows the existing pattern and correctly routes to the new binary encoder.

100-119: getBinaryEncoder implementation is well-structured.

The function properly handles various numeric types and delegates to BinaryEncoders.encodeBinaryNumber with the correct parameters.
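A simplified sketch of the routing idea (names condensed; not the actual EncoderSelector code):

```scala
// Route binary usages to an endianness flag; everything else falls through
// to the existing encoders (DISPLAY, COMP-3, ...). Illustrative only.
def binaryEndianness(usage: String): Option[Boolean] = usage match {
  case "COMP" | "COMP-4" | "BINARY" => Some(true)   // big-endian
  case "COMP-9"                     => Some(false)  // little-endian (Cobrix extension)
  case _                            => None
}
```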
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (2)
186-259: Excellent test coverage for binary COMP fields!

The test thoroughly validates COMP, COMP-4, BINARY, and COMP-9 encodings with appropriate expected byte arrays for big-endian and little-endian formats.
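For intuition, the same value in both byte orders (standalone illustration, 2-byte field):

```scala
import java.math.BigInteger

val be = BigInteger.valueOf(1000).toByteArray // Array(0x03, 0xE8) -- COMP / COMP-4 / BINARY layout
val le = be.reverse                           // Array(0xE8, 0x03) -- COMP-9 (little-endian) layout
```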
261-287: Append mode test properly validates multi-file output.

The test correctly validates that append mode creates multiple part files rather than overwriting.
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncoders.scala (1)
22-70: Binary encoder implementation is solid and efficient!

The implementation correctly handles:
- Null and invalid parameter guards
- Scale factor adjustments for decimal point positioning
- Two's complement representation for signed numbers
- Both big-endian and little-endian byte ordering
- Proper padding with sign extension
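A minimal sketch of those steps (an illustration under simplifying assumptions, not the Cobrix implementation):

```scala
import java.math.{BigDecimal, BigInteger}

// Scale the decimal to an integer, take its big-endian two's complement bytes,
// pad with sign extension to the field size, and reverse for little-endian output.
def encodeBinarySketch(value: BigDecimal, scale: Int, sizeBytes: Int, bigEndian: Boolean): Option[Array[Byte]] = {
  val unscaled: BigInteger = value.movePointRight(scale).toBigIntegerExact // throws if digits would be lost
  val raw = unscaled.toByteArray
  if (raw.length > sizeBytes) {
    None // the value does not fit the target field
  } else {
    val pad: Byte = if (unscaled.signum() < 0) 0xFF.toByte else 0x00.toByte // sign extension
    val out = Array.fill[Byte](sizeBytes)(pad)
    System.arraycopy(raw, 0, out, sizeBytes - raw.length, raw.length)
    Some(if (bigEndian) out else out.reverse)
  }
}

// encodeBinarySketch(new BigDecimal("123.45"), 2, 4, bigEndian = true)
//   => Some(Array(0x00, 0x00, 0x30, 0x39))   // 12345 = 0x3039
```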
…ther PR fixes (Thanks @coderabbitai).
Actionable comments posted: 0
♻️ Duplicate comments (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala (1)
54-63: Resolved: now writing under the committer work path (no direct writes to the final dir).

This addresses the earlier concern about bypassing the committer's staging. Naming is deterministic and attempt-aware. Nice.
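For context, the pattern being referenced looks roughly like the sketch below; the file-name format is illustrative, not the PR's exact one.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.hadoop.mapreduce.lib.output.{FileOutputCommitter, FileOutputFormat}

// Resolve the part file under the committer's task work path so speculative
// attempts stay in staging until commit; the name encodes task id, job id and attempt.
def defaultWorkFileSketch(context: TaskAttemptContext, writeJobId: String): Path = {
  val committer = new FileOutputCommitter(FileOutputFormat.getOutputPath(context), context)
  val attempt   = context.getTaskAttemptID
  val fileName  = f"part-${attempt.getTaskID.getId}%05d-$writeJobId-a${attempt.getId}.dat"
  new Path(committer.getWorkPath, fileName)
}
```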
🧹 Nitpick comments (3)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala (3)
44-44: Micro: make the fallback UUID lazy.

Avoids creating a UUID when spark.sql.sources.writeJobUUID is always present.

Apply:

```diff
- private val uniqueUuid = UUID.randomUUID().toString
+ private lazy val uniqueUuid = UUID.randomUUID().toString
```
46-52: Delegate to FileOutputFormat's default checks; prefer IOExceptions.

Preserve Hadoop/Spark semantics across versions and use the IOExceptions expected by MR APIs; keep the directory-type guard.

Apply:

```diff
- override def checkOutputSpecs(job: JobContext): Unit = {
-   val outDir = getOutputPath(job)
-   if (outDir == null) throw new IllegalStateException("Output directory not set.")
-   val fs = outDir.getFileSystem(job.getConfiguration)
-   if (fs.exists(outDir) && !fs.getFileStatus(outDir).isDirectory)
-     throw new IllegalStateException(s"Output path '$outDir' is not a directory.")
- }
+ override def checkOutputSpecs(job: JobContext): Unit = {
+   super.checkOutputSpecs(job) // ensures out dir is set and validates committer expectations
+   val outDir = getOutputPath(job)
+   val fs = outDir.getFileSystem(job.getConfiguration)
+   if (fs.exists(outDir) && !fs.getFileStatus(outDir).isDirectory) {
+     throw new java.io.IOException(s"Output path '$outDir' is not a directory.")
+   }
+ }
```
56-57: Optional: prefer JobID as deterministic fallback for writeJobId.

When spark.sql.sources.writeJobUUID is absent, using the TaskAttempt's JobID aids traceability and consistency across tasks/attempts.

Apply:

```diff
- val writeJobId = Option(conf.get("spark.sql.sources.writeJobUUID")).getOrElse(uniqueUuid)
- val attempt = context.getTaskAttemptID
+ val attempt = context.getTaskAttemptID
+ val writeJobId = Option(conf.get("spark.sql.sources.writeJobUUID"))
+   .orElse(Option(attempt.getJobID).map(_.toString))
+   .getOrElse(uniqueUuid)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
- README.md (1 hunks)
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/BinaryUtils.scala (1 hunks)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala (2 hunks)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncodersSuite.scala (1 hunks)
- spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala
- cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/BinaryUtils.scala
- README.md
- cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncodersSuite.scala
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
- GitHub Check: Spark 3.4.4 on Scala 2.12.20
- GitHub Check: Spark 3.5.5 on Scala 2.12.20
- GitHub Check: Spark 2.4.8 on Scala 2.11.12
- GitHub Check: Spark 3.5.5 on Scala 2.13.16
🔇 Additional comments (2)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala (2)
23-23: LGTM: static import is appropriate.

Used by checkOutputSpecs; keeps the call concise.

26-26: LGTM: UUID import for filename entropy.

Appropriate for per-job/task uniqueness.
…empt’s JobID aids traceability and consistency across tasks/attempts.
Closes #777
Summary by CodeRabbit

- New Features
- Bug Fixes
- Documentation
- Tests