
Conversation

@yruslan (Collaborator) commented Sep 4, 2025

Closes #777

Summary by CodeRabbit

  • New Features

    • Support for binary numeric COBOL types (big- and little-endian) with sign, precision and scale handling; new binary encoder used for relevant numeric fields.
  • Bug Fixes

    • Append mode no longer errors when output exists; repeated writes produce separate part files.
    • Per-task unique output filenames and output-directory validation to avoid collisions.
  • Documentation

    • EBCDIC Writer docs updated with supported types list; prior append-vs-overwrite bullet removed.
  • Tests

    • Added and updated tests covering binary numeric encodings and append behavior.


coderabbitai bot commented Sep 4, 2025

Walkthrough

Adds binary COMP numeric encoding to the writer: new BinaryEncoders, encoder selection for COMP-4/COMP-9, tests and test utilities, writer output-file naming and output-directory checks, README updates to list supported EBCDIC types, removal of an unused variable in BinaryUtils, and change of append behavior to a no-op.

Changes

Cohort / File(s) — Summary

• Docs: EBCDIC writer types — README.md
  Updates EBCDIC Writer supported types to include numeric types (PIC S9(n) with COMP/COMP-4/COMP-5/COMP-3/COMP-9); removes the Save mode append vs overwrite bullet.
• Binary encoder implementation — cobol-parser/.../encoding/BinaryEncoders.scala
  New public object BinaryEncoders with encodeBinaryNumber(...) implementing fixed-size two's-complement binary encoding with endianness, sign, precision/scale handling and validation.
• Encoder selection and routing — cobol-parser/.../encoding/EncoderSelector.scala
  Adds getBinaryEncoder(...) and routes Integral/Decimal fields with COMP4/COMP9 to the new binary encoder; selects endianness and byte length via BinaryUtils.
• Binary utility cleanup — cobol-parser/.../decoders/BinaryUtils.scala
  Removed an unused local variable and trimmed an inline comment; added a note about COMP-9 being a little-endian Cobrix extension. No behavior change.
• Parser tests & test utils — cobol-parser/.../encoding/BCDNumberEncodersSuite.scala, cobol-parser/.../encoding/BinaryEncodersSuite.scala, cobol-parser/.../testutils/ComparisonUtils.scala
  Adds BinaryEncodersSuite; introduces ComparisonUtils.assertArraysEqual; replaces custom array assertions and standardizes BigDecimal usage and test names.
• Spark writer: save/append behavior — spark-cobol/.../source/DefaultSource.scala
  In the SaveMode.Append writer path, removed the IllegalArgumentException and made append a no-op (tasks will write separate files).
• Spark writer: output naming & validation — spark-cobol/.../writer/RawBinaryOutputFormat.scala
  Adds per-task unique output filenames using task/attempt IDs and a job UUID fallback; implements checkOutputSpecs to validate the output directory.
• Spark writer tests — spark-cobol/.../writer/FixedLengthEbcdicWriterSuite.scala
  Adds tests for COMP/COMP-4/BINARY/COMP-9 and COMP-3 encodings; updates the append-mode test to check multiple part files; contains a duplicated test block.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User as Spark Job
  participant DS as DefaultSource
  participant OF as RawBinaryOutputFormat
  participant ES as EncoderSelector
  participant BE as BinaryEncoders
  participant FS as Filesystem

  User->>DS: save(DataFrame, SaveMode, options)
  DS->>OF: configure job (output path, writeJobUUID)
  OF->>OF: checkOutputSpecs()
  note right of OF #DDEBF7: validate output directory
  par Per task
    OF->>OF: getDefaultWorkFile(taskId, jobUUID, attemptId)
    DS->>ES: select encoder (field metadata: COMP4/COMP9, precision, scale, sign, endian)
    ES->>BE: encodeBinaryNumber(BigDecimal, isSigned, size, bigEndian, precision, scale, scaleFactor)
    BE-->>ES: Array[Byte] (encoded)
    ES-->>DS: encoded bytes
    DS->>FS: write bytes to part file
  end
  FS-->>User: per-task part files

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Assessment against linked issues

Objective — Addressed — Explanation
Add support for binary COMP numeric types in the writer (#777)

Assessment against linked issues: Out-of-scope changes

Code Change — Explanation

• Append-mode behavior changed to no-op instead of throwing (spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala) — Not required by #777 (binary COMP support); modifies writer semantics beyond adding encoders.
• Per-task file naming and output-dir validation (spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala) — Infrastructure/IO naming change unrelated to the encoder feature requested in #777.
• Duplicate test block in FixedLengthEbcdicWriterSuite (spark-cobol/.../writer/FixedLengthEbcdicWriterSuite.scala) — Likely accidental duplication unrelated to adding binary COMP encoding.

Poem

I nibble bits with whiskered care,
From COMP-4 burrows to COMP-9 lair,
Endians flip and bytes align,
Two's-complement carrots, oh so fine,
New part-files bloom beneath moonlight. 🐇


📜 Recent review details


📥 Commits

Reviewing files that changed from the base of the PR and between d9e158c and 9ce6f4d.

📒 Files selected for processing (1)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Spark 3.5.5 on Scala 2.12.20
  • GitHub Check: Spark 3.5.5 on Scala 2.13.16
  • GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
  • GitHub Check: Spark 2.4.8 on Scala 2.11.12
  • GitHub Check: Spark 3.4.4 on Scala 2.12.20

@yruslan changed the title from "Feature/777 add support comp writer" to "#777 Add support for writing numeric data types having binary format (comp, comp-4, comp-9)" on Sep 4, 2025

github-actions bot commented Sep 4, 2025

JaCoCo code coverage report - 'cobol-parser'

Overall Project: 91.65% (-0.09%) 🍏
Files changed: 84.55% 🍏

File — Coverage
BinaryUtils.scala — 90.38% (-0.48%) 🍏
BinaryEncoders.scala — 89.05% (-10.95%) 🍏
EncoderSelector.scala — 88.38% (-9.79%) 🍏


github-actions bot commented Sep 4, 2025

JaCoCo code coverage report - 'spark-cobol'

File Coverage: 91.24% 🍏
DefaultSource.scala — 93.54% 🍏
RawBinaryOutputFormat.scala — 86.39% 🍏
Total Project Coverage: 79.8% 🍏

@coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (7)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala (1)

46-49: Harden output path validation.

Also assert it’s a directory and accessible; fail fast with a clearer message.

   override def checkOutputSpecs(job: JobContext): Unit = {
-    val outDir = getOutputPath(job)
-    if (outDir == null) throw new IllegalStateException("Output directory not set.")
+    val outDir = getOutputPath(job)
+    if (outDir == null) throw new IllegalStateException("Output directory not set.")
+    val fs = outDir.getFileSystem(job.getConfiguration)
+    if (fs.exists(outDir) && !fs.getFileStatus(outDir).isDirectory)
+      throw new IllegalStateException(s"Output path '$outDir' is not a directory.")
   }
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/BinaryUtils.scala (1)

101-113: Unify 1–2 digit byte-size rule across COMP variants.

IBM mapping (1–2 digits → 1 byte) applies to COMP/COMP-4/COMP-5/COMP-9. The comp == COMP9() guard is unnecessary and obscures intent.

-      case Some(comp) if comp == COMP4() || comp == COMP5() || comp == COMP9() =>     // || comp == binary2()
+      case Some(comp) if comp == COMP4() || comp == COMP5() || comp == COMP9() =>
         // if native binary follow IBM guide to digit binary length
         precision match {
-          case p if p >= 1 && p <= 2 && comp == COMP9() => 1 // byte
+          case p if p >= 1 && p <= 2 => 1 // byte
           case p if p >= minShortPrecision && p <= maxShortPrecision => binaryShortSizeBytes
           case p if p >= minIntegerPrecision && p <= maxIntegerPrecision => binaryIntSizeBytes
           case p if p >= minLongPrecision && p <= maxLongPrecision => binaryLongSizeBytes
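For reference, the IBM digit-to-length rule this suggestion unifies can be sketched as follows. This is a minimal illustration assuming the usual boundaries (1–4 digits fit a 2-byte halfword, 5–9 a 4-byte word, 10–18 an 8-byte doubleword); the 1-byte case for 1–2 digits is the extension discussed above, and the actual BinaryUtils constants may differ:

// Sketch of the IBM binary-length mapping for COMP-style fields,
// with the 1-byte extension applied uniformly across variants.
def binarySizeBytes(precision: Int): Int = precision match {
  case p if p >= 1 && p <= 2 => 1 // byte (Cobrix extension)
  case p if p <= 4           => 2 // halfword
  case p if p <= 9           => 4 // word
  case p if p <= 18          => 8 // doubleword
  case p                     => throw new IllegalArgumentException(s"Unsupported binary precision: $p")
}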
README.md (1)

1687-1690: Clarify COMP endianness note.

Since COMP is big-endian and COMP-9 is little-endian (Cobrix extension), add a parenthetical to avoid confusion.

-  - `PIC S9(n)` numeric (integral and decimal) with `COMP`, `COMP-3`, `COMP-4`, `COMP-9` (little-endian).
+  - `PIC S9(n)` numeric (integral and decimal) with `COMP`/`COMP-4`/`COMP-5` (big-endian), `COMP-3`, and `COMP-9` (Cobrix little-endian).
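As a concrete illustration of those clauses, here is a hypothetical copybook fragment (field names invented for this example, shown as a Scala string):

// Hypothetical copybook exercising the usage variants listed above.
val copybook: String =
  """      01  RECORD.
    |          05  ID        PIC S9(4)      COMP.
    |          05  AMOUNT    PIC S9(7)V99   COMP-3.
    |          05  COUNT-LE  PIC 9(9)       COMP-9.
    |""".stripMargin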
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala (2)

25-28: Avoid new BigDecimal(double) in tests.

Construct from string or use BigDecimal.valueOf to prevent binary-float rounding artifacts.

Example pattern:

- val actual = BCDNumberEncoders.encodeBCDNumber(new java.math.BigDecimal(123.45), 5, 2, 0, signed = true, mandatorySignNibble = true)
+ val actual = BCDNumberEncoders.encodeBCDNumber(new java.math.BigDecimal("123.45"), 5, 2, 0, signed = true, mandatorySignNibble = true)

(Apply similarly across cases.)

Also applies to: 34-35, 41-42, 48-49, 55-56, 62-63, 69-70, 76-77, 83-84, 90-91, 97-98, 104-105, 111-112, 118-119, 125-126, 137-139, 145-146, 152-153, 159-160, 166-167, 173-174, 180-181, 187-188, 194-195, 201-202, 208-209
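The artifact in question is a property of binary doubles, not of this codebase; for example:

// new BigDecimal(double) captures the exact binary value of the double:
val a = new java.math.BigDecimal(123.45)
// a = 123.4500000000000028421709430404007434844970703125
val b = new java.math.BigDecimal("123.45")   // exactly 123.45
val c = java.math.BigDecimal.valueOf(123.45) // goes through Double.toString => 123.45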


95-99: Fix typos in test names.

“nbegative” → “negative”; “prexision” → “precision”.

- "encode a number with nbegative scale" in  {
+ "encode a number with negative scale" in  {

- "attempt to encode a number with zero prexision" in  {
+ "attempt to encode a number with zero precision" in  {

Also applies to: 130-132

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (1)

249-249: Consider clarifying the null handling comment.

The comment could be more specific about which field and why.

- 0x00, 0x00               // null, because -20 cannot fix the unsigned type
+ 0x00, 0x00               // null for field F (COMP-9 unsigned): -20 cannot be encoded as unsigned
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncoders.scala (1)

40-41: Consider adding a comment about BigInteger's byte array format.

It would be helpful to document that BigInteger.toByteArray() returns bytes in big-endian format with the sign bit.

+ // Note: BigInteger.toByteArray() returns bytes in big-endian two's complement format
  val intValue = bigInt.toByteArray
  val intValueLen = intValue.length
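For context, BigInteger.toByteArray() returns the minimal big-endian two's-complement representation, e.g.:

import java.math.BigInteger

val a = BigInteger.valueOf(-1).toByteArray   // [0xFF]        (minimal length, sign bit set)
val b = BigInteger.valueOf(255).toByteArray  // [0x00, 0xFF]  (leading zero keeps the value positive)
val c = BigInteger.valueOf(-256).toByteArray // [0xFF, 0x00]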
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 504f7cf and 85f9424.

📒 Files selected for processing (10)
  • README.md (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/BinaryUtils.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncoders.scala (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/EncoderSelector.scala (3 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala (3 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncodersSuite.scala (1 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/testutils/ComparisonUtils.scala (1 hunks)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala (1 hunks)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala (2 hunks)
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncoders.scala (1)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/parse/FieldSizeSpec.scala (1)
  • scale (63-74)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncodersSuite.scala (2)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/testutils/ComparisonUtils.scala (2)
  • ComparisonUtils (22-33)
  • assertArraysEqual (23-31)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncoders.scala (2)
  • BinaryEncoders (21-71)
  • encodeBinaryNumber (22-70)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala (1)
cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/testutils/ComparisonUtils.scala (2)
  • ComparisonUtils (22-33)
  • assertArraysEqual (23-31)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/EncoderSelector.scala (6)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/BinaryUtils.scala (3)
  • cobol (164-208)
  • BinaryUtils (25-247)
  • getBytesCount (98-125)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/datatype/AlphaNumeric.scala (1)
  • AlphaNumeric (28-36)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/datatype/Usage.scala (4)
  • COMP3 (30-32)
  • COMP3U (34-36)
  • COMP4 (38-40)
  • COMP9 (44-46)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/datatype/Decimal.scala (1)
  • Decimal (32-63)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/ast/datatype/Integral.scala (1)
  • Integral (30-40)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncoders.scala (2)
  • BinaryEncoders (21-71)
  • encodeBinaryNumber (22-70)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (2)
spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/source/fixtures/BinaryFileFixture.scala (1)
  • withTempDirectory (71-78)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala (2)
  • write (71-75)
  • close (76-78)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Spark 3.4.4 on Scala 2.12.20
  • GitHub Check: Spark 3.5.5 on Scala 2.13.16
  • GitHub Check: Spark 3.5.5 on Scala 2.12.20
  • GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
  • GitHub Check: Spark 2.4.8 on Scala 2.11.12
🔇 Additional comments (10)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala (1)

44-44: Per-job UUID fallback is fine.

Good fallback for environments that don’t set spark.sql.sources.writeJobUUID.

spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/DefaultSource.scala (1)

87-87: Append semantics: confirm committer-staged writes.

With Append now a no-op, ensure RawBinaryOutputFormat writes under the committer work path (see separate comment) so speculative attempts don’t leak extra files.
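A rough sketch of the resulting save-mode handling, for orientation only (hypothetical helper; the actual DefaultSource code may be organized differently):

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SaveMode

// Append becomes a no-op because per-task unique file names avoid collisions.
def prepareOutputDir(mode: SaveMode, fs: FileSystem, outPath: Path): Unit = mode match {
  case SaveMode.Overwrite if fs.exists(outPath) =>
    fs.delete(outPath, true) // recursively remove previous output
  case SaveMode.ErrorIfExists if fs.exists(outPath) =>
    throw new IllegalArgumentException(s"Path already exists: $outPath")
  case SaveMode.Append =>
    () // no-op: repeated writes add new part files alongside existing ones
  case _ =>
    ()
}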

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala (1)

19-21: Nice: shared byte-array assertion utility.

The switch to assertArraysEqual improves failure diagnostics and removes duplication.

Also applies to: 29-29, 36-36, 43-43, 50-50, 57-57, 64-64, 71-71, 78-78, 85-85, 92-92, 99-99, 106-106, 113-113, 120-120, 127-127, 140-140, 147-147, 154-154, 161-161, 168-168, 175-175, 182-182, 189-189, 196-196, 203-203, 210-210

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/testutils/ComparisonUtils.scala (1)

22-33: LGTM! Well-designed test utility for byte array comparisons.

The utility provides clear hex-formatted output for debugging test failures, making it easy to spot differences between expected and actual byte arrays.
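A minimal sketch of what such a helper can look like (the actual ComparisonUtils signature may differ):

// Compares two byte arrays and reports both sides in hex on mismatch.
def assertArraysEqual(actual: Array[Byte], expected: Array[Byte]): Unit = {
  def hex(bytes: Array[Byte]): String = bytes.map(b => f"0x${b & 0xFF}%02X").mkString(", ")
  assert(actual.sameElements(expected),
    s"Arrays differ.\n  expected: [${hex(expected)}]\n  actual:   [${hex(actual)}]")
}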

cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncodersSuite.scala (1)

23-89: Test coverage looks comprehensive!

The test suite thoroughly covers positive integers, negative integers, big-endian/little-endian formats, boundary cases, and zero values.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/EncoderSelector.scala (2)

33-50: Binary encoder integration looks correct!

The pattern matching for COMP4 (big-endian) and COMP9 (little-endian) binary types follows the existing pattern and correctly routes to the new binary encoder.


100-119: getBinaryEncoder implementation is well-structured.

The function properly handles various numeric types and delegates to BinaryEncoders.encodeBinaryNumber with the correct parameters.

spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/writer/FixedLengthEbcdicWriterSuite.scala (2)

186-259: Excellent test coverage for binary COMP fields!

The test thoroughly validates COMP, COMP-4, BINARY, and COMP-9 encodings with appropriate expected byte arrays for big-endian and little-endian formats.
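As a quick sanity check of such expected byte arrays: 1234 = 0x04D2, so a 2-byte field holds it as follows:

import java.nio.{ByteBuffer, ByteOrder}

// Big-endian layout for COMP/COMP-4/BINARY; reversed bytes for COMP-9.
val be = ByteBuffer.allocate(2).order(ByteOrder.BIG_ENDIAN).putShort(1234.toShort).array()
// be = [0x04, 0xD2]  (PIC S9(4) COMP)
val le = ByteBuffer.allocate(2).order(ByteOrder.LITTLE_ENDIAN).putShort(1234.toShort).array()
// le = [0xD2, 0x04]  (PIC S9(4) COMP-9)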


261-287: Append mode test properly validates multi-file output.

The test correctly validates that append mode creates multiple part files rather than overwriting.

cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncoders.scala (1)

22-70: Binary encoder implementation is solid and efficient!

The implementation correctly handles the following (a minimal sketch appears after this list):

  • Null and invalid parameter guards
  • Scale factor adjustments for decimal point positioning
  • Two's complement representation for signed numbers
  • Both big-endian and little-endian byte ordering
  • Proper padding with sign extension
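To make the behaviors listed above concrete, here is a minimal sketch of such an encoder. This is hypothetical code: the real encodeBinaryNumber also takes precision and a scale factor and performs more validation.

import java.math.{BigDecimal => JBigDecimal}

// Sketch: scale the decimal to an integer, take its minimal two's-complement
// bytes, sign-extend to the field size, and reverse the bytes for little-endian.
def encodeBinaryNumberSketch(value: JBigDecimal,
                             isSigned: Boolean,
                             sizeBytes: Int,
                             isBigEndian: Boolean,
                             scale: Int): Array[Byte] = {
  if (value == null || sizeBytes < 1) return null
  val unscaled = value.movePointRight(scale).toBigIntegerExact // throws if fractional digits remain
  if (!isSigned && unscaled.signum < 0) return null            // an unsigned field cannot hold a negative value
  val minimal = unscaled.toByteArray                           // minimal big-endian two's complement
  // An unsigned value may carry a leading 0x00 that needs no storage:
  val src = if (!isSigned && minimal.length == sizeBytes + 1 && minimal(0) == 0) minimal.tail else minimal
  if (src.length > sizeBytes) return null                      // the value does not fit the field
  val padByte: Byte = if (unscaled.signum < 0) 0xFF.toByte else 0x00.toByte
  val out = Array.fill[Byte](sizeBytes)(padByte)               // sign-extension padding
  System.arraycopy(src, 0, out, sizeBytes - src.length, src.length)
  if (isBigEndian) out else out.reverse
}

// e.g. encodeBinaryNumberSketch(new JBigDecimal("-2"), isSigned = true, 2, isBigEndian = false, 0)
//      returns [0xFE, 0xFF]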

@coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala (1)

54-63: Resolved: now writing under the committer work path (no direct writes to final dir).

This addresses the earlier concern about bypassing the committer’s staging. Naming is deterministic and attempt-aware. Nice.
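For readers unfamiliar with the pattern, a sketch of attempt-aware naming under the committer work path (hypothetical; the actual RawBinaryOutputFormat naming scheme may differ):

import java.util.UUID
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter

// Sketch: build a per-task, per-attempt file name and place it under the
// committer's staging (work) path rather than the final output directory.
def workFileSketch(context: TaskAttemptContext, committer: FileOutputCommitter, extension: String): Path = {
  val attempt = context.getTaskAttemptID
  val jobId = Option(context.getConfiguration.get("spark.sql.sources.writeJobUUID"))
    .getOrElse(UUID.randomUUID().toString) // fallback when Spark does not set the UUID
  val name = f"part-${attempt.getTaskID.getId}%05d-$jobId-a${attempt.getId}%d$extension"
  new Path(committer.getWorkPath, name)
}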

🧹 Nitpick comments (3)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala (3)

44-44: Micro: make the fallback UUID lazy.

Avoids creating a UUID when spark.sql.sources.writeJobUUID is always present.

Apply:

-  private val uniqueUuid = UUID.randomUUID().toString
+  private lazy val uniqueUuid = UUID.randomUUID().toString

46-52: Delegate to FileOutputFormat’s default checks; prefer IOExceptions.

Preserve Hadoop/Spark semantics across versions and use IOExceptions expected by MR APIs; keep the directory-type guard.

Apply:

-  override def checkOutputSpecs(job: JobContext): Unit = {
-    val outDir = getOutputPath(job)
-    if (outDir == null) throw new IllegalStateException("Output directory not set.")
-    val fs = outDir.getFileSystem(job.getConfiguration)
-    if (fs.exists(outDir) && !fs.getFileStatus(outDir).isDirectory)
-      throw new IllegalStateException(s"Output path '$outDir' is not a directory.")
-  }
+  override def checkOutputSpecs(job: JobContext): Unit = {
+    super.checkOutputSpecs(job) // ensures out dir is set and validates committer expectations
+    val outDir = getOutputPath(job)
+    val fs = outDir.getFileSystem(job.getConfiguration)
+    if (fs.exists(outDir) && !fs.getFileStatus(outDir).isDirectory) {
+      throw new java.io.IOException(s"Output path '$outDir' is not a directory.")
+    }
+  }

56-57: Optional: prefer JobID as deterministic fallback for writeJobId.

When spark.sql.sources.writeJobUUID is absent, using the TaskAttempt’s JobID aids traceability and consistency across tasks/attempts.

Apply:

-    val writeJobId = Option(conf.get("spark.sql.sources.writeJobUUID")).getOrElse(uniqueUuid)
-    val attempt = context.getTaskAttemptID
+    val attempt = context.getTaskAttemptID
+    val writeJobId = Option(conf.get("spark.sql.sources.writeJobUUID"))
+      .orElse(Option(attempt.getJobID).map(_.toString))
+      .getOrElse(uniqueUuid)
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 85f9424 and d9e158c.

📒 Files selected for processing (5)
  • README.md (1 hunks)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/BinaryUtils.scala (1 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala (2 hunks)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncodersSuite.scala (1 hunks)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BCDNumberEncodersSuite.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/parser/decoders/BinaryUtils.scala
  • README.md
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/parser/encoding/BinaryEncodersSuite.scala
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: test (2.12.20, 2.12, 3.3.4, 0, 80, 20)
  • GitHub Check: Spark 3.4.4 on Scala 2.12.20
  • GitHub Check: Spark 3.5.5 on Scala 2.12.20
  • GitHub Check: Spark 2.4.8 on Scala 2.11.12
  • GitHub Check: Spark 3.5.5 on Scala 2.13.16
🔇 Additional comments (2)
spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/writer/RawBinaryOutputFormat.scala (2)

23-23: LGTM: static import is appropriate.

Used by checkOutputSpecs; keeps the call concise.


26-26: LGTM: UUID import for filename entropy.

Appropriate for per-job/task uniqueness.

@yruslan merged commit 387c8a8 into master on Sep 5, 2025 — 7 checks passed.
@yruslan deleted the feature/777-add-support-comp-writer branch on September 5, 2025 at 07:35.