Skip to content

AGT-79 - Publish OTA Progress and Status to MQTT#92

Open
nyagamunene wants to merge 6 commits into
absmach:mainfrom
nyagamunene:fix/79
Open

AGT-79 - Publish OTA Progress and Status to MQTT#92
nyagamunene wants to merge 6 commits into
absmach:mainfrom
nyagamunene:fix/79

Conversation

@nyagamunene

@nyagamunene nyagamunene commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

What does this do?

Implements OTA status publishing to MQTT as described in #79. The agent now reports download progress, verification status, and update state in real-time over MQTT, and supports receiving firmware binaries directly via MQTT as an alternative to HTTP download.

Status topic publishing (m/{domain}/c/{ctrl_channel}/ota/status)

  • Publishes on every state transition: triggered → downloading → verifying → ready → restarting (or aborted on failure)
  • Reports download progress every 5% with bytes, total, and progress fields
  • Uses retain=true so the broker always holds the last known OTA state
  • Includes an error field in the final status message on failure or abort
  • SenML fields match the spec: state (string), bytes (By), total (By), progress (%)

MQTT binary delivery (m/{domain}/c/{ctrl_channel}/ota)

  • New ota/cfg flow: when url is absent but hash and size are present, the broker primes a pending data receiver
  • Firmware binary arrives as a raw MQTT payload on the ota topic
  • Agent verifies the SHA-256 hash and installs — same verify/replace path as HTTP download
  • HTTP download path is unchanged

API / internals

  • ProgressFn extended to include bytesWritten, totalBytes int64 alongside progress float64
  • New ParseCfgFromRecords() in pkg/ota — parses cfg records without requiring a URL
  • New RunFromData() in pkg/ota — installs firmware from an in-memory byte slice
  • New OTAFromData(ctx, data, sha256hex) method on Service interface (+ logging/metrics middleware)

Which issue(s) does this PR fix/relate to?

Resolves #79

List any changes that modify/break current functionality

  • ota.ProgressFn signature changed: func(State, float64)func(State, int64, int64, float64). Any external code calling ota.Run() with a progress callback must update the callback signature.
  • OTA status SenML field names changed: ota_statestate, ota_progressprogress. New fields bytes and total are added.
  • OTA status messages now use retain=true (previously false).

Have you included tests for your changes?

Yes:

  • Updated all existing ota_test.go tests for the new ProgressFn signature
  • TestParseCfgFromRecords — covers the new URL-optional cfg parser
  • TestRunFromData — covers MQTT binary delivery path (empty hash, hash mismatch, valid hash)
  • TestOTAStatusPublishing — verifies correct SenML fields on the status topic
  • TestOTAStatusPublishesErrorOnFailure — verifies error field is included in the final failure status
  • TestOTAFromDataStatusPublishing — verifies status publishing for the MQTT data-delivery path

Did you document any new/modified functionality?

The ota/cfg and ota topic behaviour is described in the inline code comments and follows the protocol defined in #79.

Notes

syscall.Exec is called on a successful OTA install (process is replaced in-place). Tests that exercise the full verify+replace path intentionally use invalid binary content so syscall.Exec returns ENOEXEC rather than replacing the test process.

@JeffMboya JeffMboya marked this pull request as ready for review June 18, 2026 09:10
Signed-off-by: nyagamunene <stevenyaga2014@gmail.com>
Fixes the gci/gofmt formatting that was failing the CI lint step.

Signed-off-by: JeffMboya <jangina.mboya@gmail.com>
Hoist the duplicated OTA status-publishing closure from OTA and OTAFromData
into a single publishOTAStatus method, and introduce a senmlBaseGateway
constant for the "gw:" base name. This removes ~40 lines of duplication and
clears the goconst lint failure (string "gw:" had 3 occurrences) that was
breaking CI.

Signed-off-by: JeffMboya <jangina.mboya@gmail.com>
A URL-based ota/cfg trigger now clears any pending data-path priming so a
later stray message on the ota data topic cannot install against a stale
hash. Also documents that the ota data topic carries no token of its own and
relies on the prior authenticated ota/cfg priming plus SHA-256 verification.

Signed-off-by: JeffMboya <jangina.mboya@gmail.com>
Update docs/ota.md to match the implementation: the retained status message
now uses state/bytes/total/progress/error fields (was ota_state/ota_progress),
and document MQTT firmware delivery via the ota data topic with ota/cfg
priming, including a test recipe and topic-map entries.

Signed-off-by: JeffMboya <jangina.mboya@gmail.com>
The status sample jumped from 5% to 50%, which read as an inconsistent
cadence. Show consecutive 5% increments with an ellipsis so the example
matches the documented "every 5%" behavior.

Signed-off-by: JeffMboya <jangina.mboya@gmail.com>

@rodneyosodo rodneyosodo left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core logic (MQTT data priming, SHA-256 verification, status publishing) is sound. Tests cover the key paths. No security issues or breaking behavioural regressions.

Comment thread pkg/ota/ota.go
// RunFromData installs firmware from an in-memory binary payload instead of
// downloading from HTTP. sha256hex must be the hex-encoded SHA-256 of data.
// On success it never returns (the process is replaced in-place).
func RunFromData(ctx context.Context, cfg Config, data []byte, sha256hex string, progressFn ProgressFn) error {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RunFromData ignores ctx — context cancellation not respected during file I/O and hash verification

RunFromData accepts ctx context.Context but never reads from it. Unlike Run, which passes ctx into download (where ctx.Err() is checked on every read loop at line 279), RunFromData performs file I/O and SHA-256 verification without any context check. If the context is cancelled (e.g., user sends ota abort during an MQTT-delivered update), the write-to-disk and hash operations run to completion before the function returns.

Consider checking ctx.Err() at least before/after the file I/O section (lines 208-217) and inside verifyFile (or wrap the file read with a context-aware reader).

Comment thread pkg/ota/ota.go
}
case fieldHash:
if r.StringValue != nil {
t.SHA256Hex = *r.StringValue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ParseCfgFromRecords doesn't trim hash field

ParseCfgFromRecords trims url via strings.TrimSpace (line 124) but reads hash as-is on line 128. TriggerFromRecords at line 101 has the same omission, so this is consistent with existing code. Low severity since whitespace in a hash won't match — but worth noting for consistency.

Comment thread pkg/ota/ota.go
}

progressFn(StateVerifying, 100)
progressFn(StateVerifying, 0, 0, 100)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent bytes/total between Run and RunFromData

In Run, the StateVerifying (line 162), StateReady (line 176), and StateRestarting (line 182) callbacks pass 0, 0 for bytesWritten/totalBytes. In RunFromData, the same states pass totalBytes, totalBytes (lines 220, 227, 233).

A subscriber reading the status topic after an MQTT-delivered update sees "bytes": 1324740 during verifying while an HTTP-delivered update sees "bytes": 0. This is by design per the ProgressFn doc comment ("meaningful during downloading"), but the inconsistency could confuse consumers that inspect the bytes field. Consider having Run also report the actual file size after download (a os.Stat call after download completes would give it).

Comment thread service.go
}

runErr := ota.RunFromData(otaCtx, otaCfg, data, sha256hex, progressFn)
if runErr != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double StateAborted publish

Both the OTA library functions (Run at lines 157, 167; RunFromData at lines 203, 211, 223) and the service layer (this function at lines 1400-1409) independently publish StateAborted. This means two consecutive MQTT messages are published for a single failure:

  1. From RunFromData: progressFn(StateAborted, …) → MQTT message without error field
  2. From service.go line 1403/1406: publishStatus(StateAborted, …, errMsg) → MQTT message with error field

Since the messages use retain=true, the second (correct) message replaces the first. But subscribers consuming in real-time will see a transient "aborted" without an error before the final one arrives. Consider removing the internal progressFn(StateAborted, …) calls from Run and RunFromData and letting the service layer be the sole publisher of the aborted state.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Publish OTA Progress and Status to MQTT

3 participants