AGT-79 - Publish OTA Progress and Status to MQTT#92
Conversation
Signed-off-by: nyagamunene <stevenyaga2014@gmail.com>
Fixes the gci/gofmt formatting that was failing the CI lint step. Signed-off-by: JeffMboya <jangina.mboya@gmail.com>
Hoist the duplicated OTA status-publishing closure from OTA and OTAFromData into a single publishOTAStatus method, and introduce a senmlBaseGateway constant for the "gw:" base name. This removes ~40 lines of duplication and clears the goconst lint failure (string "gw:" had 3 occurrences) that was breaking CI. Signed-off-by: JeffMboya <jangina.mboya@gmail.com>
A URL-based ota/cfg trigger now clears any pending data-path priming so a later stray message on the ota data topic cannot install against a stale hash. Also documents that the ota data topic carries no token of its own and relies on the prior authenticated ota/cfg priming plus SHA-256 verification. Signed-off-by: JeffMboya <jangina.mboya@gmail.com>
Update docs/ota.md to match the implementation: the retained status message now uses state/bytes/total/progress/error fields (was ota_state/ota_progress), and document MQTT firmware delivery via the ota data topic with ota/cfg priming, including a test recipe and topic-map entries. Signed-off-by: JeffMboya <jangina.mboya@gmail.com>
The status sample jumped from 5% to 50%, which read as an inconsistent cadence. Show consecutive 5% increments with an ellipsis so the example matches the documented "every 5%" behavior. Signed-off-by: JeffMboya <jangina.mboya@gmail.com>
| // RunFromData installs firmware from an in-memory binary payload instead of | ||
| // downloading from HTTP. sha256hex must be the hex-encoded SHA-256 of data. | ||
| // On success it never returns (the process is replaced in-place). | ||
| func RunFromData(ctx context.Context, cfg Config, data []byte, sha256hex string, progressFn ProgressFn) error { |
There was a problem hiding this comment.
RunFromData ignores ctx — context cancellation not respected during file I/O and hash verification
RunFromData accepts ctx context.Context but never reads from it. Unlike Run, which passes ctx into download (where ctx.Err() is checked on every read loop at line 279), RunFromData performs file I/O and SHA-256 verification without any context check. If the context is cancelled (e.g., user sends ota abort during an MQTT-delivered update), the write-to-disk and hash operations run to completion before the function returns.
Consider checking ctx.Err() at least before/after the file I/O section (lines 208-217) and inside verifyFile (or wrap the file read with a context-aware reader).
| } | ||
| case fieldHash: | ||
| if r.StringValue != nil { | ||
| t.SHA256Hex = *r.StringValue |
There was a problem hiding this comment.
ParseCfgFromRecords doesn't trim hash field
ParseCfgFromRecords trims url via strings.TrimSpace (line 124) but reads hash as-is on line 128. TriggerFromRecords at line 101 has the same omission, so this is consistent with existing code. Low severity since whitespace in a hash won't match — but worth noting for consistency.
| } | ||
|
|
||
| progressFn(StateVerifying, 100) | ||
| progressFn(StateVerifying, 0, 0, 100) |
There was a problem hiding this comment.
Inconsistent bytes/total between Run and RunFromData
In Run, the StateVerifying (line 162), StateReady (line 176), and StateRestarting (line 182) callbacks pass 0, 0 for bytesWritten/totalBytes. In RunFromData, the same states pass totalBytes, totalBytes (lines 220, 227, 233).
A subscriber reading the status topic after an MQTT-delivered update sees "bytes": 1324740 during verifying while an HTTP-delivered update sees "bytes": 0. This is by design per the ProgressFn doc comment ("meaningful during downloading"), but the inconsistency could confuse consumers that inspect the bytes field. Consider having Run also report the actual file size after download (a os.Stat call after download completes would give it).
| } | ||
|
|
||
| runErr := ota.RunFromData(otaCtx, otaCfg, data, sha256hex, progressFn) | ||
| if runErr != nil { |
There was a problem hiding this comment.
Double StateAborted publish
Both the OTA library functions (Run at lines 157, 167; RunFromData at lines 203, 211, 223) and the service layer (this function at lines 1400-1409) independently publish StateAborted. This means two consecutive MQTT messages are published for a single failure:
- From
RunFromData:progressFn(StateAborted, …)→ MQTT message withouterrorfield - From
service.goline 1403/1406:publishStatus(StateAborted, …, errMsg)→ MQTT message witherrorfield
Since the messages use retain=true, the second (correct) message replaces the first. But subscribers consuming in real-time will see a transient "aborted" without an error before the final one arrives. Consider removing the internal progressFn(StateAborted, …) calls from Run and RunFromData and letting the service layer be the sole publisher of the aborted state.
What does this do?
Implements OTA status publishing to MQTT as described in #79. The agent now reports download progress, verification status, and update state in real-time over MQTT, and supports receiving firmware binaries directly via MQTT as an alternative to HTTP download.
Status topic publishing (
m/{domain}/c/{ctrl_channel}/ota/status)triggered → downloading → verifying → ready → restarting(orabortedon failure)bytes,total, andprogressfieldsretain=trueso the broker always holds the last known OTA stateerrorfield in the final status message on failure or abortstate(string),bytes(By),total(By),progress(%)MQTT binary delivery (
m/{domain}/c/{ctrl_channel}/ota)ota/cfgflow: whenurlis absent buthashandsizeare present, the broker primes a pending data receiverotatopicAPI / internals
ProgressFnextended to includebytesWritten, totalBytes int64alongsideprogress float64ParseCfgFromRecords()inpkg/ota— parses cfg records without requiring a URLRunFromData()inpkg/ota— installs firmware from an in-memory byte sliceOTAFromData(ctx, data, sha256hex)method onServiceinterface (+ logging/metrics middleware)Which issue(s) does this PR fix/relate to?
Resolves #79
List any changes that modify/break current functionality
ota.ProgressFnsignature changed:func(State, float64)→func(State, int64, int64, float64). Any external code callingota.Run()with a progress callback must update the callback signature.ota_state→state,ota_progress→progress. New fieldsbytesandtotalare added.retain=true(previouslyfalse).Have you included tests for your changes?
Yes:
ota_test.gotests for the newProgressFnsignatureTestParseCfgFromRecords— covers the new URL-optional cfg parserTestRunFromData— covers MQTT binary delivery path (empty hash, hash mismatch, valid hash)TestOTAStatusPublishing— verifies correct SenML fields on the status topicTestOTAStatusPublishesErrorOnFailure— verifieserrorfield is included in the final failure statusTestOTAFromDataStatusPublishing— verifies status publishing for the MQTT data-delivery pathDid you document any new/modified functionality?
The
ota/cfgandotatopic behaviour is described in the inline code comments and follows the protocol defined in #79.Notes
syscall.Execis called on a successful OTA install (process is replaced in-place). Tests that exercise the full verify+replace path intentionally use invalid binary content sosyscall.ExecreturnsENOEXECrather than replacing the test process.