gNOI for Optic Firmware Installation #293

ejbrever · 2025-08-04T21:19:15Z

dplore · 2025-08-05T05:08:55Z

optic_firmware_install/optic_firmware_install.proto

+  types.Path optic = 1;
+}
+
+message VerifyResponse {


How much history should the device keep? Just the most recent attempt? 'n' attempts? Should this be emphemeral storage? (not preserved between reboots)

I think just the most recent attempt in ephemeral storage is fine.

optic_firmware_install/optic_firmware_install.proto

coveralls · 2025-08-13T21:27:40Z

Pull Request Test Coverage Report for Build 17222849893

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

0 of 1151 (0.0%) changed or added relevant lines in 2 files are covered.
48 unchanged lines in 1 file lost coverage.
Overall coverage remained the same at 0.0%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
transceiver_firmware_install/transceiver_firmware_install_grpc.pb.go	0	104	0.0%
transceiver_firmware_install/transceiver_firmware_install.pb.go	0	1047	0.0%

Files with Coverage Reduction	New Missed Lines	%
bgp/bgp.pb.go	48	0.0%

Totals
Change from base Build 16924193294:	0.0%
Covered Lines:	0
Relevant Lines:	13593

💛 - Coveralls

radu-arista · 2025-08-15T18:32:09Z

transceiver_firmware_install/transceiver_firmware_install.proto

+  // - Active|Standby --> 0|1
+  // - A|B --> 0|1
+  // - 1|2 --> 0|1
+  uint32 firmware_bank = 4;


What is the purpose of this field? CMIS doesn't provide a way to choose where a binary is downloaded. The assumption is that the binary always goes into the inactive bank when one is available.

From some discussions it sounded like there could be in the future. There could even be the possibility of greater than two banks, so this would support that. However, I agree that implementations today probably don't have a choice, which is fine and this could be left empty. If filled in when not allowed, the INCOMPATIBLE error type should be returned.

Maybe we should just leave this out for now and define it in a future version of the service if CMIS ever has some mechanism to select it. It may be better to do it then rather than trying to guess the future (and potentially have to rework it).

radu-arista · 2025-08-15T18:40:05Z

transceiver_firmware_install/transceiver_firmware_install.proto

+  // The current in-progress step of the install.
+  string step = 7;
+
+  // The percentage complete of the current in-progress step.
+  uint32 step_percent_complete = 8;
+
+  // The percentage complete of the latest install process.
+  uint32 total_percent_complete = 9;


How is a "step" defined? Are the "transfer" and "install" operations described above the "steps"? Or are these referring to individual CDB commands, in which case only commands 0103h/0104h (LPL/EPL download) have a duration that is non-trivial? Or is this "step" something up for interpretation by the switch OS devs?

This is open to interpretation by the switch OS devs. The more updates probably the better, but it is just interesting information humans watch and could aid in debugging, but not something we intend to standardize on.

radu-arista · 2025-08-15T19:05:19Z

transceiver_firmware_install/transceiver_firmware_install.proto

+  // - Active|Standby --> 0|1
+  // - A|B --> 0|1
+  // - 1|2 --> 0|1
+  uint32 firmware_bank = 4;


Similar question to the other occurrence in the transfer section. The "run" CDB command (0109h) allows rebooting the transceiver firmware and provides a choice (in CMIS >= 5.0) as to which bank is rebooted, however, in the context of installing a new firmware, only rebooting with the inactive bank into active role makes sense.

The other option of rebooting the active bank into active role is not a firmware install operation, but rather a simple reset of the currently running firmware, which probably should be treated as an independent feature.

Yea there were two use cases here:

Switchover - if one wants to switch from one bank to another. We initially had a dedicated Swichover RPC, but it was very redundant to Install so it seemed better to remove it and just support that ability under Install.

Install again after failure. For example, if the install failed for some reason, but the active label changed, this could allow a retry. This is probably not a common operation, but could be useful in a one-off case.

ejbrever

Thanks for the review!

ejbrever · 2025-08-15T21:57:54Z

transceiver_firmware_install/transceiver_firmware_install.proto

+  // - Active|Standby --> 0|1
+  // - A|B --> 0|1
+  // - 1|2 --> 0|1
+  uint32 firmware_bank = 4;


From some discussions it sounded like there could be in the future. There could even be the possibility of greater than two banks, so this would support that. However, I agree that implementations today probably don't have a choice, which is fine and this could be left empty. If filled in when not allowed, the INCOMPATIBLE error type should be returned.

ejbrever · 2025-08-15T22:05:48Z

transceiver_firmware_install/transceiver_firmware_install.proto

+  // - Active|Standby --> 0|1
+  // - A|B --> 0|1
+  // - 1|2 --> 0|1
+  uint32 firmware_bank = 4;


Yea there were two use cases here:

Switchover - if one wants to switch from one bank to another. We initially had a dedicated Swichover RPC, but it was very redundant to Install so it seemed better to remove it and just support that ability under Install.

Install again after failure. For example, if the install failed for some reason, but the active label changed, this could allow a retry. This is probably not a common operation, but could be useful in a one-off case.

ejbrever · 2025-08-15T22:07:24Z

transceiver_firmware_install/transceiver_firmware_install.proto

+  // The current in-progress step of the install.
+  string step = 7;
+
+  // The percentage complete of the current in-progress step.
+  uint32 step_percent_complete = 8;
+
+  // The percentage complete of the latest install process.
+  uint32 total_percent_complete = 9;


This is open to interpretation by the switch OS devs. The more updates probably the better, but it is just interesting information humans watch and could aid in debugging, but not something we intend to standardize on.

brianneville · 2025-08-20T14:17:24Z

Can you include the compiled .pb.go files with this commit?
you'll need to update the regenerate-files.sh script to build make use of the bazel build file here.

ejbrever · 2025-08-20T23:44:20Z

Can you include the compiled .pb.go files with this commit? you'll need to update the regenerate-files.sh script to build make use of the bazel build file here.

Sure, just updated.

ejbrever · 2025-08-20T23:54:35Z

@dplore based on our discussion about comparing to #160 I think these are different enough that they warrant separate gNOIs. Some initial takeaways/differences:

Software bundle seems to be focused on software patches and therefore has terminology which might not align well with transceiver installs. For example, un-installing a software bundle doesn’t make much sense in the context of transceiver firmware. Similarly, I'm not sure concepts like correlating processes for affected_processes or processes_pending_restart makes sense for a transceiver.
There is not a way to specify the list of transceivers needing to be installed.
For transceivers, there is a concept of staging a firmware package on the Target and then in a second phase being able to install from the Target to a list of transceivers. That doesn't seem like something that could be done in the other PR.
There is a transceiver banks concept in this PR that isn't supported in the other PR.

Please let me know your thoughts. These were just from an cursory first pass. I might have misinterpreted things, but an initial look feels like they are fairly different.

LimeHat · 2025-08-23T00:27:55Z

transceiver_firmware_install/transceiver_firmware_install.proto

+  // this RPC is idempotent.
+  // If the RPC is terminated after the RPC is initiated, the install must
+  // continue in the background.
+  rpc Install(InstallRequest) returns (stream InstallResponse);


I have a few questions about the structure/behavior of this RPC.

since this is a server-streaming RPC, what's the purpose of using repeated structure in the InstallResponse?

is the expectation that in each InstallResponse message, the server must bundle updates for all InstallSingleRequest received in this RPC, or the server can group the responses arbitrarily (and each response message can contain 1..N updates)?

if it's the latter, I'd suggest just flattening the structure (each response update can be streamed independently, there's little benefit to bundling)

InstallError description says that the server must close the RPC after encountering an error.

does this mean that the entire InstallRequest is supposed to be an atomic operation (e.g. if an InstallRequest contains 10x InstallSingleRequest, any individual error should trigger the closure of the RPC)?

at the same time, the description above says that If the RPC is terminated after the RPC is initiated, the install must continue in the background., which is a little contradictory (most likely we can't continue after encountering an error, at least for the InstallSingleRequest where the error was encountered. And if we continue with the other installs, then what's the purpose of closing the stream)

The server could bundle arbitrarily if desired. Due to the scale here, there is a request from some people that the bundling is extremely helpful. There can be hundreds of optics in a host where installs are needed and having a way to bundle adds a lot of value to them.

Ah, thanks, that needs to be updated as we should not terminate the stream if one of X installs fail. I'll get that fixed. The note about continuing in the background is related to cases where the RPC may be terminated for undesired reasons (i.e. network communication problem). In that case, the in-progress installs would continue. However, ones that have already failed wouldn't have anything left to do.

is the last in progress install from the request. * Clarify the behavior when the RPC gets terminated.

jsterne · 2025-09-03T17:14:39Z

High level question: although this model is similar to what's being defined in the OIF, it has a number of places it diverges from the latest OIF proposals. I haven't done a full analysis here but here are some examples from a quick look.

Different terminology:
OC service “TransceiverFirmwareInstall” vs OIF top level title "Firmware Update"
OC RPC “Transfer” vs OIF “download”
OC "Target" vs OIF "Host"

Different primitives/commands:
OC combines transfers of remote->host and host->transceiver into a single primitive. The OIF has "transfer" vs "download".
OC doesn't seem to have a separate "activate" primitive. OIF does.

Note that I'm not necessarily proposing that everything in the current OIF proposal is what we should converge on. Maybe there are changes/updates on both sides to come to alignment.

The latest OIF proposal is here:
https://www.oiforum.com/bin/c5i?mid=4&rid=7&gid=0&k1=54575&k2=15&k3=11

ejbrever · 2025-09-04T00:04:53Z

@jsterne Thanks for the review. Some thoughts:

TransceiverFirmwareInstall vs Firmware Update.

This was specifically changed to reflect that upgrade is as important as downgrade. Similar to os.proto where the RPC is called Install. Upgrade|update gives a connotation of going to a higher release, but that is not always the case.

Transfer vs download

This is another case of aligning to elsewhere in OpenConfig for consistency. I'd probably also loosely argue since this is a streaming RPC, "transfer" is a better fit.

Target vs Host

This is similar to os.proto where OpenConfig seems to prefer Target and trying to keep this consistency.

OC combines transfers of remote->host and host->transceiver into a single primitive. The OIF has "transfer" vs "download".

Correct, but it differentiates still via a oneof.

OC doesn't seem to have a separate "activate" primitive. OIF does.

This is part of the Install RPC. Could you clarify the separate actions you are referring to?

dplore · 2025-09-04T15:48:34Z

Reviewed in September 4th, 2025 OC Community meeting to increase visibility of this PR. One comment from a few participants is can we look to not duplicate file transfer RPC's? Maybe existing gNOI file transfer can be used and this PR could add RPCs to perform the transceiver firmware installations.

ejbrever · 2025-09-04T19:06:35Z

Reviewed in September 4th, 2025 OC Community meeting to increase visibility of this PR. One comment from a few participants is can we look to not duplicate file transfer RPC's? Maybe existing gNOI file transfer can be used and this PR could add RPCs to perform the transceiver firmware installations.

That was initially discussed, but the reason we went this way was:

We wanted the device to understand that this was firmware package and store in it's own library of firmware versions. We didn't want operators to have to know where to transfer files to.
OIF is proposing an embedded YAML file within the file that would contain interesting metadata. Having a separate RPC gives the device context to look for this YAML file so it can parse and present this data (i.e. version name, vendor, supported SKUs, etc.)

jsterne · 2025-09-04T19:18:17Z

Some replies inline.

@jsterne Thanks for the review. Some thoughts:

TransceiverFirmwareInstall vs Firmware Update.

This was specifically changed to reflect that upgrade is as important as downgrade. Similar to os.proto where the RPC is called Install. Upgrade|update gives a connotation of going to a higher release, but that is not always the case.

Here we're talking about the overall Service name. How about TransceiverFirmwareManagement since there are a number of different functions in this service (including the transfer, the activate, aborting, checking firmware versions, etc)?

Transfer vs download

This is another case of aligning to elsewhere in OpenConfig for consistency. I'd probably also loosely argue since this is a streaming RPC, "transfer" is a better fit.

I haven't studied other transfer type RPCs in gRPC before but is this streaming because you intend to send a chunk of data, then get an ack, then send more, get an ack, etc?

Transfer seems OK as a term. So maybe we see if the OIF is happy to go back to that term (they had it before). At least for the host->module transfer.
(I'd like to avoid re-creating yet another server->host transfer RPC when we already have something like that elsewhere for moving files around).

Target vs Host

This is similar to os.proto where OpenConfig seems to prefer Target and trying to keep this consistency.

In that case there isn't really ambiguity about the "destination" of the binary file. The target is always the router/NE. But here it may be confusing where a host can be a target and so can a module. Target implies the destination of the transfer (not intuitive for it to be the source of the transfer). i.e. "The source of the transfer is the Target" (ughh).

OC combines transfers of remote->host and host->transceiver into a single primitive. The OIF has "transfer" vs "download".

Correct, but it differentiates still via a oneof.

OC doesn't seem to have a separate "activate" primitive. OIF does.

This is part of the Install RPC. Could you clarify the separate actions you are referring to?
The whole description of the Install RPC is fairly different from the OIF "activate" command(s). The OC model has an option to install from device (device vs target vs host?). The OIF approach seems to be to transfer then switch banks. And the OIF model has a way to not necessarily commit the newly downloaded/active bank (that can be done as a separate step if desired, e.g. after the operator checks that things are working).

jsterne · 2025-09-04T19:23:04Z

Reviewed in September 4th, 2025 OC Community meeting to increase visibility of this PR. One comment from a few participants is can we look to not duplicate file transfer RPC's? Maybe existing gNOI file transfer can be used and this PR could add RPCs to perform the transceiver firmware installations.

That was initially discussed, but the reason we went this way was:

We wanted the device to understand that this was firmware package and store in it's own library of firmware versions. We didn't want operators to have to know where to transfer files to.

This concept of a magic hard-coded store (with pre-defined file system organization then I suppose?) isn't really in the OIF approach from what I can tell. Why not just let different devices and users decide where in a filesystem they may want to place these files? Also note that the latest OIF "download" command says "Note: the path must be included and result in a unique file". It seems a shame to reinvent file copy and would be nice for that part to be generic. Then other "firmware" commands can be special and know how to parse those files, transfer them to modules, etc.

OIF is proposing an embedded YAML file within the file that would contain interesting metadata. Having a separate RPC gives the device context to look for this YAML file so it can parse and present this data (i.e. version name, vendor, supported SKUs, etc.)

Sure but that parsing doesn't have to be part of the transfer of a file from a server to the router.

jsterne · 2025-09-04T19:43:59Z

Is there a way to abort a transfer host->module? That would be useful. The OIF defined an 'abort' command (although rather that be called abort-transfer to be specific which activity we're aborting.

ejbrever · 2025-09-07T22:36:32Z

Some replies inline.

@jsterne Thanks for the review. Some thoughts:
TransceiverFirmwareInstall vs Firmware Update.

This was specifically changed to reflect that upgrade is as important as downgrade. Similar to os.proto where the RPC is called Install. Upgrade|update gives a connotation of going to a higher release, but that is not always the case.

Here we're talking about the overall Service name. How about TransceiverFirmwareManagement since there are a number of different functions in this service (including the transfer, the activate, aborting, checking firmware versions, etc)?

I am good with "TransceiverFirmwareManagement". We've already changed the name a couple times, so I'll give this a few days unless anyone has objections to this new overall name. Although, any thoughts on just "TransceiverFirmware" for simplicity?

Transfer vs download

This is another case of aligning to elsewhere in OpenConfig for consistency. I'd probably also loosely argue since this is a streaming RPC, "transfer" is a better fit.

I haven't studied other transfer type RPCs in gRPC before but is this streaming because you intend to send a chunk of data, then get an ack, then send more, get an ack, etc?

Transfer seems OK as a term. So maybe we see if the OIF is happy to go back to that term (they had it before). At least for the host->module transfer. (I'd like to avoid re-creating yet another server->host transfer RPC when we already have something like that elsewhere for moving files around).

Correct in that we'd stream bytes over the wire here. The preference in OpenConfig has been moving towards this and avoiding the many other permutations to transfer (i.e. FTP, SFTP, HTTP, SCP, etc.). AIUI this has been for security reasons (but also potentially performance).

I think it's fine if OIF wants to align terminology, but I really don't think that's a goal of OpenConfig to have identical naming formats with different standards bodies.

Target vs Host

This is similar to os.proto where OpenConfig seems to prefer Target and trying to keep this consistency.

In that case there isn't really ambiguity about the "destination" of the binary file. The target is always the router/NE. But here it may be confusing where a host can be a target and so can a module. Target implies the destination of the transfer (not intuitive for it to be the source of the transfer). i.e. "The source of the transfer is the Target" (ughh).

OC combines transfers of remote->host and host->transceiver into a single primitive. The OIF has "transfer" vs "download".

Correct, but it differentiates still via a oneof.

OC doesn't seem to have a separate "activate" primitive. OIF does.

This is part of the Install RPC. Could you clarify the separate actions you are referring to?
The whole description of the Install RPC is fairly different from the OIF "activate" command(s). The OC model has an option to install from device (device vs target vs host?). The OIF approach seems to be to transfer then switch banks. And the OIF model has a way to not necessarily commit the newly downloaded/active bank (that can be done as a separate step if desired, e.g. after the operator checks that things are working).

ejbrever · 2025-09-07T22:48:07Z

Reviewed in September 4th, 2025 OC Community meeting to increase visibility of this PR. One comment from a few participants is can we look to not duplicate file transfer RPC's? Maybe existing gNOI file transfer can be used and this PR could add RPCs to perform the transceiver firmware installations.

That was initially discussed, but the reason we went this way was:

We wanted the device to understand that this was firmware package and store in it's own library of firmware versions. We didn't want operators to have to know where to transfer files to.

This concept of a magic hard-coded store (with pre-defined file system organization then I suppose?) isn't really in the OIF approach from what I can tell. Why not just let different devices and users decide where in a filesystem they may want to place these files? Also note that the latest OIF "download" command says "Note: the path must be included and result in a unique file". It seems a shame to reinvent file copy and would be nice for that part to be generic. Then other "firmware" commands can be special and know how to parse those files, transfer them to modules, etc.

There are a few reasons to this. As operators we don't want to know or care about the file management structure for every platform or if it changes in different software branches. We just want to provide the firmware package and let the device put it in the right place. Then, not in this first phase here, but we had a lot of discussion on the device having a readable library of available firmware packages. With that, the device could give us an inventory or available firmware packages, their versions, which SKU's they work with, etc. With something like that we can get to where we eventually hope to be; specifically being able to define default firmware packages for transceivers. So when a new transceiver is plugged in, it could automatically be determined what firmware package to install. This would put us in a plug-and-play approach that we'd like to eventually get to. We are not pushing all of this initially, but that's the vision we are hoping to move towards.

OIF is proposing an embedded YAML file within the file that would contain interesting metadata. Having a separate RPC gives the device context to look for this YAML file so it can parse and present this data (i.e. version name, vendor, supported SKUs, etc.)

Sure but that parsing doesn't have to be part of the transfer of a file from a server to the router.

The preference is for this to just to be a single RPC. We'd expect under the hood vendors could re-use shared libraries for transferring files. However, standing up a new gRPC is trivial and I'd argue having this context specific RPC to do some extra context specific things around the file transfer would be valuable from a simplicity point of view for operators.

jsterne · 2025-09-11T22:09:35Z

I am good with "TransceiverFirmwareManagement". We've already changed the name a couple times, so I'll give this a few days unless anyone has objections to this new overall name. Although, any thoughts on just "TransceiverFirmware" for simplicity?

I'd somewhat prefer TransceiverFirmwareManagement. I think that will just match how this would be referred to in other descriptions and in the OIF. I'm also ok with "TransceiverFirmware" but it feels a bit truncated (not sure if other services have similar truncated style names?).

dplore · 2025-09-11T22:32:12Z

I am good with "TransceiverFirmwareManagement". We've already changed the name a couple times, so I'll give this a few days unless anyone has objections to this new overall name. Although, any thoughts on just "TransceiverFirmware" for simplicity?

I'd somewhat prefer TransceiverFirmwareManagement. I think that will just match how this would be referred to in other descriptions and in the OIF. I'm also ok with "TransceiverFirmware" but it feels a bit truncated (not sure if other services have similar truncated style names?).

gNOI stle for service naming is migrating to a style with a single noun for the entity being "operated" on. This makes some sense in that gNOI is already telling us we are performing "operations" on something. We just need to know what the thing is we are "operating". So transceiverz or transceiver could be good.

jsterne · 2025-09-12T16:45:11Z

I am good with "TransceiverFirmwareManagement". We've already changed the name a couple times, so I'll give this a few days unless anyone has objections to this new overall name. Although, any thoughts on just "TransceiverFirmware" for simplicity?

I'd somewhat prefer TransceiverFirmwareManagement. I think that will just match how this would be referred to in other descriptions and in the OIF. I'm also ok with "TransceiverFirmware" but it feels a bit truncated (not sure if other services have similar truncated style names?).

gNOI stle for service naming is migrating to a style with a single noun for the entity being "operated" on. This makes some sense in that gNOI is already telling us we are performing "operations" on something. We just need to know what the thing is we are "operating". So transceiverz or transceiver could be good.

I see what you mean. So the "Management" part isn't needed. But I think it would be good to scope this service to firmware. So how about TransceiverFirmware that Eric suggested as a shorter option?

jsterne · 2025-09-12T17:18:46Z

About 'transfer' vs 'download' for moving the file from the router->transceiver: At the moment:
A) gnoi proposal = 'transfer'
B) OIF proposal = 'download'
C) CMIS spec = 'download'

I think the latest thinking is for OIF to change back to using 'transfer'. That's OK with me. It is nice that (A) and (B) use the same name.

transceiver_firmware_install/transceiver_firmware_install.proto

jsterne · 2025-09-12T17:53:07Z

transceiver_firmware_install/transceiver_firmware_install.proto

+  // Absolute path to the /components/component list member in the OpenConfig
+  // model corresponding to the transceiver to perform the install on.
+  // (e.g., /components/component[name=transceiver-1/2])
+  types.Path transceiver = 1;


OIF proposes this as a list of transceivers (TC). Those could be the same TC type, or totally different TCs (a package can contain binaries for multiple different TCs from multiple different vendors). It would be good to support this concept here in the model (even if maybe some implementations might limit to one TC initially). Maybe the service could say that it is mandatory to support 1 TC in the list, and optional to support multiple.

jsterne · 2025-09-12T18:14:24Z

I'm uncertain how to map the Install RPC to underlying CMIS primitives. CMIS has the following:

Download (move binary into the bank of the transceiver (TC))
Run Firmware Image (i.e. switch to the inactive bank where the new firmware was downloaded)
Commit (mark the running bank as committed so this bank is used at any subsequent restart)

The Install RPC seems to have some hints of transfer aspects sprinkled around. e.g.:

the entire InstallFromDeviceRequest
TransferProgress transfer_progress = 3; as part of the response to Install

The OIF proposal has an 'activate' command that is clearly separated from any transfer/download.

OIF also combines both CMIS Run and Commit (where commit is an optional automatic step of the activate). But I think we should also have a separate Commit primitive in both OIF and OpenConfig. An operator may want to Run the new firmware on a transceiver and verify that everything is working well before doing a Commit.

I'd propose we have an 'Activate' RPC (instead of the Install RPC) that takes the following arguments:

list of transceivers
an 'action' that is one of these options (that match the actions available in the Run command in CMIS 5.4):
- Switch to Inactive Bank
- SwitchHitless to Inactive Bank
- Restart Active Bank
- RestartHitless Active Bank
- Factory Reset to Inactive Bank
- Factory Reset to Active Bank
a no-commit option (otherwise commit the newly active bank by default)

* Switched to "Activate" instead of "Install" for the RPC and wording thoughout as this better explains what is being done whereas install might have had some confusion because if is more easily mixed with transferring the package. * Clarified the behavior in the top header which didn't describe the activate well. * Added top level definition for Device and Transceiver. This also replaced "Target" with "Device" throughout as Target was getting confused with the transceiver itself.

ejbrever · 2025-09-18T05:53:52Z

I'm uncertain how to map the Install RPC to underlying CMIS primitives. CMIS has the following:

Download (move binary into the bank of the transceiver (TC))

Run Firmware Image (i.e. switch to the inactive bank where the new firmware was downloaded)

Commit (mark the running bank as committed so this bank is used at any subsequent restart)

The Install RPC seems to have some hints of transfer aspects sprinkled around. e.g.:

the entire InstallFromDeviceRequest

TransferProgress transfer_progress = 3; as part of the response to Install

The OIF proposal has an 'activate' command that is clearly separated from any transfer/download.

OIF also combines both CMIS Run and Commit (where commit is an optional automatic step of the activate). But I think we should also have a separate Commit primitive in both OIF and OpenConfig. An operator may want to Run the new firmware on a transceiver and verify that everything is working well before doing a Commit.

I'd propose we have an 'Activate' RPC (instead of the Install RPC) that takes the following arguments:

list of transceivers

an 'action' that is one of these options (that match the actions available in the Run command in CMIS 5.4):

Switch to Inactive Bank

SwitchHitless to Inactive Bank

Restart Active Bank

RestartHitless Active Bank

Factory Reset to Inactive Bank

Factory Reset to Active Bank

a no-commit option (otherwise commit the newly active bank by default)

Made a few changes and had a few comments:

I changed "Install" to "Activate" as I agree that seems to have better separation from other verbs.
List of transceivers should already be supported at the layer above as the Activate request takes a repeated list of ActivateSingleRequest.
We've discussed separating commit and want Activate to automatically commit. So this is probably not a primitive OpenConfig would support. Once a version is running on the transceiver that needs to be the committed version. If someone changes their mind that is just another Activation. However, leaving a device is a quasi-state is worrisome because we don't know the behavior and it's possible for the step to be forgotten. For example, if a transceiver's package is running and not-committed, then forgotten about, if the device has a power outage, what version comes back up? Often it can be the previous version which means something like a power outage on a device takes much much longer to recover since firmware upgrades are now needed.
We originally called out Switchovers, but it was determined that it was redundant to an Activation, so we took it out for now.
I'd be interested to learn about the factory reset options. Could the transceiver hold its factory image in a permanent place (not one of its banks)?

ejbrever · 2025-09-18T05:54:48Z

I am good with "TransceiverFirmwareManagement". We've already changed the name a couple times, so I'll give this a few days unless anyone has objections to this new overall name. Although, any thoughts on just "TransceiverFirmware" for simplicity?

I'd somewhat prefer TransceiverFirmwareManagement. I think that will just match how this would be referred to in other descriptions and in the OIF. I'm also ok with "TransceiverFirmware" but it feels a bit truncated (not sure if other services have similar truncated style names?).

gNOI stle for service naming is migrating to a style with a single noun for the entity being "operated" on. This makes some sense in that gNOI is already telling us we are performing "operations" on something. We just need to know what the thing is we are "operating". So transceiverz or transceiver could be good.

I see what you mean. So the "Management" part isn't needed. But I think it would be good to scope this service to firmware. So how about TransceiverFirmware that Eric suggested as a shorter option?

@dplore curious on your thoughts of using TransceiverFirmware? Please let us know.

dplore · 2025-09-18T17:33:09Z

I am good with "TransceiverFirmwareManagement". We've already changed the name a couple times, so I'll give this a few days unless anyone has objections to this new overall name. Although, any thoughts on just "TransceiverFirmware" for simplicity?

I'd somewhat prefer TransceiverFirmwareManagement. I think that will just match how this would be referred to in other descriptions and in the OIF. I'm also ok with "TransceiverFirmware" but it feels a bit truncated (not sure if other services have similar truncated style names?).

gNOI stle for service naming is migrating to a style with a single noun for the entity being "operated" on. This makes some sense in that gNOI is already telling us we are performing "operations" on something. We just need to know what the thing is we are "operating". So transceiverz or transceiver could be good.

I see what you mean. So the "Management" part isn't needed. But I think it would be good to scope this service to firmware. So how about TransceiverFirmware that Eric suggested as a shorter option?

@dplore curious on your thoughts of using TransceiverFirmware? Please let us know.

I do not object to that naming. Here's a question for you that might help finalize this name: Are there other transceiver operations that are on the horizon? Any RPC to force a transceiver restart, factory reset, etc? If yes, the structure could be:

gnoi.Transceiver.FirmwareTransfer, FirmwareActivate, FirmwareVerify, Restart, FactoryReset, etc

If that seems very unlikely or there is a reason not to have so many RPC's in this Service, then TransceiverFirmware as the service name seems good enough.

jsterne · 2025-10-02T19:20:47Z

I see what you mean. So the "Management" part isn't needed. But I think it would be good to scope this service to firmware. So how about TransceiverFirmware that Eric suggested as a shorter option?

@dplore curious on your thoughts of using TransceiverFirmware? Please let us know.

I do not object to that naming. Here's a question for you that might help finalize this name: Are there other transceiver operations that are on the horizon? Any RPC to force a transceiver restart, factory reset, etc? If yes, the structure could be:

gnoi.Transceiver.FirmwareTransfer, FirmwareActivate, FirmwareVerify, Restart, FactoryReset, etc

If that seems very unlikely or there is a reason not to have so many RPC's in this Service, then TransceiverFirmware as the service name seems good enough.

I can imagine other non-firmware operations in the future. If we want that all to be part of a single service (and we probably would) because otherwise what's the name of the other transceiver service that doesn't include firmware?), then perhaps Daren's proposal here is good.

jsterne · 2025-10-02T19:26:28Z

transceiver_firmware_install/transceiver_firmware_install.proto

+}
+
+message ActivateRequest {
+  repeated ActivateSingleRequest activate_requests = 1;


I'd propose we remove this construction of a repeated ActivateSingleRequest inside the ActivateRequest.

What we should have instead IMO is that the activate supports multiple transceivers. i.e. Do the same action X (e.g. switch to inactive bank) on a list of transceivers. That is what is in the latest OIF proposal.

jsterne · 2025-10-02T19:27:02Z

transceiver_firmware_install/transceiver_firmware_install.proto

+  types.Path transceiver = 1;
+
+  oneof activate_from {
+    ActivateFromDeviceRequest activate_from_device = 2;


We should remove ActivateFromDeviceRequest. The Activate command should not include any 'transfer' activity.

jsterne · 2025-10-02T19:32:52Z

3. We've discussed separating commit and want Activate to automatically commit. So this is probably not a primitive OpenConfig would support. Once a version is running on the transceiver that needs to be the committed version. If someone changes their mind that is just another Activation. However, leaving a device is a quasi-state is worrisome because we don't know the behavior and it's possible for the step to be forgotten. For example, if a transceiver's package is running and not-committed, then forgotten about, if the device has a power outage, what version comes back up? Often it can be the previous version which means something like a power outage on a device takes much much longer to recover since firmware upgrades are now needed.

I think that's fine to have Activate automatically do a commit by default. But we should have the option to no-commit (as per the OIF proposal). Some operators will want to Activate (i.e. switch the inactive bank that was just updated using the transfer command), but then do some external testing before they use the CMIS "Commit Image" command to mark that new image as the one to use at next reboot. If they are in the middle of testing and there is a power outage, then they would indeed want to reboot using the previous/old image (that was still the committed image/bank). The inactive bank would still contain the new image and they could Activate it again and do their validation. Once that is done, they would issue a "Commit Image" command (which we should add to this service).

jsterne · 2025-10-02T19:39:14Z

We originally called out Switchovers, but it was determined that it was redundant to an Activation, so we took it out for now.

The CMIS command for "activation" is called "Run" and it has 6 options (the last 2 will be new in CMIS 5.4):

Switch to Inactive Bank
SwitchHitless to Inactive Bank
Restart Active Bank
RestartHitless Active Bank
Factory Reset to Inactive Bank
Factory Reset to Active Bank

We need to decide if our new service has:

an RPC with the same scope as the CMIS "Run" command (i.e. could eventually support all 6 of those options), or
an RPC with only a subset of the CMIS Run command (i.e. only the first 2 items).

This is related to the earlier discussion here: #293 (comment)

If we want to eventually have RPCs that cover all the CMIS "Run" options do we want that all in a single RPC with the 6 options, or do we want a set of RPCs, each of which does a subset of the CMIS "Run" command, e.g.:
RPC SwitchToInactiveBank (with some parameter about the Hitless vs non-hitless, and a no-commit option)
RPC RestartActiveBank
RPC FactoryReset

ejbrever added 2 commits August 4, 2025 14:11

Initial proposal.

1a7bf91

Fix lint errs.

550fc6e

dplore reviewed Aug 6, 2025

View reviewed changes

Updated from optic to transceiver to match OC terminology.

1be17d9

Add visibility

0296bf8

radu-arista reviewed Aug 15, 2025

View reviewed changes

ejbrever commented Aug 15, 2025

View reviewed changes

Add generated proto files and update regeneration script.

78ad00e

LimeHat suggested changes Aug 23, 2025

View reviewed changes

* Clarify the behavior for InstallError to only close the RPC if it

4183a95

is the last in progress install from the request. * Clarify the behavior when the RPC gets terminated.

jsterne mentioned this pull request Sep 11, 2025

Component upgrade state. openconfig/public#1242

Open

jsterne reviewed Sep 12, 2025

View reviewed changes

transceiver_firmware_install/transceiver_firmware_install.proto Outdated Show resolved Hide resolved

jsterne reviewed Sep 12, 2025

View reviewed changes

transceiver_firmware_install/transceiver_firmware_install.proto Outdated Show resolved Hide resolved

jsterne reviewed Sep 12, 2025

View reviewed changes

ejbrever added 3 commits September 17, 2025 22:36

Change package_name to file_name which is a little more generic.

a6126c6

In the example, used the OIF naming example for better alignment.

c19eec5

jsterne reviewed Oct 2, 2025

View reviewed changes

gNOI for Optic Firmware Installation #293

Are you sure you want to change the base?

gNOI for Optic Firmware Installation #293

Conversation

ejbrever commented Aug 4, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coveralls commented Aug 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Test Coverage Report for Build 17222849893

Warning: This coverage report may be inaccurate.

Details

💛 - Coveralls

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ejbrever left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

brianneville commented Aug 20, 2025

Uh oh!

ejbrever commented Aug 20, 2025

Uh oh!

ejbrever commented Aug 20, 2025

Uh oh!

LimeHat Aug 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jsterne commented Sep 3, 2025

Uh oh!

ejbrever commented Sep 4, 2025

Uh oh!

dplore commented Sep 4, 2025

Uh oh!

ejbrever commented Sep 4, 2025

Uh oh!

jsterne commented Sep 4, 2025

Uh oh!

jsterne commented Sep 4, 2025

Uh oh!

jsterne commented Sep 4, 2025

Uh oh!

ejbrever commented Sep 7, 2025

Uh oh!

ejbrever commented Sep 7, 2025

Uh oh!

jsterne commented Sep 11, 2025

Uh oh!

dplore commented Sep 11, 2025

Uh oh!

jsterne commented Sep 12, 2025

Uh oh!

jsterne commented Sep 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coveralls commented Aug 13, 2025 •

edited

Loading

LimeHat Aug 23, 2025 •

edited

Loading

jsterne commented Sep 12, 2025 •

edited

Loading

jsterne commented Sep 12, 2025 •

edited

Loading