Skip to content

Conversation

@ejbrever
Copy link
Contributor

@ejbrever ejbrever commented Aug 4, 2025

types.Path optic = 1;
}

message VerifyResponse {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much history should the device keep? Just the most recent attempt? 'n' attempts? Should this be emphemeral storage? (not preserved between reboots)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just the most recent attempt in ephemeral storage is fine.

@coveralls
Copy link

coveralls commented Aug 13, 2025

Pull Request Test Coverage Report for Build 17222849893

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 1151 (0.0%) changed or added relevant lines in 2 files are covered.
  • 48 unchanged lines in 1 file lost coverage.
  • Overall coverage remained the same at 0.0%

Changes Missing Coverage Covered Lines Changed/Added Lines %
transceiver_firmware_install/transceiver_firmware_install_grpc.pb.go 0 104 0.0%
transceiver_firmware_install/transceiver_firmware_install.pb.go 0 1047 0.0%
Files with Coverage Reduction New Missed Lines %
bgp/bgp.pb.go 48 0.0%
Totals Coverage Status
Change from base Build 16924193294: 0.0%
Covered Lines: 0
Relevant Lines: 13593

💛 - Coveralls

// - Active|Standby --> 0|1
// - A|B --> 0|1
// - 1|2 --> 0|1
uint32 firmware_bank = 4;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this field? CMIS doesn't provide a way to choose where a binary is downloaded. The assumption is that the binary always goes into the inactive bank when one is available.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From some discussions it sounded like there could be in the future. There could even be the possibility of greater than two banks, so this would support that. However, I agree that implementations today probably don't have a choice, which is fine and this could be left empty. If filled in when not allowed, the INCOMPATIBLE error type should be returned.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should just leave this out for now and define it in a future version of the service if CMIS ever has some mechanism to select it. It may be better to do it then rather than trying to guess the future (and potentially have to rework it).

Comment on lines 387 to 394
// The current in-progress step of the install.
string step = 7;

// The percentage complete of the current in-progress step.
uint32 step_percent_complete = 8;

// The percentage complete of the latest install process.
uint32 total_percent_complete = 9;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is a "step" defined? Are the "transfer" and "install" operations described above the "steps"? Or are these referring to individual CDB commands, in which case only commands 0103h/0104h (LPL/EPL download) have a duration that is non-trivial? Or is this "step" something up for interpretation by the switch OS devs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is open to interpretation by the switch OS devs. The more updates probably the better, but it is just interesting information humans watch and could aid in debugging, but not something we intend to standardize on.

// - Active|Standby --> 0|1
// - A|B --> 0|1
// - 1|2 --> 0|1
uint32 firmware_bank = 4;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question to the other occurrence in the transfer section. The "run" CDB command (0109h) allows rebooting the transceiver firmware and provides a choice (in CMIS >= 5.0) as to which bank is rebooted, however, in the context of installing a new firmware, only rebooting with the inactive bank into active role makes sense.

The other option of rebooting the active bank into active role is not a firmware install operation, but rather a simple reset of the currently running firmware, which probably should be treated as an independent feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea there were two use cases here:

  1. Switchover - if one wants to switch from one bank to another. We initially had a dedicated Swichover RPC, but it was very redundant to Install so it seemed better to remove it and just support that ability under Install.
  2. Install again after failure. For example, if the install failed for some reason, but the active label changed, this could allow a retry. This is probably not a common operation, but could be useful in a one-off case.

Copy link
Contributor Author

@ejbrever ejbrever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!

// - Active|Standby --> 0|1
// - A|B --> 0|1
// - 1|2 --> 0|1
uint32 firmware_bank = 4;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From some discussions it sounded like there could be in the future. There could even be the possibility of greater than two banks, so this would support that. However, I agree that implementations today probably don't have a choice, which is fine and this could be left empty. If filled in when not allowed, the INCOMPATIBLE error type should be returned.

// - Active|Standby --> 0|1
// - A|B --> 0|1
// - 1|2 --> 0|1
uint32 firmware_bank = 4;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea there were two use cases here:

  1. Switchover - if one wants to switch from one bank to another. We initially had a dedicated Swichover RPC, but it was very redundant to Install so it seemed better to remove it and just support that ability under Install.
  2. Install again after failure. For example, if the install failed for some reason, but the active label changed, this could allow a retry. This is probably not a common operation, but could be useful in a one-off case.

Comment on lines 387 to 394
// The current in-progress step of the install.
string step = 7;

// The percentage complete of the current in-progress step.
uint32 step_percent_complete = 8;

// The percentage complete of the latest install process.
uint32 total_percent_complete = 9;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is open to interpretation by the switch OS devs. The more updates probably the better, but it is just interesting information humans watch and could aid in debugging, but not something we intend to standardize on.

@brianneville
Copy link
Contributor

Can you include the compiled .pb.go files with this commit?
you'll need to update the regenerate-files.sh script to build make use of the bazel build file here.

@ejbrever
Copy link
Contributor Author

Can you include the compiled .pb.go files with this commit? you'll need to update the regenerate-files.sh script to build make use of the bazel build file here.

Sure, just updated.

@ejbrever
Copy link
Contributor Author

@dplore based on our discussion about comparing to #160 I think these are different enough that they warrant separate gNOIs. Some initial takeaways/differences:

  • Software bundle seems to be focused on software patches and therefore has terminology which might not align well with transceiver installs. For example, un-installing a software bundle doesn’t make much sense in the context of transceiver firmware. Similarly, I'm not sure concepts like correlating processes for affected_processes or processes_pending_restart makes sense for a transceiver.
  • There is not a way to specify the list of transceivers needing to be installed.
  • For transceivers, there is a concept of staging a firmware package on the Target and then in a second phase being able to install from the Target to a list of transceivers. That doesn't seem like something that could be done in the other PR.
  • There is a transceiver banks concept in this PR that isn't supported in the other PR.

Please let me know your thoughts. These were just from an cursory first pass. I might have misinterpreted things, but an initial look feels like they are fairly different.

// this RPC is idempotent.
// If the RPC is terminated after the RPC is initiated, the install must
// continue in the background.
rpc Install(InstallRequest) returns (stream InstallResponse);
Copy link

@LimeHat LimeHat Aug 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few questions about the structure/behavior of this RPC.

  1. since this is a server-streaming RPC, what's the purpose of using repeated structure in the InstallResponse?
  • is the expectation that in each InstallResponse message, the server must bundle updates for all InstallSingleRequest received in this RPC, or the server can group the responses arbitrarily (and each response message can contain 1..N updates)?
  • if it's the latter, I'd suggest just flattening the structure (each response update can be streamed independently, there's little benefit to bundling)
  1. InstallError description says that the server must close the RPC after encountering an error.
  • does this mean that the entire InstallRequest is supposed to be an atomic operation (e.g. if an InstallRequest contains 10x InstallSingleRequest, any individual error should trigger the closure of the RPC)?
  • at the same time, the description above says that If the RPC is terminated after the RPC is initiated, the install must continue in the background., which is a little contradictory (most likely we can't continue after encountering an error, at least for the InstallSingleRequest where the error was encountered. And if we continue with the other installs, then what's the purpose of closing the stream)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The server could bundle arbitrarily if desired. Due to the scale here, there is a request from some people that the bundling is extremely helpful. There can be hundreds of optics in a host where installs are needed and having a way to bundle adds a lot of value to them.

  2. Ah, thanks, that needs to be updated as we should not terminate the stream if one of X installs fail. I'll get that fixed. The note about continuing in the background is related to cases where the RPC may be terminated for undesired reasons (i.e. network communication problem). In that case, the in-progress installs would continue. However, ones that have already failed wouldn't have anything left to do.

is the last in progress install from the request.
* Clarify the behavior when the RPC gets terminated.
@jsterne
Copy link

jsterne commented Sep 3, 2025

High level question: although this model is similar to what's being defined in the OIF, it has a number of places it diverges from the latest OIF proposals. I haven't done a full analysis here but here are some examples from a quick look.

Different terminology:
OC service “TransceiverFirmwareInstall” vs OIF top level title "Firmware Update"
OC RPC “Transfer” vs OIF “download”
OC "Target" vs OIF "Host"

Different primitives/commands:
OC combines transfers of remote->host and host->transceiver into a single primitive. The OIF has "transfer" vs "download".
OC doesn't seem to have a separate "activate" primitive. OIF does.

Note that I'm not necessarily proposing that everything in the current OIF proposal is what we should converge on. Maybe there are changes/updates on both sides to come to alignment.

The latest OIF proposal is here:
https://www.oiforum.com/bin/c5i?mid=4&rid=7&gid=0&k1=54575&k2=15&k3=11

@ejbrever
Copy link
Contributor Author

ejbrever commented Sep 4, 2025

@jsterne Thanks for the review. Some thoughts:

TransceiverFirmwareInstall vs Firmware Update.

  • This was specifically changed to reflect that upgrade is as important as downgrade. Similar to os.proto where the RPC is called Install. Upgrade|update gives a connotation of going to a higher release, but that is not always the case.

Transfer vs download

  • This is another case of aligning to elsewhere in OpenConfig for consistency. I'd probably also loosely argue since this is a streaming RPC, "transfer" is a better fit.

Target vs Host

  • This is similar to os.proto where OpenConfig seems to prefer Target and trying to keep this consistency.

OC combines transfers of remote->host and host->transceiver into a single primitive. The OIF has "transfer" vs "download".

  • Correct, but it differentiates still via a oneof.

OC doesn't seem to have a separate "activate" primitive. OIF does.

  • This is part of the Install RPC. Could you clarify the separate actions you are referring to?

@dplore
Copy link
Member

dplore commented Sep 4, 2025

Reviewed in September 4th, 2025 OC Community meeting to increase visibility of this PR. One comment from a few participants is can we look to not duplicate file transfer RPC's? Maybe existing gNOI file transfer can be used and this PR could add RPCs to perform the transceiver firmware installations.

@ejbrever
Copy link
Contributor Author

ejbrever commented Sep 4, 2025

Reviewed in September 4th, 2025 OC Community meeting to increase visibility of this PR. One comment from a few participants is can we look to not duplicate file transfer RPC's? Maybe existing gNOI file transfer can be used and this PR could add RPCs to perform the transceiver firmware installations.

That was initially discussed, but the reason we went this way was:

  1. We wanted the device to understand that this was firmware package and store in it's own library of firmware versions. We didn't want operators to have to know where to transfer files to.
  2. OIF is proposing an embedded YAML file within the file that would contain interesting metadata. Having a separate RPC gives the device context to look for this YAML file so it can parse and present this data (i.e. version name, vendor, supported SKUs, etc.)

@jsterne
Copy link

jsterne commented Sep 4, 2025

Some replies inline.

@jsterne Thanks for the review. Some thoughts:

TransceiverFirmwareInstall vs Firmware Update.

  • This was specifically changed to reflect that upgrade is as important as downgrade. Similar to os.proto where the RPC is called Install. Upgrade|update gives a connotation of going to a higher release, but that is not always the case.

Here we're talking about the overall Service name. How about TransceiverFirmwareManagement since there are a number of different functions in this service (including the transfer, the activate, aborting, checking firmware versions, etc)?

Transfer vs download

  • This is another case of aligning to elsewhere in OpenConfig for consistency. I'd probably also loosely argue since this is a streaming RPC, "transfer" is a better fit.

I haven't studied other transfer type RPCs in gRPC before but is this streaming because you intend to send a chunk of data, then get an ack, then send more, get an ack, etc?

Transfer seems OK as a term. So maybe we see if the OIF is happy to go back to that term (they had it before). At least for the host->module transfer.
(I'd like to avoid re-creating yet another server->host transfer RPC when we already have something like that elsewhere for moving files around).

Target vs Host

  • This is similar to os.proto where OpenConfig seems to prefer Target and trying to keep this consistency.

In that case there isn't really ambiguity about the "destination" of the binary file. The target is always the router/NE. But here it may be confusing where a host can be a target and so can a module. Target implies the destination of the transfer (not intuitive for it to be the source of the transfer). i.e. "The source of the transfer is the Target" (ughh).

OC combines transfers of remote->host and host->transceiver into a single primitive. The OIF has "transfer" vs "download".

  • Correct, but it differentiates still via a oneof.

OC doesn't seem to have a separate "activate" primitive. OIF does.

  • This is part of the Install RPC. Could you clarify the separate actions you are referring to?
    The whole description of the Install RPC is fairly different from the OIF "activate" command(s). The OC model has an option to install from device (device vs target vs host?). The OIF approach seems to be to transfer then switch banks. And the OIF model has a way to not necessarily commit the newly downloaded/active bank (that can be done as a separate step if desired, e.g. after the operator checks that things are working).

@jsterne
Copy link

jsterne commented Sep 4, 2025

Reviewed in September 4th, 2025 OC Community meeting to increase visibility of this PR. One comment from a few participants is can we look to not duplicate file transfer RPC's? Maybe existing gNOI file transfer can be used and this PR could add RPCs to perform the transceiver firmware installations.

That was initially discussed, but the reason we went this way was:

  1. We wanted the device to understand that this was firmware package and store in it's own library of firmware versions. We didn't want operators to have to know where to transfer files to.

This concept of a magic hard-coded store (with pre-defined file system organization then I suppose?) isn't really in the OIF approach from what I can tell. Why not just let different devices and users decide where in a filesystem they may want to place these files? Also note that the latest OIF "download" command says "Note: the path must be included and result in a unique file". It seems a shame to reinvent file copy and would be nice for that part to be generic. Then other "firmware" commands can be special and know how to parse those files, transfer them to modules, etc.

  1. OIF is proposing an embedded YAML file within the file that would contain interesting metadata. Having a separate RPC gives the device context to look for this YAML file so it can parse and present this data (i.e. version name, vendor, supported SKUs, etc.)

Sure but that parsing doesn't have to be part of the transfer of a file from a server to the router.

@jsterne
Copy link

jsterne commented Sep 4, 2025

Is there a way to abort a transfer host->module? That would be useful. The OIF defined an 'abort' command (although rather that be called abort-transfer to be specific which activity we're aborting.

@ejbrever
Copy link
Contributor Author

ejbrever commented Sep 7, 2025

Some replies inline.

@jsterne Thanks for the review. Some thoughts:
TransceiverFirmwareInstall vs Firmware Update.

  • This was specifically changed to reflect that upgrade is as important as downgrade. Similar to os.proto where the RPC is called Install. Upgrade|update gives a connotation of going to a higher release, but that is not always the case.

Here we're talking about the overall Service name. How about TransceiverFirmwareManagement since there are a number of different functions in this service (including the transfer, the activate, aborting, checking firmware versions, etc)?

I am good with "TransceiverFirmwareManagement". We've already changed the name a couple times, so I'll give this a few days unless anyone has objections to this new overall name. Although, any thoughts on just "TransceiverFirmware" for simplicity?

Transfer vs download

  • This is another case of aligning to elsewhere in OpenConfig for consistency. I'd probably also loosely argue since this is a streaming RPC, "transfer" is a better fit.

I haven't studied other transfer type RPCs in gRPC before but is this streaming because you intend to send a chunk of data, then get an ack, then send more, get an ack, etc?

Transfer seems OK as a term. So maybe we see if the OIF is happy to go back to that term (they had it before). At least for the host->module transfer. (I'd like to avoid re-creating yet another server->host transfer RPC when we already have something like that elsewhere for moving files around).

Correct in that we'd stream bytes over the wire here. The preference in OpenConfig has been moving towards this and avoiding the many other permutations to transfer (i.e. FTP, SFTP, HTTP, SCP, etc.). AIUI this has been for security reasons (but also potentially performance).

I think it's fine if OIF wants to align terminology, but I really don't think that's a goal of OpenConfig to have identical naming formats with different standards bodies.

Target vs Host

  • This is similar to os.proto where OpenConfig seems to prefer Target and trying to keep this consistency.

In that case there isn't really ambiguity about the "destination" of the binary file. The target is always the router/NE. But here it may be confusing where a host can be a target and so can a module. Target implies the destination of the transfer (not intuitive for it to be the source of the transfer). i.e. "The source of the transfer is the Target" (ughh).

OC combines transfers of remote->host and host->transceiver into a single primitive. The OIF has "transfer" vs "download".

  • Correct, but it differentiates still via a oneof.

OC doesn't seem to have a separate "activate" primitive. OIF does.

  • This is part of the Install RPC. Could you clarify the separate actions you are referring to?
    The whole description of the Install RPC is fairly different from the OIF "activate" command(s). The OC model has an option to install from device (device vs target vs host?). The OIF approach seems to be to transfer then switch banks. And the OIF model has a way to not necessarily commit the newly downloaded/active bank (that can be done as a separate step if desired, e.g. after the operator checks that things are working).

@ejbrever
Copy link
Contributor Author

ejbrever commented Sep 7, 2025

Reviewed in September 4th, 2025 OC Community meeting to increase visibility of this PR. One comment from a few participants is can we look to not duplicate file transfer RPC's? Maybe existing gNOI file transfer can be used and this PR could add RPCs to perform the transceiver firmware installations.

That was initially discussed, but the reason we went this way was:

  1. We wanted the device to understand that this was firmware package and store in it's own library of firmware versions. We didn't want operators to have to know where to transfer files to.

This concept of a magic hard-coded store (with pre-defined file system organization then I suppose?) isn't really in the OIF approach from what I can tell. Why not just let different devices and users decide where in a filesystem they may want to place these files? Also note that the latest OIF "download" command says "Note: the path must be included and result in a unique file". It seems a shame to reinvent file copy and would be nice for that part to be generic. Then other "firmware" commands can be special and know how to parse those files, transfer them to modules, etc.

There are a few reasons to this. As operators we don't want to know or care about the file management structure for every platform or if it changes in different software branches. We just want to provide the firmware package and let the device put it in the right place. Then, not in this first phase here, but we had a lot of discussion on the device having a readable library of available firmware packages. With that, the device could give us an inventory or available firmware packages, their versions, which SKU's they work with, etc. With something like that we can get to where we eventually hope to be; specifically being able to define default firmware packages for transceivers. So when a new transceiver is plugged in, it could automatically be determined what firmware package to install. This would put us in a plug-and-play approach that we'd like to eventually get to. We are not pushing all of this initially, but that's the vision we are hoping to move towards.

  1. OIF is proposing an embedded YAML file within the file that would contain interesting metadata. Having a separate RPC gives the device context to look for this YAML file so it can parse and present this data (i.e. version name, vendor, supported SKUs, etc.)

Sure but that parsing doesn't have to be part of the transfer of a file from a server to the router.

The preference is for this to just to be a single RPC. We'd expect under the hood vendors could re-use shared libraries for transferring files. However, standing up a new gRPC is trivial and I'd argue having this context specific RPC to do some extra context specific things around the file transfer would be valuable from a simplicity point of view for operators.

@jsterne
Copy link

jsterne commented Sep 11, 2025

I am good with "TransceiverFirmwareManagement". We've already changed the name a couple times, so I'll give this a few days unless anyone has objections to this new overall name. Although, any thoughts on just "TransceiverFirmware" for simplicity?

I'd somewhat prefer TransceiverFirmwareManagement. I think that will just match how this would be referred to in other descriptions and in the OIF. I'm also ok with "TransceiverFirmware" but it feels a bit truncated (not sure if other services have similar truncated style names?).

@dplore
Copy link
Member

dplore commented Sep 11, 2025

I am good with "TransceiverFirmwareManagement". We've already changed the name a couple times, so I'll give this a few days unless anyone has objections to this new overall name. Although, any thoughts on just "TransceiverFirmware" for simplicity?

I'd somewhat prefer TransceiverFirmwareManagement. I think that will just match how this would be referred to in other descriptions and in the OIF. I'm also ok with "TransceiverFirmware" but it feels a bit truncated (not sure if other services have similar truncated style names?).

gNOI stle for service naming is migrating to a style with a single noun for the entity being "operated" on. This makes some sense in that gNOI is already telling us we are performing "operations" on something. We just need to know what the thing is we are "operating". So transceiverz or transceiver could be good.

@jsterne
Copy link

jsterne commented Sep 12, 2025

I am good with "TransceiverFirmwareManagement". We've already changed the name a couple times, so I'll give this a few days unless anyone has objections to this new overall name. Although, any thoughts on just "TransceiverFirmware" for simplicity?

I'd somewhat prefer TransceiverFirmwareManagement. I think that will just match how this would be referred to in other descriptions and in the OIF. I'm also ok with "TransceiverFirmware" but it feels a bit truncated (not sure if other services have similar truncated style names?).

gNOI stle for service naming is migrating to a style with a single noun for the entity being "operated" on. This makes some sense in that gNOI is already telling us we are performing "operations" on something. We just need to know what the thing is we are "operating". So transceiverz or transceiver could be good.

I see what you mean. So the "Management" part isn't needed. But I think it would be good to scope this service to firmware. So how about TransceiverFirmware that Eric suggested as a shorter option?

@jsterne
Copy link

jsterne commented Sep 12, 2025

About 'transfer' vs 'download' for moving the file from the router->transceiver: At the moment:
A) gnoi proposal = 'transfer'
B) OIF proposal = 'download'
C) CMIS spec = 'download'

I think the latest thinking is for OIF to change back to using 'transfer'. That's OK with me. It is nice that (A) and (B) use the same name.

// Absolute path to the /components/component list member in the OpenConfig
// model corresponding to the transceiver to perform the install on.
// (e.g., /components/component[name=transceiver-1/2])
types.Path transceiver = 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OIF proposes this as a list of transceivers (TC). Those could be the same TC type, or totally different TCs (a package can contain binaries for multiple different TCs from multiple different vendors). It would be good to support this concept here in the model (even if maybe some implementations might limit to one TC initially). Maybe the service could say that it is mandatory to support 1 TC in the list, and optional to support multiple.

@jsterne
Copy link

jsterne commented Sep 12, 2025

I'm uncertain how to map the Install RPC to underlying CMIS primitives. CMIS has the following:

  1. Download (move binary into the bank of the transceiver (TC))
  2. Run Firmware Image (i.e. switch to the inactive bank where the new firmware was downloaded)
  3. Commit (mark the running bank as committed so this bank is used at any subsequent restart)

The Install RPC seems to have some hints of transfer aspects sprinkled around. e.g.:

  • the entire InstallFromDeviceRequest
  • TransferProgress transfer_progress = 3; as part of the response to Install

The OIF proposal has an 'activate' command that is clearly separated from any transfer/download.

OIF also combines both CMIS Run and Commit (where commit is an optional automatic step of the activate). But I think we should also have a separate Commit primitive in both OIF and OpenConfig. An operator may want to Run the new firmware on a transceiver and verify that everything is working well before doing a Commit.

I'd propose we have an 'Activate' RPC (instead of the Install RPC) that takes the following arguments:

  • list of transceivers
  • an 'action' that is one of these options (that match the actions available in the Run command in CMIS 5.4):
    • Switch to Inactive Bank
    • SwitchHitless to Inactive Bank
    • Restart Active Bank
    • RestartHitless Active Bank
    • Factory Reset to Inactive Bank
    • Factory Reset to Active Bank
  • a no-commit option (otherwise commit the newly active bank by default)

* Switched to "Activate" instead of "Install" for the RPC and wording
thoughout as this better explains what is being done whereas install
might have had some confusion because if is more easily mixed with
transferring the package.
* Clarified the behavior in the top header which didn't describe the
activate well.
* Added top level definition for Device and Transceiver. This also
replaced "Target" with "Device" throughout as Target was getting
confused with the transceiver itself.
@ejbrever
Copy link
Contributor Author

I'm uncertain how to map the Install RPC to underlying CMIS primitives. CMIS has the following:

  1. Download (move binary into the bank of the transceiver (TC))
  2. Run Firmware Image (i.e. switch to the inactive bank where the new firmware was downloaded)
  3. Commit (mark the running bank as committed so this bank is used at any subsequent restart)

The Install RPC seems to have some hints of transfer aspects sprinkled around. e.g.:

  • the entire InstallFromDeviceRequest
  • TransferProgress transfer_progress = 3; as part of the response to Install

The OIF proposal has an 'activate' command that is clearly separated from any transfer/download.

OIF also combines both CMIS Run and Commit (where commit is an optional automatic step of the activate). But I think we should also have a separate Commit primitive in both OIF and OpenConfig. An operator may want to Run the new firmware on a transceiver and verify that everything is working well before doing a Commit.

I'd propose we have an 'Activate' RPC (instead of the Install RPC) that takes the following arguments:

  • list of transceivers

  • an 'action' that is one of these options (that match the actions available in the Run command in CMIS 5.4):

    • Switch to Inactive Bank
    • SwitchHitless to Inactive Bank
    • Restart Active Bank
    • RestartHitless Active Bank
    • Factory Reset to Inactive Bank
    • Factory Reset to Active Bank
  • a no-commit option (otherwise commit the newly active bank by default)

Made a few changes and had a few comments:

  1. I changed "Install" to "Activate" as I agree that seems to have better separation from other verbs.
  2. List of transceivers should already be supported at the layer above as the Activate request takes a repeated list of ActivateSingleRequest.
  3. We've discussed separating commit and want Activate to automatically commit. So this is probably not a primitive OpenConfig would support. Once a version is running on the transceiver that needs to be the committed version. If someone changes their mind that is just another Activation. However, leaving a device is a quasi-state is worrisome because we don't know the behavior and it's possible for the step to be forgotten. For example, if a transceiver's package is running and not-committed, then forgotten about, if the device has a power outage, what version comes back up? Often it can be the previous version which means something like a power outage on a device takes much much longer to recover since firmware upgrades are now needed.
  4. We originally called out Switchovers, but it was determined that it was redundant to an Activation, so we took it out for now.
  5. I'd be interested to learn about the factory reset options. Could the transceiver hold its factory image in a permanent place (not one of its banks)?

@ejbrever
Copy link
Contributor Author

I am good with "TransceiverFirmwareManagement". We've already changed the name a couple times, so I'll give this a few days unless anyone has objections to this new overall name. Although, any thoughts on just "TransceiverFirmware" for simplicity?

I'd somewhat prefer TransceiverFirmwareManagement. I think that will just match how this would be referred to in other descriptions and in the OIF. I'm also ok with "TransceiverFirmware" but it feels a bit truncated (not sure if other services have similar truncated style names?).

gNOI stle for service naming is migrating to a style with a single noun for the entity being "operated" on. This makes some sense in that gNOI is already telling us we are performing "operations" on something. We just need to know what the thing is we are "operating". So transceiverz or transceiver could be good.

I see what you mean. So the "Management" part isn't needed. But I think it would be good to scope this service to firmware. So how about TransceiverFirmware that Eric suggested as a shorter option?

@dplore curious on your thoughts of using TransceiverFirmware? Please let us know.

@dplore
Copy link
Member

dplore commented Sep 18, 2025

I am good with "TransceiverFirmwareManagement". We've already changed the name a couple times, so I'll give this a few days unless anyone has objections to this new overall name. Although, any thoughts on just "TransceiverFirmware" for simplicity?

I'd somewhat prefer TransceiverFirmwareManagement. I think that will just match how this would be referred to in other descriptions and in the OIF. I'm also ok with "TransceiverFirmware" but it feels a bit truncated (not sure if other services have similar truncated style names?).

gNOI stle for service naming is migrating to a style with a single noun for the entity being "operated" on. This makes some sense in that gNOI is already telling us we are performing "operations" on something. We just need to know what the thing is we are "operating". So transceiverz or transceiver could be good.

I see what you mean. So the "Management" part isn't needed. But I think it would be good to scope this service to firmware. So how about TransceiverFirmware that Eric suggested as a shorter option?

@dplore curious on your thoughts of using TransceiverFirmware? Please let us know.

I do not object to that naming. Here's a question for you that might help finalize this name: Are there other transceiver operations that are on the horizon? Any RPC to force a transceiver restart, factory reset, etc? If yes, the structure could be:

gnoi.Transceiver.FirmwareTransfer, FirmwareActivate, FirmwareVerify, Restart, FactoryReset, etc

If that seems very unlikely or there is a reason not to have so many RPC's in this Service, then TransceiverFirmware as the service name seems good enough.

@jsterne
Copy link

jsterne commented Oct 2, 2025

I see what you mean. So the "Management" part isn't needed. But I think it would be good to scope this service to firmware. So how about TransceiverFirmware that Eric suggested as a shorter option?

@dplore curious on your thoughts of using TransceiverFirmware? Please let us know.

I do not object to that naming. Here's a question for you that might help finalize this name: Are there other transceiver operations that are on the horizon? Any RPC to force a transceiver restart, factory reset, etc? If yes, the structure could be:

gnoi.Transceiver.FirmwareTransfer, FirmwareActivate, FirmwareVerify, Restart, FactoryReset, etc

If that seems very unlikely or there is a reason not to have so many RPC's in this Service, then TransceiverFirmware as the service name seems good enough.

I can imagine other non-firmware operations in the future. If we want that all to be part of a single service (and we probably would) because otherwise what's the name of the other transceiver service that doesn't include firmware?), then perhaps Daren's proposal here is good.

}

message ActivateRequest {
repeated ActivateSingleRequest activate_requests = 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd propose we remove this construction of a repeated ActivateSingleRequest inside the ActivateRequest.

What we should have instead IMO is that the activate supports multiple transceivers. i.e. Do the same action X (e.g. switch to inactive bank) on a list of transceivers. That is what is in the latest OIF proposal.

types.Path transceiver = 1;

oneof activate_from {
ActivateFromDeviceRequest activate_from_device = 2;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove ActivateFromDeviceRequest. The Activate command should not include any 'transfer' activity.

@jsterne
Copy link

jsterne commented Oct 2, 2025

3. We've discussed separating commit and want Activate to automatically commit. So this is probably not a primitive OpenConfig would support. Once a version is running on the transceiver that needs to be the committed version. If someone changes their mind that is just another Activation. However, leaving a device is a quasi-state is worrisome because we don't know the behavior and it's possible for the step to be forgotten. For example, if a transceiver's package is running and not-committed, then forgotten about, if the device has a power outage, what version comes back up? Often it can be the previous version which means something like a power outage on a device takes much much longer to recover since firmware upgrades are now needed.

I think that's fine to have Activate automatically do a commit by default. But we should have the option to no-commit (as per the OIF proposal). Some operators will want to Activate (i.e. switch the inactive bank that was just updated using the transfer command), but then do some external testing before they use the CMIS "Commit Image" command to mark that new image as the one to use at next reboot. If they are in the middle of testing and there is a power outage, then they would indeed want to reboot using the previous/old image (that was still the committed image/bank). The inactive bank would still contain the new image and they could Activate it again and do their validation. Once that is done, they would issue a "Commit Image" command (which we should add to this service).

@jsterne
Copy link

jsterne commented Oct 2, 2025

  • We originally called out Switchovers, but it was determined that it was redundant to an Activation, so we took it out for now.

The CMIS command for "activation" is called "Run" and it has 6 options (the last 2 will be new in CMIS 5.4):

  • Switch to Inactive Bank
  • SwitchHitless to Inactive Bank
  • Restart Active Bank
  • RestartHitless Active Bank
  • Factory Reset to Inactive Bank
  • Factory Reset to Active Bank

We need to decide if our new service has:

  1. an RPC with the same scope as the CMIS "Run" command (i.e. could eventually support all 6 of those options), or
  2. an RPC with only a subset of the CMIS Run command (i.e. only the first 2 items).

This is related to the earlier discussion here: #293 (comment)

If we want to eventually have RPCs that cover all the CMIS "Run" options do we want that all in a single RPC with the 6 options, or do we want a set of RPCs, each of which does a subset of the CMIS "Run" command, e.g.:
RPC SwitchToInactiveBank (with some parameter about the Hitless vs non-hitless, and a no-commit option)
RPC RestartActiveBank
RPC FactoryReset

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants