Skip to content
This repository was archived by the owner on Mar 28, 2023. It is now read-only.
This repository was archived by the owner on Mar 28, 2023. It is now read-only.

Inbound Message Scanner retries failures indefinitely #1900

@placer14

Description

@placer14
Member

After analyzing the error logs being returned from TestPerformTaskInboundMessageScanner in core/inbound_message_scanner_test.go I found the PerformTask function to never omit records which produce errors during processing, only marking them as resolved if completing the task sucessfully.

This problem also indicates that our unit test does not provide any error output for PerformTask for feedback about worker failures, which was occurring in this unit test due and being missed.

I recommend the following improvements:

  • PerformTask should return errors and make error handling a function of the caller
  • The unit test should provide a fake logger which pipes logger calls into the test (Ex: log.Error will trigger a t.Error call with the error content or log.Info will similarly trigger t.Log. Appropriate log levels triggering appropriate test reporting.)
  • Fix the InboundMessageScanner.PerformTask logic to handle errors so they resolve to a final state instead of skipping over the record for them to repeat the same path on next Perform.

Activity

hoffmabc

hoffmabc commented on Dec 6, 2019

@hoffmabc
Member

What is the direct effect of this problem?

placer14

placer14 commented on Dec 6, 2019

@placer14
MemberAuthor

Worst case: Any errored message which gets processed by this could potentially be processed more than once depending on where the message processing fails. Not all of our message handling is designed to be idempotent so the results would be undefined.

Currently, the only messages which persist with an error are pb.Message_ORDER_COMPLETION messages, but there's is nothing limiting it to just this type.

Best case: We spend CPU cycles we don't need to... which will accumulate over time.

hoffmabc

hoffmabc commented on Dec 6, 2019

@hoffmabc
Member

What would be an errored messaged in this case? How could a message be errored?

placer14

placer14 commented on Dec 6, 2019

@placer14
MemberAuthor

Currently, the only messages which persist with an error are pb.Message_ORDER_COMPLETION messages, but there's is nothing limiting it to just this type.

Errored message means an OB message which is persisted with an error. This is the only case of it so far:

err0 := service.node.Datastore.Messages().Put(
fmt.Sprintf("%s-%d", rc.BuyerOrderCompletion.OrderId, int(pb.Message_ORDER_COMPLETION)),
rc.BuyerOrderCompletion.OrderId, pb.Message_ORDER_COMPLETION, p.Pretty(), repo.Message{Msg: *pmes},
err.Error(), time.Now().UnixNano(), []byte(p))

rmisio

rmisio commented on Dec 6, 2019

@rmisio
Collaborator

Not all of our message handling is designed to be idempotent

@cpacia note for v3 - IMHO every single message without exception should be idempotent. With the fickle nature of this type of networking it's simply not possible for clients to always reliably know whether messages were received and IMHO it's just easier for clients, unless an ACK was clearly received, to be able to just re-send messages it even mildly suspect may have not been received and do so without fear of undesired results.

IMHO, it should just be an inherent part of the protocol that all messaging (offline or online) now and in the future is idempotent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @hoffmabc@placer14@rmisio

        Issue actions

          Inbound Message Scanner retries failures indefinitely · Issue #1900 · OpenBazaar/openbazaar-go