Hybrid Swarm/Pubsub real-time streaming viability #1

@nugaon

Research: Root chunk retrieval timing and streaming viability in PubSub protocol

Summary

We need measurements of the timing and reliability of chunk retrieval when streaming data via the PubSub protocol. Specifically, we need to investigate three questions:

  1. Pre-stored root chunk scenario: When the root chunk is already cached in the pivot node, how long does it take to fetch its data chunks that are already settled on the network?
  2. Concurrent upload scenario: When one node uploads a file and another node receives the root chunk immediately and starts fetching, can it hit missing chunks because the upload is incomplete? What is the overall delay between upload and retrieval?
  3. Real-time viability: Is real-time video streaming possible if data chunks are fetched from Swarm but the root chunk comes through a different channel (e.g., PubSub direct message)?

These questions are critical for determining the feasibility of real-time streaming applications (e.g., video calls) over the Swarm PubSub protocol. The data segments should be tested with MEDIUM redundancy encoding.

Expected behavior

For real-time streaming via PubSub to work effectively:

  • Data chunks should be retrievable from the Swarm as soon as they are uploaded
  • There should be predictable, minimal latency between root chunk availability and full data chunk retrieval
  • A subscriber receiving a root chunk should be able to fetch all referenced data chunks without encountering "chunk not found" errors caused by an upload that is still in progress

Steps to reproduce / Research methodology

Scenario 1: Pre-stored content retrieval timing

  1. Upload a two-level BMT file (128 × 4 KB of data, plus redundancy) to Swarm using the /bytes endpoint
  2. Wait for full replication (verify via /pins/{address} or similar)
  3. From a separate Bee node, retrieve only the root chunk
  4. Measure time to fetch all data chunks from the swarm
  5. Record: total retrieval time, chunk-by-chunk latency, any failures
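
The measurement in steps 3 to 5 could be sketched roughly as below. This is only an illustration under assumptions, not the agreed test tool: `measure_retrieval`, `RetrievalReport` and `http_chunk_fetcher` are hypothetical names, the chunk addresses are assumed to have been parsed out of the root chunk already, and the fetcher assumes the retrieving Bee node serves `GET /chunks/{address}` on its API port. Chunks are fetched one at a time here, so the total is an upper bound compared with a parallel fetch.

```python
import time
import urllib.request
from dataclasses import dataclass, field

CHUNK_SIZE = 4096    # bytes of payload per Swarm chunk
DATA_CHUNKS = 128    # data chunks referenced by the single root chunk
FILE_SIZE = CHUNK_SIZE * DATA_CHUNKS   # size of the test file in bytes


@dataclass
class RetrievalReport:
    latencies: dict = field(default_factory=dict)  # address -> seconds
    failures: dict = field(default_factory=dict)   # address -> error text
    total_seconds: float = 0.0


def http_chunk_fetcher(base_url):
    """Build a fetcher for a Bee node API (assumed: GET {base_url}/chunks/{address})."""
    def fetch(address):
        url = f"{base_url}/chunks/{address}"
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read()
    return fetch


def measure_retrieval(addresses, fetch_chunk, clock=time.perf_counter):
    """Fetch every address sequentially, recording per-chunk latency and failures."""
    report = RetrievalReport()
    start = clock()
    for address in addresses:
        t0 = clock()
        try:
            fetch_chunk(address)
        except Exception as exc:  # any error counts as a failed retrieval
            report.failures[address] = repr(exc)
        else:
            report.latencies[address] = clock() - t0
    report.total_seconds = clock() - start
    return report
```

A run against a real node would look like `measure_retrieval(addresses, http_chunk_fetcher("http://localhost:1633"))`, where the URL is a placeholder for the second Bee node's API address.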

Scenario 2: Concurrent upload/subscribe

  1. Node A: Start uploading a multi-chunk file (the same file as in Scenario 1) to /bytes (do not wait for completion)
  2. Node B: Subscribe to Node A's updates via PubSub WebSocket, or obtain the root chunk immediately through any other channel that does not go through Swarm
  3. Node B: As soon as the root chunk is cached, start fetching the data chunks from Swarm
  4. Record:
    • How many chunks return "not found" initially
    • Retry count until all chunks are available
    • Total time from receipt of the root chunk to complete data assembly
    • Whether missing chunks correlate with upload progress
    • End-to-end latency from sender to playable stream on receiver
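
The first three items to record in step 4 could be collected with a retry loop along these lines. This is a minimal sketch under assumptions: `ChunkNotFound` and `fetch_with_retries` are hypothetical names, the fetcher is assumed to raise `ChunkNotFound` when the network cannot yet serve a chunk, and the round count and delay are arbitrary placeholders rather than recommended values.

```python
import time


class ChunkNotFound(Exception):
    """Hypothetical error raised by the fetcher for a 'not found' response."""


def fetch_with_retries(addresses, fetch_chunk, max_rounds=20,
                       delay_seconds=0.05,
                       clock=time.perf_counter, sleep=time.sleep):
    """Retry missing chunks in rounds and report what Scenario 2 asks for."""
    pending = list(addresses)
    misses = {address: 0 for address in pending}  # failed attempts per chunk
    initial_not_found = 0
    start = clock()
    for round_no in range(max_rounds):
        still_missing = []
        for address in pending:
            try:
                fetch_chunk(address)
            except ChunkNotFound:
                misses[address] += 1
                still_missing.append(address)
        if round_no == 0:
            initial_not_found = len(still_missing)
        pending = still_missing
        if not pending:
            break
        if round_no < max_rounds - 1:
            sleep(delay_seconds)  # give the uploader time to push more chunks
    return {
        "initial_not_found": initial_not_found,
        "misses": misses,
        "unresolved": pending,  # chunks still missing after max_rounds
        "seconds": clock() - start,
    }
```

Logging the uploader's progress next to each round would be needed to check whether misses correlate with upload progress; that part is left out of the sketch.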

Test with varying chunk sizes and network conditions

Output

  • Documentation of expected latency bounds for real-time applications
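
One possible way to turn the recorded samples into documented latency bounds is a simple percentile summary. The sketch below is an assumption about how the output could be reported, not a required format: `latency_bounds` is a hypothetical helper and it uses the nearest-rank percentile definition.

```python
import statistics


def latency_bounds(samples_seconds):
    """Summarise latency samples as min / p50 / p95 / max / mean."""
    ordered = sorted(samples_seconds)
    if not ordered:
        raise ValueError("at least one latency sample is required")

    def percentile(p):
        # Nearest-rank percentile: smallest value with at least p% of samples at or below it.
        rank = max(1, (p * len(ordered) + 99) // 100)
        return ordered[rank - 1]

    return {
        "count": len(ordered),
        "min": ordered[0],
        "p50": percentile(50),
        "p95": percentile(95),
        "max": ordered[-1],
        "mean": statistics.fmean(ordered),
    }
```

Reporting p95 and max alongside the median matters for real-time use, since a stream stalls on its slowest chunk rather than on the typical one.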
