Research: Root chunk retrieval timing and streaming viability in PubSub protocol
Summary
Measurement is needed to understand the timing and reliability of chunk retrieval when streaming data via the PubSub protocol. Specifically, we need to investigate two scenarios and one overall question:
- Pre-stored root chunk scenario: When the root chunk is already cached in the pivot node, how long does it take to fetch its data chunks that are already settled on the network?
- Concurrent upload scenario: When one node uploads a file and another node receives the root chunk immediately and starts fetching, can it encounter missing chunks because the upload is incomplete? What is the overall delay between upload and retrieval?
- Real-time viability: Is real-time video streaming possible if data chunks are fetched from Swarm but the root chunk comes through a different channel (e.g., PubSub direct message)?
These questions are critical for determining the feasibility of real-time streaming applications (e.g., video calls) over the Swarm PubSub protocol. The data segments should be checked with MEDIUM redundancy encoding.
Expected behavior
For real-time streaming via PubSub to work effectively:
- Data chunks should be retrievable from the Swarm as soon as they are uploaded
- There should be predictable, minimal latency between root chunk availability and full data chunk retrieval
- A subscriber receiving a root chunk should be able to fetch all referenced data chunks without encountering "chunk not found" errors caused by an upload that is still in progress
Steps to reproduce / Research methodology
Scenario 1: Pre-stored content retrieval timing
- Upload a 2-level BMT file (128 * 4 KB data chunks, plus redundancy) to Swarm using the /bytes endpoint
- Wait for full replication (verify via /pins/{address} or similar)
- From a separate Bee node, retrieve only the root chunk
- Measure time to fetch all data chunks from the swarm
- Record: total retrieval time, chunk-by-chunk latency, any failures
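The recording step above could be sketched as a small timing harness. This is only an illustration under assumptions: `time_chunk_fetches` and its `fetch_chunk` parameter are hypothetical names, and `fetch_chunk` stands in for whatever call actually retrieves one chunk from the Bee node (for example a thin wrapper around an HTTP request). The chunk addresses are assumed to have been read from the root chunk beforehand.

```python
import time
from typing import Callable, Dict, List


def time_chunk_fetches(
    addresses: List[str],
    fetch_chunk: Callable[[str], bytes],  # hypothetical: retrieves one chunk by address
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, object]:
    """Fetch each chunk once and record how long every fetch took."""
    latencies: Dict[str, float] = {}
    failures: Dict[str, str] = {}
    started = clock()
    for address in addresses:
        t0 = clock()
        try:
            fetch_chunk(address)
        except Exception as exc:  # any fetch error is recorded as a failure
            failures[address] = repr(exc)
            continue
        latencies[address] = clock() - t0
    return {
        "total_seconds": clock() - started,  # total retrieval time
        "latencies": latencies,              # chunk-by-chunk latency
        "failures": failures,                # address -> error description
    }
```

The fetches run sequentially here to keep per-chunk numbers easy to interpret; a real client would likely fetch in parallel, which changes the total time but not the per-chunk measurements.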
Scenario 2: Concurrent upload/subscribe
- Node A: Start uploading a multi-chunk file (the same file as in Scenario 1) to /bytes (do not wait for completion)
- Node B: Obtain the root chunk immediately, either by subscribing to Node A's updates via the PubSub WebSocket or through any other channel that does not go through Swarm
- Node B: Immediately upon caching the root chunk, start fetching data chunks from Swarm
- Record:
  - How many chunks return "not found" initially
  - Retry count until all chunks are available
  - Total time from root chunk receive to complete data assembly
  - Whether missing chunks correlate with upload progress
- Measure end-to-end latency from sender to playable stream on receiver
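The metrics listed above (initial "not found" count, retry count, total time) could be collected with a retry loop along these lines. This is a minimal sketch, not a definitive client: `ChunkNotFound`, `fetch_with_retries`, `fetch_chunk`, `max_rounds` and `delay_seconds` are all hypothetical names and defaults chosen for illustration, and the injected `fetch_chunk` is assumed to raise `ChunkNotFound` when the node cannot yet retrieve a chunk.

```python
import time
from typing import Callable, Dict, List


class ChunkNotFound(Exception):
    """Hypothetical error raised by the injected fetcher for a missing chunk."""


def fetch_with_retries(
    addresses: List[str],
    fetch_chunk: Callable[[str], bytes],  # hypothetical: retrieves one chunk by address
    max_rounds: int = 10,                 # assumed retry budget
    delay_seconds: float = 0.1,           # assumed pause between rounds
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, object]:
    """Retry missing chunks in rounds and report the Scenario 2 metrics."""
    started = clock()
    pending = list(addresses)
    chunks: Dict[str, bytes] = {}
    initially_missing = 0
    rounds = 0
    while pending and rounds < max_rounds:
        rounds += 1
        still_missing = []
        for address in pending:
            try:
                chunks[address] = fetch_chunk(address)
            except ChunkNotFound:
                still_missing.append(address)
        if rounds == 1:
            initially_missing = len(still_missing)
        pending = still_missing
        if pending and rounds < max_rounds:
            sleep(delay_seconds)  # give the uploader time to push more chunks
    return {
        "initially_missing": initially_missing,
        "retry_rounds": max(rounds - 1, 0),
        "unresolved": pending,  # chunks still missing when the budget ran out
        "total_seconds": clock() - started,
        "chunks": chunks,
    }
```

Logging the uploader's progress alongside each round would be one way to check whether missing chunks correlate with upload progress.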
Test with varying chunk sizes and network conditions
Output
- Documentation of expected latency bounds for real-time applications
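One possible way to turn raw measurements into documented latency bounds is to report percentiles rather than averages. The sketch below assumes the latencies were collected in milliseconds; `latency_bounds` and the choice of p50/p95/p99 are illustrative assumptions, not something prescribed by this issue.

```python
import statistics
from typing import Dict, List


def latency_bounds(samples_ms: List[float]) -> Dict[str, float]:
    """Summarise latency samples as median, tail percentiles and maximum."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # 99 cut points: cuts[k - 1] is the k-th percentile of the samples.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(samples_ms),
    }
```

For a real-time use case such as a video call, the tail values (p95, p99, max) are likely more informative than the median, since a single late chunk can stall playback.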