-
Notifications
You must be signed in to change notification settings - Fork 5
Serialization to seed & subsequent re-serialization to shares breaks shamir recover result #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Yeah, it's to do with padding, but the reason seems to be that this library is badly broken/confused. In codex32 the threshold/id/index At least, that's my best interpretation of what's going on. The data structures in this library are a real mess and combine strings, I apologize for the state of this library -- for a long time I have intended to replace all the error-correction logic with rust-bech32 0.12, which will have codex32 support. (It will not have interpolation logic, but that's easy/small and I will continue to implement it here.) However, I let rust-bech32 0.12 get scope-creeped into doing error correction, which is still not done. There is a tracking issue here rust-bitcoin/rust-bech32#189. Maybe I should just cut a release so that I can fix this library. |
|
Oh, I'm not actually blocked on rust-bech32 0.12. Indeed, the docs for 0.11 have codex32 support as an example. |
|
The Without these it won't round trip and worse recovers a different seed. Using default 0 padding on shares has some other problems:
For the BIP85 codex32 application (and that compact QR idea) I set the padding bits by CRC of the data to avoid this. That way they depend on 128-bits of unknown data and can't be assumed and are deterministic. A fast fix would be |
|
If it's only possible to construct |
This comment has been minimized.
This comment has been minimized.
smart, and does the trick! (I tried with your
I agree this is much better than passing padding around |
Thanks, I just rewrote it to save 100 lines and put
To standardize CRC padding we should:
My library passes padding around, it tests the alternate encodings by encoding every def test_from_seed_and_alternates():
"""Test Vector 4: encode secret share from seed"""
seed = bytes.fromhex(VECTOR_4["secret_hex"])
for pad_val in range(0b1111 + 1):
s = Codex32String.from_seed(seed, header="ms10leet", pad_val=pad_val)
assert str(s) == VECTOR_4["secret_s_alternates"][pad_val]
assert s.data == seed
# confirm all 16 encodings decode to same master dataGiven we leak a secret character if initial |
|
@BenWestgate if you have a rewrite feel free to open a PR -- if it's not too hard to review I'm happy to take it in. (Though "I saved 100 lines" makes me worry that it's a big diff.) But I think the correct direction to rewrite in is one where we add a rust-bech32 dependency, use that for all the checksumming and encoding stuff, and then here we add (a) utility methods, and (b) constructors and accessors for the id/threshold/share index. |
This comment was marked as outdated.
This comment was marked as outdated.
What about extracting bytes data from non-"s" strings? Raise InvalidShareIndex error or output useless bytes without their padding? The PR author is able to achieve an unexpected (but inevitable) result that he can't recover the original secret when derived shares are reconstructed from bytes extracted from the original derived shares. Should our libraries:
|
|
maybe I misunderstood the scope of this application - @apoelstra can you check this last comment of mine BenWestgate/python-codex32#2 (comment) |
@BenWestgate I'm definitely in this category. If round-trips (even for derived shares) can be achieved, it should be desired property for Codex32 |
The problem you encounter is you can't construct 130-bits of data from 16 bytes for all 31 share indices because any math you do to pad, even constants like all 0s or all 1s, only works for k initial (aka encoded) strings. The derived (aka interpolated) strings break that padding because it was not a 5-bit value and so is not preserved by GF(32) interpolation the way the u5 checksum or header is. |
|
I did some research and if pad bits are discarded from shares, there is no way to always recover the secret's last character without brute force. It makes no difference how the initial shares' padding is chosen. For an exact recovery, @scgbckbone should store the pad bits with his share index metadata. But since these bits can reveal up to 5-bits of the secret they should be handled like payload bytes. Since BIP-93 only defines how to encode master seeds from bytes and decode unshared secrets to bytes but not shares or strings (really, "shared strings", aka non-"0" threshold parameter strings), I suggest we define a byte decoding with 2 extra bytes for a "minimal header" and padding. String DecodingWhen the threshold parameter of a valid codex32 string is not the digit "0", we call the string a codex32 shared string. If a codex32 shared string is decoded into bytes, it MUST be decoded as follows:
Rationale
|
|
This all sounds right to me. Though I think we should discourage "decoding a share to bytes" at all. The bech32-encoding share has all the information of the share, including padding and metadata etc., and is already in a standard fixed-length ASCII encoding. I suppose users could strip the checksum and |
|
Agreed to discourage. He wants to use the share data for decoy wallets. To my surprise he said this +2 bytes decoding works for his decoy seeds. It's certainly less conspicuous than u5 or ASCII. The max stealth decoding stores no meta-data and brute forces everything against a fingerprint or address. |
|
words of "discouragement" are... kind of bummer. As I stated in related PR, I want to stay "in" standard way of doing it. guess I just drop my decoy idea and treat non-s shares as pure useless strings |
|
Do NOT despair @scgbckbone!
These are merely "RFC words" of NOT RECOMMENDED. I actually encourage you to implement these useful decoys because you understand the full implications and we have carefully weighed your use case.
I agreed to "NOT RECOMMEND" decoding codex32 shares to bytes because the decoding I proposed and need is lossy so they won't round trip back to shares, even if they do contain enough data to recover the secret data. I wrote something like
into my BIP93 length helper PR so we have an "in" standard way to do it. But hopefully other reviewers critique it to make sure this is the very best approach. For example they may decide we MUST decode the entire identifier for round trip. Or that we should not drop the "0" threshold. Or a different byte alignment. Whatever gets consensus, we can build upon for your decoy idea and my "compact CodexQR" both of which need a tiny byte decoding of shares able to recover the secret. Perhaps we should join forces and make compact CodexQRs decoys by default since you also want a QR encoding and I want decoy shares? |
I just noticed that if I import string share(s) to my app and serialize it for storage where I'm storing just seed (aka
data) and metadata (hrp, idx, threshold, id) separately. Next when I try to load storage data to Share object (viafrom_seed) and try to recover secret shares, shares before serialization provide different result compared to shares loaded from seed.My guess is that this has something to do with padding. Any idea how to fix this issue ?
I added test case proving the point: