@scgbckbone

@scgbckbone scgbckbone commented Dec 5, 2025

I just noticed a problem when I import string share(s) into my app and serialize them for storage, where I store just the seed (aka data) and the metadata (hrp, idx, threshold, id) separately. When I later load the stored data into a Share object (via from_seed) and try to recover the secret share s, the shares from before serialization give a different result than the shares loaded from seed.

My guess is that this has something to do with padding. Any idea how to fix this issue?

I added test case proving the point:

---- tests::my_vector stdout ----
thread 'tests::my_vector' panicked at src/lib.rs:501:9:
assertion `left == right` failed
  left: "10cbc41852b76438e5781f2cefb49799"
 right: "10cbc41852b76438e5781f2cefb4979f"

@apoelstra
Owner

apoelstra commented Dec 5, 2025

Yeah, it's to do with padding, but the reason seems to be that this library is badly broken/confused.

In codex32 the threshold/id/index 3k00la are all part of the codex32-encoded data along with the "actual" data. The first six characters are 30 bits, meaning that the share data, when encoded, should be right-shifted by 6 bits. Instead, we treat the data as its own bytestring which we convert to/from codex32 without shifting, effectively injecting 2 "padding" bits that aren't recognized as padding, aren't zeroed out, and eventually wind up at the end of the string.

At least, that's my best interpretation of what's going on. The data structures in this library are a real mess and combine strings, Fe vectors and byte vectors in arbitrary/lazy ways.

I apologize for the state of this library -- for a long time I have intended to replace all the error-correction logic with rust-bech32 0.12, which will have codex32 support. (It will not have interpolation logic, but that's easy/small and I will continue to implement it here.)

However, I let rust-bech32 0.12 get scope-creeped into doing error correction, which is still not done. The tracking issue is rust-bitcoin/rust-bech32#189. Maybe I should just cut a release so that I can fix this library.

@apoelstra
Owner

Oh, I'm not actually blocked on rust-bech32 0.12. Indeed, the docs for 0.11 have codex32 support as an example.

@BenWestgate

The .from_seed() factory needs a padding parameter and .to_seed() or codex32_decode(bech) needs to return the encoding's padding.

Without these it won't round trip and, worse, it recovers a different seed.
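
A minimal Python sketch of that round trip, under stated assumptions: `bytes_to_u5` and `u5_to_bytes` are hypothetical helpers (not the API of this library or of python-codex32) that pack bytes into 5-bit symbols most significant bit first and hand back the leftover pad bits when unpacking.

```python
def bytes_to_u5(data: bytes, pad_val: int = 0) -> list[int]:
    """Pack bytes into 5-bit symbols, filling the last symbol with pad_val."""
    nbits = len(data) * 8
    pad_len = -nbits % 5                      # bits needed to reach a multiple of 5
    if not 0 <= pad_val < (1 << pad_len):
        raise ValueError("pad_val does not fit in the padding bits")
    value = (int.from_bytes(data, "big") << pad_len) | pad_val
    nsym = (nbits + pad_len) // 5
    return [(value >> (5 * (nsym - 1 - i))) & 31 for i in range(nsym)]


def u5_to_bytes(symbols: list[int]) -> tuple[bytes, int]:
    """Unpack 5-bit symbols into (bytes, pad_val) instead of dropping the padding."""
    nbits = len(symbols) * 5
    pad_len = nbits % 8                       # trailing bits that do not fill a byte
    value = 0
    for s in symbols:
        value = (value << 5) | s
    pad_val = value & ((1 << pad_len) - 1)
    return (value >> pad_len).to_bytes(nbits // 8, "big"), pad_val


seed = bytes(range(16))                       # placeholder 128-bit seed
for pad_val in range(4):                      # 128-bit seeds leave 2 pad bits
    symbols = bytes_to_u5(seed, pad_val)
    assert u5_to_bytes(symbols) == (seed, pad_val)
```

All four encodings decode to the same 16 bytes but differ in the last symbol, so a decoder that drops the pad bits cannot rebuild the original string.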

Using default 0 padding on shares has some other problems:

  • Can't disambiguate Bech32 vs Codex32 checksum formats during error correction.
  • If an attacker with fewer than k derived shares knows that the initial k shares used 0 padding, the last character of the seed is leaked early. The attacker learns 5 bits per share, but the random data is only (5-pad_len) * k. So for 128-bit seeds, two k=3 shares known to use zero padding reveal the last 3 bits of the secret.
  • Missed chance to detect bit errors, e.g.: 4-bit burst errors on 256-bit seed data. (95% chance of detecting a symbol error)

For the BIP85 codex32 application (and that compact QR idea) I set the padding bits by CRC of the data to avoid this.

That way they depend on 128-bits of unknown data and can't be assumed and are deterministic.
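
As an illustration only, deterministic CRC padding could look like the sketch below. The polynomial (x^2 + x + 1), the zero initial value, and covering only the seed bytes are assumptions made for this example; the thread does not fix any of them, and `crc_bits` is a hypothetical helper rather than a function from either library.

```python
def crc_bits(data: bytes, width: int, poly: int) -> int:
    """Bitwise MSB-first CRC; `poly` omits the leading x^width term."""
    mask = (1 << width) - 1
    reg = 0                                   # assumed initial value: all zeros
    for byte in data:
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            top = (reg >> (width - 1)) & 1
            reg = (reg << 1) & mask
            if top ^ bit:
                reg ^= poly
    return reg


seed = bytes(range(16))                       # placeholder 128-bit seed
pad_val = crc_bits(seed, width=2, poly=0b11)  # 2 pad bits for a 128-bit seed
assert 0 <= pad_val < 4
```

The pad value is then a function of the whole seed rather than a known constant, which is the property described above.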

A fast fix would be for .from_seed to construct only index "s" strings, but that loses useful functionality compared with giving it a secure deterministic default.

@apoelstra
Owner

If it's only possible to construct S strings then users might as well just use rust-bech32 :). The point of this library is that it can also do interpolation.

@scgbckbone
Author

For the BIP85 codex32 application (and that compact QR idea) I set the padding bits by CRC of the data to avoid this. That way they depend on 128-bits of unknown data and can't be assumed and are deterministic.

smart, and does the trick! (I tried with your python-codex32)

giving it a secure deterministic default.

I agree this is much better than passing padding around

@BenWestgate

BenWestgate commented Dec 7, 2025

smart, and does the trick! (I tried with your python-codex32)

Thanks, I just rewrote it to save 100 lines and put polymod in an Encoding class for any checksum: Bech32, Bech32m, long Codex32, Codex32 and CRC. I have to re-add this CRC padding feature and will then publish.

giving it a secure deterministic default.

To standardize CRC padding we should:

  • Decide whether to cover the expanded hrp and all data, or only the payload bits.
  • Exhaustively test candidate polynomials and CONST (xor_out) values to maximize error detection of codex32 strings with 9 errors.
    • Probably 1-bit errors, as CHARSET is chosen so these are most likely and all CRCs are pushed well past their max length for Hamming distance 3 (guaranteed detection of 2 errors and correction of 1).
    • Tie break CRC-2 on 128-bit "ms" secrets, CRC-4 on 256-bit "ms" secrets and CRC-3 on 512-bit "ms" secrets.
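
The CRC widths in the last bullet match the number of pad bits left over for each seed size. A quick check, using a hypothetical helper `payload_shape` written for this comment:

```python
def payload_shape(seed_bits: int) -> tuple[int, int]:
    """Return (payload characters, pad bits) for a seed of `seed_bits` bits."""
    chars = -(-seed_bits // 5)                # ceiling division: 5 bits per character
    return chars, chars * 5 - seed_bits


for bits in (128, 256, 512):
    print(bits, payload_shape(bits))
# 128 (26, 2)
# 256 (52, 4)
# 512 (103, 3)
```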

I agree this is much better than passing padding around

My library passes padding around. It tests the alternate encodings by encoding every pad_val, while this library only checks that they decode to the same bytes.

def test_from_seed_and_alternates():
    """Test Vector 4: encode secret share from seed"""
    seed = bytes.fromhex(VECTOR_4["secret_hex"])
    for pad_val in range(0b1111 + 1):
        s = Codex32String.from_seed(seed, header="ms10leet", pad_val=pad_val)
        assert str(s) == VECTOR_4["secret_s_alternates"][pad_val]
        assert s.data == seed
        # confirm all 16 encodings decode to same master data

Since always zero-padding the initial from_seed shares leaks a secret character, doing so is a library bug.

@apoelstra
Owner

@BenWestgate if you have a rewrite feel free to open a PR -- if it's not too hard to review I'm happy to take it in. (Though "I saved 100 lines" makes me worry that it's a big diff.)

But I think the correct direction to rewrite in is one where we add a rust-bech32 dependency, use that for all the checksumming and encoding stuff, and then here we add (a) utility methods, and (b) constructors and accessors for the id/threshold/share index.

@BenWestgate

If only S strings then users might as well just use rust-bech32 :). The point of this library is that it can also do interpolation.

What about extracting bytes data from non-"s" strings? Raise InvalidShareIndex error or output useless bytes without their padding?

The PR author is able to achieve an unexpected (but inevitable) result that he can't recover the original secret when derived shares are reconstructed from bytes extracted from the original derived shares.

BenWestgate/python-codex32#2

Should our libraries:

  • raise InvalidShareIndex if share_idx != 's' when accessing the data property.
  • return useless unpadded bytes
  • return bytes and padding
  • something else?

@scgbckbone
Author

maybe I misunderstood the scope of this application - @apoelstra can you check this last comment of mine BenWestgate/python-codex32#2 (comment)

@scgbckbone
Author

something else?

@BenWestgate I'm definitely in this category. If round-trips (even for derived shares) can be achieved, that should be a desired property for Codex32.

@BenWestgate

BenWestgate commented Dec 8, 2025

something else?

@BenWestgate I'm definitely in this category. If round-trips (even for derived shares) can be achieved, that should be a desired property for Codex32.

The problem you're running into is that you can't construct 130 bits of data from 16 bytes for all 31 share indices: any math you do to pad, even constants like all 0s or all 1s, only works for the k initial (aka encoded) strings. The derived (aka interpolated) strings break that padding because it was not a 5-bit value, so it is not preserved by GF(32) interpolation the way the u5 checksum or header is.
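
A small sketch of why the padding does not survive, looking only at the last payload symbol of two k=2 shares. Assumptions: GF(32) is reduced by x^5 + x^3 + 1 as in bech32, share indices are written as raw field elements rather than codex32 characters, and `gf32_mul`, `gf32_inv` and `interpolate_symbol` are hypothetical helpers, not this library's code.

```python
def gf32_mul(a: int, b: int) -> int:
    """Multiply two GF(32) elements, reducing by x^5 + x^3 + 1."""
    r = 0
    for _ in range(5):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b100000:
            a ^= 0b101001
    return r


def gf32_inv(a: int) -> int:
    """Invert a nonzero element as a^30, since a^31 == 1."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(32)")
    r = 1
    for _ in range(30):
        r = gf32_mul(r, a)
    return r


def interpolate_symbol(points: list[tuple[int, int]], x: int) -> int:
    """Lagrange-interpolate one 5-bit symbol at index x from (index, symbol) pairs."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = gf32_mul(num, x ^ xj)   # subtraction is XOR in GF(2^n)
                den = gf32_mul(den, xi ^ xj)
        total ^= gf32_mul(yi, gf32_mul(num, gf32_inv(den)))
    return total


# Last symbols of two initial shares, both with their 2 pad bits set to zero.
initial = [(1, 0b10100), (2, 0b01000)]
zero_padded = [x for x in range(32) if interpolate_symbol(initial, x) & 0b11 == 0]
print(len(zero_padded), "of 32 indices keep zero padding")
```

With k=2 and two different symbols the interpolating polynomial is a bijection on GF(32), so only 8 of the 32 indices produce a symbol whose low 2 bits are zero; every other derived share carries pad bits that cannot be rebuilt from the 16 data bytes alone.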

@BenWestgate

BenWestgate commented Jan 2, 2026

I did some research and if pad bits are discarded from shares, there is no way to always recover the secret's last character without brute force. It makes no difference how the initial shares' padding is chosen.

For an exact recovery, @scgbckbone should store the pad bits with his share index metadata. But since these bits can reveal up to 5 bits of the secret, they should be handled like payload bytes.

Since BIP-93 only defines how to encode master seeds from bytes and decode unshared secrets to bytes but not shares or strings (really, "shared strings", aka non-"0" threshold parameter strings), I suggest we define a byte decoding with 2 extra bytes for a "minimal header" and padding.

String Decoding

When the threshold parameter of a valid codex32 string is not the digit "0", we call the string a codex32 shared string.

If a codex32 shared string is decoded into bytes, it MUST be decoded as follows:

  • Translate the threshold parameter to a 3-bit value, most significant bit first.
  • Translate the share index, payload, and leftmost two identifier characters to 5-bit values using the bech32 character table from BIP-0173, most significant bit first.
  • Re-arrange those bits into groups of 8 bits. The incomplete group at the end MUST be 2 to 7 bits, and is discarded.

Note that unlike the decoding process in BIP-0173, we do NOT require that the incomplete group be all zeros.
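
One possible reading of these steps, as a sketch rather than a reference implementation. Assumptions not settled by the text above: the threshold digit d is mapped to the 3-bit value d - 2, the fields are concatenated in the order threshold, share index, payload, identifier, and the string has already been checksum-verified and split into parts. `decode_shared_string` is a hypothetical name.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # bech32 character table (BIP-0173)


def decode_shared_string(threshold: str, identifier: str,
                         share_index: str, payload: str) -> bytes:
    """Decode the parts of a codex32 shared string into bytes."""
    if len(threshold) != 1 or threshold not in "23456789":
        raise ValueError("threshold '0' and non-digit thresholds are unsupported")
    bits = format(int(threshold) - 2, "03b")  # ASSUMPTION: digit - 2 as 3 bits
    for ch in share_index + payload + identifier[:2]:
        bits += format(CHARSET.index(ch), "05b")
    leftover = len(bits) % 8
    if not 2 <= leftover <= 7:
        raise ValueError("incomplete trailing group must be 2 to 7 bits")
    usable = len(bits) - leftover             # the incomplete group is discarded
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


# Placeholder parts: threshold 2, identifier "leet", index "s", all-zero payload.
decoded = decode_shared_string("2", "leet", "s", "q" * 26)
assert len(decoded) == 18                     # 1 header byte + 16 data + 1 trailer
assert decoded[1:17] == bytes(16)             # middle bytes are the secret data
```

Under these assumptions a 128-bit share decodes to 18 bytes: the first holds the threshold and share index, the last holds the 2 pad bits followed by 6 identifier bits.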

Rationale

  • Minimum bytes to recover secret data unambiguously
  • Stripping the first and last byte from "s" strings leaves the secret data
  • 4 to 8 bits of identifier can prevent mistakes
  • "0" threshold unsupported as these never interpolate to need share index, padding, or identifier

@apoelstra
Owner

This all sounds right to me. Though I think we should discourage "decoding a share to bytes" at all. The bech32-encoded share has all the information of the share, including padding and metadata etc., and is already in a standard fixed-length ASCII encoding.

I suppose users could strip the checksum and ms1 HRP if they really wanted to get a 48-character seed into a 32-byte slot or something.

@BenWestgate

Agreed to discourage. He wants to use the share data for decoy wallets. To my surprise he said this +2 bytes decoding works for his decoy seeds. It's certainly less conspicuous than u5 or ASCII. The max stealth decoding stores no meta-data and brute forces everything against a fingerprint or address.

@scgbckbone
Author

Words of "discouragement" are... kind of a bummer. As I stated in the related PR, I want to stay within the standard way of doing it. I guess I'll just drop my decoy idea and treat non-s shares as pure useless strings.

@BenWestgate

Do NOT despair @scgbckbone!

Words of "discouragement" are... kind of a bummer. As I stated in the related PR, I want to stay within the standard way of doing it. I guess I'll just drop my decoy idea and treat non-s shares as pure useless strings.

These are merely "RFC words" of NOT RECOMMENDED. I actually encourage you to implement these useful decoys because you understand the full implications and we have carefully weighed your use case.

4. SHOULD NOT This phrase, or the phrase "NOT RECOMMENDED" mean that
there may exist valid reasons in particular circumstances when the
particular behavior is acceptable or even useful, but the full
implications should be understood and the case carefully weighed
before implementing any behavior described with this label.

I agreed to "NOT RECOMMEND" decoding codex32 shares to bytes because the decoding I proposed and need is lossy so they won't round trip back to shares, even if they do contain enough data to recover the secret data.

I wrote something like

If a codex32 shared string is decoded into bytes, it MUST be decoded as follows:

into my BIP93 length helper PR so we have an "in" standard way to do it. But hopefully other reviewers critique it to make sure this is the very best approach. For example they may decide we MUST decode the entire identifier for round trip. Or that we should not drop the "0" threshold. Or a different byte alignment.

Whatever gets consensus, we can build upon for your decoy idea and my "compact CodexQR" both of which need a tiny byte decoding of shares able to recover the secret.

Perhaps we should join forces and make compact CodexQRs decoys by default since you also want a QR encoding and I want decoy shares?
That'll keep our stuff out of BIP-0093, similar to how "SeedQR" didn't update BIP-0039, which everyone here will appreciate.
