Fix: prevent username collision in set-password signup path#668
Fix: prevent username collision in set-password signup path#668iprasannamb wants to merge 27 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a _unique_username utility to handle username collisions during registration and Google authentication by appending numeric suffixes. Feedback highlights a potential race condition in the non-atomic "check-then-insert" logic and suggests making the database index unique to ensure consistency. Additionally, it is recommended to strip the $ character from usernames to prevent validation errors during login and to clean up redundant imports in the test suite.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a _unique_username utility to prevent username collisions during registration and Google authentication by appending numeric suffixes. It also includes comprehensive tests for this new logic. Feedback focuses on addressing potential race conditions through database-level unique constraints, as well as improving data sanitization by stripping whitespace and truncating excessively long names from external providers.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements unique username constraints and a generation strategy to handle collisions during local and Google-based signups. The changes include a database migration to enforce unique indexes, a helper function to generate suffixed usernames, and error handling for race conditions. Feedback focuses on the operational risk of dropping indexes before ensuring the new unique index can be created, the potential performance bottleneck of the linear username search, and the user experience impact of returning 409 errors for auto-generated username conflicts instead of implementing a retry mechanism.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces unique username enforcement by upgrading the MongoDB index and implementing a _unique_username utility to handle collisions during signup. The authentication routes now include retry logic and error handling for DuplicateKeyError. Feedback was provided regarding a logic error in the _unique_username loop, which fails to verify the final sequential candidate before falling back to a random suffix.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements unique username enforcement and collision resolution. It introduces a safe migration strategy for the MongoDB username index, a utility for generating unique usernames, updated authentication routes to handle race conditions, and unit tests for the new logic. Feedback suggests making the email index unique to avoid login ambiguity, refining the username generation logic for edge cases, and removing a redundant database call.
|
@iprasannamb pls check the bot comments above. Do they make sense? |
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request enforces unique constraints for usernames and emails by updating database index configurations and implementing a _unique_username generation utility. The authentication routes are updated with retry logic and error handling for DuplicateKeyError. Feedback indicates that identifying specific conflicts by searching for the substring "email" in error messages is brittle and should be replaced with a check on the keyPattern attribute of the exception. Additionally, an unreachable code block was identified in the set_password function following its retry loop.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements robust username collision handling by introducing a unique index for usernames and a helper function to generate unique usernames with numeric or hex suffixes. It also adds retry logic for user creation to handle race conditions and updates the database configuration to safely upgrade existing non-unique indexes. I have provided feedback regarding code duplication in the index management and user creation logic, as well as a suggestion to simplify the username generation flow and a note on unreachable code in the Google authentication path.
I am having trouble creating individual review comments. Click here to see my feedback.
database/databaseConfig.py (108-184)
The logic for creating and upgrading unique indexes for username and email is nearly identical, leading to significant code duplication. To improve maintainability and readability, this logic should be refactored into a single helper function.
Here's a suggested refactoring:
def _create_or_upgrade_unique_index(collection, field_name, index_name, existing_indexes):
temp_index_name = f'{index_name}_unique_tmp'
logger_prefix = f"Cannot upgrade {field_name} index to unique:"
error_msg = (
f"{logger_prefix} duplicate {field_name}s exist in the collection. "
"Manual deduplication is required before uniqueness can be enforced."
)
if index_name in existing_indexes:
if not existing_indexes[index_name].get('unique'):
try:
collection.create_index(
[(field_name, 1)],
name=temp_index_name,
unique=True,
)
collection.drop_index(index_name)
collection.drop_index(temp_index_name)
collection.create_index([(field_name, 1)], name=index_name, unique=True)
logger.info(f"Upgraded {index_name} index to unique=True")
except MongoDuplicateKeyError:
try:
collection.drop_index(temp_index_name)
except Exception:
pass
logger.error(error_msg)
else:
try:
collection.create_index([(field_name, 1)], name=index_name, unique=True)
logger.info(f"Unique index created on {field_name} in user collection")
except MongoDuplicateKeyError:
logger.error(error_msg.replace("upgrade", "create"))
# In initialize_text_index():
_create_or_upgrade_unique_index(user_collection, 'username', 'username_1', existing_user_indexes)
_create_or_upgrade_unique_index(user_collection, 'email', 'email_1', existing_user_indexes)routes/auth.py (54-61)
The logic in _unique_username can be simplified for better readability. The current implementation has a slightly confusing flow with a check after the loop. A more straightforward approach would be to handle the base case first, then loop through the suffixed candidates.
# Check base username first
if not db.users.find_one({"username": base}):
return base
# If base is taken, try with suffixes
for counter in range(1, _MAX_USERNAME_SEQUENTIAL + 1):
candidate = f"{base}{counter}"
if not db.users.find_one({"username": candidate}):
return candidate
routes/auth.py (419-441)
The user creation logic, including the retry mechanism for handling DuplicateKeyError, is duplicated here and in the google_auth function (lines 520-548). Consider extracting this logic into a private helper function to avoid code duplication and improve maintainability. The helper could handle the username generation and insertion loop, while the calling functions would be responsible for preparing the user data and handling specific error cases.
routes/auth.py (546)
This else block appears to contain unreachable code. A DuplicateKeyError on the email index implies that a user with this email already exists. Therefore, the db.users.find_one({"email": email}) call on line 540 is guaranteed to find a user, making the if user: check on line 541 always true. This else block will likely never be executed.
|
@pradeeban ,Could you please review it now? I’ve considered the required suggestions. Gemini kept suggesting alternatives, and I’ve fixed the bugs. |
There was a problem hiding this comment.
Code Review
This pull request implements unique constraints for the username and email fields, including a migration strategy that safely upgrades existing non-unique indexes. It also introduces a _unique_username helper to resolve collisions during signup and updates authentication routes to handle DuplicateKeyError exceptions. Review feedback suggests reordering the index migration steps to ensure continuous uniqueness enforcement and optimizing the username availability check by using an $in query instead of a regex search.
|
@iprasannamb The bot has some comments above. If you can either address them or simply mark them as resolved if they are irrelevant, I will merge the PR. |
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements unique constraints for usernames and emails in the database configuration, including a safe migration strategy for existing non-unique indexes. It introduces a _unique_username helper in the authentication routes to automatically resolve collisions and updates the signup and Google OAuth flows to handle DuplicateKeyError exceptions. Review feedback identifies that the newly added unit tests incorrectly mock find_one instead of the find method now used by the helper function, which will cause test failures. Additionally, a docstring update is suggested to correctly reflect the use of an $in query rather than a regex for finding available usernames.
| mock_db.users.find_one.side_effect = [ | ||
| None, # existing user by email — not found | ||
| {"_id": "x"}, # "alice" username already taken | ||
| None, # "alice1" username is free | ||
| ] |
There was a problem hiding this comment.
The test mocks find_one to simulate username collisions, but the new _unique_username helper function uses find to fetch all candidates in a single round-trip. Consequently, this test will fail because find is not mocked, and the side_effect on find_one will not be consumed as expected for the username check.
# Simulate: first find_one (existing-user check) → None,
# find (_unique_username candidates) → returns 'alice' as taken,
# insert_one → success,
# second find_one (OTP verification) → otp_record.
mock_db.users.find_one.side_effect = [
None, # existing user by email — not found
otp_record, # OTP verification
]
mock_db.users.find.return_value = [{"username": "alice"}]There was a problem hiding this comment.
@pradeeban , I will convert this PR to a draft for now. I will analyze it thoroughly and implement it properly later. Thank you.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
I’ll update it properly, so we can skip merging for now. Appreciate it! |
Summary
Signing up via the
/api/auth/set-passwordendpoint could silently create duplicate usernames, causing login to resolve to the wrong account.Root cause
set_password()auto-assigned a username from the email prefix with no uniqueness check:alice@gmail.comandalice@outlook.com, both received the username alice. The login resolver at line 244 uses a MongoDB $or query on {username, email} — find_one returns whichever document the DB engine happens to return first, silently authenticating the wrong user.Changes
Testing
Checklist: