INTPYTHON-527 Add Queryable Encryption support #329
I've added my first round of comments. I haven't looked over the tests yet but plan to do so next week. One area I'll dig into further is the discussion of `KMS_PROVIDERS` and `KMS_CREDENTIALS`.
django_mongodb_backend/management/commands/get_encrypted_fields_map.py
Outdated
data_key = ce.create_data_key(
    kms_provider=kms_provider,
    master_key=master_key,
)
The encryption tests are passing locally for me on Enterprise and on the Atlas VM. On GitHub Actions, this first issue was solved by adding
But this issue remains:
"OPTIONS": {
    "auto_encryption_opts": AutoEncryptionOpts(
        …
        schema_map= {
I would very strongly recommend not using the phrase "schema_map" in the context of QE.
"schema_map" and "encrypted_fields_map" are both existing terms that we already use; each refers to a map that declares which fields should be encrypted and how.
But "schema_map" is specific to CSFLE and "encrypted_fields_map" is specific to QE, and calling this "schema_map" feels like it's bound to:
- make people believe they're using CSFLE instead of QE, and/or
- make our support staff believe they're using CSFLE instead of QE, and/or
- lead users to pass it in the wrong place (i.e. as the "schema_map" auto-encryption option of PyMongo, when they really should be passing it as the "encrypted_fields_map" option).
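To illustrate the naming point, here is a minimal sketch of passing a Queryable Encryption map through PyMongo's `encrypted_fields_map` option rather than `schema_map`. The namespace `mydb.patient`, the `ssn` field, the key vault namespace, and the all-zero local key are hypothetical placeholders, not values from this PR.

```python
# Hypothetical placeholder values, for illustration only.
KMS_PROVIDERS = {"local": {"key": b"\x00" * 96}}  # a local master key is 96 bytes
KEY_VAULT_NAMESPACE = "encryption.__keyVault"

# Queryable Encryption map, keyed by "<database>.<collection>".
ENCRYPTED_FIELDS_MAP = {
    "mydb.patient": {
        "fields": [
            {
                "path": "ssn",
                "bsonType": "string",
                "queries": {"queryType": "equality"},
            },
        ]
    }
}


def build_auto_encryption_opts():
    """Build PyMongo auto-encryption options for Queryable Encryption."""
    # Imported lazily because it needs the pymongo[encryption] extra.
    from pymongo.encryption_options import AutoEncryptionOpts

    return AutoEncryptionOpts(
        kms_providers=KMS_PROVIDERS,
        key_vault_namespace=KEY_VAULT_NAMESPACE,
        # QE uses encrypted_fields_map; schema_map is the CSFLE option.
        encrypted_fields_map=ENCRYPTED_FIELDS_MAP,
    )
```

In a Django settings file, the returned object would go under `"OPTIONS": {"auto_encryption_opts": ...}`, as in the snippet under review.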
Anna, thanks for the clarification. The differences between CSFLE and QE have really been confusing for us. The documentation for `AutoEncryptionOpts` describes it as "Automatic Client-Side Field Level Encryption" but apparently it's for Queryable Encryption too...
One point perhaps you can clarify. Do we need to specify `keyId` in the `encrypted_fields_map`? This example says, "If you are using explicit encryption, add a keyId field with the DEK ID". On the other hand, pymongo's docs for encrypted_fields_map includes `keyId`.
> One point perhaps you can clarify. Do we need to specify `keyId` in the `encrypted_fields_map`? This example says, "If you are using explicit encryption, add a keyId field with the DEK ID". On the other hand, pymongo's docs for encrypted_fields_map includes `keyId`.

I believe I have clarified this with @addaleax, but it never hurts to hear it again! The manual tests I've done prove we need `keyId` for the client, and `ClientEncryption.create_encrypted_collection` creates them on the server side.
@addaleax what do you recommend instead of `schema_map`? `schema`, maybe? `qe_schema`?
Isn't it supposed to be `AutoEncryptionOpts(encrypted_fields_map=...)`?
> The documentation for `AutoEncryptionOpts` describes it as "Automatic Client-Side Field Level Encryption" but apparently it's for Queryable Encryption too...

Yeah, the naming of pretty much everything in this area is slightly unfortunate in general. Since CSFLE predates QE but a lot of the technology stack and the configuration is shared, you'll unfortunately find references to CSFLE in docs for features of the QE stack as well. In theory, we've decided to adopt "In-Use Encryption" as the umbrella term for both CSFLE and QE, but I wouldn't say that it has really caught on.

> what do you recommend instead of `schema_map`? `schema`, maybe? `qe_schema`?

If this maps to the `encrypted_fields_map` option in `AutoEncryptionOpts`, which it appears to do, yes, I'd definitely recommend sticking to that name here too (i.e. `encrypted_fields_map`).

> One point perhaps you can clarify. Do we need to specify `keyId` in the `encrypted_fields_map`? This example says, "If you are using explicit encryption, add a keyId field with the DEK ID". On the other hand, pymongo's docs for encrypted_fields_map includes `keyId`.

Yeah, another big "sigh" to let out here 😅 If you're using the driver's `create_encrypted_collection` helper and you're using automatic encryption, then yes, the driver will create key IDs for you. That sounds simple in theory, but is still a bit of a hassle in practice, because you do need to persist the resulting `encrypted_fields` configuration which includes key IDs if you intend to use it as part of a client-side `encrypted_fields` map (which is a good practice).
You're never going to do anything wrong by just creating keys yourself here. The key creation feature of `create_encrypted_collection` is purely a convenience feature, and ultimately when the application runs, it will have to have the correct key IDs available. That can come from the server-side `encryptedFields` map, but if you have a client-side `encrypted_fields_map`, then that will also need to include the right key IDs.
(This was supposed to help with the fact that you have a bit of a chicken-and-egg situation when setting up QE manually; you need a QE-enabled MongoClient to create keys for an encrypted collection, but you can only specify the correct key IDs after creating them, so you can't start out with the right `encryptedFieldsMap` for that initial MongoClient because it does require those…)
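The "persist the resulting configuration" step described above could look roughly like the following sketch. It assumes PyMongo's `ClientEncryption.create_encrypted_collection`, which returns a `(collection, encrypted_fields)` tuple; the helper name `create_collection_and_export` and its `dumps` parameter are hypothetical and not part of this PR.

```python
def create_collection_and_export(
    client_encryption, database, name, encrypted_fields, kms_provider, dumps=None
):
    """Create an encrypted collection and return its field config as a string.

    Hypothetical helper: the returned string is meant to be stored as
    environment-specific configuration and loaded into a client-side
    encrypted_fields_map later.
    """
    if dumps is None:
        # Key IDs are BSON binary values, so plain json cannot serialize them.
        # bson ships with PyMongo; imported lazily to keep the sketch optional.
        from bson import json_util

        dumps = json_util.dumps

    # The returned encrypted_fields includes the key IDs the driver created.
    _collection, created_fields = client_encryption.create_encrypted_collection(
        database, name, encrypted_fields, kms_provider=kms_provider
    )
    return dumps(created_fields)
```

Returning a string instead of writing a file leaves the storage decision (settings file, secret store, and so on) to the caller.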
> That's not something that will be well-received by Djangonauts and at the very least we'll need to:
> - Document a workaround (export/import?)
> - Give an ETA on when the next QE release will address the issue (assuming this is possible)

Yeah, re-creating collections and migrating data would be the primary workaround. First-class migration support in mongosync is part of https://jira.mongodb.org/browse/REP-3483, but I don't know when this would realistically be scheduled (you may want to check in #fle-qe-devs).
Not sure if it's the most secure solution, but from a usability perspective, I think the easiest thing would be to keep the existing workflow and have `showschemamap` retrieve the `keyId`s from the server so it can include them in its output.
I'm also wondering about the workflow for a dev/prod environment. For example, user commits `AutoEncryptionOpts(encrypted_fields_map=...)` with the keyIds for their local environment. Will this break if they use the same settings in production, i.e. will `create_encrypted_collection()` use `keyIds` from `encrypted_fields_map`?
(And frankly, I'm wondering if client-side schema validation is even in scope for our v1 of queryable encryption since this is complicated and the design document doesn't mention anything about it!)
> i.e. will `create_encrypted_collection()` use `keyIds` from `encrypted_fields_map`?

Yes. `create_encrypted_collection()` will only create new keys if the `encrypted_fields_map` doesn't already provide keys for all fields.
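As a rough illustration of that rule, the sketch below lists the fields for which a new key would still have to be created, treating a missing or `None` `keyId` as "no key yet". The function name and the sample values are hypothetical.

```python
def fields_missing_key_ids(encrypted_fields):
    """Return the paths of fields that do not carry a keyId yet."""
    return [
        field["path"]
        for field in encrypted_fields.get("fields", [])
        if field.get("keyId") is None
    ]


# Hypothetical example: one field already has a key, one does not.
example = {
    "fields": [
        {"path": "ssn", "bsonType": "string"},
        {"path": "billing", "bsonType": "object", "keyId": "existing-key"},
    ]
}
missing = fields_missing_key_ids(example)  # → ['ssn']
```

A management command could use a check like this to warn when a committed map would cause new keys to be generated in a fresh environment.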

> I'm also wondering about the workflow for a dev/prod environment. For example, user commits `AutoEncryptionOpts(encrypted_fields_map=...)` with the keyIds for their local environment.

Yes, this is one of the pains of bootstrapping CSFLE/QE applications. I'd typically consider client-side schemas to be configuration data that shouldn't be committed to the mainline repository for this reason, but I am sure there are developers who do it without that necessarily being a wrong path.

> (And frankly, I'm wondering if client-side schema validation is even in scope for our v1 of queryable encryption since this is complicated and the design document doesn't mention anything about it!)

Yes, you may want to verify whether this is the case or not. Client-side schemas are something that all our tools support and there are relevant security reasons for doing so (in a similar vein to what I already mentioned above, client-side schemas protect against compromised database servers which advertise incorrect schemas) – but they're still not necessarily part of every setup.
I'd also encourage you to reach out in #fle-qe-devs – otherwise I'm also happy to be the one to start a discussion there. While I have a deep understanding of the technical aspects of QE and CSFLE, the PMs for QE are more familiar with what customers do in the real world and what the best practices are that we recommend (for example, I know that client-side schemas are something that isn't considered a necessity for every use case).
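One way to treat the client-side map as per-environment configuration, rather than committing it, is to load it from a file whose path comes from the environment. This is only a sketch: the environment variable name `ENCRYPTED_FIELDS_MAP_PATH` and the helper are hypothetical, and because real key IDs are BSON binary values, `bson.json_util.loads` would be passed as `loads` in practice.

```python
import json
import os


def load_encrypted_fields_map(env_var="ENCRYPTED_FIELDS_MAP_PATH", loads=json.loads):
    """Load a client-side encrypted fields map for the current environment.

    Returns None when the variable is unset, in which case no client-side
    map would be passed to AutoEncryptionOpts.
    """
    path = os.environ.get(env_var)
    if not path:
        return None
    with open(path, encoding="utf-8") as f:
        return loads(f.read())
```

With this shape, dev and prod can point at different files containing their own key IDs while sharing the same settings module.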
> (And frankly, I'm wondering if client-side schema validation is even in scope for our v1 of queryable encryption since this is complicated and the design document doesn't mention anything about it!)

We can consider leaving out client side for GA, however I'm inclined to go for it since we want to present as complete and compelling a feature as we can given the inherent limitations and we've made a lot of progress in understanding the requirements.
I'm rethinking the design now, as well as confirming what PyMongo does for us re: data keys when we call `AutoEncryptionOpts` with `encrypted_fields_map` instead of `schema_map`.
I'll have some pushes coming later tonight and/or tomorrow morning in which I hope to resolve a significant amount of the issues raised today. Thanks @timgraham and @addaleax !
tests/encryption_/tests.py
Outdated
        self.assertTrue("__safeContent__" in records[0])


class EncryptedNumberFieldTests(EncryptedFieldTests):
Inheriting `EncryptedFieldTests` causes all of its tests to be run a second time.
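The effect can be reproduced with plain `unittest` (used here instead of Django's `TestCase` to keep the sketch self-contained). The mixin and `NumberFieldTests` names are hypothetical; moving shared helpers into a mixin without `test_*` methods is one common way to avoid the duplication, not necessarily the fix adopted in this PR.

```python
import unittest


class EncryptedFieldTests(unittest.TestCase):
    def test_base(self):
        self.assertTrue(True)


class EncryptedNumberFieldTests(EncryptedFieldTests):
    # Inherits test_base, so the runner collects it again for this class.
    def test_number(self):
        self.assertTrue(True)


class EncryptionTestMixin:
    """Shared helpers only: no test_* methods and no TestCase base."""

    def make_value(self):
        return 42


class NumberFieldTests(EncryptionTestMixin, unittest.TestCase):
    def test_number(self):
        self.assertEqual(self.make_value(), 42)


loader = unittest.TestLoader()
inherited = loader.getTestCaseNames(EncryptedNumberFieldTests)
mixed_in = loader.getTestCaseNames(NumberFieldTests)
# inherited → ['test_base', 'test_number']; mixed_in → ['test_number']
```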
😂
tests/encryption_/tests.py
Outdated
    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        try:
            from pymongo_auth_aws.auth import AwsCredential  # noqa: PLC0415
        except ImportError:
            cls.skipTest(cls, "pymongo_auth_aws not installed, skipping AWS credentials tests")

        cls.patch_aws = patch(
            "pymongocrypt.synchronous.credentials.aws_temp_credentials",
            return_value=AwsCredential(username="", password="", token=""),
        )
        cls.patch_aws.start()

        cls.patch_azure = patch(
            "pymongocrypt.synchronous.credentials._get_azure_credentials", return_value={}
        )
        cls.patch_azure.start()

        cls.patch_gcp = patch(
            "pymongocrypt.synchronous.credentials._get_gcp_credentials", return_value={}
        )
        cls.patch_gcp.start()

    @classmethod
    def tearDownClass(cls):
        cls.patch_aws.stop()
        cls.patch_azure.stop()
        cls.patch_gcp.stop()
I think this patching is no longer needed since the providers don't appear in the test `KMS_PROVIDERS` (after the removal of `encryption.KMS_PROVIDERS`).
If I take them out I get this error:
System check identified no issues (1 silenced).
test_appointment (encryption_.test_base.EncryptedFieldTests.test_appointment) ... ERROR
----------------------------------------------------------------------
test_billing (encryption_.test_base.EncryptedFieldTests.test_billing) ... ERROR
----------------------------------------------------------------------
test_get_encrypted_fields_map (encryption_.test_base.EncryptedFieldTests.test_get_encrypted_fields_map)
Test class method called by schema editor ... ERROR
----------------------------------------------------------------------
test_numeric_fields (encryption_.test_base.EncryptedFieldTests.test_numeric_fields)
Fields that have not been tested elsewhere. ... ERROR
----------------------------------------------------------------------
test_patient (encryption_.test_base.EncryptedFieldTests.test_patient) ... ERROR
----------------------------------------------------------------------
test_patientportaluser (encryption_.test_base.EncryptedFieldTests.test_patientportaluser) ... ERROR
----------------------------------------------------------------------
test_patientrecord (encryption_.test_base.EncryptedFieldTests.test_patientrecord) ... ERROR
----------------------------------------------------------------------
test_set_encrypted_fields_map_in_client (encryption_.test_base.EncryptedFieldTests.test_set_encrypted_fields_map_in_client) ... ERROR
----------------------------------------------------------------------
test_show_schema_map (encryption_.test_management.EncryptedFieldTests.test_show_schema_map) ... ok
----------------------------------------------------------------------
======================================================================
ERROR: test_appointment (encryption_.test_base.EncryptedFieldTests.test_appointment)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/alex.clark/Developer/django-mongodb-cli/.venv/lib/python3.12/site-packages/pymongo_auth_aws/auth.py", line 110, in aws_temp_credentials
frozen = creds.get_frozen_credentials()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get_frozen_credentials'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 124, in _wrap_encryption_errors
yield
File "/Users/alex.clark/Developer/django-mongodb-cli/src/mongo-python-driver/pymongo/synchronous/encryption.py", line 466, in encrypt
encrypted_cmd = self._auto_encrypter.encrypt(database, encoded_cmd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/.venv/lib/python3.12/site-packages/pymongocrypt/synchronous/auto_encrypter.py", line 44, in encrypt
return run_state_machine(ctx, self.callback)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/.venv/lib/python3.12/site-packages/pymongocrypt/synchronous/state_machine.py", line 150, in run_state_machine
creds = _ask_for_kms_credentials(ctx.kms_providers)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/.venv/lib/python3.12/site-packages/pymongocrypt/synchronous/credentials.py", line 140, in _ask_for_kms_credentials
aws_creds = aws_temp_credentials()
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/alex.clark/Developer/django-mongodb-cli/.venv/lib/python3.12/site-packages/pymongo_auth_aws/auth.py", line 115, in aws_temp_credentials
raise PyMongoAuthAwsError("temporary MONGODB-AWS credentials could not be obtained")
pymongo_auth_aws.errors.PyMongoAuthAwsError: temporary MONGODB-AWS credentials could not be obtained
tests/encryption_/tests.py
Outdated
                expected_encrypted_fields_map[db_table],
            )

    def test_show_schema_map(self):
It's a good idea to try to break up the tests. In this case, testing of management commands is conventionally done in `encryption_/test_management.py`.
================================

Configuring Queryable Encryption in Django is similar to
`configuring Queryable Encryption in Python <https://www.mongodb.com/docs/manual/core/queryable-encryption/quick-start/>`_
Links to mongodb.com can use intersphinx "doc" or "ref". Let me know if you get stuck on this because it can be tricky to understand at first, but you can get a list of targets with `python -m sphinx.ext.intersphinx https://www.mongodb.com/docs/manual/objects.inv`.
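For context, an intersphinx setup for the MongoDB manual might look like the fragment below. This is a sketch only: the file location `docs/source/conf.py` and the `"manual"` label are assumptions, and the project's real configuration may already define a different label.

```python
# docs/source/conf.py (fragment; location and label are assumptions)
extensions = ["sphinx.ext.intersphinx"]

intersphinx_mapping = {
    # "manual" is a hypothetical label; None means "fetch objects.inv
    # from the base URL", i.e. .../docs/manual/objects.inv.
    "manual": ("https://www.mongodb.com/docs/manual/", None),
}
```

With a mapping like this, the hard-coded URL in the docs could become a reference such as ``:doc:`manual:core/queryable-encryption/quick-start` ``, assuming the inventory lists the page under that name; the `python -m sphinx.ext.intersphinx` command above shows the exact target names.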
Thank you! Had you not given that tip, I may have spent the rest of my QE development time figuring out intersphinx.

    DATABASE_ROUTERS = [EncryptedRouter()]

You are now ready to use server side :doc:`Queryable Encryption </topics/queryable-encryption>` in your Django project.
Documentation should be wrapped at 79 characters. I set my editor's right margin at 80 characters and make sure I don't touch it.
TIL :set colorcolumn=80
docs/source/topics/known-issues.rst
Outdated
Queryable Encryption
====================

.. TODO: Add Django core limitations that affect Queryable Encryption.
Sorry, you misinterpreted what I meant. "Queryable Encryption" isn't a core Django feature, so it doesn't need to be mentioned on this page.
    def __str__(self):
        return self.ssn

The API is similar to that of Django's relational fields, with some
This wording means that the embedded model API works like relational fields. That's not the case with encrypted fields. And the Python API doesn't manifest any "security-related changes", as far as your example demonstrates.
tests/encryption_/tests.py
Outdated
        self.appointment = Appointment(time="8:00")
        self.appointment.save()
Usually `Appointment.objects.create()` is used instead of a separate call to `save()`.
- Client-side QE configuration mistakenly used `schema_map` to pass the encrypted fields map to Django's schema editor through `AutoEncryptionOpts`. Although confusing, and despite the error, client-side configuration still succeeded because the map given to `AutoEncryptionOpts` in `schema_map` was then correctly passed to `create_collection` via the `encryptedFields` arg.
- Re-confirmed in local manual testing that client-side configuration works as expected and requires data keys. There is no code (as far as I can tell) to create data keys in PyMongo that is initiated by the existence of `encrypted_fields_map` alone. Rather, data keys appear to be created in `create_encrypted_collection` and only in `create_encrypted_collection`.
- Renamed `showschemamap` -> `showfieldsmap` and updated tests and docs accordingly.
Previous attempts and additional context here: