-
Notifications
You must be signed in to change notification settings - Fork 102
add support to azure data lake gen 2 storage #238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
add support to azure data lake gen 2 storage #238
Conversation
Signed-off-by: Anhui-tqhuang <[email protected]>
Signed-off-by: Anhui-tqhuang <[email protected]>
6779b70 to
42d4a8b
Compare
Signed-off-by: Anhui-tqhuang <[email protected]>
Signed-off-by: Anhui-tqhuang <[email protected]>
Signed-off-by: Anhui-tqhuang <[email protected]>
Signed-off-by: Anhui-tqhuang <[email protected]>
Signed-off-by: Anhui-tqhuang <[email protected]>
|
This is a fantastic workaround for the issues reported in #239. I note that it duplicates a lot of the |
providers/azure/azure.go
Outdated
| MSIResource string `yaml:"msi_resource"` | ||
|
|
||
| // IsAzureDataLakeGen2 indicates whether the provided storage account is an Azure Data Lake Gen2 account. | ||
| IsAzureDataLakeGen2 bool `yaml:"is_azure_data_lake_gen2"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be interrogated from the account at runtime, rather than relying on user configuration?
It looks like the azblob go SDK doesn't support gen2 at all; a different datalake SDK is required.
. Which is why this PR duplicates so much I expect.
The blob sdk has [func (client *BlobClient) getAccountInfoHandleResponse(resp *http.Response) (BlobClientGetAccountInfoResponse, error)])(https://github.com/Azure/azure-sdk-for-go/blob/49a431a28f26a3e5ccf3f4b8c00ccada08572a60/sdk/storage/azblob/internal/generated/zz_blob_client.go#L1274-L1280) which is aware of HNS (gen2). This is exposed as func (client *BlobClient) GetAccountInfo(ctx context.Context, options *BlobClientGetAccountInfoOptions) (BlobClientGetAccountInfoResponse, error); see the go doc for service.Client's GetAccountInfo(...).
For the underlying API info, it's it's exposed in the Create and List APIs as properties.isHnsEnabled. It's also present in the Get Account Information REST API as header x-ms-is-hns-enabled when using API version >= 2019-07-07.
So it seems like the main azure client should be able to create a storage.Client, call client.GetAccountInfo(), and runtime-dispatch to use either the azblob sdk or the azdatalake SDK as required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm working on a revision of this patch to add autodetection.
One open question is whether the default behaviour should be autodetection, or whether the user should have to explicitly configure autodetection. The only reason not to autodetect by default is if someone might have a running configuration that uses the gen1 azblob client with a gen2 azdatalake account, and if switching the client could introduce behaviour change that could break the existing app.
I have no evidence to think that such a regression could occur though; it's possible that all gen1 sdk ops performed on a gen2 account work unchanged when switching to gen2 sdk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exactly right, the configuration might not be required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if os.Getenv("IS_AZURE_DATA_LAKE_GEN2") == "true" && bkt.Provider() == AZURE { | ||
| expected = []string{"obj_5.some", "id1/", "id2/"} // Azure Data Lake Gen2 keeps empty dirs. | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, this ^ is the cause of the problem observed with using the objstore client in Loki where it tries to open a directory and fails.
The loki code is written with the assumption that when all contents of a bucket subpath are removed, the subpath ("directory") is implicitly removed too.
This is the case with a gen1 store, but is not the case for a gen2 store where directories are actual filesystem entities, not just delimiters in the names of objects in a flat namespace. This behaviour difference will persist irrespective of whether azblob or azdatalake is used as the client to communicate with a gen2 store.
Signed-off-by: Anhui-tqhuang <[email protected]>
related issue:
Changes
Add support to azure data lake gen 2 storage, use the following flag to enable it
Verification
test config
e2e tests
run thanos locally