Simplify permission checks for creating namespaces #1214

Open · wants to merge 9 commits into main

Conversation

@ilongin ilongin (Contributor) commented Jul 8, 2025

Trying to simplify the logic around permissions like "is creating a namespace / project allowed or not" by lifting the awareness of "is the process running in CLI or in Studio" up to the Catalog class. A condensed sketch of the resulting check is shown after the list below.

This allowed removing a couple of methods from the metastore:

  • is_studio
  • is_local_dataset
  • namespace_allowed_to_create
  • project_allowed_to_create
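
A condensed sketch of the new shape, pieced together from the diffs quoted later in this PR (the inline exception class is just a stand-in for DataChain's real error type):

```
# Condensed sketch based on the diffs in this PR; the exception class below is
# a stand-in for DataChain's real NamespaceCreateNotAllowedError.
class NamespaceCreateNotAllowedError(Exception):
    pass


class Catalog:
    def __init__(self, metastore, is_cli: bool = True):
        self.metastore = metastore
        self._is_cli = is_cli

    @property
    def is_cli(self) -> bool:
        # True when running locally (CLI / local Python), False inside Studio.
        return self._is_cli


def create_namespace(name: str, session) -> None:
    # The permission check now reads the catalog flag instead of the removed
    # metastore.namespace_allowed_to_create property.
    if session.catalog.is_cli:
        raise NamespaceCreateNotAllowedError("Creating namespace is not allowed")
    ...  # actual creation logic elided
```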

Summary by Sourcery

Simplify permission checks by replacing metastore-based flags with a centralized Catalog.is_cli indicator, update CLI commands to use it for routing between local and Studio operations, refactor loader and save logic accordingly, and align tests with the new mechanism.

Enhancements:

  • Introduce Catalog.is_cli flag and remove deprecated metastore properties (is_studio, is_local_dataset, namespace_allowed_to_create, project_allowed_to_create).
  • Update CLI dataset commands (rm_dataset, edit_dataset, delete_dataset) to use catalog.is_cli for permission and routing logic.
  • Refactor loader to determine is_cli based on metastore type and propagate it through the Catalog constructor.
  • Restrict automatic project creation in save logic when running in CLI mode.

Tests:

  • Replace allow_create_project and allow_create_namespace fixtures with a single is_cli fixture and mock Catalog.is_cli.
  • Update parametrized tests to use is_cli instead of legacy allow-create flags.

Summary by Sourcery

Centralize environment awareness by adding Catalog.is_cli and removing legacy metastore flags, refactor related dataset, namespace, and project commands to use the new flag, and align tests with the updated permission mechanism

Enhancements:

  • Introduce Catalog.is_cli property and remove deprecated metastore permission flags and methods
  • Refactor catalog loader to set is_cli based on metastore implementation
  • Update save, namespace, and project creation logic to restrict entity creation in CLI mode
  • Refactor CLI dataset commands to use Catalog.is_cli for routing between local and Studio operations

Tests:

  • Replace allow_create_project/allow_create_namespace fixtures with a unified is_cli fixture mocking Catalog.is_cli (a rough sketch of such a fixture follows this list)
  • Update parametrized tests across unit and functional suites to drive behavior via is_cli
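
A rough sketch of what such a fixture could look like; the actual wiring in tests/conftest.py may differ (for example, it may use pytest-mock instead of unittest.mock):

```
# Illustrative only: one way a single is_cli fixture could mock Catalog.is_cli.
from unittest.mock import PropertyMock, patch

import pytest


@pytest.fixture
def is_cli(request):
    # Tests can drive the value via
    # @pytest.mark.parametrize("is_cli", [True], indirect=True)
    value = getattr(request, "param", True)
    with patch(
        "datachain.catalog.catalog.Catalog.is_cli",
        new_callable=PropertyMock,
        return_value=value,
    ):
        yield value
```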

sourcery-ai bot (Contributor) commented Jul 8, 2025

Reviewer's Guide

Introduces a centralized Catalog.is_cli flag to consolidate environment context, removes deprecated metastore permission properties, refactors CLI commands and loader to route between local and Studio operations based on is_cli, updates save and creation workflows to enforce CLI restrictions, and aligns tests with the new mechanism.

Sequence diagram for dataset removal with new is_cli logic

sequenceDiagram
    actor User
    participant CLI
    participant Catalog
    participant Config
    participant Studio
    participant Metastore
    User->>CLI: rm_dataset(...)
    CLI->>Catalog: get_full_dataset_name(name)
    CLI->>Catalog: is_cli
    alt is_cli and studio
        CLI->>Config: read studio token
        alt token exists
            CLI->>Studio: remove_studio_dataset(...)
        else token missing
            CLI->>CLI: raise DataChainError
        end
    else
        CLI->>Metastore: get_project(...)
        CLI->>Catalog: edit local dataset
    end

Class diagram for Catalog and Metastore permission refactor

classDiagram
    class Catalog {
        - _is_cli: bool
        + is_cli: bool
    }
    class AbstractMetastore {
        <<abstract>>
        - Removed: is_studio: bool
        - Removed: is_local_dataset(dataset_namespace: str): bool
        - Removed: namespace_allowed_to_create: bool
        - Removed: project_allowed_to_create: bool
    }
    Catalog --> AbstractMetastore : metastore
    class SQLiteMetastore {
        // No longer implements is_studio
    }
    AbstractMetastore <|-- SQLiteMetastore

Class diagram for loader and Catalog instantiation changes

classDiagram
    class Loader {
        + get_catalog(...): Catalog
    }
    class Catalog {
        + is_cli: bool
    }
    Loader --> Catalog : returns
    class SQLiteMetastore
    Loader ..> SQLiteMetastore : uses for is_cli detection

Class diagram for namespace and project creation permission checks

classDiagram
    class Session {
        + catalog: Catalog
    }
    class Catalog {
        + is_cli: bool
    }
    class Namespace {
        + validate_name(name)
    }
    class Project {
        + validate_name(name)
    }
    Session --> Catalog
    Namespace ..> Session : uses
    Project ..> Session : uses
    Namespace ..> Catalog : checks is_cli for permission
    Project ..> Catalog : checks is_cli for permission

File-Level Changes

Change: Centralize environment context using Catalog.is_cli
Details:
  • Add is_cli parameter to the Catalog constructor and expose it via an is_cli property
  • Determine and pass is_cli based on the metastore type in the loader
  • Remove deprecated metastore properties (is_studio, is_local_dataset, namespace_allowed_to_create, project_allowed_to_create)
  • Remove the is_studio override in SQLiteMetastore
Files:
  src/datachain/catalog/catalog.py
  src/datachain/catalog/loader.py
  src/datachain/data_storage/metastore.py
  src/datachain/data_storage/sqlite.py

Change: Route dataset CLI commands through catalog.is_cli instead of metastore flags (a rough sketch of this routing follows this section)
Details:
  • Replace metastore.is_local_dataset checks with catalog.is_cli in rm_dataset and edit_dataset
  • Conditionally invoke Studio API commands based on catalog.is_cli and token presence
  • Remove redundant token retrieval and Studio call duplication
Files:
  src/datachain/cli/commands/datasets.py
  src/datachain/lib/dc/datasets.py

Change: Update save, namespace and project creation to enforce CLI restrictions
Details:
  • Use not catalog.is_cli instead of metastore.project_allowed_to_create in save logic
  • Raise NamespaceCreateNotAllowedError and ProjectCreateNotAllowedError when session.catalog.is_cli is true
Files:
  src/datachain/lib/dc/datachain.py
  src/datachain/lib/namespaces.py
  src/datachain/lib/projects.py

Change: Simplify test fixtures to use is_cli and mock Catalog.is_cli
Details:
  • Replace allow_create_project/allow_create_namespace fixtures with a single is_cli fixture
  • Patch Catalog.is_cli in tests instead of AbstractMetastore flags
  • Update pytest.mark.parametrize decorators to use is_cli across unit and functional tests
Files:
  tests/conftest.py
  tests/unit/lib/test_datachain.py
  tests/unit/lib/test_namespace.py
  tests/unit/lib/test_project.py
  tests/func/test_read_dataset_remote.py
  tests/func/test_datasets.py
  tests/func/test_pull.py
  tests/test_cli_studio.py

Change: Refine remote fallback logic using is_cli
Details:
  • Compute is_local based on catalog.is_cli and the default namespace instead of metastore.is_local_dataset
  • Use is_local to drive remote fallback and error raising in get_dataset_with_remote_fallback
Files:
  src/datachain/catalog/catalog.py
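
For the "Route dataset CLI commands" change above, a rough sketch of the routing, condensed from the diffs quoted in this review (the Config/DataChainError import paths, the remove_studio_dataset arguments, and the local-removal call are assumptions, not verbatim PR code):

```
# Rough sketch of rm_dataset routing on catalog.is_cli; import paths and call
# signatures marked below are assumptions, not verbatim code from this PR.
def rm_dataset(catalog, name: str, studio: bool = False, force: bool = False):
    namespace_name, project_name, name = catalog.get_full_dataset_name(name)

    if catalog.is_cli and studio:
        # Removing a Studio dataset from the CLI requires a Studio token.
        from datachain.config import Config  # assumed import path
        from datachain.error import DataChainError  # assumed import path
        from datachain.studio import remove_studio_dataset

        token = Config().read().get("studio", {}).get("token")
        if not token:
            raise DataChainError(
                "Not logged in to Studio. Log in with 'datachain auth login'."
            )
        remove_studio_dataset(name, namespace_name, project_name, force=force)  # args assumed
    else:
        # Local path: resolve the project and remove the dataset locally.
        project = catalog.metastore.get_project(project_name, namespace_name)
        catalog.remove_dataset(name, project, force=force)  # assumed signature
```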


@ilongin ilongin marked this pull request as draft July 8, 2025 20:51
@sourcery-ai sourcery-ai bot (Contributor) left a comment

Hey @ilongin - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `src/datachain/lib/namespaces.py:31` </location>
<code_context>
     """
     session = Session.get(session)

-    if not session.catalog.metastore.namespace_allowed_to_create:
+    if session.catalog.is_cli:
         raise NamespaceCreateNotAllowedError("Creating namespace is not allowed")

</code_context>

<issue_to_address>
Logic for namespace creation restriction appears inverted.

This change may block namespace creation in CLI mode even when permitted. Please verify that this aligns with the intended behavior for both CLI and Studio environments.
</issue_to_address>

### Comment 2
<location> `src/datachain/lib/projects.py:35` </location>
<code_context>
     """
     session = Session.get(session)

-    if not session.catalog.metastore.project_allowed_to_create:
+    if session.catalog.is_cli:
         raise ProjectCreateNotAllowedError("Creating project is not allowed")

</code_context>

<issue_to_address>
Project creation restriction logic may be reversed.

The updated condition blocks project creation in CLI mode, which may not be intended. Confirm that this matches the desired permission logic for CLI and Studio environments.
</issue_to_address>

### Comment 3
<location> `src/datachain/lib/dc/datasets.py:346` </location>
<code_context>
 ):
     namespace_name, project_name, name = catalog.get_full_dataset_name(name)

-    if not catalog.metastore.is_local_dataset(namespace_name) and studio:
+    if catalog.is_cli and studio:
+        # removing Studio dataset from CLI
         from datachain.studio import remove_studio_dataset
</code_context>

<issue_to_address>
Dataset deletion logic now depends on CLI mode rather than dataset locality.

Please verify that this change aligns with the intended behavior and that all relevant scenarios are covered.
</issue_to_address>

### Comment 4
<location> `src/datachain/catalog/catalog.py:530` </location>
<code_context>
             Callable[["AbstractWarehouse"], None]
         ] = None,
         in_memory: bool = False,
+        is_cli: Optional[bool] = True,
     ):
         datachain_dir = DataChainDir(cache=cache_dir, tmp=tmp_dir)
</code_context>

<issue_to_address>
Defaulting is_cli to True may not always reflect the actual environment.

This could cause incorrect behavior if Catalog is used outside a CLI context. Recommend setting is_cli explicitly where Catalog is instantiated or inferring it from the metastore type.

Suggested implementation:

```python
        in_memory: bool = False,
        is_cli: Optional[bool] = None,
    ):
        datachain_dir = DataChainDir(cache=cache_dir, tmp=tmp_dir)
        datachain_dir.init()
        }
        self._warehouse_ready_callback = warehouse_ready_callback
        self.in_memory = in_memory

        # Infer is_cli from metastore type if not explicitly provided
        if is_cli is not None:
            self._is_cli = is_cli
        else:
            # Example inference: if metastore is a CLI metastore, set True, else False
            # You may need to adjust this logic based on your actual metastore types
            self._is_cli = hasattr(self.metastore, "is_cli") and getattr(self.metastore, "is_cli", False)

```

- You will need to ensure that all instantiations of this class in your codebase are updated to explicitly pass `is_cli` if inference is not desired or not possible.
- Adjust the inference logic as needed to match your actual metastore implementation and types.
</issue_to_address>

### Comment 5
<location> `src/datachain/catalog/loader.py:140` </location>
<code_context>
     we can provide DATACHAIN_METASTORE_ARG_TEAM_ID=12345 env variable.
     """
     from datachain.catalog import Catalog
+    from datachain.data_storage.sqlite import SQLiteMetastore

+    metastore = get_metastore(in_memory=in_memory)
     return Catalog(
-        metastore=get_metastore(in_memory=in_memory),
+        metastore=metastore,
         warehouse=get_warehouse(in_memory=in_memory),
         client_config=client_config,
         in_memory=in_memory,
+        is_cli=isinstance(metastore, SQLiteMetastore),
     )
</code_context>

<issue_to_address>
CLI mode is inferred from metastore type, which may not be robust.

Relying on isinstance for CLI mode detection may break if new metastore types are added or if SQLiteMetastore is used outside CLI. Please use a more explicit check for CLI mode.

Suggested implementation:

```python
    metastore = get_metastore(in_memory=in_memory)
    # Explicitly determine CLI mode, e.g., via a function argument or environment variable
    is_cli = bool(os.environ.get("DATACHAIN_CLI_MODE", False))
    return Catalog(
        metastore=metastore,
        warehouse=get_warehouse(in_memory=in_memory),
        client_config=client_config,
        in_memory=in_memory,
        is_cli=is_cli,
    )

```

- You will need to import `os` at the top of the file if it is not already imported.
- If CLI mode should be determined differently (e.g., via a function argument), adjust the assignment to `is_cli` accordingly and update the function signature and all call sites.
</issue_to_address>

### Comment 6
<location> `src/datachain/lib/dc/datachain.py:585` </location>
<code_context>
             project = self.session.catalog.metastore.get_project(
                 project_name,
                 namespace_name,
-                create=self.session.catalog.metastore.project_allowed_to_create,
+                create=not self.session.catalog.is_cli,
             )
         except ProjectNotFoundError as e:
</code_context>

<issue_to_address>
Project creation flag is now inverted based on CLI mode.

This change may prevent project creation in CLI mode, which differs from the previous behavior. Please verify if this aligns with the intended permissions.
</issue_to_address>

### Comment 7
<location> `tests/unit/lib/test_namespace.py:29` </location>
<code_context>
-@pytest.mark.parametrize("allow_create_namespace", [False])
+@pytest.mark.parametrize("is_cli", [True])
 @skip_if_not_sqlite
-def test_create_by_user_not_allowed(test_session, allow_create_namespace):
+def test_create_by_user_not_allowed(test_session, is_cli):
     with pytest.raises(NamespaceCreateNotAllowedError) as excinfo:
         create_namespace("dev", session=test_session)
</code_context>

<issue_to_address>
Test for namespace creation denial is preserved and updated.

Consider adding a test for when 'is_cli' is False to verify that namespace creation is permitted in that case.
</issue_to_address>

### Comment 8
<location> `tests/unit/lib/test_project.py:65` </location>
<code_context>
         )


-@pytest.mark.parametrize("allow_create_project", [False])
+@pytest.mark.parametrize("is_cli", [True])
 @skip_if_not_sqlite
-def test_save_create_project_not_allowed(test_session, allow_create_project):
</code_context>

<issue_to_address>
Test for project creation denial updated to use 'is_cli'.

Please also add a test for when 'is_cli' is False to confirm project creation is allowed in that scenario.
</issue_to_address>

### Comment 9
<location> `tests/unit/lib/test_datachain.py:3591` </location>
<code_context>
         )


-@pytest.mark.parametrize("allow_create_project", [False])
+@pytest.mark.parametrize("is_cli", [True])
 @skip_if_not_sqlite
-def test_save_create_project_not_allowed(test_session, allow_create_project):
</code_context>

<issue_to_address>
Test for project creation not allowed updated to use 'is_cli'.

Please add a test for when 'is_cli' is False to ensure both allowed and not allowed cases are covered.

Suggested implementation:

```python
@pytest.mark.parametrize("is_cli", [True, False])
@skip_if_not_sqlite
def test_save_create_project_not_allowed(test_session, is_cli):
    if is_cli:
        with pytest.raises(ProjectCreateNotAllowedError):
            dc.read_values(fib=[1, 1, 2, 3, 5, 8], session=test_session).save(
                "dev.numbers.fibonacci"
            )
    else:
        # Should succeed when project creation is allowed
        result = dc.read_values(fib=[1, 1, 2, 3, 5, 8], session=test_session).save(
            "dev.numbers.fibonacci"
        )
        assert result is not None

```

- Ensure that the `dc` object and the `save` method are correctly set up to respect the `is_cli` parameter in your actual implementation.
- Adjust the assertion for the allowed case (`is_cli=False`) if there is a more specific expected result than just `result is not None`.
</issue_to_address>

### Comment 10
<location> `tests/unit/lib/test_datachain.py:3226` </location>
<code_context>


 @pytest.mark.parametrize("force", (True, False))
+@pytest.mark.parametrize("is_cli", (True,))
 @skip_if_not_sqlite
 def test_delete_dataset_from_studio(test_session, studio_token, requests_mock, force):
</code_context>

<issue_to_address>
Studio dataset deletion tests parameterized with 'is_cli'.

Please add tests for 'is_cli=False' to cover the non-Studio deletion path as well.

Suggested implementation:

```python
@pytest.mark.parametrize("force", (True, False))
@pytest.mark.parametrize("is_cli", (True, False))
@skip_if_not_sqlite
def test_delete_dataset_from_studio(test_session, studio_token, requests_mock, force):

```

```python
@pytest.mark.parametrize("is_cli", (True, False))
@skip_if_not_sqlite
def test_delete_dataset_from_studio_not_found(
    test_session, studio_token, requests_mock

```
</issue_to_address>


Comment on lines 184 to 188
token = Config().read().get("studio", {}).get("token")
if not token:
raise DataChainError(
"Not logged in to Studio. Log in with 'datachain auth login'."
)
Contributor

issue (code-quality): We've found these issues:

cloudflare-workers-and-pages bot commented Jul 9, 2025

Deploying datachain-documentation with Cloudflare Pages

Latest commit: 87715c4
Status: ✅  Deploy successful!
Preview URL: https://dd3b8263.datachain-documentation.pages.dev
Branch Preview URL: https://ilongin-1208-simplify-permis.datachain-documentation.pages.dev


codecov bot commented Jul 9, 2025

Codecov Report

Attention: Patch coverage is 89.47368% with 2 lines in your changes missing coverage. Please review.

Project coverage is 88.73%. Comparing base (5bd9d5f) to head (87715c4).

Files with missing lines | Patch % | Lines
src/datachain/cli/commands/datasets.py | 75.00% | 0 Missing and 2 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1214      +/-   ##
==========================================
- Coverage   88.74%   88.73%   -0.01%     
==========================================
  Files         153      153              
  Lines       13848    13838      -10     
  Branches     1938     1938              
==========================================
- Hits        12289    12279      -10     
  Misses       1103     1103              
  Partials      456      456              
Flag | Coverage Δ
datachain | 88.66% <89.47%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
src/datachain/catalog/catalog.py | 86.11% <100.00%> (+0.08%) ⬆️
src/datachain/catalog/loader.py | 75.00% <100.00%> (+0.31%) ⬆️
src/datachain/data_storage/metastore.py | 93.69% <ø> (-0.12%) ⬇️
src/datachain/data_storage/sqlite.py | 85.64% <ø> (-0.11%) ⬇️
src/datachain/lib/dc/datachain.py | 91.40% <ø> (ø)
src/datachain/lib/dc/datasets.py | 95.12% <100.00%> (ø)
src/datachain/lib/namespaces.py | 100.00% <100.00%> (ø)
src/datachain/lib/projects.py | 100.00% <100.00%> (ø)
src/datachain/cli/commands/datasets.py | 70.37% <75.00%> (-0.72%) ⬇️

@ilongin ilongin marked this pull request as ready for review July 12, 2025 22:58
@sourcery-ai sourcery-ai bot (Contributor) left a comment

Hey @ilongin - I've reviewed your changes - here's some feedback:

  • Consider defaulting Catalog.is_cli to False (and overriding it only for CLI contexts in the loader) so that non-SQLite/metastore use-cases aren’t erroneously treated as CLI by default.
  • The repeated pattern of checking for a studio token and raising a DataChainError in CLI commands could be extracted into a helper to reduce duplication and improve readability (a hypothetical sketch follows this list).
  • Add a docstring or brief comment on the new is_cli property in Catalog to clearly document its intended semantics (CLI vs Studio) for future maintainers.
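
One hypothetical shape for the studio-token helper mentioned above (the helper name and import paths are assumptions, not code from this PR):

```
# Hypothetical helper; the name and import paths are assumptions, not PR code.
def require_studio_token() -> str:
    from datachain.config import Config  # assumed import path
    from datachain.error import DataChainError  # assumed import path

    token = Config().read().get("studio", {}).get("token")
    if not token:
        raise DataChainError(
            "Not logged in to Studio. Log in with 'datachain auth login'."
        )
    return token
```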

@ilongin ilongin mentioned this pull request Jul 14, 2025
@@ -527,6 +527,7 @@ def __init__(
Callable[["AbstractWarehouse"], None]
] = None,
in_memory: bool = False,
is_cli: Optional[bool] = True,
Member

Just to double check: is it really optional?

Member

So, is it really optional?

Let's also come up with a better name please; is_cli is confusing ... you could do is_studio, for example, defaulting to False (a rough sketch of that shape follows).
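
A rough sketch of the suggested shape (hypothetical, not code from this PR):

```
# Hypothetical alternative discussed above: invert the flag so only the Studio
# environment has to set it explicitly.
class Catalog:
    def __init__(self, metastore, is_studio: bool = False):
        self.metastore = metastore
        self._is_studio = is_studio

    @property
    def is_studio(self) -> bool:
        return self._is_studio
```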

@@ -1111,7 +1117,12 @@ def get_dataset_with_remote_fallback(
if version:
update = False

if self.metastore.is_local_dataset(namespace_name) or not update:
# local dataset is the one that is in Studio or in CLI but has default namespace
is_local = (
Member

I'm not sure I understand this condition - how can we make it simpler, easier to read? (I'm also not sure I understand the comment above)

Contributor Author
@ilongin ilongin Jul 16, 2025

Before namespaces / projects we didn't have any kind of check here, and in Studio we would still try to fetch from remote (Studio) if a dataset was missing locally, which was wrong (it would fail with a strange error like a missing token).
Now we can determine whether the dataset we are fetching lives in its own (local) DB or can be fetched from remote / Studio if missing locally.

We never fall back to Studio if:

  1. The script is already running in Studio
  2. The dataset starts with local.local.*

A small sketch of this rule follows. I agree this whole function is way too complex and confusing, and I will try to refactor it.
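
A small sketch of that rule (names assumed, not verbatim PR code):

```
# Sketch of the "never fall back to Studio" rule described above.
def never_fallback_to_studio(catalog, namespace_name: str) -> bool:
    return (
        not catalog.is_cli  # the script is already running inside Studio
        or namespace_name == catalog.metastore.default_namespace_name  # local.local.*
    )
```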

Contributor Author

I've added a better comment for that variable and changed the name. I think it should be clear now.

warehouse=get_warehouse(in_memory=in_memory),
client_config=client_config,
in_memory=in_memory,
is_cli=isinstance(metastore, SQLiteMetastore),
Member

There should be a better (explicit) way to set this up ... also, it means not only CLI but also local Python, right? Why is it called CLI then?

Contributor Author

We usually say CLI for anything other than Studio, so that's why I called it that way. It can be renamed.
I was weighing between what you mentioned (only setting this explicitly) and the implicit determination (when the explicit flag doesn't exist) as it's implemented now. If you have a strong opinion that explicit-only is better, I can do it that way.

Contributor Author

Removed the implicit logic and left only the explicit arg.

@@ -164,21 +167,22 @@ def edit_dataset(
):
namespace_name, project_name, name = catalog.get_full_dataset_name(name)

if catalog.metastore.is_local_dataset(namespace_name):
if catalog.is_cli and namespace_name != catalog.metastore.default_namespace_name:
Member

Also, not sure I understand (too complicated) ... what exactly are we detecting here?

Contributor Author

If someone does this: datachain ds edit dev.my-project.cats --new-name "dogs", it means they want to edit the Studio dataset, not the local one. What's extra here, and should be removed, is catalog.is_cli, since by default this is called from the CLI.

Contributor Author

Removed the catalog.is_cli condition.

@ilongin ilongin requested a review from shcheklein July 17, 2025 13:11
"Not logged in to Studio. Log in with 'datachain auth login'."
)
else:
# if catalog.metastore.is_local_dataset(namespace_name):
Member

dead code?

@@ -164,21 +167,22 @@ def edit_dataset(
):
namespace_name, project_name, name = catalog.get_full_dataset_name(name)

if catalog.metastore.is_local_dataset(namespace_name):
if namespace_name != catalog.metastore.default_namespace_name:
Member

Let's introduce a descriptive var - studio_dataset = ... - to make the condition descriptive:

if studio_dataset:
    ....

(a sketch of what that could look like follows)
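
For example (a sketch based on the quoted diff above, not verbatim PR code):

```
# Name the condition instead of comparing namespaces inline.
studio_dataset = namespace_name != catalog.metastore.default_namespace_name
if studio_dataset:
    ...  # route the edit to Studio
else:
    ...  # edit the local dataset
```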

Member

(tbh I still don't like this very non-obvious way of detecting it by analyzing namespaces and comparing against the default - one needs to know a lot about namespaces to understand this code and why it is correct; it is not obvious)
