Add cli support to move, remove and copy file to storage using Studio #1221
Open
amritghimire wants to merge 13 commits into main from amrit/storage-cli
Commits (all by amritghimire):
4befd98 Add cli support to move, remove and copy file to storage using Studio
9812fc9 Make things much simpler
4f9b6ae Fix mypy
56a39b1 Address comments
022034b Fix lint
24f3d15 Add tests
c04e383 Merge branch 'main' into amrit/storage-cli
37a704a Merge branch 'main' into amrit/storage-cli
fb21a3d Merge with top level cp
840b8b7 Update src/datachain/cli/__init__.py
d822bd2 Update test
882d6b4 Storage cp test fix
2fe7cc3 Reword something
@@ -0,0 +1,154 @@
# cp

Copy storage files and directories between cloud and local storage.

## Synopsis

```usage
usage: datachain cp [-h] [-v] [-q] [-r] [--team TEAM]
                    [--local] [--anon] [--update]
                    [--no-glob] [--force]
                    source_path destination_path
```

## Description

This command copies files and directories between local and/or remote storage. It operates through Studio by default, or directly against storage, without Studio, when `--local` is given.

## Arguments

* `source_path` - Path to the source file or directory to copy
* `destination_path` - Path to the destination file or directory to copy to

## Options

* `-r`, `-R`, `--recursive` - Copy directories recursively
* `--team TEAM` - Team name to copy storage contents to
* `--local` - Copy data files from the cloud locally without Studio (Default: False)
* `--anon` - Use anonymous access to storage (available only with `--local`)
* `--update` - Update cached list of files for the sources (available only with `--local`)
* `--no-glob` - Do not expand globs (such as `*` or `?`) (available only with `--local`)
* `--force` - Force creating files even if they already exist (available only with `--local`)
* `-h`, `--help` - Show the help message and exit
* `-v`, `--verbose` - Be verbose
* `-q`, `--quiet` - Be quiet

## Copy Operations

The command supports two main modes of operation:

### Studio Mode (Default)
In Studio mode, the command copies files and directories through Studio using the configured credentials. It determines the operation type from the source and destination protocols, which gives four copy scenarios: local to local, local to remote, remote to local, and remote to remote.
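
The protocol-based dispatch described above can be pictured with a small Python sketch. It is an illustration only, not DataChain's implementation: the names `REMOTE_SCHEMES`, `is_remote`, and `classify_copy`, as well as the returned labels, are hypothetical. It assumes a path is remote exactly when its URL scheme is one of the cloud schemes listed on this page.

```python
from urllib.parse import urlparse

# Cloud schemes listed under "Supported Storage Protocols" on this page.
REMOTE_SCHEMES = {"s3", "gs", "az"}


def is_remote(path: str) -> bool:
    """Return True when the path carries one of the documented cloud schemes."""
    return urlparse(path).scheme in REMOTE_SCHEMES


def classify_copy(source: str, destination: str) -> str:
    """Name the copy scenario implied by the two paths (hypothetical labels)."""
    src = "remote" if is_remote(source) else "local"
    dst = "remote" if is_remote(destination) else "local"
    return f"{src}-to-{dst}"


if __name__ == "__main__":
    print(classify_copy("/path/to/file.txt", "s3://my-bucket/data/file.txt"))
```

Plain and relative paths have an empty scheme, so they fall through to the local branch without special handling.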

### Local Mode
With the `--local` flag, the command accesses storage directly and bypasses Studio. This mode supports the additional options `--anon`, `--update`, `--no-glob`, and `--force`.

## Supported Storage Protocols

The command supports the following storage protocols:
- **Local file system**: Direct paths (e.g., `/path/to/directory` or `./relative/path`)
- **AWS S3**: `s3://bucket-name/path`
- **Google Cloud Storage**: `gs://bucket-name/path`
- **Azure Blob Storage**: `az://container-name/path`

## Examples

### Studio Mode Examples

The command automatically determines the operation type based on the source and destination protocols:

#### 1. Local to Local (local path → local path)
**Operation**: Direct local file system copy
- Uses the local filesystem's native copy operation
- Fastest operation as no network transfer is involved
- Supports both files and directories

```bash
datachain cp /path/to/local/file.txt /path/to/destination/file.txt
```

#### 2. Local to Remote (local path → `s3://`, `gs://`, `az://`)
**Operation**: Upload to cloud storage
- Uploads local files/directories to remote storage
- Uses presigned URLs for secure uploads
- Supports S3 multipart form data for large files
- Requires `--recursive` flag for directories

```bash
# Upload single file
datachain cp /path/to/file.txt s3://my-bucket/data/file.txt

# Upload directory recursively
datachain cp -r /path/to/directory s3://my-bucket/data/
```

#### 3. Remote to Local (`s3://`, `gs://`, `az://` → local path)
**Operation**: Download from cloud storage
- Downloads remote files/directories to local storage
- Uses presigned download URLs
- Automatically extracts filename if destination is a directory
- Creates destination directory if it doesn't exist

```bash
# Download single file
datachain cp s3://my-bucket/data/file.txt /path/to/local/file.txt

# Download to directory (filename preserved)
datachain cp s3://my-bucket/data/file.txt /path/to/directory/
```
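
How the filename is carried over when the destination is a directory can be sketched in Python as follows. This is a hedged illustration, not the command's actual code: `resolve_download_target` is a hypothetical helper, and it assumes a destination counts as a directory when it ends with a path separator or already exists as a directory on disk.

```python
import os
import posixpath
from urllib.parse import urlparse


def resolve_download_target(source_uri: str, destination: str) -> str:
    """Return the local file path a download would be written to.

    Assumption: the destination is treated as a directory when it ends with
    a path separator or already exists as a directory.
    """
    # Object keys use "/" regardless of the local OS, hence posixpath.
    filename = posixpath.basename(urlparse(source_uri).path)
    if destination.endswith(("/", os.sep)) or os.path.isdir(destination):
        return os.path.join(destination, filename)
    return destination


if __name__ == "__main__":
    print(resolve_download_target("s3://my-bucket/data/file.txt", "/path/to/directory/"))
```

With this rule, a destination that names a file is used as-is, while a directory destination keeps the source filename.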

#### 4. Remote to Remote (`s3://` → `s3://`, `gs://` → `gs://`, etc.)
**Operation**: Copy within cloud storage
- Copies files between locations in the same bucket
- Cannot copy between different buckets (same limitation as `mv`)
- Uses Studio's internal copy operation
- Requires `--recursive` flag for directories

```bash
# Copy within same bucket
datachain cp s3://my-bucket/data/file.txt s3://my-bucket/archive/file.txt

# Copy directory recursively
datachain cp -r s3://my-bucket/data/images s3://my-bucket/backup/images
```

### Additional Studio Mode Examples

1. Copy with specific team:
```bash
datachain cp --team other-team /path/to/file.txt s3://my-bucket/data/file.txt
```

2. Copy with verbose output:
```bash
datachain cp -v -r s3://my-bucket/datasets/raw s3://my-bucket/datasets/processed
```

### Local Mode Examples

3. Copy files locally without Studio:
```bash
datachain cp --local /path/to/source /path/to/destination
```

4. Copy with anonymous access:
```bash
datachain cp --local --anon s3://public-bucket/data /path/to/local/
```

5. Copy with force overwrite:
```bash
datachain cp --local --force s3://my-bucket/data /path/to/local/
```

6. Copy with update and no glob expansion:
```bash
# Quote the source so the shell does not try to expand the pattern itself
datachain cp --local --update --no-glob 's3://my-bucket/data/*.txt' /path/to/local/
```
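
To make the effect of `--no-glob` concrete, the following Python sketch contrasts pattern matching with literal matching over a list of object keys. It only illustrates the general idea of glob expansion: `select_keys` is a hypothetical helper, and the standard-library `fnmatch` rules used here (where `*` also matches `/`) are an assumption that may differ from the command's own matching.

```python
from fnmatch import fnmatchcase


def select_keys(keys, pattern, expand_globs=True):
    """Pick the keys a source pattern refers to.

    With expand_globs=True, `*` and `?` act as wildcards (fnmatch rules).
    With expand_globs=False, the pattern is compared literally.
    """
    if expand_globs:
        return [key for key in keys if fnmatchcase(key, pattern)]
    return [key for key in keys if key == pattern]


if __name__ == "__main__":
    keys = ["data/a.txt", "data/b.csv", "data/*.txt"]
    print(select_keys(keys, "data/*.txt"))
    print(select_keys(keys, "data/*.txt", expand_globs=False))
```

Disabling expansion matters mainly when an object name itself contains `*` or `?`.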

## Limitations
- **Cannot copy between different buckets**: Remote-to-remote copies must be within the same bucket

## Notes
* Studio mode requires authentication: run `datachain auth login` before using it
* The `--local` mode bypasses Studio and operates directly with storage providers
@@ -0,0 +1,86 @@
# mv

Move storage files and directories through Studio.

## Synopsis

```usage
usage: datachain mv [-h] [-v] [-q] [--recursive] [--team TEAM] path new_path
```

## Description

This command moves files and directories within storage using the credentials configured in Studio. A move happens within a single bucket; you cannot move files between different buckets. The command supports both individual files and directories, and moving a directory requires the `--recursive` flag.

## Arguments

* `path` - Path to the storage file or directory to move
* `new_path` - New path where the file or directory should be moved to

## Options

* `--recursive` - Move directories recursively (required for moving directories)
* `--team TEAM` - Team name to move storage contents from (default: from config)
* `-h`, `--help` - Show the help message and exit
* `-v`, `--verbose` - Be verbose
* `-q`, `--quiet` - Be quiet

## Examples

1. Move a single file:
```bash
datachain mv s3://my-bucket/data/file.txt s3://my-bucket/archive/file.txt
```

2. Move a directory recursively:
```bash
datachain mv --recursive s3://my-bucket/data/images s3://my-bucket/archive/images
```

3. Move a file within another team's storage:
```bash
datachain mv --team other-team s3://my-bucket/data/file.txt s3://my-bucket/backup/file.txt
```

4. Move a file with verbose output:
```bash
datachain mv -v s3://my-bucket/data/file.txt s3://my-bucket/processed/file.txt
```

5. Move a directory to a subdirectory:
```bash
datachain mv --recursive s3://my-bucket/datasets/raw s3://my-bucket/datasets/processed/raw
```

## Supported Storage Protocols

The command supports the following storage protocols:
- **AWS S3**: `s3://bucket-name/path`
- **Google Cloud Storage**: `gs://bucket-name/path`
- **Azure Blob Storage**: `az://container-name/path`

## Limitations and Edge Cases

### Bucket Restrictions
- **Cannot move between different buckets**: The source and destination must be in the same bucket. A cross-bucket move fails with the error "Cannot move between different buckets"
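
A same-bucket check of this kind can be sketched in Python. The helper names `bucket_of` and `ensure_same_bucket` are hypothetical and this is not the command's actual code; the sketch assumes the bucket is the network-location part of the URI and reuses the error text quoted above.

```python
from urllib.parse import urlparse


def bucket_of(uri: str) -> tuple[str, str]:
    """Return (scheme, bucket) for a storage URI such as s3://bucket-name/path."""
    parsed = urlparse(uri)
    return parsed.scheme, parsed.netloc


def ensure_same_bucket(path: str, new_path: str) -> None:
    """Raise ValueError when the two URIs point at different buckets."""
    if bucket_of(path) != bucket_of(new_path):
        raise ValueError("Cannot move between different buckets")


if __name__ == "__main__":
    # Same bucket: passes silently.
    ensure_same_bucket("s3://my-bucket/data/file.txt", "s3://my-bucket/archive/file.txt")
```

Comparing the scheme together with the bucket name also rejects a move from `s3://my-bucket` to `gs://my-bucket`.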

### Directory Operations
- **Recursive flag required**: Moving directories requires the `--recursive` flag. Without it, the operation will fail
- **Directory structure preservation**: When moving directories, the internal structure is preserved
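
Structure preservation amounts to swapping the old directory prefix for the new one on every object key. A minimal Python sketch, assuming keys are `/`-separated strings and using the hypothetical helper `remap_keys` (not part of DataChain):

```python
def remap_keys(keys, old_prefix, new_prefix):
    """Map each key under old_prefix to its location under new_prefix.

    Keys outside old_prefix are left out; relative paths below the
    prefix are kept unchanged, which preserves the directory structure.
    """
    old = old_prefix.rstrip("/") + "/"
    new = new_prefix.rstrip("/") + "/"
    return {key: new + key[len(old):] for key in keys if key.startswith(old)}


if __name__ == "__main__":
    keys = ["data/images/a.jpg", "data/images/cats/b.jpg", "data/other.txt"]
    for src, dst in remap_keys(keys, "data/images", "archive/images").items():
        print(src, "->", dst)
```

Normalizing both prefixes to end in `/` keeps a sibling such as `data/images-old/x.jpg` from being matched by accident.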

### Error Handling
- **File not found**: If the source file or directory doesn't exist, the operation will fail
- **Permission errors**: Insufficient permissions will result in operation failure
- **Storage service errors**: Network issues or storage service problems are reported with an error message

### Team Configuration
- **Default team**: If no team is specified, the command uses the team from your configuration
- **Team-specific storage**: Each team has its own storage namespace, so moving between teams is not supported

## Notes

* Moving large directories may take time depending on the number of files and network conditions
* Use the `--verbose` flag to get detailed information about the move operation
* The `--quiet` flag suppresses output except for errors
* This command operates through Studio, so you must be authenticated with `datachain auth login` before using it
@@ -0,0 +1,93 @@
# rm

Delete storage files and directories through Studio.

## Synopsis

```usage
usage: datachain rm [-h] [-v] [-q] [--recursive] [--team TEAM] path
```

## Description

This command deletes files and directories within storage using the credentials configured in Studio. The command supports both individual files and directories, and deleting a directory requires the `--recursive` flag. This is a destructive operation that permanently removes files and cannot be undone.

## Arguments

* `path` - Path to the storage file or directory to delete

## Options

* `--recursive` - Delete directories recursively (required for deleting directories)
* `--team TEAM` - Team name to delete storage contents from (default: from config)
* `-h`, `--help` - Show the help message and exit
* `-v`, `--verbose` - Be verbose
* `-q`, `--quiet` - Be quiet

## Examples

1. Delete a single file:
```bash
datachain rm s3://my-bucket/data/file.txt
```

2. Delete a directory recursively:
```bash
datachain rm --recursive s3://my-bucket/data/images
```

3. Delete a file from a different team's storage:
```bash
datachain rm --team other-team s3://my-bucket/data/file.txt
```

4. Delete a file with verbose output:
```bash
datachain rm -v s3://my-bucket/data/file.txt
```

5. Delete a directory quietly (suppress output):
```bash
datachain rm -q --recursive s3://my-bucket/temp-data
```

6. Delete a specific subdirectory:
```bash
datachain rm --recursive s3://my-bucket/datasets/raw/old-version
```
## Supported Storage Protocols | ||
|
||
The command supports the following storage protocols: | ||
- **AWS S3**: `s3://bucket-name/path` | ||
- **Google Cloud Storage**: `gs://bucket-name/path` | ||
- **Azure Blob Storage**: `az://container-name/path` | ||
|
||
## Limitations and Edge Cases | ||
|
||
### Directory Operations | ||
- **Recursive flag required**: Deleting directories requires the `--recursive` flag. Without it, the operation will fail | ||
- **Directory structure**: When deleting directories, all files and subdirectories within the directory are removed | ||
|
||
### Error Handling | ||
- **File not found**: If the source file or directory doesn't exist, the operation will fail | ||
- **Permission errors**: Insufficient permissions will result in operation failure | ||
- **Storage service errors**: Network issues or storage service problems will be reported with appropriate error messages | ||
- **Directory not empty**: Attempting to delete a non-empty directory without `--recursive` will fail | ||
|
||
### Team Configuration | ||
- **Default team**: If no team is specified, the command uses the team from your configuration | ||
- **Team-specific storage**: Each team has its own storage namespace, so deleting from other teams requires explicit team specification | ||
|
||
### Safety Considerations | ||
- **Permanent deletion**: This operation permanently removes files and cannot be undone | ||
- **Batch operations**: Large directories may contain many files and deletion may take time | ||
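
Because deletion is permanent, scripts that call `datachain rm` may want an explicit confirmation step. The Python sketch below shows one way to do that. The wrapper is hypothetical and not part of DataChain: `build_rm_command` and `rm_with_confirmation` are illustrative names, and only the `--recursive` and `--team` flags documented on this page are used.

```python
import shlex
import subprocess


def build_rm_command(path, recursive=False, team=None):
    """Assemble the argument list for `datachain rm` from documented flags."""
    cmd = ["datachain", "rm"]
    if recursive:
        cmd.append("--recursive")
    if team is not None:
        cmd += ["--team", team]
    cmd.append(path)
    return cmd


def rm_with_confirmation(path, recursive=False, team=None, ask=input):
    """Run `datachain rm` only after the user retypes the exact path.

    Returns True when the command was executed, False when it was skipped.
    """
    cmd = build_rm_command(path, recursive=recursive, team=team)
    reply = ask(f"About to run: {shlex.join(cmd)}\nRetype the path to confirm: ")
    if reply.strip() != path:
        return False
    subprocess.run(cmd, check=True)
    return True


if __name__ == "__main__":
    print(shlex.join(build_rm_command("s3://my-bucket/temp-data", recursive=True)))
```

Passing `ask` as a parameter keeps the prompt replaceable, which makes the wrapper usable in tests and non-interactive jobs.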

## Notes

* Deleting large directories may take time depending on the number of files and network conditions
* Use the `--verbose` flag to get detailed information about the delete operation
* The `--quiet` flag suppresses output except for errors
* This command operates through Studio using the configured credentials, so you must be authenticated with `datachain auth login` before using it
* **Warning**: This is a destructive operation. Always double-check the path before executing the command