-
Notifications
You must be signed in to change notification settings - Fork 120
Http client #1331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Http client #1331
Conversation
Reviewer's GuideIntroduces HTTP/HTTPS support by registering new HTTPClient and HTTPSClient implementations using fsspec’s HTTPFileSystem, handling URL parsing, path resolution, and read-only operations, and verifies behavior with unit tests. File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `src/datachain/client/http.py:133` </location>
<code_context>
+ # Normal case: prepend protocol and name
+ base_url = f"{self.protocol}://{self.name}"
+
+ if rel_path:
+ # Ensure single slash between base and path
+ if not base_url.endswith("/") and not rel_path.startswith("/"):
+ base_url += "/"
+ full_url = base_url + rel_path
+ else:
+ full_url = base_url
</code_context>
<issue_to_address>
URL concatenation may result in double slashes or missing slashes.
Using string concatenation for URLs can lead to incorrect formatting. Prefer urllib.parse.urljoin for reliable URL construction.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
base_url = f"{self.protocol}://{self.name}"
if rel_path:
# Ensure single slash between base and path
if not base_url.endswith("/") and not rel_path.startswith("/"):
base_url += "/"
full_url = base_url + rel_path
else:
full_url = base_url
return full_url
=======
base_url = f"{self.protocol}://{self.name}"
if rel_path:
# Use urljoin for reliable URL construction
from urllib.parse import urljoin
full_url = urljoin(base_url + "/", rel_path)
else:
full_url = base_url
return full_url
>>>>>>> REPLACE
</suggested_fix>
### Comment 2
<location> `src/datachain/client/http.py:212` </location>
<code_context>
+ if use_cache and (cache_path := self.cache.get_path(file)):
+ return open(cache_path, mode="rb")
+
+ assert not file.location
+ # Don't pass version to fs.open for HTTP
+ return FileWrapper(
</code_context>
<issue_to_address>
The assertion on file.location may be too strict and could cause runtime errors.
If file.location is set, this assertion will raise an exception. Consider handling this scenario more gracefully or providing a clearer error message.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
assert not file.location
# Don't pass version to fs.open for HTTP
return FileWrapper(
self.fs.open(self.get_full_path(file.get_path_normalized())),
cb or (lambda x: None),
)
=======
if file.location:
raise ValueError(
f"Cannot open file with location set: {file.location}. "
"HTTP/HTTPS files must not have a location specified."
)
# Don't pass version to fs.open for HTTP
return FileWrapper(
self.fs.open(self.get_full_path(file.get_path_normalized())),
cb or (lambda x: None),
)
>>>>>>> REPLACE
</suggested_fix>
### Comment 3
<location> `tests/unit/test_client_http.py:93` </location>
<code_context>
+ def test_get_full_path(self):
</code_context>
<issue_to_address>
Consider adding a test for malformed or invalid URLs in get_full_path.
Adding tests for malformed or invalid URLs will help verify that get_full_path handles errors correctly and fails gracefully.
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
src/datachain/client/http.py
Outdated
base_url = f"{self.protocol}://{self.name}" | ||
|
||
if rel_path: | ||
# Ensure single slash between base and path | ||
if not base_url.endswith("/") and not rel_path.startswith("/"): | ||
base_url += "/" | ||
full_url = base_url + rel_path | ||
else: | ||
full_url = base_url | ||
|
||
return full_url |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: URL concatenation may result in double slashes or missing slashes.
Using string concatenation for URLs can lead to incorrect formatting. Prefer urllib.parse.urljoin for reliable URL construction.
base_url = f"{self.protocol}://{self.name}" | |
if rel_path: | |
# Ensure single slash between base and path | |
if not base_url.endswith("/") and not rel_path.startswith("/"): | |
base_url += "/" | |
full_url = base_url + rel_path | |
else: | |
full_url = base_url | |
return full_url | |
base_url = f"{self.protocol}://{self.name}" | |
if rel_path: | |
# Use urljoin for reliable URL construction | |
from urllib.parse import urljoin | |
full_url = urljoin(base_url + "/", rel_path) | |
else: | |
full_url = base_url | |
return full_url |
src/datachain/client/http.py
Outdated
assert not file.location | ||
# Don't pass version to fs.open for HTTP | ||
return FileWrapper( | ||
self.fs.open(self.get_full_path(file.get_path_normalized())), | ||
cb or (lambda x: None), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): The assertion on file.location may be too strict and could cause runtime errors.
If file.location is set, this assertion will raise an exception. Consider handling this scenario more gracefully or providing a clearer error message.
assert not file.location | |
# Don't pass version to fs.open for HTTP | |
return FileWrapper( | |
self.fs.open(self.get_full_path(file.get_path_normalized())), | |
cb or (lambda x: None), | |
) | |
if file.location: | |
raise ValueError( | |
f"Cannot open file with location set: {file.location}. " | |
"HTTP/HTTPS files must not have a location specified." | |
) | |
# Don't pass version to fs.open for HTTP | |
return FileWrapper( | |
self.fs.open(self.get_full_path(file.get_path_normalized())), | |
cb or (lambda x: None), | |
) |
tests/unit/test_client_http.py
Outdated
def test_get_full_path(self): | ||
"""Test full path construction""" | ||
cache = Mock(spec=Cache) | ||
client = HTTPClient("example.com", {}, cache, protocol="https") | ||
|
||
assert client.get_full_path("file.txt") == "https://example.com/file.txt" | ||
assert client.get_full_path("path/to/file.txt") == "https://example.com/path/to/file.txt" | ||
assert client.get_full_path("") == "https://example.com" | ||
|
||
# Test case when path contains full domain (e.g., File with source="https://") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Consider adding a test for malformed or invalid URLs in get_full_path.
Adding tests for malformed or invalid URLs will help verify that get_full_path handles errors correctly and fails gracefully.
src/datachain/client/http.py
Outdated
filename = parsed.path.split("/")[-1] if parsed.path else "file" | ||
|
||
# Create file info for this single file | ||
file = self.info_to_file(info, prefix if prefix else filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Replace if-expression with or
(or-if-exp-identity
)
file = self.info_to_file(info, prefix if prefix else filename) | |
file = self.info_to_file(info, prefix or filename) |
Explanation
Here we find ourselves setting a value if it evaluates toTrue
, and otherwiseusing a default.
The 'After' case is a bit easier to read and avoids the duplication of
input_currency
.
It works because the left-hand side is evaluated first. If it evaluates to
true then currency
will be set to this and the right-hand side will not be
evaluated. If it evaluates to false the right-hand side will be evaluated and
currency
will be set to DEFAULT_CURRENCY
.
src/datachain/client/http.py
Outdated
Construct full HTTP/HTTPS URL from relative path. | ||
Note: version_id is ignored as HTTP doesn't support versioning. | ||
""" | ||
if version_id: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): We've found these issues:
- Remove redundant conditional (
remove-redundant-if
) - Lift return into if (
lift-return-into-if
)
This uses the HTTPFileSystem's HTML parsing capabilities. | ||
""" | ||
full_url = self.get_full_path(prefix) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): We've found these issues:
- Remove redundant exceptions from an except clause [×2] (
remove-redundant-exception
) - Replace length-one exception tuple with exception [×2] (
simplify-single-exception-tuple
)
for more information, see https://pre-commit.ci
Deploying datachain-documentation with
|
Latest commit: |
7fa3d28
|
Status: | ✅ Deploy successful! |
Preview URL: | https://e86d3d94.datachain-documentation.pages.dev |
Branch Preview URL: | https://http-client.datachain-documentation.pages.dev |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1331 +/- ##
==========================================
- Coverage 87.56% 87.37% -0.20%
==========================================
Files 156 157 +1
Lines 14493 14590 +97
Branches 2081 2096 +15
==========================================
+ Hits 12691 12748 +57
- Misses 1331 1362 +31
- Partials 471 480 +9
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `src/datachain/client/http.py:26` </location>
<code_context>
+ fs.invalidate_cache()
+ return cast("HTTPFileSystem", fs)
+
+ @classmethod
+ def from_name(
+ cls,
+ name: str,
+ cache: "Cache",
+ kwargs: dict[str, Any],
+ ) -> "HTTPClient":
+ # Determine protocol from the name if it includes it
+ parsed = urlparse(name)
+
+ # Extract just the host/path part without protocol
+ if parsed.scheme:
+ name = parsed.netloc + parsed.path
+
+ return cls(name, kwargs, cache)
+
+ @classmethod
</code_context>
<issue_to_address>
Parsing logic in from_name may lead to unexpected results for URLs with query/fragment.
Currently, query and fragment components are omitted when constructing the name. Please confirm if this is intentional, or update the logic to include these components if needed.
</issue_to_address>
### Comment 2
<location> `src/datachain/client/http.py:93` </location>
<code_context>
+ def get_full_path(self, rel_path: str, version_id: Optional[str] = None) -> str:
</code_context>
<issue_to_address>
Heuristic for domain detection in get_full_path may misclassify some relative paths.
The current logic may incorrectly treat valid file names containing a dot as domains. Please refine the detection or clarify the expected input format.
</issue_to_address>
### Comment 3
<location> `src/datachain/client/http.py:133` </location>
<code_context>
+ def info_to_file(self, v: dict[str, Any], path: str) -> File:
</code_context>
<issue_to_address>
Fallback to current time for last_modified may mask upstream issues.
Consider logging a warning or surfacing the error when last_modified parsing fails, rather than silently defaulting to the current time.
Suggested implementation:
```python
def info_to_file(self, v: dict[str, Any], path: str) -> File:
"""
Convert HTTP file info to DataChain File object.
"""
import logging
logger = logging.getLogger(__name__)
# Extract ETag if available (remove quotes if present)
etag = v.get("ETag", "").strip('"')
# Parse last modified time
last_modified = v.get("last_modified")
if last_modified:
if isinstance(last_modified, str):
```
```python
if isinstance(last_modified, str):
try:
# Try parsing RFC 1123 format
from email.utils import parsedate_to_datetime
last_modified_dt = parsedate_to_datetime(last_modified)
last_modified = last_modified_dt.timestamp()
except Exception as e:
logger.warning(
"Failed to parse last_modified '%s' for path '%s': %s. Falling back to current time.",
last_modified, path, e
)
import time
last_modified = time.time()
```
</issue_to_address>
### Comment 4
<location> `src/datachain/client/http.py:209` </location>
<code_context>
+ # Don't pass version_id to HTTP filesystem
+ return await self.fs._get_file(lpath, rpath, callback=callback)
+
+ async def _fetch_dir(self, prefix: str, pbar, result_queue) -> set[str]:
+ """
+ Override to reject directory listing for HTTP/HTTPS.
+ HTTP doesn't support directory listing in a standard way.
+ """
+ full_url = self.get_full_path(prefix)
+
+ # HTTP/HTTPS doesn't support directory listing - always raise error
+ raise NotImplementedError(f"Cannot download file from {full_url}")
+
+
</code_context>
<issue_to_address>
Error message in _fetch_dir may be misleading about operation type.
Update the error message to specify that directory listing is not supported for HTTP/HTTPS, rather than referring to file download.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
# HTTP/HTTPS doesn't support directory listing - always raise error
raise NotImplementedError(f"Cannot download file from {full_url}")
=======
# HTTP/HTTPS doesn't support directory listing - always raise error
raise NotImplementedError(f"Directory listing is not supported for HTTP/HTTPS at {full_url}")
>>>>>>> REPLACE
</suggested_fix>
### Comment 5
<location> `tests/unit/test_client_http.py:18` </location>
<code_context>
+ assert client_class == HTTPClient
+
+
+def test_split_url_https():
+ """Test URL splitting for HTTPS URLs"""
+ domain, path = HTTPClient.split_url("https://example.com/path/to/file.txt")
+ assert domain == "example.com"
+ assert path == "path/to/file.txt"
+
+
</code_context>
<issue_to_address>
Missing test for split_url with query and fragment components.
Add tests for URLs containing query parameters and fragments to verify correct handling by split_url.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
def test_split_url_https():
"""Test URL splitting for HTTPS URLs"""
domain, path = HTTPClient.split_url("https://example.com/path/to/file.txt")
assert domain == "example.com"
assert path == "path/to/file.txt"
=======
def test_split_url_https():
"""Test URL splitting for HTTPS URLs"""
domain, path = HTTPClient.split_url("https://example.com/path/to/file.txt")
assert domain == "example.com"
assert path == "path/to/file.txt"
def test_split_url_with_query():
"""Test URL splitting for URLs with query parameters"""
domain, path = HTTPClient.split_url("https://example.com/path/to/file.txt?foo=bar&baz=qux")
assert domain == "example.com"
assert path == "path/to/file.txt"
def test_split_url_with_fragment():
"""Test URL splitting for URLs with fragment components"""
domain, path = HTTPClient.split_url("https://example.com/path/to/file.txt#section1")
assert domain == "example.com"
assert path == "path/to/file.txt"
def test_split_url_with_query_and_fragment():
"""Test URL splitting for URLs with both query and fragment components"""
domain, path = HTTPClient.split_url("https://example.com/path/to/file.txt?foo=bar#section1")
assert domain == "example.com"
assert path == "path/to/file.txt"
>>>>>>> REPLACE
</suggested_fix>
### Comment 6
<location> `tests/unit/test_client_http.py:25` </location>
<code_context>
+ assert path == "path/to/file.txt"
+
+
+def test_is_root_url():
+ """Test root URL detection"""
+ assert HTTPClient.is_root_url("https://example.com")
+ assert HTTPClient.is_root_url("https://example.com/")
+ assert not HTTPClient.is_root_url("https://example.com/path")
+ assert not HTTPClient.is_root_url("https://example.com?query=1")
+ assert not HTTPClient.is_root_url("https://example.com#fragment")
+
+
</code_context>
<issue_to_address>
Missing test for is_root_url with HTTP (not HTTPS) URLs.
Please add tests for 'http://' URLs to ensure is_root_url works correctly for both HTTP and HTTPS.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
def test_is_root_url():
"""Test root URL detection"""
assert HTTPClient.is_root_url("https://example.com")
assert HTTPClient.is_root_url("https://example.com/")
assert not HTTPClient.is_root_url("https://example.com/path")
assert not HTTPClient.is_root_url("https://example.com?query=1")
assert not HTTPClient.is_root_url("https://example.com#fragment")
=======
def test_is_root_url():
"""Test root URL detection"""
# HTTPS URLs
assert HTTPClient.is_root_url("https://example.com")
assert HTTPClient.is_root_url("https://example.com/")
assert not HTTPClient.is_root_url("https://example.com/path")
assert not HTTPClient.is_root_url("https://example.com?query=1")
assert not HTTPClient.is_root_url("https://example.com#fragment")
# HTTP URLs
assert HTTPClient.is_root_url("http://example.com")
assert HTTPClient.is_root_url("http://example.com/")
assert not HTTPClient.is_root_url("http://example.com/path")
assert not HTTPClient.is_root_url("http://example.com?query=1")
assert not HTTPClient.is_root_url("http://example.com#fragment")
>>>>>>> REPLACE
</suggested_fix>
### Comment 7
<location> `tests/unit/test_client_http.py:34` </location>
<code_context>
+ assert not HTTPClient.is_root_url("https://example.com#fragment")
+
+
+def test_from_name_with_https():
+ """Test creating client from HTTPS URL"""
+ cache = Mock(spec=Cache)
+ client = HTTPSClient.from_name("https://example.com/path", cache, {})
+ assert client.protocol == "https"
+ assert client.name == "example.com/path"
+ assert client.PREFIX == "https://"
+
+
</code_context>
<issue_to_address>
Missing test for from_name with URLs lacking protocol.
Add tests for from_name methods using URLs without a protocol (e.g., 'example.com/path') to confirm default protocol assignment.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
def test_from_name_with_https():
"""Test creating client from HTTPS URL"""
cache = Mock(spec=Cache)
client = HTTPSClient.from_name("https://example.com/path", cache, {})
assert client.protocol == "https"
assert client.name == "example.com/path"
assert client.PREFIX == "https://"
=======
def test_from_name_with_https():
"""Test creating client from HTTPS URL"""
cache = Mock(spec=Cache)
client = HTTPSClient.from_name("https://example.com/path", cache, {})
assert client.protocol == "https"
assert client.name == "example.com/path"
assert client.PREFIX == "https://"
def test_from_name_without_protocol():
"""Test creating client from URL without protocol"""
cache = Mock(spec=Cache)
client = HTTPSClient.from_name("example.com/path", cache, {})
assert client.protocol == "https"
assert client.name == "example.com/path"
assert client.PREFIX == "https://"
>>>>>>> REPLACE
</suggested_fix>
### Comment 8
<location> `tests/unit/test_client_http.py:52` </location>
<code_context>
+ assert client.PREFIX == "http://"
+
+
+def test_get_full_path_http():
+ """Test full path construction for HTTP"""
+ cache = Mock(spec=Cache)
+ client = HTTPClient("example.com:8080", {}, cache)
+
+ assert client.get_full_path("file.txt") == "http://example.com:8080/file.txt"
+
+
</code_context>
<issue_to_address>
Missing test for get_full_path with empty rel_path and with rel_path containing slashes.
Please add tests for get_full_path with an empty rel_path and with rel_path values that start with or contain multiple slashes to verify correct URL handling.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
def test_get_full_path_http():
"""Test full path construction for HTTP"""
cache = Mock(spec=Cache)
client = HTTPClient("example.com:8080", {}, cache)
assert client.get_full_path("file.txt") == "http://example.com:8080/file.txt"
=======
def test_get_full_path_http():
"""Test full path construction for HTTP"""
cache = Mock(spec=Cache)
client = HTTPClient("example.com:8080", {}, cache)
# Basic file path
assert client.get_full_path("file.txt") == "http://example.com:8080/file.txt"
# Empty rel_path
assert client.get_full_path("") == "http://example.com:8080/"
# rel_path starts with a slash
assert client.get_full_path("/file.txt") == "http://example.com:8080/file.txt"
# rel_path contains multiple slashes
assert client.get_full_path("dir//file.txt") == "http://example.com:8080/dir//file.txt"
assert client.get_full_path("/dir//file.txt") == "http://example.com:8080/dir//file.txt"
>>>>>>> REPLACE
</suggested_fix>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
src/datachain/client/http.py
Outdated
@classmethod | ||
def create_fs(cls, **kwargs) -> HTTPFileSystem: | ||
# Configure HTTPFileSystem options | ||
kwargs.setdefault("simple_links", True) | ||
kwargs.setdefault("same_scheme", True) | ||
kwargs.setdefault("cache_type", "bytes") | ||
|
||
# HTTPFileSystem doesn't support version_aware, remove it if present | ||
kwargs.pop("version_aware", None) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: Parsing logic in from_name may lead to unexpected results for URLs with query/fragment.
Currently, query and fragment components are omitted when constructing the name. Please confirm if this is intentional, or update the logic to include these components if needed.
src/datachain/client/http.py
Outdated
def get_full_path(self, rel_path: str, version_id: Optional[str] = None) -> str: | ||
""" | ||
Construct full HTTP/HTTPS URL from relative path. | ||
Note: version_id is ignored as HTTP doesn't support versioning. | ||
""" | ||
if version_id: | ||
# HTTP doesn't support versioning, ignore it silently | ||
pass | ||
|
||
if self.name.startswith(("http://", "https://")): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Heuristic for domain detection in get_full_path may misclassify some relative paths.
The current logic may incorrectly treat valid file names containing a dot as domains. Please refine the detection or clarify the expected input format.
src/datachain/client/http.py
Outdated
def info_to_file(self, v: dict[str, Any], path: str) -> File: | ||
""" | ||
Convert HTTP file info to DataChain File object. | ||
""" | ||
# Extract ETag if available (remove quotes if present) | ||
etag = v.get("ETag", "").strip('"') | ||
|
||
# Parse last modified time | ||
last_modified = v.get("last_modified") | ||
if last_modified: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): Fallback to current time for last_modified may mask upstream issues.
Consider logging a warning or surfacing the error when last_modified parsing fails, rather than silently defaulting to the current time.
Suggested implementation:
def info_to_file(self, v: dict[str, Any], path: str) -> File:
"""
Convert HTTP file info to DataChain File object.
"""
import logging
logger = logging.getLogger(__name__)
# Extract ETag if available (remove quotes if present)
etag = v.get("ETag", "").strip('"')
# Parse last modified time
last_modified = v.get("last_modified")
if last_modified:
if isinstance(last_modified, str):
if isinstance(last_modified, str):
try:
# Try parsing RFC 1123 format
from email.utils import parsedate_to_datetime
last_modified_dt = parsedate_to_datetime(last_modified)
last_modified = last_modified_dt.timestamp()
except Exception as e:
logger.warning(
"Failed to parse last_modified '%s' for path '%s': %s. Falling back to current time.",
last_modified, path, e
)
import time
last_modified = time.time()
src/datachain/client/http.py
Outdated
# HTTP/HTTPS doesn't support directory listing - always raise error | ||
raise NotImplementedError(f"Cannot download file from {full_url}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Error message in _fetch_dir may be misleading about operation type.
Update the error message to specify that directory listing is not supported for HTTP/HTTPS, rather than referring to file download.
# HTTP/HTTPS doesn't support directory listing - always raise error | |
raise NotImplementedError(f"Cannot download file from {full_url}") | |
# HTTP/HTTPS doesn't support directory listing - always raise error | |
raise NotImplementedError(f"Directory listing is not supported for HTTP/HTTPS at {full_url}") |
tests/unit/test_client_http.py
Outdated
def test_split_url_https(): | ||
"""Test URL splitting for HTTPS URLs""" | ||
domain, path = HTTPClient.split_url("https://example.com/path/to/file.txt") | ||
assert domain == "example.com" | ||
assert path == "path/to/file.txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Missing test for split_url with query and fragment components.
Add tests for URLs containing query parameters and fragments to verify correct handling by split_url.
def test_split_url_https(): | |
"""Test URL splitting for HTTPS URLs""" | |
domain, path = HTTPClient.split_url("https://example.com/path/to/file.txt") | |
assert domain == "example.com" | |
assert path == "path/to/file.txt" | |
def test_split_url_https(): | |
"""Test URL splitting for HTTPS URLs""" | |
domain, path = HTTPClient.split_url("https://example.com/path/to/file.txt") | |
assert domain == "example.com" | |
assert path == "path/to/file.txt" | |
def test_split_url_with_query(): | |
"""Test URL splitting for URLs with query parameters""" | |
domain, path = HTTPClient.split_url("https://example.com/path/to/file.txt?foo=bar&baz=qux") | |
assert domain == "example.com" | |
assert path == "path/to/file.txt" | |
def test_split_url_with_fragment(): | |
"""Test URL splitting for URLs with fragment components""" | |
domain, path = HTTPClient.split_url("https://example.com/path/to/file.txt#section1") | |
assert domain == "example.com" | |
assert path == "path/to/file.txt" | |
def test_split_url_with_query_and_fragment(): | |
"""Test URL splitting for URLs with both query and fragment components""" | |
domain, path = HTTPClient.split_url("https://example.com/path/to/file.txt?foo=bar#section1") | |
assert domain == "example.com" | |
assert path == "path/to/file.txt" |
tests/unit/test_client_http.py
Outdated
def test_is_root_url(): | ||
"""Test root URL detection""" | ||
assert HTTPClient.is_root_url("https://example.com") | ||
assert HTTPClient.is_root_url("https://example.com/") | ||
assert not HTTPClient.is_root_url("https://example.com/path") | ||
assert not HTTPClient.is_root_url("https://example.com?query=1") | ||
assert not HTTPClient.is_root_url("https://example.com#fragment") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Missing test for is_root_url with HTTP (not HTTPS) URLs.
Please add tests for 'http://' URLs to ensure is_root_url works correctly for both HTTP and HTTPS.
def test_is_root_url(): | |
"""Test root URL detection""" | |
assert HTTPClient.is_root_url("https://example.com") | |
assert HTTPClient.is_root_url("https://example.com/") | |
assert not HTTPClient.is_root_url("https://example.com/path") | |
assert not HTTPClient.is_root_url("https://example.com?query=1") | |
assert not HTTPClient.is_root_url("https://example.com#fragment") | |
def test_is_root_url(): | |
"""Test root URL detection""" | |
# HTTPS URLs | |
assert HTTPClient.is_root_url("https://example.com") | |
assert HTTPClient.is_root_url("https://example.com/") | |
assert not HTTPClient.is_root_url("https://example.com/path") | |
assert not HTTPClient.is_root_url("https://example.com?query=1") | |
assert not HTTPClient.is_root_url("https://example.com#fragment") | |
# HTTP URLs | |
assert HTTPClient.is_root_url("http://example.com") | |
assert HTTPClient.is_root_url("http://example.com/") | |
assert not HTTPClient.is_root_url("http://example.com/path") | |
assert not HTTPClient.is_root_url("http://example.com?query=1") | |
assert not HTTPClient.is_root_url("http://example.com#fragment") |
tests/unit/test_client_http.py
Outdated
def test_from_name_with_https(): | ||
"""Test creating client from HTTPS URL""" | ||
cache = Mock(spec=Cache) | ||
client = HTTPSClient.from_name("https://example.com/path", cache, {}) | ||
assert client.protocol == "https" | ||
assert client.name == "example.com/path" | ||
assert client.PREFIX == "https://" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Missing test for from_name with URLs lacking protocol.
Add tests for from_name methods using URLs without a protocol (e.g., 'example.com/path') to confirm default protocol assignment.
def test_from_name_with_https(): | |
"""Test creating client from HTTPS URL""" | |
cache = Mock(spec=Cache) | |
client = HTTPSClient.from_name("https://example.com/path", cache, {}) | |
assert client.protocol == "https" | |
assert client.name == "example.com/path" | |
assert client.PREFIX == "https://" | |
def test_from_name_with_https(): | |
"""Test creating client from HTTPS URL""" | |
cache = Mock(spec=Cache) | |
client = HTTPSClient.from_name("https://example.com/path", cache, {}) | |
assert client.protocol == "https" | |
assert client.name == "example.com/path" | |
assert client.PREFIX == "https://" | |
def test_from_name_without_protocol(): | |
"""Test creating client from URL without protocol""" | |
cache = Mock(spec=Cache) | |
client = HTTPSClient.from_name("example.com/path", cache, {}) | |
assert client.protocol == "https" | |
assert client.name == "example.com/path" | |
assert client.PREFIX == "https://" |
tests/unit/test_client_http.py
Outdated
def test_get_full_path_http(): | ||
"""Test full path construction for HTTP""" | ||
cache = Mock(spec=Cache) | ||
client = HTTPClient("example.com:8080", {}, cache) | ||
|
||
assert client.get_full_path("file.txt") == "http://example.com:8080/file.txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Missing test for get_full_path with empty rel_path and with rel_path containing slashes.
Please add tests for get_full_path with an empty rel_path and with rel_path values that start with or contain multiple slashes to verify correct URL handling.
def test_get_full_path_http(): | |
"""Test full path construction for HTTP""" | |
cache = Mock(spec=Cache) | |
client = HTTPClient("example.com:8080", {}, cache) | |
assert client.get_full_path("file.txt") == "http://example.com:8080/file.txt" | |
def test_get_full_path_http(): | |
"""Test full path construction for HTTP""" | |
cache = Mock(spec=Cache) | |
client = HTTPClient("example.com:8080", {}, cache) | |
# Basic file path | |
assert client.get_full_path("file.txt") == "http://example.com:8080/file.txt" | |
# Empty rel_path | |
assert client.get_full_path("") == "http://example.com:8080/" | |
# rel_path starts with a slash | |
assert client.get_full_path("/file.txt") == "http://example.com:8080/file.txt" | |
# rel_path contains multiple slashes | |
assert client.get_full_path("dir//file.txt") == "http://example.com:8080/dir//file.txt" | |
assert client.get_full_path("/dir//file.txt") == "http://example.com:8080/dir//file.txt" |
src/datachain/client/http.py
Outdated
Construct full HTTP/HTTPS URL from relative path. | ||
Note: version_id is ignored as HTTP doesn't support versioning. | ||
""" | ||
if version_id: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): We've found these issues:
- Remove redundant conditional (
remove-redundant-if
) - Lift return into if (
lift-return-into-if
)
Code is Claude AI generated.
It enables:
Summary by Sourcery
Add HTTPClient to support read-only HTTP/HTTPS file operations and integrate it into the generic Client dispatch logic.
New Features:
Tests:
Summary by Sourcery
Enable read-only HTTP/HTTPS file operations by adding HTTPClient and HTTPSClient implementations using fsspec’s HTTPFileSystem, integrating them into the generic Client dispatch, and covering the functionality with unit tests.
New Features:
Tests: