-
Notifications
You must be signed in to change notification settings - Fork 191
[Doc Review 25.09] Huy - Video Concepts + Text Advanced #1270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Updated section title to indicate review process. Signed-off-by: Huy Vu <[email protected]>
Greptile OverviewGreptile SummaryThis PR implements documentation improvements across video concepts and text processing sections, enhancing clarity, consistency, and completeness. Most changes are minor editorial improvements including wording clarifications, grammar fixes, markdown link formatting, and added explanatory content for parameters and workflows. Key improvements:
Critical issue:
Confidence Score: 3/5
Important Files ChangedFile Analysis
Sequence DiagramsequenceDiagram
participant Author as Documentation Author
participant PR as Pull Request
participant Review as Review Process
participant Docs as Documentation System
Author->>PR: Submit doc improvements
Note over PR: 14 files changed<br/>Video concepts + Text processing
PR->>Review: Trigger review
Review->>Docs: Check video/architecture.md
Docs-->>Review: ✓ Wording improvements
Review->>Docs: Check video/data-flow.md
Docs-->>Review: ✓ Enhanced explanations
Review->>Docs: Check deduplication docs
Docs-->>Review: ✓ Frontmatter fix, clarifications
Review->>Docs: Check quality-assessment docs
Docs-->>Review: ✓ URL formatting, improvements
Review->>Docs: Check specialized-processing/code.md
Docs-->>Review: ⚠️ Debug comment found (line 48)
Review->>Docs: Check specialized-processing/index.md
Docs-->>Review: ⚠️ Debug comment found (line 79)
Review->>PR: Report findings
Note over PR: 2 critical issues<br/>Must remove debug comments
PR-->>Author: Request changes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 file reviewed, 1 comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 file reviewed, 1 comment
af89aa2 to
f8bdf55
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
9 files reviewed, 2 comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
14 files reviewed, no comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
15 files reviewed, 3 comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
15 files reviewed, 2 comments
27a1d6d to
b50dce7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
15 files reviewed, 6 comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
14 files reviewed, 1 comment
|
/ok to test 93ad6f6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
14 files reviewed, no comments
93ad6f6 to
52967c2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
14 files reviewed, 2 comments
|
|
||
| # Add filter stages for code quality | ||
| pipeline.add_stage(ScoreFilter( | ||
| ## NEED FIX: TypeError: ScoreFilter.__init__() got an unexpected keyword argument 'score_fn' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
syntax: debugging comment left in production docs
| ## NEED FIX: TypeError: ScoreFilter.__init__() got an unexpected keyword argument 'score_fn' |
| ) | ||
| ]) | ||
|
|
||
| ## NEED FIX: NameError: name 'code_dataset' is not defined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
syntax: debugging comment left in production docs
| ## NEED FIX: NameError: name 'code_dataset' is not defined |
sarahyurick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! It seems like there is a lot of unnecessary repetitions in the deduplication pages, maybe they can be cut down?
| @@ -1,3 +1,4 @@ | |||
| --- | |||
| description: "Identify and remove exact duplicates using MD5 hashing in a Ray-based workflow" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally haven't tested TextDuplicatesRemovalWorkflow with exact deduplication. Were you able to verify it?
| @@ -1,3 +1,4 @@ | |||
| --- | |||
| description: "Identify and remove exact duplicates using MD5 hashing in a Ray-based workflow" | |||
| categories: ["how-to-guides"] | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This page still needs more updates imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we confirmed cloud storage works with the deduplication modules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tested with cloud storage due to it's require more complicated settings.
Although I have discussed to Lawrence Lane, due to the cloud storage is not Dedup-exclusive, but for text-curator in general, it is not needed in Dedup docs. This would help the Dedup docs more concise and to the point.
📄 Pages to Review
Description
Usage
# Add snippet demonstrating usageChecklist