Enhance Media Support, UI Improvements, and Pipeline Robustness with Gemini Integration and Video Processing Features#47

Merged: navidshad merged 89 commits into main from dev on Apr 7, 2026

@navidshad
Owner

@navidshad navidshad commented Apr 6, 2026

📋 Summary

This PR introduces enhancements and refactoring across the project, including support for multiple media formats such as images and videos. It adds image upscaling through Gemini creative re-rendering, plus image iteration and refinement that pass attached images as visual context to the intent and generation phases. The video processing pipeline gains yt-dlp integration for video downloads, real-time progress tracking, resolution selection, and playback support within the UI.

Additionally, the PR refactors various UI components for better maintainability and usability, including redesigning graph nodes with interactive previews, upgrading chat inputs to support images, and improving modal layouts. Background task management and cancellation support via AbortSignal are introduced, along with robust retry logic and error handling for Gemini API calls.
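
As a rough illustration of the cancellation support mentioned above, the sketch below shows one way a pipeline could check an `AbortSignal` between phases. The `TaskContext` shape, the `Phase` type, and the `runPhases` helper are hypothetical names chosen for this example and are not taken from the project's code; only `AbortController` and `AbortSignal` are standard APIs.

```typescript
// Hypothetical task context; the PR only states that an AbortSignal is added to it.
interface TaskContext {
  signal: AbortSignal;
}

// Hypothetical phase signature: each phase receives the context and returns a label.
type Phase = (ctx: TaskContext) => Promise<string>;

// Runs phases in order and refuses to start the next phase once the signal is aborted.
async function runPhases(phases: Phase[], ctx: TaskContext): Promise<string[]> {
  const completed: string[] = [];
  for (const phase of phases) {
    if (ctx.signal.aborted) {
      throw new Error("Pipeline cancelled");
    }
    completed.push(await phase(ctx));
  }
  return completed;
}
```

Checking the signal only between phases keeps the sketch short; a long-running phase (for example an FFmpeg task) would also need to observe `ctx.signal` internally to stop promptly.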

The repository documentation and project branding have been updated accordingly, and a global design system incorporating custom colors and ambient animations has been integrated.

🔗 Related Tasks

  • #86ex50815 - Implement multimodal intent recognition, image support, and intelligent reference supply controller
  • #86ex3gqx6 - Add video link support, yt-dlp integration, video resolution selection, and improved download reliability
  • #86ex2rmyn - Implement automated thumbnail generation pipeline with background scene enrichment and Gemini 3.1 Flash support
  • #86ex2bna2 - Add markdown rendering, persistent graph node positioning, and drag-and-drop support for ConversationNode and ResultNode
  • #86ewqdkht - Fix transcript parser and update transcript enrichment with visual metadata
  • #86ewqdkec - Implement background task management with UI status updates and cancellation support
  • #86ex41t5v - Add temporary directory safety checks, UI warnings, system instruction support, and professional thumbnail design prompts
  • #86ex4147w - Add copy-to-clipboard functionality and auto-resizing chat inputs with keyboard shortcuts
  • #86ex3ghzu - Integrate AI-generated release notes into GitHub release workflow
  • #86ex3gkk8 - Add video controller cost details and visual metadata display in timeline segments

📝 Additional Details

  • Centralized temporary directory path management with safety checks and subdirectory organization to prevent file deletion outside generated assets.
  • Lazy initialization of managers and improved artifact detection with UI feedback to enhance stability.
  • Enhanced thumbnail generation prompts and image processing logic for better output quality.
  • Introduced pipeline cancellation and retry mechanisms with exponential backoff for resilience against transient errors.
  • Improved download and video playback UI with real-time progress indicators and resolution-specific options.
  • Refactored UI components to standardize styles, disabled button states, and incorporate a new global design system featuring glassmorphism and ambient animations.
  • Documentation updates include supported media formats, optimization details, and project rebranding to FrameFlow.
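
The temporary directory safety check in the first bullet can be pictured with a small path-containment guard. This is a sketch under assumptions, not the project's implementation: the helper names `isInsideDirectory` and `assertDeletable` and the `generated` folder name used in the usage note are hypothetical. Only the Node.js `path` functions are real APIs.

```typescript
import * as path from "node:path";

// Returns true only when `target` resolves to a location strictly inside `root`.
function isInsideDirectory(root: string, target: string): boolean {
  const relative = path.relative(path.resolve(root), path.resolve(target));
  return (
    relative !== "" &&                      // the root itself is not "inside"
    relative !== ".." &&                    // direct parent of root
    !relative.startsWith(".." + path.sep) && // any path that climbs out of root
    !path.isAbsolute(relative)              // e.g. a different drive on Windows
  );
}

// Hypothetical guard to call before any delete operation.
function assertDeletable(generatedRoot: string, target: string): void {
  if (!isInsideDirectory(generatedRoot, target)) {
    throw new Error(`Refusing to delete outside generated assets: ${target}`);
  }
}
```

For example, with a root of `/tmp/FrameFlow/generated`, a target of `/tmp/FrameFlow/generated/../../secret` resolves outside the root and is rejected before anything is deleted.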

📜 Commit List

  • 188f225 docs: add supported media formats and optimization details to README
  • aeabbc7 docs: simplify dashboard screenshot styling in README
  • 572d04f docs: remove Brain pipeline diagram and reproducibility link from README
  • c28b12f style: update MediaNode metadata text colors to support light mode and refresh screenshot
  • 0268bc2 docs: update project banner and description layout in README
  • f4049fc docs: reorder README header elements and reposition banner image
  • 91e899b docs: update README with new interface screenshots and project attribution
  • 16b2677 feat: enable thinking configuration with 8000 budget in image generation adapter
  • 905cfce feat: implement image upscaling functionality with Gemini creative re-rendering and update settings UI to support new operation
  • a1086a7 refactor: centralize temporary directory path constants and restrict file deletion to generated assets
  • a47a399 feat: add support for image iteration and refinement by passing attached images as visual context to the intent and generation phases
  • 726cf45 feat: skip image extraction task if image text data is already cached
  • ddf8a91 feat: enable Gemini thinking mode and implement robust response text extraction to filter out thought blocks
  • 03bb720 feat: update prompt instructions to use generic descriptors instead of real-world names for privacy compliance
  • 591d1d4 feat: extract model text output from Gemini adapter and display it in ThumbnailNode UI
  • 5b8c23c refactor: implement dynamic node height calculation for graph layout and improve thumbnail generation prompt and image processing logic
  • 09617f0 refactor: improve background task error handling, retry logic, and transient error detection for Gemini adapter
  • 1c01678 refactor: implement lazy initialization for managers and add missing artifact detection with UI feedback
  • ada3a464 refactor: include frame paths in scene descriptions to improve reference frame retrieval in the supply phase
  • ab07cc3 feat: implement automatic thread path repair and synchronization when changing temp directory
  • e541405 feat: implement real-time thread updates across windows and improve attachment modal layout
  • fe60fb7 style: standardize disabled button states across UI pages with consistent colors and transitions
  • bcc229c feat: update temporary directory path to include FrameFlow subdirectory and ensure its creation
  • 08da113 refactor: update UploadPage UI text and improve code formatting
  • 7da8681 chore: rebrand project to FrameFlow and update documentation accordingly
  • bc4cf92 Merge pull request #46 from navidshad/CU-86ex50815_Implement-image-support-as-initial-resource-apart-link-and-video-file_Navid-Shad: Implement multimodal intent recognition, image attachments, and UI improvements for AttachmentModal #86ex50815
  • 19bbec7 feat: implement multimodal intent recognition and intelligent reference supply controller
  • b2ba3ae style: increase grid column count in AttachmentModal for better layout density #86ex50815
  • 17341c6 refactor: replace manual input implementations with BaseMessageInput component across all graph nodes to support image attachments #86ex50815
  • 0ecfb61 refactor: redesign AttachmentModal using shared components and unify image source handling #86ex50815
  • 33b697e feat: integrate multi-image processing pipeline and image-only graph threads (#86ex50815)
  • a5f55f0 refactor: decompose ResultNode into specialized SummaryNode, ThumbnailNode, and VideoNode components for better maintainability
  • 71d2439 refactor: improve yt-dlp download reliability with path normalization, ffmpeg binary resolution, and robust thread directory management #86ex3gqx6
  • 23a3b9d Merge remote-tracking branch 'origin/dev' into CU-86ex3gqx6_Implement-link-support-then-user-is-able-to-provide-video-link-and-start-a-project_somayeh-roohani
  • d546e4e Merge pull request #45 from navidshad/CU-86ex3gw92_Add-gemini-batch-support-for-scene-analysis_Navid-Shad: feat: implement robust retry logic with exponential backoff for Gemini API calls and add batch support for scene description model and image-based structured generation #0a4910d
  • 0a4910d feat: implement robust retry logic with exponential backoff for Gemini API calls and add batch support for scene description model and image-based structured generation
  • 09a22da refactor: replace manual child_process spawning with ytdlp-nodejs wrapper for yt-dlp operations #86ex3gqx6
  • e5bcb86 refactor: make pipeline execution asynchronous and add loading state to summary creation UI #86ex3gqx6
  • e5d7b47 feat: add real-time download progress tracking and UI visualization for video downloads #86ex3gqx6
  • 7ff8b36 feat: add video metadata retrieval and display in ResultNode component
  • 57b6856 refactor: remove hover-based opacity transitions and update overlay z-indexing for media nodes
  • 6f63783 feat: implement video resolution selection by adding format fetching and resolution-specific download support #86ex3gqx6
  • b1c2bcb feat: implement video URL download support using yt-dlp integration #86ex3gqx6
  • e06201b feat: add status field to background tasks and display it in MediaNode UI
  • e6cad9b feat: add AbortSignal to task context for cancellation support
  • 96915c0 refactor: organize temporary files into subdirectories, implement immediate usage recording with abort checks, and add stop confirmation UI for pipeline tasks
  • 33f5fe5 refactor: ensure usage is recorded immediately and add stop confirmation UI for pipeline tasks
  • 69b59a4 feat: implement pipeline cancellation support using AbortSignal across FFmpeg tasks and processing phases
  • c84b74e feat: add system instruction support to Gemini adapter and integrate professional thumbnail design prompts #86ex41t5v
  • 8a42dc1 feat: add temporary directory safety checks and UI warnings for unstable storage paths #86ex41t5v
  • 4105e65 feat: add copy-to-clipboard functionality to conversation messages #86ex4147w
  • 0026d25 feat: upgrade chat inputs to auto-resizing textareas with consistent focus styling and keyboard shortcuts #86ex4147w
  • e9cd917 docs: update PilotUI documentation with source URLs, improved navigation, and corrected Button component props
  • 347ce54 feat: implement global design system with custom colors, glassmorphism components, and ambient animations
  • 2341b3b feat: integrate AI-generated release notes into the GitHub release workflow #86ex3ghzu
  • 54ac31a Merge pull request #38 from navidshad/CU-86ex3gkk8_Add-cost-detail-Video-controller-Video-detail-Image-detail_Navid-Shad: Refactor header layout and ResultNode UI; enhance timeline segments with visual metadata #86ex3gkk8
  • 7ce8506 refactor: update GraphChatPage header layout with constrained title width and repositioned cost display #86ex3gkk8
  • 54efd4b feat: upgrade timeline segments to include visual metadata and update UI to display segment details. #86ex3gkk8
  • eaf3698 refactor: rename videoUrl to mediaContentUrl and add image support to ResultNode component #86ex3gkk8
  • 874788d refactor: redesign ResultNode UI with media-centric layout and enhanced control overlays #86ex3gkk8
  • 9bd4d73 feat: introduce EnrichedTimelineSegment and update transcript enrichment to merge visual descriptions into every segment #86ewqdkht
  • 12ce900 feat: add waitForEnrichTranscript pipeline phase and remove inline transcript enrichment logic #86ewqdkht
  • 30b811b refactor: replace SRT format with line-based transcript format for improved segment indexing #86ewqdkht
  • 6f3a146 feat: implement recursive message branch deletion and add UI controls for branching and node removal #86ex2rmyn
  • 7b8a56c feat: implement automated thumbnail generation pipeline, add Gemini 3.1 Flash support, and update UI to display generated thumbnails. #86ex2rmyn
  • 07ba7d6 chore: release 1.1.6 [skip ci]
  • ed0c467 feat: add markdown rendering support to ConversationNode and ResultNode components #86ex2bna2
  • 104eee1 feat: add version and file type badges to ResultNode and include version in graph message data #86ex2bna2
  • 193c978 feat: implement persistent graph node positioning with drag-and-drop support #86ex2bna2
  • 57087e0 feat: add draggable handle to ConversationNode and restrict drag interaction to specific UI elements #86ex2bna2
  • a87994b feat: redesign graph nodes with interactive video previews and add ConversationNode component #86ex2bna2
  • 5439720 feat: Implement video playback and download functionality in result nodes and pass the user message ID to the video processing pipeline.
  • ad82624 feat: Add extensive debug logging and improve asynchronous handling within the video processing pipeline.
  • 4f2595e feat: Implement retry functionality for message processing and preprocessing tasks with enhanced pipeline context and UI feedback.
  • d717e4c feat: Implement robust cross-platform scenedetect binary and module path resolution.
  • 5926744 feat: introduce Vue Flow graph-based chat interface for parallel tasks #86ex2bna2
  • 033beb8 feat: Implement background task management for preprocessing and update UI to reflect task status. #86ewqdkec
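
Commits 0a4910d and 09617f0 above describe retry logic with exponential backoff and transient error detection for Gemini API calls. A minimal sketch of that general technique follows. The `withRetry` and `isTransient` helpers, the default attempt count and delay, and the choice of HTTP status codes 429 and 503 as "transient" are all assumptions made for illustration; the PR does not show the adapter's actual criteria.

```typescript
// Hypothetical transient-error check; the real adapter's rules are not shown in this PR.
function isTransient(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const status = (error as { status?: unknown }).status;
  return status === 429 || status === 503;
}

// Retries `operation` on transient errors, doubling the delay after each failed attempt.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Give up on the last attempt or when the error is not worth retrying.
      if (attempt >= maxAttempts || !isTransient(error)) throw error;
      const delayMs = baseDelayMs * 2 ** (attempt - 1); // baseDelayMs, then 2x, 4x, ...
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

A production version would likely add random jitter to the delay and accept an `AbortSignal` so that a cancelled pipeline does not keep waiting between attempts.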

navidshad and others added 30 commits February 25, 2026 01:11
…nd-Tasks_Navid-Shad

feat: Implement background task management for preprocessing and update UI to reflect task status #86ewqdkec
…cessing tasks with enhanced pipeline context and UI feedback.
…odes and pass the user message ID to the video processing pipeline.
…asks_Navid-Shad

Support parallel tasks #86ex2bna2
….1 Flash support, and update UI to display generated thumbnails. #86ex2rmyn
…l-Generation-Pipeline-with-Background-Scene-Enrichment_Navid-Shad

Implement Recursive Message Branch Deletion, Automated Thumbnail Generation, and UI Enhancements #86ex2rmyn
…ent to merge visual descriptions into every segment #86ewqdkht
…ser-in-correction-stage_Navid-Shad

Enhance Transcript Processing with EnrichedTimelineSegment, Pipeline Phase, and Improved Format #86ewqdkht
…idth and repositioned cost display #86ex3gkk8
…deo-controller-Video-detail-Image-detail_Navid-Shad

Refactor header layout and ResultNode UI; enhance timeline segments with visual metadata #86ex3gkk8
…ce supply controller

- Multimodal Context: Updated GeminiAdapter and thread context to aggregate and send user-selected images to the intent recognizer.
- Visual-First Intent: Optimized determineIntent to prioritize Enriched Timeline Segments (scene descriptions) instead of raw transcripts for visual tasks.
- Intelligent Supply Controller: Introduced a new pipeline phase to manage reference images. It strictly uses user attachments if provided, or intelligently selects a subset of video frames based on AI intent to avoid token overflows.
- Reliability Fixes: Added automatic directory creation in GeminiAdapter to prevent ENOENT errors during image generation.
- Performance: Limited intent image history to the last 8 images to maintain low latency and context relevance.
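
The selection rule in this commit message (user attachments take priority, otherwise a limited subset of video frames, with intent history capped at the last 8 images) could be sketched roughly as below. The function names, the `rankedFrames` parameter, and the assumption that frames arrive already ordered by relevance to the intent are hypothetical; the commit does not describe how frames are ranked.

```typescript
// From the commit message: intent image history is limited to the last 8 images.
const MAX_INTENT_IMAGES = 8;

// Keeps only the most recent images for the intent recognizer.
function limitIntentImages(imagePaths: string[], limit: number = MAX_INTENT_IMAGES): string[] {
  return limit > 0 ? imagePaths.slice(-limit) : [];
}

// User attachments win outright; otherwise fall back to a capped subset of frames.
function selectReferenceImages(
  attachments: string[],
  rankedFrames: string[], // hypothetical: frames pre-ordered by relevance to the intent
  maxFrames: number,
): string[] {
  if (attachments.length > 0) return attachments;
  return rankedFrames.slice(0, Math.max(0, maxFrames));
}
```

Capping both lists bounds the number of images sent per request, which is the token-overflow concern the commit message raises.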

Task ID: #86ex50815
…pport-as-initial-resource-apart-link-and-video-file_Navid-Shad

Implement multimodal intent recognition, image attachments, and UI improvements for AttachmentModal #86ex50815
navidshad added 25 commits April 6, 2026 18:53
…and improve thumbnail generation prompt and image processing logic
…hed images as visual context to the intent and generation phases
…-rendering and update settings UI to support new operation.
@navidshad navidshad changed the title from "Dev" to "Enhance Media Support, UI Improvements, and Pipeline Robustness with Gemini Integration and Video Processing Features" on Apr 7, 2026
@navidshad navidshad merged commit 8af65d3 into main Apr 7, 2026
1 check passed

2 participants