
shntnu (Collaborator) commented Oct 30, 2025

Onboards the cpg0044-uhler-multicells dataset and adds an automated onboarding workflow.

  • Add cpg0044-uhler-multicells to prefixes registry
  • Add onboard-dataset skill for automated Phase 1 onboarding
  • Add CLAUDE.md with repository guidance
  • Update contributing docs to mention automated workflows

Tracking: #159

shntnu and others added 4 commits October 30, 2025 14:12
Created a Claude Code skill to automate Phase 1 of the dataset
contribution workflow for maintainers. The skill:
- Gathers contributor information (either from an email or interactively)
- Assigns cpg#### identifiers
- Updates prefixes.md
- Creates GitHub Discussion with full dataset details
- Provides brief confirmation with discussion link

Added a Dataset Information section to the discussion template to capture
all relevant details upfront (identifier, contributor, assay type,
size, components). This makes the discussion the single source of
truth for tracking contributions.

Also added CLAUDE.md to provide Claude Code with repository context
and documentation structure guidance.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
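The "assigns cpg#### identifiers" step could look roughly like the sketch below. It is a hypothetical helper, not the skill's actual logic. It assumes every assigned identifier appears somewhere in prefixes.md as "cpg" plus a zero-padded four-digit number (as in cpg0044-uhler-multicells), and that the next identifier is simply the highest number plus one. The function name and sample text are illustrative only.

```python
import re

# Assumed identifier shape: "cpg" + exactly four digits (e.g. cpg0044),
# optionally followed by "-<tag>". The real prefixes.md layout is not shown here.
CPG_NUMBER = re.compile(r"\bcpg(\d{4})(?!\d)")


def next_identifier(prefixes_text: str, tag: str) -> str:
    """Return "cpg####-<tag>" using one more than the highest number found."""
    numbers = [int(m.group(1)) for m in CPG_NUMBER.finditer(prefixes_text)]
    next_number = max(numbers, default=0) + 1
    return f"cpg{next_number:04d}-{tag}"


if __name__ == "__main__":
    # Hypothetical placeholder entries, not real registry rows.
    sample = "cpg0042-tag-a\ncpg0043-tag-b\n"
    print(next_identifier(sample, "uhler-multicells"))  # cpg0044-uhler-multicells
```

Scanning for the maximum, rather than counting entries, keeps the sketch correct even if the registry is not sorted or has gaps.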
The gh CLI doesn't have a 'discussion create' command. Updated the skill to use the GraphQL API via 'gh api graphql' instead, which is the correct approach for creating GitHub Discussions programmatically.

Changes:
- Replace 'gh discussion create' with a two-step GraphQL approach
- Add instructions to query for repository/category IDs
- Add proper JSON escaping for discussion body
- Update error handling with fallback IDs
- Update preferred category to "Dataset descriptions"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
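The two-step GraphQL flow described above (query for repository/category IDs, then create the discussion) is sketched below in Python. This is an illustration, not the skill's create_discussion.sh, which uses 'gh api graphql' and jq. It relies on GitHub's public GraphQL schema (the repository.discussionCategories connection and the createDiscussion mutation). The function names, the category-lookup approach, and the placeholder IDs are assumptions. The resulting JSON string is the kind of body that would be POSTed to https://api.github.com/graphql; json.dumps handles the quote and newline escaping that the commit mentions.

```python
import json

# Step 1: look up the repository node ID and its discussion categories.
REPO_QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    id
    discussionCategories(first: 25) { nodes { id name } }
  }
}
"""

# Step 2: create the discussion with GitHub's createDiscussion mutation.
CREATE_MUTATION = """
mutation($repositoryId: ID!, $categoryId: ID!, $title: String!, $body: String!) {
  createDiscussion(input: {repositoryId: $repositoryId, categoryId: $categoryId,
                           title: $title, body: $body}) {
    discussion { url }
  }
}
"""


def find_category_id(repo_response: dict, category_name: str) -> str:
    """Pick a category ID by name (e.g. "Dataset descriptions") from the step-1 response."""
    nodes = repo_response["data"]["repository"]["discussionCategories"]["nodes"]
    for node in nodes:
        if node["name"] == category_name:
            return node["id"]
    raise KeyError(f"no discussion category named {category_name!r}")


def create_discussion_payload(repo_id: str, category_id: str, title: str, body: str) -> str:
    """Serialize the step-2 request; json.dumps escapes quotes, backslashes, and newlines."""
    return json.dumps({
        "query": CREATE_MUTATION,
        "variables": {
            "repositoryId": repo_id,
            "categoryId": category_id,
            "title": title,
            "body": body,
        },
    })


if __name__ == "__main__":
    # Fake step-1 response with placeholder IDs; a real one comes from the API.
    fake = {"data": {"repository": {"id": "R_placeholder", "discussionCategories": {
        "nodes": [{"id": "DIC_placeholder", "name": "Dataset descriptions"}]}}}}
    category = find_category_id(fake, "Dataset descriptions")
    print(create_discussion_payload("R_placeholder", category,
                                    "cpg0044-uhler-multicells", 'Body with "quotes"\nand a newline'))
```

Passing the title and body as GraphQL variables, rather than splicing them into the query string, means user-supplied text never has to be escaped into GraphQL syntax, only into JSON.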
- Extract discussion template from SKILL.md into a dedicated template file
  - New: .claude/skills/onboard-dataset/templates/discussion_template.md
  - Makes the template file the single source of truth (previously Discussion #66)
  - Follows Claude Code best practices for progressive disclosure

- Add create_discussion.sh helper script
  - New: .claude/skills/onboard-dataset/scripts/create_discussion.sh
  - Handles GraphQL discussion creation with proper JSON escaping via jq
  - Replaces complex inline bash commands in skill instructions

- Update SKILL.md with improved organization
  - Simplified from 267 to 222 lines (17% reduction)
  - Changed all paths to full paths relative to the repo root
  - Updated title format based on actual discussion patterns (e.g., "cpg####-tag")
  - Better separation of concerns: instructions vs. reference material

- Update contributing_to_cpg.md
  - Note that maintainers may use automated workflows for onboarding

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
shntnu (Collaborator, Author) commented Oct 30, 2025

@ErinWeisbart I am piloting a couple of ideas in this PR, which I will ask you to review if I decide to proceed with it.

But for now, this Airtable link:

Champion or Contributor fills out the metadata collection form (https://airtable.com/shrVxz9DcoMlDoCBI)

shows this warning:

There's an issue with this form
The form owner may need to upgrade their workspace in Airtable before this form can accept new responses. Notify the form owner to let them know that you'd like to submit a response.

  1. Are we still using that form?
  2. If so, can you figure out how to upgrade?

ErinWeisbart (Member) commented

  1. Yes and 2. Yes. I just tagged you in the internal Slack thread about the issue.

- Add "Required information" header with bold formatting for clarity
- Specify the 6 canonical Cell Painting stains and the 3-of-6 minimum for variations
- Add explicit examples for data size and institutional identifiers
- Document maintainer workflow steps after initial contact
- Fix typo in "For existing datasets" section

These changes reduce back-and-forth by making it clearer what information
contributors must provide upfront, particularly the institutional identifier
and project tag, which were previously easy to overlook.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
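One way to read the "3/6 minimum for variations" rule is as a simple overlap check. The sketch below is an interpretation, not the contributing docs' wording. It assumes the six canonical stains are those of the standard Cell Painting protocol (Hoechst 33342, concanavalin A, SYTO 14, phalloidin, wheat germ agglutinin, MitoTracker Deep Red) and that a variation qualifies if it uses at least three of them. The function name is hypothetical, and matching is by lowercase name only.

```python
# Assumed canonical stain list (standard Cell Painting protocol);
# the contributing docs may name or group these differently.
CANONICAL_STAINS = frozenset({
    "hoechst 33342",          # DNA
    "concanavalin a",         # endoplasmic reticulum
    "syto 14",                # nucleoli and cytoplasmic RNA
    "phalloidin",             # F-actin
    "wheat germ agglutinin",  # Golgi and plasma membrane
    "mitotracker deep red",   # mitochondria
})

MIN_SHARED_STAINS = 3  # the "3/6 minimum for variations"


def qualifies_as_cell_painting_variation(stains: list[str]) -> bool:
    """True if at least 3 of the reported stains are canonical Cell Painting stains."""
    reported = {s.strip().lower() for s in stains}
    return len(CANONICAL_STAINS & reported) >= MIN_SHARED_STAINS


if __name__ == "__main__":
    print(qualifies_as_cell_painting_variation(["Hoechst 33342", "Phalloidin", "MitoTracker Deep Red"]))  # True
    print(qualifies_as_cell_painting_variation(["Hoechst 33342", "Phalloidin", "DAPI"]))  # False
```

Exact name matching is deliberately strict; a real check would likely need a synonym table (for example, dye conjugate names) and is better treated as a prompt for a maintainer than as an automatic gate.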
shntnu (Collaborator, Author) commented Nov 1, 2025

Closing in favor of #161, #162, and #163.

shntnu closed this Nov 1, 2025
shntnu deleted the onboard-dataset-skill branch November 1, 2025 17:47

3 participants