@fhk (Contributor) commented on May 15, 2025

As per #1

I'd also like to add the following doc for future reference but couldn't decide which file it should live in.

## Example data

The example data is generated by the [GitHub Actions CI workflow](https://github.com/apple/embedding-atlas/blob/main/.github/workflows/ci.yml), which calls the following script:

[embedding-atlas/blob/main/packages/docs/generate_demo_data.py](https://github.com/apple/embedding-atlas/blob/main/packages/docs/generate_demo_data.py)

@donghaoren (Collaborator) commented:

Thanks! We don't have a README for how to build the docs yet. I'll make a note and add your suggestion later.

@donghaoren changed the title from "Add title from data set for example" to "docs: add back the title column from demo dataset" on May 16, 2025
@donghaoren merged commit cb2fbeb into apple:main on May 16, 2025
4 checks passed
davanstrien pushed a commit to davanstrien/embedding-atlas that referenced this pull request Nov 4, 2025
Add new 'export-hf' subcommand to upload atlas datasets to Hugging Face:

- Refactor CLI to use Click groups while maintaining backward compatibility
- embedding-atlas <file> continues to work as before (serve command)
- New command: embedding-atlas export-hf <file> --repo-id <repo>
- Features:
  - Validate required columns before upload
  - Support public/private repositories
  - Token from --token flag or HF_TOKEN env variable
  - Custom commit messages
  - Create PR option instead of direct commit
  - Dry-run mode to preview upload
  - Helpful error messages and next steps
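
The validation, token, and dry-run items above could look roughly like the following. This is only a sketch under stated assumptions, not the code from the referenced commit: the column names in `REQUIRED_COLUMNS`, the helper names, and the choice to let a dry run proceed without a token are all hypothetical.

```python
import os

# Hypothetical column names; the commit message does not list the real ones.
REQUIRED_COLUMNS = {"projection_x", "projection_y"}


def resolve_token(flag_token, env=None):
    """Prefer the --token flag value, then fall back to the HF_TOKEN env variable."""
    env = os.environ if env is None else env
    return flag_token or env.get("HF_TOKEN")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required columns absent from the dataset, sorted for stable messages."""
    return sorted(set(required) - set(columns))


def plan_upload(columns, repo_id, flag_token=None, dry_run=False, env=None):
    """Validate inputs and describe the upload without touching the network."""
    missing = missing_columns(columns)
    if missing:
        raise ValueError(f"missing required columns: {', '.join(missing)}")
    token = resolve_token(flag_token, env)
    # Assumption: a dry run only previews the upload, so it does not need a token.
    if token is None and not dry_run:
        raise ValueError("no token: pass --token or set HF_TOKEN")
    action = "would upload" if dry_run else "upload"
    return f"{action} to {repo_id}"
```

Checking columns before resolving the token means a malformed dataset is reported even when no credentials are configured.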

Backward compatibility:
- Existing CLI behavior unchanged
- main() function preserved as legacy entry point
- All existing options and flags work identically
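
One way to keep `embedding-atlas <file>` working next to a new subcommand is to insert a default subcommand when the first argument is not a known one. The sketch below illustrates that idea with the standard library's `argparse` rather than Click groups, so it is not the commit's implementation; `normalize_argv`, `build_parser`, and `parse` are hypothetical names, and only the flags shown in the commit message's examples are modeled.

```python
import argparse

# Subcommand names taken from the commit message above.
SUBCOMMANDS = {"serve", "export-hf"}


def normalize_argv(argv):
    """Insert the default 'serve' subcommand when the first argument is not a subcommand."""
    if argv and argv[0] in SUBCOMMANDS:
        return list(argv)
    return ["serve", *argv]


def build_parser():
    """Build a parser with one subparser per subcommand."""
    parser = argparse.ArgumentParser(prog="embedding-atlas")
    sub = parser.add_subparsers(dest="command", required=True)

    serve = sub.add_parser("serve")
    serve.add_argument("file")

    export = sub.add_parser("export-hf")
    export.add_argument("file")
    export.add_argument("--repo-id", required=True)
    export.add_argument("--private", action="store_true")
    export.add_argument("--dry-run", action="store_true")
    return parser


def parse(argv):
    """Parse legacy-style and subcommand-style invocations alike."""
    return build_parser().parse_args(normalize_argv(argv))
```

With this approach `parse(["atlas.parquet"])` is routed to `serve`, while `parse(["export-hf", "atlas.parquet", "--repo-id", "user/ds"])` reaches the new subcommand. One trade-off is that a data file literally named `serve` or `export-hf` would be read as a subcommand.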

Examples:
  embedding-atlas export-hf atlas.parquet --repo-id user/dataset
  embedding-atlas export-hf atlas.parquet --repo-id user/ds --private
  embedding-atlas export-hf atlas.parquet --repo-id user/ds --dry-run

This is a non-breaking change that adds new functionality.

Implements Issue apple#2
Depends on Issue apple#1
davanstrien pushed a commit to davanstrien/embedding-atlas that referenced this pull request Nov 4, 2025
Track progress of HF dataset integration implementation:
- Completed work (Issues apple#1 and apple#2)
- In-progress and pending issues
- Next steps for repository owner
- PR creation instructions
- Manual testing checklist
- Progress metrics