Can the developers provide an example JSONL file for running inference on unlabeled audio using DrCaps_Zeroshot_Audio_Captioning?
It appears that the dataset JSONL must have this form:
{"source": "/path/to/a_file.wav", "key": "", "target": "", "text": "", "similar_captions": ""}
but it is unclear to me what each field should contain. What should populate "target", "text", and "similar_captions"?
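For context, here is the kind of entry I have been constructing. This is only my guess: I set "key" to an identifier derived from the filename and left the caption-related fields empty, since the audio is unlabeled:

{"source": "/data/audio/dog_bark.wav", "key": "dog_bark", "target": "", "text": "", "similar_captions": ""}

Is leaving "target", "text", and "similar_captions" empty the intended usage for zero-shot inference, or do they need placeholder values?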
Thank you!