
Metadata curation mode

Dmytro Titov edited this page Sep 26, 2018 · 5 revisions

Sep 26, 2018

Present: Dmytro, Sveinung

We are going to have three versions: a version of the raw content, a version of the curated content, and a version of the standardized content (built by applying attribute mappings). These versions are not exposed to the end user individually. Instead, we expose a "combined" version that concatenates the three, e.g. 1-2-3, where 1 is the raw content version, 2 the curated content version, and 3 the standardized content version. Each component is numbered from 0. For the raw content, 0 denotes its first version; for the curated content, 0 means "no curation" (content as is); for the standardized content, 0 means "no standardization" (standardized content is absent).
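
A minimal sketch of how such a combined version could be modelled, assuming the three components are non-negative integers joined by hyphens as in the 1-2-3 example. The class name `CombinedVersion` and its helpers are hypothetical and not taken from the TrackFind code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CombinedVersion:
    """Hypothetical model of the combined version described above."""

    raw: int           # 0 = first version of the raw content
    curated: int       # 0 = no curation (content as is)
    standardized: int  # 0 = no standardization (standardized content absent)

    def __post_init__(self):
        # Reject negative components; numbering starts at 0.
        for name in ("raw", "curated", "standardized"):
            value = getattr(self, name)
            if value < 0:
                raise ValueError(f"{name} version must be >= 0, got {value}")

    @classmethod
    def parse(cls, text: str) -> "CombinedVersion":
        """Parse a string such as '1-2-3' into its three components."""
        parts = text.split("-")
        if len(parts) != 3:
            raise ValueError(f"expected three components, got {text!r}")
        return cls(*(int(part) for part in parts))

    def __str__(self) -> str:
        return f"{self.raw}-{self.curated}-{self.standardized}"

    @property
    def is_curated(self) -> bool:
        return self.curated > 0

    @property
    def is_standardized(self) -> bool:
        return self.standardized > 0
```

With this sketch, `CombinedVersion.parse("1-0-0")` would describe the second raw version with neither curation nor standardization applied.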

Aug 23, 2018, Skype session

Present: Dmytro, Sveinung

In order to create cool functionality for metadata providers/curators, we brainstormed how such features might look in TrackFind:

The features would live in one or more separate tabs in the admin interface (the existing interface could become a "Mapping" tab). We thought of three main features:

  1. Manually edit the fields of a single dataset.
  • It is unclear how this might look. The tree structure makes it a bit problematic from a UI point of view. One possibility is to reuse the current tree structure, displaying only the single value (or list, if applicable) that each attribute has in the dataset. The top level would then be the ID (or similar) of each dataset that was found/selected. For editing, it might be possible to edit the fields directly in the tree, or to show a text box when a field is clicked, or something similar.
  • The tree structure could also be used instead of the JSON output for showing search results in the search interface.
  2. Manually edit values/attribute names across multiple datasets.
  • Just use the tree structure view as implemented now. When a value or attribute name is selected, one could edit it (in a text box or similar, as in point 1 above). All datasets with the selected value or attribute name are updated in the index, and the tree is updated to show the edited contents.
  • This should be relatively quick (a wait of at most a couple of seconds).
  • An alternative to on-the-fly updates (if the index update process is too slow) is to build up a batch of edit operations and apply them all at once later (at the click of a "Save" button or similar).
  3. All edit operations as described above should be stored in a history.
  • The history should be human-readable for the user (shown in an infobox or similar in the UI).
  • It should be possible to apply all the changes in batch mode to the underlying remote database, e.g. in the TrackHub Registry (e.g. by converting them to SQL statements, REST API calls, or similar). We need to postpone this part until we have knowledge of the remote system.
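
The second and third features above (editing a value across multiple datasets, batching the edits behind a "Save" action, and keeping a readable history) can be sketched roughly as follows. This is only an illustration under stated assumptions: datasets are modelled as plain nested dictionaries rather than the Lucene index, and `RenameValue`, `apply_edit`, and `EditSession` are hypothetical names that do not exist in TrackFind.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class RenameValue:
    """Hypothetical edit operation: replace one value of an attribute."""

    attribute: tuple  # path into the dataset tree, e.g. ("sample", "tissue")
    old: Any
    new: Any

    def describe(self) -> str:
        """Human-readable description for the history."""
        path = " -> ".join(self.attribute)
        return f"{path}: {self.old!r} changed to {self.new!r}"


def apply_edit(dataset: dict, edit: RenameValue) -> bool:
    """Apply one edit to one dataset in place; return True if it changed."""
    node = dataset
    for key in edit.attribute[:-1]:
        node = node.get(key)
        if not isinstance(node, dict):
            return False  # the attribute path does not exist in this dataset
    leaf = edit.attribute[-1]
    if node.get(leaf) == edit.old:
        node[leaf] = edit.new
        return True
    return False


@dataclass
class EditSession:
    """Collects edits and applies them in one batch on save."""

    pending: list = field(default_factory=list)
    history: list = field(default_factory=list)

    def queue(self, edit: RenameValue) -> None:
        self.pending.append(edit)

    def save(self, datasets: list) -> int:
        """Apply all pending edits; return the number of changed values."""
        changed = 0
        for edit in self.pending:
            hits = sum(apply_edit(dataset, edit) for dataset in datasets)
            self.history.append(f"{edit.describe()} ({hits} dataset(s))")
            changed += hits
        self.pending.clear()
        return changed
```

Queuing `RenameValue(("sample", "tissue"), "Liver", "liver")` and calling `save` on a list of datasets would normalize that value wherever it occurs and add one line to `history`. Because each history entry is derived from a structured edit object, the same objects could later be translated into SQL statements or REST API calls once the remote system is known.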

Technical refactoring possibility:

  • The current implementation uses Apache Lucene for storing data, indexing, and search. This relies on a custom dataset format and requires conversion to/from JSON in many places.
  • It might be better to use a JSON-friendly database system or similar instead (e.g. MongoDB).
  • We would still like to have versioning support, which will be a challenge to make work with another backend.
  • We postpone this issue for now (but will investigate a bit).
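
To illustrate the kind of to/from JSON conversion that a flat, field-based index tends to require, here is a minimal sketch that flattens a nested JSON document into path/value pairs and rebuilds it. It is not TrackFind's actual dataset format: the `>` separator and the function names `flatten` and `unflatten` are assumptions made for illustration. The round trip is also lossy in known ways (single-element lists come back as scalars, and empty lists or objects are dropped), which hints at why a JSON-native store might be simpler.

```python
def flatten(doc: dict, prefix: str = "", sep: str = ">") -> list:
    """Turn nested JSON into (path, value) pairs, one pair per leaf value."""
    pairs = []
    for key, value in doc.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            pairs.extend(flatten(value, path, sep))
        elif isinstance(value, list):
            # A multi-valued attribute becomes several pairs with the same path.
            pairs.extend((path, item) for item in value)
        else:
            pairs.append((path, value))
    return pairs


def unflatten(pairs, sep: str = ">") -> dict:
    """Rebuild a nested document from (path, value) pairs."""
    doc = {}
    for path, value in pairs:
        *parents, leaf = path.split(sep)
        node = doc
        for key in parents:
            node = node.setdefault(key, {})
        if leaf in node:
            # A repeated path means a multi-valued attribute: collect a list.
            existing = node[leaf]
            if isinstance(existing, list):
                existing.append(value)
            else:
                node[leaf] = [existing, value]
        else:
            node[leaf] = value
    return doc
```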
