StabiBerlin/xlsx-to-xml-mods-processing

 
 


xlsx-to-mods-xml conversion tool for Tibetan newspaper metadata

This repository contains two scripts for converting our project-internal metadata spreadsheet, Diverge_Tibetan Newspaper Metadata for XML v2.xlsx, to MODS-formatted XML.

Note that these scripts are for project use only and will not work outside of the Diverge project, because the conversion from Excel to XML relies on hard-coded column names. Columns and column names therefore cannot be modified, although new newspaper records (rows) can be added and existing information updated. To add, alter, or delete columns, get in touch with the developers.

The spreadsheet Diverge_Tibetan Newspaper Metadata for XML v2.xlsx contains the metadata for the newspapers in the Diverge corpus. It includes links to the original holdings and aggregates information from various sources, including the newspapers themselves, the catalogue entries of the libraries where original copies are held, and previous research.

How to use

The workflow runs in two steps, one per script. Before you start, make sure the Python scripts are stored in the same directory tree as the raw data and the target directory, at or above their level. For the first step, run the following command from the command line:

python flat-xml.py

You will then be prompted for the relevant file path inputs. This script generates “flat” XML for each record in the spreadsheet, where each column corresponds to a unique field in the XML.
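The idea of one XML field per spreadsheet column can be illustrated with a minimal sketch. It is not the code in flat-xml.py: the function names, the `record` root element, and the column headers `Title`, `Place of publication`, and `Notes` are hypothetical stand-ins, and the real spreadsheet columns are hard-coded in the project scripts.

```python
import re
import xml.etree.ElementTree as ET


def column_to_tag(column_name):
    """Turn a spreadsheet column header into a valid XML element name."""
    # Replace runs of characters that are not allowed in element names.
    tag = re.sub(r"[^A-Za-z0-9_.-]+", "_", column_name.strip()).strip("_")
    # XML names must start with a letter or underscore.
    if not tag or not re.match(r"[A-Za-z_]", tag):
        tag = "_" + tag
    return tag


def row_to_flat_xml(row, root_tag="record"):
    """Build a flat XML element with one child per non-empty column."""
    root = ET.Element(root_tag)
    for column_name, value in row.items():
        if value is None or str(value).strip() == "":
            continue  # skip empty cells
        child = ET.SubElement(root, column_to_tag(column_name))
        child.text = str(value).strip()
    return root


# Hypothetical row; these are not the real Diverge column names.
example_row = {
    "Title": "Example title",
    "Place of publication": "Example place",
    "Notes": "",
}
flat_record = row_to_flat_xml(example_row)
flat_xml = ET.tostring(flat_record, encoding="unicode")
print(flat_xml)
```

In this sketch, empty cells produce no element, and headers containing spaces are normalised with underscores so that they form valid element names.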

Once the “flat” XML files have been generated, run the second script, which converts the “flat” XML to MODS format compliant with the Berlin State Library’s metadata storage standards:

python mods-from-flat-xml.py
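To give a rough idea of what converting “flat” XML to MODS involves, the sketch below maps two fields of a flat record onto standard MODS elements (`titleInfo/title` and `originInfo/place/placeTerm`). It is an illustration only: the flat field names `Title` and `Place_of_publication` and the function `flat_to_mods` are assumptions, and the actual mapping and the Berlin State Library profile implemented in mods-from-flat-xml.py are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Namespace of the MODS version 3 schema.
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def mods_tag(name):
    """Return the namespace-qualified tag for a MODS element."""
    return f"{{{MODS_NS}}}{name}"


def flat_to_mods(flat_record):
    """Map a flat record onto a small MODS tree (hypothetical field names)."""
    mods = ET.Element(mods_tag("mods"))

    title = flat_record.findtext("Title")
    if title:
        title_info = ET.SubElement(mods, mods_tag("titleInfo"))
        ET.SubElement(title_info, mods_tag("title")).text = title

    place = flat_record.findtext("Place_of_publication")
    if place:
        origin_info = ET.SubElement(mods, mods_tag("originInfo"))
        place_el = ET.SubElement(origin_info, mods_tag("place"))
        place_term = ET.SubElement(place_el, mods_tag("placeTerm"), {"type": "text"})
        place_term.text = place

    return mods


# Hypothetical flat record, not taken from the Diverge spreadsheet.
flat_record = ET.fromstring(
    "<record><Title>Example title</Title>"
    "<Place_of_publication>Example place</Place_of_publication></record>"
)
mods_record = flat_to_mods(flat_record)
mods_xml = ET.tostring(mods_record, encoding="unicode")
print(mods_xml)
```

The nesting is the main difference from the flat form: a single flat field may end up several levels deep in the MODS tree, sometimes with attributes such as `type="text"`.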

As before, you will receive prompts to input your paths to files or folders for conversion. The MODS converter can handle single records or a directory containing only XML files.
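Accepting either a single record or a directory can be handled by resolving the entered path into a list of XML files before conversion. The following is a minimal sketch under that assumption; `collect_xml_inputs` is a hypothetical helper and not necessarily how the MODS converter does it.

```python
from pathlib import Path


def collect_xml_inputs(path_text):
    """Return the XML files named by a path to a single file or a directory."""
    path = Path(path_text).expanduser()
    if path.is_file():
        return [path]
    if path.is_dir():
        # Keep only regular files with an .xml extension, in a stable order.
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() == ".xml"
        )
    raise FileNotFoundError(f"No file or directory at {path}")
```

For example, calling `collect_xml_inputs` on a folder of flat XML files would return every `.xml` file directly inside it, while passing one file returns a one-element list. This sketch filters by extension, so it tolerates stray non-XML files, whereas the converter described above expects the directory to contain only XML files.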
