This folder contains the scripts used to upload data to the database. The scripts are documented below, followed by the database upload log.
The csv_to_json.py script converts CSV files to JSON for database upload. For example, it can convert the shared Google Sheet, where the archive information is stored and edited, to a JSON file.
The script accepts the path to the CSV file and the CSV type as arguments.
Currently the script supports archive or stickers CSVs exported from the shared Google
Sheet.
Note that if archive is provided, the script will also update the primary image
metadata associated with the response.
- Make sure you have the Python dependencies installed, found in requirements.txt.
- Download the data from the Google Sheet as a CSV and save it to a known location, preferably within the scripts/data folder.
- Run the script with the following command:
  python3 scripts/csv_to_json.py <path> <type>
  where <path> is the path to the CSV file and <type> is either archive or stickers.
- The script will create appropriate JSON files in scripts/data/tmp, which can then be uploaded to the database using the upload script.
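The conversion step can be pictured with a minimal sketch. This is not the code in csv_to_json.py: the function name csv_rows_to_json, the sample column names, and the choice to turn empty cells into null are all assumptions made for illustration.

```python
import csv
import io
import json


def csv_rows_to_json(csv_text: str) -> str:
    """Convert CSV text (header row first) into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        # Assumption: empty cells become null instead of empty strings.
        rows.append({key: (value if value != "" else None) for key, value in row.items()})
    # ensure_ascii=False keeps non-Latin text readable in the output file.
    return json.dumps(rows, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Hypothetical columns; the real sheets define their own headers.
    sample = "ID,shop_name\n1,Example shop\n2,\n"
    print(csv_rows_to_json(sample))
```

The output is an array of objects, one per CSV row, which is the shape the upload script expects as input.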
The main script is upload.js and can be used as a command line
tool.
Use the following command to see the help page.
node scripts/upload.js --help
The basic workflow is to run the script with a JSON filepath as the first
argument and the upload type as the second argument (e.g. archive or
sticker). The script will convert the JSON objects to Mongoose models and
upload them to the database.
Some example JSON files are provided in the example-data folder to show how
the data should be formatted when running the upload script.
- Example file: upload-archive.json
- Command: node scripts/upload.js example-data/upload-archive.json archive
- Description: This command will upload the single archive object to the database. The archive object has the same fields as outlined in Types.js.
- Example file: overwrite-image-meta.json
- Command: node scripts/upload.js --overwrite example-data/overwrite-image-meta.json image-meta
- Description: This command will overwrite the two image meta objects with the given ID fields in the example file. Note that the entire objects are provided since they will replace the existing objects with the same ID.
- Example file: update-workshop-nested.json
- Command: node scripts/upload.js --update example-data/update-workshop-nested.json workshops
- Description: This will update the workshop with the ID A040674864 with the data in the file. All other fields will be left as they are. Note that since geo is a nested field, it is provided as location.geo in order to preserve the other fields in location.
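The effect of a dot-separated key in an update can be sketched as follows. This is an illustration only, written in Python rather than taken from upload.js: the helper apply_update and the sample field values are hypothetical, and whether the upload script works this way internally is an assumption.

```python
import copy


def apply_update(document: dict, updates: dict) -> dict:
    """Return a copy of document with the dot-separated keys in updates applied."""
    result = copy.deepcopy(document)
    for dotted_key, value in updates.items():
        *parents, leaf = dotted_key.split(".")
        target = result
        for part in parents:
            # Create intermediate objects when missing; this sketch assumes
            # every existing intermediate value is itself an object.
            target = target.setdefault(part, {})
        target[leaf] = value
    return result


if __name__ == "__main__":
    # Hypothetical stored document; only the ID comes from the example above.
    workshop = {"ID": "A040674864", "location": {"geo": [0.0, 0.0], "address": "unchanged"}}
    updated = apply_update(workshop, {"location.geo": [1.0, 2.0]})
    print(updated["location"])
```

With the key location.geo, only geo changes and address survives. Passing the key location with a value of {"geo": [1.0, 2.0]} would instead replace the whole location object and drop address.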
The basic workflow for updating data in the database is as follows:
- Convert the data you want to upload/update to a JSON format. You can use the csv_to_json.py script if the data is from a CSV and is from the archive sheet or stickers. Otherwise you can write some other script to create this JSON file.
- Note that the JSON file must contain an array of objects to be uploaded. These objects should match the schemas found in Types.js. Nested fields should be written as dot-separated strings such as root.nested.double_nested.... See the example data folder for examples.
- Run the upload.js script with the path to the JSON file to update, overwrite, or upload the data. See the documentation above.
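If you write your own script to produce the JSON file, a sketch along these lines may help. It flattens nested objects into dot-separated keys and serializes the records as an array of objects. The names flatten and build_payload and the sample record are hypothetical, and the sketch does not check records against the schemas in Types.js.

```python
import json


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dot-separated keys; lists and scalars stay as values."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty nested objects.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def build_payload(records: list) -> str:
    """Serialize records as a JSON array of flattened objects."""
    if not all(isinstance(record, dict) for record in records):
        raise TypeError("every record must be an object")
    return json.dumps([flatten(record) for record in records], ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Hypothetical record used only to show the output shape.
    records = [{"ID": "1", "root": {"nested": {"double_nested": "value"}}}]
    print(build_payload(records))
```

The resulting text can be written to a file and passed to upload.js as the JSON filepath argument.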
- Changed "Najmeh" to "Nejmeh" in the archive responses.
- Made slight edits to the duplicate archive responses based on the FHL comments.
- Uploaded sticker data to the database.
  - Created a new sticker model in Sticker.js
  - Created a sticker typedef in Types.js
  - Uploaded sticker data to the database by giving them codes. See the shared Google Sheet, under the "Stickers" tab.
- Added an is_duplicate_of field to archive objects in case there are duplicate shops being referenced in different records.
  - Updated Archive schema and typedef.
  - Uploaded updated archive data to database.
- Updated the workshops data by adding the following fields:
  - year_established
  - decade_established
- Updated Types.js and the Mongoose schema to reflect this.
- Updated the database by running the upload script (node scripts/upload.js -u scripts/data/workshops/workshops_with_year_est.json workshops)
- Used the new upload method to update the archive information and archive image metadata.
  - Updated the archive information to contain the thumbnail image ID (thumb_img_id)
  - Updated the archive image metadata to contain the decade taken (decade_taken)
- Reuploaded the archive data to the database using the new scripts. The key insight was that you can "."-separate field names (e.g. a.b.c) to create a nested object {a: {b: {c: ...}}}.
- Replaced shops with name/owner name "unknown" and "-" with null in the archive data.
- Updated the Archive and ImageMeta schema to use separate year and decade fields: (primary_year, primary_decade) and (year_taken, decade_taken), respectively. See Types.js for more specifics.
- Reuploaded the archive info with the new year fields.
- Reuploaded the archive image metadata with updated craft types and location.
- Uploaded the reference scans to the database (for textual information). NOTE: some textual series data did not provide a scan of the reference source, so some images may not exist (e.g. 146739591 from the archival information survey).
- Created/updated the corresponding image metadata and archive response data.
- Created a more comprehensive type, Location, which includes geolocation, address, and administrative regions 1-4.
- Updated the Workshops schema:
  - Combined location, shop_address, and the administrative region fields into a single location (object) field (the Location type in Types.js)
- Updated the ImageMeta schema:
  - Created a new location field based on the Location type in Types.js. Deleted all replaced fields.
  - Added a new craft_category field.
- Reuploaded the craft workshops data and workshops image metadata.
- Updated the Archive information schema and reuploaded the data
  - TODO: Reupload archive image metadata
  - TODO: Add a "decade" field to the archive information
- Updated archive info schema, changed type to ref_type to prevent conflicts with MongoDB fields.
- Uploaded all archive information objects to the database.
- Reuploaded, added the craft_discipline_category field and arrayified space-separated strings (updated schema to reflect this).
- Updated the ImageMeta schema to include address (a multilanguage field), sector, and historic_map.
- Uploaded the image metadata from the archival information.
- Uploaded the images from the archival information survey.
- TODO: Update the craft types to reflect the new types and categories.
- TODO: Reupload the workshops image metadata to fit this new schema.
- TODO: Upload all of the reference scans to the database
- Uploaded all images from combined workshops and the image metadata.
- Uploaded the combined workshops data.
- Replaced duplicate image 145593040_2 with 145593040_1 in "Combined" Google sheet.
- Replaced duplicate image 143055091_2 (in response 143056445) with 143056445_5 in "Combined" Google sheet.
- Replaced duplicate image 141743929_2 with 141743929_3 in "Combined" Google sheet.