|
1 | 1 | # LS Metadata extractor
|
2 |
| -simple parser for the two common life-science EM metadata output formats (.xml from EPU and .mdoc from TOMO5 and SerialEM respectively), written in go |
| 2 | +Extract metadata from common life-science electron microscopy data in |
| 3 | +[OSC-EM](https://github.com/osc-em) format. |
3 | 4 |
|
4 |
| -## Usage |
5 |
| -Chose the appropriate binary from the [Releases](https://github.com/SwissOpenEM/LS_Metadata_reader/releases), then: |
6 |
| -LS_Metadata_reader target_directory |
| 5 | +## Input formats |
| 6 | + |
| 7 | +- SerialEM |
| 8 | +- Thermo Fisher EPU |
| 9 | +- TOMO5 |
| 10 | + |
| 11 | +## Installation |
| 12 | + |
| 13 | +Binaries for Mac, Linux, and Windows can be downloaded from our |
| 14 | +[releases](https://github.com/SwissOpenEM/LS_Metadata_reader/releases) page. |
| 15 | +Alternatively, you can compile from source by running: |
7 | 16 |
|
8 |
| -For testing, try the associated [tutorial](https://github.com/SwissOpenEM/LS_Metadata_reader/tree/main/tutorial) folder; an example of how the output should look like is provided in the same folder (tutorial_correct.json). |
| 17 | +```sh |
| 18 | +go build -o LS_Metadata_reader . |
| 19 | +``` |
9 | 20 |
|
10 |
| -## Comments |
11 |
| -Runs on a directory containing raw files and their instrument written additional information files (.mdoc and .xml respectively), generates a dataset level .json file. In case of usage with EPU pointing to the top level directory is enough, it will search for the data folders and extract the info from there. Using --z you can also obtain a zip file of the xml files associated with your data collection. If you want all the metadata (dataset level, not all individual entries) written out by a given software use the --f flag, otherwise the output will be OSCEM conform. |
| 21 | +### macOS |
| 22 | + |
| 23 | +The release executables for macOS are not signed. You may get a warning that macOS |
| 24 | +cannot verify the developer or check the binary for malicious software. If downloaded |
| 25 | +directly from GitHub, this executable should be safe to run. You can bypass the warning |
| 26 | +by running the command: |
| 27 | + |
| 28 | +```sh |
| 29 | +xattr -d com.apple.quarantine LS_Metadata_reader |
| 30 | +``` |
| 31 | + |
| 32 | +### SerialEM |
12 | 33 |
|
13 |
| -## SerialEM |
14 |
| -SerialEM properties examples are to be added to the existing properties files of your SerialEM installation (update values to reflect your instrument parameters). The two scripts are to be run after each image collection (the lowest tick mark on the SerialEM automization script selection) with the respective name indicating when to use which of the two. Otherwise SerialEM ouput will lack a few required fields for the schema. |
15 | 34 | **!!! Requires SerialEM 4.2.0 or newer !!!**
|
16 | 35 |
|
17 |
| -## For running with EPU directly |
18 |
| -Benefits from setting the config (set using LS_Metadata_reader --c), or handing over the three instrument values directly with flags: <br> |
19 |
| ---cs <br> |
20 |
| ---gain_flip_rotate <br> |
21 |
| ---epu <br> |
22 |
| -The --cs (for the CS value of the instrument) and --gain_flip_rotate (for the orientation of the gain_reference relative to actual data) are unfortunately never provided in the metadata, and are both important for processing. It is therefore highly beneficial to set these two. |
23 |
| -As for --epu, EPU writes its metadata files in a different directory than its actual data (TOMO5 also keeps some additional info that is processed by the LS_Metadata_reader there). It generates another set of folders, usually on the microscope controlling computer, that mirror its OffloadData folders in directory structure. Within them it stores some related information, among which are also the metadata xml files. If --epu is defined as a flag or in the config, the LS_Metadata_reader will directly grab those when the user points it at a OffloadData directory. <br> |
24 |
| -NOTE: This requires you to mount the microscope computer directory for EPU on the machine you are running LS_Metadata_reader on as those are most likely NOT the same. The extractor will work regardless if pointed to the xmls/mdocs directly, this is just for convenience. |
| 36 | +SerialEM requires some additional configuration to ensure that all required information |
| 37 | +is available in the mdoc files. |
| 38 | + |
| 39 | +1. Add instrument properties to `SerialEMproperties.txt`. See the |
| 40 | + [example](SerialEM_Scripts/SerialEMproperties_GlobalAutodocEntry_Example.txt). Update |
| 41 | + values to reflect your instrument parameters. |
| 42 | +2. Two scripts are provided in `SerialEM_Scripts/`, for SPA and Tomography datasets. |
| 43 | + Run the appropriate one after each image collection (the lowest tick mark in the |
| 44 | + SerialEM automation script selection). Otherwise, SerialEM output will lack a few |
| 45 | + required fields for the schema. |
| 46 | + |
| 47 | + |
| 48 | +### EPU and TOMO5 |
| 49 | + |
| 50 | +Some instrument data is not available in EPU output. This is normally set in a |
| 51 | +configuration file, but can also be added at the command line using parameters. |
| 52 | + |
| 53 | +A wizard is available to walk through creating the configuration file. Run it using |
| 54 | + |
| 55 | +```sh |
| 56 | +LS_Metadata_reader --c |
| 57 | +``` |
| 58 | + |
| 59 | +The configuration file is saved in the following locations depending on your platform: |
| 60 | + |
| 61 | + - Unix: `$XDG_CONFIG_HOME/LS_reader.conf` (usually `$HOME/.config/LS_reader.conf`) |
| 62 | + - macOS: `$HOME/Library/Application Support/LS_reader.conf` |
| 63 | + - Windows: `%AppData%\LS_reader.conf` |
| 64 | + |
| 65 | +Config values can also be set using the command line flags: |
| 66 | + |
| 67 | +| Config property | CLI Option | Required | Description | |
| 68 | +| ------------- | -------- | ---------- | --- | |
| 69 | +| CS | `--cs` | yes | The CS (spherical aberration) value of the instrument | |
| 70 | +| Gainref_FlipRotate | `--gain_flip_rotate` | yes | The orientation of the gain reference relative to the actual data | |
| 71 | +| MPCPATH | `--epu` | | Path to the EPU metadata directory | |
| 72 | + |
| 73 | +EPU writes its metadata files in a different directory than its actual data (TOMO5 also |
| 74 | +keeps some additional info that is processed by the LS_Metadata_reader there). It |
| 75 | +generates another set of folders, usually on the microscope controlling computer, that |
| 76 | +mirror its OffloadData folders in directory structure. Within them it stores some |
| 77 | +related information, including the metadata xml files. If `--epu` is defined as a flag |
| 78 | +or in the config, the LS_Metadata_reader will directly grab those when the user points |
| 79 | +it at an OffloadData directory. |
| 80 | +*NOTE: This requires you to mount the microscope computer's EPU directory on the |
| 81 | +machine you are running LS_Metadata_reader on, as those are most likely NOT the same. |
| 82 | +The extractor also works when pointed at the xml/mdoc files directly; this is just |
| 83 | +for convenience.* |
| 84 | + |
| 85 | + |
| 86 | +## Usage |
| 87 | + |
| 88 | +The reader should be called with the path to a folder containing the xml (EPU/TOMO5) or |
| 89 | +mdoc (SerialEM) files. |
| 90 | + |
| 91 | +```sh |
| 92 | +./LS_Metadata_reader -o tutorial_oscem.json tutorial/ |
| 93 | +``` |
| 94 | + |
| 95 | +For testing, try the associated [tutorial](tutorial/) folder; an example of what the |
| 96 | +output should look like is provided in the same folder (tutorial_correct.json). For |
| 97 | +first-time use, disregard the warnings about config/flags; those are for use directly |
| 98 | +with EPU or the OpenEM Ingestor. |
| 99 | + |
| 100 | +The reader runs on a directory containing the microscope's additional information files |
| 101 | +for each micrograph (.mdoc or .xml for SerialEM and EPU, respectively). It generates a |
| 102 | +JSON file following the OSC-EM schema with metadata for the whole dataset. For |
| 103 | +usage with EPU, pointing to the top level directory is enough; it will search for the |
| 104 | +data folders and extract the info from there. |
| 105 | + |
| 106 | +Using `-z` you can also obtain a zip file of the xml files associated with your data |
| 107 | +collection. This can be useful for archiving or for later analysis. |
| 108 | + |
| 109 | +To include additional metadata not supported by the OSC-EM schema, use the `-f` flag. |
| 110 | +This will include all available dataset-level metadata. |
| 111 | + |
| 112 | + |
| 113 | +## SciCat Ingestor integration |
| 114 | + |
| 115 | +This tool is a compatible metadata extractor for use with the [SciCat Web |
| 116 | +Ingestor](https://github.com/SwissOpenEM/Ingestor). It can be installed automatically by |
| 117 | +including the following in your ingestor configuration file: |
| 118 | + |
| 119 | +```yaml |
| 120 | +MetadataExtractors: |
| 121 | + - Name: LS |
| 122 | + GithubOrg: SwissOpenEM |
| 123 | + GithubProject: LS_Metadata_reader |
| 124 | + Version: v0.3.0 |
| 125 | + Executable: LS_Metadata_reader |
| 126 | + Checksum: 805fd036f2c83284b2cd70f2e7f3fafbe17bc750d2156f604c1505f7d5791d75 |
| 127 | + ChecksumAlg: sha256 |
| 128 | + CommandLineTemplate: "-i '{{.SourceFolder}}' -o '{{.OutputFile}}'" |
| 129 | + Methods: |
| 130 | + - Name: Single Particle |
| 131 | + Schema: oscem_schemas.schema.json |
| 132 | + - Name: Cellular Tomography |
| 133 | + Schema: oscem_cellular_tomo.json |
| 134 | + - Name: Tomography |
| 135 | + Schema: oscem_tomo.json |
| 136 | + - Name: EnvironmentalTomography |
| 137 | + Schema: oscem_env_tomo.json |
| 138 | +``` |
| 139 | +
|
| 140 | +This will automatically download and install the LS_Metadata_reader with the |
| 141 | +specified version. |
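
The `CommandLineTemplate` value uses Go template placeholders (`{{.SourceFolder}}`, `{{.OutputFile}}`). The sketch below shows how such a template expands, assuming it is rendered with Go's standard `text/template` package; how the Ingestor actually renders it is not stated here. The `extractorArgs` struct and `renderCommandLine` function are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// extractorArgs is a hypothetical struct carrying the two placeholders
// that appear in CommandLineTemplate.
type extractorArgs struct {
	SourceFolder string
	OutputFile   string
}

// renderCommandLine parses tmpl as a text/template and fills in args.
func renderCommandLine(tmpl string, args extractorArgs) (string, error) {
	t, err := template.New("cmd").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, args); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	line, err := renderCommandLine(
		"-i '{{.SourceFolder}}' -o '{{.OutputFile}}'",
		extractorArgs{SourceFolder: "tutorial/", OutputFile: "tutorial_oscem.json"},
	)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Prints: -i 'tutorial/' -o 'tutorial_oscem.json'
	fmt.Println(line)
}
```

The single quotes in the template keep paths containing spaces together as one argument if the rendered line is later split by a shell-style parser.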
25 | 142 |
|
26 |
| -## Schema-Links |
| 143 | +## Schema |
27 | 144 | Output is compatible with the OSCEM schemas: https://github.com/osc-em/OSCEM_Schemas/
|
28 | 145 |
|
29 |
| -Specific schema used to generate standard schema conform output (works for SPA and Tomography): https://github.com/osc-em/OSCEM_Schemas/blob/linkml_yaml/src/oscem_schemas/schema/oscem_schemas_tomo.yaml |
| 146 | +Specific schema used to generate schema-conformant output (works for SPA and |
| 147 | +Tomography): |
| 148 | +https://github.com/osc-em/OSCEM_Schemas/blob/linkml_yaml/src/oscem_schemas/schema/oscem_schemas_tomo.yaml |
30 | 149 | with LinkML gen-golang
|