Commit 36d9521

Merge pull request #6 from SwissOpenEM/readme
Update readme
2 parents 42cc0b8 + 7155c2a commit 36d9521

1 file changed, +138 -19 lines changed

README.md

Lines changed: 138 additions & 19 deletions
@@ -1,30 +1,149 @@
 # LS Metadata extractor
-simple parser for the two common life-science EM metadata output formats (.xml from EPU and .mdoc from TOMO5 and SerialEM respectively), written in go
+Extract metadata from common life-science electron microscopy data in
+[OSC-EM](https://github.com/osc-em) format.
 
-## Usage
-Chose the appropriate binary from the [Releases](https://github.com/SwissOpenEM/LS_Metadata_reader/releases), then:
-LS_Metadata_reader target_directory
+## Input formats
+
+- SerialEM
+- Thermo Fisher EPU
+- TOMO5
+
+## Installation
+
+Binaries for Mac, Linux, and Windows can be downloaded from our
+[releases](https://github.com/SwissOpenEM/LS_Metadata_reader/releases) page.
+Alternately, you can compile from source by running:
 
-For testing, try the associated [tutorial](https://github.com/SwissOpenEM/LS_Metadata_reader/tree/main/tutorial) folder; an example of how the output should look like is provided in the same folder (tutorial_correct.json).
+```sh
+go build -o LS_metadata_reader .
+```
 
-## Comments
-Runs on a directory containing raw files and their instrument written additional information files (.mdoc and .xml respectively), generates a dataset level .json file. In case of usage with EPU pointing to the top level directory is enough, it will search for the data folders and extract the info from there. Using --z you can also obtain a zip file of the xml files associated with your data collection. If you want all the metadata (dataset level, not all individual entries) written out by a given software use the --f flag, otherwise the output will be OSCEM conform.
+### MacOS
+
+The release executables for MacOS are not signed. You may get a warning that MacOS
+cannot verify the developer or check the binary for malicious software. If downloaded
+directly from Github this executable should be safe to run. You can bypass the warning
+by running the command:
+
+```sh
+xattr -d com.apple.quarantine LS_Metadata_reader
+```
+
+### SerialEM
 
-## SerialEM
-SerialEM properties examples are to be added to the existing properties files of your SerialEM installation (update values to reflect your instrument parameters). The two scripts are to be run after each image collection (the lowest tick mark on the SerialEM automization script selection) with the respective name indicating when to use which of the two. Otherwise SerialEM ouput will lack a few required fields for the schema.
 **!!! Requires SerialEM 4.2.0 or newer !!!**
 
-## For running with EPU directly
-Benefits from setting the config (set using LS_Metadata_reader --c), or handing over the three instrument values directly with flags: <br>
---cs <br>
---gain_flip_rotate <br>
---epu <br>
-The --cs (for the CS value of the instrument) and --gain_flip_rotate (for the orientation of the gain_reference relative to actual data) are unfortunately never provided in the metadata, and are both important for processing. It is therefore highly beneficial to set these two.
-As for --epu, EPU writes its metadata files in a different directory than its actual data (TOMO5 also keeps some additional info that is processed by the LS_Metadata_reader there). It generates another set of folders, usually on the microscope controlling computer, that mirror its OffloadData folders in directory structure. Within them it stores some related information, among which are also the metadata xml files. If --epu is defined as a flag or in the config, the LS_Metadata_reader will directly grab those when the user points it at a OffloadData directory. <br>
-NOTE: This requires you to mount the microscope computer directory for EPU on the machine you are running LS_Metadata_reader on as those are most likely NOT the same. The extractor will work regardless if pointed to the xmls/mdocs directly, this is just for convenience.
+SerialEM requires some additional configuration to ensure that all required information
+is available in the mdoc files.
+
+1. Add instrument properties to `SerialEMproperties.txt`. See the
+[example](SerialEM_Scripts/SerialEMproperties_GlobalAutodocEntry_Example.txt). Update
+values to reflect your instrument parameters.
+2. The two scripts are provided in `SerialEM_Scripts/` for SPA and Tomography datasets.
+One of these should be run after each image collection (the lowest tick mark on the
+SerialEM automization script selection). Otherwise SerialEM output will lack a few
+required fields for the schema.
+
+
+### EPU and TOMO5
+
+Some instrument data is not available in EPU output. This is normally set in a
+configuration file, but can also be added at the command line using parameters.
+
+A wizard is available to walk through creating the configuration file. Run it using
+
+```sh
+LS_Metadata_reader --c
+```
+
+The configuration file is saved in the following locations depending on your platform:
+
+- Unix: `$XDG_CONFIG_HOME/LS_reader.conf` (usually `$HOME/.config/LS_reader.conf`)
+- MacOS: `$HOME/Library/Application Support/LS_reader.conf`
+- Windows: `%AppData%\LS_reader.conf`
+
+Config values can also be set using the command line flags:
+
+| Config property | CLI Option | Required | Description |
+| ------------- | -------- | ---------- | --- |
+| CS | `--cs` | yes |the CS value of the instrument
+| Gainref_FlipRotate | `--gain_flip_rotate` | yes | the orientation of the gain_reference relative to actual data
+| MPCPATH | `--epu` | | Path to EPU metadata directory
+
+EPU writes its metadata files in a different directory than its actual data (TOMO5 also
+keeps some additional info that is processed by the LS_Metadata_reader there). It
+generates another set of folders, usually on the microscope controlling computer, that
+mirror its OffloadData folders in directory structure. Within them it stores some
+related information, including the metadata xml files. If `--epu` is defined as a flag
+or in the config, the LS_Metadata_reader will directly grab those when the user points
+it at a OffloadData directory.
+*NOTE: This requires you to mount the microscope computer directory for EPU on the
+machine you are running LS_Metadata_reader on, as those are most likely NOT the same.
+The extractor will work regardless if pointed to the xmls/mdocs directly, this is just
+for convenience.*
+
+
+## Usage
+
+The reader should be called with the path to a folder containing the xml (EPU/TOMO5) or
+mdoc (SerialEM) files.
+
+```sh
+./LS_Metadata_reader -o tutorial_oscem.json tutorial/
+```
+
+For testing, try the associated [tutorial](tutorial/) folder; an example of how the
+output should look like is provided in the same folder (tutorial_correct.json). For
+first time use, disregard the warnings about config/flags those are for use directly
+with EPU or the OpenEM Ingestor.
+
+The reader runs on a directory containing the microscope's additional information files
+for each micrograph (.mdoc or .xml for SerialEM and EPU, respectively). It generates a
+JSON file following the OSC-EM schema with metadata for the whole dataset. For
+usage with EPU, pointing to the top level directory is enough; it will search for the
+data folders and extract the info from there.
+
+Using `-z` you can also obtain a zip file of the xml files associated with your data
+collection. This can be useful for archiving or for later analysis.
+
+To include additional metadata not supported by the OSC-EM schema, use the `-f` flag.
+This will include all available dataset-level metadata.
+
+
+## SciCat Ingestor integration
+
+This tool is a compatible metadata extractor for use with the [SciCat Web
+Ingestor](https://github.com/SwissOpenEM/Ingestor). It can be installed automatically by
+including the following in your ingestor configuration file:
+
+```yaml
+MetadataExtractors:
+  - Name: LS
+    GithubOrg: SwissOpenEM
+    GithubProject: LS_Metadata_reader
+    Version: v0.3.0
+    Executable: LS_Metadata_reader
+    Checksum: 805fd036f2c83284b2cd70f2e7f3fafbe17bc750d2156f604c1505f7d5791d75
+    ChecksumAlg: sha256
+    CommandLineTemplate: "-i '{{.SourceFolder}}' -o '{{.OutputFile}}'"
+    Methods:
+      - Name: Single Particle
+        Schema: oscem_schemas.schema.json
+      - Name: Cellular Tomography
+        Schema: oscem_cellular_tomo.json
+      - Name: Tomography
+        Schema: oscem_tomo.json
+      - Name: EnvironmentalTomography
+        Schema: oscem_env_tomo.json
+```
+
+This will automatically download and install the LS_Metadata_extractor with the
+specified version.
 
-## Schema-Links
+## Schema
 Output is compatible to OSCEM schemas https://github.com/osc-em/OSCEM_Schemas/
 
-Specific schema used to generate standard schema conform output (works for SPA and Tomography): https://github.com/osc-em/OSCEM_Schemas/blob/linkml_yaml/src/oscem_schemas/schema/oscem_schemas_tomo.yaml
+Specific schema used to generate standard schema conform output (works for SPA and
+Tomography):
+https://github.com/osc-em/OSCEM_Schemas/blob/linkml_yaml/src/oscem_schemas/schema/oscem_schemas_tomo.yaml
 with LinkML gen-golang
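The `CommandLineTemplate` value in the Ingestor configuration uses `{{.SourceFolder}}` and `{{.OutputFile}}` placeholders, which is Go `text/template` syntax. Assuming the Ingestor expands it against a value that exposes fields with those names (the `extractorArgs` struct below is hypothetical; only the field names come from the template), the expansion works like this:

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// extractorArgs is a HYPOTHETICAL type; only the field names SourceFolder
// and OutputFile are taken from the README's CommandLineTemplate.
type extractorArgs struct {
	SourceFolder string
	OutputFile   string
}

// renderCommandLine expands a CommandLineTemplate string with text/template.
func renderCommandLine(tmpl string, args extractorArgs) (string, error) {
	t, err := template.New("cmd").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, args); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	const tmpl = "-i '{{.SourceFolder}}' -o '{{.OutputFile}}'"
	out, err := renderCommandLine(tmpl, extractorArgs{
		SourceFolder: "/data/session1",
		OutputFile:   "/tmp/metadata.json",
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```

With these inputs the program prints `-i '/data/session1' -o '/tmp/metadata.json'`.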
