LIBDRUM-964. Update ETD logging for DSpace 8.1 #257
Merged
dsteelma-umd merged 5 commits into umd-lib:feature/LIBDRUM-956 on May 29, 2025
Customized the "dspace" script to support a "UMD_DSPACE_CLI_LOG_CONFIG" environment variable that overrides the Log4j2 configuration used when running the script. Replaced the Log4j v1 "log4j-etdloader.properties" file with a Log4j v2 XML file with equivalent functionality. Modified the "load-etd" script to override the default DSpace Log4j2 configuration with the configuration from the "log4j2-etdloader.xml" file, so that the email sent by the nightly cron job that loads the ETD files contains the proper information. https://umd-dit.atlassian.net/browse/LIBDRUM-964
JSON-formatted logs are preferred for Kubernetes, as they make the logs easier to search and filter in Splunk. Modified the "script-mail-wrapper" to handle JSON-formatted logs by extracting only the "message" field of non-DEBUG log messages to send in the email. For Kubernetes, this preserves the existing functionality of having non-DEBUG messages in the email sent to the mailing list, while additional DEBUG-level information is sent to standard out and to Splunk. Note that with these changes, logging differs between the local development environment and Kubernetes: the local development environment uses a text-based log instead of a JSON-formatted one, for simplicity, and does not log at the DEBUG level by default. The intention is to at least keep the emails similar between the local development environment and Kubernetes. https://umd-dit.atlassian.net/browse/LIBDRUM-964
Added "jq" as a dependency to the Docker images, to support using it to filter JSON in the "script-mail-wrapper" script used for ETD nightly cron jobs. https://umd-dit.atlassian.net/browse/LIBDRUM-964
Added the "jq" utility used by the "script-mail-wrapper" script to "Dockerfile.dev-additions" and updated the MailHog setup instructions in "DockerDevelopmentEnvironment.md" to better reflect the current code in the Dockerfile. https://umd-dit.atlassian.net/browse/LIBDRUM-964
The changes in this issue are closely related to the changes in LIBDRUM-959.
The ETD loading involved four scripts:
When run by itself, the “load-etd” script creates an "etdloader.log" log file.
When run via the “script-mail-wrapper”, a "script-mail-wrapper-<SCRIPT>-<DATE>.log" file is created (where <SCRIPT> is the name of the wrapped script, and <DATE> is a timestamp). The wrapper also sends an email containing the console output of the wrapped script to a specified email address.
Prior to the addition of the “dspace-cli.log” file in DSpace 8.1, the “load-etd” script could control its logging via a “log4j.configuration” system property, which would be used for all logging. In DSpace 8.1, however, the “dspace” script was modified to set a “log4j2.configurationFile” system property, which overrode the setting in the “load-etd” script.
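The naming scheme described above can be illustrated with a minimal sketch. The function names, the use of the script's base name for <SCRIPT>, and the timestamp format are all assumptions for illustration, not the actual "script-mail-wrapper" code; the mailing step is omitted.

```shell
# Hypothetical helper: build the wrapper log file name from the wrapped
# script and a timestamp. Using the base name of the script is an assumption.
wrapper_log_name() {
    script_name=$(basename "$1")
    timestamp=$2
    printf 'script-mail-wrapper-%s-%s.log\n' "$script_name" "$timestamp"
}

# Hypothetical wrapper: run the given command, echoing its console output
# (stdout and stderr) while also saving a copy to the log file.
# The timestamp format below is an assumed placeholder.
run_wrapped() {
    log_file=$(wrapper_log_name "$1" "$(date +%Y%m%d-%H%M%S)")
    "$@" 2>&1 | tee "$log_file"
}
```

The captured output is what the wrapper would then place in the body of the email.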
Therefore for this issue:
The “dspace” script was customized with a “UMD_DSPACE_CLI_LOG_CONFIG” environment variable that can be used to override the default Log4j2 configuration file set in the script.
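A minimal sketch of how such an override might look in a shell script follows. Only the “UMD_DSPACE_CLI_LOG_CONFIG” variable and the “log4j2.configurationFile” property come from this pull request; the default path, the function name, and the `JAVA_LOG_OPT` variable are assumptions for illustration and may not match the customized “dspace” script.

```shell
#!/bin/sh
# Assumed placeholder for the default CLI log configuration path;
# the real "dspace" script may use a different location.
DEFAULT_LOG_CONFIG="/dspace/config/log4j2-cli.xml"

# Hypothetical helper: print the Log4j2 configuration file to use.
# UMD_DSPACE_CLI_LOG_CONFIG wins when it is set and non-empty;
# otherwise the default path is used.
resolve_log_config() {
    printf '%s\n' "${UMD_DSPACE_CLI_LOG_CONFIG:-$DEFAULT_LOG_CONFIG}"
}

# The resolved path would then be handed to the JVM as a system property.
JAVA_LOG_OPT="-Dlog4j2.configurationFile=$(resolve_log_config)"
echo "$JAVA_LOG_OPT"
```

Under this sketch, a caller such as “load-etd” would export “UMD_DSPACE_CLI_LOG_CONFIG” pointing at “log4j2-etdloader.xml” before invoking the “dspace” script.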
The Log4j (version 1) “dspace/config/log4j-etdloader.properties” properties file was updated to a Log4j2 XML file (dspace/config/log4j2-etdloader.xml), to make it consistent with the other log configuration files. In keeping with the other Log4j2 XML files, the file uses “XML strict mode” (see https://logging.apache.org/log4j/2.x/manual/configuration.html#configuration-attribute-strict).
At this point, the logging behavior generally matched the previous behavior, except for the handling of DEBUG logging. Previously, DEBUG entries would be present in the file-based log (because the DEBUG flag was set), but not in the email built from the console output, because the “threshold” property was set to “INFO”.
For text-based logs, no straightforward method was found to send DEBUG logs to the console while also restricting the email to INFO logs (as the log entries are sent to the console). Therefore, in the local development environment, the log configuration defaults to not showing the DEBUG entries, which keeps the email consistent with the existing functionality.
For Kubernetes, which uses JSON-formatted logs, the log configuration can be set to send DEBUG (and higher) entries to the log, and the “script-mail-wrapper” was modified to use “jq” to filter out the “DEBUG” entries when creating the email (passing through unchanged any non-JSON lines, and making the JSON entries human-friendly by displaying only the “message” field). This makes it possible to keep the more detailed logging in Splunk while preserving the less detailed email.
While using JSON-formatted logging in the local development environment would be most consistent with Kubernetes, it was felt that the readability of the text-based logging was more important than seeing all the DEBUG entries.
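The filtering described above could be sketched with “jq” roughly as follows. This is an illustration under assumptions, not the code in “script-mail-wrapper”: the function name is hypothetical, and the log level is assumed to be stored in a top-level “level” field, which depends on the JSON layout in use.

```shell
# Hypothetical filter: read log lines on stdin and write email-ready lines.
#   - JSON objects with level "DEBUG" are dropped.
#   - Other JSON objects are reduced to their "message" field
#     (falling back to the raw line if there is no "message").
#   - Lines that are not JSON objects are passed through unchanged.
filter_log_for_email() {
    jq -R -r '
        . as $line
        | (try fromjson catch null) as $entry
        | if ($entry | type) == "object" then
              if $entry.level == "DEBUG" then empty
              else ($entry.message // $line)
              end
          else $line
          end
    '
}
```

The `-R` option makes “jq” read each input line as a raw string, so a line that fails to parse as JSON does not abort the run, and `-r` prints the result without JSON quoting.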
https://umd-dit.atlassian.net/browse/LIBDRUM-964