LIBDRUM-964. Update ETD logging for DSpace 8.1 #257

Merged: dsteelma-umd merged 5 commits into umd-lib:feature/LIBDRUM-956 from dsteelma-umd:feature/LIBDRUM-964-dspace-8.1 on May 29, 2025.

dsteelma-umd commented May 29, 2025

The changes in this issue are closely related to the changes in
LIBDRUM-959.

ETD loading involves four scripts:

  • load-etd - A UMD custom Perl script that processes the ETD uploads via the stock "dspace" script and the "edu.umd.lib.dspace.app.EtdLoader" class.
  • dspace - The stock script provided by DSpace for running command-line tasks.
  • load-etd-nightly - A UMD custom Perl script, run by cron, that determines whether there are any ETD uploads to process via the "load-etd" script.
  • script-mail-wrapper - A UMD custom shell script that wraps another script, writing a log file and emailing the wrapped script's console output.

When run by itself, the “load-etd” script creates an "etdloader.log" log file.

When run via the “script-mail-wrapper”, a "script-mail-wrapper-<SCRIPT>-.log" file is created (where <SCRIPT> is the name of the wrapped script, and <DATE> is a timestamp). The wrapper also emails the console output of the wrapped script to a specified email address.

Prior to the addition of the “dspace-cli.log” file in DSpace 8.1, the “load-etd” script could control its logging via a “log4j.configuration” system property, which was used for all logging. In DSpace 8.1, however, the “dspace” script was modified to set a “log4j2.configurationFile” system property, which overrode the setting made by the “load-etd” script.

Therefore, for this issue:

  • The “dspace” script was customized to read a “UMD_DSPACE_CLI_LOG_CONFIG” environment variable, which can be used to override the default Log4j2 configuration file set in the script.

  • The Log4j (version 1) “dspace/config/log4j-etdloader.properties” properties file was replaced by a Log4j2 XML file (“dspace/config/log4j2-etdloader.xml”), to make it consistent with the other log configuration files. Like the other Log4j2 XML files, it uses “XML strict mode” (see https://logging.apache.org/log4j/2.x/manual/configuration.html#configuration-attribute-strict).
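
The environment-variable override in the first item can be sketched as follows. The actual “dspace” script is a shell script; this Python version only models its decision logic, assuming (as with the shell idiom ${VAR:-default}) that an unset or empty UMD_DSPACE_CLI_LOG_CONFIG falls back to the default. The default path and the helper names are hypothetical placeholders, not values taken from the script.

```python
import os
from typing import Mapping, Optional

# HYPOTHETICAL default; the real path is whatever the stock "dspace" script sets.
DEFAULT_LOG_CONFIG = "/dspace/config/log4j2-cli.xml"
OVERRIDE_VAR = "UMD_DSPACE_CLI_LOG_CONFIG"


def resolve_log_config(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Log4j2 configuration file to use for a "dspace" CLI run.

    ASSUMPTION: an unset or empty override falls back to the default,
    mirroring the shell idiom ${VAR:-default}.
    """
    if env is None:
        env = os.environ
    return env.get(OVERRIDE_VAR) or DEFAULT_LOG_CONFIG


def log4j2_system_property(env: Optional[Mapping[str, str]] = None) -> str:
    """Build the JVM option that selects the Log4j2 configuration file."""
    return f"-Dlog4j2.configurationFile={resolve_log_config(env)}"
```

For example, if “load-etd” sets the variable to the ETD loader configuration before calling “dspace”, `log4j2_system_property({"UMD_DSPACE_CLI_LOG_CONFIG": "/dspace/config/log4j2-etdloader.xml"})` yields `-Dlog4j2.configurationFile=/dspace/config/log4j2-etdloader.xml`, while runs that do not set the variable keep the stock default.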

At this point, the logging behavior generally matched the previous behavior, except for the handling of DEBUG entries. In the previous configuration, DEBUG entries appeared in the file-based log (because the DEBUG level was set) but not in the email generated from the console output (because the “threshold” property was set to “INFO”).

For text-based logs, no straightforward way was found to send DEBUG entries to the console while restricting the email to INFO (and higher) entries, since the email is generated from that same console output. Therefore, in the local development environment, the log configuration defaults to omitting DEBUG entries, which keeps the email consistent with the existing functionality.

For Kubernetes, which uses JSON-formatted logs, the log configuration can be set to send DEBUG (and higher) entries to the log. The “script-mail-wrapper” was modified to use “jq” to filter out the DEBUG entries when creating the email, passing through any non-JSON lines unchanged and making the JSON entries human-friendly by displaying only the “message” field. This preserves the existing behavior of more detailed logging in Splunk while keeping the less detailed email.
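
The filtering rules above can be modeled in a few lines. This is a Python sketch of the stated rules, not the jq expression used in “script-mail-wrapper”; the top-level "level" key is an assumption about the Kubernetes JSON layout, while the "message" field comes from the description.

```python
import json
from typing import Iterable, Iterator

# ASSUMPTION: the log level is stored under a top-level "level" key; the actual
# key depends on the Log4j2 JSON layout configured for Kubernetes.
LEVEL_FIELD = "level"
MESSAGE_FIELD = "message"  # the field shown in the email, per the description


def email_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines that should appear in the notification email."""
    for raw in lines:
        line = raw.rstrip("\n")
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # Non-JSON lines (e.g. plain console output) pass through unchanged.
            yield line
            continue
        if not isinstance(entry, dict):
            # Bare JSON values such as numbers are not log entries; keep as-is.
            yield line
            continue
        if entry.get(LEVEL_FIELD) == "DEBUG":
            # DEBUG entries stay in stdout/Splunk but are left out of the email.
            continue
        # Show only the human-readable message; fall back to the raw line.
        yield str(entry.get(MESSAGE_FIELD, line))
```

Iterating `email_lines(sys.stdin)` and printing each result would produce the email body. Passing non-JSON lines through untouched keeps output that does not come from Log4j2 visible in the email.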

While using JSON-formatted logging in the local development environment would be most consistent with Kubernetes, it was felt that the readability of the text-based logging was more important than seeing all the DEBUG entries.

https://umd-dit.atlassian.net/browse/LIBDRUM-964

Customized "dspace" to enable a "UMD_DSPACE_CLI_LOG_CONFIG" environment
variable to override the Log4J2 configuration used when running the
"dspace" script.

Replaced the Log4j v1 "log4j-etdloader.properties" file with a
Log4j v2 XML file with equivalent functionality.

Modified the "load-etd" script to override the default DSpace Log4J
configuration with the configuration from the "log4j2-etdloader.xml"
file, so that the email sent by the nightly cron job that loads the
ETD files contains the proper information.

https://umd-dit.atlassian.net/browse/LIBDRUM-964

It is preferred to use JSON-formatted logs for Kubernetes, as it makes
the logs easier to search/filter in Splunk.

Modified the "script-mail-wrapper" to handle JSON-formatted logs by
extracting only the "message" field of non-DEBUG log messages to send
in the email. For Kubernetes, this preserves the existing functionality
of having non-DEBUG messages in the email sent to the mailing list,
while additional DEBUG-level information is sent to the standard out
and to Splunk.

Note that with these changes, logging in the local development
environment and Kubernetes differs: the local development
environment uses a text-based log, instead of JSON-formatted, for
simplicity, and does not log at the DEBUG level by default.

The intention is, at minimum, to keep the emails similar between
the local development environment and Kubernetes.

https://umd-dit.atlassian.net/browse/LIBDRUM-964

Added "jq" as a dependency to the Docker images, to support using
it to filter JSON in the "script-mail-wrapper" script used for
ETD nightly cron jobs.

https://umd-dit.atlassian.net/browse/LIBDRUM-964

dsteelma-umd changed the title from "Feature/libdrum 964 dspace 8.1" to "LIBDRUM-964. Update ETD logging for DSpace 8.1" on May 29, 2025.

Added the "jq" utility used by the "script-mail-wrapper" script to
"Dockerfile.dev-additions" and updated the MailHog setup instructions
in "DockerDevelopmentEnvironment.md" to better reflect the current
code in the Dockerfile.

https://umd-dit.atlassian.net/browse/LIBDRUM-964

dsteelma-umd force-pushed the feature/LIBDRUM-964-dspace-8.1 branch from 524620f to beca54b on May 29, 2025 12:13.
dsteelma-umd merged commit a8763fc into umd-lib:feature/LIBDRUM-956 on May 29, 2025.
11 checks passed