@@ -6,7 +6,7 @@ draft: false
description: "LDaCA Technical Architecture update 2025"
---

by {{< profile id="sefton" >}}, {{< profile id="Bonequi" >}} and {{< profile id="foley" >}}
by {{< profile id="sefton" >}}, **Moises Sacal Bonequi** and {{< profile id="foley" >}}

<br>

@@ -16,10 +16,20 @@ In this post, we will report on some recent developments, mostly in behind-the-s

<br>

{{< image Src="/2025-architecture/ldaca-execution-strategy.png" Alt="A table depicting LDaCA's execution strategy, from our starting point in 2021 to our desired state in 2028." Desc="Figure 1: An overview of LDaCA's execution strategy (July 2025). Each strand (Collect & organise, Conserve, Find, Access, Analyse, Guide) is relevant to the technical architecture." Title="Figure 1: An overview of LDaCA's execution strategy (July 2025). Each strand (Collect & organise, Conserve, Find, Access, Analyse, Guide) is relevant to the technical architecture." Ref="Language Data Commons of Australia" >}}

<br>

## Recent developments

We have seen some recent developments which promise exciting future work. Pacific and Regional Archive of Digital Sources in Endangered Cultures [(PARADISEC)](https://www.paradisec.org.au/) colleague John Ferlito has created a new version of the LDaCA data portal using a simpler [API](https://www.ldaca.edu.au/resources/glossary/#api) that can be used for both PARADISEC and LDaCA. The same API could potentially be used for many other repositories, such as the [Nyingarn Repository](https://repository.nyingarn.net/).

<br>

{{< image Src="/2025-architecture/arocapi.png" Alt="" Desc="Figure 2: AROCAPI, a new RO-Crate API." Title="Figure 2: AROCAPI, a new RO-Crate API." Ref="PT Sefton" >}}

<br>

The new API is an [RO-Crate](https://www.ldaca.edu.au/resources/glossary/#ro-crate) API that we have affectionately called *AROCAPI*. AROCAPI is a generic API for collections of ‘objects’ (or ‘items’) which are described using RO-Crates. We are working with John to create a new [Oni](https://www.ldaca.edu.au/resources/glossary/#oni) stack using this API, and have been evaluating the API throughout its development rather than waiting until John finishes his work on it.

AROCAPI will extend data portals and act as a baseline for other infrastructure that uses RO-Crates. It is designed both to work "out of the box" with an industry-standard data portal UI and to be easily configurable for different audiences and domains. For example, PARADISEC will authenticate users through its existing "[Nabu](https://catalog.paradisec.org.au/)" catalog, whereas the LDaCA data portal uses [CADRE](https://cadre.ada.edu.au/login).
@@ -32,7 +42,7 @@ In addition to AROCAPI, promising discussions are taking place with one of our

<br>

{{< image Src="/2025-architecture/pt-garage.png" Alt="Piles of boxes in a garage" Desc="Figure 1: An unstaged photo of PT Sefton’s garage, including a box labelled 'hard drives'." Title="Figure 1: An unstaged photo of PT Sefton’s garage, including a box labelled 'hard drives'." Ref="PT Sefton" Height="600" >}}
{{< image Src="/2025-architecture/pt-garage.png" Alt="Piles of boxes in a garage" Desc="Figure 3: An unstaged photo of PT Sefton’s garage, including a box labelled 'hard drives'." Title="Figure 3: An unstaged photo of PT Sefton’s garage, including a box labelled 'hard drives'." Ref="PT Sefton" Height="600" >}}

<br>

@@ -46,16 +56,19 @@ This project is a once-in-a-career opportunity to develop processes for organisi

Remember:


* Storage is not data management (particularly if the storage is a shopping bag full of mistreated hard drives).
* Passing boxes of storage devices hand to hand is NOT a good strategy to conserve data.
* Hard drives are not archives.

From the outset of the project, the LDaCA architecture has been designed around the principle that to build a research data commons, we need to look after data above all else. We took an approach that considered long-term data management separately from current uses of the data. This resulted in some design choices which are markedly different from those commonly seen in software development for research.

We put effort into:
<br>

{{< image Src="/2025-architecture/workspaces-and-repositories.png" Alt=" " Desc="Figure 4: Creating reusable and interoperable data objects with workspaces and repositories." Title="Figure 4: Creating reusable and interoperable data objects with workspaces and repositories." Ref="Marco La Rosa and PT Sefton" >}}

<br>

We put effort into:

* organising and describing data using open specifications *before* building features into applications
* designing an access-control system with long-term adaptability in mind (read [the story about that](https://www.ldaca.edu.au/news/posts/fair-care-eresearch-2022/) as presented at eResearch Australasia 2022)
@@ -64,6 +77,12 @@ We put effort into:

In 2024, we released the Protocols for Implementing Long Term Archival Repositories (PILARS), described in this [presentation at Open Repositories 2024](https://www.ldaca.edu.au/news/posts/open-repositories-2024-pilars/). The first principle of PILARS is that data should be portable, not locked in to a particular interface, service or mode of storage. Following the lead of PARADISEC two decades ago, the protocols call for storing data in commodity storage services such as file systems or (cloud) object storage services. This means that data is available independently of any specific software.
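
To make the portability principle concrete, here is a minimal sketch, not LDaCA's or PARADISEC's actual tooling, of a data object stored as an RO-Crate in an ordinary directory and read back using nothing but a JSON parser. The function names, the example collection and its file are hypothetical. The layout (an `ro-crate-metadata.json` file next to the data files) follows the RO-Crate 1.1 specification; a production crate would carry much richer metadata.

```python
import json
import tempfile
from pathlib import Path

CONTEXT = "https://w3id.org/ro/crate/1.1/context"


def write_crate(directory: Path, name: str, description: str,
                date_published: str, license_id: str,
                files: dict[str, str]) -> Path:
    """Write data files plus an ro-crate-metadata.json into a plain directory."""
    directory.mkdir(parents=True, exist_ok=True)
    file_entities = []
    for filename, text in files.items():
        (directory / filename).write_text(text, encoding="utf-8")
        file_entities.append({"@id": filename, "@type": "File", "name": filename})

    graph = [
        # Metadata descriptor: identifies this file and points at the root dataset.
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
         "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
         "about": {"@id": "./"}},
        # Root dataset: describes the object as a whole.
        {"@id": "./", "@type": "Dataset", "name": name,
         "description": description, "datePublished": date_published,
         "license": {"@id": license_id},
         "hasPart": [{"@id": e["@id"]} for e in file_entities]},
        *file_entities,
    ]
    metadata_path = directory / "ro-crate-metadata.json"
    metadata_path.write_text(
        json.dumps({"@context": CONTEXT, "@graph": graph}, indent=2),
        encoding="utf-8")
    return metadata_path


def read_root_dataset(directory: Path) -> dict:
    """Read the root dataset back without any repository software."""
    metadata = json.loads(
        (directory / "ro-crate-metadata.json").read_text(encoding="utf-8"))
    by_id = {entity["@id"]: entity for entity in metadata["@graph"]}
    descriptor = by_id["ro-crate-metadata.json"]
    return by_id[descriptor["about"]["@id"]]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        crate_dir = Path(tmp) / "example-collection"  # hypothetical object
        write_crate(crate_dir, "Example collection",
                    "A toy object used to illustrate the layout.",
                    "2025-07-01",
                    "https://creativecommons.org/licenses/by/4.0/",
                    {"transcript.txt": "hello"})
        root = read_root_dataset(crate_dir)
        print(root["name"], [part["@id"] for part in root["hasPart"]])
```

Because the description travels with the files, the same directory can be copied to another file system or to object storage and remain self-describing.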

<br>

{{< image Src="/2025-architecture/data-findability.png" Alt=" " Desc="Figure 5: The PILARS approach allows for a wide range of architectures, sketched in the diagram above. Data stored according to the protocols can be indexed and served over an API (with appropriate access controls)." Title="Figure 5: The PILARS approach allows for a wide range of architectures, sketched in the diagram above. Data stored according to the protocols can be indexed and served over an API (with appropriate access controls)." Ref="PT Sefton" Height="600" >}}

<br>

With this foundation, and the new interoperability we gain from our collaboration on AROCAPI, we are well placed to collect and conserve more data and make use of data in workspace environments. For instance, the newly established LDaCA analytics forum, a working group formed to advise on the development of research analytics tools, will drive the development of analytical workspaces. The LDaCA technical team will continue to improve data preparation workspaces, for example by potentially collaborating to adapt the Nyingarn Workspace for general-purpose use.

<br>
@@ -72,26 +91,34 @@ With this foundation, and the new interoperability we gain from our collaboratio

For the remainder of this post, we focus on developments in the archival repository functions of the LDaCA architecture – preparing, describing and sharing data so that it is accessible into the future. In the following section, we will share some of the architecture we have developed over the last few years.


#### The LDaCA data portal

The first example of LDaCA-developed architecture that we will share here is our data portal. Our data portal is a central access-controlled gateway to the data that we have been collecting.

During the project, it has been unclear how we would look after data at the conclusion of the project. No single organisation had put its hand up to host data for the medium to long term. However, we have had some positive talks with one of our partner institutions indicating that they may have an appetite for hosting data that otherwise does not have a home. They may also be able to provide some redundancy for at-risk collections where data custodians are comfortable with a copy residing at the partner institution.
<br>

{{< image Src="/2025-architecture/data-portal.png" Alt="" Desc="Figure 6: Our data portal is an example of the pattern outlined in red above." Title="Figure 6: Our data portal is an example of the pattern outlined in red above." Ref="PT Sefton">}}

<br>

During the project, it has been unclear how we would look after data at the conclusion of the project. No single organisation had put its hand up to host data for the medium to long term. However, we have had some positive talks with one of our partner institutions indicating that they may have an appetite for hosting data that otherwise does not have a home. They may also be able to provide some redundancy for at-risk collections where data custodians are comfortable with a copy residing at the partner institution.

#### Other ways of sharing data assets

Alongside the data portal, we have explored other ways of sharing data assets, including local distribution via portable computers such as Raspberry PI with a local wireless network. We have also discussed establishing regional cooperative networks where communities reduce risk by holding data for each other.
Alongside the data portal, we have explored other ways of sharing data assets, including local distribution via portable computers such as Raspberry Pi with a local wireless network. We have also discussed establishing regional cooperative networks where communities reduce risk by holding data for each other.

<br>

{{< image Src="/2025-architecture/raspberry-pi.png" Alt="" Desc="Figure 7: A Raspberry Pi containing a collection, in this case Batchelor Institute's CALL collection, enables mobile access via local wifi." Title="Figure 7: A Raspberry Pi containing a collection, in this case Batchelor Institute's CALL collection, enables mobile access via local wifi." Ref="Language Data Commons of Australia" >}}

<br>

#### Additional technical resources

With our partners, we have developed and adapted a suite of other technical resources, including:


* Oni portal software for mid-to-large deployments. Version 1 is live and Version 2 is currently under development with PARADISEC, involving a new shared API and code base that can be used across LDaCA and beyond.
* REMS overlaid with CADRE to manage access control for identified users. A service agreement between LDaCA and CADRE has been signed, to manage access control. REMS is still the backend of this tool, but CADRE’s wrapper makes it more user-friendly. CADRE version 2 will replace the admin component of REMS and is in the testing phase now.
* [REMS](https://github.com/CSCfi/rems) overlaid with [CADRE](https://cadre.ada.edu.au/login) to manage access control for identified users. LDaCA and CADRE have signed a service agreement to manage access control. REMS is still the backend of this tool, but CADRE’s wrapper makes it more user-friendly. CADRE version 2 will replace the admin component of REMS and is in the testing phase now.
* ‘Corpus tools’ for migrating data from existing formats to LDaCA-ready RO-Crates are [available on GitHub](https://github.com/Language-Research-Technology?q=corpus-tool). These reduce the cost of developing new migration tools by adapting existing corpus tools, provide reproducible migration processes and are a strong foundation for quality assurance checks.
* Software libraries for managing data in RO-Crate and maintaining schemas, available on our [GitHub organisation](https://github.com/Language-Research-Technology).
* RO-Crate preparation tools, including:
@@ -103,11 +130,15 @@ With our partners, we have developed and adapted a suite of other technical reso
* [Nyingarn](https://nyingarn.net/) (focussed on creating searchable text from manuscripts)
* Our next steps will involve a multi-modal workspace, for audio and video transcription.
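
As an illustration of what a reproducible migration with a built-in quality assurance check might look like, the sketch below converts a small table of recordings into an RO-Crate metadata document and then validates it. It does not reproduce the actual corpus tools: the column names (`filename`, `title`, `language`), the function names and the sample rows are all assumptions, and it uses only generic schema.org terms rather than the LDAC schema.

```python
import csv
import io
import json


def rows_to_crate(csv_text: str, collection_name: str) -> dict:
    """Convert tabular metadata into an RO-Crate style JSON-LD document."""
    items = [
        {"@id": row["filename"], "@type": "File",
         "name": row["title"], "inLanguage": row["language"]}
        for row in csv.DictReader(io.StringIO(csv_text))
    ]
    # Sort so the same input always yields byte-identical output.
    items.sort(key=lambda entity: entity["@id"])
    descriptor = {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
                  "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                  "about": {"@id": "./"}}
    root = {"@id": "./", "@type": "Dataset", "name": collection_name,
            "hasPart": [{"@id": entity["@id"]} for entity in items]}
    return {"@context": "https://w3id.org/ro/crate/1.1/context",
            "@graph": [descriptor, root, *items]}


def check_crate(crate: dict) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    by_id = {entity["@id"]: entity for entity in crate["@graph"]}
    root = by_id.get("./")
    if root is None:
        return ["missing root dataset"]
    problems = []
    for ref in root.get("hasPart", []):
        if ref["@id"] not in by_id:
            problems.append(f"hasPart points at missing entity: {ref['@id']}")
    for entity in crate["@graph"]:
        # The metadata descriptor is the only entity allowed to lack a name.
        if entity["@id"] != "ro-crate-metadata.json" and not entity.get("name"):
            problems.append(f"entity has no name: {entity['@id']}")
    return problems


if __name__ == "__main__":
    # Hypothetical source table; a real migration would read an existing corpus.
    sample = ("filename,title,language\n"
              "b.wav,Second recording,en\n"
              "a.wav,First recording,en\n")
    crate = rows_to_crate(sample, "Example corpus")
    print(json.dumps(crate, indent=2))
    print(check_crate(crate))
```

Keeping the conversion a pure function of its input is what makes the migration repeatable, and running the check on every run means a missing title or a dangling reference is caught before the data reaches a repository.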


#### Defining metadata schema

We use the following metadata standards:
<br>

{{< image Src="/2025-architecture/metadata.png" Alt="" Desc="Figure 8: We have extended existing metadata standards for use in different contexts." Title="Figure 8: We have extended existing metadata standards for use in different contexts." Ref="PT Sefton">}}

<br>

We use the following metadata standards:


* [OLAC - The OLAC Metadata Set and Controlled Vocabularies](https://aclanthology.org/W01-1506.pdf)
@@ -117,7 +148,7 @@ We have worked to define a metadata schema ([the Language Data Commons (LDAC) sc

<br>

{{< image Src="/2025-architecture/pilars-implementations-2024.png" Alt="A diagram." Desc="Figure 2: PILARS implementations – mid-2024." Title="Figure 2: PILARS implementations – mid-2024." Ref="PT Sefton">}}
{{< image Src="/2025-architecture/pilars-implementations-2024.png" Alt="A diagram." Desc="Figure 9: PILARS implementations – mid-2024." Title="Figure 9: PILARS implementations – mid-2024." Ref="PT Sefton">}}

<br>

@@ -127,8 +158,7 @@ The diagram above attempts to demonstrate how the PILARS principles have been im

## Future directions


{{< image Src="/2025-architecture/pilars-implementations-2026-1.png" Alt="A diagram." Desc="Figure 3: PILARS implementations – mid-2026?" Title="Figure 3: PILARS implementations – mid-2026?" Ref="PT Sefton">}}
{{< image Src="/2025-architecture/pilars-implementations-2026-1.png" Alt="A diagram." Desc="Figure 10: PILARS implementations – mid-2026?" Title="Figure 10: PILARS implementations – mid-2026?" Ref="PT Sefton">}}

<br>

@@ -156,6 +186,6 @@ We have an opportunity now to consider how the distributed LDaCA technical team

<br>

<br>


<br>
2 changes: 1 addition & 1 deletion themes/LoveIt
Submodule LoveIt updated 72 files
+3 −3 .circleci/config.yml
+1 −1 .husky/pre-commit
+6 −5 README.md
+7 −6 README.zh-cn.md
+7 −1 assets/css/_core/_media.scss
+2 −0 assets/css/_page/_home.scss
+2 −2 assets/css/_page/_index.scss
+1 −0 assets/css/_page/_single.scss
+2 −1 assets/css/_variables.scss
+3 −3 assets/data/cdn/cdnjs.yml
+3 −3 assets/data/cdn/jsdelivr.yml
+33 −1 assets/data/social.yml
+806 −955 assets/js/theme.js
+1 −1 assets/lib/VERSION
+1 −1 assets/lib/aplayer/APlayer.min.js
+ assets/lib/lightgallery/images/loading.gif
+3 −5 assets/lib/lunr/lunr.segmentit.js
+6 −6 assets/lib/valine/Valine.min.js
+1 −1 assets/lib/valine/emoji/apple.yml
+1 −1 assets/lib/valine/emoji/facebook.yml
+1 −1 assets/lib/valine/emoji/google.yml
+1 −1 assets/lib/valine/emoji/twitter.yml
+29 −0 assets/lib/valine/valine.scss
+1 −0 assets/svg/icons/malt.svg
+2 −2 exampleSite/content/about/index.en.md
+3 −3 exampleSite/content/about/index.zh-cn.md
+1 −1 exampleSite/content/posts/basic-markdown-syntax/index.en.md
+1 −1 exampleSite/content/posts/basic-markdown-syntax/index.zh-cn.md
+ exampleSite/content/posts/theme-documentation-basics/hugo-extended-edition.png
+24 −22 exampleSite/content/posts/theme-documentation-basics/index.en.md
+26 −24 exampleSite/content/posts/theme-documentation-basics/index.zh-cn.md
+ exampleSite/content/posts/theme-documentation-basics/language-switch.gif
+2 −2 exampleSite/content/posts/theme-documentation-built-in-shortcodes/index.en.md
+2 −2 exampleSite/content/posts/theme-documentation-built-in-shortcodes/index.zh-cn.md
+6 −6 exampleSite/content/posts/theme-documentation-content/index.en.md
+5 −5 exampleSite/content/posts/theme-documentation-content/index.zh-cn.md
+1 −6 exampleSite/content/posts/theme-documentation-extended-shortcodes/index.en.md
+1 −1 exampleSite/content/posts/theme-documentation-extended-shortcodes/index.zh-cn.md
+19 −11 exampleSite/hugo.toml
+12 −2 hugo.toml
+4 −4 i18n/de.toml
+12 −12 i18n/fr.toml
+199 −0 i18n/nl.toml
+198 −0 i18n/uk.toml
+2 −2 layouts/_default/summary.html
+3 −3 layouts/index.rss.xml
+4 −1 layouts/partials/assets.html
+1 −1 layouts/partials/footer.html
+3 −3 layouts/partials/head/seo.html
+3 −3 layouts/partials/header.html
+3 −2 layouts/partials/init.html
+1 −1 layouts/partials/rss/item.html
+3 −3 layouts/posts/rss.xml
+2 −2 layouts/posts/single.html
+1 −1 layouts/shortcodes/version.html
+3 −3 layouts/taxonomy/rss.xml
+3,535 −3,536 package-lock.json
+10 −9 package.json
+0 −0 resources/_gen/assets/css/2f1ef0.scss_3bb29f2bf07fe25edca3643ea9ec75e0.content
+0 −0 resources/_gen/assets/css/2f1ef0.scss_3bb29f2bf07fe25edca3643ea9ec75e0.json
+0 −0 resources/_gen/assets/css/790698.scss_0074190080154890d773cf4da464c67f.content
+0 −0 resources/_gen/assets/css/790698.scss_0074190080154890d773cf4da464c67f.json
+0 −0 resources/_gen/assets/css/f79aa6.scss_3014a957b725e34a6fe68078e0ba90aa.content
+0 −0 resources/_gen/assets/css/f79aa6.scss_3014a957b725e34a6fe68078e0ba90aa.json
+3 −0 resources/_gen/assets/css/style.scss_1a67ae4ed98f18e3ea7da02d2ccd80c9.content
+0 −0 resources/_gen/assets/css/style.scss_1a67ae4ed98f18e3ea7da02d2ccd80c9.json
+0 −0 resources/_gen/assets/lib/aplayer/dark.scss_259ecfe6b7a68e3c93e3acf036480b70.content
+0 −0 resources/_gen/assets/lib/aplayer/dark.scss_259ecfe6b7a68e3c93e3acf036480b70.json
+1 −1 resources/_gen/assets/lib/valine/valine.scss_b717925ff708a11c5ab1d6426462a4af.content
+0 −0 resources/_gen/assets/lib/valine/valine.scss_b717925ff708a11c5ab1d6426462a4af.json
+0 −3 resources/_gen/assets/scss/css/style.scss_d75fd08668b4bae707167bbce4d8ca46.content
+1 −1 theme.toml