This was originally shared as a draft (which it honestly still is) in a gist, and I'll import some brief questions and a reply below: from @jelly
reply from @Nihlus:
I reworked the schema slightly to account for the above.
I already commented on the gist yesterday (and most, if not all, concerns have been addressed), so here I'll just say that I very much applaud this design-first, implementation-second approach! I also very much applaud these issues being addressed in the first place! 👍 😄 Thank you! <3
I've started a prototype implementation of the proposed schema and incorporated some more feedback and discoveries into a new revision. The main highlights are that
As part of the prototype, I'm also examining the API for the changes we would need to make. I'll update the post with my proposal for that soon.
Thanks for the work! In Debian, the same source and binary package can be part of multiple releases. Currently this is not reflected in the design, resulting in the same package being imported and tested multiple times. Also, Debian source packages can build binary packages for multiple components (not sure if that is important here).
I've updated the schema to push blobs out into their own tables and edited the proposed logic to copy build results to newly-registered packages for identical inputs on the same distribution. Additionally, incoming results will be registered for all applicable releases of the same build input.
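To make the copy and fan-out behaviour described above concrete, here is a rough sketch using Python's built-in sqlite3 module. It is not the prototype's code: the single `packages` table, the `input_hash` column (standing in for "identical build input"), and both function names are assumptions made purely for illustration.

```python
import sqlite3

def open_db():
    """Create an in-memory database with one hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE packages (
            id INTEGER PRIMARY KEY,
            distribution TEXT NOT NULL,
            release_name TEXT NOT NULL,
            input_hash TEXT NOT NULL,   -- hypothetical build input identity
            status TEXT,                -- NULL until a result is known
            UNIQUE (distribution, release_name, input_hash)
        )
    """)
    return conn

def register_package(conn, distribution, release_name, input_hash):
    """Register a package; reuse a known result for an identical input
    on the same distribution. Returns the copied status, or None."""
    known = conn.execute(
        "SELECT status FROM packages "
        "WHERE distribution = ? AND input_hash = ? AND status IS NOT NULL "
        "LIMIT 1",
        (distribution, input_hash),
    ).fetchone()
    status = known[0] if known else None
    conn.execute(
        "INSERT INTO packages (distribution, release_name, input_hash, status) "
        "VALUES (?, ?, ?, ?)",
        (distribution, release_name, input_hash, status),
    )
    return status

def report_result(conn, distribution, input_hash, status):
    """Apply an incoming result to every release sharing the build input.
    Returns the number of rows updated."""
    cur = conn.execute(
        "UPDATE packages SET status = ? "
        "WHERE distribution = ? AND input_hash = ?",
        (status, distribution, input_hash),
    )
    return cur.rowcount
```

In this sketch a result reported once reaches every release of the same distribution that carries the input, and a release registered later inherits it without triggering another rebuild. Results are never shared across distributions.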
This is a proposal for a set of changes to the database schema of rebuilderd in
order to better support the increasing variety of Linux distributions that can
be rebuilt.
The changes unify the handling of core identifying features for individual
source and binary packages, add missing build pipeline metadata, and enable
easier integration of future distribution support.
Problem statement
The current database schema of rebuilderd is mainly built around Arch Linux's
packaging system, reflected in both its structure and field naming. This works
for the basic uses of rebuilderd, but brings challenges for other distributions.
In particular, multiple Debian releases cannot currently coexist within the same
rebuilderd database, requiring several instances of rebuilderd with separate
databases to operate. Multiple architectures within the same database face
similar challenges.
A number of GitHub issues exemplify these hurdles:
Terminology
Various distributions of Linux use different terminology to refer to the same
things. In order to avoid confusion throughout the rest of the document, and to
serve as a basis for some of the proposed changes, the following definitions are
adopted.
Debian, Arch Linux, Fedora, etc.
Several releases may be part of a single overall version of the distribution.
describing it. Source packages are not installable, and can only be used to
build binary packages.
under.
from the source version.
To exemplify, here are some equivalents from the aforementioned distributions and which term they'd fall under.
Scope and use cases
Firstly, let's identify the requirements for a database schema supporting our
target use cases. To simplify reasoning about such requirements, we will focus
on packaging systems, rather than individual distributions. The distributions
we already have support for will serve as examples and to validate any
assumptions made.
Core to any relational database schema is modeling the data that is to go into
it. In our case, that is primarily about how to uniquely identify both source
and binary packages for rebuilds. The packaging systems have the following
relevant relational constraints.
As we can see, there is significant overlap between the packaging systems with
some notable differences. In particular,
Therefore, our database schema must support each of these variants at the same
time while also minimizing data duplication.
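As a minimal sketch of what "supporting each variant at the same time" could mean at the constraint level, the following uses Python's built-in sqlite3 module. The column names (`distribution`, `release_name`, `component`) and the helper `register_source` are assumptions for illustration and are not taken from the proposed schema.

```python
import sqlite3

def create_schema(conn):
    # Empty strings rather than NULL mark "not applicable": SQLite treats
    # NULLs as distinct inside a UNIQUE constraint, which would let
    # duplicate rows slip through for packaging systems without releases.
    conn.execute("""
        CREATE TABLE source_packages (
            id INTEGER PRIMARY KEY,
            distribution TEXT NOT NULL,
            release_name TEXT NOT NULL DEFAULT '',
            component TEXT NOT NULL DEFAULT '',
            name TEXT NOT NULL,
            version TEXT NOT NULL,
            UNIQUE (distribution, release_name, component, name, version)
        )
    """)

def register_source(conn, distribution, release_name, component, name, version):
    """Insert the source package if it is new and return its row id."""
    key = (distribution, release_name, component, name, version)
    conn.execute(
        "INSERT OR IGNORE INTO source_packages "
        "(distribution, release_name, component, name, version) "
        "VALUES (?, ?, ?, ?, ?)",
        key,
    )
    row = conn.execute(
        "SELECT id FROM source_packages WHERE distribution = ? "
        "AND release_name = ? AND component = ? AND name = ? AND version = ?",
        key,
    ).fetchone()
    return row[0]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    # Two Debian releases and a release-less distribution share one table.
    register_source(conn, "debian", "bookworm", "main", "hello", "2.10-3")
    register_source(conn, "debian", "trixie", "main", "hello", "2.10-3")
    register_source(conn, "archlinux", "", "core", "hello", "2.12-1")
```

Registering the same key twice returns the existing row, so re-importing a package index would not duplicate data under this constraint.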
Proposed design
The following initial proposal for a reworked database schema takes on the above
requirements with a few design guidelines in mind.
coexist in the database at any given time
form), meaning that
table
state of the database
The current schema is not too far off from ticking most of these boxes and
mostly requires some renaming and the moving of certain data between existing
tables.
An interactive diagram of the proposed new schema can be found here, and a
static image is included for rapid reference.
While not very complex, there are some notable changes from the current model.
package or inserted independently, which is the more common approach.
packages. This is now the starting point for any rebuild actions and allows
multiple rebuilds for the same source package to coexist (such as different
architectures).
snapshot, containing all current and past attempts to reproduce a source
package based on its build information.
common ancestor (typically source_packages).
equivalents.
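The relationship between a source package, its per-architecture build inputs, and the history of rebuild attempts can be sketched as follows, again with Python's sqlite3 module. Only `source_packages` is named in the text above. The `build_inputs` and `rebuilds` table names, their columns, and the helper functions are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE source_packages (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    version TEXT NOT NULL
);
-- Hypothetical: one row per (source package, architecture) to rebuild.
CREATE TABLE build_inputs (
    id INTEGER PRIMARY KEY,
    source_package_id INTEGER NOT NULL REFERENCES source_packages(id),
    architecture TEXT NOT NULL,
    UNIQUE (source_package_id, architecture)
);
-- Hypothetical: every attempt is kept, so history is never overwritten.
CREATE TABLE rebuilds (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    build_input_id INTEGER NOT NULL REFERENCES build_inputs(id),
    status TEXT NOT NULL
);
"""

def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def add_build_input(conn, source_package_id, architecture):
    """Register a build input and return its row id."""
    cur = conn.execute(
        "INSERT INTO build_inputs (source_package_id, architecture) "
        "VALUES (?, ?)",
        (source_package_id, architecture),
    )
    return cur.lastrowid

def record_attempt(conn, build_input_id, status):
    """Append a rebuild attempt instead of updating a status in place."""
    conn.execute(
        "INSERT INTO rebuilds (build_input_id, status) VALUES (?, ?)",
        (build_input_id, status),
    )

def latest_status(conn, build_input_id):
    """Return the status of the most recent attempt, or None if untried."""
    row = conn.execute(
        "SELECT status FROM rebuilds WHERE build_input_id = ? "
        "ORDER BY id DESC LIMIT 1",
        (build_input_id,),
    ).fetchone()
    return row[0] if row else None
```

Keeping attempts append-only means the current state is simply the newest row per build input, while older rows remain available as historical data.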
Challenges
I believe the proposed schema should resolve most, if not all, of the
challenges raised by the community and listed above. Some of the changes are in service
of future proposed work (historical data, for example) and can be pruned back if
desired.
Migrations from the old schema to this schema should be relatively simple,
though certainly not trivial.
The new schema would obviously necessitate a number of behavioural changes and
updates to database management throughout the daemon. In particular, several
queries that currently pull data from a single table would need to either join
several tables or insert data into several tables depending on the use case.
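To illustrate the kind of change this implies, here is a small sketch of a multi-table insert wrapped in a transaction, plus a read that joins the tables back together. It uses Python's sqlite3 module rather than the daemon's actual database layer, and the `binary_packages` table, all columns, and both function names are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE source_packages (
    id INTEGER PRIMARY KEY,
    distribution TEXT NOT NULL,
    name TEXT NOT NULL,
    version TEXT NOT NULL,
    UNIQUE (distribution, name, version)
);
CREATE TABLE binary_packages (
    id INTEGER PRIMARY KEY,
    source_package_id INTEGER NOT NULL REFERENCES source_packages(id),
    name TEXT NOT NULL,
    architecture TEXT NOT NULL,
    UNIQUE (source_package_id, name, architecture)
);
"""

def import_package(conn, distribution, name, version, binaries):
    """Insert a source package and its binaries atomically.

    `binaries` is a list of (name, architecture) tuples. Using the
    connection as a context manager commits on success and rolls back
    if any insert fails, so no orphaned source row is left behind.
    """
    with conn:
        cur = conn.execute(
            "INSERT INTO source_packages (distribution, name, version) "
            "VALUES (?, ?, ?)",
            (distribution, name, version),
        )
        source_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO binary_packages "
            "(source_package_id, name, architecture) VALUES (?, ?, ?)",
            [(source_id, b_name, arch) for b_name, arch in binaries],
        )
    return source_id

def list_binaries(conn, distribution):
    """Data that once lived in a single table now needs a join."""
    return conn.execute(
        "SELECT s.name, s.version, b.name, b.architecture "
        "FROM binary_packages AS b "
        "JOIN source_packages AS s ON s.id = b.source_package_id "
        "WHERE s.distribution = ? "
        "ORDER BY b.name, b.architecture",
        (distribution,),
    ).fetchall()
```

The transaction matters more than it did with a single table: a failure halfway through would otherwise leave a source package with only some of its binaries registered.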