@jonathansick
Member

Hoverdrive provides two endpoints for getting documentation links about tables and columns respectively. These endpoints are described in https://sqr-086.lsst.io.

Both endpoints provide an optional mode triggered by a ?redirect=true query parameter where the client is redirected to the most-relevant documentation URL. In this case, only a single column or table can be specified by the ?table and ?column parameters. I'm not sure how to encode this logic in the service descriptor.

Hoverdrive is currently deployed on data-dev and data-int. Online API docs.

Checklist

When making changes to YAML files in the schemas directory:

Member Author

@jonathansick jonathansick left a comment

Here are some things I'm not sure of:

  • Is it valid to include two RESOURCE tags in one service descriptor XML file?
  • What is the usage of the INFO tags, or are these deprecated?
  • How should the datalink-manifest.json file be used? It seems to be related to the INFO tags.

@stvoutsin
Contributor

stvoutsin commented Apr 23, 2025

> Here are some things I'm not sure of:
>
>   • Is it valid to include two RESOURCE tags in one service descriptor XML file?
>   • What is the usage of the INFO tags, or are these deprecated?
>   • How should the datalink-manifest.json file be used? It seems to be related to the INFO tags.

Some feedback on these, but folks who were more involved in writing these might want to correct me if any of this is wrong:

It should be perfectly valid to include multiple RESOURCE tags in one service descriptor XML file.
We can define multiple different datalink services, and our RubinTableWriter code in the TAP service explicitly handles adding these to the VOTable document.

The INFO tags are being used as template placeholders in this implementation.
The tags with syntax like $dp02_dc2_catalogs_DiaObject_diaObjectId$ act as variables that get replaced with actual column references.

The datalink-manifest.json file basically serves as a registry that maps datalink service IDs to required column names.
We use this manifest to determine which datalink services should be included based on the selected columns in the query, i.e. only including services when all their required columns are present in the results.
It works with the template-based system where INFO tags define parameters that get replaced with actual column references.
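
A minimal sketch of the selection rule described above, assuming a manifest that maps each service ID to a list of required column names. The manifest layout, the service IDs, the second column name, and the `select_services` helper are all hypothetical; they only illustrate the "include a service when all of its required columns are present" check, and the actual file format and TAP service code may differ.

```python
from typing import Dict, Iterable, List

# Hypothetical manifest contents: service ID -> columns it requires.
# The real datalink-manifest.json may be structured differently.
MANIFEST: Dict[str, List[str]] = {
    "example_diaobject_service": ["dp02_dc2_catalogs_DiaObject_diaObjectId"],
    "example_two_column_service": [
        "dp02_dc2_catalogs_DiaObject_diaObjectId",
        "dp02_dc2_catalogs_DiaObject_ra",  # illustrative second column
    ],
}


def select_services(
    manifest: Dict[str, List[str]], selected_columns: Iterable[str]
) -> List[str]:
    """Return the IDs of services whose required columns are all selected."""
    selected = set(selected_columns)
    return [
        service_id
        for service_id, required in manifest.items()
        if set(required) <= selected  # subset test: every required column present
    ]
```

With this sketch, a query that selects only the `diaObjectId` column would pick up the first service but not the second, because the second also needs a column that is missing from the results.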

@stvoutsin
Contributor

I don't know if there is a better way to encode the conditional validation for using redirect=true with single params, but perhaps we can extend the descriptions to specify this.

In particular, I'm thinking of adding something like this to the redirect param description:
Whether to redirect to the most relevant documentation link. When set to true, only a single column can be specified.

And then for the column params (column, table, ...):
The name of the column. Multiple columns can be specified unless redirect=true.
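
Since the service descriptor can only carry this rule as descriptive text, the check itself would have to live on the server side. A minimal sketch of that conditional validation follows; `validate_redirect_params` is a hypothetical helper and not part of Hoverdrive's API, and the "at least one column" rule is an added assumption.

```python
from typing import Sequence


def validate_redirect_params(columns: Sequence[str], redirect: bool) -> None:
    """Raise ValueError if the parameter combination is not allowed.

    Hypothetical helper: multiple columns are fine unless redirect is true.
    """
    if not columns:
        # Assumption: a request without any column is rejected.
        raise ValueError("at least one column must be specified")
    if redirect and len(columns) != 1:
        raise ValueError("redirect=true accepts exactly one column")
```

The same check would apply to the table parameter. A descriptor reader has no way to enforce it, which is why the wording in the descriptions matters.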

@gpdf gpdf self-requested a review April 23, 2025 22:58
Collaborator

@gpdf gpdf left a comment

@jonathansick Can we find a time to talk about what you had in mind w.r.t. the "optional" parameter? Perhaps this is something that could be brought up with the IVOA, if we agree that it's valuable.

I think @robyww would be interested in this as well.

@jonathansick
Member Author

The whole idea of a ?redirect query parameter came from my recollection of an initial conversation we had a year ago about this documentation linking feature. I must have misunderstood, but now that I'm confronted with writing a service descriptor, I think it'll be more expeditious to make the redirect functionality its own set of endpoints. It might be neat to add optional parameters to the standard, but I don't think it's necessary here. Better yet, the non-redirect functionality isn't implemented in Hoverdrive yet, so separating the endpoints makes that clear. I'll update Hoverdrive and then update this PR.

Member Author

@jonathansick jonathansick left a comment

@gpdf I've updated Hoverdrive with new redirect-specific endpoints, which now make the service descriptor (updated on this branch) easier to write. At the moment we only have endpoints for the redirect functionality. We'll have to discuss what response is desired when asking for multiple links.

Does this service descriptor look good? I'm still unsure of what, if anything, to do with datalink-manifest.json since these endpoints run on any table or column name, not a specific column name.

@jonathansick jonathansick requested a review from gpdf May 6, 2025 18:25
<PARAM name="accessURL" datatype="char" arraysize="*"
value="$baseUrl$/api/hoverdrive/column-docs-redirect"/>
<GROUP name="inputParams">
<PARAM name="table" datatype="char" use="required">
Contributor

I don't think we can have the 'use' attribute on a PARAM, so I suspect we will have to remove it from the three PARAMs.

Contributor

Also, it looks as if the PARAMs need a value attribute. I think we can set value="" for the ones that are being templated.

@@ -0,0 +1,40 @@
<?xml version="1.0" encoding="UTF-8"?>
<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.2">

Contributor

I think we need a couple of INFO elements here:

  <INFO name="$tap_schema_columns_table_name$" ID="$tap_schema_columns_table_name$" value="this will be dropped..." />
  <INFO name="$tap_schema_columns_column_name$" ID="$tap_schema_columns_column_name$" value="this will be dropped..." />

Member Author

Do you know what's meant by "this will be dropped..."? I saw it elsewhere and wasn't sure whether it means this was a deprecated feature or something else.

Contributor

Yeah, that is confusing, but it basically means that those INFO elements are temporary placeholders that get replaced during processing. They are used to establish references between the columns and the datalink service params.
So the ref attribute in the datalink resource definition ends up with the unique ID of the column as its value (and the INFO elements are used during the processing step to achieve that).

So assuming you have this field in your results that corresponds to a datalink parameter:

      <FIELD name="table_name" datatype="char" arraysize="64*" ID="col_0">
        <DESCRIPTION>the table this column belongs to</DESCRIPTION>
      </FIELD>

This would end up being referenced like this in the datalink resource:

      <PARAM name="table" datatype="char" ref="col_0" value="">
        <DESCRIPTION>The name of the table.</DESCRIPTION>
      </PARAM>
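
To make the placeholder step concrete, here is a rough sketch of the substitution described above. It is not the RubinTableWriter implementation; `resolve_refs` is a hypothetical helper, and it assumes the template's PARAM carries `ref="$tap_schema_columns_table_name$"` matching the INFO element's ID.

```python
import re
from typing import Dict

# Hypothetical template fragment: the INFO element is a placeholder and the
# PARAM's ref points at it by the same $...$ name (an assumption).
TEMPLATE = """\
<VOTABLE>
  <INFO name="$tap_schema_columns_table_name$" ID="$tap_schema_columns_table_name$" value="this will be dropped..." />
  <RESOURCE type="meta" utype="adhoc:service">
    <PARAM name="table" datatype="char" arraysize="*" ref="$tap_schema_columns_table_name$" value="" />
  </RESOURCE>
</VOTABLE>
"""


def resolve_refs(template_xml: str, column_ids: Dict[str, str]) -> str:
    """Drop placeholder INFO elements and point each ref at a FIELD ID."""
    # Remove INFO elements whose attributes contain a $...$ placeholder.
    without_info = re.sub(r"\s*<INFO\b[^>]*\$[^>]*/>", "", template_xml)

    # Replace ref="$name$" with the ID of the matching result column.
    def swap(match: "re.Match[str]") -> str:
        return 'ref="{}"'.format(column_ids[match.group(1)])

    return re.sub(r'ref="\$([A-Za-z0-9_]+)\$"', swap, without_info)


resolved = resolve_refs(TEMPLATE, {"tap_schema_columns_table_name": "col_0"})
```

After the call, `resolved` no longer contains the INFO element, and the PARAM carries `ref="col_0"`, which is the shape shown in the PARAM example above.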

jonathansick and others added 3 commits May 16, 2025 11:06
Hoverdrive now provides /column-docs-redirect and /table-docs-redirect
endpoints. This removes the need to express the optional redirect query
parameter in the service descriptor.

Since we haven't implemented getting VO Tables of documentation links
yet, only these redirects are implemented and represented in the service
descriptor right now.
Now we're connecting the service descriptor to the schema table's columns:

- ref attributes in the PARAM fields map to INFO tags
- INFO tags map those ref templates to column names in the tap schema
- datalink-manifest.json maps the column names to the datalink service
  descriptor

Co-authored-by: stvoutsin <[email protected]>
@jonathansick
Member Author

Thanks to some helpful coaching from @stvoutsin I think this service descriptor is closer to "right", although I still don't know how to test it. Any advice on next steps would be great!

What we've done is figure out the ref attribute to put on the PARAM elements, map those to INFO elements, and then expose the service descriptor in datalink-manifest.json with those column names.

The hoverdrive/column-docs-redirect endpoint needs both table and column
name columns, but the hoverdrive/table-docs-redirect endpoint takes just
the table name column. With the previous setup, the manifest prevented
the table-docs-redirect endpoint from being used on its own, because the
only case it covered was the one with both the table and column names.

Splitting the service descriptor into separate XML files for each set of
unique column dependencies should solve this.
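
As a rough illustration of that split, the sketch below groups services by the exact set of columns they depend on, so each group could go into its own descriptor file. The dictionary keys reuse the endpoint names from this PR, the column keys follow the INFO placeholder names, and `group_by_dependencies` is a hypothetical helper rather than existing code.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List

# Column dependencies of the two redirect services discussed in this PR.
SERVICES: Dict[str, List[str]] = {
    "column-docs-redirect": [
        "tap_schema_columns_table_name",
        "tap_schema_columns_column_name",
    ],
    "table-docs-redirect": ["tap_schema_columns_table_name"],
}


def group_by_dependencies(
    services: Dict[str, List[str]]
) -> Dict[FrozenSet[str], List[str]]:
    """Group service names by their exact set of required columns."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for name, columns in services.items():
        groups[frozenset(columns)].append(name)
    return dict(groups)


groups = group_by_dependencies(SERVICES)
```

Here `groups` has two entries, one per unique dependency set, which corresponds to the two XML files after the split.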
@jonathansick
Member Author

@stvoutsin stood this up in data-dev and this is the result for https://data-dev.lsst.cloud/api/tap/sync?LANG=ADQL&REQUEST=doQuery&QUERY=SELECT+TOP+1+*+FROM+tap_schema.columns :

<?xml version="1.0" encoding="UTF-8"?>
<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.4">
  <RESOURCE type="results">
    <INFO name="QUERY_STATUS" value="OK" />
    <INFO name="QUERY_TIMESTAMP" value="2025-05-22T23:02:37.441" />
    <INFO name="QUERY" value="SELECT TOP 1 * FROM tap_schema.columns" />
    <TABLE>
      <FIELD name="table_name" datatype="char" arraysize="64*" ID="col_0">
        <DESCRIPTION>the table this column belongs to</DESCRIPTION>
      </FIELD>
      <FIELD name="column_name" datatype="char" arraysize="64*" ID="col_1">
        <DESCRIPTION>the column name</DESCRIPTION>
      </FIELD>
      <FIELD name="utype" datatype="char" arraysize="512*" ID="col_2">
        <DESCRIPTION>lists the utypes of columns in the tableset</DESCRIPTION>
      </FIELD>
      <FIELD name="ucd" datatype="char" arraysize="64*" ID="col_3">
        <DESCRIPTION>lists the UCDs of columns in the tableset</DESCRIPTION>
      </FIELD>
      <FIELD name="unit" datatype="char" arraysize="64*" ID="col_4">
        <DESCRIPTION>lists the unit used for column values in the tableset</DESCRIPTION>
      </FIELD>
      <FIELD name="description" datatype="char" arraysize="512*" ID="col_5">
        <DESCRIPTION>describes the columns in the tableset</DESCRIPTION>
      </FIELD>
      <FIELD name="datatype" datatype="char" arraysize="64*" ID="col_6">
        <DESCRIPTION>lists the ADQL datatype of columns in the tableset</DESCRIPTION>
      </FIELD>
      <FIELD name="xtype" datatype="char" arraysize="64*" ID="col_7">
        <DESCRIPTION>a DALI or custom extended type annotation</DESCRIPTION>
      </FIELD>
      <FIELD name="arraysize" datatype="char" arraysize="16*" ID="col_8">
        <DESCRIPTION>lists the size of variable-length columns in the tableset</DESCRIPTION>
      </FIELD>
      <FIELD name="&quot;size&quot;" datatype="int" ID="col_9">
        <DESCRIPTION>deprecated: use arraysize</DESCRIPTION>
      </FIELD>
      <FIELD name="principal" datatype="int" ID="col_10">
        <DESCRIPTION>a principal column; 1 means 1, 0 means 0</DESCRIPTION>
      </FIELD>
      <FIELD name="indexed" datatype="int" ID="col_11">
        <DESCRIPTION>an indexed column; 1 means 1, 0 means 0</DESCRIPTION>
      </FIELD>
      <FIELD name="std" datatype="int" ID="col_12">
        <DESCRIPTION>a standard column; 1 means 1, 0 means 0</DESCRIPTION>
      </FIELD>
      <FIELD name="column_index" datatype="int" ID="col_13">
        <DESCRIPTION>recommended sort order when listing columns of a table</DESCRIPTION>
      </FIELD>
      <DATA>
        <TABLEDATA>
          <TR>
            <TD>dp02_dc2_catalogs.CcdVisit</TD>
            <TD>band</TD>
            <TD />
            <TD>meta.id;instr.bandpass</TD>
            <TD />
            <TD>Name of the band used to take the exposure where this source was measured. Abstract filter that is not associated with a particular instrument.</TD>
            <TD>char</TD>
            <TD />
            <TD>*</TD>
            <TD />
            <TD>1</TD>
            <TD>0</TD>
            <TD>0</TD>
            <TD>13</TD>
          </TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
    <INFO name="placeholder" value="ignore" />
  </RESOURCE>
  <RESOURCE type="meta" name="ColumnDocumentationRedirect" utype="adhoc:service">
    <DESCRIPTION>Redirect to the most relevant documentation link for a column.</DESCRIPTION>
    <PARAM name="accessURL" datatype="char" arraysize="*" value="https://data-dev.lsst.cloud/api/hoverdrive/column-docs-redirect" />
    <PARAM name="exampleURL" datatype="char" arraysize="*" value="https://data-dev.lsst.cloud/api/hoverdrive/column-docs-redirect?table=dp02_dc2_catalogs.Object&amp;column=detect_isPrimary">
      <DESCRIPTION>Example request to redirect to the documentation for the 'detect_isPrimary' column in the 'dp02_dc2_catalogs.Object' table.</DESCRIPTION>
    </PARAM>
    <GROUP name="inputParams">
      <PARAM name="table" datatype="char" arraysize="*" ref="col_0" value="">
        <DESCRIPTION>The name of the table.</DESCRIPTION>
      </PARAM>
      <PARAM name="column" datatype="char" arraysize="*" ref="col_1" value="">
        <DESCRIPTION>The name of the column.</DESCRIPTION>
      </PARAM>
    </GROUP>
  </RESOURCE>
  <RESOURCE type="meta" name="TableDocumentationRedirect" utype="adhoc:service">
    <DESCRIPTION>Redirect to the most relevant documentation link for a table.</DESCRIPTION>
    <PARAM name="accessURL" datatype="char" arraysize="*" value="https://data-dev.lsst.cloud/api/hoverdrive/table-docs-redirect" />
    <PARAM name="exampleURL" datatype="char" arraysize="*" value="https://data-dev.lsst.cloud/api/hoverdrive/table-docs-redirect?table=dp02_dc2_catalogs.Object">
      <DESCRIPTION>Example request to redirect to the documentation for the 'dp02_dc2_catalogs.Object' table.</DESCRIPTION>
    </PARAM>
    <GROUP name="inputParams">
      <PARAM name="table" datatype="char" arraysize="*" ref="col_0" value="">
        <DESCRIPTION>The name of the table.</DESCRIPTION>
      </PARAM>
    </GROUP>
  </RESOURCE>
</VOTABLE>

That column schema query provides the data links for both the table and column documentation redirect, and it looks like the refs are pointing to the correct columns, so this should work!
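
For anyone who wants to check this mechanically rather than by eye, here is a small sketch that parses a VOTable response and reports any service descriptor PARAM whose ref does not match a FIELD ID. The `SAMPLE` document is a trimmed-down copy of the response above, and `unresolved_refs` is a hypothetical helper, not part of any existing test suite.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

NS = {"v": "http://www.ivoa.net/xml/VOTable/v1.3"}

# Trimmed version of the response above: two result columns, two services.
SAMPLE = """\
<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" version="1.4">
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="table_name" datatype="char" arraysize="64*" ID="col_0" />
      <FIELD name="column_name" datatype="char" arraysize="64*" ID="col_1" />
    </TABLE>
  </RESOURCE>
  <RESOURCE type="meta" name="ColumnDocumentationRedirect" utype="adhoc:service">
    <GROUP name="inputParams">
      <PARAM name="table" datatype="char" arraysize="*" ref="col_0" value="" />
      <PARAM name="column" datatype="char" arraysize="*" ref="col_1" value="" />
    </GROUP>
  </RESOURCE>
  <RESOURCE type="meta" name="TableDocumentationRedirect" utype="adhoc:service">
    <GROUP name="inputParams">
      <PARAM name="table" datatype="char" arraysize="*" ref="col_0" value="" />
    </GROUP>
  </RESOURCE>
</VOTABLE>
"""


def unresolved_refs(votable_xml: str) -> List[Tuple[str, str, str]]:
    """Return (service name, param name, ref) for refs with no matching FIELD."""
    root = ET.fromstring(votable_xml)
    field_ids = {field.get("ID") for field in root.iterfind(".//v:FIELD", NS)}
    missing = []
    for resource in root.iterfind("v:RESOURCE[@utype='adhoc:service']", NS):
        for param in resource.iterfind("v:GROUP[@name='inputParams']/v:PARAM", NS):
            ref = param.get("ref")
            if ref is not None and ref not in field_ids:
                missing.append((resource.get("name"), param.get("name"), ref))
    return missing
```

An empty list from `unresolved_refs` means every input PARAM points at a column that is present in the results. It says nothing about whether the Hoverdrive endpoints themselves respond correctly.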

@gpdf gpdf added this to the DP1-revision milestone Jun 20, 2025
@gpdf gpdf added the enhancement New feature or request label Jun 20, 2025