Store and maintain the SQL query string in pg_ivm_immv #128
adamguo0 wants to merge 8 commits into sraoss:main from
Conversation
---
Thank you for your proposal!
Yes, you are right. Also, we need to recreate dependencies after restore/upgrade, as you suggested before.
I am planning to fix the issue with a similar but different approach. I was going to respond to you with a draft patch in another pull request, but I haven't finished it, so I'll show my idea here.

I don't like always storing the query string in pg_ivm_immv, because we already have the query_def column, so this introduces redundancy, and it needs additional maintenance, such as handling table renames, which you proposed to do with a DDL event trigger. Instead, I would like to provide two user commands or scripts to be run before and after pg_dump/restore or pg_upgrade. The pre-dump/upgrade script creates a table that saves the query definition string for every IMMV, and the post-restore/upgrade script reads that table and recreates query_def, the triggers, the dependencies, and so on. For this purpose, adding a new SQL function like recreate_all_immvs (or recreate_immv, reset_immv, ...) would be useful. What do you think of this approach?
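From the user's side, the two-script workflow described above might look something like this (`save_immv_definitions` is a hypothetical helper name; `recreate_all_immvs` is the name floated in this thread — this is a sketch, not the actual pg_ivm API):

```sql
-- Hypothetical pre-dump/upgrade step: persist each IMMV's defining SQL
-- in an ordinary table, which pg_dump will then carry over as plain data.
SELECT pgivm.save_immv_definitions();

-- ... run pg_dump/pg_restore or pg_upgrade ...

-- Hypothetical post-restore/upgrade step: re-parse the saved strings and
-- rebuild query_def, the triggers, and the dependencies for every IMMV.
SELECT pgivm.recreate_all_immvs();
```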
---
Thank you for looking! I wonder if there's some way we can automate the pre-upgrade and post-upgrade commands. I think users may be hesitant to complicate database upgrades even further if pg_ivm requires more work from them. They also need to make sure that no DDL is executed between running the pre-upgrade script and shutting down the database, or between starting the database and running the post-upgrade script, which could be challenging for a live application. A DDL event trigger could maintain this second table automatically so that we can avoid the pre-upgrade script, and then maybe a hook in pg_ivm could apply the post-upgrade logic automatically. I think the maintenance tradeoff may be worth the reduced pain of upgrading. Eager to hear your thoughts, thanks!
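A minimal sketch of the event-trigger idea, assuming a hypothetical helper `pgivm.refresh_saved_definitions()` that regenerates the saved SQL text (the trigger fires at the end of any DDL command, which covers renames):

```sql
-- Hypothetical event trigger keeping the saved definitions current.
CREATE FUNCTION pgivm.sync_saved_definitions()
RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Regenerate the saved query strings so that renames of tables,
    -- columns, or schemas referenced by an IMMV never leave them stale.
    PERFORM pgivm.refresh_saved_definitions();  -- hypothetical helper
END;
$$;

CREATE EVENT TRIGGER pgivm_ddl_sync
    ON ddl_command_end
    EXECUTE FUNCTION pgivm.sync_saved_definitions();
```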
---
Of course, I understand the importance of automation, and I would also like an automated way to handle pg_dump and pg_upgrade if we could find one. However, I don't think running a script when executing pg_upgrade or restoring dumped data is a big problem, because servers should be stopped during pg_upgrade, and a restore would usually be performed as part of system maintenance. Also, some extensions require scripts to be executed for upgrade and restore, for example, ....

It may be a problem that query strings saved by a script could be inconsistent with the table schemas output by pg_dump, because pg_dump would usually be executed while the system is running. Maybe this could be prevented by passing a snapshot exported with pg_export_snapshot() to pg_dump's --snapshot option, though.

It makes sense that maintaining the stored query strings with an event trigger would be a way to obtain query definitions that can be used to re-create the view info. However, I am still not confident that it is the best approach, because the redundant information in the pg_ivm_immv catalog bothers me. Also, do you have a concrete idea of how to automate the post-upgrade logic for re-creating the view information without executing some script? I could not find a usable hook for it. We might do it in a trigger when INSERT or COPY is executed on pg_ivm_immv, or at the first incremental view maintenance after restore, though.
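The snapshot-consistency idea could be sketched as follows, assuming the exporting transaction is kept open for the whole dump (the snapshot ID shown is purely illustrative):

```sql
-- Session 1: export a snapshot and keep this transaction open
-- while the dump runs, so both see the same database state.
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();
-- returns a snapshot ID such as '00000003-0000001B-1';
-- save the IMMV query strings here, inside this same transaction.

-- Session 2 (shell): dump against that exact snapshot so the schemas
-- in the dump stay consistent with the saved query strings:
--   pg_dump --snapshot='00000003-0000001B-1' mydb > mydb.dump

-- Session 1: commit only after the dump has finished.
COMMIT;
```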
https://docs.timescale.com/self-hosted/latest/backup-and-restore/logical-backup/
I tried to find a hook or other way as well and had some trouble. Maybe we can use the pg_ivm
---
Thank you for your explanation. I agree that the pre- and post-restore functions require the least significant design changes in the extension. I thought of another potential solution which avoids needing to run post-restore logic, but it still involves the query string + DDL trigger.
---
That was the only thing I could think of for the post-restore logic too. In that case, we have to detect whether it is the first incremental maintenance or the first execution of any function. One concern is the impact on the performance of the first maintenance.
Another solution I thought of was to make a (normal) view for each IMMV instead of storing the view definition in pg_ivm_immv. We would get the correct view definition after the restore, because pg_dump would dump the view definition. However, I suspect it would still not work well, because this would just shift the problem to "how to save the view name with its namespace"...

My current thought is this: first, introduce the pre- and post-script approach, which enables support for pg_dump/pg_upgrade with minimal change to pg_ivm, and then continue to discuss how to automate the scripts, since each of these steps is definitely an improvement to the user experience. For the automation, we might adopt the event-trigger approach, or perhaps we could find some better way.
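The "normal view per IMMV" idea above could be sketched like this (the shadow schema, view names, and the orders table are all hypothetical; as noted, the open question is how to record which shadow view belongs to which IMMV):

```sql
-- Keep an ordinary view alongside each IMMV; pg_dump preserves view
-- definitions as SQL, so they survive dump/restore and pg_upgrade.
CREATE SCHEMA pgivm_shadow;
CREATE VIEW pgivm_shadow.orders_summary AS
    SELECT customer_id, sum(amount) AS total
    FROM orders
    GROUP BY customer_id;

-- After restore, read the definition back to rebuild the IMMV:
SELECT pg_get_viewdef('pgivm_shadow.orders_summary'::regclass, true);
```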
Recreate change-prevention triggers during refresh_immv, in addition to base table triggers.
---
The discussion here seems to have drifted between a fully automated solution and a solution that would require manual steps. FWIW, I think that both are the same thing; what matters is the timing at which we want to integrate this data and how we should include it in the dump. Like Nagata-san, I am not really a fan of the approach taken by the patch set, for a couple of reasons:
Again, the proposed patch set on branch 20250520_pr is messy and unorganized, so it is hard for a reviewer to understand what's going on when going through all the changes. I'd recommend a rework if we want to keep going with this design. I can't help but notice that the proposed patch set has exactly three TODOs. That does not inspire a lot of confidence.
I would recommend a user-controlled set of pre- and post-scripts, rather than hacks in loading paths to enforce a behavior. That could be surprising for some users if it is enforced at a fixed stage, particularly something as early as a one-time _PG_init(). A question here for @yugo-n (aka Nagata-san): what do you have in mind in terms of the contents to capture during the pre-script phase? Based on a set of guesses, I would imagine a user experience like:
All that comes down to how long you want the refreshes to take post-upgrade, but I think that there is also an argument for simplicity: pg_ivm has its own internal complications by mimicking the Postgres code for CTAS and matviews, especially if new internal triggers are added to pg_ivm in the future, even if it means a more costly experience because the IMMVs need to be refreshed from scratch. I think it's simpler to let the parent tables replay first anyway; CREATE MATERIALIZED VIEW + REFRESH is what happens in pg_dump for the core matviews.
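For reference, this is roughly the shape of what pg_dump emits for a core materialized view, the precedent cited above: the definition is created WITH NO DATA early, and the data is populated by a REFRESH after the base tables are loaded (object names here are illustrative):

```sql
-- Emitted with the schema, before table data is restored:
CREATE MATERIALIZED VIEW public.orders_summary AS
    SELECT customer_id, sum(amount) AS total
    FROM orders
    GROUP BY customer_id
  WITH NO DATA;

-- Emitted near the end of the dump, after base table data is loaded:
REFRESH MATERIALIZED VIEW public.orders_summary;
```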
---
So, considering that pg_ivm may introduce a different function layer when creating its IMMVs in new versions, and that the module depends on OIDs for its internal objects to work correctly, I would imagine an upgrade or dump/restore flow working like this:
---
Hi, I have been looking into how we can smoothly support pg_dump/restore/upgrade with this extension (#118). This PR is part one of my idea. Any feedback is welcome!
As I understand it, there are two fundamental problems.
1. The pg_ivm_immv table stores a serialized Query tree, and this is not preserved by dump/restore or upgrade. Specifically, dump/restore changes the referenced OIDs, and upgrade can change the Query tree struct itself.
2. This means that after dump/restore or upgrade, the IMMVs are not usable. In fact, we can't even run refresh_immv to bring them back, because the original view query is lost.

This PR consists of a few changes:

- Add a querystring text column to pg_ivm_immv. This column stores the fully-qualified SQL query that defines the view.
- Add a DDL event trigger that updates querystring whenever an object is renamed. This is needed so that the querystring does not become stale if any referenced objects are renamed.
- Modify RefreshIMMVByOid so that it re-parses the querystring and stores the updated viewdef in pg_ivm_immv.
- Add a function recreate_all_immvs that refreshes all IMMVs. This re-populates the data and re-creates all the base table and change control triggers.

With this change, a user can run pg_dump/restore/upgrade and then run pgivm.recreate_all_immvs() to restore their IMMVs to working order. They do not need to drop and re-create the IMMVs, and they do not need to input the original queries again.

I hope to find a way to automate the pgivm.recreate_all_immvs() step somehow. I played with creating executor/ProcessUtility hooks to automatically refresh the IMMVs during a restore/upgrade, but couldn't get it to work. I thought I would submit this portion of the PR first and get some feedback. Thanks!
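Under this PR's design, the post-restore step collapses to a single call, since the DDL event trigger keeps querystring current and no pre-dump script is needed (a sketch based on the description above; the exact behavior may differ in the final patch):

```sql
-- ... run pg_dump/pg_restore or pg_upgrade as usual ...

-- Then, once, in the restored/upgraded database: re-parse each stored
-- querystring, rebuild viewdef, re-populate the data, and re-create
-- the base table and change control triggers for every IMMV.
SELECT pgivm.recreate_all_immvs();
```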