Apache Parquet is quickly becoming a new standard file format in (geospatial) data-intensive computing, thanks to its read and write speeds, small footprint, and many other advantages. Parquet files are very cheap to host on object storage, but most importantly they are very cheap to query. In many cases this removes the need for beefy ETL processes entirely: just write a normal SQL query, e.g. with DuckDB, and you get exactly the data you want, served from a static file.
open-MaStR could support this by offering a `to_parquet` function alongside the existing CSV export functionality.
This would also open the possibility for an even bolder move: hosting the resulting Parquet files, so that developers and researchers can easily query the data they need without waiting for the complete dump to run. The MaStR dataset has grown large enough that better querying ergonomics would make a meaningful difference. A stellar example of this approach is Overture Maps, which collects and distributes openly available geospatial data in this way, making it extremely easy, for instance, to query the data directly in QGIS or with DuckDB.