-
Notifications
You must be signed in to change notification settings - Fork 352
feat(GeoDataFrame support): geopandas support for shapefile exporting #2671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
…frame()` routines
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## develop #2671 +/- ##
===========================================
+ Coverage 55.5% 72.4% +16.8%
===========================================
Files 644 667 +23
Lines 124135 130810 +6675
===========================================
+ Hits 68947 94739 +25792
+ Misses 55188 36071 -19117
🚀 New features to boost your workflow:
|
|
@wpbonelli it looks like the smoke tests are failing on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to be able to lean on geopandas for shapefile export. I guess we should look at using xarray for netcdf export similarly
General comment re: deprecations, we usually put an expected removal version in the deprecation warning which I think is typically ~2 minor releases ahead?
| @property | ||
| def geo_dataframe(self): | ||
| """ | ||
| Returns a geopandas GeoDataFrame of the model grid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the running convention is to add deprecation notes to docstrings as well as the warning in the code itself
| filename : str or PathLike | ||
| path of the shapefile to write | ||
| mg : flopy.discretization.grid.Grid object | ||
| modelgrid : flopy.discretization.grid.Grid object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest keeping/deprecating mg just making it optional and introducing modelgrid alongside it to avoid breakage. or, since this function will go away soon, is it worth renaming this parameter at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed back to mg and added deprecation notes
flopy/export/shapefile_utils.py
Outdated
|
|
||
| def model_attributes_to_shapefile( | ||
| path: Union[str, PathLike], | ||
| filename: Union[str, PathLike], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as mg/modelgrid above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted naming convention and added deprecation notes
| Parameters | ||
| ---------- | ||
| shpname : str or PathLike | ||
| filename : str or PathLike |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted naming convention and added deprecation notes
flopy/export/shapefile_utils.py
Outdated
| geoms, | ||
| shpname: Union[str, PathLike] = "recarray.shp", | ||
| mg=None, | ||
| filename: Union[str, PathLike] = "recarray.shp", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted naming convention and added deprecation notes
|
|
||
| return gdf | ||
|
|
||
| def export(self, f, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we would ideally deprecate export() now that the geodataframe approach is the recommended way to export to shapefile, but since export() also handles netcdf we have to wait til there is a similar netcdf export capability via e.g. xarray? working in the same way as the geopandas approach in this PR like to_xarray().to_netcdf()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the long term I think we should deprecate export() for methods like .to_netcdf(), .to_vtk(), .to_raster() to be explicit and follow similar conventions as pandas, geopandas, etc...
| self.repeating = True | ||
| self.empty_keys = {} | ||
|
|
||
| def to_geodataframe(self, gdf=None, kper=0, sparse=False, truncate_attrs=False, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the use case you have in mind for sparse=False? Since stress period data is sparse by default (at least in 3.x) it seems reasonable to return a sparse geodataframe too, either exclusively or at least by default?
Kind of like what you're doing in shapefile_export_example.py for WEL.
I guess I'm wondering when someone calling to_geodataframe() on stress period data would want one containing the whole grid, rather than just the relevant boundary conditions. In that case couldn't they get the grid geodataframe and .merge() it manually with the SPD frame or pass the grid frame into the gdf parameter here.
Performance wise, it seems expensive to always do the full grid then dropna().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sparse=False is useful when exporting from higher level objects like gwf.to_geodataframe(), from multiple individual packages into the same geodataframe, and if a user is exporting multiple stress periods with different boundary conditions from the same package. Because the geodataframe is constructed in the modelgrid object, it is created as a single layer representation of the whole grid and then we filter after the fact. For operations like gwf.to_geodataframe() there is a single construction of geometries and the geodataframe is then passed through to each package/datatype export.
Additionally sparse operates a little differently depending on if it's used on a TransientList or a MFList like object. On a TransientList the dataframe and attributes are constructed using a full grid representation and then filtered after all stress-period data is added. This is a much simpler operation than trying to reconcile multiple sparse dataframes from many stress-periods into a single dataframe.
We could cache a copy of the polygons after creation to improve performance; however, we don't currently give an option on the grid to construct a GeoDataFrame with only certain cells. We could add a nodes= parameter or kwarg to the grid .to_geodataframe() method which would only construct geometries for those nodes, however this will most likely be used in limited cases like exporting data from a single stress period or exporting SFR/LAK/UZF packagedata.
|
|
||
| @property | ||
| def geo_dataframe(self): | ||
| def geodataframe(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why to_geodataframe... elsewhere and not here, other than the fact the old property didn't have to_...?
suggest also deprecating the old one to avoid breakage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two comments here:
-
geodataframewas used to keep the convention consistent with other properties on theGeoSpatialUtilclass (e.g.,shapely,geojson,points, etc...). -
I did not add a deprecation here because this is a low-level object that generally shouldn't be called by end users. This serves as a "catch all" in methods that handle vector geospatial data and converts it to the type that the function was originally written for. I can keep the old method around and add a deprecation warning in the off chance someone is using this method to convert vector data outside of the intended use case.
| optional attribute name, default uses util2d name | ||
| forgive : bool | ||
| optional flag to continue if data shape not compatible with GeoDataFrame | ||
| truncate_attrs : bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think GeoDataFrame.to_file() already truncates for you when writing to shapefile, is this parameter necessary? It seems like having our own API for this would be useful if we let the user customize the truncation but otherwise redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The truncate_attrs flag is definitely useful. to_file() does truncate the attributes, but it just cuts off the end and can potentially write a dbf file with multiple attributes that have identical names. truncate_attrs maintains the way we previously constructed attributes in flopy by removing text from the middle of the attribute name and keeping the layer and stress period number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it- didn't realize this. thanks!
| geoms = GeoSpatialCollection(geoms, "Point") | ||
| gdf = gpd.GeoDataFrame.from_features(geoms) | ||
|
|
||
| for col in list(welldata): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of looping to add columns, could this just be
gdf = gpd.GeoDataFrame(welldata, geometry=geoms)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated this, need to use:
geoms = GeoSpatialCollection(geoms, "Point").shape
gdf = gpd.GeoDataFrame(welldata, geometry=geoms)for this type of GeoDataFrame construction because it expects an arraylike object
christianlangevin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a quick look through this PR. Really nice, @jlarsen-usgs. Excited to see it rolled out.
This PR supersedes the draft PR #2573 and continues the cleanup of legacy shapefile exporting code and integration of geopandas GeoDataFrame support.