
Fix bug in getdxdyc for parallel runs #238

Open
jcphill wants to merge 1 commit into dtarb:Develop from jcphill:dinf-dxdyc-fix

jcphill (Contributor) commented Mar 29, 2022

linearpart<datatype>::getdxdyc() would silently fail to return values for neighbor cells owned by other ranks, resulting in incorrect AreaDinf output.
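
The patch itself is not reproduced here. As a hypothetical illustration only (this is not TauDEM's linearpart code; RowPartition, getDxBuggy, getDxFixed, and the spherical-Earth cell-width formula are all assumptions), the C++ sketch below shows how a per-row cell-size getter that only answers for locally owned rows can leave its output untouched when asked about a halo row belonging to a neighboring rank. With a geographic coordinate system the east-west cell width depends on latitude, so a stale value from another row is slightly wrong rather than obviously wrong.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical row-partitioned raster: NOT TauDEM's linearpart, just an
// illustration. Each rank owns global rows [firstRow, firstRow + nRows)
// and can also address one halo (buffer) row above and below, locally
// indexed as y = -1 and y = nRows.
struct RowPartition {
    long firstRow;            // global index of local row 0
    long nRows;               // number of rows owned by this rank
    double northLat;          // latitude of the grid's north edge, degrees
    double dLat, dLon;        // cell size, degrees
    std::vector<double> dxc;  // cell width in metres, owned rows only

    RowPartition(long first, long n, double lat0, double dlat, double dlon)
        : firstRow(first), nRows(n), northLat(lat0), dLat(dlat), dLon(dlon) {
        for (long y = 0; y < nRows; ++y) dxc.push_back(cellWidth(firstRow + y));
    }

    // East-west width at the centre of a global row, assuming a sphere.
    double cellWidth(long globalRow) const {
        const double kPi = 3.14159265358979323846;
        const double kEarthRadius = 6371000.0;  // metres (assumed sphere)
        double lat = northLat - (globalRow + 0.5) * dLat;
        return kEarthRadius * std::cos(lat * kPi / 180.0) * dLon * kPi / 180.0;
    }

    // Buggy pattern: halo rows fall outside the owned range, so dx is
    // silently left holding whatever the caller stored there before.
    void getDxBuggy(long y, double &dx) const {
        if (y >= 0 && y < nRows) dx = dxc[y];
    }

    // Fixed pattern: any row this rank may address (owned or halo) gets a
    // value computed from its global row index.
    void getDxFixed(long y, double &dx) const {
        assert(y >= -1 && y <= nRows);
        dx = cellWidth(firstRow + y);
    }
};
```

In this sketch a rank owning global rows 10 to 14 can address y = -1 (global row 9) and y = 5 (global row 15) as halo rows; getDxBuggy leaves dx unchanged for those rows, while getDxFixed returns cellWidth(9) and cellWidth(15). On a projected grid every row has the same width, so a stale value would happen to be correct, which would be consistent with the discrepancy showing up on the geographic test data.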

dtarb (Owner) commented Mar 29, 2022

Do you have an example where this actually causes a problem? I have not investigated this specifically yet, but I am skeptical, because AreaDinf has been tested extensively with multiple processes and ranks, and I think the approach of keeping a buffer row at the bounds of each rank and swapping after each pass likely prevents an actual error.

jcphill (Contributor, Author) commented Mar 29, 2022

Yes. In TauDEM-Test-Data/Input/Geographic, running "mpiexec -np XXX AreaDinf enogeo.tif" with different rank counts produces enogeosca.tif files for which gdalcompare.py reports pixel differences (thousands of pixels, with a maximum difference of about 2).

dtarb (Owner) commented Mar 29, 2022

Thanks. I'll check it out.

jcphill (Contributor, Author) commented Mar 29, 2022

Example output (from the original, unfixed version):

$ mpiexec -n 1 ../../../build/areadinf -ang enogeoang.tif -sca enogeosca1.tif
AreaDinf version 5.3.9
Input file enogeoang.tif has geographic coordinate system.
This run may take on the order of 1 minutes to complete.
This estimate is very approximate.
Run time is highly uncertain as it depends on the complexity of the input data
and speed and memory of the computer. This estimate is based on our testing on
a dual quad core Dell Xeon E5405 2.0GHz PC with 16GB RAM.
Nodata value input to create partition from file: -340282346638528859811704183484516925440.000000
Nodata value recast to float used in partition raster: -340282346638528859811704183484516925440.000000
Processors: 1
Read time: 0.137529
Compute time: 1.248470
Write time: 0.041544
Total time: 1.427543
$ mpiexec -n 12 ../../../build/areadinf -ang enogeoang.tif -sca enogeosca12.tif
AreaDinf version 5.3.9
Input file enogeoang.tif has geographic coordinate system.
Nodata value input to create partition from file: -340282346638528859811704183484516925440.000000
Nodata value recast to float used in partition raster: -340282346638528859811704183484516925440.000000
This run may take on the order of 1 minutes to complete.
This estimate is very approximate.
Run time is highly uncertain as it depends on the complexity of the input data
and speed and memory of the computer. This estimate is based on our testing on
a dual quad core Dell Xeon E5405 2.0GHz PC with 16GB RAM.
Processors: 12
Read time: 0.028167
Compute time: 0.140669
Write time: 0.033118
Total time: 0.201954
$ gdalcompare.py enogeosca1.tif enogeosca12.tif
Files differ at the binary level.
Band 1 checksum difference:
Golden: 5380
New: 5431
Pixels Differing: 31527
Maximum Pixel Difference: 1.875
Differences Found: 2
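
To automate this comparison in a test harness instead of reading gdalcompare.py output by hand, a minimal C++ sketch follows. It assumes both rasters have already been read into equally sized, row-major float buffers (for example with GDAL's GDALRasterBand::RasterIO); RasterDiff and compareBands are hypothetical names, and the summary only loosely mirrors the "Pixels Differing" and "Maximum Pixel Difference" lines rather than reproducing gdalcompare.py's exact algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of TauDEM or GDAL.
struct RasterDiff {
    std::size_t pixelsDiffering = 0;  // cells whose values are not identical
    double maxAbsDifference = 0.0;    // largest |golden - candidate|
};

// Compare two single-band rasters stored as row-major float buffers.
// Values are compared exactly; NaN cells always count as differing.
RasterDiff compareBands(const std::vector<float> &golden,
                        const std::vector<float> &candidate) {
    assert(golden.size() == candidate.size());  // same grid assumed
    RasterDiff d;
    for (std::size_t i = 0; i < golden.size(); ++i) {
        if (golden[i] != candidate[i]) {
            ++d.pixelsDiffering;
            double diff = std::fabs(static_cast<double>(golden[i]) -
                                    static_cast<double>(candidate[i]));
            d.maxAbsDifference = std::max(d.maxAbsDifference, diff);
        }
    }
    return d;
}
```

Because values are compared exactly, nodata cells match only when both runs wrote the same nodata value; for a rank-count consistency check like this one, any nonzero pixelsDiffering between a 1-rank and a 12-rank run is the signal of interest.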

jcphill (Contributor, Author) commented Jun 14, 2022

Do you have time to look at this?

dtarb (Owner) commented Jun 15, 2022

Sorry - I have not had time yet.
