Treewalk cuda #2
Open
astro-YYH wants to merge 29 commits into MP-Gadget:master from astro-YYH:treewalk_cuda
7ad04f5 ev_primary to GPU (astro-YYH)
859db38 tw->reduce to grav_short_reduce_device (astro-YYH)
bd086bd tw->visit to force_treeev_shortrange_device (astro-YYH)
f7f50ff ev_primary interactions count (astro-YYH)
43c4534 ForceTree cuda managed (astro-YYH)
822159b test (astro-YYH)
080ccbc mem alloc free fix (astro-YYH)
814e096 clean message (astro-YYH)
0857031 initialize tree = {0} is necessary (astro-YYH)
60cb3e3 initialization fix tree (astro-YYH)
fc0b262 gravity softening fix (astro-YYH)
b832437 short range table fix device (astro-YYH)
7e3958a test (astro-YYH)
fa6980e runs but incorrect matter power (astro-YYH)
e087272 clean up (astro-YYH)
148c299 cudaMallocManaged TreeParams (astro-YYH)
18632a9 const argu (astro-YYH)
fd4d8e9 test kernel runs (astro-YYH)
97dcc26 treewalk cpu flag (astro-YYH)
614b44e separate accn by children for accuracy (astro-YYH)
eaf6d6b clean up (astro-YYH)
7fe1f6f fof still uses cpu (astro-YYH)
6602817 treewalk secondary (astro-YYH)
2987306 Guard OPENMP flag with CUDACC (sbird)
44c0335 Restore const flags (sbird)
d7803ca Add some defaults for CUDA compile flags (sbird)
1cc6663 Label strings should be static strings, not heap memory (sbird)
5bb4284 Merge pull request #3 from MP-Gadget/treewalk_cuda (astro-YYH)
d58586e Merge branch 'master' into treewalk_cuda (sbird)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1,6 +1,7 @@ |
| | | *.a |
| | | *.png |
| | | *.o |
| | | *.d |
| | | .*.swp |
| | | .kdev4/ |
| | | .gdb_history |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -14,6 +14,7 @@ |
| | | #include "timestep.h" |
| | | #include "gravshort.h" |
| | | #include "walltime.h" |
| | | + #include "cuda_runtime.h" |
| | | |
| | | /*! \file gravtree.c |
| | | * \brief main driver routines for gravitational (short-range) force computation |
| | | @@ -95,17 +96,23 @@ force_treeev_shortrange(TreeWalkQueryGravShort * input, |
| | | void |
| | | grav_short_tree(const ActiveParticles * act, PetaPM * pm, ForceTree * tree, MyFloat (* AccelStore)[3], double rho0, inttime_t Ti_Current) |
| | | { |
| | | - TreeWalk tw[1] = {{0}}; |
| | | - struct GravShortPriv priv; |
| | | - priv.cellsize = tree->BoxSize / pm->Nmesh; |
| | | - priv.Rcut = TreeParams.Rcut * pm->Asmth * priv.cellsize;; |
| | | - priv.G = pm->G; |
| | | - priv.cbrtrho0 = pow(rho0, 1.0 / 3); |
| | | - priv.Ti_Current = Ti_Current; |
| | | - priv.Accel = AccelStore; |
| | | + TreeWalk *tw; |
| | | + cudaMallocManaged(&tw, sizeof(TreeWalk)); // Allocate TreeWalk structure with Unified Memory |
Member
This one should probably also be stack-allocated and passed by value.
Member
But let me take care of this after it is merged.
| | | + memset(tw, 0, sizeof(TreeWalk)); // Zero-initialize the structure |
| | | + struct GravShortPriv *priv_ptr; |
| | | + cudaMallocManaged(&priv_ptr, sizeof(struct GravShortPriv)); |
| | | + |
| | | + // Initialize priv_ptr as usual |
| | | + priv_ptr->cellsize = tree->BoxSize / pm->Nmesh; |
| | | + priv_ptr->Rcut = TreeParams.Rcut * pm->Asmth * priv_ptr->cellsize; |
| | | + priv_ptr->G = pm->G; |
| | | + priv_ptr->cbrtrho0 = pow(rho0, 1.0 / 3); |
| | | + priv_ptr->Ti_Current = Ti_Current; |
| | | + priv_ptr->Accel = AccelStore; |
| | | + |
| | | int accelstorealloc = 0; |
| | | if(!AccelStore) { |
| | | - priv.Accel = (MyFloat (*) [3]) mymalloc2("GravAccel", PartManager->NumPart * sizeof(priv.Accel[0])); |
| | | + priv_ptr->Accel = (MyFloat (*) [3]) mymalloc2("GravAccel", PartManager->NumPart * sizeof(priv_ptr->Accel[0])); |
| | | accelstorealloc = 1; |
| | | } |
| | | |
| | | @@ -123,9 +130,15 @@ grav_short_tree(const ActiveParticles * act, PetaPM * pm, ForceTree * tree, MyFl |
| | | tw->result_type_elsize = sizeof(TreeWalkResultGravShort); |
| | | tw->fill = (TreeWalkFillQueryFunction) grav_short_copy; |
| | | tw->tree = tree; |
| | | - tw->priv = &priv; |
| | | + tw->priv = priv_ptr; |
| | | + message(0, "gravity_short_tree: tree structure initialized.\n"); |
| | | + struct gravshort_tree_params *TreeParams_ptr; |
| | | + cudaMallocManaged(&TreeParams_ptr, sizeof(struct gravshort_tree_params)); |
| | | + *TreeParams_ptr = TreeParams; |
| | | |
| | | - treewalk_run(tw, act->ActiveParticle, act->NumActiveParticle); |
| | | + treewalk_run(tw, act->ActiveParticle, act->NumActiveParticle, TreeParams_ptr); |
| | | + /* Free the memory */ |
| | | + cudaFree(TreeParams_ptr); |
| | | |
| | | /* Now the force computation is finished */ |
| | | /* gather some diagnostic information */ |
| | | @@ -148,8 +161,12 @@ grav_short_tree(const ActiveParticles * act, PetaPM * pm, ForceTree * tree, MyFl |
| | | * avoiding the fully open O(N^2) case.*/ |
| | | if(TreeParams.TreeUseBH > 1) |
| | | TreeParams.TreeUseBH = 0; |
| | | + |
| | | if(accelstorealloc) |
| | | - myfree(priv.Accel); |
| | | + myfree(priv_ptr->Accel); |
| | | + |
| | | + cudaFree(priv_ptr); |
| | | + cudaFree(tw); |
| | | } |
| | | |
| | | /* Add the acceleration from a node or particle to the output structure, |
Let's make this stack-allocated again, but pass it by value (and probably const).