Build ROCm-git #1222
Is it possible to build the ROCm components from their master branches instead of the 7.0 beta branch? From my understanding, a specific gfx1151 fix is not present in the 7.0 branch, and its absence is currently causing GPU hangs. I would love to be able to use TheRock to build from source more easily.
Bumping the branches is currently a manual process. The reason is that nothing enforces that all ROCm components build together at ToT (tip of tree), so we fall back on a more ad-hoc process that produces known-good commit points. But this is all going away.
We've been working in the background to make all of the trunk branches across ROCm (often called "develop") buildable together all of the time. The primary vehicles for this are aggregation into the library and system super-repos:
This will reduce the number of pins we have to manage to approximately 3 (compiler, system, libraries). We are bringing up CI and changing development processes so that the ToT develop branches in each always work together, and this is enforced at commit time. As that process completes, we will add automation that keeps the pins in TheRock at the latest green commit across the repos, which in most cases should be very close to ToT (modulo testing latency).
TheRock was just switched to build from rocm-libraries yesterday, so this is all very fresh. We're finishing getting the CI up to enforce consistency and will then add automation. We let the updates lag a bit while making this switch, but we will resume regular manual updates until the automation is in place. We'll make the same transition for the system components next month.
We expect that, taken together, all of these efforts will make ROCm much more attractive for building from source, since all components can be relied on to be compatible at ToT.
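For readers who want a concrete picture of "keeping the pins at latest green", here is a minimal Python sketch. It is purely illustrative and is not TheRock's actual automation: the `Commit` class, the `ci_passed` flag, the `latest_green` and `update_pins` functions, and the pin names are all hypothetical. It assumes each repo's history is listed oldest to newest, and that a repo with no green commit keeps its current pin.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Commit:
    """Hypothetical record of one commit on a trunk ("develop") branch."""
    sha: str
    ci_passed: bool  # assumption: True when CI was green for this commit


def latest_green(history: Sequence[Commit]) -> Optional[Commit]:
    """Return the newest commit whose CI passed, or None if there is none.

    `history` is assumed to be ordered oldest to newest.
    """
    for commit in reversed(history):
        if commit.ci_passed:
            return commit
    return None


def update_pins(
    pins: Dict[str, str], histories: Dict[str, Sequence[Commit]]
) -> Dict[str, str]:
    """Return a copy of `pins` moved forward to the latest green commit.

    Repos with no green commit in their history keep their existing pin.
    """
    updated = dict(pins)
    for repo, history in histories.items():
        green = latest_green(history)
        if green is not None:
            updated[repo] = green.sha
    return updated
```

With a history whose newest commit is still red, the pin lands one commit behind ToT, which matches the "modulo testing latency" caveat above.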
All of the ROCm repositories that were historically developed in a non-public place are being moved to a fully open-source development and contribution model as part of this move (which should be complete in the next ~month). In addition, we are doing team-by-team work to switch people's development flow from big-feature-branch based to trunk based. There are a lot of engineers and change takes time, but most of them are happy to change their approach to the development model once some of the artificial impediments are removed (such as internal-only CI, internal-only branches, no external contributions taken directly, etc.). I expect that in practice some parts will progress faster than others, but we are changing the development norms. It isn't just a code re-organization.
The throw-it-over-the-wall mentality wasn't just a problem for external contributors: all of the fragmentation and silos were also hurting even the projects that were further along on their OSS journey. Those of us who were more on the OSS side would spend an inordinate amount of time chasing patches between tightly coupled repos on different sides of the firewalls, fighting to get patches into the waterfall in a way that they could be used, etc. Getting everyone on the same page has simplified that a lot.
Ultimately, you have no reason to trust us on this, but I do assure you that the commitment to moving the development process itself towards OSS norms is much more than just moving code around. And by concentrating the code locations a bit more, we get to concentrate more of our engineers who have experience operating this way: whereas each of the little silos may have had just an engineer or two who was oriented towards a more inclusive development model, now they get to work together.
Ultimately, the divisions in the code were also causing divisions in the teams, and removing them is helping us create more momentum to do better across the board. There's a long way to go, but I'm hopeful.