Conversation

@NanoNabla
Contributor

I implemented distributed training using Horovod, similar to what already made it into DeepSpeech.
I opened discussion #1849 a while ago to ask whether you want this feature, but there hasn't been any answer yet.

I tried to keep the changes as minimal as possible; it is still possible to run the undistributed code path. I also noticed a slight performance improvement from using Horovod on one of our IBM machines with 6 V100 cards.

I didn't add any CI configuration because I don't have any experience with it.

If you need any help with Horovod don't hesitate to ask.
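For context, here is a minimal sketch (not the actual patch) of the usual Horovod integration pattern for TF1-style training code, keeping the single-process path intact. The `use_horovod` flag is a hypothetical name; the PR may wire this differently.

```python
# Sketch only: typical Horovod wiring for TF1-style graph training,
# guarded so the undistributed code path still works unchanged.
import tensorflow.compat.v1 as tf

use_horovod = True  # e.g. set from a command-line flag (hypothetical)

if use_horovod:
    import horovod.tensorflow as hvd
    hvd.init()

# Pin each worker process to a single GPU when running under Horovod.
session_config = tf.ConfigProto()
if use_horovod:
    session_config.gpu_options.visible_device_list = str(hvd.local_rank())

optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
if use_horovod:
    # Averages gradients across all workers on every step.
    optimizer = hvd.DistributedOptimizer(optimizer)

hooks = []
if use_horovod:
    # Broadcast initial variables from rank 0 so all workers start identically.
    hooks.append(hvd.BroadcastGlobalVariablesHook(0))
```

Training would then be launched with something like `horovodrun -np 6 python train.py`, one process per GPU.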

@CLAassistant

CLAassistant commented Aug 3, 2021

CLA assistant check
All committers have signed the CLA.

@NanoNabla
Contributor Author

My PR seems to have gone unnoticed since I opened it in May.
Are you interested in parallel training as in DeepSpeech?

If you are, I will try to get my PR into a mergeable state again. Otherwise, feel free to close this PR.
