Replies: 17 comments 125 replies
-
I'm happy to support moving the repo into the bytedeco org (or into a dedicated org if that makes more sense). I could also transfer the original repo currently under my personal namespace once it's up to date, as otherwise we would have a forked repo, which GitHub often treats as a second-class citizen compared to a non-forked one. I haven't had the time to really develop Storch further this year, and I doubt this will change much in the near future due to other responsibilities and unforeseen things happening in my life. That said, I'll try to help as much as possible. Storch has come a long way, and you've put an amazing amount of work into it @mullerhai!
-
@saudet Today I talked with Mr. Cheng, an AI leader at Huawei. He is very interested in Storch and javacpp-pytorch, and in whether we could develop support for CANN (a CUDA-like stack) and TensorRT-LLM with javacpp-pytorch. They want to support us.
-
Hi @saudet @sbrunk, I want to transfer storch-numpy, storch-pandas, storch-opencv, and storch-ffmpeg to the bytedeco org so that more people can use them. storch-opencv and storch-ffmpeg are based on javacpp-opencv and javacpp-ffmpeg. What do you think?
-
@sbrunk I have updated the storch-core code to version 0.7.5-1.5.12. Could you check and review the code? In my tests with pytorch_on_scala3_lesson it works correctly.
-
@sbrunk Hi Sbrunk,
-
Hi @saudet @sbrunk, good news: I attended Huawei's chip-driven CANN salon meetup in Beijing today (2025-11-15). The event was vibrant and engaging, and many Chinese companies have already completed adaptations with them. I talked with the event organizer: their next salon will feature presentations on javacpp-pytorch and Storch, and I'll probably get a 10-minute slot to speak.

Most of the participating companies still primarily use C++ and Python, and the venue was quite crowded. Their current research depth has reached the level of transformer attention matrix multiplication, tile blocking, and CUDA kernel optimization: essentially matrix blocking strategies, more efficient model-inference communication across multiple GPUs and machines, and so on. Compared to them, our javacpp-pytorch and Storch are still in the shallow end, not yet diving into the kernel layer. We have a lot of work ahead.

They'll contact me next week to discuss the details of cooperation between CANN and Bytedeco. Do you have any questions? I'll ask Huawei on your behalf. The CANN repo: https://gitcode.com/cann
-
I'm writing to provide a quick update on my communication with an engineering team from Huawei's chip division. The conversation was very positive, and they are very interested in collaborating with us. They are currently working on an internal budget and assessment, with a projected start for our official collaboration in 2026.

As a first step, they want to evaluate the performance of Storch. I've already demonstrated some of its capabilities to them, and they were impressed, noting that it is very close to Python PyTorch. They then asked for a demonstration of Storch's training capabilities with a GPU. To show this, I had to demonstrate Storch integrated with CUDA.

However, I've run into a significant technical issue. Initially, I tried on my Windows 11 machine with an NVIDIA 3060 GPU, but javacpp-pytorch was unable to communicate with CUDA. I then switched to a different laptop running Ubuntu with an NVIDIA 4060 GPU. I had to uninstall CUDA 12.4, as this machine had CUDA 13.0 installed. Even with the Ubuntu setup, javacpp-pytorch (version 2.7.1-1.5.12) still fails to recognize and use CUDA properly. Because of this, I had to reschedule the demonstration with them for either next week or in about two weeks.

This is very time-sensitive for us, as the Huawei partnership is a key opportunity. Therefore, I am hoping a stable release of javacpp-pytorch 2.9-1.5.13 can be made available soon. This would allow me to use javacpp-cuda 1.3.0-1.5.13 to successfully demonstrate Storch's CUDA capabilities to them.

Summary of issues and requests: I am encountering CUDA recognition issues on Ubuntu with the current version (2.7.1-1.5.12), so a prompt release of version 1.5.13 would be crucial for this partnership.
-
(base) muller@muller-Dell-G16-7630:~/Documents/code/javacpp-presets/pytorch$ mvn --version (JDK 21)
-
Hi @saudet, a Huawei AI engineer told me they also want to try javacpp-pytorch with CUDA 13.0. They write Python and don't want to build the whole project like I do; they just want to add the jar as a dependency. I think you really need to release and publish to the Maven Central repo.
-
@saudet Building all of javacpp-pytorch is slow and complex and often runs into errors; it's not suitable for everyone. Please consider making it easier for beginners.
-
The Huawei AI engineers are eager to try javacpp-pytorch with CUDA 13.0. There are two options: publish all the 1.5.13 snapshot artifacts to the Maven snapshot repo so users don't have to build them, or release version 1.5.13 to the Maven Central repo this week.
-
Hi @saudet, I have written a Gloo example with PyTorch. Once Gloo works fine in javacpp-pytorch, I will implement this code in Scala 3. The code is similar to https://github.com/pytorch/examples/blob/main/cpp/distributed/dist-mnist.cpp. We need to open two terminal windows, and then the output follows.
-
I wrote a Gloo CUDA example with PyTorch, and it works. I think in the future we just need to implement it like this in Scala 3. @saudet
-
@saudet I'm scheduled to give another demonstration to Huawei this weekend with the new javacpp-pytorch version 2.9.1-1.5.13, but Gloo verification keeps failing, which is making me extremely anxious. Currently, the ProcessGroupGloo creation fails and crashes the JVM. I'm not sure whether it's a bug in the library itself, my incorrect usage, or wrong parameters. My final attempt also failed.
-
I'm now trying to use TCPStore and FileStore with ProcessGroupGloo. I think that if we can make the Gloo ranks discover each other, Gloo may work. I have tried various settings, but all failed. Maybe the gloo.Device is the cause? I'm not sure, but the ProcessGroupGloo should already be running.
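For reference, the rendezvous pattern behind a TCPStore-style setup can be sketched with plain sockets: rank 0 listens on a well-known host and port, and every other rank connects to it. This is not the javacpp-pytorch API, just a stdlib Python illustration (host, port, and message format are all made up) of why every rank must agree on the same master address before discovery can succeed:

```python
# Minimal sketch of rank rendezvous: rank 0 plays the TCPStore master,
# the other ranks retry connecting until the master socket is up.
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 29555  # all ranks must use the same values
WORLD_SIZE = 3
results = []

def master():
    # Rank 0 binds the agreed-upon port and collects one check-in per worker.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen(WORLD_SIZE - 1)
        for _ in range(WORLD_SIZE - 1):
            conn, _ = srv.accept()
            with conn:
                results.append(conn.recv(64).decode())

def worker(rank):
    # Workers retry on "connection refused": it simply means the master
    # has not bound the port yet (or host/port do not match).
    for _ in range(50):
        try:
            with socket.create_connection((HOST, PORT), timeout=1.0) as c:
                c.sendall(f"rank-{rank}".encode())
                return
        except ConnectionRefusedError:
            time.sleep(0.1)

threads = [threading.Thread(target=master)]
threads += [threading.Thread(target=worker, args=(r,)) for r in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # both workers checked in: ['rank-1', 'rank-2']
```

In a real run each rank would be a separate process (one terminal per rank), which is why a mismatched port or address on any single rank shows up as a connection failure rather than an API error.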
-
I agree that Gloo distributed training is a key challenge we need to tackle before our new version release. After resolving the distributed communication and mutual rank discovery issues, Gloo should be able to function normally. The error log shows "connection refused". I suggest running the Java code I wrote, starting three separate terminals and specifying rank=0, rank=1, and rank=2 respectively, to see what results you get. If you also get a connection refused error, it might be a problem with the javacpp-pytorch library; if the connection succeeds, then it's an issue with my computer's network configuration. I tried again today and still couldn't connect on my machine.
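To help tell a library bug apart from a network configuration problem, a quick stdlib probe of the master address can be run before launching any ranks. This is a hypothetical helper (not part of javacpp-pytorch): if the probe itself reports "connection refused", no process has bound the port at all, so the failure is environmental rather than inside the process group code:

```python
# Probe whether anything accepts TCP connections at host:port.
import socket

def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a listener accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo: bind a throwaway listener so the probe has something to find.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
srv.listen(1)
host, port = srv.getsockname()
print(port_is_open(host, port))  # True: a listener is bound
srv.close()
print(port_is_open(host, port))  # False: connection refused now
```

Running this against the master's address from each machine before starting rank 1 and rank 2 would narrow the "connection refused" error down to either a firewall/address mismatch or the Java side never binding the port.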
-
Hi @saudet, I'm eager to see the Linux version of ProcessGroupNCCL implemented with javacpp. Setting the Gloo issues aside for now, is that something we can implement? The javacpp-pytorch on Scala 3 meetup will be held in Beijing, China on Saturday, December 13th. I will likely be the second speaker and will join via remote video conference. I'm really keen on recommending that the attendees try javacpp. For the issue they care about most, NCCL, I'm truly looking forward to announcing the feasibility of our ProcessGroupNCCL support; this would be a huge draw for them to try javacpp-pytorch. They are currently making the promotional poster for the event, and I will share it with you all once it's done. I predict that if our implementation is robust enough, they will choose ours.
-
Hi @saudet
The latest version of Storch is now basically stable. Since all the lessons in the "PyTorch on Scala" course have been successfully completed, I believe I have verified most of the basic functionalities of PyTorch. It’s safe to say that Storch works properly and follows a coding style close to Python’s PyTorch. I highly recommend that everyone learns the Scala 3 version of PyTorch because it appears simpler and allows users to focus more on neural network construction, training, and inference.
After three years of verification and nearly 110,000 lines of Scala code, I can confirm that javacpp-pytorch is indeed an excellent glue layer. However, programming directly against javacpp-pytorch comes with many risks and drawbacks; it is not easy to use:
First, the code is quite verbose, especially when written directly in Java.
Second, it's error-prone: many parameters use types like Pointer and C++ vector wrappers, with vague names like var1 or var2. Without checking PyTorch's official documentation, it's hard to know the exact data types and actual meanings of these parameters.
Third, javacpp-pytorch still has some unresolved bugs, such as some padding-type layers not working correctly and some hyperparameters failing to take effect when set via the put() method.
Finally, since Python's PyTorch extends libtorch significantly, many features implemented in Python PyTorch are not available in libtorch, making it impossible for javacpp-pytorch to generate corresponding bindings.
Storch addresses most of these risks and issues, greatly improving the efficiency of using PyTorch. Based on the above, I believe that to promote more widespread use of PyTorch on the JVM platform, it’s better to encourage everyone to try Storch, which is built on javacpp-pytorch. What do you think?
Of course, I also anticipate that there might be a Java-based wrapper library for javacpp-pytorch called JTorch in the future. If necessary, I would like to transfer my extended Storch repository (based on @sbrunk's Storch main branch code) to the bytedeco organization. I also plan to ask @sbrunk for his opinion, as I believe hosting it under bytedeco would attract more contributors to optimize Storch and foster ecosystem growth.
On another note, we have high expectations for the 1.5.13 version of pytorch. Looking through the code, I noticed some new additions:
For ProcessGroupGloo, new methods like:
The new class ProcessGroupStatus.
Additions to the global torch class:
I believe these updates show that Gloo support is quite robust. However, based on my understanding of large model training, most users still rely on NCCL-based distributed frameworks with DDP and FSDP modes. Therefore, I would like to request adding the following includes to the presets/torch_include.h generation script:
This should generate the corresponding bindings after compilation. I understand this is challenging and may not produce correct code immediately, requiring further debugging. However, without ProcessGroupNCCL, we can only work on small single-machine models, making it difficult to optimize model quantization and CUDA memory usage. I sincerely hope that the 1.5.13 version of javacpp-pytorch will include a functional ProcessGroupNCCL, enabling us to develop libraries like DeepSpeed, FairScale, or Colossal-AI to accelerate distributed training. I'd like to hear your thoughts and current challenges; I'm willing to assist with the implementation.
Additionally, could EmbeddingImpl include the Embedding::from_pretrained() method? This is crucial for future large-model fine-tuning, especially since we already have the EmbeddingFromPretrainedOptions class. Regarding JIT, I'm unsure whether libtorch includes a method similar to torch::trace(jitmodule) and Python's torch.jit.is_scripting.
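For readers unfamiliar with the request above, the semantics of from_pretrained are simple: the layer is built from an existing weight matrix and, when frozen, is excluded from gradient updates; the forward pass is just row lookup. This plain-Python sketch (the class name and fields are invented for illustration, not libtorch or Storch API) shows the behavior being asked for:

```python
# Toy model of a pretrained embedding layer: fixed weight matrix + row lookup.
from dataclasses import dataclass

@dataclass
class PretrainedEmbedding:
    weight: list         # num_embeddings x embedding_dim weight matrix
    freeze: bool = True  # frozen weights would be excluded from training

    def forward(self, indices):
        # An embedding forward pass selects one weight row per index.
        return [self.weight[i] for i in indices]

# Hypothetical pretrained vectors, e.g. loaded from word2vec or GloVe.
pretrained = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]
emb = PretrainedEmbedding(weight=pretrained)
print(emb.forward([2, 0]))  # rows 2 and 0: [[2.0, 2.1], [0.0, 0.1]]
```

Since EmbeddingFromPretrainedOptions already exists in the bindings, exposing a constructor that accepts a ready-made weight tensor plus a freeze flag would cover this use case.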