Fast, reproducible, and portable software development environments (Dockerfile, updated Dec 8, 2021)
Remote development on HPC clusters with VSCode
Accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques.
Matrix multiplication example implemented with OpenMP, OpenACC, BLAS, cuBLAS, and CUDA
High-performance Sobel edge detection using CUDA with CPU vs GPU benchmarking, roofline analysis, and Nsight profiling.
CUDA Samples and Nsight Guided Profiling Samples
Repository for the Architecture of Computers and Parallel Systems course at VŠB
A simple and understandable CUDA kernel for the batch-matmul operation
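A batch-matmul kernel of this kind can be sketched as follows. This is an illustrative naive version (one thread per output element, one grid z-slice per batch entry; the name `batchMatmul` is my own), not the repository's actual code.

```cuda
// Naive batched matrix multiply: C[b] = A[b] * B[b] for b in [0, batch).
// A: (batch, M, K), B: (batch, K, N), C: (batch, M, N), all row-major.
// One thread computes one element of one output matrix.
__global__ void batchMatmul(const float* A, const float* B, float* C,
                            int batch, int M, int N, int K) {
    int b   = blockIdx.z;                             // batch index
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // row in C[b]
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // col in C[b]
    if (b >= batch || row >= M || col >= N) return;

    const float* Ab = A + (size_t)b * M * K;
    const float* Bb = B + (size_t)b * K * N;
    float acc = 0.0f;
    for (int k = 0; k < K; ++k)
        acc += Ab[row * K + k] * Bb[k * N + col];
    C[(size_t)b * M * N + row * N + col] = acc;
}
```

Launched with a 3D grid, e.g. `dim3 block(16, 16); dim3 grid((N+15)/16, (M+15)/16, batch);`, so the batch dimension maps directly onto `blockIdx.z`.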
The MNIST classification problem is a fundamental machine learning task that involves recognizing handwritten digits (0-9) from a dataset of 70,000 grayscale images (28x28 pixels each). It serves as a benchmark for evaluating machine learning models, particularly neural networks.
Custom PyTorch CUDA kernel implementing optimized ReLU activation with vectorization, performance profiling, and memory analysis on Tesla T4 GPU achieving 75% bandwidth efficiency.
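A vectorized ReLU of the kind described can be sketched with `float4` loads, which issue one 16-byte memory transaction per thread instead of four 4-byte ones. This is an illustrative sketch (the name `relu4` is my own, and it assumes a 16-byte-aligned buffer whose length is a multiple of 4), not the repository's kernel.

```cuda
// ReLU over n floats, processing four floats per thread via float4.
// Assumes data is 16-byte aligned and n is a multiple of 4.
__global__ void relu4(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // index in float4 units
    if (i * 4 >= n) return;
    float4 v = reinterpret_cast<float4*>(data)[i];
    v.x = fmaxf(v.x, 0.0f);
    v.y = fmaxf(v.y, 0.0f);
    v.z = fmaxf(v.z, 0.0f);
    v.w = fmaxf(v.w, 0.0f);
    reinterpret_cast<float4*>(data)[i] = v;
}
```

ReLU is memory-bound, so widening each access like this is the main lever for approaching peak bandwidth; a production version would also handle the unaligned tail elements separately.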
16-step CUDA optimization of FlashAttention-2 achieving 99.2% of official performance on A100 — Ampere architecture
C++23 benchmarking framework with 6 profiler backends, CUDA GPU support, statistical regression detection, cross-compilation for 5 architectures, and CLI tools for analysis and visualization.
High-Performance Computing (HPC) & Optimization studies using CUDA C++. Includes Grid-Stride Loops, Shared Memory tiling, and Nsight Compute profiling analysis.
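The grid-stride loop pattern mentioned above lets a fixed-size grid cover an array of any length: each thread starts at its global index and strides by the total thread count. A minimal sketch (the kernel name `saxpy_gridstride` is my own):

```cuda
// Grid-stride SAXPY: y = a*x + y over n elements.
// Each thread handles elements i, i+stride, i+2*stride, ...,
// so correctness does not depend on the launch configuration.
__global__ void saxpy_gridstride(int n, float a,
                                 const float* x, float* y) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        y[i] = a * x[i] + y[i];
}
```

Because the same kernel works for any grid size, the launch can be tuned to the device (e.g. a multiple of the SM count) rather than to the problem size.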
CUDA-accelerated kNN regression for rent estimation with CPU baseline, shared-memory optimization, and profiling
University Project for "Computer Architecture" course (MSc Computer Engineering @ University of Pisa). Implementation of a Parallelized Nearest Neighbor Upscaler using CUDA.
Open-source stencil-aware multi-GPU Conjugate Gradient solver on 8× A100 NVLink. 2.07× SpMV vs cuSPARSE · 1.44× above NVIDIA AmgX · 93.5% strong scaling efficiency. Profiled with Nsight Systems & Nsight Compute.
Quantum workload planning and profiler-backed architecture analysis for exact tensor-network execution.
🎬 Explore GPU training efficiency with FP32 vs FP16 in this modular lab, utilizing Tensor Core acceleration for deep learning insights.