Merged
32 changes: 32 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,32 @@
minimum_pre_commit_version: 3.0.0

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: check-added-large-files
        args: ['--maxkb=1024']
      - id: check-merge-conflict
      - id: check-symlinks
      - id: detect-private-key
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-toml
      - id: mixed-line-ending
        args: ['--fix=lf'] # fix line endings to unix style
      - id: check-case-conflict
      - id: check-json
      - id: trailing-whitespace

  - repo: https://github.com/executablebooks/mdformat
    rev: 0.7.22
    hooks:
      - id: mdformat
        args: ["--number"]
        additional_dependencies:
          - mdformat-gfm
          - mdformat-frontmatter
          - mdformat-myst
          - mdformat-tables
          - mdformat-toc
          - mdformat-black
56 changes: 56 additions & 0 deletions _config.yml
@@ -0,0 +1,56 @@
# Site settings
title: TileFusion
description: TileFusion is a highly efficient C++ macro kernel template library designed to elevate the level of abstraction in CUDA C for processing tiles.
baseurl: "/TileFusion"
url: "https://microsoft.github.io"
authors:
  - name: "Ying Cao"
    email: "[email protected]"
    github: "lcy-seso"
  - name: "Chengxiang Qi"
    email: "[email protected]"
    github: "KuangjuX"

mathjax:
  enable: true # MathJax equations, e.g. true, false (default)
  combo: "tex-mml-chtml"
  tags: "none" # "none", "ams" (default), "all"
google_fonts:
  - name: "Source Sans Pro"
    weights: "400,400i,700,700i"
  - name: "Lora"
    weights: "400,400i,700,700i"

# Build settings
markdown: kramdown
kramdown:
  math_engine: mathjax
theme: minima

minima:
  date_format: "%b %-d, %Y"
  social_links:
    - { platform: github, user_url: "https://github.com/microsoft/TileFusion" }

plugins:
  - jekyll-feed
  - jekyll-seo-tag

# Exclude from processing
exclude:
  - 3rd-party
  - .gitignore

# Navigation
header_pages:
  - index.md
  - docs/installation.md
  - docs/index.md
  - examples/index.md
  - benchmarks/index.md
  - docs/about.md

# Just the Docs configuration
aux_links:
  "TileFusion on GitHub":
    - "//github.com/microsoft/TileFusion"
7 changes: 7 additions & 0 deletions _includes/custom-head.html
@@ -0,0 +1,7 @@
<link rel="stylesheet" href="{{ '/assets/css/custom.css' | relative_url }}">

<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/all.min.css">

<meta name="keywords" content="CUDA, C++, Kernel Fusion, Tile Processing">

<link href="https://fonts.googleapis.com/css2?family=Roboto:wght@300;400;500&display=swap" rel="stylesheet">
37 changes: 37 additions & 0 deletions _includes/footer.html
@@ -0,0 +1,37 @@
<footer class="site-footer h-card">
<data class="u-url" href="{{ "/" | relative_url }}"></data>

<div class="wrapper">
<!-- Main content area -->
<div class="footer-content">
<!-- First column: contact information -->
<div class="footer-col">
<h3>Contact Us</h3>
<ul class="contact-list">
{%- if site.authors -%}
{%- for author in site.authors -%}
<li class="p-name">
{{ author.name }}
{%- if author.email -%}
- <a class="u-email" href="mailto:{{ author.email }}">{{ author.email }}</a>
{%- endif -%}
{%- if author.github -%}
- <a href="https://github.com/{{ author.github }}">GitHub</a>
{%- endif -%}
</li>
{%- endfor -%}
{%- endif -%}
</ul>
</div>

<!-- Second column: quick links -->
<div class="footer-col">
<h3>Quick Links</h3>
<ul>
<li><a href="{{ '/docs/' | relative_url }}">Documentation</a></li>
<li><a href="{{ '/examples/' | relative_url }}">Examples</a></li>
</ul>
</div>
</div>
</div>
</footer>
19 changes: 19 additions & 0 deletions _layouts/mathjax.html
@@ -0,0 +1,19 @@
---
layout: default
---

{{ content }}

<!-- Define the MathJax configuration before the loader script so it is read at startup. -->
<script>
MathJax = {
  tex: {
    inlineMath: [['$', '$'], ['\\(', '\\)']],
    displayMath: [['$$', '$$'], ['\\[', '\\]']],
    processEscapes: true
  }
};
</script>

<script type="text/javascript" id="MathJax-script" async
  src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js">
</script>
Binary file added assets/TileFusion-logo.png
72 changes: 72 additions & 0 deletions assets/css/custom.css
@@ -0,0 +1,72 @@
.site-footer {
background-color: #f8f9fa;
padding: 40px 0;
color: #333;
border-top: 1px solid #e9ecef;
}

.footer-header {
margin-bottom: 30px;
}

.footer-heading {
font-size: 24px;
margin-bottom: 15px;
color: #0078d4; /* Microsoft blue */
}

.footer-content {
display: flex;
flex-wrap: wrap;
justify-content: space-between;
margin-bottom: 30px;
width: 100%; /* Ensure full width */
}

.footer-col {
flex: 1;
min-width: 200px;
padding: 0 15px;
margin-bottom: 20px;
}

.footer-col h3 {
font-size: 18px;
margin-bottom: 15px;
color: #0078d4;
}

.footer-col ul {
list-style: none;
margin-left: 0;
padding-left: 0;
}

.footer-col ul li {
margin-bottom: 8px;
}

.footer-bottom {
clear: both; /* Ensure footer-bottom appears after all floating elements */
width: 100%; /* Ensure full width */
text-align: center;
border-top: 1px solid #e9ecef;
padding-top: 20px;
font-size: 14px;
color: #666;
}

body {
font-family: 'Roboto', sans-serif;
}

@media screen and (max-width: 600px) {
.footer-content {
flex-direction: column;
}

.footer-col {
width: 100%;
padding: 0;
}
}
5 changes: 5 additions & 0 deletions benchmarks/.gitignore
@@ -0,0 +1,5 @@
.vscode/*
build/*
__pycache__/
.DS_Store
**/.DS_Store
18 changes: 18 additions & 0 deletions benchmarks/gemm.md
@@ -0,0 +1,18 @@
## Test Environment

- **GPU**: NVIDIA Tesla A100
- **CUDA Version**: 12.6

### Results

| [M, N, K] | [kTM, kTN, kTK] | WarpLayout | kRK | CUTLASS(ms) | TileFusion(ms) |
| :----------------- | :-------------: | :--------: | :-: | :---------: | :------------: |
| [1024, 1024, 512] | [64, 128, 128] | [2, 2] | 16 | 0.017591 | 0.016548 |
| [1024, 1024, 1024] | [64, 128, 128] | [2, 2] | 16 | 0.029245 | 0.027156 |
| [2048, 2048, 1024] | [64, 128, 128] | [2, 2] | 16 | 0.065372 | 0.070431 |
| [2048, 2048, 2048] | [64, 128, 128] | [2, 2] | 16 | 0.101253 | 0.128143 |
| [4096, 4096, 4096] | [64, 128, 128]  | [2, 2]     | 16  | 0.818606    | 0.969605       |
| [8192, 8192, 1024] | [64, 128, 128]  | [2, 2]     | 16  | 0.871526    | 0.971059       |
| [8192, 8192, 2048] | [64, 128, 128] | [2, 2] | 16 | 1.937879 | 1.931223 |
| [8192, 8192, 4096] | [64, 128, 128] | [2, 2] | 16 | 3.924275 | 3.956757 |
| [8192, 8192, 8192] | [64, 128, 128] | [2, 2] | 16 | 7.740396 | 8.080589 |
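
The timings above can be turned into throughput and relative-speed figures. The following is a minimal sketch, not part of the benchmark suite: the helper names (`gemm_flops`, `tflops`, `speedup`) are hypothetical, and it assumes the usual convention of counting one multiply and one add per `(m, n, k)` triple, i.e. `2 * M * N * K` floating-point operations per GEMM.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOPs in one GEMM, assuming one multiply and one add per (m, n, k) triple."""
    return 2 * m * n * k


def tflops(m: int, n: int, k: int, elapsed_ms: float) -> float:
    """Achieved throughput in TFLOP/s for a GEMM that took `elapsed_ms` milliseconds."""
    return gemm_flops(m, n, k) / (elapsed_ms * 1e-3) / 1e12


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Values above 1 mean the candidate is faster than the baseline."""
    return baseline_ms / candidate_ms


# A few rows copied from the table: ((M, N, K), CUTLASS ms, TileFusion ms).
rows = [
    ((1024, 1024, 512), 0.017591, 0.016548),
    ((4096, 4096, 4096), 0.818606, 0.969605),
    ((8192, 8192, 2048), 1.937879, 1.931223),
]

for (m, n, k), cutlass_ms, tilefusion_ms in rows:
    print(
        f"[{m}, {n}, {k}]: "
        f"CUTLASS {tflops(m, n, k, cutlass_ms):.1f} TFLOP/s, "
        f"TileFusion {tflops(m, n, k, tilefusion_ms):.1f} TFLOP/s, "
        f"TileFusion speedup over CUTLASS {speedup(cutlass_ms, tilefusion_ms):.3f}x"
    )
```

A speedup below 1 means TileFusion is slower than CUTLASS for that shape in this run.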
28 changes: 28 additions & 0 deletions benchmarks/global_to_shared_copy.md
@@ -0,0 +1,28 @@
This preliminary test evaluates the performance of transferring a row-major data tile containing half-precision floating-point values between global memory and shared memory. The transfer process involves loading the data tile into shared memory and subsequently storing it back to global memory. This cycle is repeated 100 times to measure performance.

Performance is assessed based on the total time required to complete the 100 data tile transfers.
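
To relate a reported total to a single load/store cycle, the arithmetic can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, and it assumes 2 bytes per half-precision element and that each cycle moves the tile once into shared memory and once back to global memory, as described above.

```python
HALF_BYTES = 2  # size of one half-precision (FP16) element in bytes
NUM_CYCLES = 100  # number of load/store cycles per measurement, as stated above


def tile_bytes(rows: int, cols: int, elem_bytes: int = HALF_BYTES) -> int:
    """Size of one rows x cols tile in bytes."""
    return rows * cols * elem_bytes


def bytes_moved_per_cycle(rows: int, cols: int) -> int:
    """One cycle loads the tile into shared memory and stores it back."""
    return 2 * tile_bytes(rows, cols)


def per_cycle_us(total_ms: float, cycles: int = NUM_CYCLES) -> float:
    """Average time of one load/store cycle in microseconds."""
    return total_ms * 1e3 / cycles


# Example: a RowMajor(128, 256) tile with a reported total of 0.2464 ms.
print(tile_bytes(128, 256), "bytes per tile")
print(bytes_moved_per_cycle(128, 256), "bytes moved per cycle")
print(f"{per_cycle_us(0.2464):.3f} us per cycle")
```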

## Implementations

The test compares a TileFusion implementation against a CUTLASS implementation. Neither shows bank conflicts when profiled with the NVIDIA Compute Utility. The CUTLASS implementation uses a copy plan that maximizes global memory coalescing.

## Test Environment

- **GPU**: NVIDIA Tesla A100
- **CUDA Version**: 12.6

## Results

| Shape              | Warp Layout | tilefusion(ms) | cutlass(ms) | Ratio (tilefusion / cutlass) |
| :----------------- | :---------: | :------------: | :---------: | :--------------------------: |
| RowMajor(16, 64) | (1, 1) | 0.02996 | 0.02957 | 1.013 |
| RowMajor(64, 64) | (1, 1) | 0.05073 | 0.05071 | 1 |
| RowMajor(64, 64) | (2, 1) | 0.05045 | 0.05068 | 0.9956 |
| RowMajor(64, 64) | (4, 1) | 0.05119 | 0.05145 | 0.995 |
| RowMajor(128, 128) | (1, 1) | 0.1369 | 0.154 | 0.8888 |
| RowMajor(128, 128) | (2, 2) | 0.1374 | 0.134 | 1.025 |
| RowMajor(128, 128) | (4, 2) | 0.138 | 0.1382 | 0.9984 |
| RowMajor(128, 256) | (1, 1) | 0.2464 | 0.3694 | 0.6671 |
| RowMajor(128, 256) | (2, 2) | 0.2471 | 0.2458 | 1.005 |
| RowMajor(128, 256) | (2, 4) | 0.2592 | 0.2511 | 1.032 |
| RowMajor(128, 256) | (4, 4) | 0.2543 | 0.2572 | 0.9889 |
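
The Ratio column appears to be the tilefusion time divided by the cutlass time, so values below 1 favor TileFusion. The sketch below recomputes it from the rounded timings in the table as a sanity check. The names `ROWS` and `ratio` are hypothetical, and small differences from the published column are expected because the published ratios were presumably computed from unrounded timings.

```python
# (shape, warp layout, tilefusion ms, cutlass ms, published ratio), copied from the table.
ROWS = [
    ("RowMajor(16, 64)", (1, 1), 0.02996, 0.02957, 1.013),
    ("RowMajor(64, 64)", (1, 1), 0.05073, 0.05071, 1.0),
    ("RowMajor(64, 64)", (2, 1), 0.05045, 0.05068, 0.9956),
    ("RowMajor(64, 64)", (4, 1), 0.05119, 0.05145, 0.995),
    ("RowMajor(128, 128)", (1, 1), 0.1369, 0.154, 0.8888),
    ("RowMajor(128, 128)", (2, 2), 0.1374, 0.134, 1.025),
    ("RowMajor(128, 128)", (4, 2), 0.138, 0.1382, 0.9984),
    ("RowMajor(128, 256)", (1, 1), 0.2464, 0.3694, 0.6671),
    ("RowMajor(128, 256)", (2, 2), 0.2471, 0.2458, 1.005),
    ("RowMajor(128, 256)", (2, 4), 0.2592, 0.2511, 1.032),
    ("RowMajor(128, 256)", (4, 4), 0.2543, 0.2572, 0.9889),
]


def ratio(tilefusion_ms: float, cutlass_ms: float) -> float:
    """tilefusion time relative to cutlass time; below 1 means tilefusion is faster."""
    return tilefusion_ms / cutlass_ms


for shape, layout, tf_ms, cl_ms, published in ROWS:
    recomputed = ratio(tf_ms, cl_ms)
    print(f"{shape} {layout}: recomputed {recomputed:.4f}, published {published}")
```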
13 changes: 13 additions & 0 deletions benchmarks/index.md
@@ -0,0 +1,13 @@
---
layout: page
title: Benchmarks
nav_order: 5
has_children: true
---

This section contains performance benchmarks for TileFusion across various workloads.

## Contents

- [Data Transfer Between Global and Shared Memory](global_to_shared_copy.md)
- [GEMM Performance](gemm.md)
33 changes: 33 additions & 0 deletions docs/about.md
@@ -0,0 +1,33 @@
---
layout: page
title: About
nav_order: 6
has_children: false
---

This project is developed and maintained by the following authors:

- [Ying Cao](https://github.com/lcy-seso)
- [Chengxiang Qi](https://github.com/KuangjuX)

## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit [https://cla.opensource.microsoft.com](https://cla.opensource.microsoft.com).

When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment).
Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [[email protected]](mailto:[email protected]) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos are subject to those third-party's policies.
12 changes: 12 additions & 0 deletions docs/index.md
@@ -0,0 +1,12 @@
---
layout: page
title: Documentation
nav_order: 3
has_children: true
---

Welcome to the TileFusion documentation. Here you'll find the library's design documents, API descriptions, and usage patterns.

## Contents

- [Data Layout for Efficient Shared Memory Access](tiles_in_shared_memory.md)