Commit ef15951

Update Readme
1 parent e9da05a, commit ef15951

2 files changed: +41 / -0 lines

README.md

Lines changed: 2 additions & 0 deletions
@@ -268,13 +268,15 @@ If you find this repository useful, please consider citing our work:
 }
 ```

+```
 @article{longvit,
 title = {When an Image is Worth 1,024 x 1,024 Words: A Case Study in Computational Pathology},
 author = {Wenhui Wang and Shuming Ma and Hanwen Xu and Naoto Usuyama and Jiayu Ding and Hoifung Poon and Furu Wei},
 journal = {ArXiv},
 volume = {abs/2312.03558},
 year = {2023}
 }
+```

 ## Contributing

examples/fairseq/README.md

Lines changed: 39 additions & 0 deletions
@@ -251,6 +251,45 @@ python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 train.py \
 --use-xmoe
 ```

+### LongNet Model
+
+```bash
+cd examples/fairseq/
+python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 train.py \
+${PATH_TO_DATA} \
+--num-workers 2 \
+--activation-fn gelu \
+--share-decoder-input-output-embed \
+--validate-interval-updates 1000 \
+--save-interval-updates 1000 \
+--no-epoch-checkpoints \
+--memory-efficient-fp16 \
+--fp16-init-scale 4 \
+--arch lm_base \
+--task language_modeling \
+--sample-break-mode none \
+--tokens-per-sample 4096 \
+--optimizer adam --adam-betas "(0.9, 0.98)" \
+--adam-eps 1e-08 \
+--clip-norm 0.0 \
+--lr 5e-4 \
+--lr-scheduler polynomial_decay \
+--warmup-updates 750 \
+--dropout 0.1 \
+--attention-dropout 0.1 \
+--weight-decay 0.01 \
+--batch-size 4 \
+--update-freq 1 \
+--required-batch-size-multiple 1 \
+--total-num-update 50000 \
+--max-update 50000 \
+--seed 1 \
+--ddp-backend=c10d \
+--flash-attention \
+--segment-length [2048,4096] \
+--dilated-ratio [1,2]
+```
+
 ## Example: Machine Translation

 ### Data Format
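The LongNet command pairs `--segment-length [2048,4096]` with `--dilated-ratio [1,2]`. Below is a minimal sketch of what those values select. It assumes the two lists are matched element-wise into (segment length w, dilation ratio r) branches, in the style of LongNet's dilated attention. Under that assumption, each 4,096-token sample (`--tokens-per-sample 4096`) is cut into segments of w tokens, and inside each segment only every r-th position is kept for dense attention.

The helper names `dilated_indices` and `attended_pairs` are hypothetical. The sketch ignores per-head offsets, causal masking, and how the branch outputs are combined, so it is an illustration rather than the torchscale implementation.

```python
# Hypothetical sketch of LongNet-style dilated attention index selection.
# Assumption: each (segment_length, dilated_ratio) pair from the flags is one
# independent branch. The sequence is split into segments of segment_length
# tokens, and within each segment every dilated_ratio-th token is kept.
# Per-head offsets, causal masking and branch mixing are deliberately omitted.
from typing import List


def dilated_indices(seq_len: int, segment_length: int, dilated_ratio: int,
                    offset: int = 0) -> List[List[int]]:
    """Return, for each segment, the token positions kept by this branch."""
    if segment_length % dilated_ratio != 0:
        raise ValueError("segment_length must be divisible by dilated_ratio")
    if not 0 <= offset < dilated_ratio:
        raise ValueError("offset must be in [0, dilated_ratio)")
    segments = []
    for start in range(0, seq_len, segment_length):
        end = min(start + segment_length, seq_len)
        segments.append(list(range(start + offset, end, dilated_ratio)))
    return segments


def attended_pairs(seq_len: int, segment_length: int, dilated_ratio: int) -> int:
    """Count query-key pairs if each segment attends densely among kept tokens."""
    segments = dilated_indices(seq_len, segment_length, dilated_ratio)
    return sum(len(seg) ** 2 for seg in segments)


if __name__ == "__main__":
    seq_len = 4096  # mirrors --tokens-per-sample 4096
    # Pair --segment-length [2048,4096] with --dilated-ratio [1,2] element-wise.
    for w, r in zip([2048, 4096], [1, 2]):
        segs = dilated_indices(seq_len, w, r)
        print(f"w={w}, r={r}: {len(segs)} segment(s) of {len(segs[0])} kept tokens, "
              f"{attended_pairs(seq_len, w, r)} query-key pairs")
    print(f"dense attention: {seq_len ** 2} query-key pairs")
```

Under these assumptions, the (2048, 1) branch covers 8,388,608 query-key pairs and the (4096, 2) branch covers 4,194,304. Dense attention over 4,096 tokens would need 16,777,216 pairs. The w=4096 branch still spans the whole sample, only at half the token density.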
