Update Readme

shumingma · shumingma · commit ef159511bc01 · 2023-12-20T02:46:41.000-08:00
diff --git a/README.md b/README.md
@@ -268,13 +268,15 @@ If you find this repository useful, please consider citing our work:
 }
 ```
 
+```
 @article{longvit,
   title     = {When an Image is Worth 1,024 x 1,024 Words: A Case Study in Computational Pathology},
   author    = {Wenhui Wang and Shuming Ma and Hanwen Xu and Naoto Usuyama and Jiayu Ding and Hoifung Poon and Furu Wei},
   journal   = {ArXiv},
   volume    = {abs/2312.03558},
   year      = {2023}
 }
+```
 
 ## Contributing
 
diff --git a/examples/fairseq/README.md b/examples/fairseq/README.md
@@ -251,6 +251,45 @@ python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 train.py \
     --use-xmoe
 ```
 
+### LongNet Model
+
+```bash
+cd examples/fairseq/
+python -m torch.distributed.launch --nproc_per_node=2 --nnodes=1 train.py \
+    ${PATH_TO_DATA} \
+    --num-workers 2 \
+    --activation-fn gelu \
+    --share-decoder-input-output-embed \
+    --validate-interval-updates 1000 \
+    --save-interval-updates 1000 \
+    --no-epoch-checkpoints \
+    --memory-efficient-fp16 \
+    --fp16-init-scale 4 \
+    --arch lm_base \
+    --task language_modeling \
+    --sample-break-mode none \
+    --tokens-per-sample 4096 \
+    --optimizer adam --adam-betas "(0.9, 0.98)" \
+    --adam-eps 1e-08 \
+    --clip-norm 0.0 \
+    --lr 5e-4 \
+    --lr-scheduler polynomial_decay \
+    --warmup-updates 750 \
+    --dropout 0.1 \
+    --attention-dropout 0.1 \
+    --weight-decay 0.01 \
+    --batch-size 4 \
+    --update-freq 1 \
+    --required-batch-size-multiple 1 \
+    --total-num-update 50000 \
+    --max-update 50000 \
+    --seed 1 \
+    --ddp-backend=c10d \
+    --flash-attention \
+    --segment-length [2048,4096] \
+    --dilated-ratio [1,2]
+```
+
 ## Example: Machine Translation
 
 ### Data Format
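The LongNet command above sets `--segment-length [2048,4096]` and `--dilated-ratio [1,2]` alongside `--tokens-per-sample 4096`. The sketch below assumes these pairs follow the LongNet dilated-attention scheme: the sequence is split into segments of length w, only every r-th token is kept in each segment, and attention is dense inside each dilated segment. It is not torchscale's implementation. The helpers `dilated_positions` and `attention_pairs` are hypothetical names, and the sketch ignores causal masking and per-head offsets. It counts how many query-key pairs each (w, r) pair implies for one 4096-token sample.

```python
from typing import List, Sequence, Tuple


def dilated_positions(seq_len: int, segment_length: int, dilated_ratio: int) -> List[List[int]]:
    # Hypothetical helper (not a torchscale API): split [0, seq_len) into
    # segments of `segment_length` tokens and keep every `dilated_ratio`-th
    # position inside each segment. Per-head offsets are not modeled.
    if segment_length <= 0 or dilated_ratio <= 0:
        raise ValueError("segment_length and dilated_ratio must be positive")
    segments = []
    for start in range(0, seq_len, segment_length):
        end = min(start + segment_length, seq_len)  # last segment may be shorter
        segments.append(list(range(start, end, dilated_ratio)))
    return segments


def attention_pairs(
    seq_len: int, segment_lengths: Sequence[int], dilated_ratios: Sequence[int]
) -> List[Tuple[int, int, int]]:
    # For each (w, r) pair, count query-key pairs assuming dense, non-causal
    # attention within each dilated segment: sum of (tokens kept) ** 2.
    if len(segment_lengths) != len(dilated_ratios):
        raise ValueError("need one dilated ratio per segment length")
    result = []
    for w, r in zip(segment_lengths, dilated_ratios):
        kept = dilated_positions(seq_len, w, r)
        result.append((w, r, sum(len(seg) ** 2 for seg in kept)))
    return result


if __name__ == "__main__":
    seq_len = 4096  # matches --tokens-per-sample 4096 in the command above
    for w, r, pairs in attention_pairs(seq_len, [2048, 4096], [1, 2]):
        print(f"w={w}, r={r}: {pairs} query-key pairs")
    print(f"dense: {seq_len ** 2} query-key pairs")
```

Under these assumptions both pairs keep 2048 tokens per segment. (2048, 1) yields two dense 2048-token segments (8,388,608 pairs). (4096, 2) yields one segment of 2048 dilated positions (4,194,304 pairs). For comparison, full attention over 4096 tokens has 16,777,216 pairs.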