pn-Summary

A well-structured summarization dataset for the Persian language!

Leveraging ParsBERT and Pretrained mT5 for Persian Abstractive Text Summarization 🦁

The dataset consists of 93,207 records and is prepared for abstractive/extractive summarization tasks (like cnn_dailymail for English). It can also be used for related tasks such as text generation, title generation, and news category classification. We also evaluated the dataset with recent models and techniques:

  • mT5: A pretrained encoder-decoder model.
  • BERT2BERT: An encoder-decoder architecture that leverages ParsBERT as both encoder and decoder.
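To illustrate how records from a summarization dataset like this are typically fed to a seq2seq model such as mT5, here is a minimal sketch. The field names (`article`, `summary`) and the `summarize:` task prefix are assumptions for illustration, not confirmed by this repo.

```python
def make_seq2seq_pair(record, task_prefix="summarize: "):
    """Turn one dataset record into an (input, target) pair for
    encoder-decoder fine-tuning (e.g., mT5 or BERT2BERT).

    NOTE: field names and the task prefix are assumptions; check the
    actual dataset schema before use.
    """
    source = task_prefix + record["article"].strip()
    target = record["summary"].strip()
    return source, target


# Toy record standing in for a real Persian news item.
example = {
    "title": "...",
    "article": "  متن خبر  ",
    "summary": " خلاصه خبر ",
}

src, tgt = make_seq2seq_pair(example)
```

The pairs produced this way can then be tokenized and passed to any standard encoder-decoder training loop.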

Follow the rest of the repo for more details.

Paper DOI: 10.1109/CSICC52343.2021.9420563


YouTube Demo!

References

2021

  1. Leveraging ParsBERT and Pretrained mT5 for Persian Abstractive Text Summarization
    Mehrdad Farahani, Mohammad Gharachorloo, and Mohammad Manthouri
    In 2021 26th International Computer Conference, Computer Society of Iran (CSICC), 2021

2020

  1. ParsBERT: Transformer-based Model for Persian Language Understanding
    Mehrdad Farahani, Mohammad Gharachorloo, Marzieh Farahani, and 1 more author
    Neural Processing Letters, 2020