pn-Summary

A well-structured summarization dataset for the Persian language!

Leveraging ParsBERT and Pretrained mT5 for Persian Abstractive Text Summarization 🦁

The dataset consists of 93,207 records and is prepared for abstractive/extractive summarization tasks (like cnn_dailymail for English). It can also be used for related tasks such as text generation, title generation, and news category classification. We also evaluated the dataset with recent models and techniques:

  • mT5: A pretrained encoder-decoder model.
  • BERT2BERT: An encoder-decoder architecture that leverages ParsBERT as both encoder and decoder.
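To illustrate how records from a summarization dataset like this are typically fed to a seq2seq model such as mT5, here is a minimal sketch. The field names (`article`, `summary`) and the `summarize:` task prefix are assumptions for illustration, not confirmed by this repo.

```python
def make_seq2seq_pair(record, task_prefix="summarize: "):
    """Turn one dataset record into an (input, target) pair for
    encoder-decoder fine-tuning (e.g., mT5 or BERT2BERT).

    NOTE: field names and the task prefix are assumptions; check the
    actual dataset schema before use.
    """
    source = task_prefix + record["article"].strip()
    target = record["summary"].strip()
    return source, target


# Toy record standing in for a real Persian news item.
example = {
    "title": "...",
    "article": "  متن خبر  ",
    "summary": " خلاصه خبر ",
}

src, tgt = make_seq2seq_pair(example)
```

The pairs produced this way can then be tokenized and passed to any standard encoder-decoder training loop.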

Follow the rest of the repo for more details.

Paper DOI: 10.1109/CSICC52343.2021.9420563


YouTube Demo!

References

2021

  1. Leveraging ParsBERT and Pretrained mT5 for Persian Abstractive Text Summarization
    Mehrdad Farahani, Mohammad Gharachorloo, and Mohammad Manthouri
    In 2021 26th International Computer Conference, Computer Society of Iran (CSICC), 2021

2020

  1. ParsBERT: Transformer-based Model for Persian Language Understanding
    Mehrdad Farahani, Mohammad Gharachorloo, Marzieh Farahani, and 1 more author
    Neural Processing Letters, 2020