
Website Summarization Using Deep Learning

Guda Pranay Netha, M.S.S Manohar, M. Sai Amartya Maruth, Ganjikunta Ganesh Kumar

Abstract


There are many websites, articles, and blogs on the internet, each carrying a large amount of text, and it is often impractical to read all of it without first knowing the gist. The aim of text summarization is to condense a document into a short summary of its important information, so that readers get an idea of what a blog or article contains and can decide whether to read it. Several methods have been used in the past to summarize large bodies of text, among them machine learning algorithms that relied mostly on the frequency of words in sentences and selected content on that basis. While some of these models worked well for extractive text summarization, they failed at abstractive text summarization: the generated text was illogical and did not follow basic grammar rules. Deep learning overcame this issue. Earlier deep learning models such as recurrent neural networks, Long Short-Term Memory networks, and encoder-decoder architectures were powerful, but they could summarize only short texts and performed poorly on large sets of documents. In this paper, we review advances in natural language processing using deep learning, examine the more recent Transformer and attention mechanisms, and build a Python application that summarizes a website given its URL. We then verify the fluency and correctness of the text the application generates.
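To make the word-frequency approach mentioned above concrete, here is a minimal sketch of an extractive baseline applied to a web page. It is an illustration under stated assumptions, not the application built in the paper: the names `html_to_text`, `split_sentences`, and `frequency_summary`, the tiny stopword list, and the averaging score are all hypothetical choices, and downloading the page from its URL and the abstractive Transformer step are left out.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = frozenset({
    "a", "an", "the", "and", "or", "of", "to", "in", "is", "are",
    "it", "on", "for", "with", "that", "this",
})


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string as one space-joined string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def _tokens(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def frequency_summary(text, max_sentences=2):
    """Keep the sentences whose words are, on average, most frequent in the text."""
    sentences = split_sentences(text)
    freq = Counter(_tokens(text))

    def score(sentence):
        tokens = _tokens(sentence)
        # Average rather than sum, so long sentences are not favored by length alone.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)
```

Because this baseline only selects existing sentences, its output stays grammatical but cannot paraphrase or merge ideas, which is the limitation that motivates the abstractive, Transformer-based approach discussed in the abstract.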

Keywords


Deep Learning, Python, URL, Abstractive, Extractive






Copyright (c) 2022 Recent Trends in Electronics and Communication Systems