PDF to Text Summarization:

Analyzing Extractive and Abstractive Models

Academic Project, CIS 668: Natural Language Processing
Object-Oriented Design: Python 3, NLP, NLTK, Google BERT, IBM Watson, AWS, Azure

Large firms produce many reports every month, and reading each one in full is very time-consuming. My team and I therefore decided to build a PDF-to-text summarizer that takes a PDF as input and returns a summary of the report.
To do this, we analyzed available text summarization tools and developed a scoring matrix to rank each summarizer's performance.
We selected text summarizers from four tech giants: Google, Microsoft, Amazon, and IBM. We analyzed both abstractive and extractive models. We developed a program that integrates their APIs, tested it across different genres, and built a score grid based on human interpretation of the summaries.
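For context, an extractive summarizer selects sentences verbatim from the source, whereas an abstractive one generates new wording. The sketch below illustrates only the extractive idea, using simple word-frequency scoring in plain Python. It is not the method used by any of the vendor tools we evaluated, and the stopword list, function names, and sample text are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a fuller list such as
# NLTK's would be the more realistic choice).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that",
             "for", "on", "was", "were", "this", "with", "as", "by"}


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        # Average document-level frequency of the sentence's content words.
        tokens = content_words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    sample = ("Revenue grew in the third quarter. "
              "Revenue growth was driven by cloud revenue. "
              "The office cafeteria got a new menu.")
    print(extractive_summary(sample, max_sentences=1))
```

Because the output is assembled only from sentences already present in the input, it cannot paraphrase or merge ideas the way an abstractive model can.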
By the end of our analysis, we saw that each tool performed better in some genres than in others, but IBM Watson's NLU received the highest overall score.
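The difference between a per-genre winner and an overall winner can be shown with a small score grid. The following is a minimal sketch, assuming human ratings on a 1 to 5 scale averaged per tool. The tool names, genre names, and every number are hypothetical placeholders, not the ratings we collected.

```python
from statistics import mean

# Hypothetical placeholder ratings (1-5 scale). These are NOT project data;
# tool and genre names are invented for illustration.
scores = {
    "tool_a": {"news": 4, "legal": 2, "finance": 3},
    "tool_b": {"news": 3, "legal": 4, "finance": 4},
    "tool_c": {"news": 5, "legal": 1, "finance": 2},
}


def best_per_genre(grid):
    """Map each genre to the tool with the highest rating in that genre."""
    genres = sorted({genre for row in grid.values() for genre in row})
    return {g: max(grid, key=lambda tool: grid[tool].get(g, 0)) for g in genres}


def overall_ranking(grid):
    """Rank tools by their mean rating across all genres, best first."""
    return sorted(grid, key=lambda tool: mean(grid[tool].values()), reverse=True)


if __name__ == "__main__":
    print(best_per_genre(scores))
    print(overall_ranking(scores))
```

In this placeholder grid, one tool wins a single genre outright yet ranks last overall, which is the kind of pattern a per-genre grid makes visible and a single aggregate score hides.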