PDF to Text Summarization:
Analyzing Extractive and Abstractive Models
Academic Project, CIS 668: Natural Language Processing Object Oriented Design: Python 3, NLP, NLTK, Google BERT, IBM Watson, AWS, Azure
Large firms produce many reports every month, and reading each one in full is very time-consuming.
My team and I therefore decided to build a PDF-to-text summarizer that takes a PDF as input and
returns a summary of the report.
To do this, we analyzed available text summarization tools and developed a scoring matrix
to rank the performance of each summarizer.
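A ranking matrix of this kind can be sketched as a nested mapping from tool to genre to rater scores. The sketch below is illustrative only and is not the project's code: the function names, the 1-5 rating scale, the averaging scheme, and the tool and genre labels are all assumptions, and the numbers are placeholders rather than project results.

```python
from statistics import mean


def rank_summarizers(scores):
    """Rank tools by the mean of their per-genre mean ratings (highest first).

    `scores` maps tool -> genre -> list of human ratings (assumed layout).
    """
    overall = {}
    for tool, by_genre in scores.items():
        genre_means = [mean(ratings) for ratings in by_genre.values()]
        overall[tool] = mean(genre_means)
    return sorted(overall.items(), key=lambda item: item[1], reverse=True)


def best_per_genre(scores):
    """Return, for each genre, the tool with the highest mean rating."""
    best = {}
    for tool, by_genre in scores.items():
        for genre, ratings in by_genre.items():
            avg = mean(ratings)
            if genre not in best or avg > best[genre][1]:
                best[genre] = (tool, avg)
    return {genre: tool for genre, (tool, _) in best.items()}


# Placeholder ratings on an assumed 1-5 scale; NOT the project's data.
example_scores = {
    "tool_a": {"news": [4, 5], "finance": [3, 3]},
    "tool_b": {"news": [2, 3], "finance": [4, 5]},
}

ranking = rank_summarizers(example_scores)
genre_winners = best_per_genre(example_scores)
```

Averaging per genre first, then across genres, keeps a genre with many rated documents from dominating the overall score; other weightings would be equally reasonable.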
We selected text summarizers from four tech giants: Google, Microsoft, Amazon, and IBM. We analyzed
both abstractive and extractive models. We developed a program that integrates their APIs, tested it
across different genres, and developed a human-interpretation score grid.
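To illustrate the extractive side of that comparison: an extractive summarizer selects existing sentences from the source, whereas an abstractive one generates new wording. The sketch below is a generic word-frequency baseline written for illustration; it is not how any of the vendor services work, and the function name, stopword list, and scoring rule are assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real list would be larger).
STOPWORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "it", "for",
    "on", "from", "this", "that", "with", "was", "were",
}


def _content_words(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text, num_sentences=2):
    """Return the top-scoring sentences, kept in their original order."""
    # Naive sentence split on ., !, or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(_content_words(text))

    def score(sentence):
        tokens = _content_words(sentence)
        # Average word frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Calling `extractive_summary(report_text, num_sentences=3)` on text already extracted from a PDF would return three sentences copied verbatim from the report; `report_text` here is a hypothetical variable.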
By the end of our analysis, we saw that each tool performed better in some genres than in others,
but IBM Watson's NLU received the highest score overall.