PDF Mining to create a Word Cloud
Hello all, this is my first blog post and im just going to be using this to show projects and help others with coding problems
I had been seeing a lot of word clouds on reddit and facebook, so I decided to give it a try with python. There was already a very good WordCloud package that I used and also PDFMiner.
So this function just grabs whichever PDF file you want to use and converts it into text. Pretty simple. If you list a number of pages it will find those individually, but if not then it will just parse the whole pdf. I got a lot of help from https://www.binpress.com/tutorial/manipulating-pdfs-with-python/167.
Next I just had to feed the giant text file I had into the WordCloud. This was the easiest part since the WordCloud package is very helpful.
I also added a picture for the word cloud to mask onto. Here is the final project.