
PDF Mining to create a Word Cloud

Hello all, this is my first blog post. I'm going to use this blog to show projects and help others with coding problems.

I had been seeing a lot of word clouds on Reddit and Facebook, so I decided to try making one with Python. I used the existing WordCloud package, which is very good, along with PDFMiner.

The first function takes whichever PDF file you want to use and converts it into text. Pretty simple. If you pass a list of page numbers it will parse only those pages; otherwise it parses the whole PDF. I got a lot of help from https://www.binpress.com/tutorial/manipulating-pdfs-with-python/167.

Next I just had to feed the giant block of text I had into WordCloud. This was the easiest part, since the WordCloud package does most of the work for you.
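The word cloud step can be sketched roughly like this. It assumes the `wordcloud` package, plus `numpy` and Pillow if you want a mask image. The function name `build_word_cloud`, the output file name, and the settings (white background, 200 words) are my own picks for illustration, not necessarily what the final project used.

```python
def build_word_cloud(text, out_path="wordcloud.png", mask_path=None):
    """Render `text` as a word cloud image and save it to `out_path`."""
    # Imported here so the module still loads where wordcloud is not installed.
    from wordcloud import WordCloud, STOPWORDS

    mask = None
    if mask_path is not None:
        import numpy as np
        from PIL import Image

        # Pure white areas of the mask image are left empty by WordCloud.
        mask = np.array(Image.open(mask_path))

    cloud = WordCloud(
        background_color="white",
        max_words=200,
        stopwords=set(STOPWORDS),  # drop common words like "the" and "and"
        mask=mask,
    ).generate(text)

    cloud.to_file(out_path)
    return out_path
```

Leaving `mask_path` as `None` gives a plain rectangular cloud. Passing a path to a picture with a white background makes the words fill in only the non-white part of the picture.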

I also added a picture to use as a mask, so the word cloud takes on its shape. Here is the final project.
