top of page

PDF Mining to create a Word Cloud

  • Nov 14, 2014
  • 1 min read

Hello all, this is my first blog post and im just going to be using this to show projects and help others with coding problems

I had been seeing a lot of word clouds on reddit and facebook, so I decided to give it a try with python. There was already a very good WordCloud package that I used and also PDFMiner.

So this function just grabs whichever PDF file you want to use and converts it into text. Pretty simple. If you list a number of pages it will find those individually, but if not then it will just parse the whole pdf. I got a lot of help from https://www.binpress.com/tutorial/manipulating-pdfs-with-python/167.

Next I just had to feed the giant text file I had into the WordCloud. This was the easiest part since the WordCloud package is very helpful.

I also added a picture for the word cloud to mask onto. Here is the final project.

 
 
 

Comments


Featured Posts
Recent Posts
Archive
Search By Tags
Follow Us
  • Facebook Basic Square
  • Twitter Basic Square
  • Google+ Basic Square

© 2023 by Name of Site. Proudly created with Wix.com

bottom of page