Code With Passion: PageCrawler

A couple of days ago I was browsing a forum and found a board with tons of links to computer science eBooks. One link led to Security Engineering by Ross Anderson, and the webpage linked to every chapter, each in a separate PDF file. Downloading each PDF by hand would be far too time-consuming, so I wrote a tool to do it for me.

PageCrawler lets me download all of them at once. It reads a webpage, parses the HTML to find hyperlinks (specifically the hypertext references, i.e. the href attributes), and compiles a list of the links that match the file types I'm looking for. Click "Save", select the directory to save to, and voila!
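The link-finding step could look roughly like this in Python. This is only a sketch under assumptions, not PageCrawler's actual code: the names LinkCollector and find_links are made up for illustration, and it covers only the parse-and-filter part, not the download or the "Save" dialog.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_links(html, base_url, extensions):
    """Return absolute URLs whose path ends in one of the given extensions."""
    collector = LinkCollector()
    collector.feed(html)
    # Normalize "pdf", ".pdf", and "PDF" to the same form.
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    matches = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links against the page URL
        ext = posixpath.splitext(urlparse(url).path)[1].lower().lstrip(".")
        if ext in wanted:
            matches.append(url)
    return matches
```

Calling find_links(page_html, page_url, ["pdf"]) would then give the list of chapter PDFs to hand to whatever does the downloading. Resolving each href against the page URL matters because chapter links on a page like this are often relative.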

I'm working on the "Extensions" button whenever I find some time between school, work, and homework. It will parse the page and list every file extension it finds in the "File Extensions" text box.
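For the extension listing, one possible approach is to take the hrefs already pulled from the page and reduce them to a sorted set of unique extensions. The function name list_extensions is an assumption for this sketch, and the real button may well behave differently.

```python
from urllib.parse import urlparse
import posixpath


def list_extensions(hrefs):
    """Return the sorted, de-duplicated file extensions found in a list of hrefs."""
    found = set()
    for href in hrefs:
        # Use only the path so query strings like "?x=1" don't pollute the extension.
        path = urlparse(href).path
        ext = posixpath.splitext(path)[1].lower().lstrip(".")
        if ext:  # skip links with no extension, e.g. "/about"
            found.add(ext)
    return sorted(found)
```

The result is a plain list of strings, so it could be joined with commas and dropped straight into a text box.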

A friend suggested grabbing files from a range of web pages, so I'll implement that as well (in time).
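One way the page-range idea might work, assuming the pages differ only by a number in the URL (that is a guess at what "a range of web pages" means here), is to expand a URL template into a list of pages to crawl. The name page_urls and the "{n}" placeholder are hypothetical.

```python
def page_urls(template, first, last):
    """Expand a URL template containing "{n}" into one URL per page number.

    Both first and last are included in the range.
    """
    if "{n}" not in template:
        raise ValueError('template must contain the placeholder "{n}"')
    return [template.replace("{n}", str(n)) for n in range(first, last + 1)]
```

Each URL in the returned list could then be fetched and run through the same link-finding step as a single page.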

I'll post to SourceForge this weekend!


Wednesday, April 17, 2013



Scott Christopher Stauffer
