The new Yummy!
The new Yummy is coming soon. I have been working on really tearing apart each PDF so that everyone can get all the information they need before they have to download it: a summary, an HTML version, PDF indexing and searching, the Google PDF Search API, working thumbnails, accurate PDF metadata, and so on. Once I find the best Java PDF manipulator, I will post it here. (Suggestions welcome.)
Also, I am in the process of redesigning the entire product. I really hate the del.icio.us copy that I have now. Plus, Yummy is not a del.icio.us for PDFs. It's really meant to be a personal library. That being the case, I think a clean, un-del.icio.us design is called for. Let me know your thoughts! Thanks, Brandon
2 Comments:
Some of the tools I have used for PDF manipulation include:
--iText
Manipulates whole PDFs as well as embedded text and graphics. Also gives access to PDF metadata (number of pages, author, creation dates, etc.).
http://www.lowagie.com/iText/
--pdfBox
Text extraction. Pre-built hooks for integration with Lucene for text searching.
http://www.pdfbox.org/
Once the PDF is in text form, there are some tools for automated classification, keyword extraction, etc. One is Kea (http://www.nzdl.org/Kea/).
--JPedal.org (open source version)
Thumbnail generation, other features.
http://www.jpedal.org/downloados.php
Ironically, I have now used all three of the Java PDF manipulators you suggested. iText is great for building PDFs but not very good for extracting from them. pdfBox is the coolest of the bunch, but ColdFusion 6.1's log4j is an older version than the newest PDFBox requires. jPedal is the one I am having the most luck with now, except that I have not been able to generate a thumbnail with it. ImageMagick has been the best for turning PDF pages into a JPEG or GIF, but I hate calling command-line applications from a web app, especially when they get called as often as mine do. Thanks for your brain time! -Brandon
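For anyone curious what the command-line route looks like from Java, here is a rough sketch, not Yummy's actual code. It assumes ImageMagick's `convert` is on the PATH (newer installs may call it `magick`) and that Java 9 or later is available for `Redirect.DISCARD`. The class name `PdfThumbnailer`, its method names, the 72 dpi density, and the skip-if-already-rendered check are all made up for illustration. The check is one possible way to keep the external process from being launched on every request.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: renders one PDF page to an image by shelling out to ImageMagick.
final class PdfThumbnailer {

    // Builds the argument list, e.g. convert -density 72 doc.pdf[0] -thumbnail 200x doc.jpg
    static List<String> buildCommand(String pdfPath, int pageIndex, int widthPx, String outPath) {
        if (pageIndex < 0 || widthPx <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and widthPx > 0");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("convert");                        // assumed to be on the PATH
        cmd.add("-density");
        cmd.add("72");                             // rasterize at 72 dpi (arbitrary choice)
        cmd.add(pdfPath + "[" + pageIndex + "]");  // "[n]" selects a single zero-based page
        cmd.add("-thumbnail");
        cmd.add(widthPx + "x");                    // fixed width, height keeps the aspect ratio
        cmd.add(outPath);                          // output format follows the file extension
        return cmd;
    }

    // Runs the command; returns true only if it exits with status 0 before the timeout.
    static boolean run(List<String> cmd, long timeoutSeconds)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectErrorStream(true);
        // Discard output so a chatty process cannot block on a full pipe (Java 9+).
        pb.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        Process p = pb.start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();                   // do not let a stuck render pile up
            return false;
        }
        return p.exitValue() == 0;
    }

    // Skips the external call when an up-to-date thumbnail already exists on disk.
    static boolean ensureThumbnail(File pdf, File thumb, int widthPx)
            throws IOException, InterruptedException {
        if (thumb.exists() && thumb.lastModified() >= pdf.lastModified()) {
            return true;
        }
        return run(buildCommand(pdf.getPath(), 0, widthPx, thumb.getPath()), 30);
    }
}
```

Calling `PdfThumbnailer.ensureThumbnail(new File("doc.pdf"), new File("doc.jpg"), 200)` when a PDF is saved, rather than when a page is viewed, would mean the command line only gets hit once per document.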