| Nov. 23rd, 2009 @ 03:24 pm PDF to spreadsheet (oh, say, Excel) |
|---|
Perhaps a PDF contains a table you really want to extract into something real?
Well, as it happens (I'm unsure when they added it, but it's there at least as of Adobe Reader 9), there is a "Save as Text" function which exports text, including tables. Unfortunately, it does a poor job of delimiting that text, so I used UltraEdit in column mode to clean it up for import, manually placing tabs at the column breaks. It occurred to me that if I were to do it again, I'd use some script-fu to make that faster, but since the export filter is weak for tables, any script would need tweaking for each document, so I won't describe it further. After that, importing into Excel is pretty much a matter of opening the file and fixing whatever you got wrong. If you'd rather paste the text in and split it afterwards, use Data -> Text to Columns and delimit on whatever you picked (tabs, in my case).
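For the curious, here is a rough idea of what that script-fu might look like. It is only a sketch, and it rests on an assumption the export won't always honor: that columns in the saved text are separated by runs of two or more spaces, while single spaces occur only inside a cell. The function name `spaces_to_tabs` and the `min_gap` parameter are made up for illustration; expect to adjust the gap width (or the whole approach) per document.

```python
import re


def spaces_to_tabs(text, min_gap=2):
    """Turn runs of `min_gap` or more spaces into single tabs.

    Assumption: a column break in the exported text always shows up as
    at least `min_gap` consecutive spaces. Real exports may break this.
    """
    gap = re.compile(r" {%d,}" % min_gap)
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines left over from the export
            rows.append(gap.sub("\t", line))
    return "\n".join(rows)


if __name__ == "__main__":
    sample = "Name   Qty  Price\nWidget A   3    9.99\n"
    print(spaces_to_tabs(sample))
```

Write the result to a file and open it in Excel, or paste it in and run Text to Columns with tab as the delimiter.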
Of course, the full version of Acrobat lets you select tables by rows or columns and gives far more precise output, which would make this virtually flawless and far nicer, since the columns could be generated without errors. Given that, I'd be willing to bet there are some freeware tools out there with a similar feature...
Anybody know of one? Sumatra PDF lets you select a range with Ctrl and drag, so it could be used, albeit with a lot of fuss. It does virtually nothing to delimit text in columns, but it can export just the selection, so the columns can be separated manually. (Delimiting is going to be a problem in most PDF utilities.)