The topic today is PDFs and there is probably something for everyone here, whether you aren’t sure how to handle them or why you would want to, or whether you have Acrobat Pro and think that is all you need.
PDFs are files with the suffix .pdf and were designed by Adobe to be portable – to display the same on any machine, PC, Mac or whatever. This makes them ideal for the publishing world and it is much cheaper to send a PDF than to send paper proofs.
For us as indexers they are nothing but advantageous. Email is much faster than couriers, so typically we get an extra day’s indexing time. Do, however, always confirm that you have received the files, and get your clients used to expecting that confirmation. Email is never completely reliable and emails with large attachments are prone to getting blocked. You really don’t want to find that your client thinks you have been working on the index for the past week while you have been waiting for the file to arrive.
Quite often I find that the client sends a separate PDF file for each chapter of the book. Clearly we don’t want to have to ask the client to spend a chunk of their day reformatting the PDFs for our benefit, and we don’t have to.
We can use the PDFs unmerged. Create a directory and put all the PDFs for the book into that single directory. In Acrobat Reader press Shift-Ctrl-F and in the Search dialogue you can search all documents in a particular directory. This will enable you to use Search throughout the whole book, but if you are editing your index and want to go to, say, page 174, it doesn’t help. You are not sure which file page 174 is in, and even with each file open in a separate window it is still trial and error.
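If you do stay with unmerged files, the trial and error can be reduced by jotting down, once, where each chapter file starts and how many pages it has. Here is a minimal sketch in Python of that lookup. The file names, first pages and page counts are invented for illustration, and `locate_page` is a hypothetical helper, not part of Acrobat or any other tool mentioned here.

```python
def locate_page(chapters, printed_page):
    """Find which chapter file holds a printed page number.

    chapters is a list of (filename, first_printed_page, page_count).
    Returns (filename, page_within_file), or None if no file covers it.
    """
    for filename, first_page, page_count in chapters:
        last_page = first_page + page_count - 1
        if first_page <= printed_page <= last_page:
            # Page 1 of the file corresponds to first_page in the book.
            return filename, printed_page - first_page + 1
    return None


# Hypothetical book: three chapter files with made-up page ranges.
chapters = [
    ("ch01.pdf", 1, 96),
    ("ch02.pdf", 97, 52),
    ("ch03.pdf", 149, 40),
]

print(locate_page(chapters, 174))
```

With these made-up figures, page 174 falls in the third file. The table has to be filled in by hand from the files the client sent, which is one more reason to merge them.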
Merging the PDFs into a single file gives you maximum flexibility.
The easiest way of merging PDFs is to use Acrobat. Acrobat comes now in 4 versions: Reader, Standard, Pro, and Pro Extended.
The first, Reader, is completely free and can be downloaded from the Acrobat site. The latest version is 9.1 and it is an improvement in every way over previous versions. There is no excuse for using an older version. Reader cannot, however, merge PDF files.
The other versions of Acrobat can all merge files and all cost money, and quite a lot of it. If you are just indexing, rather than also copy-editing, typesetting etc., then you don’t need more than Standard. Check the Adobe comparison of features.
Once installed you can select the files in Explorer, right-click and select Combine in Adobe Acrobat. You then get the chance to reorder the files, which you will need because they are not picked up in the order in which they appear on disk. Press the button and Acrobat will chug away for a while and create a single file.
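To work out the order you want before you start dragging files about, it can help to sort the names so that chapter 10 comes after chapter 2 rather than after chapter 1. The sketch below does that in Python. It assumes the client has put a number in each file name, which may not be the case, and the file names and the `natural_key` helper are invented for illustration.

```python
import re


def natural_key(filename):
    """Split a name into text and number chunks so 'ch10' sorts after 'ch2'."""
    return [
        int(chunk) if chunk.isdigit() else chunk.lower()
        for chunk in re.split(r"(\d+)", filename)
    ]


# Hypothetical file names as a client might send them.
files = ["ch10.pdf", "ch2.pdf", "ch1.pdf", "prelims.pdf"]
ordered = sorted(files, key=natural_key)
print(ordered)
```

Files without a number, such as the prelims here, still need placing by hand.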
Sometimes, however, you will instead get a message saying that the file is ‘protected’ and it cannot be merged. This is where non-Adobe utilities come into their own.
Merging Protected PDFs
PDF Split and Merge is available from SourceForge. It is completely free, but you can make a donation on the download page if you decide this is saving you money.
From pdfsam.org, download the basic 1.1.4 installer (the one at the top) and double-click it.
Once it starts, click on the left on “Merge/Extract”; click on Add to select the files and click Open.
Adjust the order if required and use Browse to give the output file a name.
Click Run and within a second or so, you have a merged PDF file, with none of those pesky “protected file” messages.
PDFs have two sorts of page numbers: physical and logical. In order to see both, in Reader 9 press Ctrl-K to bring up the Options/Preferences dialogue, select “Page Display” on the left, and check the Use Logical Page Numbers box. If this box is unchecked you will always see only the physical page numbers, so the first page will always be page 1. With it checked, you may see in the navigation bar “97 (1 of 52)”, which means logical page 97, physical page 1. You may see “iii (3 of 158)”, which means that the prelims have been numbered with roman numerals.
This is more helpful for indexing because the logical page number can match the number on the actual page, which is the number that needs to appear in the index. If the publisher hasn’t already set up logical page numbers then you can correct them yourself using Acrobat Standard upwards. This is not ideal, however, because you don’t see the logical page number everywhere. If you use the Search function and hover your mouse over each of the results displayed, it shows only the physical page number, and so at some point in a large index you are going to confuse the two.
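The relationship between the two numbers is simple arithmetic, and a small Python sketch may make it concrete. It imitates the navigation bar text for a file whose prelims are numbered in roman numerals and whose main text starts at a given printed page. This is only an illustration of the idea, not how Acrobat works internally; `page_label` and its parameters are hypothetical names, and the figure of twelve prelim pages is made up.

```python
def to_roman(n):
    """Lower-case roman numeral for a positive integer."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def page_label(physical, total, prelim_pages=0, first_text_page=1):
    """Return text like 'iii (3 of 158)' for a physical page number."""
    if not 1 <= physical <= total:
        raise ValueError("physical page is outside the file")
    if physical <= prelim_pages:
        # Prelims carry roman numerals counted from the front of the file.
        logical = to_roman(physical)
    else:
        # Main text counts on from first_text_page once the prelims end.
        logical = str(first_text_page + physical - prelim_pages - 1)
    return f"{logical} ({physical} of {total})"


print(page_label(3, 158, prelim_pages=12))    # a page in the prelims
print(page_label(13, 158, prelim_pages=12))   # first page of the main text
print(page_label(1, 52, first_text_page=97))  # a chapter file starting at 97
```

The middle example shows the trap: physical page 13 is printed page 1, so any tool that reports only the physical number is twelve pages out.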
My preferred option is to physically move the prelims to the back of the document, or delete them altogether, so that the physical page numbers line up with the logical ones. You can do that with a paid-for version of Acrobat, but you can also do it with PDF Split and Merge. Click on the “Split” option on the left of the screen, add your PDF file and enter the number of the last page of the prelims into the “Split after these pages” box. That will give you two files: one of prelims, and one of text with the page numbers matching the numbers on the printed page.
Seek and ye shall Find
Acrobat has two seeking mechanisms which are distinct and can be used together. In earlier versions of Reader they got entangled, but in version 9 they work properly.
Find works as you might expect. You press ctrl-F and enter text into a box, press return and it takes you to that text in the document. It then gives you arrows next to the box to take you to the next or previous occurrence. Rather less obvious is a small drop-down menu on the side of the box which allows you to seek Whole Words Only or to make the operation Case Sensitive.
Search is completely separate from this and is accessed by pressing Shift-Ctrl-F. Again you enter some text, but now a separate window shows each occurrence of the text, with the rest of the line displayed too. On a large document the search can take some time, but you don’t have to wait for it to finish. As soon as you see the occurrence you want you can click on it and the main window will jump straight to it. Pressing the down-arrow key will move you down the list of search results and the main window will again jump to that location (this didn’t work in some versions of Reader earlier than version 9).
It is important to understand that these two functions are completely separate. For example, you could Search for all occurrences of ‘Inquisition’ and then when examining the 4th occurrence you could Find ‘Torquemada’ without disrupting your list of Search results.
Some of what you find may be too small to read, with its location shown on screen surrounded by a blue box. With PDFs you can ZOOM and make the text larger. The easiest way to do this is to hold the control key down and move the mouse wheel, but there are toolbar buttons and menu options too.
One thing to be wary of when you have zoomed to enlarge the text is where the page ends. It is possible to be looking at the bottom of page 7 with the top edge of page 8 showing at the bottom of the window, with the page number in the navigation bar showing 8. At the end of a long week working on a long book, at some time you will put 8 in the index instead of 7. The way to avoid this is to set the view to View > Page Display > Single Page, rather than Single Page Continuous. That will mean that you only ever have one page on the screen at a time and the number shown is the number to go in the index.
This allows you to handle whatever a publisher throws at you for indexing, but there is one further utility which is useful for the business side of indexing. That is creating your own PDFs.
Why would you want to? Well, PDFs are by far the best format in which to store any form of electronic document. If you store Word files, for example, then you need the Word software in order to read them and if you open them on a different machine from the one on which they were created then they will look different.
The utility CutePDF is free and creates an artificial printer on your machine. You then print to that printer from any application you like and you get a searchable PDF file. The paid-for versions of Acrobat already contain a similar utility but CutePDF is much faster. The latest version of Word allows you to download a utility and then save files in PDF format, but it can be very slow and sometimes crashes.
CutePDF also, of course, allows you to create PDFs from your browser, so when making a purchase online or filing your tax return, for example, you can “print” the confirmation page to PDF and store that file in your records. This is probably as close to the paperless office as you will ever get.