A recent thread on an indexing discussion list got me thinking that what indexers need is a guide to everything they don't know about future indexing technologies. So here it is.
In the beginning was the scroll. Some time later came the book – the scroll was chopped up into fixed-size pages which were then numbered sequentially. That opened the way for the index – a collection of headings together with indicators showing on which page the required information appeared. And now, and in the future, we are going back to the scroll. Instead of books with multiple pages, books are being published with a single, very, very long page. These books are not on physical paper, as with the scrolls of antiquity, of course, but on a computer screen. So imagine a web page containing the whole contents of a book. In your browser you see a single page on the screen, and you can move to the next page by clicking on the scroll bar. You still view the book a page at a time, but, and this is the key point, the size of the page is not fixed. You can make the browser window larger or smaller, or zoom to increase or decrease the font size, and the page size changes. You might think that very few books are published in this way, but you would be wrong. This is exactly how the Kindle works. To publish on a Kindle, the book is converted into a web page and then displayed a page at a time on an e-ink screen. By changing the font size or jumping to a particular starting point in the text, you change where your pages start and end. If you view the book on the bigger-screen Kindle DX, you get a different pagination.
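To make the shifting pages concrete, here is a minimal Python sketch of reflowable pagination. It is a simplified model assumed for illustration, not how the Kindle actually lays out text: the screen is reduced to a line width and a number of lines per page, and the hypothetical page_starts function word-wraps the text and records the character offset where each page begins. The same character then lands on different pages as the "screen" changes.

```python
from bisect import bisect_right


def page_starts(text: str, line_width: int, lines_per_page: int) -> list[int]:
    """Greedily word-wrap `text` and return the character offset where each page begins.

    Simplified model (an assumption): words are separated by single spaces, every word
    fits on a line, and line_width/lines_per_page stand in for screen and font size.
    """
    starts = [0]       # page 1 always begins at character 0
    lines_on_page = 1  # we are currently filling line 1 of page 1
    line_len = 0       # characters already placed on the current line
    offset = 0         # character offset of the current word
    for word in text.split(" "):
        needed = len(word) if line_len == 0 else line_len + 1 + len(word)
        if needed > line_width and line_len > 0:
            # The word does not fit: it opens a new line, and possibly a new page.
            if lines_on_page == lines_per_page:
                starts.append(offset)
                lines_on_page = 1
            else:
                lines_on_page += 1
            line_len = len(word)
        else:
            line_len = needed
        offset += len(word) + 1  # skip the word and the single space after it
    return starts


def page_of(offset: int, starts: list[int]) -> int:
    """1-based page number containing character `offset`."""
    return bisect_right(starts, offset)


text = "one two three four five six seven eight nine ten"
small = page_starts(text, line_width=10, lines_per_page=2)
large = page_starts(text, line_width=20, lines_per_page=2)
print(small, page_of(28, small))  # [0, 19, 34] 2   ("seven" starts at character 28)
print(large, page_of(28, large))  # [0, 40] 1
```

Widening the line from 10 to 20 characters moves the word "seven" from page 2 to page 1, which is why a page number cannot serve as a stable locator in a reflowable book.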
So the future of books is the scroll, which leaves indexes with a problem: they relied on pages. The solution is to have index locators point to something in the text smaller than a page, such as a specific paragraph, a specific word or an individual character position. There are two ways of doing this: tagging and embedding.
Tagging means adding small markers to the text and then using those markers as locators in the index. This could be done simply by character number: instead of “page 123” we refer to “character 492,761”. Alternatively, the publisher can add a smaller number of more convenient tags, perhaps one for each line, and the index uses those (Elsevier does this). Or the indexer can add tags to the text and use those in the index (CUP-XML does this). What the locators actually look like doesn't really matter, because on the e-book screen each one is simply a link for the reader to click.
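Here is a minimal Python sketch of the two tagging styles just described. The locator is either a raw character number or a per-line tag of the hypothetical form "line-N"; the real tag formats used by Elsevier or CUP-XML are not shown, only the idea that any such tag can stand in for a page number.

```python
def char_locator(text: str, phrase: str) -> int:
    """Character-number locator: offset of the first occurrence of `phrase`."""
    pos = text.find(phrase)
    if pos < 0:
        raise ValueError(f"{phrase!r} not found in text")
    return pos


def line_tag(text: str, offset: int) -> str:
    """Hypothetical publisher tag, one per line, written as 'line-N' (1-based)."""
    line_number = text.count("\n", 0, offset) + 1  # newlines before the offset
    return f"line-{line_number}"


text = "Scrolls came first.\nThen the codex.\nIndexes need pages."
index = {}
for heading, phrase in [("codex", "codex"), ("indexes", "Indexes")]:
    offset = char_locator(text, phrase)
    index[heading] = (offset, line_tag(text, offset))
print(index)  # {'codex': (29, 'line-2'), 'indexes': (36, 'line-3')}
```

Either locator could be rendered on screen as a link; the reader never needs to see the number itself.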
Embedding goes a step further and stores the index headings themselves directly in the text at the required character positions. A separate pass by a computer program is then needed to go through the text and build the index, using the character positions described above for tagging, but that pass runs very quickly.
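The separate pass described above could look something like this minimal Python sketch. The {ix:Heading} marker syntax is a hypothetical convention chosen for illustration, not any publisher's format: the program strips the markers out, records the character position each heading points to in the visible text, and sorts the headings into an index.

```python
import re
from collections import defaultdict

# Hypothetical embedding syntax: {ix:Heading} placed just before the passage it indexes.
MARKER = re.compile(r"\{ix:([^}]*)\}")


def build_index(marked_text: str) -> tuple[str, dict[str, list[int]]]:
    """Strip embedded headings and return (visible text, heading -> character positions)."""
    visible_parts: list[str] = []
    positions: dict[str, list[int]] = defaultdict(list)
    visible_len = 0  # length of visible text emitted so far
    last = 0         # end of the previous marker in marked_text
    for match in MARKER.finditer(marked_text):
        chunk = marked_text[last:match.start()]
        visible_parts.append(chunk)
        visible_len += len(chunk)
        # The heading points at the next visible character after the marker.
        positions[match.group(1)].append(visible_len)
        last = match.end()
    visible_parts.append(marked_text[last:])
    # Sort headings case-insensitively, as a printed index would be.
    index = dict(sorted(positions.items(), key=lambda item: item[0].lower()))
    return "".join(visible_parts), index


marked = "{ix:Scrolls}Scrolls came first. {ix:Codex}Then the codex. {ix:Scrolls}Now scrolls return."
visible, index = build_index(marked)
print(visible)  # Scrolls came first. Then the codex. Now scrolls return.
print(index)    # {'Codex': [20], 'Scrolls': [0, 36]}
```

Because the positions are computed on every run, editing the text and rerunning the program keeps the index in step without the indexer touching a single locator.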
Storing these tags and headings in the text itself would, of course, be unacceptable if the reader could see them, so we need a text format that can hold more than the visible characters, one that allows information to be stored invisibly. There are many different ways of doing this, as no-one has come up with a format that everyone agrees is perfect. All word-processors have their own formats, such as MS Word with .doc and .docx, or OpenOffice with .odt. Web pages use a format called HTML, which uses tags inside angle brackets to enclose invisible information, so <title> </title> indicates that the text between the tags is to be used as the title of the browser window and is not displayed to the reader as part of the text. XML also uses angle-bracket tags but goes a step further and allows the document creator to define their own tags. So if you wanted to use <browserheading> </browserheading> instead of the <title> tag, you could. It also means that you can create tags for things which no-one else has thought of. As a publisher you can come up with a tag system better than anyone else's and then reap the commercial rewards of that system.
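User-defined XML tags can be illustrated with Python's standard xml.etree.ElementTree module. In this sketch both tag names, and the rule that they are hidden, are assumptions made for illustration: a <browserheading> element, as in the example above, and an empty <ix> element carrying an index heading. The reader sees only the visible text, while a program can still read the hidden information.

```python
import xml.etree.ElementTree as ET

# Hypothetical publisher tags: <browserheading> replaces HTML's <title>,
# and an empty <ix term="..."/> marks an index heading. Neither is displayed.
INVISIBLE = {"browserheading", "ix"}


def visible_text(elem: ET.Element) -> str:
    """Text a reader would see: invisible elements are skipped, but their tails are kept."""
    if elem.tag in INVISIBLE:
        return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(visible_text(child))
        parts.append(child.tail or "")  # text after a child belongs to this element's flow
    return "".join(parts)


doc = ET.fromstring(
    "<book><browserheading>Back to the Scroll</browserheading>"
    "<p>Indexes relied on <ix term='pagination'/>pages.</p></book>"
)
print(visible_text(doc))                          # Indexes relied on pages.
print(doc.findtext("browserheading"))             # Back to the Scroll
print([ix.get("term") for ix in doc.iter("ix")])  # ['pagination']
```

The hidden heading sits at an exact point in the text, yet nothing about it reaches the reader's screen.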
So what does this mean for the indexer? First, indexing has to be more precise. Rather than identifying the page on which a concept appears, the indexer must identify exactly where the topic starts and ends, right down to the character position. That always involves more work. Second, the indexer will have to use a range of software tools or techniques to record the index information in the document. These might be software tools using drop-down menus or special keystrokes, or techniques involving colored numbers printed on PDFs. Furthermore, these tools and techniques will come mostly from the publishers, who invent their own systems and often design them around page presentation features, such as illustrations and tables, rather than around the indexer's work patterns.
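Recording where a topic starts and ends, rather than a single page, might be modelled as in this Python sketch. RangeLocator and page_reference are hypothetical names: the indexer records character positions once, and each reading device, having decided where its own pages begin (given as a sorted list of offsets starting at 0), turns the same range into its own page reference.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeLocator:
    """Hypothetical locator: the topic runs from character `start` up to, not including, `end`."""
    heading: str
    start: int
    end: int

    def __post_init__(self) -> None:
        if not 0 <= self.start < self.end:
            raise ValueError("expected 0 <= start < end")


def page_reference(loc: RangeLocator, page_starts: list[int]) -> str:
    """Render a range as '7' or '7-8', given the sorted offsets where each page begins."""
    first = bisect_right(page_starts, loc.start)
    last = bisect_right(page_starts, loc.end - 1)  # last character inside the range
    return str(first) if first == last else f"{first}-{last}"


loc = RangeLocator("pagination", start=28, end=45)
print(page_reference(loc, [0, 19, 34]))  # 2-3
print(page_reference(loc, [0, 100]))     # 1
```

The extra precision is paid for once, at indexing time; every pagination the reader chooses afterwards can be served from the same start and end.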
As I mentioned, there is no single format on which everyone agrees, nor is there any sign of one being agreed in the near future. As indexers we need to be agile. Investigate the formats and pick one in which you think there will be demand for indexes. Create your own techniques and even tools, using word-processor macros, spreadsheets or programmable function keys on keyboards, to make your indexing process efficient for that format. Pursue work in that format. Blog and tweet about your learning experiences and maybe work will find you. If that doesn't take off, learn another format. That adds to your portfolio of skills and to your menu of services offered. Having said that, it does seem that indexing is in danger of becoming a Red Queen's Race.