CUP-XML Unique numbers and MS Word

What is CUP-XML

Cambridge University Press (CUP) store the indexes for their books in files separate from the books themselves, while the locators to which the index entries refer are embedded in the XML text of the book. (In conventional embedded indexing, by contrast, the index headings themselves are embedded in the text.) This is sometimes referred to as the CUP-XML unique locator system.

For example, the index might contain a heading:

  • hydropower, 203.14

and the book will contain a tag “203.14” at the position in the text where hydropower is discussed. When CUP publish the book the locator 203.14 will be replaced with something more suitable – a page number for a paper book, or a hyperlink for a webpage or electronic text.

The system copes with both point locations and ranges, and has the advantage over straight Word embedding that it allows locators to have suffixes.
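When the book is published, each unique number has to be turned into something the reader can use. As a minimal sketch of that substitution for a paper book, assuming the final page on which each tag lands is already known, the hypothetical Python function below turns a point locator into a single page and a range into a page span. The `tag_pages` mapping, the `render_locator` name and the page numbers in the example are illustrative only; CUP's actual process works on the XML and is not described here. Range tags carry a B suffix at the start and an F prefix at the end, as described below.

```python
def render_locator(number: str, tag_pages: dict) -> str:
    """Turn a unique number into the page reference a reader sees.

    tag_pages (hypothetical) maps each tag as it appears in the text,
    e.g. "203.14", "215.12B" or "F215.12", to its final page number.
    """
    if number in tag_pages:
        # Point locator: a single tag marks one position.
        return str(tag_pages[number])
    # Range: the opening tag has a B suffix, the closing tag an F prefix.
    start = tag_pages[number + "B"]
    end = tag_pages["F" + number]
    return str(start) if start == end else f"{start}-{end}"


tag_pages = {"203.14": 57, "215.12B": 63, "F215.12": 65}
print("hydropower, " + render_locator("203.14", tag_pages))  # hydropower, 57
print("electricity generation, " + render_locator("215.12", tag_pages))  # electricity generation, 63-65
```

Note that the same unique number always resolves to the same page reference, however many headings use it.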

CUP's own description of the system can be found on their advice pages for authors.

Unique numbers and MS Word

Only CUP have access to the actual XML of the book. To transport the locator information from the indexer to CUP for inclusion in the XML, the indexer embeds the locator numbers at the appropriate positions in an MS Word copy of the book. So, in the example above, the indexer inserts “203.14”, highlighted red, into a Word copy of the book at the discussion of hydropower. CUP then extract the highlighted tag and transfer it to their XML. The index itself is sent to CUP as an RTF file, formatted as the indexer wants it and using the unique numbers as locators.
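The transport step can be pictured as scanning the Word copy for red-highlighted text. The sketch below does not read a real .docx file; instead it models the document as a hypothetical list of `(text, highlight)` pairs. The tag pattern it accepts (an optional F prefix, digits, a decimal point, digits, an optional B suffix) is an assumption based on the examples on this page.

```python
import re

# Assumed tag shape: optional "F" prefix, page.line digits, optional "B" suffix.
TAG = re.compile(r"F?\d+\.\d+B?")


def extract_tags(runs):
    """Return the red-highlighted tags, in document order."""
    return [text for text, highlight in runs
            if highlight == "red" and TAG.fullmatch(text)]


# Hypothetical document: one paragraph wrapped in a range pair.
runs = [
    ("203.14B", "red"),
    ("Hydropower supplies much of the region's electricity.", None),
    ("F203.14", "red"),
]
print(extract_tags(runs))  # ['203.14B', 'F203.14']
```

Red-highlighted text that does not look like a tag is ignored rather than treated as an error; a stricter check could report it instead.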

What are the unique numbers

Since mid-2006 the unique numbers have been decimal numbers, with the part before the decimal point being the page number. They must be unique within the book: the unique number 203.14 must define a single point in the book. It may, however, appear many times in the index, for example:

  • electricity generation, 203.14
  • hydropower, 203.14

There is no requirement for them to be in numerical order, so “203.14” can appear earlier in the book than “203.12”, but having the page number as part of the number makes it easier for the indexer to keep track of what is going on.
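Because the number is built from the page and a position on it (in the Word procedure further down, the line shown in the status bar), it is safest to handle unique numbers as text rather than as decimal values. The hypothetical helpers below build a number from a page and line and report any number used more than once. Comparing as strings matters: as floating-point values, 203.1 and 203.10 would be equal, whereas as tags they are different.

```python
from collections import Counter


def unique_number(page: int, line: int) -> str:
    """Build a unique number such as "203.14" from a page and line."""
    return f"{page}.{line}"


def duplicates(numbers):
    """Return the numbers that occur more than once, compared as text."""
    return sorted(n for n, count in Counter(numbers).items() if count > 1)


tags = [unique_number(203, 14), unique_number(203, 1), unique_number(203, 10)]
print(tags)                               # ['203.14', '203.1', '203.10']
print(duplicates(tags))                   # []
print(float("203.1") == float("203.10"))  # True: as floats they would collide
```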

Unique numbers and ranges

A unique number can define a point location in the book text, as described above, or a range of text. To define a range, two tags must be inserted: at the start of the text range the number must have a suffix of “B”, and at the end of the text range it must have a prefix of “F”. The range is still referred to in the index by the number alone.

So “215.12B” and “F215.12” define a range of text in the book, which is referred to in the index by the unique number “215.12”.

Note: It does not matter that these are page numbers, or that the pagination may differ from that of the final book. The page number is only used as a way of choosing the unique number. It does not appear in the index that the reader sees: the locational information is given by the position of the tag in the text, not by the unique number itself.
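The rules for range tags can be captured in a small parser. This is a sketch under the assumption that tags are written exactly as in the examples here (uppercase B and F, no spaces); the `parse_tag` name and the role labels are illustrative.

```python
import re

# B may only be a suffix and F only a prefix; never both on one tag.
TAG = re.compile(r"(?P<f>F?)(?P<number>\d+\.\d+)(?P<b>B?)")


def parse_tag(tag: str):
    """Return (unique number, role), or raise ValueError for a malformed tag."""
    m = TAG.fullmatch(tag)
    if m is None or (m["f"] and m["b"]):
        raise ValueError(f"malformed tag: {tag!r}")
    if m["b"]:
        return m["number"], "range start"
    if m["f"]:
        return m["number"], "range end"
    return m["number"], "point"


print(parse_tag("215.12B"))  # ('215.12', 'range start')
print(parse_tag("F215.12"))  # ('215.12', 'range end')
print(parse_tag("203.14"))   # ('203.14', 'point')
```

Tags such as "B205.12" or "205.12F" fail the pattern and raise `ValueError`, so a misplaced letter cannot silently become a point locator.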

Procedure for Embedding Unique numbers in Word

The things which are particularly easy to get wrong are:

  • Using a locator in the index which does not appear in the text – particularly by mistyping the locator in your indexing program
  • Having mismatched range locators – 205.12B without F205.12
  • Wrong range locators – B205.12 or 205.12F
  • Using a locator number in two places in the text so it is not unique
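Each of these mistakes can be caught mechanically by comparing the tags in the text with the locators in the index. The following is a hedged sketch, not a description of any existing tool: the hypothetical `check` function takes the tags in document order and the locators used in the index, and reports the four kinds of error listed above.

```python
import re
from collections import Counter

TAG = re.compile(r"(F?)(\d+\.\d+)(B?)")


def check(text_tags, index_locators):
    """Return a list of problems found in the text tags and index locators."""
    problems = []
    uses = {}  # unique number -> Counter of roles seen in the text
    for tag in text_tags:
        m = TAG.fullmatch(tag)
        if m is None or (m[1] and m[3]):
            problems.append(f"wrong range locator: {tag}")  # e.g. B205.12, 205.12F
            continue
        prefix, number, suffix = m.groups()
        role = "B" if suffix else "F" if prefix else "point"
        uses.setdefault(number, Counter())[role] += 1
    for number, roles in sorted(uses.items()):
        if roles["B"] != roles["F"]:
            problems.append(f"mismatched range: {number}")  # e.g. 205.12B without F205.12
        elif roles["point"] + roles["B"] > 1:
            problems.append(f"not unique: {number}")  # number used in two places
    for locator in sorted(set(index_locators) - set(uses)):
        problems.append(f"index locator not in text: {locator}")  # e.g. a typo
    return problems


tags = ["203.14", "205.12B", "B206.3", "207.1B", "F207.1", "203.14"]
for problem in check(tags, ["203.14", "207.1", "203.41"]):
    print(problem)
```

With the sample data this reports the malformed B206.3, the repeated 203.14, the unclosed range 205.12, and the mistyped index locator 203.41.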

My procedure for tagging a Word document for CUP-XML is:

  • Set the page numbering at the start of each chapter; for example, at the start of Chapter 1:
    • Insert > Page Numbers (double-click on Insert if the option is missing)
    • Format
    • Start at > 101
    • OK
    • Close
  • Select a point in the document
  • Type 99.99
  • Select the typed text
  • Highlight it red, using the highlight button on the toolbar
  • Select the text and cut it
  • Position the cursor at the start of the first paragraph of the text
  • Paste the highlighted text
  • By overtyping, change the number in the tag to the page and line shown in the status bar, followed by the suffix B (for example, 203.14B for page 203, line 14)
  • Copy the tag
  • Paste it at the end of the paragraph and change the suffix B to a prefix F (for example, F203.14)
  • Think of a heading and enter it, with the unique number alone (no B or F), into your indexing software
  • Consider whether the heading should extend to the next paragraph too; if so, select the closing tag (with the prefix F) and drag it to the end of the next paragraph
  • Repeat – always entering tags in pairs and immediately into the indexing software
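In outline, each pass through the steps above inserts one opening tag and one closing tag built from the same number. The sketch below models the document as a list of paragraph strings; the `tag_range` helper is hypothetical, and the square brackets only make the tags visible here, standing in for red highlighting (they are not part of the tag).

```python
def tag_range(paragraphs, first, last, page, line):
    """Wrap paragraphs[first..last] in a B/F pair and return the unique number.

    page and line are the status-bar reading at the start of paragraphs[first].
    """
    number = f"{page}.{line}"
    paragraphs[first] = f"[{number}B]" + paragraphs[first]  # opening tag
    paragraphs[last] = paragraphs[last] + f"[F{number}]"    # closing tag
    return number  # the locator to enter in the indexing software


paras = ["Dams store water.", "Turbines generate power.", "Costs vary."]
locator = tag_range(paras, 0, 1, page=203, line=14)
print(locator)  # 203.14
print(paras)    # ['[203.14B]Dams store water.', 'Turbines generate power.[F203.14]', 'Costs vary.']
```

Passing `last=first` tags a single paragraph; `last=first + 1` corresponds to dragging the closing tag to the end of the next paragraph.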

Note that closing tags can be moved around by dragging. Opening tags should not be moved this way: the number records the page and line where the opening tag was placed, so if an opening tag is moved, its number, and the number in the matching closing tag, should be changed to match the new position. If they are not changed, there is a danger of using the same number more than once.

Realistically, most of the tags you enter need to be ranges. There are occasions where a point locator will be appropriate, such as a name for a scholarly citation, but generally ranges should be used, and are expected, to get the most benefit for CUP out of the CUP-XML process.

With care, this procedure is quite practical. If, however, you feel that getting a freelance indexer to do the indexing for you will free you up to do the things you are good at, there is a list on the left of this page of indexers adept at dealing with CUP-XML.

For myself, I tend to automate things to make them simpler, and so, for professional indexers, I have incorporated CUP unique numbering into the latest version of WordEmbed. WordEmbed will, by clicking and selecting:

  • automatically assign unique numbers
  • put the number in the clipboard for pasting to indexing software, avoiding mistyping
  • ensure matching, correct pairings for ranges
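The first of these features is easy to picture. Purely as an illustration, and not as a description of how WordEmbed actually works, a tool could pick the next number on the current page that is not yet in use and hand back the opening tag, closing tag and plain locator together, so the locator never has to be retyped. This hypothetical `next_range` uses a sequence number after the decimal point rather than the line number; that still satisfies the only hard rule, which is uniqueness.

```python
def next_range(page, used):
    """Return (opening tag, closing tag, locator) for the next free number on a page.

    used is a set of unique numbers already placed in the text; it is updated.
    """
    n = 1
    while f"{page}.{n}" in used:
        n += 1
    number = f"{page}.{n}"
    used.add(number)
    return f"{number}B", f"F{number}", number


used = {"203.1", "203.2"}
print(next_range(203, used))  # ('203.3B', 'F203.3', '203.3')
print(next_range(203, used))  # ('203.4B', 'F203.4', '203.4')
```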

WordEmbed is available here: WordEmbed

Disclaimer: This page is produced on a best-efforts basis to help authors and indexers understand the CUP-XML process. This page is not published by CUP and has no official status with them.


