Word to S1000D XML

Conversion from Microsoft Word is one of the tasks we are asked about most often. The success of the conversion can depend on how consistently defined styles are used in the Word document, but conversion may still be possible without them.

Our specialist consultant has over 20 years' experience in this field, and his conversion scripts are written to minimise the costly and time-consuming manual post-conversion clean-up that is often required.

Word files are first converted to well-formed XML. This XML can then be split into separate files that match the section levels in the Word file (for example, section 1.1, section 1.2 and so on). The split files can be renamed according to Data Module Code (DMC) naming conventions, if the details for these are available. Once any required manual post-conversion clean-up is complete, the resulting files will be valid XML that conforms to the S1000D standard.
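The splitting and renaming step described above can be sketched roughly as follows. This is an illustration only, not the actual conversion scripts: the intermediate element name `section`, its `number` attribute, and the DMC-style file name are all placeholders assumed for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical intermediate XML. The real element and attribute names
# depend on the conversion scripts and are assumed here for illustration.
SAMPLE = """<document>
  <section number="1.1"><title>Scope</title><para>Text A</para></section>
  <section number="1.2"><title>Safety</title><para>Text B</para></section>
</document>"""


def split_sections(xml_text, dmc_map=None):
    """Return {file name: XML string}, one entry per top-level section."""
    dmc_map = dmc_map or {}
    root = ET.fromstring(xml_text)
    outputs = {}
    for section in root.findall("section"):
        number = section.get("number")
        # Use the supplied DMC-based name if there is one,
        # otherwise fall back to a name built from the section number.
        name = dmc_map.get(number, f"section-{number}") + ".xml"
        section.tail = None  # drop whitespace that followed the element
        outputs[name] = ET.tostring(section, encoding="unicode")
    return outputs


# The DMC-style name below is a made-up placeholder, not a real code.
files = split_sections(SAMPLE, {"1.1": "DMC-PLACEHOLDER-0001"})
for name in sorted(files):
    print(name)
```

Sections without an entry in the mapping keep a name derived from their section number, which mirrors the case where DMC details have not been supplied.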

Each graphic in the Word document produces a graphic element in the resultant XML file and a matching entity declaration. As with the document text, if DMC naming conventions for the images have been supplied, these can be incorporated. Graphics can be extracted from the Word file, but they will be at a resolution of 72 dpi, which will affect their quality and suitability for press. Alternatively, graphics can be supplied separately, together with a mapping of source file names to target file names. The graphics can then be renamed accordingly during the conversion process.
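As a rough sketch of how a source-to-target mapping might drive the renaming and the matching entity declarations: the file names and the `ICN-PLACEHOLDER` identifiers below are invented for illustration, and the declaration format is a generic XML unparsed-entity declaration rather than the exact output of the conversion scripts.

```python
from pathlib import PurePath

# Hypothetical mapping from source graphic names to target names.
GRAPHIC_MAP = {
    "image1.png": "ICN-PLACEHOLDER-0001.png",
    "image2.png": "ICN-PLACEHOLDER-0002.png",
}


def rename_plan(source_names, mapping):
    """Pair each source name with its target; unmapped names are kept unchanged."""
    return [(name, mapping.get(name, name)) for name in source_names]


def entity_declaration(target_name):
    """Build an unparsed-entity declaration for one graphic file."""
    path = PurePath(target_name)
    entity = path.stem                  # entity name: file name without extension
    notation = path.suffix.lstrip(".")  # notation name taken from the extension
    return f'<!ENTITY {entity} SYSTEM "{target_name}" NDATA {notation}>'


plan = rename_plan(["image1.png", "image2.png", "logo.png"], GRAPHIC_MAP)
declarations = [entity_declaration(target) for _, target in plan]

for (source, target), declaration in zip(plan, declarations):
    print(f"{source} -> {target}")
    print(declaration)
```

Keeping the rename plan as a list of pairs means the same data can be written out as the graphics mapping file supplied at the end of the conversion.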

On completion of the conversion, the client is supplied with:

  • an XML file for each component part of the original Word document(s), assigned the relevant DMC number if available
  • a file that maps the component sections in the Word file to the output file names
  • a file that details the graphics in the documents. If the source figures do not have DMC numbers, the file simply lists the figures; if the source figures do have a DMC number, the file maps the figure names to the appropriate DMC number.