Automating Office Documents with the Open XML SDK and SharePoint 2010 Word Automation Services

Microsoft has recently released the RTM version of the Open XML SDK 2.0.

The Open XML SDK is one of the hidden jewels of the Microsoft stack of utilities. It allows organizations to automate Office solutions (Word, Excel, and PowerPoint) from custom applications such as ASP.NET applications, Windows services, WCF services, or anything else built on the Microsoft .NET Framework. The SDK is a .NET Framework 3.5 based library for creating, updating, and manipulating Office documents from server-side code. You can also integrate Open XML SDK solutions with SharePoint through Web Parts, ribbons, application pages, item-level action menus, or workflows, automating document manipulation with the SharePoint features and solutions framework.

With the release of SharePoint 2010, Microsoft has introduced a new Office service: Word Automation Services. Together with Excel Services and the Open XML SDK, it opens up a range of possibilities for server-side Office document automation: not only creating documents programmatically, but also manipulating, assembling, shredding, and converting them from one format to another.

If you are planning to automate Office solutions from the server, please bookmark the blogs mentioned here. There isn't much out there if you search Google or Bing, but we have been extremely fortunate that some of the most active MSFT bloggers (Zeyad Rajabi/Brian Jones and Eric White) share numerous scenarios, videos, examples, and downloadable code for Open XML and Office automation solutions.

I was lucky enough to be aware of Zeyad Rajabi's "SharePoint 2010 Based Document Assembly and Manipulation using Word Automation Services and Open XML" session at SPC 2009, and here are some highlights from my session notes.

Open XML File Format –

  • Open XML is an open standard, published as ECMA-376 and ISO/IEC 29500
  • Defines a set of XML schemas for representing spreadsheets, charts, presentations, and word processing documents.
  • Office Word 2007 and later, Excel 2007 and later, and PowerPoint 2007 and later all use Open XML as the default file format.
  • Users see a single file (.docx, .xlsx, .pptx); developers see a ZIP package containing XML parts
  • Gives developers access to Office files without the Office applications (no Office client needs to be installed)
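
The "single file versus ZIP package" point can be seen without Office or the SDK installed. Below is a minimal Python sketch using only the standard library (it is not the Open XML SDK, which is .NET). It assembles a bare-bones .docx in memory and lists its parts. The helper names `build_minimal_docx` and `list_parts` are made up for this illustration, and the three parts shown are my assumption of a practical minimum; real documents carry many more parts (styles, settings, and so on).

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Content types: tells consumers what kind of data each part holds.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Package relationships: points at the main document part.
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)

# Main document part: one paragraph with one run of text.
DOCUMENT_TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body><w:p><w:r><w:t>{text}</w:t></w:r></w:p></w:body>'
    '</w:document>'
)


def build_minimal_docx(text):
    """Return the bytes of a small .docx package holding one paragraph."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        # Escape the text so characters like & and < stay valid XML.
        package.writestr("word/document.xml",
                         DOCUMENT_TEMPLATE.format(text=escape(text)))
    return buffer.getvalue()


def list_parts(docx_bytes):
    """Return the part names stored inside a .docx package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return package.namelist()


if __name__ == "__main__":
    data = build_minimal_docx("Hello, Open XML")
    for name in list_parts(data):
        print(name)
```

Passing the bytes of any real .docx, .xlsx, or .pptx file to `list_parts` shows the same kind of structure, just with more parts.
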

Open XML SDK –

  • Office Files Manipulation Tool
  • Allows you to create and modify Open XML documents
  • Allows manipulating file contents while users are co-authoring
  • The SDK supports both Office 2007 SP2 and Office 2010 file formats
  • Based on Microsoft .NET Framework 3.5 SP1 (C# and VB); compatible with LINQ and LINQ to XML
  • The SDK does NOT perform file conversions to other formats such as PDF or XPS (use Word Automation Services)
  • The SDK does NOT perform layout or recalculation tasks (use Word Automation Services and Excel Services)
  • Open XML SDK 2.0 RTM is the latest release. It contains:
    • A .NET managed class library that provides capabilities for reading, writing, modifying, and validating Open XML documents.
    • A productivity tool that can diff Open XML documents, generate C# code, and explore and read about the class library and the standard.
  • Scenarios Possible –
    • Push data into Office files from a database or any other data source accessible from the .NET Framework (in other words, data can be loaded from virtually anywhere)
    • Pull data from Office files – query, extract, etc.
    • Manipulate Office files – add, update, swap, or append content, etc.
    • Validate Office files – make sure Open XML files work with the Office client
    • Can be integrated with SharePoint – workflow based, custom action based, Web Part based, or ribbon based, by invoking the Open XML code
    • Can create, download, manipulate, and upload the documents in the SharePoint document library – Use the SharePoint Object Model (if Open XML code runs on the SharePoint Server) or SharePoint Web Services (if Open XML code runs on the non-SharePoint servers)
  • The SDK is very powerful at manipulating Office documents, but there are still tasks that require application logic
    • Use Word Automation Services for: repagination, conversion to other document formats such as PDF, or updating the table of contents, fields, and other dynamic content in documents
    • Use Excel Services for: Calculation, rendering complex charts/pivot tables
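
To make the "pull" and "manipulate" scenarios above concrete, here is a rough Python sketch that works directly on the XML parts with the standard library. This is not the Open XML SDK and not how the SDK does it; it only illustrates that the content is plain XML that can be queried and rewritten on a server. The function names are hypothetical, `sample_package` builds a stand-in package containing only the main document part (enough for this demonstration, not a complete .docx), and the replacement assumes the placeholder sits inside a single text run, which real documents do not guarantee.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
MAIN_PART = "word/document.xml"


def sample_package(paragraphs):
    """Build a stand-in package with only the main document part."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(p)}</w:t></w:r></w:p>" for p in paragraphs
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(MAIN_PART, xml)
    return buffer.getvalue()


def extract_paragraphs(docx_bytes):
    """Pull: return the text of each paragraph in the main document part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read(MAIN_PART))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is the concatenation of its w:t runs.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs


def replace_text(docx_bytes, old, new):
    """Manipulate: return a new package with `old` replaced in each text run."""
    ET.register_namespace("w", W_NS)  # keep the familiar w: prefix on output
    source = zipfile.ZipFile(io.BytesIO(docx_bytes))
    target_buffer = io.BytesIO()
    with source, zipfile.ZipFile(target_buffer, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            payload = source.read(item.filename)
            if item.filename == MAIN_PART:
                root = ET.fromstring(payload)
                for t in root.iter(f"{{{W_NS}}}t"):
                    if t.text:
                        t.text = t.text.replace(old, new)
                payload = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            # All other parts are copied through unchanged.
            target.writestr(item.filename, payload)
    return target_buffer.getvalue()


if __name__ == "__main__":
    original = sample_package(["Dear [Customer],", "Thank you for your order."])
    updated = replace_text(original, "[Customer]", "Contoso Ltd")
    print(extract_paragraphs(updated))
```

Real documents declare many more namespaces and often split visible text across several runs, so a production solution would be better served by the SDK's typed classes than by this kind of raw XML handling.
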

Office Services – Word Automation Services

  • Office Files Conversion Tool
  • Server-ready version of Word with a slimmed-down version of the Word API; reads and writes any format understood by the Word client
  • Requires SharePoint 2010, Standard or Enterprise CAL
  • A shared service that provides unattended, server-side conversion of documents into other formats, as well as some Word document field manipulation capabilities
  • Convert Word documents to PDF/XPS, which can then be spooled to a printer for automated printing. Note that the service itself does not include capabilities for printing documents
  • Word Automation Services provides:
    • Layout
    • Export to fixed format
    • File conversion – supports pre-Word 2007 formats, supports PDF/XPS format
    • Complex field calculation
    • Updating Table of contents

Zeyad Rajabi also delivered the same presentation at the PDC.

Reading Zeyad Rajabi's blog will inspire many different Office automation scenarios, but here are some of the business scenarios that are possible with the Open XML SDK and SharePoint 2010 Office Services:

  • Server Side Document Assembly and Document Shredding – Merge documents together with OpenXML SDK.
  • Convert thousands of Word documents to PDF in batch using Word Automation Services
  • Assemble PowerPoint, Word, and Excel documents into one master Word document
  • Generate Word documents based on data from any source supported by the .NET Framework
  • Create and manipulate Word documents stored in a SharePoint document library

I am currently working on a project where I need to manipulate Word documents stored in a SharePoint document library with a program based on the Open XML SDK and update them with data from an Azure database, to be ultimately consumed by an ASP.NET application hosted in the Azure environment via .NET or WCF web services. As you can see, the Open XML SDK opens up a wide range of Office automation scenarios, and together with Word Automation Services and Excel Services, the sky is the limit. Hopefully this blog entry makes you aware of what the Open XML SDK and SharePoint Office Services are capable of.
