Embedding files into Open XML documents using C#

Ever since Microsoft released version 2.0 of the Open XML SDK (Software Development Kit), there has been a lot of hype around it. It opens up several exciting opportunities for developers, particularly with the features available in the second version. Anyone who has worked with XML will probably be familiar with Open XML too, but just to recap: Open XML is an open standard (ECMA-376) that defines a set of XML schemas for representing word processing documents, spreadsheets, charts, presentations and so on. Microsoft Office Word, Excel, PowerPoint etc. have used Open XML as their default file format since the 2007 release, and lately more and more developers have started exploring it, because the SDK lets them manipulate Open XML documents using relatively familiar technologies like ZIP and XML.

The Open XML SDK can be downloaded from Microsoft’s website. It consists of an API that reduces several common tasks related to Open XML packages to a few lines of code. However, there are still certain things that are not so straightforward. One of them is embedding non-XML files within Open XML documents. It is not uncommon for end users to want to embed a PDF into a Word document, for example. This is quite possible and easy; it just requires a bit of know-how.

Brian Jones has already written a couple of excellent articles on embedding files into Open XML documents, including why PDF and other files need to be converted before they can be inserted into Word or any other Open XML format. I am not going to repeat them here. You can instead read his articles:

Embedding an Open XML File in another Open XML File

Embedding Any File Type, Like PDF, in an Open XML File

The gist of the approach in the second article, which shows how to embed any file type (like PDF) into an Open XML document, is:

  • Using the OLE server application, create a storage object with the IStorage interface and an image of the embedded object.
  • Add an image part to the document and feed it with the data from the generated image.
  • Add an embedded object part to the document and feed it with the data from the IStorage object.
  • Create a paragraph that contains the embedded object and insert it into the Open XML document.
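The two "add a part and feed it" steps happen entirely inside the Open XML SDK. The following is only a rough sketch of those steps, assuming the .bin and .emf files already exist on disk; the class name EmbedPartsSketch and the method AddParts are hypothetical and not taken from Brian's article or the attached helper. Building the paragraph that references the two relationship ids is not shown.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper, for illustration only.
public static class EmbedPartsSketch
{
    // Adds the EMF preview image and the OLE storage (.bin) to the main
    // document part and hands back the relationship id of each new part.
    public static void AddParts(string docxPath, string emfPath, string binPath,
                                out string imageRelId, out string objectRelId)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, true))
        {
            MainDocumentPart main = doc.MainDocumentPart;
            if (main == null)
                throw new InvalidOperationException("Document has no main part.");

            // Image part: the picture shown in place of the embedded object.
            ImagePart imagePart = main.AddImagePart(ImagePartType.Emf);
            using (FileStream emf = File.OpenRead(emfPath))
                imagePart.FeedData(emf);

            // Embedded object part: the OLE storage produced by the OLE server.
            EmbeddedObjectPart objectPart = main.AddEmbeddedObjectPart(
                "application/vnd.openxmlformats-officedocument.oleObject");
            using (FileStream bin = File.OpenRead(binPath))
                objectPart.FeedData(bin);

            imageRelId = main.GetIdOfPart(imagePart);
            objectRelId = main.GetIdOfPart(objectPart);
        }
    }
}
```

The two relationship ids are what the final paragraph markup has to point at, which is why the sketch returns them rather than the parts themselves.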

Basically, in this approach, both the IStorage object and the image that represents the embedded object are created by the OLE server associated with the PDF application. This is accomplished using C++ code.

Now, what I am going to show you here is how to accomplish the same using C# code. There are several reasons for wanting to do this. First of all, even if you know C++, those who come after you may not, and the code will become a maintenance nightmare.

So, let us try to convert the PackageOleObject function from C++ to C#. When we convert it to C#, this is what we get:


// Create the compound file (the .bin) that will hold the embedded object.
var result = OLE32.StgCreateStorageEx(
    oleOutputFileName,
    Convert.ToInt32(OLE32.STGM.STGM_READWRITE | OLE32.STGM.STGM_SHARE_EXCLUSIVE |
                    OLE32.STGM.STGM_CREATE | OLE32.STGM.STGM_TRANSACTED),
    Convert.ToInt32(OLE32.STGFMT.STGFMT_STORAGE),
    0, IntPtr.Zero, IntPtr.Zero,
    ref OLE32.IID_IStorage, out storage);

resultString.AppendLine("CreateStorageEx Result: " + result.ToString("X"));
if (result != 0)
    return resultString.ToString();

// Let the OLE server registered for the input file build the object in that storage.
var CLSID_NULL = Guid.Empty;
result = OLE32.OleCreateFromFile(
    ref CLSID_NULL, newInput, ref OLE32.IID_IOleObject,
    (int)OLE32.OLERENDER.OLERENDER_NONE,
    IntPtr.Zero, null, storage, out pOle);

resultString.AppendLine("OleCreateFromFile Result: " + result.ToString("X"));
if (result != 0)
    return resultString.ToString();

// Put the object into the running state so that it can render itself.
result = OLE32.OleRun(pOle);

resultString.AppendLine("OleRun Result: " + result.ToString("X"));
if (result != 0)
    return resultString.ToString();

// Get the object's IDataObject interface.
IntPtr unknownFromOle = Marshal.GetIUnknownForObject(pOle);
IntPtr unknownForDataObj;

result = Marshal.QueryInterface(unknownFromOle, ref OLE32.IID_IDataObject, out unknownForDataObj);
Marshal.Release(unknownFromOle);

resultString.AppendLine("QueryInterface Result: " + result.ToString("X"));
if (result != 0)
    return resultString.ToString();

var pdo = Marshal.GetObjectForIUnknown(unknownForDataObj) as System.Runtime.InteropServices.ComTypes.IDataObject;
Marshal.Release(unknownForDataObj); // the runtime wrapper now holds its own reference

// Request the object's content rendered as an enhanced metafile.
var fetc = new System.Runtime.InteropServices.ComTypes.FORMATETC();
fetc.cfFormat = (short)OLE32.CLIPFORMAT.CF_ENHMETAFILE;
fetc.dwAspect = System.Runtime.InteropServices.ComTypes.DVASPECT.DVASPECT_CONTENT;
fetc.lindex = -1;
fetc.ptd = IntPtr.Zero;
fetc.tymed = System.Runtime.InteropServices.ComTypes.TYMED.TYMED_ENHMF;
System.Runtime.InteropServices.ComTypes.STGMEDIUM stgm; // filled in by GetData
pdo.GetData(ref fetc, out stgm);

// Copy the metafile to disk; a zero handle means the copy failed.
var hemf = GDI32.CopyEnhMetaFile(stgm.unionmember, emfOutputFileName);

What we are doing here is essentially creating a .bin file and an .emf file. Instead of using C++, we use P/Invoke in C# to call these methods from ole32.dll. The .bin file holds the content, and the .emf file is the image that will show up for the embedded object in the Open XML document.

StgCreateStorageEx creates a .bin file at the given path, and OleCreateFromFile writes the content into it (both are in the OLE32 class, which uses P/Invoke to call the respective functions in ole32.dll). Once the object is created, we get its rendered data using the GetData method and create an enhanced metafile from it using CopyEnhMetaFile, which is in gdi32.dll.
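The listing refers to the OLE32 and GDI32 wrapper classes without showing them. The attached helper may declare them differently; what follows is only a minimal sketch of declarations that would fit the calls used in the listing. Treating the storage as a raw IntPtr (which the caller would have to release with Marshal.Release) and accepting only a null client site are assumptions made here to keep the declarations short. The constants and interface ids are the values from the Windows SDK headers.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical reconstruction, not the author's attached class.
internal static class OLE32
{
    [Flags]
    public enum STGM : int
    {
        STGM_READWRITE       = 0x00000002,
        STGM_SHARE_EXCLUSIVE = 0x00000010,
        STGM_CREATE          = 0x00001000,
        STGM_TRANSACTED      = 0x00010000
    }

    public enum STGFMT : int { STGFMT_STORAGE = 0 }
    public enum OLERENDER : int { OLERENDER_NONE = 0 }
    public enum CLIPFORMAT : short { CF_ENHMETAFILE = 14 }

    // Static fields (not readonly) so that they can be passed by ref.
    public static Guid IID_IStorage    = new Guid("0000000B-0000-0000-C000-000000000046");
    public static Guid IID_IOleObject  = new Guid("00000112-0000-0000-C000-000000000046");
    public static Guid IID_IDataObject = new Guid("0000010E-0000-0000-C000-000000000046");

    // Assumption: the IStorage pointer is kept as a raw IntPtr.
    [DllImport("ole32.dll", CharSet = CharSet.Unicode)]
    public static extern int StgCreateStorageEx(
        string pwcsName, int grfMode, int stgfmt, int grfAttrs,
        IntPtr pStgOptions, IntPtr pSecurityDescriptor,
        ref Guid riid, out IntPtr ppObjectOpen);

    // Assumption: pClientSite is always passed as null.
    [DllImport("ole32.dll", CharSet = CharSet.Unicode)]
    public static extern int OleCreateFromFile(
        ref Guid rclsid, string lpszFileName, ref Guid riid, int renderopt,
        IntPtr lpFormatEtc,
        [MarshalAs(UnmanagedType.IUnknown)] object pClientSite,
        IntPtr pStg,
        [MarshalAs(UnmanagedType.IUnknown)] out object ppvObj);

    [DllImport("ole32.dll")]
    public static extern int OleRun([MarshalAs(UnmanagedType.IUnknown)] object pUnknown);
}

internal static class GDI32
{
    // Resolves to CopyEnhMetaFileW because of CharSet.Unicode.
    [DllImport("gdi32.dll", CharSet = CharSet.Unicode)]
    public static extern IntPtr CopyEnhMetaFile(IntPtr hemfSrc, string lpszFile);

    [DllImport("gdi32.dll")]
    public static extern bool DeleteEnhMetaFile(IntPtr hemf);
}
```

With these declarations, storage in the listing would be an IntPtr and pOle an object. DeleteEnhMetaFile is included because the handle returned by CopyEnhMetaFile should be freed once the file has been written.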

I’ve attached the export helper class, which uses the technique described above. You can use it as shown below:

[TestMethod]
public void TestMethod1()
{
    string pdfFile = @"C:\temp\po.pdf";
    string binFile = @"C:\temp\po.bin";
    string emfFile = @"C:\temp\po.emf";
    // ExportOleFile returns an empty string on success, or the logged results on failure.
    var error = Export.ExportHelper.ExportOleFile(pdfFile, binFile, emfFile);
    Assert.IsTrue(string.IsNullOrEmpty(error), "Didn't expect any errors");
}

A couple of things to note here:

  • If you are planning to deploy to a 64-bit OS, you may want to set the target platform to x86.
  • On a 64-bit OS, this seems to work only with Adobe version 9. Higher versions fail with error code 0x8000FFFF, which translates to "Catastrophic failure".
  • Depending on usage, you may not want to uninitialize and reinitialize OLE as apartment-threaded. This is required only if OLE is initialized somewhere else in your application without using apartment threading.
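On the last point, one alternative to uninitializing and reinitializing OLE is to run the export on a dedicated single-threaded apartment (STA) thread, so that it does not matter how the rest of the application initialized COM. This is a different technique from the one in the attached helper, and whether it removes the need for reinitialization in your application is an assumption you would have to verify. StaRunner is a hypothetical name, and SetApartmentState is supported on Windows only.

```csharp
using System;
using System.Threading;

// Hypothetical helper, for illustration only.
public static class StaRunner
{
    // Runs work on a new STA thread, waits for it to finish and returns its result.
    // An exception thrown by work is rethrown on the calling thread, wrapped.
    public static T Run<T>(Func<T> work)
    {
        T result = default(T);
        Exception error = null;

        var thread = new Thread(() =>
        {
            try { result = work(); }
            catch (Exception ex) { error = ex; }
        });

        // The apartment state has to be set before the thread is started.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();

        if (error != null)
            throw new InvalidOperationException("Work on the STA thread failed.", error);
        return result;
    }
}
```

The export call from the test would then become StaRunner.Run(() => Export.ExportHelper.ExportOleFile(pdfFile, binFile, emfFile)).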
