Structured Storage

As software has become increasingly sophisticated, it has also become more demanding of the services offered by the operating system. Operating systems, in turn, have become resource managers that have to ration the available resources among the various applications. In the area of persistent storage, most operating systems are willing to provide an application with controlled access to the hard disk. Normally, the application receives a file handle through which it can read from and write to certain areas of the disk. In a typical file system, each file is treated as a raw sequence of bytes, with no meaning other than that given to it by the application that created the file. Although the bytes that comprise a file might actually be fragmented into small blocks scattered throughout the drive, the file system is responsible for understanding the layout of the data and presenting the application with a sequential view of it.

Structured storage is a facility that takes a unique approach to saving data. It enables a component (or components) to save data in a single file in a standardized, structured format. In the past, most sophisticated applications developed complex proprietary formats for saving the user's data. While workable, this approach was limited in two significant respects. First, only the application itself had any idea what data was stored in a file. Other applications, such as the system shell or other file viewers, had no information about the file other than its name, size, and other file system trivia. Second, the proprietary format made it almost impossible for other applications to embed their data in the same file. For instance, if the user of a word processing application embedded a picture produced by a graphics utility in a document, and if the picture data was saved in a proprietary format, the application would have difficulty storing the embedded data. Structured storage addresses these limitations by offering a standard mechanism for applications to use when saving data.

Structured storage is sometimes called "a file system within a file" because it treats a single file as if it were capable of storing directories and files. A file created using the structured storage service contains one or more storages, roughly equivalent to directories, and each storage can contain zero or more streams, roughly equivalent to files. A storage can also contain any number of substorages.

Structured storage offers numerous advantages. Chief among them is that, instead of developing a proprietary format for saving data, an application can use structured storage to arrive at a more standardized solution. For example, a word processing application might save a document by creating distinct storages and streams for summary information, embedded objects, macros, and the user's data. The storage for embedded objects might contain substorages, one for each application whose data has been embedded in the document. Each of those substorages might have one or more streams, each containing the data of a single embedded object. This hypothetical example is illustrated in Figure 10-2.
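The layout described above can be pictured with a small in-memory model. The sketch below is a hypothetical illustration in portable C++, not the structured storage API: the Stream and Storage types, the MakeStream, CountStreams, and BuildSampleDocument helpers, and every element name (such as "EmbeddedObjects" or "Picture1") are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for the concepts in the text; these are NOT the COM
// IStorage/IStream interfaces. A stream is a named run of bytes.
struct Stream {
    std::string name;
    std::vector<unsigned char> bytes;
};

// A storage holds streams and nested substorages, much as a directory
// holds files and subdirectories.
struct Storage {
    std::string name;
    std::vector<Stream> streams;
    std::vector<Storage> substorages;
};

// Creates an empty stream with the given name.
Stream MakeStream(const std::string& name) {
    Stream stream;
    stream.name = name;
    return stream;
}

// Counts every stream in a storage and in all of its substorages.
std::size_t CountStreams(const Storage& storage) {
    std::size_t count = storage.streams.size();
    for (const Storage& child : storage.substorages) {
        count += CountStreams(child);
    }
    return count;
}

// Builds the hypothetical word processing document from the text.
// All element names are assumptions chosen for illustration.
Storage BuildSampleDocument() {
    // One substorage per embedding application, one stream per object.
    Storage graphics;
    graphics.name = "GraphicsUtility";
    graphics.streams.push_back(MakeStream("Picture1"));
    graphics.streams.push_back(MakeStream("Picture2"));

    Storage spreadsheet;
    spreadsheet.name = "Spreadsheet";
    spreadsheet.streams.push_back(MakeStream("Table1"));

    Storage embedded;
    embedded.name = "EmbeddedObjects";
    embedded.substorages.push_back(graphics);
    embedded.substorages.push_back(spreadsheet);

    Storage root;
    root.name = "Root";
    root.streams.push_back(MakeStream("SummaryInformation"));
    root.streams.push_back(MakeStream("Macros"));
    root.streams.push_back(MakeStream("UserData"));
    root.substorages.push_back(embedded);
    return root;
}
```

Calling BuildSampleDocument yields a root storage with three streams of its own and one substorage for embedded objects, which in turn holds one substorage per embedding application. The model is built bottom-up so that each substorage is complete before it is copied into its parent.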

The structured storage service is accessible through two interfaces: IStorage and IStream. IStorage offers methods that can be executed on a storage object, and IStream offers methods that can be executed on a stream object. Microsoft provides implementations of these interfaces as part of the structured storage service. As you'll see, the structured storage service is so easy to use that there's almost no excuse for not using it to read and write data.

Figure 10-2. A single file containing different types of data saved in a structured format.

The IStorage and IStream Interfaces

Like a directory in the file system, a storage object has a certain number of standard operations that it can perform. To get a general idea of the functionality provided by a storage object, consider the standard command-line utilities that are often executed on a directory. These utilities include md (make directory), cd (change directory), deltree (delete a directory tree), xcopy (copy a file or directory), and move (move a file or directory). All of these operations can be executed as methods of the IStorage interface and are described in the following table.

IStorage Method	Description
CreateStream	Creates and opens a stream object with the specified name contained in this storage object
OpenStream	Opens an existing stream object within this storage object by using the specified access permissions
CreateStorage	Creates and opens a new storage object within this storage object
OpenStorage	Opens an existing storage object with the specified name according to the specified access mode
CopyTo	Copies the entire contents of this open storage object into another storage object
MoveElementTo	Copies or moves a substorage or stream from this storage object to another storage object
Commit	Reflects changes for a transacted storage object to the parent level
Revert	Discards all changes made to the storage object since the last commit operation
EnumElements	Returns an enumerator object that can be used to enumerate the storage and stream objects contained in this storage object
DestroyElement	Removes the specified storage or stream from this storage object
RenameElement	Renames the specified storage or stream in this storage object
SetElementTimes	Sets the modification, access, and creation times of the indicated storage element, if supported by the underlying file system
SetClass	Assigns the specified CLSID to this storage object
SetStateBits	Stores up to 32 bits of state information in this storage object
Stat	Returns the STATSTG structure for this open storage object

To create or open a structured storage file, call the StgCreateStorageEx or StgOpenStorageEx function. Using the IStorage interface pointer returned by the StgCreateStorageEx function, you can create streams or additional substorages using the methods in the table above. The following code fragment creates a structured storage file named TestFile.stg and obtains a pointer to the root storage object's IStorage interface:

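IStorage* pStorage = NULL;
// Create a new compound file; STGM_CREATE replaces any existing file.
HRESULT hr = StgCreateStorageEx(L"C:\\TestFile.stg",
    STGM_DIRECT|STGM_CREATE|STGM_READWRITE|STGM_SHARE_EXCLUSIVE,
    STGFMT_STORAGE, 0, 0, 0, IID_IStorage, (void**)&pStorage);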

This seemingly simple request to create a new file actually creates on the disk a file that already contains 1.5 KB of data. This overhead is required by the implementation of the structured storage service.

Using the IStorage interface pointer to create a stream is straightforward. The following code fragment creates a stream named MyDataStream and retrieves a pointer to the stream object's IStream interface:

IStream* pStream;
hr = pStorage->CreateStream(L"MyDataStream", 
    STGM_DIRECT|STGM_CREATE|STGM_WRITE|STGM_SHARE_EXCLUSIVE, 
    0, 0, &pStream);

When working with standard files, developers typically use a file handle to execute read and write operations on a file. For structured storage files, you use the IStream interface to execute read and write operations on a stream within the file. IStream inherits its Read and Write methods from the ISequentialStream interface, so you call ISequentialStream::Write through the IStream pointer to write data into a stream, as shown here:

ULONG bytes_written;
char data[] = "HELLO THERE!";
// strlen excludes the terminating null, so only the 12 characters are written.
hr = pStream->Write(data, (ULONG)strlen(data), &bytes_written);
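One detail in the fragment above is easy to miss: the byte count passed to Write comes from strlen, so the terminating null character is not stored in the stream. The following portable sketch makes the difference visible. MemoryStream is a hypothetical in-memory stand-in whose Write method only mimics the argument shape of ISequentialStream::Write (it returns nothing rather than an HRESULT), and the two helper functions are likewise invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical in-memory stand-in for a stream. Its Write method mirrors
// the (buffer, byte count, bytes-written) shape of ISequentialStream::Write
// but is not the COM interface.
class MemoryStream {
public:
    // Appends cb bytes from pv and reports the count through pcbWritten,
    // which may be null if the caller does not need it.
    void Write(const void* pv, unsigned long cb, unsigned long* pcbWritten) {
        const unsigned char* first = static_cast<const unsigned char*>(pv);
        bytes_.insert(bytes_.end(), first, first + cb);
        if (pcbWritten != nullptr) {
            *pcbWritten = cb;
        }
    }

    // Returns the total number of bytes written so far.
    std::size_t Size() const { return bytes_.size(); }

private:
    std::vector<unsigned char> bytes_;
};

// Writes the text without its terminating null, as the fragment above does.
unsigned long WriteText(MemoryStream& stream, const char* text) {
    unsigned long written = 0;
    stream.Write(text, static_cast<unsigned long>(std::strlen(text)), &written);
    return written;
}

// Writes the text including its terminating null, so that a reader can
// find the end of the string without knowing the stream size.
unsigned long WriteTextWithTerminator(MemoryStream& stream, const char* text) {
    unsigned long written = 0;
    stream.Write(text, static_cast<unsigned long>(std::strlen(text) + 1),
                 &written);
    return written;
}
```

Which form is appropriate depends on how the stream will be read back. If the reader obtains the stream size first, the terminator is unnecessary; if it treats the contents as a C string, the extra byte is needed.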

The IPropertySetStorage and IPropertyStorage Interfaces

The actual data stored by an individual stream might be in a proprietary format, but it's important that the structure of a file saved using the structured storage interfaces be accessible to all applications through a standard protocol. Not only is the data saved in an organized fashion, but utilities and other applications might be able to obtain information about the data stored in the file if certain streams in the file are stored in a standardized format. For example, a file might have a summary information stream in the root storage that contains information about the file's contents. Other applications, such as the system shell, might allow the user to execute sophisticated queries based on this information, such as a request for a list of all documents written by a certain person or containing information about a certain subject.

As part of the structured storage service, Microsoft offers a property set format that you can use to store summary information in a standard way. This information can then be accessed by anyone who wants it. Microsoft Windows, for example, displays this information when the user right-clicks the icon for a file and then chooses Properties from the context menu to display a Properties dialog box, as shown in Figure 10-3. When viewing a folder as a Web page, Internet Explorer automatically displays this information for any selected file.

Figure 10-3. The Properties dialog box displaying the summary information stream for a file.

The IPropertySetStorage and IPropertyStorage interfaces encapsulate the functionality needed to create property sets, such as the summary information stream described above. The names of these interfaces can be confusing because, from the point of view of a structured storage file, a property set is written into a single stream. Therefore, IPropertySetStream and IPropertyStream might seem to be more appropriate names. However, the IPropertySetStorage and IPropertyStorage interfaces were not designed to be used only by structured storage. Other services might benefit from them as well, so conceptually these interfaces abstract the storage of properties rather than any one storage mechanism.

When you work with the property set implementation provided by the structured storage service, you obtain a pointer to the IPropertySetStorage interface by calling QueryInterface on an IStorage pointer, as shown here:

IPropertySetStorage* pPropertySetStorage;
hr = pStorage->QueryInterface(IID_IPropertySetStorage,
    (void**)&pPropertySetStorage);

Every property set is identified by a globally unique identifier (GUID) called a format identifier (FMTID). This identifier allows any application that might come across this property set to quickly determine whether it understands the contents of the property set. The two most widely used FMTIDs are FMTID_SummaryInformation and FMTID_DocSummaryInformation. The latter stores extended summary information for documents created by Microsoft Office applications. When you create a new property set, you must define its FMTID. The following code uses the IPropertySetStorage interface to create a new property set with the format identifier FMTID_SummaryInformation:

IPropertyStorage* pPropertyStorage;
hr = pPropertySetStorage->Create(FMTID_SummaryInformation,
    NULL, PROPSETFLAG_ANSI,
    STGM_CREATE|STGM_READWRITE|STGM_SHARE_EXCLUSIVE,
    &pPropertyStorage);

The methods of the IPropertySetStorage interface are described in the following table.

IPropertySetStorage Method	Description
Create	Creates a new property set
Open	Opens a previously created property set
Delete	Deletes an existing property set
Enum	Creates and retrieves a pointer to an object that can be used to enumerate property sets

The IPropertySetStorage::Create method returns an IPropertyStorage interface pointer, which you can use to work with an individual property set. The methods of the IPropertyStorage interface are described in the following table.

IPropertyStorage Method	Description
ReadMultiple	Reads property values in a property set
WriteMultiple	Writes property values in a property set
DeleteMultiple	Deletes properties in a property set
ReadPropertyNames	Gets corresponding string names for given property identifiers (PROPIDs)
WritePropertyNames	Creates or changes string names corresponding to given PROPIDs
DeletePropertyNames	Deletes string names for given PROPIDs
SetClass	Assigns a CLSID to the property set
Commit	As in IStorage::Commit, flushes or commits changes to the property storage object
Revert	When the property storage is opened in transacted mode, discards all changes since the last commit operation
Enum	Creates and gets a pointer to an enumerator for properties within this property set
Stat	Receives statistics about this property set
SetTimes	Sets modification, creation, and access times for the property set

Two structures are used in defining a property set: PROPSPEC and PROPVARIANT. The PROPSPEC structure defines the property based on a property identifier (PROPID) or a string name; properties are typically defined by a PROPID. More than a dozen PROPIDs are defined for the summary information property set; they define everything from the document's author to a thumbnail sketch of the document. Of course, an application will work only with the properties it requires.

The value of the property itself is defined in the PROPVARIANT structure, which is a variation on the VARIANT structure used to store variants. A type tag in the vt field identifies which member of the structure holds the value. For example, to store string data, you simply declare the VT_LPSTR variant type and set the pszVal member. The code fragment below stores a single property in a property storage. Notice that the string "Anna" is not saved in Unicode. Although in general all strings used by COM+ must be in Unicode, string data stored in a property set need not be. In fact, for the data stored in a summary information property set to display properly on non-Unicode systems such as Windows 98, the string data must not be stored in Unicode.

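PROPSPEC ps;
ps.ulKind = PRSPEC_PROPID;
ps.propid = PIDSI_AUTHOR;

char author[] = "Anna";    // ANSI string, not Unicode
PROPVARIANT pv;
PropVariantInit(&pv);      // start from a well-defined empty value
pv.vt = VT_LPSTR;
pv.pszVal = author;        // pszVal is a non-const LPSTR

hr = pPropertyStorage->WriteMultiple(1, &ps, &pv, 0);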
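Conceptually, a property set maps property identifiers to tagged values, and the WriteMultiple and ReadMultiple methods operate on parallel arrays of identifiers and values. The sketch below models that idea in portable C++ as an analogy only; it is not the COM API. PropertySet, PropertyValue, ValueType, MakeAnsiString, and kAuthor are hypothetical names, and the value assigned to kAuthor is merely a placeholder for the PIDSI_AUTHOR constant that real code takes from the Windows headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-ins: PropertyId plays the role of a PROPID, and
// PropertyValue plays the role of a PROPVARIANT (a type tag plus a value).
typedef std::uint32_t PropertyId;

enum class ValueType { Empty, Int32, AnsiString };

struct PropertyValue {
    ValueType type;
    std::int32_t intValue;
    std::string stringValue;
    PropertyValue() : type(ValueType::Empty), intValue(0) {}
};

// Placeholder identifier for the author property (an assumption made for
// this sketch; real code uses PIDSI_AUTHOR).
const PropertyId kAuthor = 4;

// Builds a value tagged as an ANSI string, analogous to VT_LPSTR.
PropertyValue MakeAnsiString(const std::string& text) {
    PropertyValue value;
    value.type = ValueType::AnsiString;
    value.stringValue = text;
    return value;
}

class PropertySet {
public:
    // Stores values[i] under ids[i], replacing any existing value.
    void WriteMultiple(std::size_t count, const PropertyId* ids,
                       const PropertyValue* values) {
        for (std::size_t i = 0; i < count; ++i) {
            properties_[ids[i]] = values[i];
        }
    }

    // Fills values[i] for each ids[i]; a missing property comes back
    // tagged Empty. Returns true only if every property was found.
    bool ReadMultiple(std::size_t count, const PropertyId* ids,
                      PropertyValue* values) const {
        bool allFound = true;
        for (std::size_t i = 0; i < count; ++i) {
            std::map<PropertyId, PropertyValue>::const_iterator it =
                properties_.find(ids[i]);
            if (it != properties_.end()) {
                values[i] = it->second;
            } else {
                values[i] = PropertyValue();
                allFound = false;
            }
        }
        return allFound;
    }

private:
    std::map<PropertyId, PropertyValue> properties_;
};
```

A caller would write the author with a one-element array, just as the fragment above passes a count of 1, and could later read it back by the same identifier. The type tag lets a reader that does not know the property in advance discover what kind of value it holds before using it.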

In Windows 2000, the New Technology File System version 5.0 (NTFS5) has been updated so that it supports native property sets on any file or directory using the IPropertyStorage and IPropertySetStorage interfaces. You can use these interfaces to attach a property set to flat files (such as bitmaps) in addition to structured storage files. NTFS stores this data on the disk in a special part of the file structure; this enables applications such as Microsoft Index Server to index and search the contents of native NTFS property sets.