Arrays

C++ affords great flexibility when you deal with pointers and arrays. In most expressions an array name is converted to a pointer to its first element, so the two can be used almost interchangeably. This flexibility can translate into great ambiguity in some cases. Consider the following code:

int* x;        // A pointer, but to how many integers?

int z;         // One integer
int y[50];     // 50 integers (0 through 49)

x = &z;        // x is a pointer to one integer.
x = y;         // x now points to the first of 50 integers.

You can see that the intended use of pointer types defined by C++ is not always clear; from the first line of code, you can't tell whether x will point to one integer or to many integers. The truth is, it doesn't matter to C++. If you use a pointer to write data past the end of the allocated space, that's your problem. The flexibility of pointers and arrays is the main reason that pointers are considered a dangerous language feature and are not directly available in higher-level languages such as Java and Visual Basic.
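The following is a minimal C++ sketch of that ambiguity. The function names are hypothetical and exist only for illustration. The point is that the pointer carries no element count, so the caller has to supply one separately.

```cpp
#include <cstddef>

// Hypothetical helper: adds 'count' integers starting at 'p'.
// The pointer itself carries no length, so the caller must supply one.
int sum_through_pointer(const int* p, std::size_t count)
{
    int total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += p[i];
    return total;
}

// Case 1 from the text: the pointer refers to a single integer.
int sum_single(int z)
{
    return sum_through_pointer(&z, 1);
}

// Case 2 from the text: the pointer refers to the start of an array.
int sum_first_three()
{
    int y[50] = { 10, 20, 30 };          // remaining elements are zero
    return sum_through_pointer(y, 3);    // the array name converts to int*
}
```

Nothing in the signature of sum_through_pointer distinguishes the two cases; passing a count larger than the real allocation would compile just as cleanly.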

Fixed Arrays

When you make remote calls, it is crucial that you know whether an argument points to a single item or to multiple items because all the data at the memory location pointed to by the argument must be transferred to the server. Unless otherwise indicated by the interface definition, a pointer parameter in IDL is assumed to point to a single element of the specified type. By this rule, the [out] attribute in the following function declaration is assigned to a parameter that is a pointer to a single integer:

HRESULT SquareInteger([in] int myInteger, [out] int* square);

When you need an actual array, the number of elements in that array must be clearly defined so that the system knows how much data to transmit over the network. When you design a distributed system, remember that every unnecessary byte sent across the network slows the entire system. The simplest technique for passing an array of elements to a function is known as a fixed array. From the following code fragment, you can see how fixed arrays got their name—the array holds a fixed number of elements that is known at compile time:

HRESULT SumOfIntegers1([in] int myIntegers[5], [out] int* sum);

At run time, the marshaler for this method copies the 20 bytes (5 integers multiplied by 4 bytes each) of memory pointed to by myIntegers into a transmission buffer. This buffer is then sent to the server, where the myIntegers pointer passed to the SumOfIntegers1 method is adjusted to point directly to the buffer received by the server. In theory, fixed arrays are the most efficient means of passing data. In practice, however, it can be difficult to predict exactly how much data will need to be transferred. Choosing too small a size is an obvious problem, and choosing a larger size can mean that a mostly empty buffer is sent across the network.

As you might recall from Chapter 15, for efficiency reasons the stub often passes—as a method's parameters—direct pointers into the message buffer obtained from the client. With fixed arrays, this is possible because the data transmitted in the buffer is an exact copy of the data in the client's address space, as shown in Figure 16-1.
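As a rough model of this behavior, the sketch below copies a fixed array into a buffer and then lets the receiving side work on that buffer in place. It is an illustration under simplifying assumptions, not the generated marshaling code: the function names are hypothetical, and the message buffer is modeled as a vector of int so that alignment is not a concern.

```cpp
#include <cstddef>
#include <vector>

const std::size_t kElements = 5;   // matches int myIntegers[5] in the IDL

// Client side: copy every element of the fixed array into the buffer.
std::vector<int> marshal_fixed(const int (&myIntegers)[kElements])
{
    return std::vector<int>(myIntegers, myIntegers + kElements);
}

// Size of the payload in bytes (20 when int is 4 bytes wide).
std::size_t payload_bytes(const std::vector<int>& buffer)
{
    return buffer.size() * sizeof(int);
}

// Server side: the method body reads straight out of the received buffer.
int sum_in_place(const std::vector<int>& buffer)
{
    if (buffer.size() != kElements)
        return 0;                             // malformed message in this model
    const int* myIntegers = buffer.data();    // no second copy is made
    int sum = 0;
    for (std::size_t i = 0; i < kElements; ++i)
        sum += myIntegers[i];
    return sum;
}
```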

Conformant Arrays

IDL defines several attributes that help define and control the size of arrays and the data transmitted. These attributes are listed in the following table.

Attribute	Description
first_is	Index of the first array element transmitted
last_is	Index of the last array element transmitted
length_is	Total number of array elements transmitted
max_is	Highest valid array index value
min_is	Lowest valid array index value (not supported—always 0)
size_is	Total number of array elements allocated for the array

Figure 16-1. The marshaled form of a fixed array.

To allow the number of elements transmitted to the server to be determined at run time, IDL supports conformant arrays. The size_is attribute indicates a conformant array. As shown in the following declaration, the caller specifies the actual number of elements in the array at run time. This technique allows the marshaling code to dynamically determine how many elements are in the array, and it has the added advantage of helping the implementation of this method determine how many numbers to add.

HRESULT SumOfIntegers2([in] int cMax, [in, size_is(cMax)] int* 
    myIntegers, [out] int* sum);

Methods that use conformant arrays can be declared using the preceding pointer notation or with the array notation below; they are functionally identical:

HRESULT SumOfIntegers2([in] int cMax, [in, size_is(cMax)] int 
    myIntegers[], [out] int* sum);

Either of these two declarations implies that the client will call the SumOfIntegers2 method, as shown in the following code fragment:

int sum = 0;
int myIntegers[] = { 4, 65, 23, -12, 89, -23, 8 };
pTest->SumOfIntegers2(7, myIntegers, &sum);
cout << "Sum = " << sum;
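On the server, the cMax argument does double duty: it tells the method how many elements it can safely read. A minimal sketch of a possible method body follows. The HRESULT and S_OK definitions are stand-ins so that the fragment compiles without the Windows headers, and the body is an assumption about how such a method could be written, not the code from the sample project.

```cpp
typedef long HRESULT;        // stand-in for the Windows definition
const HRESULT S_OK = 0;      // stand-in with the same value as the real S_OK

// Possible server-side body for SumOfIntegers2.
// cMax is the element count that the client supplied through size_is.
HRESULT SumOfIntegers2(int cMax, int* myIntegers, int* sum)
{
    *sum = 0;
    for (int i = 0; i < cMax; ++i)
        *sum += myIntegers[i];
    return S_OK;
}
```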

Conformant arrays might also be embedded in structures, as shown in the following example. Note that no more than one conformant array can be nested in a structure and that it must be the last element of the structure.

typedef struct tagSUM_STRUCT
{
    int cMax;
    [size_is(cMax)] int* myIntegers;
} SUM_STRUCT;

HRESULT SumOfIntegers3([in] SUM_STRUCT* myIntegers, 
    [out] int* sum);

The client-side call is shown in the following code:

int sum = 0;
int myIntegers[] = { 4, 65, 23, -12, 89, -23, 8 };

SUM_STRUCT x;
x.cMax = 7;
x.myIntegers = myIntegers;
pTest->SumOfIntegers3(&x, &sum);

The max_is attribute is nearly identical to the size_is attribute. The only difference is that the size_is attribute specifies the total number of elements in an array and the max_is attribute specifies the highest (zero-based) index value into the array. Thus, a size_is value of 7 is equivalent to a max_is value of 6.

Conformant arrays using the size_is or max_is attribute work well for parameters with [in] attributes but not as well for parameters with [out] or even [in, out] attributes. The problem is that the server gets stuck with the number of array elements specified by the caller. Imagine the case of a parameter with an [out] attribute. The following method returns a specified number of integers:

HRESULT ProduceIntegers1([in] int cMax, [out, size_is(cMax)] 
    int* myIntegers);

The client must allocate a buffer large enough to hold all the data that the method might return, as shown in the following code fragment:

int myIntegers[7];
pTest->ProduceIntegers1(7, myIntegers);

This is all well and good as long as the server returns exactly seven integers every time. Because the client states the maximum buffer size, the server cannot return more than seven integers; if the server returns fewer than seven, the unused elements are transmitted anyway and network bandwidth is wasted. As with fixed arrays, conformant arrays are passed directly to the method implementation from the message buffer received in the stub because the complete array is always in the request message sent by the client, as shown in Figure 16-2.

Figure 16-2. The marshaled form of a conformant array.
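To make the waste concrete, here is a sketch of a method body that has only three values to offer but is handed a seven-element conformant array. The data source and the name ProduceIntegers1Body are hypothetical. Zero-filling the tail is one reasonable choice, assumed here so that the client never sees uninitialized data.

```cpp
#include <algorithm>

// Hypothetical server-side data: only three values are available.
const int kAvailable[] = { 10, 20, 30 };
const int kAvailableCount = 3;

// Possible body for ProduceIntegers1. All cMax elements travel back to the
// client, so the unused tail is zero-filled instead of left uninitialized.
void ProduceIntegers1Body(int cMax, int* myIntegers)
{
    if (cMax <= 0)
        return;
    const int used = std::min(cMax, kAvailableCount);
    std::copy(kAvailable, kAvailable + used, myIntegers);
    std::fill(myIntegers + used, myIntegers + cMax, 0);
}
```

With cMax equal to 7, four of the seven elements in the response carry no information.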

Varying Arrays

To deal with the problem of the server returning fewer than the full number of elements, IDL introduces the concept of varying arrays. As with fixed arrays, the bounds of a varying array are decided at compile time, but the range of elements actually transmitted is determined at run time. The length_is attribute specifies the number of elements to be transmitted at run time, as shown in the following function declaration:

HRESULT ProduceIntegers2([out] int* pcActual, 
    [out, length_is(*pcActual)] int myIntegers[4096]);

The client-side call is shown here:

int how_many_did_we_get = 0;
int myIntegers[4096];
pTest->ProduceIntegers2(&how_many_did_we_get, myIntegers);
cout << how_many_did_we_get << " integers returned.";

This mechanism lets the method determine how many integers, up to a preset maximum, should be transmitted back to the client in the response message. The first [out] parameter, pcActual, is also used to tell the client how many elements were returned. This technique ensures that the client code does not walk into part of the uninitialized buffer accidentally. Varying arrays are also useful for methods with [in, out] parameters, as shown in the following declaration:
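A server-side body for such a method might look like the sketch below. The name ProduceIntegers2Body and the wanted parameter are hypothetical; wanted stands in for whatever internal logic decides how much data is available.

```cpp
#include <algorithm>

const int kCapacity = 4096;   // matches int myIntegers[4096] in the IDL

// Possible body for ProduceIntegers2: writes the first 'wanted' squares.
// The count is clamped to the compile-time bound of the array.
void ProduceIntegers2Body(int wanted, int* pcActual, int myIntegers[])
{
    const int count = std::max(0, std::min(wanted, kCapacity));
    for (int i = 0; i < count; ++i)
        myIntegers[i] = i * i;
    *pcActual = count;        // length_is: only this many elements go back
}
```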

HRESULT SendReceiveIntegers([in, out] int* pcActual, 
    [in, out, length_is(*pcActual)] int myIntegers[4096]);

This method allows the client to send a variable number of elements and then receive a different number of elements back, as shown here:

int num_integers = 5;
int myIntegers[4096] = { 0, 1, 2, 3, 4 };
pTest->SendReceiveIntegers(&num_integers, myIntegers);
cout << num_integers << " integers returned";
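As an illustration of a method that returns a different number of elements than it received, the sketch below keeps only the even values. The filtering rule and the name SendReceiveIntegersBody are assumptions made for the example.

```cpp
// Possible body for SendReceiveIntegers: keep only the even values.
// On entry *pcActual is the number of elements sent by the client;
// on exit it is the number of elements to send back.
void SendReceiveIntegersBody(int* pcActual, int myIntegers[])
{
    int kept = 0;
    for (int i = 0; i < *pcActual; ++i)
    {
        if (myIntegers[i] % 2 == 0)
            myIntegers[kept++] = myIntegers[i];
    }
    *pcActual = kept;
}
```

For the client call shown above, five elements go out and three come back.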

You can use the first_is and last_is attributes to mark a certain range of elements in the transmission array. Unless you otherwise specify with the first_is attribute, the zero-index element is always the first element of the array transmitted. Using the first_is attribute, you can specify a certain element in the array as the starting point for transmission. The last_is attribute specifies the index of the last element in the array that is marked for transmission. Like the max_is variation on size_is, the last_is attribute defines an index into the array rather than the count defined by length_is. In practice, the first_is and last_is attributes are rarely used.
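The relationship between the index-based attributes and the count-based ones is simple arithmetic. The two helper functions below are hypothetical names used only to spell it out.

```cpp
// Number of elements transmitted for a range given by first_is and last_is.
int length_from_range(int first_is, int last_is)
{
    return last_is - first_is + 1;
}

// Number of elements allocated when the bound is given by max_is.
int size_from_max(int max_is)
{
    return max_is + 1;
}
```

For example, a first_is value of 2 and a last_is value of 4 mark three elements for transmission.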

When used judiciously, varying arrays reduce network traffic, but they can also hurt performance because the data packet received by the marshaling code is not in the correct format to be handed over to the client or server code. Instead, another block of memory must be allocated on the receiving end and the data reconstructed, as shown in Figure 16-3.
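The extra work on the receiving end can be sketched as follows. This is a simplified model, not the actual unmarshaling code: the function name and the zero-filling of elements outside the transmitted range are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Receiving side: 'wire' holds only the transmitted elements, which belong
// at index 'first' of an array declared with 'declaredSize' elements.
std::vector<int> rebuild_varying(const std::vector<int>& wire,
                                 std::size_t first,
                                 std::size_t declaredSize)
{
    std::vector<int> full(declaredSize, 0);    // the additional allocation
    // Copy only if the transmitted range fits inside the declared bounds;
    // otherwise this model returns the zero-filled block unchanged.
    if (first <= declaredSize && wire.size() <= declaredSize - first)
    {
        std::copy(wire.begin(), wire.end(),
                  full.begin() + static_cast<std::ptrdiff_t>(first));
    }
    return full;
}
```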

Open Arrays

In practice, varying arrays can be somewhat restrictive because there is always a fixed maximum amount of data that can be transmitted. To combat this problem, IDL lets you create arrays with attributes of both conformant and varying arrays, known as open arrays (sometimes called conformant varying arrays). An open array is distinguished by the use of both the size_is attribute of a conformant array and the length_is attribute of a varying array. The caller can control the size of the buffer, but the method itself controls the number of elements transmitted over the wire, as shown in the following code:

HRESULT ProduceIntegers3([in] int cMax, [out] int* pcActual, 
    [out, size_is(cMax), length_is(*pcActual)] int* 
    myIntegers);

The client calls the ProduceIntegers3 method like this:

int how_many_did_we_get = 0;
int myIntegers[5];
pTest->ProduceIntegers3(5, &how_many_did_we_get, myIntegers);
cout << how_many_did_we_get << " integers returned.";
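A possible server-side body for this method is sketched below. The data source and the name ProduceIntegers3Body are hypothetical. The method never writes more than cMax elements, and it reports through pcActual how many of them are worth transmitting.

```cpp
#include <algorithm>

// Hypothetical server-side data: eight values are available.
const int kSource[] = { 2, 3, 5, 7, 11, 13, 17, 19 };
const int kSourceCount = 8;

// Possible body for ProduceIntegers3.
// cMax     (size_is)   : capacity of the caller's buffer
// pcActual (length_is) : number of elements actually sent back
void ProduceIntegers3Body(int cMax, int* pcActual, int* myIntegers)
{
    const int count = std::max(0, std::min(cMax, kSourceCount));
    std::copy(kSource, kSource + count, myIntegers);
    *pcActual = count;
}
```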

Figure 16-3. The marshaled form of a varying array.

Generally, conformant arrays are most useful for [in] parameters, and open arrays work best for [out] and [in, out] parameters. Like varying arrays, open arrays must be reconstructed in a second block of memory allocated on the receiving side. The flexibility of varying and open arrays comes with a memory and performance penalty commensurate with the size of the buffer, as shown in Figure 16-4.

Figure 16-4. The marshaled form of an open array.

Character Arrays

In C++, a string is fundamentally expressed as an array of characters terminated by a null character. Therefore, you can pass a string parameter in the same way as other arrays, as shown here:

HRESULT SendString1([in] int cLength, [in, size_is(cLength)] 
    wchar_t* myString);

The client calls this function as follows:

wchar_t wszHello[] = L"Inside COM+";
// Add 1 so that the terminating null character is transmitted as well.
pTest->SendString1((int)wcslen(wszHello) + 1, wszHello);

Because passing string parameters is such a common programming practice, IDL offers the [string] attribute to make this job easier. IDL can automate the process of calling the appropriate string length function based on the knowledge that all strings end with a null character. With the [string] attribute, the SendString2 method declaration looks like this:

HRESULT SendString2([in, string] wchar_t* myString);

The client calls this function as shown here:

wchar_t wszHello[] = L"Inside COM+";
pTest->SendString2(wszHello);

A potential problem arises when the [string] attribute is combined with the [in, out] attributes, as shown in the following code:

HRESULT SendReceiveString1([in, out, string] wchar_t* 
    myString);

See whether you can find the error in the following code:

// Client-side usage
wchar_t wszHello[256] = L"Inside COM+";
pTest->SendReceiveString1(wszHello);
wprintf(L"Received string: %s\n", wszHello);     // ???

// Server-side implementation of the function
HRESULT CObject::SendReceiveString1(wchar_t* myString)
{
    wprintf(L"Received string: %s\n", myString); // Inside COM+
    wcscpy(myString, L"Nice weather today");     // Uh-oh
    return S_OK;
}

This code works fine as long as the string returned by the SendReceiveString1 method is no longer than the string sent. In this sample, however, the method returns a string several characters longer than that sent by the client. This might not seem like a problem because the client has allocated a buffer of 256 characters, which is more than enough to hold a few extra characters. Nevertheless, the [string] attribute used by the SendReceiveString1 method tells the system to compute the length of the client's string and then allocate only the minimum amount of memory needed on the server side. The result is that the implementation on the server side writes past the end of the character array and corrupts whatever memory follows it.
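The arithmetic behind the overrun can be checked with a short sketch. The function names are hypothetical, and the model assumes, as described above, that the server-side buffer is sized from the incoming string alone.

```cpp
#include <cstddef>
#include <cwchar>

// Capacity in characters (including the null) of a server-side buffer
// that is sized from the incoming string alone.
std::size_t server_capacity(const wchar_t* incoming)
{
    return std::wcslen(incoming) + 1;
}

// True when 'reply' can be copied over 'incoming' without overrunning.
bool reply_fits(const wchar_t* incoming, const wchar_t* reply)
{
    return std::wcslen(reply) + 1 <= server_capacity(incoming);
}
```

For the strings in the sample, the server has room for 12 characters but the reply needs 19, so the wcscpy call overruns the buffer.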

We can correct this problem using the following IDL declaration:

HRESULT SendReceiveString2([in] int cMax, [in, out, string,  
    size_is(cMax)] wchar_t* myString);

The client calls the function as shown here:

wchar_t wszHello[256] = L"Inside COM+";
pTest->SendReceiveString2(256, wszHello);
wprintf(L"Received string: %s\n", wszHello);
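With cMax available, the implementation can check the capacity before it writes. The sketch below shows one way to do that. The name SendReceiveString2Body and the bool return value are assumptions; a real method would return an HRESULT.

```cpp
#include <cstddef>
#include <cwchar>

// Possible body for SendReceiveString2. Returns false instead of writing
// past the cMax characters that size_is guarantees are available.
bool SendReceiveString2Body(int cMax, wchar_t* myString)
{
    const wchar_t reply[] = L"Nice weather today";
    const std::size_t needed = std::wcslen(reply) + 1;   // include the null
    if (cMax <= 0 || needed > static_cast<std::size_t>(cMax))
        return false;                                    // buffer too small
    std::wmemcpy(myString, reply, needed);
    return true;
}
```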

Even this solution is incomplete because the client still must specify the maximum size of the buffer. Yet the exact amount of memory necessary for the result can be determined only inside the method itself. If the client guesses too small a value, the server is out of luck. To avoid these dire straits, robust interfaces usually force the method's implementation to allocate space for the [out] buffer. This dynamically allocated buffer is then returned to the client, where it must later be freed. Thus, the IDL declaration of the method is simplified, as shown here:

HRESULT SendReceiveString3([in, out, string] wchar_t** 
    myString);

But the implementation of the SendReceiveString3 method is made more complex by the addition of the memory allocation call, as shown in the following example:

// Server-side implementation of the function
HRESULT CObject::SendReceiveString3(wchar_t** myString)
{
    wprintf(L"Received string: %s\n", *myString);
    CoTaskMemFree(*myString);   // Release the client's string.

    wchar_t returnString[] = L"Nice weather today";
    *myString = (wchar_t*)CoTaskMemAlloc(
        (wcslen(returnString)+1)*sizeof(wchar_t));
    if(*myString == NULL)       // Allocation can fail.
        return E_OUTOFMEMORY;
    wcscpy(*myString, returnString);
    return S_OK;
}

The client-side code is responsible for freeing this memory, as shown here:

const wchar_t* wszHello = L"Inside COM+";
wchar_t* myString = (wchar_t*)CoTaskMemAlloc(
    (wcslen(wszHello)+1)*sizeof(wchar_t));
wcscpy(myString, wszHello);
pTest->SendReceiveString3(&myString);

wprintf(L"Received string: %s\n", myString);
CoTaskMemFree(myString);
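Because the client must remember to free the returned block, it can help to tie that obligation to a smart pointer. The sketch below shows the idea. TaskAlloc and TaskFree are stand-ins built on malloc and free so that the fragment compiles without the COM headers; real code would call CoTaskMemAlloc and CoTaskMemFree in their place. All other names are hypothetical as well.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cwchar>
#include <memory>

// Stand-ins for CoTaskMemAlloc and CoTaskMemFree (assumption, see above).
void* TaskAlloc(std::size_t bytes) { return std::malloc(bytes); }
void  TaskFree(void* p)            { std::free(p); }

// Deleter that releases the block with the matching free function.
struct TaskMemDeleter
{
    void operator()(wchar_t* p) const { TaskFree(p); }
};
typedef std::unique_ptr<wchar_t, TaskMemDeleter> TaskString;

// Duplicates a string into task memory; the result is empty on failure.
TaskString make_task_string(const wchar_t* text)
{
    const std::size_t chars = std::wcslen(text) + 1;
    wchar_t* copy = static_cast<wchar_t*>(TaskAlloc(chars * sizeof(wchar_t)));
    if (copy)
        std::wmemcpy(copy, text, chars);
    return TaskString(copy);
}

// Stand-in for the remote method: frees the input, returns a new block.
bool SendReceiveString3Body(wchar_t** myString)
{
    TaskString reply = make_task_string(L"Nice weather today");
    if (!reply)
        return false;              // leave the caller's string untouched
    TaskFree(*myString);
    *myString = reply.release();
    return true;
}

// Client-side wrapper: ownership leaves the smart pointer for the call
// and returns to it afterward, so the final block is freed automatically.
bool exchange(TaskString& s)
{
    wchar_t* raw = s.release();
    const bool ok = SendReceiveString3Body(&raw);
    s.reset(raw);
    return ok;
}
```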

Multidimensional Arrays

IDL can also deal with multidimensional arrays. The biggest problem with marshaling multidimensional arrays is their ambiguous nature in C++. Take a look at the following declaration:

int** test;

In C++, this declaration can have a number of possible meanings. Is test a pointer to a pointer to an integer? A pointer to a pointer to an array of integers? A pointer to an array of pointers to integers? A pointer to an array of pointers to integer arrays? The intended use of test is unclear because as far as C++ is concerned, it doesn't make any difference. When you marshal calls between different address spaces, however, these distinctions become crucial. IDL manages these subtle differences by using a special syntax. The size_is and length_is attributes can work with a variable number of arguments, each indicating the conformance and variance of one level of indirection.

The simplest case is a pointer to a pointer to a single integer. If no additional attributes are specified, IDL assumes a pointer to a single element at each level, so a parameter declared as [in] int** test with no size_is attribute has this meaning.

The second possibility is that of a pointer to a pointer to an array of integers. In this case, the size_is attribute takes the form size_is(, n): the first level of indirection is omitted, so the default value of 1 is used, and n gives the number of integers in the array. Note that the size_is values are read right to left; the rightmost value affects the rightmost pointer of the parameter.

A pointer to an array of pointers to integers is declared by filling in only the first position, in the form size_is(m, ).

Last, a pointer to an array of pointers to integer arrays defines both positions of the size_is attribute, in the form size_is(m, n).
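The four readings of int** test correspond to four different memory layouts, which the following C++ sketch walks through. The function names and the m and n parameters are hypothetical. In each case the C++ type of the argument is identical, and only the extra counts tell the code how far it may safely read.

```cpp
// Reading 1: pointer to a pointer to a single integer.
int read_single(int** test)
{
    return **test;
}

// Reading 2: pointer to a pointer to an array of n integers.
int sum_inner_array(int** test, int n)
{
    int total = 0;
    for (int j = 0; j < n; ++j)
        total += (*test)[j];
    return total;
}

// Reading 3: pointer to an array of m pointers, each to a single integer.
int sum_outer_array(int** test, int m)
{
    int total = 0;
    for (int i = 0; i < m; ++i)
        total += *test[i];
    return total;
}

// Reading 4: pointer to an array of m pointers, each to n integers.
int sum_both(int** test, int m, int n)
{
    int total = 0;
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            total += test[i][j];
    return total;
}
```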

Passing Arrays of User-Defined Types from Visual Basic

From Visual Basic, you can call a method of an interface that accepts an array of structures without using the safe array type described in Chapter 5. For example, the following interface definition contains a method that accepts an array of MYTYPE structures. The MYTYPE structure is also defined in the interface definition.

interface ISum : IUnknown
{
    typedef struct MYTYPE
    {
        short a;
        long b;
    } MYTYPE;

    HRESULT Sum([in] long cMax, [in, size_is(cMax)] MYTYPE* 
        myarray);
}

If you assume that an arbitrary coclass has implemented the ISum interface, calling the ISum::Sum method from Visual Basic is not difficult. The trick is to pass the array as if you were passing only the first element of that array. For example, if the array is named myarray, instead of passing the parameter by using the standard Visual Basic notation myarray(), you would use myarray(0), as shown here:

Private Sub Command1_Click()
    Dim myRef As New InsideCOM
    Dim myarray(2) As MYTYPE   ' Three elements: indexes 0 through 2
    myarray(0).a = 1
    myarray(0).b = 5
    myarray(1).a = 2
    myarray(1).b = 4
    myarray(2).a = 3
    myarray(2).b = 3
    myRef.Sum 3, myarray(0)    ' myarray(0) makes arrays work!
End Sub
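The trick works because passing myarray(0) by reference hands the method the address of the first element, and the remaining elements follow it contiguously in memory. Seen from C++, that is exactly a pointer plus a count. The helper below is a hypothetical illustration of what an implementation could do with those two arguments; it is not part of the ISum interface.

```cpp
// Mirrors the MYTYPE structure from the interface definition.
struct MYTYPE
{
    short a;
    long b;
};

// Hypothetical helper: adds up every field of the first cMax elements.
// 'myarray' is the address of element 0, as Visual Basic passes it.
long SumFields(long cMax, const MYTYPE* myarray)
{
    long total = 0;
    for (long i = 0; i < cMax; ++i)
        total += myarray[i].a + myarray[i].b;
    return total;
}
```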