Although a distributed system can offer excellent availability (that is, little downtime) and protection against catastrophic failure, the probability of a failure somewhere in the system is far greater than in a standalone application because of the added complexity. From a client's perspective, a failure of either the network or the server manifests itself as the failure of a remote method call. In such cases, an HRESULT value such as RPC_S_SERVER_UNAVAILABLE or RPC_S_CALL_FAILED is returned.
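These RPC status codes surface to the client as COM HRESULTs through the standard Win32 error mapping, so a client can test for them explicitly. Here is a minimal sketch, assuming an ISum interface pointer named pSum with the Sum signature used later in this section:

int sum = 0;
HRESULT hr = pSum->Sum(4, 9, &sum);    // Remote call through the proxy
if(hr == HRESULT_FROM_WIN32(RPC_S_SERVER_UNAVAILABLE) ||
   hr == HRESULT_FROM_WIN32(RPC_S_CALL_FAILED))
{
    // The server process or the network has failed; the proxy is now
    // useless, so release it and attempt to reconnect.
    pSum->Release();
    pSum = NULL;
}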
A more complex situation exists on the server side if a client fails. Depending on the type of server, the failure of a client might or might not wreak havoc on the server. For example, a stateless object that always remains running and simply gives out the current time to any client that asks is not affected by the loss of a client process. However, any object that maintains state for its clients is obviously very interested in the death of those clients. Such objects typically have a method, such as ByeByeNow, that clients call before exiting. However, if the client process or a portion of the network fails, the client might not have the opportunity to notify the object of its intentions. This failure leaves the server in an unstable state because it is maintaining information for clients that might no longer exist.
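The following sketch illustrates the problem; the ClientState structure, the client ID parameter, and the bookkeeping map are all hypothetical, but they show the kind of per-client state that only an explicit ByeByeNow call ever releases:

#include <windows.h>
#include <map>

struct ClientState { /* per-client data the object maintains */ };

std::map<DWORD, ClientState*> g_clientState;    // Keyed by a client ID

// Clients are expected to call this before exiting. If the client process
// or the network dies first, this method never runs and the state is
// orphaned.
STDMETHODIMP CObject::ByeByeNow(DWORD clientId)
{
    delete g_clientState[clientId];
    g_clientState.erase(clientId);
    return S_OK;
}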
RPC deals with this situation by using a logical connection called a context handle between the client and the server processes. If the connection between the two processes is broken for any reason, a special function called a rundown routine can be invoked on the server side to notify the server that a client connection has been broken. COM+ does not leverage RPC context handles, however; although context handles are themselves implemented using a pinging mechanism, they are unsuitable for the special needs of COM+, primarily for performance reasons. Instead, the ORPC protocol defines its own pinging mechanism to determine whether a client is still alive. A pinging mechanism is quite simple. Every so often, the client sends a ping message to an object saying, "I'm alive, I'm alive!" If the server does not receive a ping message within a specified period of time, the client is assumed to have died and all its references are freed.
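For comparison, here is how raw RPC exposes this notification. When an interface declares a context handle type in its IDL, the MIDL compiler expects the server to supply a rundown routine named after that type, and the RPC run time calls it when the client's connection breaks. The handle type name below is illustrative:

/* In the IDL file: a context handle type for per-client server state. */
typedef [context_handle] void* PCLIENT_CONTEXT;

/* In the server: the RPC run time invokes this rundown routine whenever
   the logical connection behind a context handle is broken. */
void __RPC_USER PCLIENT_CONTEXT_rundown(PCLIENT_CONTEXT hContext)
{
    /* Free whatever per-client state hContext refers to. */
    free(hContext);
}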
This simplistic type of pinging algorithm is not sufficient for COM+ because it leads to too much unnecessary network traffic. In a distributed environment that includes hundreds, thousands, or hundreds of thousands of clients and components, network capacity can be overwhelmed simply by the number of ping messages being transmitted. To reduce the network traffic devoted to ping messages, COM+ relies on the OXID Resolver on each client machine to detect whether its local clients are alive and then to send ping messages on a per-machine rather than a per-object basis. The client machine's OXID Resolver thus sends only one ping message to each computer that is serving its clients.
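Conceptually, the client machine's OXID Resolver performs bookkeeping along these lines. This is purely an illustrative sketch, not actual OXID Resolver code, and the SendPingMessage routine is hypothetical:

#include <map>
#include <set>
#include <string>

typedef unsigned __int64 OID;    // 64-bit object identifier

// Hypothetical transport routine: one ping message to one server machine.
void SendPingMessage(const std::wstring& server, const std::set<OID>& oids);

// Every OID referenced by local clients, grouped by server machine.
std::map<std::wstring, std::set<OID> > g_oidsPerServer;

void PingAllServers()
{
    // One message per machine, no matter how many OIDs it covers.
    std::map<std::wstring, std::set<OID> >::const_iterator it;
    for(it = g_oidsPerServer.begin(); it != g_oidsPerServer.end(); ++it)
        SendPingMessage(it->first, it->second);
}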
Even with only one message being sent to each computer, ping message traffic can still grow quite hefty because the ping data for each OID is 16 bytes. For example, if a client computer holds 5000 object references to objects running on another machine, each ping message is approximately 78 KB (5000 × 16 bytes = 80,000 bytes)! To further reduce the amount of network traffic, ORPC includes a special mechanism called delta pinging. The idea behind delta pinging is that a server often has a relatively stable set of objects in use by its clients. Instead of including data for each individual OID in the ping message, delta pinging stipulates that a set of OIDs can be defined as a ping set and then pinged using a single identifier that refers to all the OIDs in the set. When delta pinging is employed, the ping message for 5 OIDs is the same size as the ping message for 1 million OIDs.
To establish a ping set, the client calls the IOXIDResolver::ComplexPing method. The AddToSet parameter of the ComplexPing method accepts an array of OIDs that should define the ping set. Once defined, all the OIDs in the set can be pinged simply by calling IOXIDResolver::SimplePing and passing it the ping set identifier (SETID) value returned from the ComplexPing method. If necessary, the ComplexPing method can be called again at any time to add or remove OIDs from the ping set.
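Paraphrased from the ORPC specification's IDL (OID and SETID are 64-bit identifiers), the two methods look approximately like this:

[idempotent] error_status_t SimplePing(
    [in] handle_t  hRpc,         // RPC binding handle
    [in] SETID    *pSetId);      // Ping set to ping

[idempotent] error_status_t ComplexPing(
    [in]      handle_t        hRpc,
    [in, out] SETID          *pSetId,      // 0 on input to create a set
    [in]      unsigned short  SequenceNum,
    [in]      unsigned short  cAddToSet,
    [in]      unsigned short  cDelFromSet,
    [in, size_is(cAddToSet)]   OID AddToSet[],    // OIDs joining the set
    [in, size_is(cDelFromSet)] OID DelFromSet[],  // OIDs leaving the set
    [out]     unsigned short *pPingBackoffFactor);

Passing a SETID of 0 asks the server's OXID Resolver to allocate a new ping set and return its identifier through pSetId.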
To clean up after broken connections between a client and a server, the pinging mechanism uses a reclaiming process called garbage collection. The pinging mechanism activates garbage collection based on two values—the time that should elapse between each ping message and the number of ping messages that must be missed before the server can consider the client missing in action. The product of these two values determines the maximum amount of time that can elapse without a ping message being received before the server assumes that the client is dead.
By default, the ping period is set to 120 seconds; three ping messages must be missed before the client can be presumed dead. Currently, the user cannot change these default values. Thus, 6 minutes (3 × 120 seconds) must elapse before a client's references are implicitly reclaimed. Whether the server immediately reclaims the object references held by a client once the timeout has occurred is considered an implementation-dependent detail of the ORPC specification. In fact, if the server does not reclaim those references and later begins receiving ping messages from the heretofore-assumed-dead client, it can infer that whatever problem prevented the ping messages from being received has been fixed.
Some stateless objects, such as the time server example discussed at the beginning of this section, have no need for the COM+ garbage collection mechanism. These objects usually run forever and don't really care about a client after a method call has finished executing. For such objects, you can switch off the pinging mechanism by passing the MSHLFLAGS_NOPING flag to the CoGetStandardMarshal function. The following code fragment shows how to use the MSHLFLAGS_NOPING flag in an implementation of the IClassFactory::CreateInstance method:
IMarshal* pMarshal = NULL;

HRESULT CFactory::CreateInstance(IUnknown *pUnknownOuter, REFIID riid,
    void** ppv)
{
    if(pUnknownOuter != NULL)
        return CLASS_E_NOAGGREGATION;

    CObject *pObject = new CObject;
    if(pObject == NULL)
        return E_OUTOFMEMORY;

    IUnknown* pUnknown;
    pObject->QueryInterface(IID_IUnknown, (void**)&pUnknown);

    // Request the standard marshaler with pinging disabled.
    CoGetStandardMarshal(riid, pUnknown, 0, NULL,
        MSHLFLAGS_NOPING|MSHLFLAGS_NORMAL, &pMarshal);
    pUnknown->Release();

    // Call QueryInterface, which typically is for IID_IUnknown.
    HRESULT hr = pObject->QueryInterface(riid, ppv);
    pObject->Release();
    return hr;
}
Just before exiting, the object should execute the following code to free the standard marshaler:
pMarshal->DisconnectObject(0);
pMarshal->Release();
Note that objects for which the MSHLFLAGS_NOPING flag has been specified never receive calls to their IUnknown::Release methods. Clients can call Release, but such calls are not remoted to the object itself. Because the delta pinging mechanism used by COM+ is already highly efficient, turning off pinging for a single object does not cause a corresponding reduction in network traffic. As long as other objects on the server require ping messages, ORPC must still send a ping message to the server machine by calling the IOXIDResolver::SimplePing method. The only difference is that an object that specified the MSHLFLAGS_NOPING flag is not added to the SETID that is being pinged.
With an understanding of the ORPC network protocol under our belts, let's examine the data transmitted across the network during an actual remote method invocation. Figure 19-11 shows the request PDU sent when the client process calls the ISum::Sum method. Immediately following the ORPCTHIS structure are the x and y inbound parameters of the Sum method. Here the Sum method has been called with the values 4 and 9.
Figure 19-11. The request PDU transmitted when the client calls ISum::Sum with the parameters 4 and 9.
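For reference, the ORPCTHIS header that precedes the marshaled [in] parameters in every request PDU is defined by the ORPC specification approximately as follows (paraphrased):

// Paraphrased from the ORPC specification: the header that precedes the
// marshaled [in] parameters (here, 4 and 9) in every request PDU.
typedef struct tagORPCTHIS
{
    COMVERSION         version;       // COM version of the client
    unsigned long      flags;         // ORPCF_LOCAL, and so on
    unsigned long      reserved1;     // Must be zero
    CID                cid;           // Causality ID of the call
    ORPC_EXTENT_ARRAY* extensions;    // Optional extensions (may be null)
} ORPCTHIS;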
After the Sum method executes on the server, the response PDU is generated and sent back to the client. Clearly visible following the ORPCTHAT structure in the response PDU is the outbound value of 13 (4 + 9), as shown in Figure 19-12.
Figure 19-12. The response PDU transmitted when the component returns the value 13 after executing the ISum::Sum method.
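The corresponding ORPCTHAT header is much smaller; paraphrased from the specification:

// Paraphrased from the ORPC specification: the header that precedes the
// marshaled [out] parameters (here, the return value 13) in every
// response PDU.
typedef struct tagORPCTHAT
{
    unsigned long      flags;         // ORPCF_NULL, and so on
    ORPC_EXTENT_ARRAY* extensions;    // Optional extensions (may be null)
} ORPCTHAT;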