Wednesday, March 18, 2015

Performance optimization in C++ based Server using Message / Packet Pool

          In any application server, allocating and deallocating memory while processing each and every message consumes extra CPU cycles and degrades server performance.
To solve this problem, we create a message pool during server startup and take messages from the pool while processing requests. Because an application server can process different types of messages, the message pool needs to be generic. This problem can be solved with an OOAD approach.
Problem Definition:-
          Memory allocation and deallocation are relatively expensive operations in any programming language and operating system.
          The allocation functions (malloc/free) are C library calls, which may in turn issue system calls such as brk or mmap. They take time for the following reasons:
          -   Searching for a suitable free block among the previously freed blocks.
          -   Splitting and merging blocks to limit fragmentation.
          -   When malloc() is called from multiple threads, the allocator must synchronize access to its global data structures.

Another problem: if incoming traffic is high and the server takes a long time to process each request, per-message allocations keep piling up and can exhaust memory. That affects the other modules running in the same process.

Design Approach
          This approach does not make malloc/free themselves faster; it reduces the number of times the application calls them. Instead of allocating memory for each and every message, we allocate a big chunk of memory at startup and reuse it at processing time. The application server can handle different types of messages, and each message type has its own business logic in its ProcessMessage function.
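Stripped of the message classes, the "allocate once, reuse per message" idea can be sketched as a fixed-size block pool. This is a minimal sketch assuming a single thread and equal-sized slots; FixedBlockPool, Acquire, Release and Available are hypothetical names, not part of the article's BaseMessage code. The constructor performs the only heap allocation, and Acquire/Release just move pointers on and off a free list.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Sketch of "allocate once at startup, reuse per message".
// FixedBlockPool and its members are hypothetical names, not the article's API.
class FixedBlockPool {
public:
    FixedBlockPool(std::size_t blockSize, std::size_t blockCount)
        : blockSize_(RoundUp(blockSize)),
          storage_(new unsigned char[blockSize_ * blockCount]) {  // the only heap allocation
        // Carve the chunk into equal slots and record each one as free.
        free_.reserve(blockCount);
        for (std::size_t i = 0; i < blockCount; ++i)
            free_.push_back(storage_.get() + i * blockSize_);
    }

    // Hands out a free slot, or nullptr when every slot is in use.
    void* Acquire() {
        if (free_.empty()) return nullptr;
        void* p = free_.back();
        free_.pop_back();
        return p;
    }

    // Returns a slot; the most recently released slot is handed out next.
    void Release(void* p) {
        assert(p != nullptr);
        free_.push_back(static_cast<unsigned char*>(p));
    }

    std::size_t Available() const { return free_.size(); }

private:
    // Round the slot size up so that every slot start stays suitably aligned.
    static std::size_t RoundUp(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    std::size_t blockSize_;
    std::unique_ptr<unsigned char[]> storage_;
    std::vector<unsigned char*> free_;
};
```

For example, FixedBlockPool pool(4096, 1000) reserves 1000 slots of 4096 bytes once. A full pool returns nullptr instead of growing, which keeps memory bounded during a traffic spike; the caller decides whether to drop, queue, or reject the request.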
The class diagram and sample implementation below describe the OOAD design that creates a generic message pool at startup and uses it at request-processing time.

Implementation Approach
            We can implement this approach in any object-oriented language that supports overloading the allocation operator. The sample code below implements it in C++ by overloading operator new.
          In the startup method (BaseMessage::ConfigurePool), allocate a big chunk of memory, split it into slots of message class size plus payload buffer size, and put each slot into the pool (a queue implemented as a list).
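// Startup time calls
// ("static" goes only on the declarations inside class BaseMessage, not on these definitions)
void BaseMessage::ConfigurePool()
{
    // Each slot header must hold the LARGEST derived class (HTTPMessage,
    // DNSMessage, ...), because CreateMessage() later constructs the derived
    // object in place. sizeof(BaseMessage) alone is too small if a derived class
    // adds data members. Keep both sizes multiples of alignof(std::max_align_t)
    // so every slot stays aligned.
    s_iMessageClassSize = MAX_MESSAGE_CLASS_SIZE; // hypothetical: max of sizeof(BaseMessage), sizeof(HTTPMessage), ...

    // read these from user config
    s_iMaxMessagePoolSize = MAX_POOL_SIZE;
    s_iPayloadSize = BUFF_SIZE;

    // one-time allocation: PoolSize * (message class size + payload buffer size)
    const size_t CAL_SIZE = s_iMaxMessagePoolSize * (s_iMessageClassSize + s_iPayloadSize);
    s_pMemorychunks = new BYTE[CAL_SIZE];

    // split the allocated chunk into (message, payload) slots
    for (BYTE* p = s_pMemorychunks; p < s_pMemorychunks + CAL_SIZE; )
    {
        BaseMessage* pMessage = (BaseMessage*) p; // raw slot: no constructor has run yet
        p += s_iMessageClassSize;
        pMessage->m_pPayload = p; // so BaseMessage constructors must not reset m_pPayload
        s_plstMessagePool->addLast(pMessage);
        p += s_iPayloadSize;
    }
}

void BaseMessage::UnConfigurePool()
{
    delete s_plstMessagePool;
    delete[] s_pMemorychunks;
}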
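Because CreateMessage later constructs HTTPMessage, DNSMessage and the other derived classes inside the same slot, the header part of each slot has to fit the largest of them, not only BaseMessage. The sketch below shows that layout arithmetic, assuming C++17 and hypothetical stand-in classes (the real message classes are not shown in the article); kMessageClassSize, SlotSize, CalSize, SlotAt and PayloadOf are illustrative names. Both parts of a slot are rounded up to alignof(std::max_align_t).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical stand-ins for the article's message classes; only their sizes matter.
struct BaseMessage {
    virtual ~BaseMessage() = default;
    int type = 0;
    unsigned char* m_pPayload = nullptr;
};
struct DNSMessage : BaseMessage { unsigned short queryId = 0; };
struct DiameterMessage : BaseMessage { unsigned int hopByHop = 0, endToEnd = 0; };

constexpr std::size_t kAlign = alignof(std::max_align_t);

constexpr std::size_t RoundUp(std::size_t n) { return (n + kAlign - 1) / kAlign * kAlign; }

// Header part of a slot: big enough for the LARGEST class constructed in it,
// rounded up so the payload buffer that follows is aligned too.
constexpr std::size_t kMessageClassSize =
    RoundUp(std::max({sizeof(BaseMessage), sizeof(DNSMessage), sizeof(DiameterMessage)}));

constexpr std::size_t SlotSize(std::size_t payloadSize) {
    return kMessageClassSize + RoundUp(payloadSize);
}

// Counterpart of the article's CAL_SIZE: PoolSize * (class size + payload size).
constexpr std::size_t CalSize(std::size_t poolSize, std::size_t payloadSize) {
    return poolSize * SlotSize(payloadSize);
}

// Where slot i starts inside the single chunk.
inline unsigned char* SlotAt(unsigned char* chunk, std::size_t i, std::size_t payloadSize) {
    assert(chunk != nullptr);
    return chunk + i * SlotSize(payloadSize);
}

// Where a slot's payload buffer starts (right after the header part).
inline unsigned char* PayloadOf(unsigned char* slot) { return slot + kMessageClassSize; }
```

A static_assert per derived class (sizeof(X) <= kMessageClassSize) is a cheap way to catch a message class that outgrows its slot at compile time.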

While receiving the request (BaseMessage::DataAvailabletoRead()), create the base class object (BaseMessage) using the overloaded operator new. No memory allocation happens here: operator new just returns a free slot from the pool, and the new-expression then calls the BaseMessage constructor on it.
          Once the read from the socket is done, create the derived class object (BaseMessage::CreateMessage) using another overload of operator new that takes the existing BaseMessage* (a placement form). The corresponding derived class constructor runs in the same slot, which also updates the vptr. After that, calling ProcessMessage through the BaseMessage pointer invokes the derived class's ProcessMessage.
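The two operator new overloads are easiest to see in a small self-contained model. The sketch below assumes C++17 and uses hypothetical classes (a DNSMessage and an HTTPMessage whose ProcessMessage only returns a marker value), a fixed 64-byte slot, and a vector-based free list in place of the article's list. It differs from the article's code in two deliberate ways: the placement overload takes void* rather than BaseMessage*, and CreateMessage explicitly destroys the base object before constructing the derived one in its slot.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical, simplified model of the article's design (C++17).
enum MessageType { kUnknown, kDns, kHttp };

class BaseMessage {
public:
    static constexpr std::size_t kSlotSize = 64;  // must cover the largest derived class

    explicit BaseMessage(MessageType t = kUnknown) : type_(t) {}
    virtual ~BaseMessage() = default;
    virtual int ProcessMessage() { return 0; }  // return values only show which override ran
    MessageType Type() const { return type_; }

    static void ConfigurePool(std::size_t count) {  // call once at startup
        s_storage.resize(count);
        s_free.reserve(count);
        for (Slot& slot : s_storage) s_free.push_back(&slot);
    }
    static std::size_t FreeSlots() { return s_free.size(); }

    // "new BaseMessage()": take a slot from the pool instead of calling malloc.
    static void* operator new(std::size_t size) {
        assert(size <= kSlotSize);
        if (s_free.empty()) throw std::bad_alloc();  // pool exhausted
        void* p = s_free.back();
        s_free.pop_back();
        return p;
    }
    // "new (slot) Derived()": reuse an existing slot. It must be declared here
    // because the class-specific operator new hides the global placement form.
    static void* operator new(std::size_t size, void* slot) {
        assert(size <= kSlotSize);
        return slot;
    }
    // "delete p" (via the virtual destructor) puts the slot back into the pool.
    static void operator delete(void* p, std::size_t) {
        s_free.push_back(static_cast<Slot*>(p));
    }

private:
    struct alignas(std::max_align_t) Slot { unsigned char bytes[kSlotSize]; };
    static inline std::vector<Slot> s_storage;
    static inline std::vector<Slot*> s_free;
    MessageType type_;
};

class DNSMessage : public BaseMessage {
public:
    DNSMessage() : BaseMessage(kDns) {}
    int ProcessMessage() override { return 53; }
};

class HTTPMessage : public BaseMessage {
public:
    HTTPMessage() : BaseMessage(kHttp) {}
    int ProcessMessage() override { return 80; }
};

static_assert(sizeof(DNSMessage) <= BaseMessage::kSlotSize, "slot too small");
static_assert(sizeof(HTTPMessage) <= BaseMessage::kSlotSize, "slot too small");

// Rebuild a pooled BaseMessage in place as the concrete type. The base object is
// destroyed first and the BaseMessage constructor runs again, so any state that
// must survive (type, payload pointer) has to be passed to the new constructor.
BaseMessage* CreateMessage(BaseMessage* pOrg, MessageType t) {
    void* slot = pOrg;
    pOrg->~BaseMessage();
    switch (t) {
        case kDns:  return new (slot) DNSMessage();
        case kHttp: return new (slot) HTTPMessage();
        default:    return new (slot) BaseMessage(kUnknown);
    }
}
```

Deleting through the base pointer works because the destructor is virtual: the class-specific operator delete receives the slot regardless of the concrete type, which is what the article's operator delete(void*, size_t) relies on.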
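// On Message processing time calls
void BaseMessage::DataAvailabletoRead()
{
    // call this function when select() reports the socket as readable
    BaseMessage* pMessage = new BaseMessage(); // slot taken from the pool, no malloc
    // ... then read the socket data into pMessage (see ReadMessage below)
}

BaseMessage* BaseMessage::CreateMessage(BaseMessage* pOrgMessage)
{
    BaseMessage* pMessage = 0;

    // Each case calls operator new(size_t, BaseMessage*), reusing pOrgMessage's slot.
    // "break" is required: without it every case falls through to the next one.
    // The BaseMessage constructor runs again here, so it must not reset fields
    // filled in while reading (payload pointer, type, length).
    switch (pOrgMessage->type)
    {
    case HTTP:     pMessage = new (pOrgMessage) HTTPMessage();     break;
    case DNS:      pMessage = new (pOrgMessage) DNSMessage();      break;
    case HTTPS:    pMessage = new (pOrgMessage) HTTPSMessage();    break;
    case DIAMETER: pMessage = new (pOrgMessage) DiameterMessage(); break;
    case RADIUS:   pMessage = new (pOrgMessage) RadiusMessage();   break;
    case DHCP:     pMessage = new (pOrgMessage) DHCPMessage();     break;
    }
    return pMessage; // 0 for an unknown type (the caller must then delete pOrgMessage)
}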
// Operator overload functions (declared static inside class BaseMessage)
void* BaseMessage::operator new(size_t size)
{
    BaseMessage* pMessage = s_plstMessagePool->removeFirst();
    if (!pMessage)
    {
        // Message pool is exhausted: every slot is in use.
        // Handle the scenario; a throwing operator new must not return 0.
        throw std::bad_alloc(); // needs <new>
    }
    return pMessage;
}

void* BaseMessage::operator new(size_t size, BaseMessage* pOrgMessage)
{
    // No allocation: return the existing slot. The new-expression then runs
    // the derived class constructor in it, which also sets the vptr.
    return pOrgMessage;
}

void BaseMessage::operator delete(void* pData, size_t)
{
    // return the message slot to the pool
    BaseMessage* pMessage = (BaseMessage*) pData;
    s_plstMessagePool->addFirst(pMessage);
}

bool BaseMessage::ReadMessage()
{
    // read the data from the socket into m_pPayload
    // check the type, then rebuild this slot as the matching child class object
    BaseMessage* pActualMessage = BaseMessage::CreateMessage(this);

    // then pass pActualMessage to a worker thread, which calls
    // pActualMessage->ProcessMessage() (dispatched to ChildClass::ProcessMessage)
    return pActualMessage != 0;
}

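// In the class: virtual bool ProcessMessage(); ("virtual" is not repeated here)
bool BaseMessage::ProcessMessage()
{
    return false; // not pure virtual: plain BaseMessage objects are created from the pool
}

// all derived classes should implement ProcessMessage()
bool DNSMessage::ProcessMessage()
{
    // Do the actual business logic
    // Construct the response message OR set the route and forward it to another server
    WriteMessage(); // SendResponse()
    return true;
}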
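Because ReadMessage hands the message to a worker thread, a slot is usually taken from the pool on the select() thread and returned on a worker thread, so the pool itself needs synchronization. This is the same issue the article raises for malloc(), although the critical section here is only a pointer push or pop. A minimal sketch, assuming a single std::mutex is acceptable; LockedPool and StressPool are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Sketch (hypothetical names): a free-list pool guarded by one mutex, so the
// select() thread can take slots while worker threads give them back.
// slotSize is assumed to be a multiple of alignof(std::max_align_t).
class LockedPool {
public:
    LockedPool(std::size_t slotSize, std::size_t count)
        : storage_(new unsigned char[slotSize * count]) {
        free_.reserve(count);
        for (std::size_t i = 0; i < count; ++i)
            free_.push_back(storage_.get() + i * slotSize);
    }

    void* Acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_.empty()) return nullptr;  // exhausted: caller drops or rejects the request
        void* p = free_.back();
        free_.pop_back();
        return p;
    }

    // Capacity was reserved up front, so releasing each slot once never reallocates.
    void Release(void* p) {
        assert(p != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(static_cast<unsigned char*>(p));
    }

    std::size_t Available() {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    std::unique_ptr<unsigned char[]> storage_;
    std::mutex mutex_;
    std::vector<unsigned char*> free_;
};

// Several threads repeatedly take a slot, "process" it, and give it back.
inline void StressPool(LockedPool& pool, int threads, int iterations) {
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&pool, iterations] {
            for (int i = 0; i < iterations; ++i) {
                void* slot = pool.Acquire();
                if (slot != nullptr) pool.Release(slot);
            }
        });
    }
    for (std::thread& w : workers) w.join();
}
```

A single mutex is the simplest option; per-thread free lists or a lock-free stack are alternatives if this lock turns out to be contended.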

Conclusion and Recommendations

          Using this approach, we can improve server performance and bound the memory used for messages, so a traffic burst exhausts the message pool instead of the process's memory.
          In a sample application on a Linux VM, we expect this approach to give a 5x to 7x performance improvement. The actual number will vary with how many memory allocations/deallocations happen while processing each incoming message in your application.
          This problem and solution apply to the following application-level products in the network/telecommunication domain:
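Since the gain depends on the allocation pattern, it is worth measuring on your own target. Below is a rough sketch of a timing harness (CompareHeapAndPool and Timings are hypothetical names) that times new[]/delete[] pairs against pops and pushes on a pre-built free list; it makes no claim about what the result will be.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical timing harness, not the article's benchmark.
struct Timings {
    double heapMs;  // N new[]/delete[] pairs
    double poolMs;  // N pop/push pairs on a pre-built free list
};

inline Timings CompareHeapAndPool(std::size_t iterations, std::size_t size) {
    assert(size > 0);
    using Clock = std::chrono::steady_clock;
    volatile unsigned char sink = 0;  // makes the loops observable to the optimizer

    const auto t0 = Clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        unsigned char* p = new unsigned char[size];
        p[0] = static_cast<unsigned char>(i);
        sink = static_cast<unsigned char>(sink + p[0]);
        delete[] p;
    }
    const auto t1 = Clock::now();

    // Pool side: 16 slots carved from one buffer, allocated before timing starts.
    std::vector<unsigned char> chunk(size * 16);
    std::vector<unsigned char*> freeList;
    for (std::size_t k = 0; k < 16; ++k) freeList.push_back(chunk.data() + k * size);

    const auto t2 = Clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        unsigned char* p = freeList.back();
        freeList.pop_back();
        p[0] = static_cast<unsigned char>(i);
        sink = static_cast<unsigned char>(sink + p[0]);
        freeList.push_back(p);
    }
    const auto t3 = Clock::now();

    const std::chrono::duration<double, std::milli> heap = t1 - t0;
    const std::chrono::duration<double, std::milli> pool = t3 - t2;
    return {heap.count(), pool.count()};
}
```

This loop is a best case for malloc as well (one size, one thread, immediate free), and compilers are allowed to elide a new/delete pair, so treat the numbers as indicative and confirm with a profiler on the real message path.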
-      Load Balancer
-      Routing Agent
-      AAA Server
-      Billing Server
-      Amplifier/Re-Director
-      Web Server
-      Policy Manager
Message pool size and payload buffer size should be configurable. Choose suitable values for these properties, especially if the product is a 32-bit application, where the address space is limited.
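The memory the pool reserves is PoolSize * (message class size + payload buffer size), and a 32-bit process typically has only 2 to 3 GiB of user address space. A small compile-time check, assuming hypothetical helper names and example sizes (a 64-byte message class and a 4,096-byte payload), shows how quickly that fills:

```cpp
#include <cstdint>

// Hypothetical helpers for sizing the pool. 64-bit arithmetic keeps the
// product exact even when the program itself is built as 32-bit.
constexpr std::uint64_t PoolBytes(std::uint64_t poolSize,
                                  std::uint64_t messageClassSize,
                                  std::uint64_t payloadSize) {
    return poolSize * (messageClassSize + payloadSize);
}

// True if the pool fits in the share of the address space budgeted for it.
constexpr bool FitsBudget(std::uint64_t poolBytes, std::uint64_t budgetBytes) {
    return poolBytes <= budgetBytes;
}

// Worked example: 100,000 messages * (64 + 4,096) bytes = 416,000,000 bytes (about 397 MiB).
static_assert(PoolBytes(100000, 64, 4096) == 416000000ULL, "about 397 MiB");
// Ten times as many messages (4.16e9 bytes) cannot fit in a 2 GiB budget.
static_assert(!FitsBudget(PoolBytes(1000000, 64, 4096), 2ULL << 30), "too big for 32-bit");
```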
Instead of having a single message pool for the entire application, we can create one message pool per client/remote server. Then a remote server that responds very slowly can exhaust only its own pool, not the pool for the entire application.
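One way to sketch per-server pools, assuming remote servers are identified by a string key and each gets the same slot size and slot count (SlotPool, PerServerPools and the server names in the usage are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical names throughout. One bounded pool per remote server, so a slow
// server can exhaust only its own slots. slotSize is assumed to be a multiple
// of alignof(std::max_align_t).
class SlotPool {
public:
    SlotPool(std::size_t slotSize, std::size_t count)
        : storage_(new unsigned char[slotSize * count]) {
        free_.reserve(count);
        for (std::size_t i = 0; i < count; ++i)
            free_.push_back(storage_.get() + i * slotSize);
    }
    void* Acquire() {
        if (free_.empty()) return nullptr;  // this server's pool is exhausted
        void* p = free_.back();
        free_.pop_back();
        return p;
    }
    void Release(void* p) {
        assert(p != nullptr);
        free_.push_back(static_cast<unsigned char*>(p));
    }

private:
    std::unique_ptr<unsigned char[]> storage_;
    std::vector<unsigned char*> free_;
};

class PerServerPools {
public:
    PerServerPools(std::size_t slotSize, std::size_t slotsPerServer)
        : slotSize_(slotSize), slotsPerServer_(slotsPerServer) {}

    // Returns the pool for a server, creating it on first use.
    SlotPool& For(const std::string& server) {
        auto it = pools_.find(server);
        if (it == pools_.end())
            it = pools_.emplace(server, std::make_unique<SlotPool>(slotSize_, slotsPerServer_)).first;
        return *it->second;
    }

private:
    std::size_t slotSize_;
    std::size_t slotsPerServer_;
    std::map<std::string, std::unique_ptr<SlotPool>> pools_;
};
```

In a real server the pools would be created at startup from configuration, so For() never allocates on the request path; the lazy creation here only keeps the sketch short.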