For most of my career I have had the opportunity to work on high-performance, high-availability systems. Data trickling is a technique I came across, then architected and implemented, to address performance issues in those systems. In the following paragraphs I explain this pattern.
Problem: Transaction Time = Processing Time + Persistence Time
According to this formula, the time taken to persist data directly affects the transaction time, and persisting data in an RDBMS is costly. A single transaction may update multiple tables, and if the RDBMS has been tuned for read access with indexes, every update takes longer because those indexes must be maintained as well. RDBMS servers are also usually deployed separately from the application servers in a distributed architecture, so every persistence call crosses the network and the network latency adds to the transaction time. Yet a transaction is only successful once its data is in persistent storage: without persistence, the system cannot guarantee recovery after a failover or a bad state.
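To make the formula concrete, the short Python sketch below plugs in hypothetical numbers: 5 ms of processing and three table updates, each costing 4 ms of write time plus a 1 ms network round trip. The numbers and the one-round-trip-per-table cost model are illustrative assumptions, not measurements.

```python
# Illustrative only: every latency below is a hypothetical number, not a measurement.

def transaction_time(processing_ms: float, persistence_ms: float) -> float:
    """Transaction Time = Processing Time + Persistence Time."""
    return processing_ms + persistence_ms


def remote_rdbms_persistence_ms(tables: int, write_ms: float, network_rtt_ms: float) -> float:
    """Simplified cost model: one network round trip per table update, plus the
    write itself (which grows with the number of indexes to maintain)."""
    return tables * (write_ms + network_rtt_ms)


processing = 5.0
persistence = remote_rdbms_persistence_ms(tables=3, write_ms=4.0, network_rtt_ms=1.0)
total = transaction_time(processing, persistence)
print(f"persistence={persistence} ms, total={total} ms")  # persistence=15.0 ms, total=20.0 ms
```

In this made-up case persistence accounts for 15 of the 20 ms, which is the part of the transaction the trickle pattern tries to take off the critical path.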
Solution: Store the transaction object as-is in a temporary store and update the RDBMS in a parallel batch process. The transaction completes as soon as the data is stored in the intermediate storage.
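A minimal Python sketch of this write path, assuming a JSON-lines journal file as the temporary store and a caller-supplied `flush_batch` callback standing in for the RDBMS batch update (both, like the `TrickleWriter` name, are hypothetical). `commit()` returns once the record has been fsynced to the journal; a background thread delivers the queued records in batches.

```python
import json
import os
import queue
import tempfile
import threading
from typing import Callable, List


class TrickleWriter:
    """Sketch of the trickle write path; class and parameter names are hypothetical.

    commit() returns as soon as the transaction object is durable in a local
    journal file; a background thread hands queued records to `flush_batch`
    (the RDBMS batch update) in batches of up to `batch_size`.
    """

    def __init__(self, journal_path: str,
                 flush_batch: Callable[[List[dict]], None],
                 batch_size: int = 100) -> None:
        self._journal_path = journal_path
        self._flush_batch = flush_batch
        self._batch_size = batch_size
        self._pending: queue.Queue = queue.Queue()
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def commit(self, txn: dict) -> None:
        with self._lock:
            # Store the transaction object as-is and force it to disk.
            with open(self._journal_path, "a", encoding="utf-8") as f:
                f.write(json.dumps(txn) + "\n")
                f.flush()
                os.fsync(f.fileno())
            # Hand it to the parallel batch process (in memory only).
            self._pending.put(txn)
        # Returning here completes the caller's transaction.

    def close(self) -> None:
        self._pending.put(None)  # sentinel: flush what is queued, then stop
        self._worker.join()

    def _run(self) -> None:
        while True:
            batch = [self._pending.get()]  # block until something arrives
            # Grab whatever else is already queued, up to batch_size items.
            while len(batch) < self._batch_size:
                try:
                    batch.append(self._pending.get_nowait())
                except queue.Empty:
                    break
            stop = batch[-1] is None  # the sentinel is always the last item queued
            records = [t for t in batch if t is not None]
            if records:
                self._flush_batch(records)
            if stop:
                return


if __name__ == "__main__":
    journal = os.path.join(tempfile.mkdtemp(), "trickle.journal")
    received: List[dict] = []
    writer = TrickleWriter(journal, flush_batch=received.extend, batch_size=10)
    for i in range(25):
        writer.commit({"txn_id": i, "amount": i * 10})
    writer.close()
    print(len(received))  # 25
```

The in-memory queue is lost on a crash; the journal is what makes the transaction durable, so a real trickle process would recover by re-reading the journal from its last checkpoint rather than relying on the queue.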
Trickle Server: the trickle server is an intermediate persistent store and persistence manager that sits between the business logic and the RDBMS.
The trickle server keeps its own persistent storage for the immediate data updates. Most of the time this is a flat file system or a lightweight database such as Derby or HSQL.
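The trickle server's store can be hidden behind a small interface so that a flat-file or an embedded-database implementation can be swapped in. The sketch below is hypothetical: `TrickleStore` and its methods are illustrative names, and Python's built-in `sqlite3` stands in for Derby or HSQL (which are Java databases) because it needs no extra dependencies.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import List, Tuple


class TrickleStore(ABC):
    """Hypothetical interface for the trickle server's intermediate store."""

    @abstractmethod
    def append(self, payload: bytes) -> int:
        """Durably store one transaction object; return its sequence number."""

    @abstractmethod
    def read_after(self, seq: int, limit: int) -> List[Tuple[int, bytes]]:
        """Return up to `limit` records with a sequence number > seq, in order."""


class EmbeddedDbTrickleStore(TrickleStore):
    """Lightweight-database variant; sqlite3 stands in for Derby/HSQL."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute("PRAGMA synchronous=FULL")  # sync to disk at every commit
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS trickle ("
            " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
            " payload BLOB NOT NULL)"
        )
        self._conn.commit()

    def append(self, payload: bytes) -> int:
        with self._conn:  # commits on success, rolls back on error
            cur = self._conn.execute(
                "INSERT INTO trickle (payload) VALUES (?)", (payload,))
        return cur.lastrowid

    def read_after(self, seq: int, limit: int) -> List[Tuple[int, bytes]]:
        cur = self._conn.execute(
            "SELECT seq, payload FROM trickle WHERE seq > ? ORDER BY seq LIMIT ?",
            (seq, limit),
        )
        return cur.fetchall()

    def close(self) -> None:
        self._conn.close()


if __name__ == "__main__":
    store = EmbeddedDbTrickleStore(":memory:")
    for payload in (b"txn-1", b"txn-2", b"txn-3"):
        store.append(payload)
    print(store.read_after(1, limit=10))  # [(2, b'txn-2'), (3, b'txn-3')]
    store.close()
```

Each record gets a monotonically increasing sequence number, which gives the batch process a natural checkpoint: it only needs to remember the last sequence number it pushed to the RDBMS.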
Implementation: Use a flat file system as the trickle store and implement transaction management using checkpoints. Periodically read the flat files and push the data to the RDBMS in batch updates, either directly into the relevant table(s) or through stored procedure(s).
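A sketch of the checkpointed batch update, assuming the flat file is a JSON-lines journal of `{"txn_id": ..., "amount": ...}` records and that the checkpoint is simply the byte offset already pushed to the RDBMS. The file layout, the `payments` table, and the function names are hypothetical; an in-memory `sqlite3` database plays the role of the RDBMS, and `executemany` stands in for whatever batch statement or stored-procedure call the real database uses.

```python
import json
import os
import sqlite3
import tempfile


def load_checkpoint(path: str) -> int:
    """Journal byte offset up to which records are already in the RDBMS."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0


def save_checkpoint(path: str, offset: int) -> None:
    # Write a temp file, fsync it, then rename it over the old checkpoint so a
    # crash never leaves a half-written checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(str(offset))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def trickle_once(journal: str, checkpoint: str, db: sqlite3.Connection,
                 batch_size: int = 500) -> int:
    """Copy up to batch_size journal records written after the checkpoint
    into the RDBMS, advance the checkpoint, and return the record count."""
    start = load_checkpoint(checkpoint)
    end = start
    rows = []
    with open(journal, "rb") as f:
        f.seek(start)
        for line in f:
            if not line.endswith(b"\n"):
                break  # a write is still in progress; pick it up next time
            txn = json.loads(line)
            rows.append((txn["txn_id"], txn["amount"]))
            end += len(line)
            if len(rows) >= batch_size:
                break
    if rows:
        with db:  # one RDBMS transaction per batch
            # Upsert keyed by txn_id: if we crash after this commit but before
            # the checkpoint is saved, replaying the batch is harmless.
            db.executemany(
                "INSERT OR REPLACE INTO payments (txn_id, amount) VALUES (?, ?)", rows)
        save_checkpoint(checkpoint, end)
    return len(rows)


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    journal = os.path.join(work, "trickle.journal")
    checkpoint = os.path.join(work, "trickle.checkpoint")
    with open(journal, "w", encoding="utf-8") as f:
        for i in range(5):
            f.write(json.dumps({"txn_id": i, "amount": i * 10}) + "\n")
    db = sqlite3.connect(":memory:")  # stands in for the real RDBMS
    db.execute("CREATE TABLE payments (txn_id INTEGER PRIMARY KEY, amount INTEGER)")
    while trickle_once(journal, checkpoint, db, batch_size=2):
        pass
    print(db.execute("SELECT COUNT(*), SUM(amount) FROM payments").fetchone())  # (5, 100)
```

The order of the two commits matters: the RDBMS batch is committed first and the checkpoint second. A crash in between replays the last batch, which is why the insert is an idempotent upsert keyed by the transaction id.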
Optimize: Split the trickle store into multiple flat files, rolling over to a new file whenever the current one exceeds a size limit (the overflow factor). Write the data in binary form. Use dedicated hard drives to optimize the IO calls. Sync the intermediate store with a remote backup to survive failover situations.
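A sketch of the first two optimizations, under assumed names and an assumed file format: the store is split into numbered segment files, each record is written as a 4-byte length prefix followed by its raw bytes, and a new segment is started whenever appending a record would push the active one past `max_segment_bytes`. Dedicated drives and remote backup are deployment decisions and are not shown.

```python
import os
import struct
import tempfile
from typing import Iterator, List

HEADER = struct.Struct(">I")  # 4-byte big-endian length prefix per record


class SegmentedBinaryLog:
    """Trickle store split into numbered binary segment files (hypothetical names).

    Each record is stored as <4-byte length><payload bytes>. When appending a
    record would push the active segment past max_segment_bytes, the log
    rolls over to a new segment file.
    """

    def __init__(self, directory: str, max_segment_bytes: int = 64 * 1024 * 1024) -> None:
        self.directory = directory
        self.max_segment_bytes = max_segment_bytes
        os.makedirs(directory, exist_ok=True)
        existing = self.segments()
        # After a restart, continue in the newest segment ("segment-00000003.bin" -> 3).
        self._index = int(existing[-1][len("segment-"):-len(".bin")]) if existing else 0

    def _path(self, index: int) -> str:
        return os.path.join(self.directory, f"segment-{index:08d}.bin")

    def segments(self) -> List[str]:
        # Zero-padded names make lexical order equal to segment order.
        return sorted(n for n in os.listdir(self.directory)
                      if n.startswith("segment-") and n.endswith(".bin"))

    def append(self, payload: bytes) -> None:
        record = HEADER.pack(len(payload)) + payload
        path = self._path(self._index)
        size = os.path.getsize(path) if os.path.exists(path) else 0
        if size > 0 and size + len(record) > self.max_segment_bytes:
            self._index += 1  # overflow: start a fresh segment
            path = self._path(self._index)
        with open(path, "ab") as f:
            f.write(record)
            f.flush()
            os.fsync(f.fileno())

    def read_all(self) -> Iterator[bytes]:
        for name in self.segments():
            with open(os.path.join(self.directory, name), "rb") as f:
                while True:
                    header = f.read(HEADER.size)
                    if len(header) < HEADER.size:
                        break  # end of segment (or a torn header)
                    (length,) = HEADER.unpack(header)
                    payload = f.read(length)
                    if len(payload) < length:
                        break  # torn write at the tail of the segment
                    yield payload


if __name__ == "__main__":
    log = SegmentedBinaryLog(tempfile.mkdtemp(), max_segment_bytes=20)
    for payload in (b"a" * 10, b"b" * 10, b"c" * 10):
        log.append(payload)  # 14 bytes per record, so each lands in its own segment
    print(log.segments())  # ['segment-00000000.bin', 'segment-00000001.bin', 'segment-00000002.bin']
```

Because a segment is never written again once it overflows, fully trickled segments can be copied to the remote backup or deleted as whole files, and `read_all` skips the rest of a segment if it finds a torn record at its tail.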
Optional: To further reduce processing time by minimizing RDBMS calls during processing, use a cache or an internal store to hold data in memory. Fill the internal store at server startup, and use parallel and on-demand loading to reduce the server startup time.
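A minimal sketch of such an internal store, with hypothetical names: `warm_up()` loads a list of hot keys in parallel at startup using a thread pool, and `get()` falls back to loading a missing key on demand. The `loader` callable stands in for the RDBMS query.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Generic, Hashable, Iterable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class InternalStore(Generic[K, V]):
    """Hypothetical in-memory store in front of the RDBMS.

    warm_up() pre-loads hot keys in parallel at startup; get() loads any
    missing key on demand, so startup does not have to load everything.
    """

    def __init__(self, loader: Callable[[K], V], workers: int = 4) -> None:
        self._loader = loader  # e.g. a function that runs a SELECT against the RDBMS
        self._workers = workers
        self._data: Dict[K, V] = {}
        self._lock = threading.Lock()
        self.loads = 0  # number of loader (RDBMS) calls made, for illustration

    def _load(self, key: K) -> V:
        value = self._loader(key)  # called outside the lock so loads run in parallel
        with self._lock:
            self.loads += 1
            self._data.setdefault(key, value)  # keep the first value if two loads raced
            return self._data[key]

    def warm_up(self, keys: Iterable[K]) -> None:
        with ThreadPoolExecutor(max_workers=self._workers) as pool:
            list(pool.map(self._load, keys))  # list() re-raises any loader error

    def get(self, key: K) -> V:
        with self._lock:
            if key in self._data:
                return self._data[key]
        return self._load(key)  # demand loading on a cache miss


if __name__ == "__main__":
    store = InternalStore(lambda customer_id: {"id": customer_id}, workers=4)
    store.warm_up(range(100))  # hot keys, loaded in parallel
    print(store.loads)  # 100
    store.get(5000)  # cold key, loaded on demand
    print(store.loads)  # 101
```

Only the hot keys need to be loaded before the server starts accepting work; everything else is fetched on first use, and `warm_up` itself could be run on a background thread if even that delay is too long.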
