A method and system for fast failure detection in a distributed computer system. The method includes executing a distributed computer system having a plurality of clusters comprising at least a first cluster, a second cluster, and a third cluster, and initializing failure detection by creating a connected cluster list in each of the plurality of clusters, wherein, for each one of the plurality of clusters, the respective connected cluster list describes the other clusters with which that cluster is communicatively connected. A status update message is sent upon changes in connectivity between the plurality of clusters, and an updated connected cluster list is generated in each of the plurality of clusters in accordance with the status update message. The method then determines whether the change in connectivity results from a cluster failure by examining the updated connected cluster list in each of the plurality of clusters.
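The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Cluster`, `init_failure_detection`, `apply_status_update`, and `classify` are hypothetical, a full mesh between clusters is assumed at startup, and the decision rule (a cluster missing from every other cluster's list is treated as failed, while one missing from only some lists is treated as a link failure) is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    # The "connected cluster list": names of peers this cluster can reach.
    connected: set = field(default_factory=set)


def init_failure_detection(names):
    """Create a connected cluster list in every cluster (full mesh assumed)."""
    return {n: Cluster(n, {m for m in names if m != n}) for n in names}


def apply_status_update(clusters, reporter, peer, is_up):
    """Apply a status update: `reporter` saw its link to `peer` go up or down."""
    if is_up:
        clusters[reporter].connected.add(peer)
    else:
        clusters[reporter].connected.discard(peer)


def classify(clusters, suspect):
    """Examine every other cluster's updated list (assumed decision rule)."""
    others = [c for c in clusters.values() if c.name != suspect]
    missing = [c.name for c in others if suspect not in c.connected]
    if not missing:
        return "healthy"
    # Missing everywhere: treat as a cluster failure; otherwise a link failure.
    return "cluster_failure" if len(missing) == len(others) else "link_failure"


clusters = init_failure_detection(["first", "second", "third"])
apply_status_update(clusters, "first", "third", is_up=False)
after_one_report = classify(clusters, "third")   # only "first" lost the link
apply_status_update(clusters, "second", "third", is_up=False)
after_two_reports = classify(clusters, "third")  # every peer lost the link
print(after_one_report, after_two_reports)
```

With three clusters, a single lost link leaves the suspect visible in at least one other list, so the sketch reports a link failure; only when every remaining cluster has dropped the suspect does it report a cluster failure.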
Oracle Mar 2010 - Jul 2012
Project Lead
Symantec Jul 2004 - Mar 2010
Senior Software Engineer
IBM Aug 2000 - Jul 2001
Software Engineer
Apple Aug 2000 - Jul 2001
Technical Architect
Education:
University of Southern California 2004 - 2004
Master of Science, Masters, Electrical Engineering
Delhi Public School - Ghaziabad
Cummins College of Engineering
Cummins College of Engineering For Women, Pune
Bachelor of Engineering, Bachelors, Computer Engineering
Department of Technology, Savitribai Phule Pune University
Bachelor of Engineering, Bachelors, Computer Engineering
Skills:
Disaster Recovery Testing Cluster Distributed Systems High Availability Unix Solaris C++ SQL Replication Veritas Cluster Server Hibernate JUnit SOA Java Enterprise Edition EMC Storage Solutions RESTful Webservices Spring Framework Web Services