Gretel: Lightweight Fault Localization for OpenStack

Publication
In CoNEXT'16

Like any other distributed system, cloud management stacks such as OpenStack are susceptible to faults whose root cause is often hard to diagnose and may take hours or days to fix. We present GRETEL, a system that leverages nonintrusive system monitoring to expedite root cause analysis of both operational and performance faults manifesting in OpenStack operations. GRETEL uses unique operational fingerprints to quickly identify faulty operations at runtime. GRETEL is accurate in its diagnosis, achieving >98% precision in identifying the faulty operation with very few false positives, even under stress. GRETEL is lightweight and orders of magnitude faster than prior work, sustaining a throughput of ∼77 Mbps.

Ayush Goel
Systems Research Scientist

My research interests include distributed systems, program analysis and (more recently) systems for ML.