James Sanford
https://froop.com/
https://github.com/jamessanford/

Overview:
Linux/UNIX Systems Engineer
Automation of large Linux and container-based "cloud" infrastructures.
Troubleshooting complex systems, from source code to protocol to packets.
Programming fluency in Go, Python, and bash.

Professional Experience:

Systems Reliability Team
CloudFlare, San Francisco, CA
April 2015 - July 2016
- Rewrote parts of the metrics pipeline and functionality probing. (Go, Python)
- Transitioned the monitoring system to Prometheus (time-series and rule-based), replacing the legacy Nagios system.
- Improved automation for bringing up new bare-metal servers. (Python)
- Capacity estimation tooling and dashboard. (Go, Grafana)
- On-call rotation for production HTTP and DNS servers.

Site Reliability Engineering
Google Inc, Mountain View, CA
October 2005 - July 2011
- Borg SRE team: worldwide task scheduling and machine deployments.
- Developed a C++ simulation of the Borg task scheduling system. This allowed measurement and prediction of capacity and risk, testing of datacenter-scale changes, and experiments in bin packing. A team was created to build out software based on this simulation.
- Backend API to manage resource allocation in clusters. (Python)
- Coordination point between operations and engineering teams, helping them use the Borg infrastructure.
- On-call rotation for the Borg service worldwide.

Senior UNIX Systems Architect
Critical Path, San Francisco, CA
December 1999 - May 2002, January 2003 - March 2005
- Software deployment automation system in C and Perl, managing releases to hundreds of servers.
- On-call rotation for the hosted email platform.

References:
Acknowledged in the paper "Large-scale cluster management at Google with Borg" for work on task scheduling simulation.
https://research.google.com/pubs/pub43438.html