Site Reliability Engineer
US-VA-McLean
External
Req #: 8728
Type: Regular Full-Time
Overview: Credence is seeking a Site Reliability Engineer to support a task order within GSA COMET II.

Qualifications:
* Bachelor's or Master's degree in computer science or another highly technical or scientific discipline
* Ability to program (structured and OO) in one or more high-level languages, such as Python, Java, C/C++, Ruby, and JavaScript
* Experience with cloud storage technologies as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN)
* A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
* 5+ years of experience with cloud architecture, preferably AWS
* 10+ years of experience operating enterprise systems with over a million users
* 10+ years of experience with application development
* 5+ years of experience in DevSecOps
* 3+ years of experience with microservices
* 5+ years of experience leading teams
* 3+ years of experience with Agile

Role & Responsibilities:
* Run the production environment by monitoring availability and taking a holistic view of system health
* Build software and systems to manage and operate platform infrastructure and applications
* Improve the reliability, quality, and time-to-market of our suite of software solutions
* Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
* Provide primary operational support and engineering for multiple large distributed software applications
* Ensure production readiness for releases, including performance and usability testing
* Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
* Partner with development teams to improve services through rigorous testing and release procedures
* Participate in system design consulting, platform management, and capacity planning
* Create sustainable systems and services through automation and uplifts
* Balance feature development speed and reliability with well-defined service level objectives
* Perform root cause analyses (RCAs) for production incidents and conduct post-incident reviews
* Optimize on-call rotations and processes
* Keep documentation and runbooks up to date
* US citizenship is required, along with the ability to obtain and maintain a clearance if required