Site Reliability Engineer

US-VA-McLean

External

Req #: 8728
Type: Regular Full-Time

Credence Management Solutions, LLC

Overview:

Credence is seeking a Site Reliability Engineer to support a task order within GSA COMET II.

Qualifications:

* Bachelor's or Master's degree in computer science or another highly technical or scientific discipline
* Ability to program (structured and object-oriented) in one or more high-level languages, such as Python, Java, C/C++, Ruby, or JavaScript
* Experience with cloud storage technologies as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN)
* A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
* 5+ years of experience with cloud architecture, preferably AWS
* 10+ years of experience operating enterprise systems with over a million users
* 10+ years of experience with application development
* 5+ years of experience in DevSecOps
* 3+ years of experience with microservices
* 5+ years of experience leading teams
* 3+ years of experience with Agile methodologies

Role & Responsibilities:

* Run the production environment by monitoring availability and taking a holistic view of system health
* Build software and systems to manage/operate platform infrastructure and applications
* Improve reliability, quality, and time-to-market of our suite of software solutions
* Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
* Provide primary operational support and engineering for multiple large distributed software applications
* Ensure production readiness for releases, including performance and usability testing
* Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
* Partner with development teams to improve services through rigorous testing and release procedures
* Participate in system design consulting, platform management, and capacity planning
* Create sustainable systems and services through automation and uplifts
* Balance feature development speed and reliability with well-defined service level objectives
* Perform root cause analyses (RCAs) of production incidents and conduct post-incident reviews
* Optimize on-call rotations and processes
* Keep documentation and runbooks up to date
* US citizenship is required, along with the ability to obtain and maintain a security clearance if needed
			