Staff Site Reliability Engineer

US-TX-Plano

Req #: 93187
Type: Fulltime-Regular

DISH

Overview:

Our Technology teams challenge the status quo and reimagine capabilities across industries. Whether through research and development, technology innovation or solution engineering, our team members play a vital role in connecting consumers with the products and platforms of tomorrow.

Responsibilities:

Candidates must be willing to participate in at least one in-person interview, which may include a live whiteboarding or technical assessment session.

Key Responsibilities:

* Implement and maintain monitoring, alerting, observability, and distributed tracing solutions to improve visibility into system health, reduce mean time to detect (MTTD), and enable proactive issue detection.

* Lead incident response efforts, participate in on-call rotations, conduct root cause analysis, and facilitate blameless post-mortems to drive continuous improvement and reduce MTTR.

* Develop, document, and maintain incident response procedures, runbooks, and SOPs to ensure rapid, consistent responses to critical incidents.

* Define, measure, and report SLIs/SLOs for retail wireless and supporting applications, implementing SRE best practices such as error budgets and chaos engineering to enhance system reliability.

* Drive automation initiatives by developing tools and solutions to streamline operational tasks, reduce manual effort, and eliminate toil.

* Collaborate across teams to embed reliability into the software development lifecycle, ensure production readiness of new features, optimize performance, plan capacity, and mentor junior engineers.

Qualifications:

Education and Experience:

* Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
* 6+ years of experience as a Site Reliability Engineer, DevOps Engineer, or similar role in a fast-paced, high-availability environment. Experience in the telecommunications or retail industry is a strong plus

Skills and Qualifications:

* Strong experience with monitoring and observability tools, preferably Dynatrace, or comparable tools such as Prometheus, Grafana, Splunk, ELK Stack, Datadog, or AppDynamics.

* Proficient in programming and scripting languages relevant to SRE work, such as Python, Go, Java, Ruby, or Bash, with strong SQL skills and experience with relational and NoSQL databases such as Oracle, PostgreSQL, MySQL, and Cassandra.

* Extensive experience with cloud platforms, preferably AWS, and familiarity with services from Azure or Google Cloud; certifications such as AWS Certified Solutions Architect - Associate are a plus.

* Solid understanding of microservices, REST APIs, and containerization technologies such as Docker; Kubernetes experience is preferred.

* Familiar with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI, CircleCI), and experienced with retail wireless systems such as billing and activation platforms.

* Strong problem-solving, analytical, and debugging skills, with excellent communication and collaboration abilities; adaptable to dynamic environments and willing to support on-call rotations and weekend coverage.
			
Share this job: