Overview:
Our Technology teams challenge the status quo and reimagine capabilities across industries. Whether through research and development, technology innovation, or solution engineering, our team members play a vital role in connecting consumers with the products and platforms of tomorrow.
Responsibilities:
Candidates must be willing to participate in at least one in-person interview, which may include a live whiteboarding or technical assessment session.
As a Lead Data Engineer, you will maintain and optimize petabyte-scale data architectures within a complex enterprise cloud environment. You will ensure the continuous reliability and quality of data used for high-stakes business reporting by leading operational support and root cause analysis for critical ETL jobs. By integrating Generative AI tools and Infrastructure-as-Code, you will drive the next generation of data engineering efficiency across the company.
What Success Looks Like (Objectives)
* Monitor and provide high-level operational support for large enterprise data warehouse systems, resolving complex ETL failures and ensuring data quality
* Maintain and optimize scalable batch and streaming data pipelines on the AWS platform, leveraging S3, Glue, and Snowflake/Databricks for peak performance
* Lead incident management and root cause analysis initiatives to develop robust operational metrics and drive continuous improvement of production systems
* Partner with cross-functional Agile teams, including Data Scientists and DevOps, to implement sophisticated ETL/ELT transformation logic
* Manage CI/CD pipelines and Infrastructure-as-Code using GitLab while exploring Generative AI integration points such as Amazon Q for pipeline optimization
* Participate in shift-based working hours and on-call support to guarantee the continuous reliability and performance of enterprise data systems
Qualifications:
Core Skills and Competencies (What you'll bring)
* Expertise in architecting high-performance data pipelines using PySpark and Spark SQL, with a focus on cost optimization and query performance tuning
* Advanced proficiency in developing parameterized SQL scripts and orchestration logic to automate sophisticated end-to-end data processing and business reporting
* Strategic mastery of AWS data services (EC2, EMR, Glue, S3) and core cloud architecture components including VPC, IAM, CloudWatch, and data lake frameworks
* Professional expertise in designing and managing workflow orchestration using tools such as Control-M or Apache Airflow to support automated data processing
* Deep understanding of dimensional and 3NF data modeling concepts to ensure high-quality and performant data solutions
* AI innovation skills to evaluate findings from Proof-of-Concept (POC) initiatives and apply tools such as Amazon Q, Gemini, or Databricks Genie Rooms to data engineering workflows
* Critical experience supporting the production operations of large-scale Enterprise Data Warehouses and architecting petabyte-scale ingestion pipelines
Additional Qualifications
Successful candidates will typically have:
* Proven ability to evaluate and communicate findings from POC initiatives to stakeholders
* Strong analytical and problem-solving skills with a track record of driving continuous improvement in production systems
Minimum Requirements
* Bachelor's degree in Computer Science or a related technical field
* 5+ years of professional experience in the operation and production support of large Enterprise Data Warehouses
* Hands-on experience with AWS data services and data platforms (Snowflake or Databricks)
* Proficiency with Git/GitLab for version control and CI/CD processes
* Knowledge of Infrastructure-as-Code tools such as Terraform or AWS CloudFormation