Data Engineering Manager
IN-Pune
Global Careers (External)
Req #: 10455
Type: Full-Time
Overview: Data Engineering Manager

Responsibilities:

Data Engineering Leadership
* Lead and mentor a team of data engineers in developing and managing scalable, secure, and high-performance data pipelines.
* Define best practices for data ingestion, transformation, and processing in a Lakehouse architecture.
* Drive automation, performance tuning, and cost optimization in cloud data solutions.

Cloud Data Infrastructure & Processing
* Architect and manage AWS-based big data solutions (EMR, EKS, Glue, Redshift).
* Design and maintain Apache Airflow workflows for data orchestration.
* Optimize Spark and other distributed data processing frameworks for large-scale workloads.
* Implement streaming solutions (Kafka, Kinesis, Flink) for real-time data processing.

AI/ML & Advanced Analytics
* Collaborate with Data Scientists and AI/ML teams to build and deploy machine learning models using AWS SageMaker.
* Support feature engineering, model training, and inference pipelines at scale.
* Enable AI-driven analytics by integrating structured and unstructured data sources.

Business Intelligence & Visualization
* Support BI and reporting teams with optimized data models for Amazon QuickSight and other visualization tools.
* Ensure efficient data aggregation and pre-processing for interactive dashboards and self-service analytics.
* Design, develop, and maintain middleware components that facilitate seamless communication between data platforms, applications, and analytics layers.

Master Data Management (MDM) & Governance
* Implement MDM strategies to ensure clean, consistent, and deduplicated data.
* Establish data governance policies for security, privacy, and compliance (GDPR, HIPAA, etc.).
* Ensure adherence to data quality frameworks across structured and unstructured datasets.

Collaboration & Strategy
* Partner with business teams, AI/ML teams, and analysts to deliver high-value data products.
* Define and maintain data architecture strategies aligned with business goals.
* Enable real-time and batch processing for analytics, reporting, and AI-driven insights.

Technical Expertise:
* Extensive AWS experience with services such as EMR, EKS, Glue, Redshift, S3, Lambda, and SageMaker.
* Proficient in big data processing frameworks (e.g., Spark, Hive, Presto) and Lakehouse architectures.
* Skilled in designing and managing Apache Airflow workflows and other orchestration tools.
* Solid understanding of Master Data Management (MDM) and data governance best practices.
* Proficient with SQL and NoSQL databases (e.g., Redshift, DynamoDB, PostgreSQL, Elasticsearch).
* Middleware development: proven expertise in building middleware components, such as REST APIs, that integrate data pipelines with applications, analytics platforms, and real-time systems.
* Hands-on experience with GitLab CI/CD, Terraform, CloudFormation templates (CFT), and other Infrastructure-as-Code (IaC) methodologies.
* Familiarity with AI/ML pipelines, model deployment, and monitoring using SageMaker.
* Experience with data visualization tools, particularly Amazon QuickSight, for business intelligence.

Qualifications:
* Experience with Lakehouse frameworks (Glue Catalog, Iceberg, Delta Lake).
* Expertise in streaming data solutions (Kafka, Kinesis, Flink).
* In-depth understanding of security best practices in AWS data architectures.
* Demonstrated success in driving AI/ML initiatives from ideation to production.

Educational Qualification:
* Bachelor's degree or higher (UG+) in Computer Science, Data Engineering, Aerospace Engineering, or a related field.
* Advanced degrees (Master's, PhD) in Data Science or AI/ML are a plus.