Overview:
We are looking for an experienced Senior LLMOps Engineer to design, implement, and maintain production-grade large-language-model (LLM) pipelines, deployment architectures, and monitoring systems across enterprise environments. The Senior LLMOps Engineer will play a critical role in operationalizing generative AI capabilities, ensuring that LLM-based applications are scalable, secure, reliable, and compliant with emerging AI risk and governance frameworks. The role spans model deployment, orchestration, evaluation, and optimization.
Responsibilities:
* Architect and maintain scalable LLM and RAG pipelines, including model hosting, inference optimization, retrieval layers, and context management frameworks.
* Lead the design and implementation of secure GenAI infrastructure across cloud environments, ensuring reliability, performance, and cost efficiency.
* Build and manage automated evaluation systems that assess LLM output quality, safety, latency, and adherence to AI governance requirements.
* Develop CI/CD workflows tailored for LLM- and GenAI-based applications, including dataset versioning, model lineage, and automated testing of prompt and model behaviors.
* Collaborate with AI Product Engineers and Data Scientists to productionize LLM-based prototypes into enterprise-grade, maintainable systems.
* Integrate vector databases, model gateways, content filters, and guardrail frameworks into end-to-end LLM solutions.
* Implement observability and monitoring solutions that track performance metrics, hallucination rates, cost profiles, and user interaction patterns.
* Lead troubleshooting and root-cause analysis for issues related to LLM deployment, inference performance, or pipeline reliability.
* Stay current with emerging LLM architectures, inference optimizations, fine-tuning techniques, and relevant MLSecOps patterns.
* Ensure compliance with data privacy, ethical AI, and AI-governance frameworks throughout pipeline design and operations.
* Mentor junior engineers and contribute to Steampunk's AI engineering best practices, tooling, and reusable infrastructure patterns.
* Contribute to the growth of our AI & Data Exploitation Practice.
Qualifications:
* Ability to hold a position of public trust with the U.S. government.
* Bachelor's degree and 8 years of relevant professional experience.
* 5+ years of experience in software engineering, data engineering, MLOps, or cloud engineering, with 2+ years focusing specifically on LLM or GenAI operations.
* Strong experience deploying models using frameworks such as Hugging Face Transformers, vLLM, TensorRT-LLM, or similar.
* Proficiency in Python and operational tooling such as FastAPI, PyTorch, LangChain, LlamaIndex, and vector databases (FAISS, Milvus, Pinecone, or similar).
* Advanced knowledge of cloud platforms (AWS, Azure, GCP) including model hosting, distributed compute, and secure networking patterns.
* Hands-on experience building CI/CD pipelines, automated testing frameworks, and environment provisioning for AI/ML workloads.
* Experience with Docker, Kubernetes, and infrastructure-as-code (Terraform, CloudFormation).
* Familiarity with MLSecOps, AI governance, model hardening, prompt injection defenses, and content safety monitoring.
* Strong understanding of logging, observability, and performance profiling for high-throughput LLM inference systems.
* Excellent written and verbal communication skills, with the ability to explain trade-offs and architectural decisions to technical and non-technical stakeholders.
* Demonstrated ability to balance long-term platform thinking with hands-on operations and rapid problem solving.
* Experience working in agile teams and using modern project management tools.