Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date, we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. We stand out as one of the few platforms committed to helping local businesses succeed on a performance basis.

Groupon is on a journey to transform our business with a relentless pursuit of results. Despite our global presence, we maintain a culture that inspires innovation, rewards risk-taking, and celebrates success. We offer resources and scale, combined with autonomy and impactful work.

Principal Site Reliability Engineer

Role Overview:
Are you ready to elevate your expertise and impact the reliability and scalability of mission-critical systems? As a Principal SRE (Level V/VI), you will ensure our platforms' performance, availability, and resilience. You will lead initiatives to redefine operational excellence, collaborate across teams, implement cutting-edge technologies, foster a culture of reliability, and mentor engineers. This is a unique opportunity for someone passionate about solving complex challenges and shaping platform reliability.

Key Responsibilities:
- Architect and maintain fault-tolerant systems, ensuring uptime SLAs of 99.9% or higher.
- Drive automation in infrastructure management and deployment using tools such as Terraform, Ansible, and Kubernetes.
- Create and optimize CI/CD pipelines for reliable, secure software delivery.
- Build and enhance observability solutions, including monitoring, logging, and alerting systems using Prometheus, Grafana, and the ELK stack.
- Collaborate with stakeholders to define and achieve SLIs, SLOs, and error budgets aligned with business needs.
- Lead incident response during on-call rotations, ensuring rapid resolution and root cause analysis.
- Design and execute performance testing, capacity planning, and scalability strategies.
- Identify and resolve bottlenecks to improve system performance and developer efficiency.
- Mentor junior engineers, fostering a collaborative, growth-oriented environment.
- Guide architectural decisions to drive innovation and reliability.

Qualifications:
- 10+ years in systems engineering, including at least 5 years in SRE or DevOps roles.
- Expertise in cloud platforms (GCP, AWS) and container orchestration (Kubernetes, Docker).
- Proficiency in programming/scripting languages such as Python, Go, and Bash.
- Advanced knowledge of Infrastructure as Code (IaC) tools such as Terraform and Ansible.
- Deep understanding of networking, DNS, load balancing, and security principles.
- Proven track record managing high-availability systems in demanding environments.
- Exceptional analytical and problem-solving skills.

Preferred Qualifications:
- Certifications in cloud or container technologies (e.g., AWS/GCP/Azure, Kubernetes CKA).
- Experience in industries such as eCommerce, FinTech, or SaaS.
- Familiarity with Agile development processes and frameworks.

What We Offer:
- The opportunity to work with cutting-edge technologies in a transformative environment.
- A collaborative and innovative work culture that values your expertise and contributions.
- Professional growth and leadership development pathways.
- A chance to leave a lasting impact on system reliability and scalability.

Join us to push the boundaries of platform reliability and drive meaningful change in a fast-evolving digital world!

Groupon’s purpose is to build strong communities through thriving small businesses. Learn more about us and our DEI approach. If this sounds like a great fit, click apply and join us on a mission to be the destination for local experiences and services.

Beware of Recruitment Fraud: Groupon follows a merit-based recruitment process without charging job seekers any fees. Beware of fake job postings and fraudulent interviews.
Always check our official career site at grouponcareers.com for legitimate opportunities.