OBSERVABILITY SPECIALIST

Remote
Kastech Software Solutions Group


Responsibilities:
* Design and implement comprehensive observability strategies and architectures for AWS cloud environments, including metrics, logs, and distributed tracing.
* Configure and maintain observability tools and platforms, ensuring their proper integration with our systems and applications (cloud native and monolithic).
* Develop custom dashboards and alerts to monitor key performance indicators (KPIs) and overall system health.
* Automate the deployment and management of observability infrastructure using Infrastructure as Code (IaC) tools.
* Work closely with development, operations, and engineering teams to understand their observability needs and provide effective solutions.
* Participate in incident resolution, providing observability data and analysis to identify root causes and facilitate recovery.
* Implement and manage observability solutions specifically for containerized environments and orchestration with Elastic Kubernetes Service (EKS).
* Evaluate and recommend new observability tools and technologies to enhance our capabilities.
* Document observability configurations, processes, and best practices.
* Train and support other teams in the use of observability tools and techniques.
* Stay up to date on the latest trends and best practices in observability and cloud technologies.

Requirements:
* Cloud Knowledge and Experience (AWS):
  * A minimum of 6 to 8 years of proven experience working with the Amazon Web Services (AWS) cloud platform.
  * In-depth knowledge of AWS services relevant to observability, such as CloudWatch (Logs, Metrics, Alarms), X-Ray, and potentially others like AWS Observability Service.
  * Understanding of the architecture and design principles of applications in the AWS cloud.
* Infrastructure as Code (IaC):
  * Practical experience in deploying and managing infrastructure using Infrastructure as Code (IaC) tools such as Terraform or similar.
  * Ability to write, maintain, and improve IaC code to automate the creation and configuration of observability infrastructure.
* Elastic Kubernetes Service (EKS):
  * Significant experience in the deployment, management, and observability of containerized applications using Amazon EKS.
  * Deep understanding of Kubernetes concepts and their interaction with AWS.
  * Hands-on experience configuring observability tools specifically for Kubernetes environments, such as Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), Jaeger, etc., within EKS.
* General Observability Experience:
  * Solid understanding of observability principles and best practices (metrics, logs, distributed tracing).
  * Experience with various observability and monitoring tools.
  * Ability to develop effective dashboards and alerts based on observability data.
  * Capacity to analyze observability data to identify performance and availability issues.
* Additional Technical Skills:
  * Ability to develop scripts and automate tasks using languages such as Python, Bash, etc.
  * Knowledge of Linux operating systems.
  * Familiarity with Agile and DevOps methodologies.
* Interpersonal Skills:
  * Strong problem-solving skills and the ability to analyze complex data.
  * Excellent communication and collaboration skills.
  * Ability to work independently and as part of a team.

Nice to have:
* Relevant AWS certifications (e.g., AWS Certified DevOps Engineer – Professional).
* Experience with other container orchestration platforms (e.g., vanilla Kubernetes).
* Knowledge of Site Reliability Engineering (SRE) principles.
* Experience in implementing Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
