OBSERVABILITY ENGINEER

80.000.000 - 120.000.000


Job Summary

The Observability Engineer will design, implement, and maintain observability solutions for complex systems and applications. This role requires a solid understanding of monitoring and observability practices, as well as expertise in the tools and technologies used to collect and analyze performance, logging, and metrics data.

Responsibilities

Monitoring Setup and Configuration: Configure monitoring tools to gather data from systems, applications, and network components. Define metrics, configure data collection agents, and ensure proper connectivity and access.
Alert Management: Monitor alerts, triage them to identify critical issues, analyze alert patterns, and configure escalation workflows to ensure timely response and resolution.
Performance Analysis and Troubleshooting: Use tool features to analyze metrics, logs, and traces. Conduct root cause analysis, troubleshoot issues, and identify areas for optimization.
Incident Response: Collaborate across teams to respond to incidents quickly, handling triage, communication, and coordination with stakeholders. Participate in post-incident reviews to identify improvements.
Dashboard and Visualization: Develop and maintain dashboards and visualizations that offer a consolidated view of system health and performance. Customize dashboards to specific business and operational requirements.
Capacity Planning and Scalability: Monitor resource utilization and trends to forecast capacity needs. Collaborate on resource planning and provisioning to support scalability and optimal performance.
Tool Administration and Maintenance: Perform routine administration tasks for observability tools, including user management, access control, and system upgrades. Monitor the health and availability of these tools.
Documentation and Knowledge Sharing: Document configurations, troubleshooting steps, and best practices. Contribute to knowledge bases and share insights with the team.
Tool Integration and Automation: Integrate observability tools with other systems, including ticketing and incident management platforms. Automate monitoring configurations and reporting to improve efficiency.
Continuous Improvement and Research: Stay updated on observability trends, research new tools and methods, and continuously improve monitoring setups to align with best practices.
Other duties as assigned.

Skills and Experience

Bachelor's degree in computer science or a related technical field preferred.
5+ years of experience in software engineering or IT with a focus on monitoring, alerting, and analysis.
Proficiency in application, cloud infrastructure, and monitoring tool administration.
Hands-on experience with SolarWinds, Elasticsearch (AWS OpenSearch), and similar tools (e.g., Splunk).
Experience with APM tools such as AppDynamics, or alternatives such as Dynatrace or New Relic.
Proficiency in scripting languages (Python, PowerShell preferred) and in working with JSON.
Strong understanding of web services and CI/CD pipelines.
Ability to thrive in a fast-paced environment, with excellent problem-solving, adaptability, and teamwork skills.
Knowledge of Infrastructure as Code (IaC), particularly CDK and Terraform, is highly desirable.
Passion for DevOps, application/API monitoring, automation, and reliability.

About Auxis

This is a full-time position with a Monday-Friday work schedule, with some schedule variations as needed, including an on-call coverage rotation. Occasional night or weekend work may be required for special projects. This position is 100% work from office.
