Sutherland is seeking an Application and System Monitoring Engineer to elevate our existing CloudOps monitoring capabilities. In this role, you will work with a variety of modern tools and technologies to build the next generation of monitoring systems, and you will troubleshoot and resolve issues in our development, test, and production environments. The ideal candidate is comfortable working in a dynamic, complex software environment and is an energetic self-starter who is passionate about building, innovating, and achieving excellence.

Subject Matter Expertise:
- Experience implementing predictive and detailed monitoring.
- Expertise in the Linux command line.
- Design, architect, and implement secure, highly available monitoring infrastructure.
- Enhance monitoring capabilities, including automatic detection of brute-force and password attacks in logs.
- Implement next-generation predictive monitoring solutions to detect and alert on capacity utilization, network issues, and choke points.
- Design, implement, and improve tooling such as Grafana, Prometheus, Loki, Promtail, and node exporter.
- Handle log parsing and management, and configure alerting and notifications (VictorOps/Splunk, email).
- Architect and implement Icinga 2 monitoring and alerting systems.
- Monitor system metrics and perform log parsing.
- Automate tasks using Bash and/or Python scripting.
- Perform predictive monitoring of systems and applications.
- Familiarity with JVM internals, JMX, and REST APIs for monitoring.
- Experience with AWS infrastructure.
- Deep understanding of Java applications, TLS, and Apache.
- Automate performance checks of systems and web applications.
- Problem-solving, troubleshooting, and root cause analysis skills.
- Create and maintain dashboards and reports, integrating data across platforms and tools.
- Assist with scripting and queries for environment self-healing capabilities.
- Strong written, verbal, interpersonal, and presentation skills.
- Effective communication with technical and non-technical stakeholders.
- Customer-focused approach and management skills.
- Stay up to date on the latest monitoring technologies and trends.
- Adhere to configuration, release, and change management protocols.

Skills and Qualifications:
- Bachelor's degree in Computer Science or equivalent experience.
- Experience with monitoring tools in production environments.
- 5+ years of cloud operations experience.
- 5+ years of expertise in the Linux command line.
- 5+ years of experience with Terraform in AWS for automation.
- 5+ years of building production services in AWS.
- 4+ years of scripting experience with Python and Bash.
- Ability to participate in an on-call rotation.
- Knowledge of IT equipment, diagnostic tools, systems analysis, and design.
- Knowledge of information systems, computer technology capabilities, security, and disaster recovery.
- Proficiency with computer operating systems.
- Strong problem-solving, analytical, and troubleshooting skills.
- Excellent communication skills, both oral and written.

Bonus Skills:
- Familiarity with Catchpoint.