Sr Observability Engineer (Open to Remote)
We are a fast-growing FinTech company that is looking for a highly skilled Senior Observability Engineer to join our team. As a Senior Observability Engineer, you will play a pivotal role in designing, implementing, and maintaining our observability infrastructure. Your expertise in Grafana, AWS, Prometheus, and APM tools will drive our efforts in achieving comprehensive monitoring, ensuring optimal performance, and enabling proactive issue resolution.
The Senior Observability Engineer should be quick to grasp new concepts, explore each issue in depth, and persist until its root cause is understood. This role requires strong interpersonal skills due to continual interaction with managers and users with varying technical backgrounds in a fast-paced work environment.
Here is what you need to be successful in the position:
Design, implement, and maintain scalable observability solutions using Grafana, Prometheus, InfluxDB, or similar tools to monitor system performance, health, and availability.
Implement and optimize observability infrastructure on AWS, leveraging various services to enhance monitoring capabilities.
Automate data collection, aggregation, and visualization processes to streamline observability workflows and reduce manual effort.
Collaborate with cross-functional teams to establish best practices, automate processes, and ensure scalability and reliability of monitoring systems.
Lead initiatives to enhance observability, troubleshoot issues, and proactively identify performance bottlenecks.
Mentor team members, providing guidance on observability tools, techniques, and methodologies.
Contribute to architectural decisions and provide insights to improve system performance and reliability.
Stay updated on emerging technologies and industry trends, incorporating new tools and methodologies to improve observability practices.
Let's get more specific on skills:
Proven experience (5+ years) in designing, implementing, and maintaining observability solutions with expertise in Grafana, Prometheus, AWS, and APM tools.
5+ years of experience in software engineering, DevOps, and/or SRE
Experience with infrastructure-as-code (Terraform)
Experience managing containers using Docker, Kubernetes, and ECS
Proficiency in a shell scripting language
Experience with Elastic and AWS CloudWatch
Experience with indexing medium to large datasets
Experience managing and responding to alerts and providing fast feedback
Proficiency with SCM (Git)
Experience with development and deployment in a hosted cloud environment like AWS
Familiar with system performance optimization, scalability, architecture, and design concepts
Technical certification in AWS or related technologies highly desirable
This is required for the job:
Ability to leverage tools to perform day-to-day administration tasks, root-cause analysis and service restoration (such as backup, restore, failover, log interpretation, and performance monitoring)
Knowledge and understanding of IT concepts, best practices and procedures
Self-motivated with the ability to work individually or in a team
Ability to multitask and manage work effectively by prioritizing own assignments, schedules, and meetings, resulting in timely completion of work.
High degree of personal integrity.
The applicant should be eager to learn and obtain technical certification.
Must be able to receive and follow instructions given by management.
Must have the ability to develop solutions to unique problems.
The work environment characteristics described here may be encountered while performing the essential functions of this job. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.
Must be physically able to stoop, bend, and lift up to 20 lbs to place technology supplies, computers, and related equipment onto racks, desks, and counters, into cabinets, and onto storage shelves.
Moderate noise (e.g., business office with computers, phones, and printers; light traffic).
Ability to work in a confined area.
Ability to sit at a computer terminal for an extended period of time. Occasional stooping or kneeling may be necessary.
While performing the duties of this job, the employee is regularly required to stand, sit, talk, hear and use hands and fingers to operate a computer keyboard and telephone.
Specific vision abilities are required by this job due to computer work.
Regular, predictable attendance is required.
Triumph Business Capital, Triumph Bancorp, Inc., and its subsidiaries reserve the right to modify this job description at any time, with or without notice. This job description in no way implies that these are the only duties to be performed by the employee occupying this position. This job description is not an employment contract, implied or otherwise.
Equal Employment Opportunity Statement: Triumph Business Capital, Triumph Bancorp, Inc., and its subsidiaries provide equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, gender, sexual orientation, national origin, age, disability, genetic information, marital status, or status as a covered veteran, in accordance with applicable federal, state, and local laws.
The total salary range for this position is: 141,000 - 218,500 USD Annual
Pay: $140,999 to $218,548/year
Job Reference #: REQ-3346