Fidelity Investments
Principal Site Reliability Engineer
Westlake, TX
Mar 6, 2025

Position Description:

Builds and operates highly resilient platforms in Amazon Web Services (AWS) cloud environments. Coordinates systems using Infrastructure as Code tools (IAM, ARM, Terraform, and Chef). Performs reliability engineering throughout the entire Software Development Lifecycle (SDLC) using Python, Node.js, or Java. Deploys and supports distributed multi-tiered application systems using Kubernetes and Continuous Integration/Continuous Deployment (CI/CD) pipelines. Creates dashboards to capture the latency, availability, error, and saturation (performance) of applications using Splunk, Grafana, Prometheus, Catchpoint, and Datadog. Creates Service-Level Indicator/Service-Level Objective (SLI/SLO) dashboards and automated processes that apply changes to existing dashboards and create new ones. Identifies and resolves application issues using Datadog, Prometheus, and Splunk. Creates, maintains, and tunes monitors using ELK, OpenSearch, and OpenTelemetry. Supports applications hosted in the AWS cloud and Kubernetes. Builds, deploys, automates, and supports application services spanning multiple technology platforms, frameworks, and languages.

Primary Responsibilities:

  • Provides automated solutions for business and technology operational activities and manual tasks.

  • Analyzes the observability, resiliency, availability, and performance of applications.

  • Triages issues, performs deep dives, and conducts root cause analysis.

  • Resolves business and system issues through enhancement initiatives.

  • Resolves issues as required during critical outages to avoid negative business impact.

  • Contributes to product architectural solutions, addressing high impact system issues.

  • Deploys and supports distributed multi-tiered application systems.

  • Manages the scalability and resiliency of applications.

  • Ensures daily business operations are not impacted by system issues (trade processing and correction, fund and sweep translation, and cash position and reconciliation).

  • Consults across the enterprise to plan for and implement enhancements to systems to avoid system outages and ensure seamless implementations.

  • Establishes end-to-end flow of application systems to quickly identify and resolve critical business issues.

  • Tests the resiliency of application systems using Chaos Engineering techniques.

  • Mentors junior team members.

Education and Experience:

Bachelor's degree (or foreign education equivalent) in Computer Information Systems, Computer Science, Engineering, Information Technology, Information Systems, Mathematics, Physics, or a closely related field and five (5) years of experience as a Principal Site Reliability Engineer (or closely related occupation) maintaining and improving the reliability, performance, and scalability of distributed applications.

Or, alternatively, Master's degree (or foreign education equivalent) in Computer Information Systems, Computer Science, Engineering, Information Technology, Information Systems, Mathematics, Physics, or a closely related field and three (3) years of experience as a Principal Site Reliability Engineer (or closely related occupation) maintaining and improving the reliability, performance, and scalability of distributed applications.

Skills and Knowledge:

Candidate must also possess:

  • Demonstrated Expertise ("DE") performing site reliability engineering to analyze the observability, resiliency, availability, instrumentation, and performance of distributed applications; creating dashboards and monitors to capture the latency, availability, error, and saturation performance of distributed applications using Splunk, Grafana, Prometheus, Catchpoint, Telemetry, and Datadog; and creating SLI/SLO dashboards, monitors, and automated processes to update changes and create new dashboards.

  • DE developing Kubernetes platforms and automations in public and private cloud -- RKS (Rancher), EKS (AWS), and AKS (Azure) -- using Python, shell scripting, Git, Docker, and Kubernetes.

  • DE automating business and technology operational activities -- Kubernetes cluster rehydration, application recycling, patching, disaster recovery, and ITSM reporting -- using Jenkins Core, uDeploy, Rundeck, Ansible, and AWX.

  • DE performing triage and root cause analysis (RCA) in a multi-tiered fund accounting application system related to hardware, software, network, applications, and cloud service providers on multiple platforms -- Unix, Windows, and AWS cloud environments -- using Datadog, Splunk, Grafana, and Kibana.

Category:

Information Technology

Fidelity's hybrid working model blends the best of both onsite and offsite work experiences. Working onsite is important for our business strategy and our culture. We also value the benefits that working offsite offers associates. Most hybrid roles require associates to work onsite every other week (all business days, M-F) in a Fidelity office.
