Job Description:
About the Job:
As a Lead Software Engineer, you’ll play a critical role in ensuring the stability, availability, and compliance of key API platforms (cloud and on-prem) while resolving high-level escalations. You’ll oversee upgrades, troubleshoot complex issues, and maintain observability, working closely with cross-functional teams to meet business and technical needs. This role requires a holistic understanding of the platform, deep technical expertise, and the ability to manage projects efficiently. You’ll also collaborate with security experts, public cloud teams, and network engineers to build software that implements APIs for network, consumer, and business solutions.
Experience Level: 12+ Years
Roles and Responsibilities:
Design, deploy, and manage cloud infrastructure solutions on Microsoft Azure, ensuring the security, availability, and scalability of systems.
Maintain a comprehensive understanding of the platform, identifying upgrade needs, resolving compliance issues, and ensuring client availability.
Lead Tier 4 outage resolution, managing critical escalations to minimize downtime and restore services swiftly.
Conduct deep-dive network investigations, troubleshoot complex issues (e.g., HTTP error codes, Azure/MuleSoft networking), and configure CORA, Azure Traffic Manager, Azure Front Door, Kubernetes Ingress, DNS, and private endpoints.
Manage certificates, including gateway/cluster ingestion, and sync certificates from Keyfactor to Key Vaults to clusters.
Write detailed technical specifications for upgrades (e.g., AKS, RTF, NGINX, AKV2K8s, Fluentbit, Noname) and ensure alignment with technical and business requirements.
Oversee observability using OpenSearch and Fluentbit, including querying, cluster management, index configuration, caching optimization, and log ingestion troubleshooting.
Maintain and upgrade Azure Kubernetes Service (AKS) clusters, troubleshoot node pools, manage pod disruption budgets and secrets syncing, and understand daemonsets, replicasets, and deployments.
Support MuleSoft integrations, perform RTF installations/upgrades, validate release notes, and troubleshoot ingress/pod templates.
Remediate Astra violations across systems (Linkerd, Istio, Calico, NGINX, Noname, AKV2K8s) and train teams to address violations effectively.
Manage projects across multiple tools, ensuring timely progress and completion of tasks.
Work with Infrastructure as Code (IaC) tools and automate tasks.
Perform advanced troubleshooting and performance optimization.
Provide Tier 3 support on a rotating basis, working closely with other teams and subject matter experts.
Be proactive and demonstrate the ability to analyze issues, generate ideas, and initiate action while achieving results.
Actively participate in Scrum, providing status of tasks and coordinating with the project team to meet requirements.
Extended hours and weekend release work may be required.
Experience working in an environment where coordination with multiple teams is essential to success.
Primary / Mandatory skills:
Overall: 12+ years of experience in platform engineering, API platforms, Azure Kubernetes Service (AKS), MuleSoft, and observability tools (OpenSearch, Fluentbit).
Experience managing Azure resources and automating deployments.
Very strong written and verbal skills.
Prior experience in large API platform framework, API development using M2E, AJSC would be a plus.
Ability to prioritize individual/group work in a deadline-driven environment.
Experience designing and deploying AKS environments.
Expertise in networking, troubleshooting, and configuring Azure Traffic Manager, Azure Front Door, Kubernetes Ingress, DNS, and private endpoints.
Strong skills in writing technical specifications for upgrades (AKS, RTF, NGINX, AKV2K8s, Fluentbit, Noname).
Proficiency in managing certificates and syncing across systems (Keyfactor, Key Vaults, clusters).
Advanced knowledge of OpenSearch (cluster management, indexing, caching) and Fluentbit (configuration, log ingestion).
Experience with MuleSoft RTF installations, upgrades, and troubleshooting.
Ability to remediate Astra violations and train teams on compliance.
Strong project management skills, with experience tracking tasks across multiple platforms.
Technical Skills: Extensive expertise in Azure, Java/Python, Kubernetes, MuleSoft, OpenSearch, Fluentbit, networking, and compliance remediation.
Additional information (if any): Must be willing to work shifts. Willingness to learn is very important, as AT&T offers an excellent environment for learning digital transformation skills.
Weekly Hours: 40
Time Type: Regular
Location: IND:KA:Bangalore / Intl Tech Park, Whitefield Rd - Storage: Innovator Building, Itpb, Whitefield Rd
It is the policy of AT&T to provide equal employment opportunity (EEO) to all persons regardless of age, color, national origin, citizenship status, physical or mental disability, race, religion, creed, gender, sex, sexual orientation, gender identity and/or expression, genetic information, marital status, status with regard to public assistance, veteran status, or any other characteristic protected by federal, state or local law. In addition, AT&T will provide reasonable accommodations for qualified individuals with disabilities. AT&T is a fair chance employer and does not initiate a background check until an offer is made.
Job ID R-65745 Date posted 05/06/2025