Senior DevOps/Site Reliability Engineer

EPAM Systems

  • Ukraine
  • Permanent position
  • Full-time
  • 1 month ago
We're seeking a skilled DevOps/SRE with extensive expertise in designing, implementing, and maintaining observability platforms to ensure system reliability, performance, and scalability. As a vital member of our SRE team, you will promote the adoption of observability best practices, fostering proactive monitoring, swift incident resolution, and continuous enhancements to our software products and infrastructure.

This role emphasizes creating and refining observability solutions (metrics, logs, and traces) that provide actionable insights into system health and performance. You'll also advance automation for deployment pipelines, oversee applications across various environments, and ensure our systems meet rigorous reliability and availability expectations. Collaboration will be essential as you engage closely with development teams to integrate observability into the software lifecycle, equipping them with the tools and practices for efficient debugging and iteration.

The remote option applies only to candidates who will be working from a location in Ukraine.

Responsibilities

  • Architect and implement observability platforms using tools like Prometheus, Grafana, and OpenTelemetry to support our Next.js frontend and accompanying systems
  • Design and maintain automated deployment pipelines focused on reliability, observability, and zero-downtime updates across multiple environments
  • Collaborate with development teams to integrate observability into local workflows for accelerated debugging and iteration
  • Optimize infrastructure and tools for scalability, fault tolerance, and performance, with the aim of reducing mean time to detection (MTTD) and mean time to resolution (MTTR)
  • Mentor team members in SRE practices, including observability-driven development, incident management, and post-mortem analyses

Requirements

  • Proficiency in scripting languages like Python for automation and observability tools
  • Expertise in observability frameworks (e.g., Prometheus, Grafana, Loki, Jaeger) and logging solutions (e.g., ELK stack, Fluentd)
  • Background in containerization technologies (e.g., Docker) and orchestration platforms (e.g., Kubernetes, AWS ECS)
  • Knowledge of infrastructure as code tools (e.g., Terraform, Ansible) to provision and manage observable systems
  • Familiarity with version control systems, especially Git, and with integrating observability into CI/CD pipelines (e.g., Jenkins, GitHub Actions)
  • Capability to define and measure service-level indicators (SLIs), service-level objectives (SLOs), and error budgets to ensure system reliability
  • Competency in fostering collaboration and communication, with a strong commitment to nurturing a blameless culture of improvement

Nice to have

  • Proficiency in Polish
  • Proficiency in programming languages as applied to SRE, DevOps, or observability contexts
  • Familiarity with cloud platforms, such as AWS, with a focus on observability services (e.g., CloudWatch, X-Ray)
  • Understanding of distributed systems, chaos engineering, or security practices in observable environments

We offer/Benefits

WITH US YOU CAN

  • Work on a flexible schedule remotely or from any of our comfortable offices or coworking spaces in Ukraine
  • Receive the necessary equipment to perform your work tasks
  • Change projects and technology stacks within EPAM
  • Gain experience in various business domains (Insurance, E-commerce, Healthcare, Finance, Travel, Media, Artificial Intelligence, and more)
  • Consider relocation options in over 30 countries worldwide
  • Participate in volunteer and charity programs and in communities (both technical and interest-based)

WE FOCUS ON YOUR PROFESSIONAL GROWTH

  • Plan your individual career path together with your manager
  • Receive regular feedback from colleagues
  • Improve your English for free with certified teachers (Speaking Clubs, client interview preparation courses, etc.)
  • Get the opportunity to undergo free training and certification in AWS, GCP, or Azure Clouds
  • Use the internal E-learn training program (18,200+ specialized training and mentoring programs)
  • Access corporate accounts on LinkedIn Learning, Get Abstract and other partner resources
  • Study at EPAM Solution Architecture School with instructors who are practicing architects
  • Develop as a leader: join the Delivery Management, Resource Management, and Leadership Essentials schools and more
  • Participate in internal communities (500+ meetups, technical discussions, brainstorming sessions, online events and conferences annually)

WHAT WE OFFER

  • Vacation and sick leave (including sick leave without a medical certificate)
  • A wide range of Voluntary Medical Insurance programs providing both medical treatment and various preventive options (including sports activities)
  • Medical insurance for family members at corporate rates
  • Company support during significant life events (childbirth or adoption, marriage, etc.)
  • Support for psychological comfort: discounts on services from mental health specialists or coaches, and thematic training
  • E-kids program: a free programming language training program for EPAMers' children

EPAM strives to provide its global team of 52,800+ professionals in more than 55 countries with opportunities for professional growth from day one of collaboration. Our colleagues are the source of EPAM's success, so we value cooperation, strive to always understand our clients' business and aim for the highest quality standards. No matter where you are, you will join a dedicated, diverse community that will help you realize your potential to the fullest.
