Site Reliability Engineer at Playtika, Kyiv, Vinnytsia

Playtika

  • Kyiv
  • Permanent position
  • Full-time
  • 18 hours ago
Playtika is a leader of innovation and excellence in interactive entertainment. From humble beginnings as a startup in 2010, and with cheetah-like instincts, we've grown to become an industry leader with more than 9 million daily and 31 million monthly active users across our portfolio of games.
3 September 2025
Site Reliability Engineer
Kyiv, Vinnytsia

Responsibilities:
  • Maintain and improve existing monitoring configurations (alerts, dashboards, service discovery, scrape configs, etc.)
  • Implement and enhance alerting logic, including threshold tuning and dynamic alert conditions
  • Troubleshoot monitoring and metrics-related issues (e.g., missing data, false alerts, broken dashboards)
  • Support and improve self-developed metrics collectors and Python-based monitoring services
  • Assist NOC and SRE teams with alert deduplication, escalation rules, and alert quality improvements
  • Participate in the design and implementation of observability improvements for new services and infrastructure components
  • Review, modify, and extend existing scripts and plugins (primarily Python and Bash)
  • Provide monitoring-related guidance to development, infrastructure, and operations teams
  • Ensure monitoring tools and services operate reliably within Kubernetes clusters and Linux systems
  • Maintain monitoring configuration in Git and follow internal version control best practices
  • Participate in cross-team initiatives to improve the overall monitoring and incident response ecosystem
Requirements:
  • Strong hands-on experience with Linux systems (primarily Ubuntu)
  • Practical knowledge of the Prometheus ecosystem, VictoriaMetrics, Grafana, and Zabbix
  • Experience supporting monitoring systems in Kubernetes-based infrastructure
  • Solid scripting skills (Bash)
  • Familiarity with Git and common version control workflows
  • Good understanding of networking and infrastructure concepts (ports, protocols, DNS, etc.)
  • Ability to troubleshoot metric collection, alert firing, and data visualization issues
  • Basic knowledge of SQL (e.g., for querying time-series or metadata stores)
  • Strong communication skills for cross-functional collaboration
Nice to have:
  • Understanding of high-availability and failover patterns in observability systems
  • Experience working with SLO/SLA-based alerting or anomaly detection mechanisms
  • Exposure to automation and CI/CD pipelines for monitoring infrastructure
