Site Reliability Engineer at Playtika, Kyiv, Vinnytsia

Kyiv
Permanent position
Full-time

18 hours ago

Playtika is a leader of innovation and excellence in interactive entertainment. From humble beginnings as a startup in 2010, and with cheetah-like instincts, we've grown to become an industry leader with more than 9 million daily and 31 million monthly active users across our portfolio of games.
3 September 2025
Site Reliability Engineer
Kyiv, Vinnytsia

Responsibilities:

Maintain and improve existing monitoring configurations (alerts, dashboards, service discovery, scrape configs, etc.)
Implement and enhance alerting logic, including threshold tuning and dynamic alert conditions
Troubleshoot monitoring and metrics-related issues (e.g., missing data, false alerts, broken dashboards)
Support and improve self-developed metrics collectors and Python-based monitoring services
Assist NOC and SRE teams with alert deduplication, escalation rules, and alert quality improvements
Participate in the design and implementation of observability improvements for new services and infrastructure components
Review, modify, and extend existing scripts and plugins (primarily Python and Bash)
Provide monitoring-related guidance to development, infrastructure, and operations teams
Ensure monitoring tools and services operate reliably within Kubernetes clusters and Linux systems
Maintain monitoring configuration in Git and follow internal version control best practices
Participate in cross-team initiatives to improve the overall monitoring and incident response ecosystem

Requirements:

Strong hands-on experience with Linux systems (primarily Ubuntu)
Practical knowledge of the Prometheus ecosystem, VictoriaMetrics, Grafana, and Zabbix
Experience supporting monitoring systems in Kubernetes-based infrastructure
Solid scripting skills (Bash)
Familiarity with Git and common version control workflows
Good understanding of networking and infrastructure concepts (ports, protocols, DNS, etc.)
Ability to troubleshoot metric collection, alert firing, and data visualization issues
Basic knowledge of SQL (e.g., for querying time-series or metadata stores)
Strong communication skills for cross-functional collaboration

Nice to have:

Understanding of high-availability and failover patterns in observability systems
Experience working with SLO/SLA-based alerting or anomaly detection mechanisms
Exposure to automation and CI/CD pipelines for monitoring infrastructure

Dou
