
Site Reliability Engineer at Playtika, Kyiv, Vinnytsia
- Kyiv
- Permanent position
- Full-time
3 September 2025
Responsibilities:
- Maintain and improve existing monitoring configurations (alerts, dashboards, service discovery, scrape configs, etc.)
- Implement and enhance alerting logic, including threshold tuning and dynamic alert conditions
- Troubleshoot monitoring and metrics-related issues (e.g., missing data, false alerts, broken dashboards)
- Support and improve self-developed metrics collectors and Python-based monitoring services
- Assist NOC and SRE teams with alert deduplication, escalation rules, and alert quality improvements
- Participate in design and implementation of observability improvements for new services and infrastructure components
- Review, modify, and extend existing scripts and plugins (primarily Python and Bash)
- Provide monitoring-related guidance to development, infrastructure, and operations teams
- Ensure monitoring tools and services operate reliably within Kubernetes clusters and Linux systems
- Maintain monitoring configuration in Git and follow internal version control best practices
- Participate in cross-team initiatives to improve the overall monitoring and incident response ecosystem
Requirements:
- Strong hands-on experience with Linux systems (primarily Ubuntu)
- Practical knowledge of Prometheus ecosystem, VictoriaMetrics, Grafana, and Zabbix
- Experience supporting monitoring systems in Kubernetes-based infrastructure
- Solid scripting skills (Bash)
- Familiarity with Git and common version control workflows
- Good understanding of networking and infrastructure concepts (ports, protocols, DNS, etc.)
- Ability to troubleshoot metric collection, alert firing, and data visualization issues
- Basic knowledge of SQL (e.g., for querying time-series or metadata stores)
- Strong communication skills for cross-functional collaboration
Nice to have:
- Understanding of high-availability and failover patterns in observability systems
- Experience working with SLO/SLA-based alerting or anomaly detection mechanisms
- Exposure to automation and CI/CD pipelines for monitoring infrastructure