Svitla Systems Inc. is looking for a Senior Site Reliability Engineer (Datadog) for a full-time position (40 hours per week) in Poland. Our client is a leading expert network connecting business and government professionals with industry experts to support informed decision-making. They provide a research enablement platform powered by real-time data, innovative technology, and specialized expertise. Through calls, conferences, surveys, and workshops, the platform enables clients to gain insights across sectors such as healthcare, finance, consumer goods, energy, technology, and legal. Since 2003, the company has partnered with top consulting firms, hedge funds, and Fortune-ranked companies, helping them turn insights into action.
You will join the CloudOps team and help drive the optimization and performance of the infrastructure monitoring and observability practices.
You will be responsible for managing, maintaining, and optimizing Datadog for comprehensive monitoring and observability across our Azure infrastructure, Kubernetes environments, and application services.
By leveraging Datadog’s tools for monitoring, alerting, and automated remediation, you will play a key role in ensuring the high availability, reliability, and performance of cloud-based systems.
Requirements
- 5+ years of experience as a site reliability engineer or monops.
- 3+ years of experience with cloud platforms (AWS, GCP, Azure) in general.
- Strong experience with monitoring and observability tools.
- 2+ years of experience with Datadog, including its integration features (alerts, monitoring dashboards, and automated remediation).
- 2+ years of experience in cloud cost management (FinOps).
- Proficiency in scripting with languages such as Bash, PowerShell, Python, or similar.
- Strong troubleshooting and debugging capabilities in an agile software development environment.
- Strong problem-solving skills and a proactive approach to system monitoring and issue resolution.
- Proven experience managing projects and meeting deadlines while maintaining high-quality standards.
- The ability to prioritize tasks effectively and exhibit good judgment when managing resources.
- Excellent interpersonal and communication skills for cross-team collaboration.
- Independent and self-motivated individual with the ability to drive tasks to completion.
- A team-oriented person with a collaborative mindset who can work in a fast-paced, agile environment.
- Strategic thinking with the ability to balance operational needs with long-term goals.
- The ability to take ownership of tasks and a strong sense of accountability.
- At least upper-intermediate English level.
- Availability until 7 pm CET is a must, as the client’s team is in the EST time zone.
Nice to have
- Knowledge of Infrastructure as Code using tools like Terraform, ARM templates, or Azure CLI is a huge plus.
- Azure Solutions Architect Expert certification or equivalent.
- Azure Security Engineer certification (Associate level).
- Familiarity with Ansible for automation and configuration management.
- Advanced knowledge of Kubernetes and container orchestration best practices.
- Experience in CI/CD pipelines and integrating Datadog with DevOps processes.
Responsibilities
- Datadog Implementation & Management: Take full ownership of Datadog for monitoring infrastructure, services, and applications across multiple environments (Production, Development, Test). Ensure optimal configurations for observability and alerting.
- Performance & Health Monitoring: Monitor infrastructure and application performance using Datadog, identify potential issues, and create automated remediation workflows to resolve them.
- Cost Management: Optimize and monitor Azure cloud costs using Datadog and other cloud tools, tracking and improving resource usage and cost-efficiency.
- Automation & Remediation: Leverage Datadog’s alerting system and integrations to automate the remediation of common infrastructure and application issues.
- Kubernetes & Cloud Infrastructure: Collaborate with CloudOps and Engineering teams to monitor and optimize Kubernetes environments, ensuring containers, pods, and services are running efficiently.
- Collaboration: Work closely with Engineering, AppOps, and CloudOps teams to address complex infrastructure challenges, ensuring smooth deployments and high availability.
- Security & Compliance: Ensure security and compliance best practices are followed for monitoring and logging, participating in security audits and incident response activities as required.
- Infrastructure as Code: Support the automation and deployment of infrastructure using tools like Terraform and Azure Resource Manager (ARM).
- FinOps: Contribute to FinOps activities by tracking resource usage and optimizing cloud costs, providing data-driven insights into cost-saving opportunities.
- Best Practices & Optimization: Continuously review and improve monitoring configurations, workflows, and processes for maximum efficiency, performance, and security.
We offer
- US and EU projects based on advanced technologies.
- Competitive compensation based on skills and experience.
- Annual performance appraisals.
- Flexibility in workspace, either remote or in our welcoming office.
- Comprehensive medical insurance after one month.
- MultiSport card with access to 2500 sports facilities all over Poland.
- Bonuses for recommendations of new employees.
- Bonuses for article writing, public talks, and other activities.
- 15 vacation days, 10 national holidays, sick leave, family days off.
- Educational activities reimbursement on a monthly basis.
- Free webinars, meetups and conferences organized by Svitla.
- Gifts for anniversaries, New Year, children and more.
- Fun corporate celebrations and activities.
- Awesome team, friendly and supportive community!
About Svitla
Svitla Systems is a trusted global IT solutions company headquartered in California, with business and development offices throughout the US, Latin America, Europe, and Asia. Svitla is an outspoken advocate of workplace flexibility, best known for its well-established remote culture, individual approach to our teammates’ professional and personal growth, and family-like environment.
Since 2003, Svitla has served a wide range of clients, from innovative start-ups in California to large corporations such as Ingenico, Amplience, InvoiceASAP, and Global Citizen. At Svitla, developers work directly with clients’ teams, building lasting and successful partnerships through seamless integration with on-site processes.
Svitla Systems’ global mission is to build a business that contributes to the well-being of our partners, personnel and their families, improves our communities, and makes a lasting difference in the world. Join us!