Site Reliability Engineer (m/f/d) | JobSetuu
gridscale GmbH
Posted 3 hours ago • Via www.arbeitnow.com
Description
Job Overview
- Source: Arbeitnow
- Location: Köln
- Job Type: Full-Time
Job Description
At our company, it’s all about #OneTeam! Join gridscale and help shape the future of the cloud together with OVH.
As a leading tech company, we’ve been working for over two decades to reduce our environmental footprint - with innovative solutions and an open cloud designed to be sustainable from the ground up: #SustainableByDesign.
Our Tech Stack 🚀
OpenStack · Kubernetes · KVM · Linux · Bare-metal · Ansible · Terraform · FluxCD / ArgoCD · Git · Go · Python · Claude Code / Cursor / agentic coding tooling
Your Role 💻
You'll help build, operate, and industrialize OVHcloud's on-premise cloud platform (OPCP). You'll join a small, senior team that owns the OpenStack-based infrastructure and the Kubernetes / GitOps stack that our customer-facing cloud runs on, and that treats AI-assisted engineering as a first-class part of how we work.
The platform is actively in build mode, so joining now means real influence on the architecture, the automation strategy, and how we adopt AI in platform engineering. As a senior engineer, you shape the focus of your role around your strengths and interests: there's a clear backbone of automation, compute-lifecycle, and platform work, plus an explicit AI-substrate workstream. You're at home in a security-oriented, highly automated (GitOps) environment, keep the big picture in view in ambiguous situations, and make well-founded decisions on that basis.
Your Tasks
Design and build OpenStack-based on-prem infrastructure that deploys itself autonomously - discovering available hardware and bringing up a functional datacenter in minutes.
Develop Infrastructure as Code with Ansible and Terraform - typically spec-first with LLM assistance, then human-validated; push this further via custom agent / sub-agent setups, agentic test generation, and prompt-engineered review loops.
Drive the ongoing development of our Kubernetes stack and GitOps workflows (FluxCD / ArgoCD).
Own the full lifecycle of our compute infrastructure - from bare-metal (firmware, provisioning, hardware health) through hypervisors to virtual compute nodes - and build the automation that keeps capacity healthy and rolls out updates without disturbing tenant workloads.
Build and extend the AI substrate that compounds our output: Markdown knowledge bases as retrieval substrate, agentic prototypes for incident triage and capacity planning, and deeper integration of agentic coding tools into daily work.
Contribute to the self-healing direction, turning today's manual runbooks into tomorrow's reasoning agents. Auto-remediation isn't a separate team here - it's how platform work is meant to land.
Design and implement test suites aligned with functional and technical specs (non-regression, performance, security).
Document and package the solution so users can deploy and operate it without friction, and keep improving the platform based on telemetry and user feedback.
Act as a technical reference and mentor across automation, platform engineering, and AI-tooling topics.
What we offer you 💼
A platform that is genuinely in build mode - your architectural decisions stick.
A senior team where seniority means autonomy, not just a title.
AI-augmented engineering as a first-class workflow - Claude Code and comparable agentic tooling, Markdown-KB-as-substrate, and room to push the practice further. Modern tooling that compounds your work instead of just sitting next to it.
Exceptional team spirit across all departments and national borders - we live #OneTeam
Exciting work in a highly innovative, international environment with cutting-edge technologies
32 vacation days, increasing with length of service
Flexible working hours, home-office options, and a secure permanent position with market- and performance-based compensation
Employer-funded pension plan and an attractive insurance package
OVHcloud covers 50% of public transportation costs
Up to €400 per year toward sports activities (gym membership, classes, etc.)
Attractive discounts at numerous shops and companies through Corporate Benefits
A contribution toward leasing your cargo bike
Regular company events and free cold and hot beverages
Your Profile
Several years of hands-on experience running production infrastructure (SRE, Platform, or DevOps).
Solid OpenStack experience - deployed, operated, and debugged it in production.
End-to-end compute infrastructure management, from bare-metal lifecycle through hypervisor and virtual compute node operations (migration, host evacuation, graceful drains, capacity rebalancing). The skill matters more than the specific tooling - what counts is having done it at scale and automated it.
Strong with Infrastructure as Code (Ansible, Terraform) and GitOps (FluxCD or ArgoCD), plus solid Linux administration including on bare-metal.
Active, daily practice of AI-assisted engineering, with opinions formed from real use. You can describe a workflow where an LLM saved you half a day, and one where you should have skipped it. Theoretical interest doesn't count.
Fluent English, written and spoken - our team is distributed, and this is the working language.
Nice to Have
Production experience with Kubernetes and the cloud-native ecosystem.
Production-quality Go and/or Python.
Deeper agentic tooling craft (Claude Code, Cursor, Aider): custom agent / sub-agent setups, hooks, prompt engineering, your own workflows or skills, and managing a Markdown-first knowledge base as a substrate for AI workflows.
Advanced compute-node tuning (CPU pinning, NUMA, hugepages, SR-IOV / PCI passthrough) and basic network debugging (VLANs, BGP).
Observability tooling (Prometheus, Loki, Grafana, etc.) and auto-remediation / self-healing systems (StackStorm, Event-Driven Ansible, or similar).
Experience in security-critical environments and with edge or multi-site deployments.
Soft Skills
A continuous-improvement mindset and ownership of what you build.
You see AI tooling as a structural shift in how engineering gets done - not a trend, not a threat - and want to shape how the team adopts it.
You enjoy sharing knowledge, learning from peers, and can synthesize ideas clearly.
Expert Career Tips for Site Reliability Engineer (m/f/d) Roles
To succeed in a competitive market as a Site Reliability Engineer (m/f/d), you need more than just technical skills. Here are some expert strategies to elevate your profile:
- Build a Strong Portfolio: For technical roles, a clean GitHub or a personal project site is essential. For non-technical roles, a case study portfolio demonstrating problem-solving and impact is equally valuable. Show, don't just tell, what you have achieved in your previous positions.
- Master the Narrative: When interviewing, use the STAR method (Situation, Task, Action, Result) to structure your answers. Quantify your results wherever possible—mentioning "increased efficiency by 20%" is much more impactful than saying "improved efficiency."
- Continuous Learning: The industry moves fast. Whether it's staying updated with the latest AI tools or mastering a new management methodology, continuous professional development is key. Consider obtaining industry-recognized certifications that align with Site Reliability Engineer (m/f/d) requirements.
- Networking: Connect with other professionals in similar roles. Join online communities, attend webinars, and engage in meaningful discussions on professional social networks. Often, the best opportunities come through referrals and community engagement.
- Soft Skills Matter: Communication, empathy, and leadership are often the deciding factors between two equally qualified technical candidates. Cultivate these skills as they are universally valued across all industries and seniority levels.
Additionally, research the specific company's culture and values. Tailoring your application to show how you align with their mission can significantly increase your chances of moving forward in the process.
Salary & Compensation
Salary not disclosed; the posting mentions market- and performance-based compensation.
Work Arrangement
Type: On-Site (Köln), with home-office options
Flexible working hours, according to the job description.
Comprehensive Application Strategy & Hiring Process
Applying for a new role is a marathon, not a sprint. Follow this strategic approach to maximize your success rate:
1. Initial Research & Tailoring
Don't send the same resume to every employer. Spend at least 30 minutes researching the company. Look for recent news, their product roadmap, and their team structure. Modify your summary and core competencies to reflect the specific keywords found in the job description.
2. The Perfect Cover Letter
If the application allows for a cover letter, use it to tell a story that your resume cannot. Explain why you are passionate about this specific company and how your unique background makes you the perfect fit for the challenges they are currently facing.
3. Navigating the Multi-Stage Interview
Many hiring processes involve three to five stages. These typically include a recruiter screen, a technical or skill-based assessment, a peer interview, and a final leadership round. Prepare for each stage differently: focus on enthusiasm and fit for the recruiter, technical depth for the assessment, and strategic vision for the leadership round.
4. Post-Interview Follow-Up
Always send a personalized thank-you note within 24 hours of each interview. Reference a specific topic discussed during the call to demonstrate your active listening and genuine interest in the role.
By following these steps, you demonstrate a high level of professionalism and attention to detail that sets you apart from the average applicant.
Typical Interview Process
- Resume screening
- HR call
- Skill interview
- Final manager interview
- Offer
Tip: Research the company's products and culture.
Global Market Intelligence & Relocation Insights
At JobSetuu, we specialize in helping talent navigate the global job market. Here is what you need to know about the current landscape in Köln and beyond:
The demand for skilled professionals is increasingly borderless. For roles based in Köln, understanding the local cost of living, visa requirements (if applicable), and cultural nuances is vital. If this is a remote role, consider the time zone alignment and the asynchronous communication culture of the hiring organization.
Relocation Support: Many forward-thinking companies offer relocation packages that include moving stipends, temporary housing, and legal assistance with work permits. When evaluating an offer, look beyond the base salary—consider the total compensation package, including equity, bonuses, and healthcare benefits.
Work-Life Balance Trends: Hybrid and remote work have become standard in many regions. Research the local labor laws and common practices regarding work hours and vacation time to ensure the role aligns with your lifestyle goals.
Leveraging JobSetuu's tools can help you compare salaries across different cities and understand the "purchasing power" of your potential offer, ensuring you make an informed decision for your long-term career path.
Skills & Competency Roadmap for Professional Development
To remain competitive as a Site Reliability Engineer, we recommend focusing on the following core competencies over the next 12-18 months:
- Technical Mastery: Deepen your expertise in the core tools and languages relevant to your field. For developers, this might be cloud architecture; for marketers, it might be data-driven attribution modeling.
- AI Augmentation: Learn how to leverage generative AI and automation tools to increase your productivity. Understanding how to integrate these technologies into your workflow is becoming a non-negotiable skill.
- Leadership & Strategy: Even in individual contributor roles, the ability to think strategically and lead projects from inception to completion is highly valued. Focus on stakeholder management and high-level project planning.
- Data Literacy: The ability to interpret data and use it to drive decisions is essential across all business functions. Familiarize yourself with data visualization and basic analytical concepts.
By investing in these areas, you not only prepare yourself for the role you are applying for today but also build a resilient foundation for the opportunities of tomorrow.