AI Engineer for LLM Ops & Evaluation (m/f/d)
Auxilius.ai
Posted 2 hours ago • Via www.arbeitnow.com
Description
Job Overview
- Role: AI Engineer for LLM Ops & Evaluation (m/f/d)
- Company: Auxilius.ai
- Location: Munich
- Employment Type: Full-Time
- Category / Department: IT
- Salary: Competitive / Not Disclosed — confirm during interview
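- Key Skills / Technologies: Python, LLM evaluation (deterministic evals, LLM-as-judge), prompt optimization, LLMOps and observability, classical NLP and embeddings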
- Listing Source: Arbeitnow
Job Description
You'll join an early-stage, AI-native startup with a product that has already proven market fit. We build cutting-edge AI solutions for Governance, Risk and Compliance (GRC) for enterprises around the world.
Our customers are auditors, risk managers, and compliance teams, which means evaluation rigor, auditability, and EU AI Act readiness aren't afterthoughts for us. They're product requirements.
Tasks
As our AI Engineer for LLMOps & Evaluation, you'll own the LLMOps pipeline end-to-end and work directly alongside our founding team.
You will:
- Own the LLMOps pipeline: Evaluation infrastructure, the prompt optimization loop, and the production integration that turns experiments into reliable customer-facing features
- Design evaluation strategy per output type: Decide when to use deterministic evals (exact match, schema validation, embeddings) vs. LLM-as-judge, and build the rubrics, test datasets, and human-review loops that make the system trustworthy
- Drive prompt engineering and optimization across all LLM operations in the product: Moving from hand-tuned prompts to a measurable, iterative process
- Pick the right tool for each problem: Some things are LLM problems, some are embedding + classical NLP problems, some are deterministic logic
- Run the production side of AI features: Observability (Langfuse / LangSmith / similar), cost and latency engineering, and incident response when an LLM feature degrades
- Build human-in-the-loop workflows: Review queues, feedback ingestion, and labeling, so that production signals feed back into evals and prompt iteration
- Mentor our AI & Analytics Intern and contribute to how we build the AI team over time
Requirements
- 3+ years of hands-on experience building and shipping ML/AI systems in production (we care more about what you've shipped than years on a CV)
- Experience shipping an LLM evaluation or prompt optimization pipeline: not just using LLMs in a project, but owning the loop
- Strong hands-on experience with LLM-as-judge, including its variance problems and concrete techniques for controlling them
- Solid foundation in classical NLP and ML ops: Embeddings, semantic similarity, entity matching, classification, fuzzy matching
- Informed opinions on deterministic vs. LLM-based evals, from experience
- Production judgment: You've owned cost and latency tradeoffs, observability, and incident response for an LLM-powered feature. You're familiar with prompt regression and have strategies for managing it
- Strong Python
- Excellent English communication, written and verbal: We discuss nuanced technical tradeoffs daily with the founding team and customers
- Comfort with ambiguity: You can run experiments on real data, build intuition for this domain, and know when to stop iterating
Nice to have
- Hands-on experience with LLM observability and eval tooling (Langfuse, LangSmith, Phoenix/Arize, Helicone, Braintrust, W&B)
- Experience with DSPy or similar prompt optimization frameworks, and opinions on where they do and don't work
- Experience with Azure OpenAI in EU regions, or with EU-sovereign providers (Mistral, Aleph Alpha)
- Exposure to guardrails, content safety, or AI governance
- Exposure to enterprise software, ideally GRC, compliance, audit, or regulated industries
- Familiarity with Java/Spring Boot or Kubernetes on Azure, enough to integrate cleanly
- German
Benefits
- Hands-on ownership of a real AI product used by enterprise customers
- Work directly alongside the founding team from day one
- Hybrid work model: Munich North, minimum one day per week in the office, otherwise flexible (open to strong candidates elsewhere in the EU for the right fit); onboarding will take place in the office
- A steep learning curve at the intersection of LLM engineering, enterprise GRC, and startup operations
- The chance to shape the AI team as we grow
Auxilius.ai is building AI-powered GRC solutions for enterprises. We're early-stage, fast-growing, and backed by real customers. Our tech stack includes Java & Spring Boot, Angular, Kubernetes on Azure, and OpenAI & Anthropic LLMs.
Salary & Compensation
The salary for this position has not been publicly disclosed. Compensation is typically determined based on your experience, skills, and interview performance. Use your research on industry benchmarks and the cost of living in the role's location to negotiate effectively.
In addition to base salary, many employers in this sector offer a comprehensive benefits package that may include:
- Annual or performance-based bonuses
- Health, dental, and vision insurance
- Paid Time Off (PTO), sick leave, and public holidays
- Professional development budget and learning allowances
- Stock options or Employee Stock Ownership Plans (ESOPs) at select companies
- Flexible or remote working allowances
- Parental leave and family health coverage
Note: The specific benefits offered by this employer should be confirmed during the offer stage. Not all benefits listed above may apply to every organisation or role type.
Work Arrangement
Type: Hybrid / Full-Time
This is a hybrid, full-time position based in Munich North. The posting asks for a minimum of one day per week in the office, with the remaining days flexible, and onboarding takes place in the office. Strong candidates elsewhere in the EU may be considered for the right fit; confirm the exact arrangement during the process.
Typical Interview Process
While each organisation structures its hiring differently, candidates for this type of role typically go through the following stages:
- Resume and application screening
- Introductory phone or video call with HR
- Role-specific skill or competency interview
- Final interview with the hiring manager or panel
- Reference checks and offer discussion
Tip: Research the company's products, culture, and recent news thoroughly before each interview round.
About the Employer
Auxilius.ai is the organisation posting this opportunity. While full company details are available on the original job listing, here is what you should research before applying:
- Company size and culture: Review the company's LinkedIn profile, Glassdoor reviews, and their official website to understand team size, work culture, and employee satisfaction.
- Products and services: Familiarise yourself with what the company builds, sells, or delivers. Being knowledgeable about their offerings will set you apart during interviews.
- Recent news: Search for any recent fundraising, acquisitions, product launches, or leadership changes — these often come up in interviews and signal company health.
- Location and offices: The role is based in or around Munich. Confirm office address, remote policy details, and travel requirements during the process.
- Where this listing was found: This job was sourced from Arbeitnow.
How to Apply & Preparation Tips
To apply for the AI Engineer for LLM Ops & Evaluation (m/f/d) position, follow these steps:
- Tailor your resume: Customise your CV to match the specific requirements listed in the job description. Use keywords from the posting to pass Applicant Tracking System (ATS) filters.
- Write a compelling cover letter: Even if not mandatory, a concise cover letter demonstrating your enthusiasm and fit for the role can improve your chances.
- Apply via the original listing: Use the apply link on the original job post to submit your application. Avoid applying through third-party channels that may delay or lose your submission.
- Prepare for phone screening: Be ready for an initial call within 3–7 business days of applying. Have your resume and a quiet space ready.
- Follow up professionally: If you haven't heard back in 7–10 business days, a brief, polite follow-up email to the recruiter is acceptable and often appreciated.
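Key skills relevant to this role include: Python, LLM evaluation (deterministic evals and LLM-as-judge), prompt optimization, LLM observability, and classical NLP and embeddings. Ensure these are prominently featured on your resume and LinkedIn profile.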
Disclaimer: This listing is aggregated from a public job board for informational purposes. JobSetuu does not guarantee the accuracy or current availability of this position. Always verify the details on the employer's official careers page before applying.
Similar Roles
Process Automations & Technical Operations Specialist (m/f/d; part-time 32h/week; remote possible)
ecosistant
System and Network Administrator (m/f/d)
Ritter Technologie GmbH
IT Infrastructure Specialist Microsoft (m/f/d)
hubside - Die Recruitingwerkstatt