Senior ML Infrastructure Engineer

(m/f/d)

CLUSTER MANAGEMENT • CONTAINERIZATION & ORCHESTRATION • INFRASTRUCTURE OPTIMIZATION
You will own the hardware and software stack that enables our scientists to make breakthrough discoveries.
Location: Frankfurt am Main, Germany (On-site)
Type: Full-time (Initially 3 years)
Openings: 2 Positions

The Role

As an ML Infrastructure Engineer, you will own the hardware and software stack that enables our scientists to simulate brain dynamics. You will bridge the gap between bare-metal hardware and high-level JAX code, ensuring our researchers have the compute power and stability required to push the boundaries of AI.

Key Responsibilities:

  • Cluster Management: Manage, maintain, and optimize our local high-performance compute cluster (Linux-based, NVIDIA GPUs). You are the owner of the hardware environment.
  • Containerization & Orchestration: Design and manage robust containerized environments (Docker/Kubernetes) to ensure reproducible and scalable research workflows.
  • Infrastructure Optimization: Maintain and evolve the core ML software infrastructure (Python/JAX codebase), focusing on efficiency, reproducibility, and scalability.
  • Research Operations (MLOps): Execute and monitor large-scale model training and inference runs in tight cooperation with research scientists.
  • Technical Support: Provide hands-on hardware and software support to the research team, troubleshooting bottlenecks in the research workflow.

Your Profile

We are seeking technically proficient engineers with 5+ years of industry experience who love Linux and want to apply their skills to scientific discovery.

Essential Technical Requirements:

  • Education: M.Sc. in Computer Science, Engineering, or Physics, or equivalent industry experience. A Ph.D. is a plus.
  • Experience: 5+ years of work experience with a proven track record.
  • Linux Mastery: Deep expertise in Linux administration is non-negotiable. You must be comfortable managing clusters, users, and bare-metal hardware, and be fluent in shell scripting and hardware configuration.
  • Container Administration: Proven production experience with Docker and/or Kubernetes is required. You know how to orchestrate complex workloads efficiently.
  • ML Frameworks: Strong experience with Python and deep learning frameworks, specifically JAX and PyTorch.
  • Bonus: Prior experience specifically in ML Infrastructure administration (e.g., Slurm, Docker/Kubernetes for ML).
  • Bonus: Proven track record of Open Source contributions or personal software projects.
  • Bonus: Experience in computational modeling or neuroscience (understanding the "why" behind the code).

Soft Skills:

  • Goal-Driven and Proactive: Strong self-management skills with the ability to take ownership of the infrastructure stack.
  • Collaborative Mindset: You enjoy enabling others to succeed.
  • Communication: Excellent written and verbal communication skills in English. Knowledge of German is a plus, but not required.

What We Offer

  • Impact: A unique opportunity to join an early-stage startup where your infrastructure decisions will directly shape the company's technological trajectory.
  • Environment: A creative, interdisciplinary setting combining academic excellence with entrepreneurial spirit.
  • Growth: Collaboration with international research partners and the chance to work on novel analog hardware concepts.
  • Package: Competitive salary and benefits package.

Get in touch

How to Apply

Please submit the following via our online application system:

  1. Your CV (including a full publication list and contact information of 2-3 referees).
  2. A brief cover letter highlighting your relevant research interests and what motivated you to apply for this position.
  3. A link to your GitHub / portfolio or a sample of your code.
Apply here