The Role
As an ML Infrastructure Engineer, you will own the hardware and software stack that enables our scientists to simulate brain dynamics. You will bridge the gap between bare-metal hardware and high-level JAX code, ensuring our researchers have the compute power and stability required to push the boundaries of AI.
Key Responsibilities:
- Cluster Management: Manage, maintain, and optimize our local high-performance compute cluster (Linux-based, NVIDIA GPUs). You are the owner of the hardware environment.
- Containerization & Orchestration: Design and manage robust containerized environments (Docker/Kubernetes) to ensure reproducible and scalable research workflows.
- Infrastructure Optimization: Maintain and evolve the core ML software infrastructure (Python/JAX codebase), focusing on efficiency, reproducibility, and scalability.
- Research Operations (MLOps): Execute and monitor large-scale model training and inference runs in tight cooperation with research scientists.
- Technical Support: Provide hands-on hardware and software support to the research team, troubleshooting bottlenecks in the research workflow.
Your Profile
We are seeking technically proficient engineers with 5+ years of industry experience who love Linux and want to apply their skills to scientific discovery.
Essential Technical Requirements:
- Education: M.Sc. in Computer Science, Engineering, Physics, or equivalent industry experience. A Ph.D. is a plus.
- Experience: 5+ years of work experience with a proven track record.
- Linux Mastery: Deep expertise in Linux administration is non-negotiable. You must be comfortable managing clusters, users, and bare-metal hardware, and be fluent in shell scripting and hardware configuration.
- Container Administration: Proven production experience with Docker and/or Kubernetes is required. You know how to orchestrate complex workloads efficiently.
- ML Frameworks: Strong experience with Python and deep learning frameworks, specifically JAX and PyTorch.
- Bonus: Prior experience specifically in ML Infrastructure administration (e.g., Slurm, Docker/Kubernetes for ML).
- Bonus: Proven track record of Open Source contributions or personal software projects.
- Bonus: Experience in computational modeling or neuroscience (understanding the "why" behind the code).
Soft Skills:
- Goal-Driven and Proactive: Strong self-management skills with the ability to take ownership of the infrastructure stack.
- Collaborative Mindset: You enjoy enabling others to succeed.
- Communication: Excellent written and verbal communication skills in English. Knowledge of German is a plus, but not required.
What We Offer
- Impact: A unique opportunity to join an early-stage startup where your infrastructure decisions will directly shape the company's technological trajectory.
- Environment: A creative, interdisciplinary setting combining academic excellence with entrepreneurial spirit.
- Growth: Collaboration with international research partners and the chance to work on novel analog hardware concepts.
- Package: Competitive salary and benefits package.