<h3>Job Description</h3>
<p>We are looking for a skilled AI Engineer with a strong focus on testing, evaluating, and operationalizing Large Language Models (LLMs) to join our growing team. In this role, you will ensure that our language models meet high standards of accuracy, robustness, safety, and performance, and that they integrate seamlessly into our Speech-to-Text and AI-driven application landscape.</p>
<p>You will work closely with product, full-stack, and infrastructure engineers to transform state-of-the-art language models into reliable, production-ready systems that solve real customer problems. You make prototypes production-ready.</p>
<h3>Key Responsibilities</h3>
<h3>LLM Evaluation &amp; Testing</h3>
<ul>
<li>Design and maintain systematic evaluation frameworks for LLMs, including automated test suites, golden datasets, and regression benchmarks.</li>
<li>Define quantitative metrics (e.g., accuracy, latency, hallucination rate, task success) and qualitative evaluation protocols.</li>
<li>Perform error analysis and root-cause investigations on model failures.</li>
</ul>
<h3>Task Alignment &amp; Optimization</h3>
<ul>
<li>Focus on rapid prototyping and operationalization of customer use cases.</li>
<li>Improve model performance on specific tasks using a prompt-first workflow (system prompts, few-shot examples, tool instructions).</li>
<li>Build and iterate on evaluation sets; run experiments to measure quality, latency, and cost.</li>
<li>Curate high-signal datasets for automated prompt optimization (cleaning, labeling, filtering, augmentation).</li>
<li>Apply lightweight adaptation when beneficial (prompt tuning, parameter-efficient methods such as LoRA/adapters).</li>
<li>Use supervised fine-tuning / instruction tuning when prompting and lightweight methods do not reach the target.</li>
<li>Prepare and curate training datasets (cleaning, labeling, augmentation, filtering).</li>
<li>Evaluate and compare open-source and commercial LLMs for specific use cases.</li>
<li>Design controlled experiments (A/B tests, offline evaluations).</li>
<li>Document results and recommend model choices.</li>
<li>Collaborate with full-stack engineers to integrate prototypes into the product, backend services, and user-facing applications.</li>
<li>Support API design for model inference and post-processing.</li>
<li>Ensure models behave reliably in real-time and batch workflows.</li>
</ul>
<h3>Quality, Safety &amp; Guardrails</h3>
<ul>
<li>Implement mechanisms to:
<ul>
<li>Reduce hallucinations</li>
<li>Enforce output formats</li>
<li>Apply content filters</li>
<li>Detect and handle unsafe or low-confidence outputs</li>
</ul>
</li>
</ul>
<h3>Performance &amp; Cost Optimization</h3>
<ul>
<li>Optimize inference latency and throughput.</li>
<li>Balance model size, quantization, batching, and caching strategies.</li>
<li>Monitor and optimize inference costs.</li>
</ul>
<h3>MLOps &amp; Lifecycle Management</h3>
<ul>
<li>Version models, datasets, prompts, and evaluation results.</li>
<li>Monitor model performance in production and detect drift.</li>
<li>Work closely with product managers to translate requirements into model behaviors.</li>
<li>Support internal teams with guidance on prompt design and model usage.</li>
<li>Contribute to documentation and internal best practices.</li>
<li>Define standards for dataset quality, labeling guidelines, and storage.</li>
<li>Maintain traceability between datasets, experiments, and deployed models.</li>
</ul>
<h3>Synthetic Data Generation</h3>
<ul>
<li>Use LLMs or other techniques to generate synthetic training data where real data is scarce.</li>
</ul>
<h3>Agentic LLMs &amp; Human-in-the-Loop Workflows</h3>
<ul>
<li>Design and test LLM workflows that call tools, functions, or external APIs.</li>
<li>Design feedback loops in which human reviewers validate or correct model outputs.</li>
</ul>
<h3>Research &amp; Scouting</h3>
<ul>
<li>Track relevant papers, frameworks, and open-source projects.</li>
</ul>
<h3>Internal Enablement</h3>
<ul>
<li>Create internal guidelines for prompt writing and evaluation.</li>
<li>Run occasional knowledge-sharing sessions.</li>
</ul>
<h3>What You Bring</h3>
<h3>AI / ML Experience</h3>
<ul>
<li>At least 3–5 years of experience in machine learning or applied AI.</li>
<li>Practical experience working with LLMs in production or advanced prototypes.</li>
<li>Experience with PyTorch or TensorFlow.</li>
<li>Familiarity with fine-tuning techniques and training pipelines.</li>
<li>Strong understanding of experimental design.</li>
</ul>
<h3>Programming Skills</h3>
<ul>
<li>Familiarity with REST APIs and backend integration.</li>
<li>Experience with dataset preprocessing, labeling pipelines, and versioning.</li>
<li>Familiarity with Docker, CI/CD, and model deployment.</li>
</ul>
<h3>Analytical Mindset</h3>
<ul>
<li>Ability to reason about model behavior and failure modes.</li>
</ul>
<h3>Communication</h3>
<ul>
<li>Good verbal and written communication in English and German.</li>
</ul>
<h3>Startup Mentality</h3>
<ul>
<li>Comfortable with ambiguity, fast iteration, and high ownership.</li>
</ul>
<h3>What We Are Offering</h3>
<ul>
<li>Opportunity to participate in AlpineAI’s company shares program after an initial period.</li>
<li>Dynamic, innovation-driven culture.</li>
<li>High autonomy and real product impact.</li>
<li>Close collaboration with experts in speech, NLP, and applied AI.</li>
<li>Exposure to cutting-edge AI technologies.</li>
</ul>
<h3>Don’t Apply If</h3>
<ul>
<li>You are not willing to work on-site in Zurich or Davos.</li>
<li>You do not have a work permit for Switzerland.</li>
<li>You have never worked in a startup environment.</li>
</ul>
<h3>About Us</h3>
<p>Learn more about AlpineAI at:</p>
<h3>Ready to help customers succeed with AI?</h3>
<p>Apply now with your CV and a short cover letter. We look forward to hearing from you.</p>