Machine Learning Engineer, Vision
Sarvam AI
Bengaluru, Karnataka, India
Mid-level
AI, Deep Learning
Job Description
Join Sarvam AI as a Machine Learning Engineer focusing on vision-language models.
Responsibilities
- Design and run training and fine-tuning pipelines for large vision-language models on GPU clusters
- Build multimodal data pipelines — ingestion, filtering, deduplication, synthetic generation, and quality assurance
- Implement and experiment with new architectures and training techniques from research
- Build evaluation harnesses, benchmarks, and automated regression tracking
- Optimise models for inference — quantisation, batching, and serving infrastructure
- Build robust pipelines and integrations that put vision model capabilities in the hands of end users
- Translate real-world problems into well-scoped ML tasks with the right data and evaluation strategy
- Work directly with clients to understand their use cases — document processing, visual search, form extraction — and own the solution end to end
- Build production-grade systems on top of Sarvam Vision and open-source models: multimodal pipelines, retrieval-augmented workflows, and structured output extraction
- Debug and improve deployed solutions — latency, accuracy, edge cases, and integration with client infrastructure
Requirements
- Strong Python and PyTorch skills, including comfort reading and modifying model internals
- Hands-on experience training or fine-tuning large models, including debugging broken runs
- Experience building data pipelines at scale
- Solid grounding in transformer architectures and modern training techniques
- Comfort with ambiguity — the roadmap is not fully pre-specified
- Strong focus on secure coding practices, code quality, and system reliability
- Undergraduate degree in a technical discipline (CS, statistics, physics, or equivalent)
Preferred qualifications
- Experience with vision-language models or multimodal systems
- Experience with distributed training (FSDP, DeepSpeed, Megatron-LM)
- Experience with post-training methods such as RLHF, DPO, or other alignment techniques
- Experience with inference optimisation: quantisation, distillation, and serving
- Prior exposure to vision-based AI systems or document processing pipelines
- Contributions to open-source projects or a solid GitHub portfolio
Why Sarvam AI
- Work alongside researchers, engineers, builders, and business leaders who move fast and hold each other to a very high bar
- High ownership and high impact, from day one
- Everything we do is AI-first, from the way we build and ship to the way we think about problems
- Work on problems that could change how an entire country learns, works, and communicates