Machine Learning Engineer, Vision

Sarvam AI

Bengaluru, Karnataka, India

MID

AI, Deep Learning

Job Description

Join Sarvam AI as a Machine Learning Engineer focusing on vision-language models.

Responsibilities

Design and run training and fine-tuning pipelines for large vision-language models on GPU clusters
Build multimodal data pipelines: ingestion, filtering, deduplication, synthetic generation, and quality assurance
Implement and experiment with new architectures and training techniques from research
Build evaluation harnesses, benchmarks, and automated regression tracking
Optimise models for inference: quantisation, batching, and serving infrastructure
Build robust pipelines and integrations that put vision model capabilities in the hands of end users
Translate real-world problems into well-scoped ML tasks with the right data and evaluation strategy
Work directly with clients to understand their use cases (document processing, visual search, form extraction) and own the solution end to end
Build production-grade systems on top of Sarvam Vision and open-source models: multimodal pipelines, retrieval-augmented workflows, and structured output extraction
Debug and improve deployed solutions: latency, accuracy, edge cases, and integration with client infrastructure

Qualifications

Strong Python and PyTorch; comfortable reading and modifying model internals
Hands-on experience training or fine-tuning large models, including debugging broken runs
Experience building data pipelines at scale
Solid grounding in transformer architectures and modern training techniques
Comfort with ambiguity: the roadmap is not fully pre-specified
Strong focus on secure coding practices, code quality, and system reliability
Undergraduate degree in a technical discipline (CS, statistics, physics, or equivalent)

Nice to have

Experience with vision-language models or multimodal systems
Distributed training (FSDP, DeepSpeed, Megatron-LM)
Post-training methods: RLHF, DPO, or alignment techniques
Inference optimisation: quantisation, distillation, serving
Prior exposure to vision-based AI systems or document processing pipelines
Contributions to open-source projects or a solid GitHub portfolio

Why join Sarvam AI

Work alongside researchers, engineers, builders, and business leaders who move fast and hold each other to a very high bar
High ownership and high impact, from day one
Everything we do is AI-first, from the way we build and ship to the way we think about problems
You can work on problems that could change how an entire country learns, works, and communicates
