Product Manager, CloudlyMELT

Job Description

As Product Manager for CloudlyMELT, you will own the roadmap for a platform that sits at the intersection of two of the fastest-moving areas in enterprise technology: AI infrastructure and observability. Your customers are the engineers and leaders responsible for keeping expensive GPU clusters running efficiently for the AI workloads that power their business. They are technically sophisticated, cost-conscious, and operating under real pressure. Building a product they trust and rely on requires deep understanding of both their infrastructure world and the AI capabilities that make CloudlyMELT different from every legacy monitoring tool they have already tried.

ABOUT CLOUDLYMELT
CloudlyMELT is an AI-native GPU observability platform that correlates network, GPU, and application layers in a single view, reducing MTTR from hours to seconds. It addresses GPU underutilization averaging 15 to 25% in Kubernetes clusters, straggler bottlenecks in distributed training, silent thermal throttling, and the cross-layer blind spot that makes GPU infrastructure incidents so expensive and slow to resolve. Built on OpenTelemetry, Prometheus, and DCGM, it delivers ML-powered predictive failure detection, LLM-driven root cause analysis, cost attribution, and multi-tenant fairness controls for organizations running serious AI infrastructure.

Job Responsibilities

  • Own and maintain the CloudlyMELT product roadmap, including cross-layer correlation capabilities, GPU failure prediction, straggler detection, cost attribution features, LLM root cause analysis, and platform observability infrastructure
  • Conduct ongoing discovery with AI engineering teams, MLOps leads, infrastructure architects, and FinOps stakeholders to understand their GPU operational challenges, cost pain points, and current observability gaps
  • Write clear, detailed product requirements for technically complex observability features, with precision about data sources, model behavior, output format, and integration requirements
  • Define success metrics for CloudlyMELT features in terms customers care about: MTTR reduction, GPU utilization improvement, cost attribution accuracy, and time to insight
  • Lead go-to-market planning for new platform capabilities in collaboration with marketing and sales, including benchmark data, competitive positioning, and demo environment development
  • Track competitive developments in the GPU observability and AIOps market, including Datadog, Prometheus, Run:ai, and NVIDIA Base Command, and maintain CloudlyMELT's differentiated positioning
  • Manage the relationship between CloudlyMELT's internal AIOps capabilities and its customer-facing product, ensuring internal learnings feed product improvements

YOU MAY BE A GOOD FIT IF YOU HAVE

  • 2 to 4 years of product management experience at a B2B technology, infrastructure, observability, or AI/ML company
  • Strong technical literacy in GPU infrastructure, Kubernetes, distributed training, or cloud observability tooling: you can have a meaningful conversation with a senior ML infrastructure engineer about why GPU utilization is hard to measure accurately
  • Experience defining products that combine ML capabilities and real-time data infrastructure
  • Strong analytical instincts: you define the right metrics before building and you use them honestly to evaluate outcomes
  • Ability to translate deeply technical infrastructure capabilities into clear, compelling product narratives for both engineering buyers and FinOps or executive stakeholders
  • Comfort working with ML, platform, and data engineering teams on features with significant technical complexity and dependency
  • Competitive awareness and the ability to articulate specifically why CloudlyMELT wins against alternatives

PREFERRED QUALIFICATIONS
  • Experience with observability platforms, infrastructure monitoring tools, or AIOps products
  • Familiarity with GPU compute, Kubernetes cluster management, or distributed ML training workflows
  • Knowledge of FinOps practices and cloud cost optimization in AI infrastructure contexts
  • Experience with open standards such as OpenTelemetry, Prometheus, or DCGM
  • Experience shipping ML-powered product features in a production observability or infrastructure context
  • Bachelor's degree in Computer Science, Engineering, or a related field

COMPENSATION & BENEFITS
  • Salary: Competitive base, negotiable based on experience
  • Performance-based commission structure: your earnings scale directly with your results
  • Two annual festive bonuses, each equivalent to half a month's salary
  • Two-day weekends, 10 days casual leave, 10 days sick leave, and 14 public holidays per CloudlyIO's global holiday calendar for Bangladesh
  • Fully subsidized lunch and evening snacks, plus tea and coffee throughout the day
  • Direct collaboration with US clients and teams, with real exposure to global enterprise AI deals from day one