Performance Engineer – ML Models


Job Title: Performance Engineer – ML Models

Location: San Francisco Bay Area, CA (Onsite)

Experience: 5+ years

Employment Type: Contract


Job Summary:

We are seeking an experienced Performance Engineer to evaluate, optimize, and enhance the efficiency of Machine Learning (ML) models in production environments. The ideal candidate has expertise in Kubernetes, load testing, Python, and ML model performance tuning, and can deliver scalable, high-performance ML solutions.


Key Responsibilities:

  • Design, develop, and execute performance tests for Machine Learning (ML) models in production.
  • Analyze latency, throughput, and scalability of ML models and improve inference efficiency.
  • Optimize ML model deployments in Kubernetes-based environments.
  • Conduct load testing to simulate real-world workloads and identify bottlenecks.
  • Implement profiling tools to monitor ML model resource consumption (CPU, memory, GPU usage).
  • Collaborate with ML engineers, data scientists, and DevOps teams to optimize model serving.
  • Automate performance monitoring, logging, and alerting using Python-based tools.
  • Work on distributed computing and parallel processing to enhance ML model execution.
  • Implement best practices for ML model scaling and A/B testing in production.
  • Stay updated on ML performance optimization techniques, Kubernetes advancements, and cloud-based ML solutions.

Required Skills & Experience:

  • Strong experience in performance engineering for Machine Learning applications.
  • Hands-on expertise in Kubernetes for ML model deployment and orchestration.
  • Experience with load testing tools such as Locust, JMeter, or K6.
  • Proficiency in Python for scripting, automation, and performance analysis.
  • Knowledge of ML model serving frameworks (TensorFlow Serving, Triton Inference Server, ONNX Runtime).
  • Strong understanding of cloud-based ML solutions (AWS SageMaker, GCP AI Platform, Azure ML).
  • Experience in profiling and optimizing ML models for inference efficiency.
  • Familiarity with GPU acceleration, CUDA, and TensorRT for ML model performance tuning.
  • Experience with distributed computing frameworks such as Ray, Dask, or Spark.

Nice-to-Have Skills:

  • Exposure to MLOps practices and CI/CD pipelines for ML models.
  • Experience with real-time ML inference workloads and microservices architecture.
  • Knowledge of AI accelerators (TPUs, FPGAs) and edge ML deployment.
  • Familiarity with data pipeline performance optimization.

 

Thanks

Debasish Pattnaik

d.pattanaik@mrtechnosoft.com

www.mrtechnosoft.com


