Model Optimization / CUDA Data Science Engineer

Europe, Ukraine

We seek a highly skilled CUDA API Developer to design and implement a high-performance API server for text-to-speech (TTS) applications. The ideal candidate will have expertise in leveraging CUDA for GPU-accelerated computing, audio processing, and efficient data encoding. You will be responsible for integrating advanced TTS models, optimizing performance on GPUs, and converting audio output into Opus frames for real-time streaming and storage.

Responsibilities:

  • Develop a robust API server in C++ to process TTS requests efficiently.
  • Integrate CUDA-accelerated TTS models for fast inference.
  • Implement audio processing pipelines to sanitize and normalize PCM data.
  • Use libopus to encode PCM audio into Opus frames for streaming.
  • Optimize GPU resource management to handle concurrent requests.
  • Design and implement REST or gRPC endpoints for TTS services.
  • Benchmark, test, and debug the system to optimize latency and throughput.
  • Document the system architecture and provide maintenance support.

Requirements:

  • Proficiency in C++ development, with experience in building performance-critical applications.
  • Strong knowledge of CUDA and GPU programming for accelerated computing.
  • Experience with audio processing libraries (e.g., libopus, FFmpeg).
  • Familiarity with TTS models like Tacotron, FastSpeech, or custom implementations.
  • Understanding of REST and gRPC APIs.
  • Strong problem-solving and debugging skills.
  • Knowledge of multi-threading and GPU resource management.

Nice to Have:

  • Experience with TensorRT for model optimization.
  • Knowledge of streaming protocols like WebSocket or HTTP/2.
  • Background in real-time systems or multimedia applications.
  • Experience with containerization tools like Docker for deployment.


Ready to rumble?

Send your CV or contact us here.