AI Research Engineer (Kernel & Inference Optimization) — AI Inference Optimization

Tether

Remote · Worldwide

Full Time · Posted: May 19, 2026
Job description

Tether develops blockchain-based financial infrastructure, stablecoin systems, AI technologies, peer-to-peer communication tools, and digital asset services used globally across exchanges, wallets, and payment platforms. The company operates as a distributed remote organization supporting fintech, blockchain infrastructure, and AI research initiatives.

The AI Research Engineer (Kernel & Inference Optimization) role focuses on inference optimization, GPU kernel development, scalable model-serving pipelines, and low-latency AI deployment in production systems.

The position involves optimizing AI inference across mobile devices, edge environments, and distributed GPU infrastructure, with the goal of improving throughput, latency, memory efficiency, and scalability in real-world production environments.

🔹 Responsibilities

  • Design and deploy advanced model serving architectures optimized for high throughput, low latency, and efficient memory usage across edge and resource-constrained devices.

  • Build, execute, and monitor inference tests in simulated and live production environments while tracking latency, throughput, memory consumption, and error rates.

  • Prepare datasets and simulation scenarios for evaluating inference performance, latency, and memory utilization in real-world deployments.

  • Analyze bottlenecks in serving infrastructure, including batch-processing inefficiencies, network delays, and memory constraints, to improve scalability and reliability.

  • Collaborate with cross-functional teams to integrate optimized inference frameworks and serving pipelines into production systems supporting edge and on-device AI applications.

  • Define and track performance success metrics, and continuously refine optimization strategies for scalable inference systems.

🔹 Requirements

  • Degree in Computer Science or a related field. A PhD in NLP, Machine Learning, or a related discipline, together with a strong record of AI research publications, is preferred.

  • Strong expertise in Metal Shading Language (MSL) and writing custom compute shaders from scratch.

  • Proven experience with low-level kernel optimization and inference optimization on mobile or resource-constrained devices.

  • Deep understanding of modern model serving architectures and inference optimization techniques for low-latency, high-throughput systems.

  • Experience writing GPU kernels for mobile devices and deploying end-to-end inference pipelines.

  • Ability to apply empirical research methods to reduce latency and resolve computational bottlenecks and memory constraints in production systems.

  • Experience designing distributed inference systems using Tensor Parallelism, Pipeline Parallelism, and Expert Parallelism.

  • Strong understanding of Diffusion Models and Vision Transformers.

  • Knowledge of pruning, quantization, FlashAttention, KV caching, speculative decoding (e.g., EAGLE), and related inference optimization techniques.