
For years, AI services have been locked into expensive GPU cloud infrastructure, burdened by high costs, latency, and privacy risks. ZETIC.ai introduces a breakthrough: an end-to-end automated SDK that converts your existing AI models into fully optimized on-device apps within hours. By leveraging the NPUs already inside billions of smartphones, we enable companies to eliminate GPU servers entirely, cutting costs by up to 99% while delivering real-time, private, and scalable AI experiences. This session unveils how the future of AI is no longer in the cloud — it’s already in your pocket.

Author:

Yeonseok Kim

CEO
Zetic

This demo explores how to achieve high-performance AI on Google's Tensor Processing Units (TPUs) using the JAX ecosystem, with a specific focus on image recognition workflows. We’ll begin with micro-benchmarks that showcase JAX's unique advantages for TPU-based computation, such as its single-program, multiple-data (SPMD) programming model, which is ideal for the TPU's systolic array architecture. This setup is designed for integration into larger, production-grade environments, such as those running on Kubernetes.
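As a taste of what such an SPMD micro-benchmark can look like, here is a minimal sketch using JAX's sharding API. The matrix sizes, mesh axis name, and timing approach are illustrative assumptions for this listing, not the demo's actual code; the same program runs on a single device or is partitioned by XLA across all devices in a TPU slice.

```python
# Minimal SPMD micro-benchmark sketch (illustrative, not the demo code).
import time

import jax
import jax.numpy as jnp
from jax.sharding import NamedSharding, PartitionSpec as P

# Build a 1-D device mesh over whatever accelerators are attached.
mesh = jax.make_mesh((jax.device_count(),), ("data",))

@jax.jit
def matmul(a, b):
    # Written once as a single program; the shardings below tell XLA
    # how to partition the work across devices (SPMD).
    return a @ b

ka, kb = jax.random.split(jax.random.key(0))
a = jax.random.normal(ka, (8192, 8192), dtype=jnp.bfloat16)
b = jax.random.normal(kb, (8192, 8192), dtype=jnp.bfloat16)

# Shard rows of `a` across the mesh; replicate `b` on every device.
a = jax.device_put(a, NamedSharding(mesh, P("data", None)))
b = jax.device_put(b, NamedSharding(mesh, P(None, None)))

matmul(a, b).block_until_ready()  # warm-up: triggers compilation

start = time.perf_counter()
matmul(a, b).block_until_ready()  # block so we time the device work
print(f"8192x8192 bf16 matmul: {(time.perf_counter() - start) * 1e3:.2f} ms")
```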

Author:

Ravi Mahendrakar

Senior Product Manager
Google Cloud

Ravi Mahendrakar is a Product Management leader at Google, focused on ML Frameworks & Ecosystems. With over 20 years of experience, including product roles at AWS, Aerospike, VAST Data, Pure Storage, Veritas, and IBM, Ravi specializes in bringing innovative data and enterprise software solutions to market. He holds an MBA from Chicago Booth and a Master's in Computer Science from CSU Chico.

Rebellions introduces REBEL-Quad, the world’s first UCIe-Advanced AI accelerator designed for peta-scale inference. Built for efficiency at every layer, REBEL-Quad redefines the economics of AI data centers—delivering higher throughput and lower power consumption compared to GPU-based systems. This session will feature a demo showcase, highlighting how REBEL-Quad brings frontier-class performance into production, enabling enterprises to scale large-model workloads without the energy tax. Join us to see how next-generation chiplet architecture translates into real customer value: efficiency, scalability, and faster time-to-deployment.

Author:

Jinwook Oh

Co-Founder and CTO
Rebellions

Jinwook Oh is the Co-Founder and Chief Technology Officer of Rebellions, an AI chip company based in South Korea. After earning his Ph.D. from KAIST (Korea Advanced Institute of Science and Technology), he joined the IBM TJ Watson Research Center, where he contributed to several AI chip R&D projects as a Chip Architect, Logic Designer, and Logic Power Lead. At Rebellions, he has overseen the development and launch of two AI chips, with a third, REBEL, in progress. Jinwook's technical leadership has been crucial in establishing Rebellions as a notable player in AI technology within just three and a half years.

Following the MLCommons MLPerf Inference v5.1 results, presented on the keynote stage on the morning of Tuesday 9 September, Miro Hodak, Senior Member of Technical Staff, AI Performance Engineering at AMD, will deliver a detailed analysis of the results, followed by a Q&A session with the audience.

Author:

Miro Hodak

Senior Member of Technical Staff, AI Performance Engineering
AMD

Miro Hodak is a Principal Member of Technical Staff at AMD, where he focuses on AI performance and benchmarking. Prior to joining AMD, he served as an AI Architect at Lenovo and, before that, was a physics professor at North Carolina State University.

Miro has been actively involved with MLPerf and MLCommons since 2020, contributing to the development of multiple MLPerf benchmarks and submitting results across several rounds of Inference and Training. Since 2023, he has served as co-chair of the MLPerf Inference Working Group.

He has authored peer-reviewed publications in fields ranging from artificial intelligence and computer science to materials science, physics, and biochemistry, with his work cited over 2,500 times.

Distributed training jobs are brittle; a single node failure can halt progress and waste expensive GPU cycles. This technical demo dives into Cluster Director, focusing on how engineers can automate resilient, large-scale GPU infrastructure. We'll start with a declarative YAML configuration to define and provision a multi-node GPU cluster, optimized with the ideal network topology for NCCL communication. The core of the demo will be a live failure simulation. You will see Cluster Director automatically detect a preempted node, perform remediation, and maintain the integrity of the running workload with minimal disruption.
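As a rough illustration of what such a declarative definition can look like, here is a minimal blueprint sketch in the style of Google Cloud's Cluster Toolkit YAML. The module sources, machine type, and settings below are assumptions for illustration only, not the exact Cluster Director schema shown in the demo:

```yaml
# Illustrative sketch only: module paths and fields are assumptions
# in the style of Cluster Toolkit blueprints, not the exact schema.
blueprint_name: resilient-gpu-training
vars:
  project_id: my-project          # hypothetical project
  deployment_name: nccl-demo
  region: us-central1
  zone: us-central1-a
deployment_groups:
  - group: primary
    modules:
      - id: network
        source: modules/network/vpc
      - id: gpu_nodes
        source: modules/compute/gpu-cluster   # hypothetical module path
        use: [network]
        settings:
          machine_type: a3-highgpu-8g
          node_count: 16
          compact_placement: true   # co-locate nodes for NCCL bandwidth
```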

Author:

Ilias Katsardis

Senior Product Manager
Google Cloud

Ilias Katsardis is a Senior Product Manager based in Sunnyvale, CA, driving the future of AI infrastructure at Google Cloud. He is responsible for Cluster Director and the Cluster Toolkit, two key components of Google's supercomputing architecture. Passionate about making large-scale AI and HPC more accessible, Ilias focuses on creating solutions that automate complex configurations and provide a seamless user experience. His work enables researchers and developers to spend less time on infrastructure management and more time on scientific breakthroughs. With a rich background that includes roles at Cray Inc. and ClusterVision, along with founding two tech startups, Ilias brings over 15 years of deep industry expertise to his role.

Author:

Abhijith Prabhudev

Product Manager
Google Cloud

Abhijith Prabhudev is a Product Manager based in Sunnyvale, CA, leading AI infrastructure observability and monitoring at Google Cloud. He is responsible for GPU infrastructure reliability, monitoring, and resiliency capabilities. His work enables researchers and developers to spend less time on infrastructure management and more time on building and training AI models. With more than 15 years of infrastructure industry experience, including leading the VMware vSphere product team and working as a full-stack engineer, Abhijith is passionate about solving infrastructure problems that hinder developer and administrator productivity.

Large language models can now power capable software agents, yet real‑world success comes from disciplined engineering rather than flashy frameworks. Most reliable agents are built from simple, composable patterns instead of heavy abstractions.

The talk will introduce patterns for adding complexity and autonomy only when it pays off. Attendees should leave with a practical decision framework for escalating from a single prompt to multi-step agents, along with guardrails for shipping trustworthy, cost-effective agents at scale.
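As a minimal sketch of that escalation idea (not code from the talk), the pattern below tries the cheapest tier first, a single prompt, and only falls back to a bounded multi-step loop when validation fails; `call_model` is a hypothetical stand-in for any LLM client:

```python
# Sketch of "escalate only when it pays off" (illustrative assumptions).
from typing import Callable

def call_model(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real client."""
    return "42"

def answer(question: str, validate: Callable[[str], bool],
           max_steps: int = 5) -> str:
    # Tier 1: a single prompt is the cheapest, most predictable pattern.
    draft = call_model(question)
    if validate(draft):
        return draft

    # Tier 2: escalate to a multi-step loop, with a hard step budget
    # as a cost and safety guardrail.
    scratchpad = ""
    for _ in range(max_steps):
        step = call_model(
            f"Question: {question}\nNotes so far: {scratchpad}\n"
            "Refine the answer."
        )
        if validate(step):
            return step
        scratchpad += "\n" + step
    return draft  # guardrail: degrade gracefully rather than loop forever

print(answer("What is 6 * 7?", validate=lambda s: "42" in s))
```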

Author:

Sushant Mehta

Research Engineer
Google DeepMind
