How to Effectively Reduce AI Pipeline Runtime
3:45pm - 4:10pm on Friday, October 4 in PennTop North
In this talk, we will discuss why it is worth migrating PySpark pipelines from CPython to PyPy, and how to do it.
We will share an example involving a core AI pipeline that ingests more than 4 TB of data (Parquet, TSV, and JSON) per run and produces optimized models on behalf of marketing clients. We’ll outline how migrating to PyPy cut overall runtime by 30% without any code changes, while keeping the operations team happy.
We will also recommend steps to follow to achieve this runtime reduction, covering unit testing, which Spark configuration to use, and how to deploy into production, and touch on some limitations you may face with PyPy.
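
As a rough illustration of the kind of Spark configuration involved, here is a minimal sketch that builds a spark-submit command pointing both the driver and the Python workers at a PyPy interpreter. It is not the configuration presented in the talk. The properties spark.pyspark.python and spark.pyspark.driver.python are standard Spark settings; the interpreter path /opt/pypy3/bin/pypy3, the script name pipeline.py, and the helper names are hypothetical and only serve the example.

```python
import platform
import shlex


def running_on_pypy():
    """Return True when the current interpreter is PyPy (handy as a unit-test guard)."""
    return platform.python_implementation() == "PyPy"


def pypy_spark_submit_args(app_path, pypy_path="/opt/pypy3/bin/pypy3"):
    """Build a spark-submit command that runs driver and workers under PyPy.

    pypy_path is a hypothetical install location; adjust it to your cluster.
    """
    return [
        "spark-submit",
        # Interpreter used by the Python workers on the executors.
        "--conf", f"spark.pyspark.python={pypy_path}",
        # Interpreter used by the driver process.
        "--conf", f"spark.pyspark.driver.python={pypy_path}",
        app_path,
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(pypy_spark_submit_args("pipeline.py")))
    print("Running on PyPy:", running_on_pypy())
```

Running the same unit test suite under both CPython and PyPy before switching the production interpreter is one way to surface incompatibilities early, for example in libraries that rely on C extensions.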