1000x faster data manipulation: vectorizing with Pandas and Numpy
10:15am - 10:40am on Saturday, October 5 in Madison
Nathan Cheever
- Slides:
- https://docs.google.com/presentation/d/1_ST-4SqamrvANCCS46I1jrDlQekRlbcfVPFTZnhY47w/edit#slide=id.g635adc05c1_1_79
- Watch:
- https://youtu.be/nxWginnBklU
Description
The data transformation code you’re writing is correct, but it may be 1000x slower than it needs to be! In this talk, we will go over several ways to speed up a data transformation workflow with Pandas and Numpy, replacing slower, perhaps more familiar, ways of operating on Pandas data frames with faster, vectorized solutions to common use cases like:
- if-else logic in applied row-wise functions
- dictionary lookups with conditional logic
- date comparisons and calculations
- regex and string column manipulation
- and others!
All of this without needing a beefier computer, writing Cython, or reaching for libraries outside the Pandas ecosystem.
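The talk's own solutions are in the slides and video linked above. What follows is only a minimal sketch of the kind of column-at-a-time replacements these use cases typically call for, assuming a small hypothetical DataFrame (the column names, the `region` lookup dict, and the thresholds are invented for illustration). It uses `np.select` for if-else logic, `Series.map` plus `np.where` for a conditional dictionary lookup, datetime column subtraction with `.dt.days`, and the `.str` accessor for regex work. It does not benchmark anything, so no speedup figure is implied.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; column names are illustrative only.
df = pd.DataFrame({
    "score": [95, 72, 40, 88],
    "state": ["CA", "NY", "TX", "CA"],
    "start": pd.to_datetime(["2019-01-01", "2019-03-15", "2019-06-30", "2019-09-01"]),
    "end": pd.to_datetime(["2019-02-01", "2019-03-20", "2019-12-31", "2019-09-10"]),
    "email": ["a@x.com", "b@y.org", "c@x.com", "d@z.net"],
})

# 1. if-else logic: a row-wise function passed to apply ...
def grade_row(row):
    if row["score"] >= 90:
        return "A"
    elif row["score"] >= 70:
        return "B"
    return "C"

slow = df.apply(grade_row, axis=1)

# ... versus np.select, which evaluates each condition over the whole column.
# The first condition that is True for a row picks its choice; else the default.
fast = np.select([df["score"] >= 90, df["score"] >= 70], ["A", "B"], default="C")

# 2. Dictionary lookup with a condition: map the (hypothetical) dict over the
#    column, fill missing keys, then apply the condition as a boolean mask.
region = {"CA": "West", "NY": "East"}
mapped = df["state"].map(region).fillna("Other")
df["region"] = np.where(df["score"] >= 70, mapped, "Unranked")

# 3. Date calculations: subtract entire datetime columns at once.
df["days"] = (df["end"] - df["start"]).dt.days
df["long_stay"] = df["days"] > 30

# 4. Regex on a string column via the .str accessor instead of re inside apply.
df["domain"] = df["email"].str.extract(r"@(\w+)\.", expand=False)

print(slow.tolist(), fast.tolist())
print(df[["region", "days", "long_stay", "domain"]])
```

The `apply` version calls a Python function once per row, while the other versions hand whole columns to compiled Pandas/NumPy routines, which is generally where gains of this kind come from.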