Taming the Beast: Automated Testing for Complex Data Pipelines

Abstract

When you are faced with a system that processes massive datasets through complex data pipelines involving machine learning, how do you test it effectively? When your test results are less “pass” and “fail” and more “sort of” and “not really”, how do you automate testing?

Trish Khoo draws on her experience testing complex data systems to demonstrate proven strategies for testing in this field. Her work on ultra-large-scale systems at Google in Mountain View, California, shaped her technical approach to testing, which she now applies in her work as a consultant.

Presented at

6 May – YOW! Data Conference. Sydney, Australia. Recorded.
24 May – ATTAC Conference (Australian Tech & Test Automation Conference). Melbourne, Australia.
12 June – Brisbane Data Science Meetup. Brisbane, Australia.
27 June – Sydney Testers Meetup. Sydney, Australia.
17 July – YOW! Night: Modern Testing. Brisbane, Australia.
23 July – YOW! Night: Modern Testing. Melbourne, Australia.
25 July – YOW! Night: Modern Testing. Sydney, Australia.

Slides & Video

View slides on Canva.com.
Video coming soon.

References

Project Ground Truth: Accurate Maps Via Algorithms and Elbow Grease, Google I/O, 2013.
Statistical Data Sampling, Celal Ziftci, 11 November 2015.
Evolution of the Netflix Data Pipeline, Netflix Technology Blog, 15 February 2016.

With thanks to Paulo Lai and Shaw Innes for helping me with my talk.

Contact me

If you have any questions about my talk or would like to hire me as a consultant for your company, please contact me directly.