Mastering Large Datasets: Parallelize and Distribute Your Python Code by John T. Wolohan
With an emphasis on clarity, style, and performance, author J.T. Wolohan expertly guides you through implementing a functionally influenced approach to Python coding. You'll get familiar with Python's functional built-ins, such as the functools, operator, and itertools modules, as well as the toolz library.
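As a rough illustration of the functional style described here, the following minimal sketch (not taken from the book) combines the built-in `map` with `itertools.islice`, `functools.reduce`, and `operator.add` to count words lazily. The function names `word_counts` and `total_words` and the sample lines are hypothetical.

```python
from functools import reduce
from itertools import islice
from operator import add

def word_counts(lines):
    # Lazily map each line to its number of words; nothing is computed yet.
    return map(lambda line: len(line.split()), lines)

def total_words(lines, limit=None):
    # islice lazily caps how many lines are consumed (None means no cap).
    counts = islice(word_counts(lines), limit)
    # reduce folds the counts together with operator.add, starting from 0.
    return reduce(add, counts, 0)

lines = ["large datasets need", "scalable code", "in python"]
print(total_words(lines))     # 7
print(total_words(lines, 2))  # 5
```

Because `map` and `islice` are lazy, the same pipeline works unchanged on a generator that streams lines from a large file.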
Mastering Large Datasets teaches you to write easily readable, easily scalable Python code that can efficiently process large volumes of structured and unstructured data. By the end of this comprehensive guide, you'll have a solid grasp of the tools and methods that will take your code beyond the laptop and your data science career to the next level!
Key features
* An introduction to functional and parallel programming
* Data science workflow
* Profiling code for better performance
* Fulfilling different quality objectives for a single unifying task
* Python multiprocessing
* Practical exercises including full-scale distributed applications
Audience
Readers should have intermediate Python programming skills.
About the technology
Python is a data scientist's dream come true, thanks to readily available libraries that support tasks like data analysis, machine learning, visualization, and numerical computing.
About the author
J.T. Wolohan is a lead data scientist at Booz Allen Hamilton and a PhD researcher at Indiana University, Bloomington, affiliated with the Department of Information and Library Science and the School of Informatics and Computing. His professional work focuses on rapid prototyping and scalable AI. His research focuses on computational analysis of social uses of language online.