Published 8/2025
Created by Meta Brains
MP4 | Video: h264, 1280×720 | Audio: AAC, 44.1 kHz, 2 Ch
Level: All | Genre: eLearning | Language: English | Duration: 32 Lectures (2h 51m) | Size: 1.23 GB
Learn Dask arrays, dataframes, and streaming, with scikit-learn integration and real-time dashboards.
What you’ll learn
Master Dask’s core data structures: arrays, dataframes, bags, and delayed computations for parallel processing
Build scalable ETL pipelines handling massive CSV, Parquet, JSON, and HDF5 datasets beyond memory limits
Integrate Dask with scikit-learn for distributed machine learning and hyperparameter tuning at scale
Develop real-time streaming applications using Dask Streams, Streamz, and RabbitMQ integration
Optimize performance through partitioning strategies, lazy evaluation, and Dask dashboard monitoring
Create production-ready parallel computing solutions for enterprise-scale data processing workflows
Build interactive real-time dashboards processing live cryptocurrency and stock market data streams
Deploy Dask clusters locally and in cloud environments for distributed computing applications
Requirements
Basic Python programming knowledge (variables, functions, loops, data structures)
Familiarity with Pandas for data manipulation and NumPy for array operations
Understanding of fundamental data science concepts and workflow processes
No prior experience with parallel computing or distributed systems required – we’ll cover everything from scratch
Description
Unlock the power of parallel computing in Python with this comprehensive Dask course designed for data scientists, analysts, and Python developers. As datasets continue to grow beyond the memory limits of traditional tools like Pandas, Dask emerges as the essential solution for scaling your data processing workflows without changing your familiar Python syntax.
This hands-on course takes you from Dask fundamentals to advanced real-time streaming applications through practical projects and real-world scenarios. You’ll start by understanding Dask’s architecture and how it compares to alternatives like Spark and Ray, then dive deep into Dask’s core data structures including arrays, dataframes, bags, and delayed computations. The course emphasizes practical application, teaching you to handle massive datasets that would crash traditional Python tools.
Through three comprehensive projects, you’ll gain real-world experience processing millions of rows of data, building scalable machine learning pipelines with scikit-learn integration, and creating real-time cryptocurrency dashboards using Dask Streams and Streamz. You’ll master essential concepts like lazy evaluation, partitioning strategies, and performance optimization while working with popular data formats including CSV, Parquet, JSON, and HDF5.
The course covers advanced topics including ETL pipeline development, hyperparameter tuning at scale, and real-time data streaming with RabbitMQ integration. You’ll learn to set up Dask clusters both locally and in cloud environments, monitor performance using Dask’s diagnostic dashboard, and integrate Dask seamlessly with the broader Python data science ecosystem.
By completion, you’ll be equipped to tackle big data challenges that exceed single-machine capabilities, implement production-ready parallel computing solutions, and build scalable data applications that can grow with your organization’s needs.
Perfect for data professionals ready to move beyond the limitations of traditional Python data tools and embrace enterprise-scale data processing capabilities.
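The lazy evaluation and partitioning mentioned in the description can be illustrated with a small plain-Python sketch. This is not Dask's API or implementation, and it is not taken from the course; the names `Lazy`, `partition`, `partial_sum`, and `combine` are hypothetical and exist only for illustration. The idea it shows: building a task graph does no work, and the work on each partition runs only when `compute()` is called.

```python
class Lazy:
    """A deferred call (hypothetical helper): stores a function and its
    arguments, and runs nothing until compute() is called."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def compute(self):
        # Resolve nested Lazy arguments first, then call the function.
        resolved = [a.compute() if isinstance(a, Lazy) else a for a in self.args]
        return self.func(*resolved)


def partition(values, n_parts):
    """Split a list into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(values), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(values[start:end])
        start = end
    return parts


calls = []  # records each chunk size at the moment work actually happens


def partial_sum(chunk):
    calls.append(len(chunk))
    return sum(chunk)


def combine(*partials):
    return sum(partials)


data = list(range(1, 101))
parts = partition(data, 4)

# Building the graph performs no work yet.
graph = Lazy(combine, *[Lazy(partial_sum, p) for p in parts])
calls_before = list(calls)

# Only now does each partition get processed and the results combined.
total = graph.compute()
```

This sketch evaluates the partitions one after another. A real scheduler can run independent tasks such as the per-partition sums in parallel, because none of them depends on another's result.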
When reposting, please credit: 0daytown » Master Dask: Python Parallel Computing for Data Science