Data engineering with Python : work with massive datasets to design data models and automate data pipelines using Python

By:

Crickard, Paul

Material type: Text

Language: English

Publication details: Birmingham - Mumbai: Packt Publishing, 2020

Description: xii, 337 p.

ISBN: 9781839214189; 183921418X

Subject(s):

DDC classification:

005.133 CRI-D

Contents:

Preface -- Section 1. Building data pipelines: extract, transform, and load -- Ch. 1. What is data engineering? -- Ch. 2. Building our data engineering infrastructure -- Ch. 3. Reading and writing files -- Ch. 4. Working with databases -- Ch. 5. Cleaning, transforming, and enriching data -- Ch. 6. Building a 311 data pipeline -- Section 2. Deploying data pipelines in production -- Ch. 7. Features of a production pipeline -- Ch. 8. Version control with the NiFi registry -- Ch. 9. Monitoring data pipelines -- Ch. 10. Building a production data pipeline -- Section 3. Beyond batch: real-time and streaming data -- Ch. 11. Building a custom NiFi processor -- Ch. 12. Streaming data with Apache NiFi -- Ch. 13. Streaming data with Apache Kafka -- Ch. 14. Data processing with Apache Spark -- Ch. 15. Real-time edge data with MiNiFi, Kafka, and Spark -- Appendix and building a NiFi cluster -- Index.

Summary: Practical guide to data engineering using Python and open-source Apache technologies for building, deploying, and managing data pipelines. Three sections: (1) Building ETL pipelines — reading/writing files, relational and NoSQL databases, data cleaning and transformation, Apache NiFi pipeline; (2) Production deployment — NiFi registry version control, monitoring, staging, validation, failure handling; (3) Real-time and streaming data — Apache NiFi streaming, Apache Kafka (Python producers and consumers), Apache Spark and PySpark processing, real-time edge data with MiNiFi, Kafka and Spark. Appendix: building a distributed NiFi cluster. Code files on GitHub. Suitable for data engineers, data analysts, ETL developers, and IT professionals transitioning to data-driven roles.

Item type: Books and Monographs

List(s) this item appears in: List of New Arrivals (Books)

Holdings
Item type	Current library	Home library	Collection	Call number	Materials specified	Status	Date due	Barcode
Books and Monographs	Central Library, NIT Jalandhar General Stacks	Central Library, NIT Jalandhar	Center for Artificial Intelligence	005.133 CRI-D		Available		102776
Books and Monographs	Central Library, NIT Jalandhar General Stacks	Central Library, NIT Jalandhar	Center for Artificial Intelligence	005.133 CRI-D		Available		102777

Published by Packt Publishing Limited, Birmingham, UK. Title page: Birmingham—Mumbai.
