
Data engineering with Python : work with massive datasets to design data models and automate data pipelines using Python

Crickard, Paul

Data engineering with Python : work with massive datasets to design data models and automate data pipelines using Python - Birmingham ; Mumbai : Packt Publishing, 2020 - xii, 337 p.

Published by Packt Publishing Limited, Birmingham, UK. Title page: Birmingham—Mumbai.

Preface -- Section 1. Building data pipelines: extract, transform, and load -- Ch. 1. What is data engineering? -- Ch. 2. Building our data engineering infrastructure -- Ch. 3. Reading and writing files -- Ch. 4. Working with databases -- Ch. 5. Cleaning, transforming, and enriching data -- Ch. 6. Building a 311 data pipeline -- Section 2. Deploying data pipelines in production -- Ch. 7. Features of a production pipeline -- Ch. 8. Version control with the NiFi registry -- Ch. 9. Monitoring data pipelines -- Ch. 10. Building a production data pipeline -- Section 3. Beyond batch: real-time and streaming data -- Ch. 11. Building a custom NiFi processor -- Ch. 12. Streaming data with Apache NiFi -- Ch. 13. Streaming data with Apache Kafka -- Ch. 14. Data processing with Apache Spark -- Ch. 15. Real-time edge data with MiNiFi, Kafka, and Spark -- Appendix: building a NiFi cluster -- Index.

Practical guide to data engineering using Python and open-source Apache technologies for building, deploying, and managing data pipelines. Three sections: (1) Building ETL pipelines — reading/writing files, relational and NoSQL databases, data cleaning and transformation, Apache NiFi pipeline; (2) Production deployment — NiFi registry version control, monitoring, staging, validation, failure handling; (3) Real-time and streaming data — Apache NiFi streaming, Apache Kafka (Python producers and consumers), Apache Spark and PySpark processing, real-time edge data with MiNiFi, Kafka and Spark. Appendix: building a distributed NiFi cluster. Code files on GitHub. Suitable for data engineers, data analysts, ETL developers, and IT professionals transitioning to data-driven roles.

ISBN: 9781839214189 183921418X

Subjects--Topical Terms:
Python (Computer program language)
Data mining.
Apache Kafka (Computer program)
Apache Spark (Electronic resource)
Real-time data processing.

Dewey Class. No.: 005.133 CRI-D
