Data engineering with Python : work with massive datasets to design data models and automate data pipelines using Python

Material type: Text
Language: English
Publication details: Birmingham - Mumbai: Packt Publishing, 2020
Description: xii, 337 p.
ISBN:
  • 9781839214189
  • 183921418X
DDC classification:
  • 005.133 CRI-D
Contents:
Preface -- Section 1. Building data pipelines: extract, transform, and load -- Ch. 1. What is data engineering? -- Ch. 2. Building our data engineering infrastructure -- Ch. 3. Reading and writing files -- Ch. 4. Working with databases -- Ch. 5. Cleaning, transforming, and enriching data -- Ch. 6. Building a 311 data pipeline -- Section 2. Deploying data pipelines in production -- Ch. 7. Features of a production pipeline -- Ch. 8. Version control with the NiFi registry -- Ch. 9. Monitoring data pipelines -- Ch. 10. Building a production data pipeline -- Section 3. Beyond batch: real-time and streaming data -- Ch. 11. Building a custom NiFi processor -- Ch. 12. Streaming data with Apache NiFi -- Ch. 13. Streaming data with Apache Kafka -- Ch. 14. Data processing with Apache Spark -- Ch. 15. Real-time edge data with MiNiFi, Kafka, and Spark -- Appendix. Building a NiFi cluster -- Index.
Summary: Practical guide to data engineering using Python and open-source Apache technologies for building, deploying, and managing data pipelines. Three sections: (1) Building ETL pipelines — reading/writing files, relational and NoSQL databases, data cleaning and transformation, Apache NiFi pipeline; (2) Production deployment — NiFi registry version control, monitoring, staging, validation, failure handling; (3) Real-time and streaming data — Apache NiFi streaming, Apache Kafka (Python producers and consumers), Apache Spark and PySpark processing, real-time edge data with MiNiFi, Kafka and Spark. Appendix: building a distributed NiFi cluster. Code files on GitHub. Suitable for data engineers, data analysts, ETL developers, and IT professionals transitioning to data-driven roles.
Item type: Books and Monographs
List(s) this item appears in: List of New Arrivals (Books)
Holdings
Item type | Current library | Home library | Collection | Call number | Status | Barcode
Books and Monographs | Central Library, NIT Jalandhar (General Stacks) | Central Library, NIT Jalandhar | Center for Artificial Intelligence | 005.133 CRI-D | Available | 102776
Books and Monographs | Central Library, NIT Jalandhar (General Stacks) | Central Library, NIT Jalandhar | Center for Artificial Intelligence | 005.133 CRI-D | Available | 102777

Published by Packt Publishing Limited, Birmingham, UK. Title page: Birmingham—Mumbai.


Dr. Sanjeev, Librarian
Managed by: Dr. D. P. Tripathi, Deputy Librarian, Central Library
For any query / question, please mail at circulation.liby@nitj.ac.in 
