Data engineering with Python : work with massive datasets to design data models and automate data pipelines using Python
Material type: Text
Language: English
Publication details: Birmingham - Mumbai : Packt Publishing, 2020
Description: xii, 337 p.
ISBN:
- 9781839214189
- 183921418X
Call number:
- 005.133 CRI-D
| Item type | Current library | Home library | Collection | Call number | Materials specified | Status | Date due | Barcode |
|---|---|---|---|---|---|---|---|---|
| Books and Monographs | Central Library, NIT Jalandhar General Stacks | Central Library, NIT Jalandhar | Center for Artificial Intelligence | 005.133 CRI-D | | Available | | 102776 |
| Books and Monographs | Central Library, NIT Jalandhar General Stacks | Central Library, NIT Jalandhar | Center for Artificial Intelligence | 005.133 CRI-D | | Available | | 102777 |
Published by Packt Publishing Limited, Birmingham, UK. Title page: Birmingham—Mumbai.
Preface -- Section 1. Building data pipelines: extract, transform, and load -- Ch. 1. What is data engineering? -- Ch. 2. Building our data engineering infrastructure -- Ch. 3. Reading and writing files -- Ch. 4. Working with databases -- Ch. 5. Cleaning, transforming, and enriching data -- Ch. 6. Building a 311 data pipeline -- Section 2. Deploying data pipelines in production -- Ch. 7. Features of a production pipeline -- Ch. 8. Version control with the NiFi registry -- Ch. 9. Monitoring data pipelines -- Ch. 10. Building a production data pipeline -- Section 3. Beyond batch: real-time and streaming data -- Ch. 11. Building a custom NiFi processor -- Ch. 12. Streaming data with Apache NiFi -- Ch. 13. Streaming data with Apache Kafka -- Ch. 14. Data processing with Apache Spark -- Ch. 15. Real-time edge data with MiNiFi, Kafka, and Spark -- Appendix: building a NiFi cluster -- Index.
Practical guide to data engineering using Python and open-source Apache technologies for building, deploying, and managing data pipelines. Three sections: (1) Building ETL pipelines — reading/writing files, relational and NoSQL databases, data cleaning and transformation, Apache NiFi pipeline; (2) Production deployment — NiFi registry version control, monitoring, staging, validation, failure handling; (3) Real-time and streaming data — Apache NiFi streaming, Apache Kafka (Python producers and consumers), Apache Spark and PySpark processing, real-time edge data with MiNiFi, Kafka and Spark. Appendix: building a distributed NiFi cluster. Code files on GitHub. Suitable for data engineers, data analysts, ETL developers, and IT professionals transitioning to data-driven roles.
