Vinoo Ganesh

Speaker, Technologist, and Startup Advisor

O'Reilly Superstream Series: Data Pipelines

O'Reilly Superstream

Data pipelines are the foundation for success in data analytics, so understanding how they work is of the utmost importance. Join us for four hours of expert-led sessions that will give you insight into how data is moved, processed, and transformed to support analytics and reporting needs. You’ll also learn how to address common challenges like monitoring and managing broken pipelines, explore considerations for choosing and connecting open source frameworks, commercial products, and homegrown solutions, and more.

Designing Data Pipelines — with Interactivity

O'Reilly Online Training

The data pipeline has become a fundamental component of data science, data analysis, and data engineering workflows. Pipelines serve as the glue that links together the various components of the data cleansing, data validation, and data transformation process. However, despite its importance to the data ecosystem, constructing the optimal data pipeline is generally an afterthought - if it’s considered at all. This makes any change to the central pipeline highly error-prone and cumbersome.

Apache Parquet Website

Rebuilding the Parquet Website

The Parquet website was a bit dated - especially given its heavy usage. I took the opportunity to rebuild the website using Hugo. Check it out and please let me know if you have any feedback! Code https://github.com/apache/parquet-site/commit/3563721676b364b767058a953f2bcc3e2c0c4b09 Link http://parquet.apache.org

Ask a CISO: S3 Bucket Permissions and IAM Audits

Horangi

Data is the most valuable resource in the world and more prized than oil, The Economist declared in 2017. Today, at least 97% of organizations use data to power their business opportunities, and we are accumulating data at a rate never before seen in history. The big question then is how do we secure and ensure that we can make optimal use of all this data? Link https://www.horangi.com/blog/s3-buckets-permissions-and-iam-audits
