Data is Nutritious

Data Engineer's Memo

DataProcessing

parquet-tools is easy and useful

TL;DR: I installed parquet-tools and tried it out. It is easy to install and useful for fetching Parquet files stored on Amazon S3. How to install parquet-tools: The original Apache parquet-tools is not easy to use because it has to be built with Java. But it's …

Speeding up URL forward-matching Query by splitting schema

Introduction: In data processing, we often write queries with a condition on a URL. For example, using Google Analytics URL parameters, you can measure where your site's users come from (search engine, listing ad, display ad, etc.). Forward-mat…