Data is Nutritious

Data Engineer's Memo

Entries for the one year starting 2019-01-01

Speeding up URL forward-matching queries by splitting the schema

Introduction
In a data processing context, we often run queries with conditions on URLs. For example, using Google Analytics URL parameters, you can measure where your site's users come from (search engine, listing ad, display ad, etc.). Forward-mat…
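To illustrate what "splitting the schema" can mean for URL data, here is a minimal sketch, assuming one hypothetical layout: each URL is stored as separate columns (scheme, host, path) plus the Google Analytics `utm_source` / `utm_medium` / `utm_campaign` parameters pulled out of the query string. The column names and the helpers `split_url` and `matches_prefix` are illustrative assumptions, not necessarily the schema the post itself uses.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical split schema: the Google Analytics utm_* parameters
# we choose to promote to their own columns.
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign")


def split_url(url: str) -> dict:
    """Split one URL into separately queryable columns."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    row = {
        "scheme": parts.scheme,
        "host": parts.netloc,  # netloc is host[:port]
        "path": parts.path,
    }
    for key in UTM_KEYS:
        # parse_qs returns a list per key; keep the first value, or None if absent.
        values = params.get(key)
        row[key] = values[0] if values else None
    return row


def matches_prefix(row: dict, host: str, path_prefix: str) -> bool:
    """Forward match on split columns: exact match on host, prefix match on path only."""
    return row["host"] == host and row["path"].startswith(path_prefix)


if __name__ == "__main__":
    row = split_url("https://example.com/items/42?utm_source=google&utm_medium=cpc")
    print(row)
    print(matches_prefix(row, "example.com", "/items/"))
```

The idea behind this layout is that a condition like `url LIKE 'https://example.com/items/%'` becomes an equality on `host` plus a shorter prefix test on `path`. A query engine can often prune or filter on an equality column more cheaply than it can scan full URL strings, though the actual gain depends on the engine and storage format.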

AWS Glue's GetPartition API is slow for tables with many partitions

Introduction
AWS Glue is a very useful Hive Metastore service for people using Hive on EMR, Spark on EMR, or Presto on Athena. However, I found that fetching partitions is very slow, especially for tables with many partitions. Technically, users need to cal…
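To show what listing every partition of a large table involves, here is a minimal sketch, assuming boto3 and Glue's `GetPartitions` call (`client.get_partitions`). Results come back in pages that must be followed via `NextToken`, and the `Segment` parameter (`SegmentNumber`, `TotalSegments`, documented as 1 to 10 segments) lets the scan be split so segments are fetched concurrently. The helper names `fetch_all_partitions` and `_fetch_segment` are hypothetical, and this is not necessarily the approach the post goes on to describe.

```python
from concurrent.futures import ThreadPoolExecutor


def _fetch_segment(client, database, table, segment, total_segments):
    """Follow NextToken pagination for one segment of the table's partitions."""
    partitions = []
    kwargs = {
        "DatabaseName": database,
        "TableName": table,
        "Segment": {"SegmentNumber": segment, "TotalSegments": total_segments},
    }
    while True:
        resp = client.get_partitions(**kwargs)
        partitions.extend(resp.get("Partitions", []))
        token = resp.get("NextToken")
        if not token:
            return partitions
        kwargs["NextToken"] = token  # request the next page of this segment


def fetch_all_partitions(client, database, table, total_segments=4):
    """Fetch every partition of a table, scanning segments in parallel threads."""
    # Glue documents TotalSegments as an integer from 1 to 10.
    if not 1 <= total_segments <= 10:
        raise ValueError("total_segments must be between 1 and 10")
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(_fetch_segment, client, database, table, seg, total_segments)
            for seg in range(total_segments)
        ]
        result = []
        for future in futures:
            result.extend(future.result())  # re-raises any error from a worker
    return result
```

Usage would look like `fetch_all_partitions(boto3.client("glue"), "my_db", "my_table")`, where the database and table names are placeholders. A single boto3 client is shared across threads here because boto3 documents clients (unlike sessions) as thread-safe; whether segmenting actually helps depends on how Glue serves each segment for a given table.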