parquet-tools is easy and useful
TL;DR
- I installed parquet-tools and tried it out.
- It's easy to install and handy for inspecting Parquet files, including ones stored on Amazon S3.
How to install parquet-tools
The original Apache parquet-tools is not easy to set up because it has to be built with Java.
The parquet-tools package on PyPI, a separate Python-based tool, is much simpler to install:
```
$ pip install parquet-tools
```
How to use it
Show the contents of a Parquet file:
```
$ parquet-tools show /path/to/parquet
+-------+-------+---------+
| one   | two   | three   |
|-------+-------+---------|
| -1    | foo   | True    |
| nan   | bar   | False   |
| 2.5   | baz   | True    |
+-------+-------+---------+
```
Show the file metadata and schema:
```
$ parquet-tools inspect /path/to/parquet

############ file meta data ############
created_by: parquet-cpp version 1.5.1-SNAPSHOT
num_columns: 3
num_rows: 3
num_row_groups: 1
format_version: 1.0
serialized_size: 2226

############ Columns ############
one
two
three

############ Column(one) ############
name: one
path: one
max_definition_level: 1
max_repetition_level: 0
physical_type: DOUBLE
logical_type: None
converted_type (legacy): NONE

############ Column(two) ############
name: two
path: two
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8

############ Column(three) ############
name: three
path: three
max_definition_level: 1
max_repetition_level: 0
physical_type: BOOLEAN
logical_type: None
converted_type (legacy): NONE
```
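The values `inspect` reports, such as num_rows and the per-column schema, live in a metadata footer at the end of the file. As a stdlib-only sketch of that layout (not how parquet-tools itself reads files), the functions below check the `PAR1` magic bytes and slice out the raw footer, which is still Thrift-encoded; the names `footer_length` and `raw_footer` are hypothetical. Decoding the footer into the fields shown above needs a Thrift decoder, which libraries such as pyarrow take care of.

```python
# Sketch: locate the raw footer (FileMetaData) of a Parquet file.
# Layout: "PAR1" | column data ... | footer | 4-byte little-endian footer length | "PAR1"
# Function names are hypothetical; the footer bytes are NOT decoded here.
import struct

MAGIC = b"PAR1"


def footer_length(data: bytes) -> int:
    """Return the footer size stored just before the trailing magic bytes."""
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet file: missing PAR1 magic bytes")
    # Little-endian 32-bit integer right before the trailing "PAR1".
    (length,) = struct.unpack("<i", data[-8:-4])
    if length < 0 or length > len(data) - 12:
        raise ValueError("corrupt footer length")
    return length


def raw_footer(data: bytes) -> bytes:
    """Return the Thrift-encoded FileMetaData bytes, still undecoded."""
    length = footer_length(data)
    return data[-8 - length:-8]
```

For a file on disk you could pass `open(path, "rb").read()` to `raw_footer`; reading the whole file is fine for small examples, while a real tool would seek to the end of the file instead.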