Some things to keep in mind when working with a pandas DatetimeIndex
and persisting the data as Parquet to back a table in BigQuery:
- timestamps must be physical Parquet columns (index metadata is ignored)
- store UTC instants (convert to UTC, then drop timezone metadata)
- BigQuery TIMESTAMP only supports microsecond precision (pandas timestamps are nanosecond by default, so they need to be coerced)
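For context, a minimal setup sketch (the sample frame, the index name ts, and the in-memory BytesIO buffer are illustrative assumptions, not from the original note):

import io

import pandas as pd

# Hypothetical frame with a tz-aware DatetimeIndex (nanosecond precision by default).
df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=pd.date_range("2024-01-01", periods=3, freq="h", tz="Europe/Berlin", name="ts"),
)

buffer = io.BytesIO()  # in-memory target; a file path works just as well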
df.index = df.index.tz_convert(None)  # convert to UTC, then drop the timezone (naive UTC instants)
df = df.reset_index()  # promote the index to a regular column so it becomes a physical Parquet column
df.to_parquet(
    buffer,
    engine="pyarrow",
    compression="snappy",
    index=False,  # do not rely on pandas index metadata
    coerce_timestamps="us",  # BigQuery TIMESTAMP is microsecond precision
    allow_truncated_timestamps=True,  # drop sub-microsecond digits instead of raising
    use_deprecated_int96_timestamps=False,  # write int64 timestamps, not legacy INT96
)
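To sanity-check the result before loading it, the Parquet schema can be read back (a quick sketch using pyarrow; not part of the original steps):

import pyarrow.parquet as pq

buffer.seek(0)
print(pq.read_schema(buffer))  # the ts column should appear as timestamp[us] with no timezone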
Without the UTC conversion and the explicit microsecond coercion, the timestamps ended up in BigQuery with the wrong column type or with incorrect values.
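For completeness, loading the buffer into BigQuery might look roughly like this (a sketch assuming the google-cloud-bigquery client; the destination table ID is a placeholder):

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.PARQUET)

buffer.seek(0)
load_job = client.load_table_from_file(
    buffer,
    "my-project.my_dataset.my_table",  # placeholder destination table
    job_config=job_config,
)
load_job.result()  # wait for the load job to complete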