Now that we can read fractional seconds, it would be interesting to see whether we can easily implement writing.
I have done some brief tests with write_xport(), and was able to produce reasonable results by removing the two round() calls from
df = df.with_columns((((nw.nth(col_indx).cast(nw.Float64))/convfac) + offset_secs).round() * mulfac)
and
df = df.with_columns((nw.nth(col_indx).cast(nw.Float64)/1e9).round() * mulfac)
What I am not clear on is when the vectorized functions are used versus convert_datetimelike_to_number().
This is a simple reproducer:
import pandas as pd
import pyreadstat
# One datetime column with microsecond precision, spanning a wide date range
df = pd.DataFrame.from_dict({"min": pd.to_datetime("1850-01-01 12:34:56.123456"), "max": pd.to_datetime("2100-12-31 23:45:56.123456")}, columns=["testdt"], orient="index")
print(df)
# Write to XPORT, then read the file back to compare against the original values
pyreadstat.write_xport(df, "./minmax.xpt")
sasdf, meta = pyreadstat.read_xport("./minmax.xpt", output_format="polars")
print(sasdf)
With the current code, the fractional seconds in sasdf are lost: values come back as whole seconds. With the two round() calls removed, microseconds are preserved.
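To illustrate why the round() calls drop the fraction, here is a minimal standalone sketch of the arithmetic, using only the standard library. It is not the pyreadstat implementation: the helper names unix_ns and to_sas_seconds and the keep_fraction flag are hypothetical, and it assumes the writer converts nanoseconds since 1970 to seconds since the SAS epoch (1960-01-01), as the two expressions above suggest.

```python
from datetime import datetime, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SAS_EPOCH = datetime(1960, 1, 1, tzinfo=timezone.utc)
# Seconds from the SAS epoch to the Unix epoch (3653 days).
OFFSET_SECS = (UNIX_EPOCH - SAS_EPOCH).total_seconds()


def unix_ns(ts: datetime) -> int:
    """Integer nanoseconds since the Unix epoch (hypothetical helper)."""
    delta = ts - UNIX_EPOCH
    return (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000


def to_sas_seconds(ns: int, keep_fraction: bool) -> float:
    """Hypothetical helper: nanoseconds since 1970 -> seconds since 1960."""
    secs = ns / 1e9 + OFFSET_SECS
    # Rounding here is what discards the sub-second part.
    return secs if keep_fraction else float(round(secs))


ts = datetime(2100, 12, 31, 23, 45, 56, 123456, tzinfo=timezone.utc)
rounded = to_sas_seconds(unix_ns(ts), keep_fraction=False)
exact = to_sas_seconds(unix_ns(ts), keep_fraction=True)
print(rounded, exact)
```

The two printed values differ only in the fractional part. At this magnitude (about 4.4e9 seconds) a float64 resolves steps of roughly one microsecond, so the fraction is recovered approximately rather than exactly.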
For reference, the two round() calls are at lines 91 and 141 of pyreadstat/pyreadstat/_readstat_writer.pyx at commit e23b1d8.