Last modified on 01 Oct 2021.
ISO 8601 format
For duration:
P
(duration, always at the beginning of the duration),Y
(year),M
(month),W
(week),D
(day),T
(time designator, always precedes the time components),H
(hour),M
(minute),S
(second).- Example:
P3Y6M4DT12H30M5S
– a duration of three years, six months, four days, twelve hours, thirty minutes, and five seconds.
Converter,
from datetime import datetime, timedelta
from isodate import duration_isoformat # used only for datetime.timedelta
def get_isoformat(time):
"""
Convert pd.Timedelta, pd.Timestamp,
datetimme.datetime, datetime.time, datetime.date, datetime.timedelta
to isoformat
"""
if not isinstance(time, timedelta):
return time.isoformat()
else:
return duration_isoformat(time)
Timedelta
To Timedelta
,
# numpy.timedelta64(208206000000000,'ns') → Timedelta('2 days 09:50:06')
pd.Timedelta(time, unit='ns')
# DateOffsets ('14T') → Timedelta('0 days 00:14:00')
pd.to_timedelta('14T')
# Can't use 'T' as '1T'?
from pandas.tseries.frequencies import to_offset
pd.to_timedelta(to_offset('T'))
From Timedelta
,
# Timedelta('0 days 00:01:20') -> 80 (s)
# (SINGLE VALUE)
td.total_seconds() # float
# Timedelta('0 days 00:01:20') -> 80 (s) (FLOAT)
# (ONLY WORK with a series, not a single value)
series.astype('timedelta64[s]') # or 'ms'
# '1 minutes' -> '1T'
def timedelta_to_string(timedelta):
units = ['D', 'H', 'T', 'S', 'L', 'U', 'N']
time_format = ''
for i, c in enumerate(timedelta.components):
if c != 0: time_format += str(c) + units[i]
return time_format
## EXAMPLE
import pandas as pd
test = pd.Timedelta('1 minutes')
timedelta_to_string(test)
Timedelta('0 days 00:01:00')
'1T'
TimedeltaIndex
differences
There is no .diff
method with TimedeltaIndex
, you can use,
np.subtract(df[1:], df[:-1])
# convert to hour
np.subtract(df[1:], df[:-1]) / pd.Timedelta('1 hour')
Compare/Make arithmetic different frequency strings
We wanna compare 150S
(150 seconds) with 1T
(1 minutes).
import pandas as pd
pd.to_timedelta('150S') > pd.to_timedelta('1T')
pd.to_timedelta('120S') == pd.to_timedelta('1T')
pd.to_timedelta('120S') == pd.to_timedelta('2T')
True
False
True
Timestamps
from datetime import datetime
# to same timezone (UTC, +0)
df['timestamp'] = pd.to_datetime(df['timestamp'], utc=True, infer_datetime_format=True, cache=True)
# UTC+0 to UNIX timestamp (POSIX timestamp)
df['timestamp'] = df['timestamp'].apply(lambda x: int(datetime.timestamp(x)*1000)) # miliseconds
# unix timestamp to Timestamps
datetime.fromtimestamp(unix_ts//1000)
# UNIX float (ms) -> datetime64
df['timestamp'] = df['timestamp'].astype('datetime64[ms]')
# change `ms` with others, e.g. `ns` for nanosecond
# remove timezone
df['time'].dt.tz_localize(None)
Get info timestamps
def set_index(data, col_time):
"""
Make a copy of a time-series dataframe `df` and set the column-time be the
index of the dataframe.
In the case index has no name, we set it as `'index'`.
"""
df0 = data.copy()
if col_time != 'index': # col_time is not the index
df0 = df0.set_index(col_time)
else:
if df0.index.name is None:
df0.index.name = 'index'
return df0
def get_info_timestamps(df, col_date='index'):
# make sure timestamps are on index
df = set_index(df, col_date)
index_name = df.index.name
df = df.reset_index()
print('Time range: ', df[index_name].max() - df[index_name].min())
print('Number of different time steps: ', df[index_name].diff().value_counts().count())
print('Max time step: ', df[index_name].diff().max())
print('Min time step: ', df[index_name].diff().min())
print('The most popular time step: ', df[index_name].diff().value_counts().index[0])
print('timestamps are monotonic increasing? ', df[index_name].is_monotonic)
print('Are there duplicate timestamps? ', df[index_name].duplicated().any())
print('How many unique duplicates? ', df[index_name].duplicated().sum(), ' (in total ',df.shape[0], ')')
print('How many repeated duplicates? ', df[index_name].duplicated(keep=False).sum(), ' (in total ',df.shape[0], ')')
Check timestamps are well sorted?
# CHECK
df.date.is_monotonic # monotonic increasing?
df.date.is_monotonic_decreasing # decreasing?
# if using groupby
def check_monotonic(group):
return group.is_monotonic
df.groupby('label').agg({'timestamp': [check_monotonic] })
# ARRANGE THEM
df.sort_values(by='date', inplace=True)
Detect time series frequency
Find the different time steps in a datetime columns,
# count the number of elements for each time steps
df.date.diff().value_counts()
# count number of different time steps
df.date.diff().value_counts().count()
# take the index of the largest
df.date.diff().value_counts().index[0]
# take the index of the smallest
df.date.diff().value_counts().index[-1]
00:01:00 11
00:03:00 2
00:02:00 1
00:04:00 1
Name: date, dtype: int64
4
Timedelta('0 days 00:01:00')
Timedelta('0 days 00:04:00')
One can couple with function timedelta_to_string
in the previous section to find out the most-appeared time steps to feed into df.resample()
’s rule
.
List of resampling rules
Official ref here — search “DateOffsets” to jump to the table.
B business day frequency
C custom business day frequency (experimental)`
D calendar day frequency
W weekly frequency
M month end frequency
SM semi-month end frequency (15th and end of month)
BM business month end frequency
CBM custom business month end frequency
MS month start frequency
SMS semi-month start frequency (1st and 15th)
BMS business month start frequency
CBMS custom business month start frequency
Q quarter end frequency
BQ business quarter endfrequency
QS quarter start frequency
BQS business quarter start frequency
A year end frequency
BA, BY business year end frequency
AS, YS year start frequency
BAS, BYS business year start frequency
BH business hour frequency
H hourly frequency
T, min minutely frequency
S secondly frequency
L, ms milliseconds
U, us microseconds
N nanoseconds