Image by Author | Midjourney
Time-based data gets complicated once different time zones are involved, and interpreting timestamps can be hard because of these differences. This guide will help you manage time zones and timestamps with the Pandas library in Python.
Preparation
In this tutorial, we'll use the Pandas package. We can install it using the following command.
pip install pandas
Now, we'll explore how to work with time-based data in Pandas through practical examples.
Dealing with Time Zones and Timestamps with Pandas
Time data is a distinct kind of data that provides a time-specific reference for events. The most precise form is the timestamp, which records a moment in detail, from the year down to fractions of a second.
Let's start by creating a sample dataset.
import pandas as pd
# Sample transactions; the timestamps start out as plain strings
data = {
    'transaction_id': [1, 2, 3],
    'timestamp': ['2023-06-15 12:00:05', '2024-04-15 15:20:02', '2024-06-15 21:17:43'],
    'quantity': [100, 200, 150]
}
df = pd.DataFrame(data)
# Parse the strings into datetime values
df['timestamp'] = pd.to_datetime(df['timestamp'])
The ‘timestamp’ column in the example above holds time data with second-level precision. Because the values start out as strings, we use the pd.to_datetime function to convert the column to a datetime format.
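As a minimal sketch of what the conversion changes, the snippet below parses a small Series and checks the result. The Series name raw and the deliberately invalid entry are illustrative assumptions, not part of the dataset above; errors='coerce' is an optional pd.to_datetime argument that turns unparseable values into NaT instead of raising.

```python
import pandas as pd

# Hypothetical input: two valid timestamps and one bad entry
raw = pd.Series(['2023-06-15 12:00:05', '2024-04-15 15:20:02', 'not a date'])

# errors='coerce' replaces values that cannot be parsed with NaT
parsed = pd.to_datetime(raw, errors='coerce')

# The result is datetime-typed, so the .dt accessor becomes available
print(parsed.dtype)
print(parsed.isna().sum())                  # number of entries that failed to parse
print(parsed.dropna().dt.year.tolist())     # years of the valid entries
```

Checking for NaT right after parsing is a cheap way to catch malformed timestamps before any time zone work begins.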
Afterward, we can make the datetime data timezone-aware. For example, we can localize the data to Coordinated Universal Time (UTC), which attaches the time zone without changing the clock values.
df['timestamp_utc'] = df['timestamp'].dt.tz_localize('UTC')
print(df)
Output>>>
   transaction_id           timestamp  quantity             timestamp_utc
0               1 2023-06-15 12:00:05       100 2023-06-15 12:00:05+00:00
1               2 2024-04-15 15:20:02       200 2024-04-15 15:20:02+00:00
2               3 2024-06-15 21:17:43       150 2024-06-15 21:17:43+00:00
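To make the difference between naive and timezone-aware data concrete, here is a small sketch. The variable names naive and aware are illustrative. It shows that tz_localize only labels the existing clock values, and that pandas raises a TypeError if we try to localize data that already has a time zone.

```python
import pandas as pd

naive = pd.Series(pd.to_datetime(['2023-06-15 12:00:05', '2024-04-15 15:20:02']))
print(naive.dt.tz)   # no time zone attached yet

aware = naive.dt.tz_localize('UTC')
print(aware.dt.tz)   # now labelled as UTC

# Dropping the time zone again gives back the same clock values,
# so localizing did not shift anything
same_clock = (aware.dt.tz_localize(None) == naive).all()

# Localizing already-aware data is an error; tz_convert is the right tool there
try:
    aware.dt.tz_localize('Asia/Tokyo')
    already_aware_error = False
except TypeError:
    already_aware_error = True
```

In short, tz_localize answers "which zone were these clock readings taken in?", while changing zones afterward is a separate step.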
The ‘timestamp_utc’ values now carry more information, including the time zone, shown as the +00:00 offset. We can convert the existing time zone to another one. For example, I took the UTC column and converted it to the Japan time zone.
df['timestamp_japan'] = df['timestamp_utc'].dt.tz_convert('Asia/Tokyo')
print(df)
Output>>>
   transaction_id           timestamp  quantity             timestamp_utc
0               1 2023-06-15 12:00:05       100 2023-06-15 12:00:05+00:00
1               2 2024-04-15 15:20:02       200 2024-04-15 15:20:02+00:00
2               3 2024-06-15 21:17:43       150 2024-06-15 21:17:43+00:00
            timestamp_japan
0 2023-06-15 21:00:05+09:00
1 2024-04-16 00:20:02+09:00
2 2024-06-16 06:17:43+09:00
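The output shows that the Japan column is nine hours ahead on the clock, yet it describes the same moments. A minimal sketch of that idea, using one of the sample timestamps and illustrative variable names:

```python
import pandas as pd

utc = pd.Series(pd.to_datetime(['2024-04-15 15:20:02'])).dt.tz_localize('UTC')
tokyo = utc.dt.tz_convert('Asia/Tokyo')

# tz_convert keeps the underlying instant, so the two columns compare equal
same_instant = (utc == tokyo).all()

# Only the local calendar fields differ: the Tokyo value has rolled over to the next day
utc_day = utc.dt.day.iloc[0]
tokyo_day = tokyo.dt.day.iloc[0]
print(same_instant, utc_day, tokyo_day)
```

This matters for anything derived from local fields, such as grouping by day or hour: the grouping depends on which zone the column is expressed in.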
With this new column, we can filter the data according to a specific time zone. For example, we can filter the data using Japan time.
start_time_japan = pd.Timestamp('2024-06-15 06:00:00', tz='Asia/Tokyo')
end_time_japan = pd.Timestamp('2024-06-16 07:59:59', tz='Asia/Tokyo')
filtered_df = df[(df['timestamp_japan'] >= start_time_japan) & (df['timestamp_japan'] <= end_time_japan)]
print(filtered_df)
Output>>>
   transaction_id           timestamp  quantity             timestamp_utc
2               3 2024-06-15 21:17:43       150 2024-06-15 21:17:43+00:00
            timestamp_japan
2 2024-06-16 06:17:43+09:00
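As an alternative sketch of the same filter, Series.between expresses the inclusive range in one call. The snippet rebuilds a reduced version of the sample data so it runs on its own; the names mask_japan and mask_utc are illustrative. It also checks that filtering the UTC column with the same bounds converted to UTC selects the same rows.

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_id': [1, 2, 3],
    'timestamp_utc': pd.to_datetime([
        '2023-06-15 12:00:05', '2024-04-15 15:20:02', '2024-06-15 21:17:43',
    ]).tz_localize('UTC'),
})
df['timestamp_japan'] = df['timestamp_utc'].dt.tz_convert('Asia/Tokyo')

start = pd.Timestamp('2024-06-15 06:00:00', tz='Asia/Tokyo')
end = pd.Timestamp('2024-06-16 07:59:59', tz='Asia/Tokyo')

# between() is inclusive on both ends by default, like the >= and <= version
mask_japan = df['timestamp_japan'].between(start, end)

# Same window, expressed in UTC and applied to the UTC column
mask_utc = df['timestamp_utc'].between(start.tz_convert('UTC'), end.tz_convert('UTC'))

print(df.loc[mask_japan, 'transaction_id'].tolist())
```

Note that the bounds themselves need a time zone: pandas will not order timezone-aware values against naive ones.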
Working with time-series data also allows us to perform resampling. Let's look at an example that resamples the data hourly and counts the entries in each column of our dataset.
# Count non-null entries per hourly bin ('h'; the older 'H' alias is deprecated since pandas 2.2)
resampled_df = df.set_index('timestamp_japan').resample('h').count()
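Because the three sample rows are spread over a year, an hourly resample of them produces thousands of mostly empty bins. The sketch below uses a hypothetical set of four transactions within a few hours (not the data above) so the result is easy to read. It passes pd.Timedelta(hours=1) as the rule, which is one way to avoid depending on the spelling of the hourly alias.

```python
import pandas as pd

# Hypothetical transactions clustered within three hours, in Japan time
df = pd.DataFrame({
    'transaction_id': [1, 2, 3, 4],
    'quantity': [100, 200, 150, 50],
    'timestamp_japan': pd.to_datetime([
        '2024-06-16 06:17:43', '2024-06-16 06:45:10',
        '2024-06-16 08:05:00', '2024-06-16 08:59:59',
    ]).tz_localize('Asia/Tokyo'),
})

# One row per hour: number of transactions and total quantity in that hour
hourly = (
    df.set_index('timestamp_japan')
      .resample(pd.Timedelta(hours=1))['quantity']
      .agg(['count', 'sum'])
)
print(hourly)
```

The 07:00 bin appears with a count of zero because resampling creates every bin between the first and last timestamp, and the index stays in the Asia/Tokyo time zone.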
Use Pandas' time zone and timestamp features to get the most out of your time-based data.
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.