pandas
Dates and Times
Basics
Visualization is one of the crucial foundations of practicing data science, and analyzing information on a timeline can be vital to understanding its behavior. However, our understanding and structure of time is difficult to translate into raw computing — we can recognize that the phrase "8/1/2022" is the day that follows "7/31/2022", but without help, a computer would just see those as two different strings.
To help, pandas
introduced the to_datetime
function to convert strings (like the ones above) to Timestamp
objects, which have many attributes that help translate our human perception of dates and times for the computer’s sake.
Python already has a |
to_datetime
Let’s start with an example DataFrame
that contains event types, attendance numbers, and the dates on which the events occur. We’ll check the data types of our columns to see what pandas
starts out with.
import pandas as pd
list_1 = ['7/1/2021', '7/4/2021', '8/20/2021', '9/2/2021', '4/1/2050']
list_2 = ['Presentation', 'Class', 'Presentation', 'Event', 'Class']
list_3 = [25, 0, 50, 48, 1000000]
myDF = pd.DataFrame(zip(list_1, list_2, list_3), columns=['date', 'type', 'attendance'])
print(myDF.dtypes)
date object type object attendance int64
As we can see from the output, date
is a general object
— this doesn’t help us, as we need a Timestamp
object and its host of useful functions and attributes. The simplest way to convert this only requires our to_datetime
function, the column we want to change, and the format our dates come in:
myDF['date'] = pd.to_datetime(myDF['date'], format='%m/%d/%Y')
print(myDF.dtypes)
date datetime64[ns] type object attendance int64
Hold on — what’s with the datetime64[ns]
object? In spite of this output, if we test the type of our individual dates, we find that they are indeed Timestamp
objects. The implication is that all the Timestamp
attributes are usable when working with date
.
type(myDF.date[0])
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
format
Argument
The date column is now saved with the data type that we want. If you recall, we had an odd-looking format
argument. The idea is that we are telling pandas
how to parse the date: month first, then day, then year, separated by forward slashes. If the dates had dashes or some other separator, we could indicate this in format
to ensure pandas
is reading it correctly.
format
does pay attention to capitalization. For example, %y
means the date excludes century, while %Y
includes century (think 13 compared to 2013). There are helpful tables to explain the different formats, and we’ll include the main ones here:
-
Month:
-
%b
— abbreviated month, "Oct";%B
— full month name, "October" -
%m
— month number with leading zero, "04";%-m
— month number, no leading zero, "4"
-
-
Day:
-
%a
— abbreviated weekday, "Mon";%A
— full weekday name, "Monday" -
%d
— day number with leading zero, "07";%-d
— day number, no leading zero, "7"
-
-
Year:
-
%y
— last two numbers of year, "14";%Y
— full year, "1914"
-
|
Dates/Times Objects, Methods, and Attributes
Converting data using to_datetime
is only useful if we can extract further information using Timestamp
and other pandas
packages.
Time Differences: Timedelta
and DateOffset
The Timedelta
and DateOffset
objects are simple: they are time/date objects different from date times that can be supplemented a difference between dates (4 days from now, 4 months ago, etc) to find the desired future/past date. The key difference is that Timedelta
works with absolute time (1 day = 24 hours), while DateOffset
works with calendar time, accounting for things such as daylight savings time. DateOffset
also lacks an array class corresponding to timedelta64[ns]
.
Timedelta
For Timedelta
, you can use either keyword arguments or a pair including (value, unit)
to indicate the duration of time.
import pandas as pd
day_example = pd.Timedelta(days=6)
week_example = pd.Timedelta(weeks=14)
print(day_example)
print(week_example)
6 days 00:00:00 98 days 00:00:00
As we can see here, Timedelta
works in days, hours, minutes, and seconds, where the equivalency of 1 week = 7 days applies before output. Since months and years have differing numbers of days, you cannot use them when generating Timedeltas
.
DateOffset
Imported from pandas.tseries.offsets
, DateOffset
is another method allowing for the changing of times. Since this category uses calendar-based logic, we can supplement years, months, or weeks as keywords.
A plural parameter (such as |
print(pd.Timestamp("2019-12-25") + pd.DateOffset(day=1))
print(pd.Timestamp("2019-12-25") + pd.DateOffset(days=1))
2019-12-01 00:00:00 2019-12-26 00:00:00
Knowing the different uses of the two data types is important.
-
DateOffset
is useful due to its broader acceptance of time parameters and use of calendar logic. We often want to know the dates of things weeks, months, and years in advance, and it’s inconvenient to translate those to number of days for use withTimedelta
. Additionally, 4 months from January 15th is informally understood to be May 15th, andDateOffset
understands this whereTimedelta
does not. -
Timedelta
can be interpreted byDataFrames
andSeries
, whileDateOffset
cannot and is cast as a simpleobject
. Additionally, anyDateOffset
time measurements equal to or shorter than an hour function likeTimedelta
.
It’s important to think of these two objects as cooperating tools to your success rather than mutually exclusive options.
dt
operator
You’ll notice that Timestamp
attributes include basic elements of date information — month
, day
, year
, second
, day_of_week
, and so on. Let’s try and creating a month
column from date
in myDF
:
myDF['month'] = myDF['date'].month
AttributeError: 'Series' object has no attribute 'month'
As the error message says, we can’t get month
from the date
column because it’s a Series
, not a Timestamp
. Instead of looping through each value or using apply
, we have the dt
accessor, allowing us to use Timestamp
attributes and functions column-wide. Now we can create our month
column:
myDF['month'] = myDF['date'].dt.month
print(myDF)
date type attendance month 0 2021-07-01 Presentation 25 7 1 2021-07-04 Class 0 7 2 2021-08-20 Presentation 50 8 3 2021-09-02 Event 48 9 4 2050-04-01 Class 1000000 4
Examples
Create month
and year
columns from our date
column.
Click to see solution
myDF['month'] = myDF['date'].dt.month
myDF['year'] = myDF['date'].dt.year
print(myDF)
date type attendance month year 0 2021-07-01 Presentation 25 7 2021 1 2021-07-04 Class 0 7 2021 2 2021-08-20 Presentation 50 8 2021 3 2021-09-02 Event 48 9 2021 4 2050-04-01 Class 1000000 4 2050
Create the weekday
column from date
.
Click to see solution
myDF['weekday'] = myDF['date'].dt.day_name()
print(myDF)
date type attendance weekday 0 2021-07-01 Presentation 25 Thursday 1 2021-07-04 Class 0 Sunday 2 2021-08-20 Presentation 50 Friday 3 2021-09-02 Event 48 Thursday 4 2050-04-01 Class 1000000 Friday
Shift all the days in date
forward by one week, replacing the old dates in the process.
Click to see solution
myDF['date'] = myDF['date'] + pd.Timedelta("7 days")
print(myDF)
date type attendance 0 2021-07-08 Presentation 25 1 2021-07-11 Class 0 2 2021-08-27 Presentation 50 3 2021-09-09 Event 48 4 2050-04-08 Class 1000000
Suppose myDF.date
contains exclusively days from the first semester of an academic year, and each year ends on May 31st. Create the end_of_school
column using the date
column and DateOffset
, which contains the last day of school for that academic year. Then create days_until_school_is_over
, a column that contains the number of days between date
and end_of_school
.
Click to see solution
one_year_later = myDF['date'] + pd.offsets.DateOffset(years=1)
myDF['end_of_school'] = pd.to_datetime({'month': 5, 'day': 31, 'year':one_year_later.dt.year})
myDF['days_until_school_is_over'] = myDF['end_of_school'] - myDF['date']
print(myDF)
date type attendance end_of_school days_until_school_is_over 0 2021-07-01 Presentation 25 2022-05-31 334 days 1 2021-07-04 Class 0 2022-05-31 331 days 2 2021-08-20 Presentation 50 2022-05-31 284 days 3 2021-09-02 Event 48 2022-05-31 271 days 4 2050-11-01 Class 1000000 2051-05-31 211 days
Resources
As always, the information we include here is just a portion of all you can know about using dates and times in pandas
. They have a great, extensive user guide that includes DatetimeIndex
, indexing using dates, and much more.