Date and Time in Pandas

8994-pandas.jpg

When we say Pandas, the first thing that probably comes to mind are the lovable, bamboo-eating, black-and-white bears. And you're not alone! But, in the world of IT, there's a fantastic tool called Pandas that helps you work with data in an organized way.

When working with input data in Extract, Transform, Load (ETL) or as part of information analysis processes, it's relatively common to manage data records on the basis of time and date. While simple in theory, there are many nuances to processing data under the lense of time and date.

From working with numerous ranges, calculating a certain number of parameters, etcetera, there's a lot to consider. In reality, you have to be very careful when working with time and date, which is where the popular Pandas library makes its grand entrance into our article. This great tool offers lots of functions to work with time and date, enabling you to create advanced and effective scripts. So, let's get this Pandas party started and dive in!

Why is Pandas library important for ETL and Data Analysis?

Pandas is a Python library that allows you to work and organize data using rows and columns, sometimes called "tabular data." Previously, you may have used a spreadsheet like Excel for the purpose of managing tabular data, but now, not so much. The beauty of using the Pandas library is that you can use it both for similar tasks that you would use a spreadsheet for like simple data operations and also complex data science tasks like machine learning.

Pandas is a popular data science tool for Python. It can be used to make business decisions across industries, helping people make decisions in areas such as marketing, sales, product development, finance, medicine, and so on.

It is worth noting that Pandas does not have multiprocessing support and can be slow when working with significantly large datasets. Sometimes, you have to spend a lot of time while Pandas reads a multitude of files, or aggregates and evaluates functions.

What are Date and Time in computers?

There are many variants of how to represent date and time in computers. For instance, we can measure time by Epoch. An epoch is a date and time from which a computer measures system time. Most computer systems determine time as a number representing the seconds removed from a particular arbitrary date and time. 

Unix and POSIX measure time as the number of seconds that have passed since Thursday 1 January 1970 00:00:00 UT, a point in time known as the Unix epoch.

The NT time epoch on Windows NT and later, refers to the Windows NT system time in 10−7s intervals from 00:00:00 1 January 1601.

Also, ISO 8601 is an international standard covering the worldwide exchange and communication (information interchange) of date and time-related data. The standard uses the Gregorian calendar. One of the general principles describes that date and time values are ordered from the largest to the smallest unit of time: year, month (or week), day, hour, minute, second, and a fraction of second. The lexicographical order of the representation thus corresponds to chronological order. ISO 8601 uses the 24-hour clock system for working with time.

In Python datetime modules describe working with basic date and time types. For instance, we can write this clear and effective script:

>>> import time
>>> from datetime import date
>>> today = date.today()
>>> today

datetime.date(2021, 24, 9)

Code language: JavaScript (javascript)

Very often, in ETL and data analysis, we solve the following types of problems:

  • the task of counting (for example, the number of days between certain dates);
  • the task of determining a range (for example, whether a certain date is included in a given range);
  • search for a task (for example, find all weekends in a given range).

Working with date and time in Pandas

Pandas API for Date and Time

The Pandas library provides a very rich and easy-to-use API for date and time. The API functions for working with time intervals and dates can be divided into the following categories.

Category of working days and months:

  • DateOffset
  • BusinessDay
  • BusinessHour
  • CustomBusinessDay
  • CustomBusinessHour
  • MonthEnd
  • MonthBegin
  • BusinessMonthEnd
  • BusinessMonthBegin
  • CustomBusinessMonthEnd
  • CustomBusinessMonthBegin
  • SemiMonthEnd
  • SemiMonthBegin

Category of weeks and quarters, years and fiscal years (52-53 weeks): 

  • Week
  • WeekOfMonth
  • LastWeekOfMonth
  • BQuarterEnd
  • BQuarterBegin
  • QuarterEnd
  • QuarterBegin
  • BYearEnd
  • BYearBegin
  • YearEnd
  • YearBegin
  • FY5253
  • FY5253Quarter

Category of days, ticks, hours, minutes, etc.

  • Easter
  • Tick
  • Day
  • Hour
  • Minute
  • Second
  • Milli
  • Micro
  • Nano

All these classes of functions help you efficiently, and in a matter of a few lines of code in Python, construct very complex queries to a specific date and time, as well as to their ranges.

Web Solutions

Creating a datetime object

Datetime is very useful for creating Python objects to work with date and time.

import pandas as pd
 
df = pd.DataFrame({'year': [2019, 2021],'month': [1, 12],'day': [1, 5]})
 
pd.to_datetime(df)
 
0   2019-01-01
1   2021-12-05
dtype: datetime64[ns]

Code language: JavaScript (javascript)

Or you can use another comfort form of to_dataframe():

pd.to_datetime(1491192805, unit='s')

Timestamp('2017-04-03 04:13:25')

Code language: JavaScript (javascript)

Converting string to datetime and vice versa

Pandas works very closely with another Python library datetime.

The datetime library includes two methods for converting strings to datetime objects and vice versa. To convert a string to a datetime object, use the strptime() function.

from datetime import date
 
d = date.today()
d.strftime("%y%m%d")

210921
Code language: JavaScript (javascript)

For the reverse conversion, you can use the strftime() method.

from datetime import datetime
 
str = '09/21/21 12:00:00'
datetime.strptime(str, '%m/%d/%y %H:%M:%S')
 
datetime.datetime(2021, 9, 21, 12, 0)

Code language: JavaScript (javascript)

Date ranges in Pandas

When working with dates, you can use the Python datetime library as illustrated in the previous example. On the other hand, for cases of date ranges and time recalculations, you can use the built-in functions of Pandas.

The  Pandas library is easy to use to create date ranges and help avoid loops when using a standard library. Here's an example of creating a 31-day array of dates starting on January 2-nd, 2021.

import pandas as pd
from datetime import datetime
 
pd.date_range(start=datetime(year=2021, month=1, day=2), periods=31, freq='D')

Code language: JavaScript (javascript)

Then, you will have 31 days in DatetimeIndex, and it will end on February 1-st 2021.

DatetimeIndex(['2021-01-02', '2021-01-03', '2021-01-04', '2021-01-05',
               '2021-01-06', '2021-01-07', '2021-01-08', '2021-01-09',
               '2021-01-10', '2021-01-11', '2021-01-12', '2021-01-13',
               '2021-01-14', '2021-01-15', '2021-01-16', '2021-01-17',
               '2021-01-18', '2021-01-19', '2021-01-20', '2021-01-21',
               '2021-01-22', '2021-01-23', '2021-01-24', '2021-01-25',
               '2021-01-26', '2021-01-27', '2021-01-28', '2021-01-29',
               '2021-01-30', '2021-01-31', '2021-02-01'],
              dtype='datetime64[ns]', freq='D')

Code language: JavaScript (javascript)

Between date and between time in Pandas

Between and between_time can be effectively used to determine which records are included in a specific list between two dates or two times. This allows you to select only records from DataFrame in Pandas that belong to this period, a very convenient and frequently used function. For example, for time, here's the following code example:

import pandas as pd
 
i = pd.date_range('2021-04-09', periods=4, freq='1D20min')
ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)

A
2021-04-09 00:00:001
2021-04-10 00:20:002
2021-04-11 00:40:003
2021-04-12 01:00:004

ts.between_time('0:10', '0:50')

A
2021-04-10 00:20:002
2021-04-11 00:40:003

Code language: JavaScript (javascript)

Conclusion

It'd be naive of us to try and fit all the possibilities of working with time and date in Pandas in one article since there are so many useful functions and methods. 

After finding exactly what will help you solve your project's unique needs, it will most likely take a few lines of code to apply the solution, which is what's great about Pandas. 

From practice, Svitla Systems engineers recommend spending some time studying the documentation when working with date and time to find the perfect fit solution. Plus, as a bonus, the code will work quickly, since it uses the internal capabilities of the Pandas library.

Svitla Systems has extensive experience in creating ETL solutions and data analytics. You can contact our company with your tasks and products, and our Data Analytics and ETL engineers will solve your problem on time and will accompany these solutions in the process of your information system operation.