calplot绘制日历热图

Calplot creates heatmaps from Pandas time series data.

Plot Pandas time series data sampled by day in a heatmap per calendar year using matplotlib.

import pandas as pd
import calplot
df = pd.read_excel(r"日历热图.xlsx")
df.drop('d_key', axis=1, inplace=True)
df.index
RangeIndex(start=0, stop=461, step=1)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 461 entries, 0 to 460
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype         
---  ------   --------------  -----         
 0   d_date   461 non-null    datetime64[ns]
 1   sku_sum  461 non-null    int64         
dtypes: datetime64[ns](1), int64(1)
memory usage: 7.3 KB
Signature:
df.set_index(
    keys,
    drop: 'bool' = True,
    append: 'bool' = False,
    inplace: 'bool' = False,
    verify_integrity: 'bool' = False,
)
df.set_index(['d_date'], inplace=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 461 entries, 2020-08-30 06:48:07 to 2022-02-28 02:19:39
Data columns (total 1 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   sku_sum  461 non-null    int64
dtypes: int64(1)
memory usage: 7.2 KB
df.index
DatetimeIndex(['2020-08-30 06:48:07', '2020-08-31 08:48:44',
               '2020-09-01 03:22:57', '2020-09-02 05:22:20',
               '2020-09-03 00:47:00', '2020-09-04 06:03:27',
               '2020-09-05 07:44:57', '2020-09-06 04:32:15',
               '2020-09-07 00:12:22', '2020-09-08 00:41:44',
               ...
               '2022-02-19 00:37:46', '2022-02-20 04:25:20',
               '2022-02-21 00:48:09', '2022-02-22 03:08:56',
               '2022-02-23 00:39:48', '2022-02-24 00:11:49',
               '2022-02-25 00:24:41', '2022-02-26 02:12:47',
               '2022-02-27 01:49:16', '2022-02-28 02:19:39'],
              dtype='datetime64[ns]', name='d_date', length=461, freq=None)
df.head()
	       sku_sum
d_date	
2020-08-30 06:48:07	1
2020-08-31 08:48:44	2
2020-09-01 03:22:57	3
2020-09-02 05:22:20	4
2020-09-03 00:47:00	3
# 注意这里的index设置和set_index之间还是有所差异的, 设置index, 使用这种方法
# 设置时间序列的index
# 注意可能偶尔出现异常, index没有正确转为datetime
# df.index= pd.DatetimeIndex(df['d_date'])
# df.drop('d_date', inplace=True, axis=1)
squeeze

Squeeze 1 dimensional axis objects into scalars.

Series or DataFrames with a single element are squeezed to a scalar. DataFrames with a single column or a single row are squeezed to a Series. Otherwise the object is unchanged.

This method is most useful when you don’t know if your object is a Series or DataFrame, but you do know it has just a single column. In that case you can safely call squeeze to ensure you have a Series.

Parameters
axis{0 or ‘index’, 1 or ‘columns’, None}, default None

A specific axis to squeeze. By default, all length-1 axes are squeezed. For Series this parameter is unused and defaults to None.

Returns
DataFrame, Series, or scalar
The projection after squeezing axis or all the axes.
pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)]).squeeze(axis=0)
a0    0
a1    1
a2    2
a3    3
a4    4
Name: 0, dtype: int64
pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)]).T
d_series = df.squeeze()

	0
a0	0
a1	1
a2	2
a3	3
a4	4
# 将dataframe转为series结构, 同时设置index, 必须是DatetimeIndex格式
# ts = pd.Series(df['sku_sum'], index=df.index)
d_series.index
DatetimeIndex(['2020-08-30 06:48:07', '2020-08-31 08:48:44',
               '2020-09-01 03:22:57', '2020-09-02 05:22:20',
               '2020-09-03 00:47:00', '2020-09-04 06:03:27',
               '2020-09-05 07:44:57', '2020-09-06 04:32:15',
               '2020-09-07 00:12:22', '2020-09-08 00:41:44',
               ...
               '2022-02-19 00:37:46', '2022-02-20 04:25:20',
               '2022-02-21 00:48:09', '2022-02-22 03:08:56',
               '2022-02-23 00:39:48', '2022-02-24 00:11:49',
               '2022-02-25 00:24:41', '2022-02-26 02:12:47',
               '2022-02-27 01:49:16', '2022-02-28 02:19:39'],
              dtype='datetime64[ns]', name='d_date', length=461, freq=None)
d_series.info()
<class 'pandas.core.series.Series'>
DatetimeIndex: 461 entries, 2020-08-30 06:48:07 to 2022-02-28 02:19:39
Series name: sku_sum
Non-Null Count  Dtype
--------------  -----
461 non-null    int64
dtypes: int64(1)
memory usage: 7.2 KB
d_series.head()
d_date
2020-08-30 06:48:07    1
2020-08-31 08:48:44    2
2020-09-01 03:22:57    3
2020-09-02 05:22:20    4
2020-09-03 00:47:00    3
Name: sku_sum, dtype: int64
calplot.calplot?

calplot.calplot(
    data,
    how='sum',
    yearlabels=True,
    yearascending=True,
    yearlabel_kws=None,
    subplot_kws=None,
    gridspec_kws=None,
    figsize=None,
    fig_kws=None,
    colorbar=None,
    suptitle=None,
    suptitle_kws=None,
    tight_layout=True,
    **kwargs,
)
data : Series
    Data for the plot. Must be indexed by a DatetimeIndex.
# https://calplot.readthedocs.io/en/latest/
# 注意这里只接受series数据的类型, index必须是DatetimeIndex(注意这里不是直接将时间所在的列使用set_index生成)
calplot.calplot(d_series)
pSdWhHU.png