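{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The monitoring system collects photovoltaic (PV) generation data: readings are 0 at night, and daytime readings depend on the weather. The goal is to compute the following: \n", " 1. The daytime average for each day, i.e. the mean of all readings between the first and the last non-zero reading of that day; \n", " 2. The times of the first and last non-zero readings for each day, i.e. the rows that hold them, which mark when generation starts and ends; \n", " 3. The time difference between the first and last non-zero readings for each day, i.e. the generation duration\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read the data \n", "Set the index \n", "Drop the time column" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.read_excel('2021(1).xlsx')\n", "\n", "# Use the timestamp column as the index so rows can be grouped by day\n", "df.index=df['dt']\n", "\n", "# The timestamps now live in the index, so drop the redundant column\n", "df=df.drop('dt',axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The three functions below correspond to the first, second, and third questions, respectively" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "# x[x!=0] masks zero readings as NaN, so first/last_valid_index give the first/last non-zero rows\n", "def first(x):\n", "    # Mean of all readings between the first and last non-zero readings (inclusive)\n", "    return x[x[x!=0].first_valid_index():x[x!=0].last_valid_index()].mean()\n", "def seconed(x):\n", "    # Timestamps of the first and last non-zero readings (generation start and end)\n", "    return x[x!=0].first_valid_index(),x[x!=0].last_valid_index()\n", "def third(x):\n", "    # Generation duration: last non-zero timestamp minus first non-zero timestamp\n", "    return x[x!=0].last_valid_index()-x[x!=0].first_valid_index()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Group the rows by calendar day and apply each function to every daily group\n", "df1=df.groupby(pd.Grouper(freq='D')).apply(first)\n", "\n", "df2=df.groupby(pd.Grouper(freq='D')).apply(seconed)\n", "\n", "df3=df.groupby(pd.Grouper(freq='D')).apply(third)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Concatenate the results" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "# Combine the three per-day results side by side, aligned on the date index\n", "df_l=pd.concat([df1,df2,df3],axis=1)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "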
| dt | da | 0 | 1 |\n", "
|---|---|---|---|\n", "
| 2021-09-06 | 1.2 | (2021-09-06 15:07:46, 2021-09-06 15:11:46) | 0 days 00:04:00 |\n", "
| 2021-09-07 | 1.0 | (2021-09-07 15:07:46, 2021-09-07 15:11:46) | 0 days 00:04:00 |\n", "