Automate Reports with Python: Save 10 Hours a Week

If you work in any role that involves generating reports, you know the drill. Every week or every month, you pull data from multiple sources, paste it into a spreadsheet, clean it up, build charts, format the output, and email it to stakeholders. The work is tedious, error-prone, and consumes hours that could be spent on analysis, strategy, or simply going home on time.

Python can automate this entire process. Not partially. Not most of it. All of it. From reading raw data to generating polished Excel workbooks with charts, to emailing the finished report to the right people on a schedule. Once you set it up, the report generates itself, and you never touch it again unless the requirements change.

This guide walks you through exactly how to do it, with real code examples you can adapt to your own reporting workflows.

The Typical Manual Report Workflow

Before automating anything, it helps to map out what you are currently doing by hand. Most manual reporting workflows follow a predictable pattern.

  1. Download or export data from one or more systems (databases, CRMs, Google Analytics, internal tools)
  2. Open Excel and paste the raw data into a workbook
  3. Clean the data by removing duplicates, fixing formatting issues, handling missing values, and standardizing column names
  4. Transform and aggregate by creating pivot tables, calculating metrics, applying formulas, and merging data from different sources
  5. Create charts to visualize trends, comparisons, and KPIs
  6. Format the output with headers, colors, number formatting, and a professional appearance
  7. Save and distribute by exporting to PDF or attaching the workbook to an email and sending it to stakeholders

Each of these steps takes time, and each is a potential point of human error. Miss a filter? Your numbers are wrong. Forget to update a formula? Last month's data bleeds into this month's report. Copy the wrong column? Nobody notices until the VP asks why revenue dropped 90%.

Python eliminates all of this. Every step becomes a line of code that executes identically every time.

The Python Libraries You Need

You do not need dozens of libraries. Four core tools cover the vast majority of reporting needs.

pandas is the backbone of data manipulation in Python. It reads data from CSV files, Excel workbooks, databases, and APIs. It filters, groups, merges, pivots, and aggregates data with concise, readable syntax. If your report involves any kind of data transformation, pandas does the heavy lifting.

openpyxl is the library for reading and writing Excel files. It gives you full control over cell formatting, colors, fonts, number formats, merged cells, and even Excel formulas. When your output needs to be a polished Excel workbook, openpyxl makes it happen programmatically.

matplotlib creates charts and visualizations. Line charts, bar charts, pie charts, scatter plots, and more. You can embed these directly into your Excel output or save them as images for inclusion in PDF reports.

schedule (or the operating system's built-in scheduler) runs your script automatically at whatever interval you need. Daily at 7 AM. Every Monday at 9 AM. The first of every month. You set it and forget it.

Step 1: Reading Your Data

The first step in any automated report is reading the raw data. pandas makes this remarkably simple regardless of your data source.

Reading from Excel:

import pandas as pd

df = pd.read_excel("sales_data.xlsx", sheet_name="Q1 Sales")

Reading from CSV:

df = pd.read_csv("monthly_exports.csv")

Reading from a database:

import sqlite3

conn = sqlite3.connect("company_database.db")
df = pd.read_sql("SELECT * FROM sales WHERE date >= '2026-01-01'", conn)
conn.close()

Reading from multiple files and combining them:

import glob

files = glob.glob("monthly_reports/*.csv")
all_data = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)

The key principle is that no matter where your data lives, pandas can read it into a DataFrame, which is a table-like structure that you can manipulate with simple commands. Once your data is in a DataFrame, everything else follows the same pattern regardless of the original source.
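APIs follow the same pattern: decode the JSON response and hand it to pandas. A sketch with a hypothetical payload standing in for what an HTTP client such as requests would return:

```python
import pandas as pd

# Hypothetical decoded JSON response from an API endpoint
payload = [
    {"customer_id": 101, "region": "West", "revenue": 1200.0},
    {"customer_id": 102, "region": "East", "revenue": 950.0},
]

# json_normalize builds a DataFrame and flattens nested keys if any are present
df = pd.json_normalize(payload)
print(df.shape)
```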

Step 2: Cleaning and Transforming

Raw data is almost never ready for a report. It needs cleaning, and pandas provides tools for every common data quality issue.

Removing duplicates:

df = df.drop_duplicates()

Handling missing values:

# Fill missing values with zero
df["revenue"] = df["revenue"].fillna(0)

# Or drop rows with missing critical fields
df = df.dropna(subset=["customer_id", "revenue"])

Standardizing text:

df["region"] = df["region"].str.strip().str.title()

Converting data types:

df["date"] = pd.to_datetime(df["date"])
df["revenue"] = df["revenue"].astype(float)

Creating calculated columns:

df["profit_margin"] = (df["revenue"] - df["cost"]) / df["revenue"] * 100
df["quarter"] = df["date"].dt.quarter

Each of these operations is a single line of code. In Excel, the equivalent would involve helper columns, nested formulas, and manual verification. In Python, the transformations are explicit, documented, and repeatable.
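The cleaning steps above also compose into a single chained pipeline. A sketch with inline sample data (column names match the earlier examples):

```python
import pandas as pd

raw = pd.DataFrame({
    "region": [" west ", "EAST", " west ", None],
    "revenue": [100.0, None, 100.0, 50.0],
})

clean = (
    raw.drop_duplicates()                     # remove exact duplicate rows
       .dropna(subset=["region"])             # drop rows missing a critical field
       .assign(
           region=lambda d: d["region"].str.strip().str.title(),
           revenue=lambda d: d["revenue"].fillna(0),
       )
)
print(clean)
```

Chaining keeps the whole cleaning recipe in one readable expression, and each step is easy to add, remove, or reorder.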

Step 3: Aggregating and Summarizing

Reports rarely show raw data. They show summaries, totals, averages, and comparisons. pandas handles aggregation with its groupby method, which works similarly to Excel pivot tables but with far more flexibility.

Basic grouping and aggregation:

regional_summary = df.groupby("region").agg(
    total_revenue=("revenue", "sum"),
    avg_deal_size=("revenue", "mean"),
    num_deals=("revenue", "count"),
    top_deal=("revenue", "max")
).round(2)

Multi-level grouping:

quarterly_by_region = df.groupby(["quarter", "region"])["revenue"].sum().unstack()

Pivot tables:

pivot = df.pivot_table(
    values="revenue",
    index="product_category",
    columns="quarter",
    aggfunc="sum",
    margins=True  # Adds row and column totals
)

Calculating period-over-period changes:

monthly = df.groupby(df["date"].dt.to_period("M"))["revenue"].sum()
monthly_growth = monthly.pct_change() * 100

These operations produce the exact summary tables that typically form the core of business reports. Each one takes a single line of code and executes in milliseconds.

Step 4: Creating Charts

Visualizations transform numbers into insights. matplotlib lets you create publication-quality charts with a few lines of code.

Bar chart of revenue by region:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))
regional_summary["total_revenue"].plot(kind="bar", ax=ax, color="#2563eb")
ax.set_title("Revenue by Region", fontsize=14, fontweight="bold")
ax.set_ylabel("Revenue ($)")
ax.set_xlabel("Region")
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig("revenue_by_region.png", dpi=150)

Line chart showing monthly trends:

fig, ax = plt.subplots(figsize=(12, 6))
monthly.plot(kind="line", ax=ax, marker="o", linewidth=2)
ax.set_title("Monthly Revenue Trend", fontsize=14, fontweight="bold")
ax.set_ylabel("Revenue ($)")
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("monthly_trend.png", dpi=150)

Pie chart of market share:

fig, ax = plt.subplots(figsize=(8, 8))
category_revenue = df.groupby("product_category")["revenue"].sum()
category_revenue.plot(kind="pie", ax=ax, autopct="%1.1f%%")
ax.set_ylabel("")
ax.set_title("Revenue by Product Category")
plt.tight_layout()
plt.savefig("market_share.png", dpi=150)

You can save these charts as image files and embed them in Excel workbooks or PDF reports. The charts update automatically every time the script runs with new data.

Step 5: Writing the Output to Excel

This is where openpyxl shines. You can create a polished, professionally formatted Excel workbook that looks like it was built by hand, but was generated by your script in seconds.

from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
from openpyxl.chart import BarChart, Reference
from openpyxl.utils.dataframe import dataframe_to_rows

wb = Workbook()
ws = wb.active
ws.title = "Executive Summary"

# Add a title
ws["A1"] = "Monthly Sales Report"
ws["A1"].font = Font(size=16, bold=True)
ws["A2"] = f"Generated: {pd.Timestamp.now().strftime('%B %d, %Y')}"
ws["A2"].font = Font(size=10, italic=True, color="666666")

# Write the summary data starting at row 4
header_fill = PatternFill(start_color="2563EB", end_color="2563EB", fill_type="solid")
header_font = Font(color="FFFFFF", bold=True)

# reset_index turns the "region" index into a regular column, so the header row
# lines up and no stray index-name row is written
for r_idx, row in enumerate(dataframe_to_rows(regional_summary.reset_index(), index=False), start=4):
    for c_idx, value in enumerate(row, start=1):
        cell = ws.cell(row=r_idx, column=c_idx, value=value)
        if r_idx == 4:  # Header row
            cell.fill = header_fill
            cell.font = header_font
            cell.alignment = Alignment(horizontal="center")

# Auto-adjust column widths
for column in ws.columns:
    max_length = max(len(str(cell.value or "")) for cell in column)
    ws.column_dimensions[column[0].column_letter].width = max_length + 4

# Add a chart directly in the workbook
chart = BarChart()
chart.title = "Revenue by Region"
chart.y_axis.title = "Revenue ($)"
data = Reference(ws, min_col=2, min_row=4, max_row=ws.max_row, max_col=2)
categories = Reference(ws, min_col=1, min_row=5, max_row=ws.max_row)
chart.add_data(data, titles_from_data=True)
chart.set_categories(categories)
ws.add_chart(chart, "F4")

wb.save("monthly_report.xlsx")

This script produces a workbook with a title, a timestamp, a formatted data table with colored headers, and an embedded chart. Your stakeholders receive a familiar, professional Excel file. They never need to know that Python created it.

Step 6: Scheduling Your Report

The final step is making the report generate itself automatically. The approach depends on your operating system.

On macOS and Linux (using cron):

Open your terminal and type crontab -e, then add a line like this:

0 7 * * 1 /usr/bin/python3 /path/to/generate_report.py

This runs the script every Monday at 7:00 AM.
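The other schedules mentioned earlier translate to cron entries like these (adjust the interpreter and script paths to match your system):

```shell
# Daily at 7:00 AM
0 7 * * * /usr/bin/python3 /path/to/generate_report.py

# First day of every month at 8:00 AM
0 8 1 * * /usr/bin/python3 /path/to/generate_report.py
```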

On Windows (using Task Scheduler):

  1. Open Task Scheduler from the Start menu
  2. Click "Create Basic Task"
  3. Set the trigger (e.g., weekly on Monday at 7:00 AM)
  4. Set the action to "Start a Program"
  5. Browse to your Python executable (e.g., C:\Python312\python.exe)
  6. Add the path to your script as an argument

Adding email delivery to your script:

import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email import encoders

def send_report(filename, recipients):
    msg = MIMEMultipart()
    msg["Subject"] = f"Weekly Sales Report - {pd.Timestamp.now().strftime('%B %d, %Y')}"
    msg["From"] = "reports@yourcompany.com"
    msg["To"] = ", ".join(recipients)

    with open(filename, "rb") as f:
        attachment = MIMEBase("application", "octet-stream")
        attachment.set_payload(f.read())
        encoders.encode_base64(attachment)
        attachment.add_header("Content-Disposition", f"attachment; filename={filename}")
        msg.attach(attachment)

    with smtplib.SMTP("smtp.yourcompany.com", 587) as server:
        server.starttls()
        # In production, read credentials from an environment variable or secrets store
        server.login("reports@yourcompany.com", "your_password")
        server.send_message(msg)

send_report("monthly_report.xlsx", ["manager@company.com", "team@company.com"])

With this addition, the report is not only generated automatically but delivered directly to stakeholders' inboxes. You are completely removed from the process.

Full Example: Weekly Sales Report

Here is a complete, end-to-end script that ties everything together. This script reads sales data from a CSV file, calculates key metrics, creates a chart, generates a formatted Excel report, and is ready to be scheduled.

import pandas as pd
import matplotlib.pyplot as plt
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl.drawing.image import Image
from datetime import datetime

# --- 1. Read and clean data ---
df = pd.read_csv("sales_data.csv")
df["date"] = pd.to_datetime(df["date"])
df["revenue"] = df["revenue"].astype(float)

# Filter to current month
current_month = datetime.now().replace(day=1)
df_month = df[df["date"] >= current_month]

# --- 2. Calculate metrics ---
summary = df_month.groupby("region").agg(
    total_revenue=("revenue", "sum"),
    num_deals=("deal_id", "count"),
    avg_deal=("revenue", "mean")
).round(2).sort_values("total_revenue", ascending=False)

total_revenue = df_month["revenue"].sum()
total_deals = len(df_month)

# --- 3. Create chart ---
fig, ax = plt.subplots(figsize=(10, 5))
summary["total_revenue"].plot(kind="bar", ax=ax, color="#2563eb")
ax.set_title("Revenue by Region (Current Month)", fontweight="bold")
ax.set_ylabel("Revenue ($)")
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig("chart.png", dpi=150)
plt.close()

# --- 4. Build Excel report ---
wb = Workbook()
ws = wb.active
ws.title = "Sales Report"

ws["A1"] = "Weekly Sales Report"
ws["A1"].font = Font(size=18, bold=True)
ws["A2"] = f"Generated: {datetime.now().strftime('%B %d, %Y at %I:%M %p')}"
ws["A2"].font = Font(size=10, color="888888")

ws["A4"] = f"Total Revenue: ${total_revenue:,.2f}"
ws["A4"].font = Font(size=12, bold=True)
ws["A5"] = f"Total Deals: {total_deals}"
ws["A5"].font = Font(size=12, bold=True)

# Write summary table
start_row = 7
header_fill = PatternFill(start_color="1e3a5f", end_color="1e3a5f", fill_type="solid")

# reset_index turns the "region" index into a regular column so rows line up cleanly
for r_idx, row in enumerate(dataframe_to_rows(summary.reset_index(), index=False), start=start_row):
    for c_idx, value in enumerate(row, start=1):
        cell = ws.cell(row=r_idx, column=c_idx, value=value)
        if r_idx == start_row:
            cell.font = Font(color="FFFFFF", bold=True)
            cell.fill = header_fill

# Embed chart
img = Image("chart.png")
ws.add_image(img, "F4")

wb.save("weekly_sales_report.xlsx")
print("Report generated successfully.")

This script is approximately 50 lines of code. It replaces a manual workflow that typically takes one to two hours. Run it once manually to verify the output, then schedule it and reclaim your time permanently.

Tips for Building Reliable Report Automation

Add error handling. Wrap your script in try-except blocks and send yourself an email if something fails. This way you know immediately if a data source is unavailable or if the format has changed.
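A minimal sketch of this pattern, with the pipeline and the notification both passed in so the wrapper stays reusable (pipeline and notify are hypothetical names; notify could be a variant of the send_report function above):

```python
import traceback

def run_with_alert(pipeline, notify):
    """Run the report pipeline; on any failure, send the full traceback via notify()."""
    try:
        pipeline()
        return True
    except Exception:
        notify(traceback.format_exc())
        return False
```

In production, notify would email or message you; any callable that accepts a string works, which also makes the wrapper easy to test.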

Log everything. Write a log file that records when the script ran, how many rows it processed, and the key output metrics. This makes debugging straightforward when something looks off.

Use configuration files. Store file paths, email addresses, and other settings in a separate configuration file rather than hardcoding them. This makes it easy to update settings without modifying the code.
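A lightweight approach is a JSON file next to the script. The snippet below creates a hypothetical config.json inline so it runs standalone; in practice you would keep the file in version-controlled form and only do the loading half:

```python
import json
from pathlib import Path

# For demonstration only: write the settings file the script would normally just read
Path("config.json").write_text(json.dumps({
    "data_file": "sales_data.csv",
    "output_file": "weekly_sales_report.xlsx",
    "recipients": ["manager@company.com", "team@company.com"],
}, indent=2))

# The script loads its settings instead of hardcoding them
config = json.loads(Path("config.json").read_text())
print(config["output_file"])
```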

Test with sample data first. Before pointing your script at production data, test it with a small sample file. Verify that every calculation matches what you would get in Excel. Once confirmed, switch to the real data source.

Version control your scripts. Use Git to track changes to your report scripts. This lets you see what changed, when, and why, and you can roll back to a previous version if a change introduces a problem.

The Return on Investment

The math on automating reports is compelling. If you spend just two hours per week on a report that can be automated, that is roughly 100 hours per year. With three such reports, that is about 300 hours, or roughly seven and a half full work weeks. Even if it takes you 20 hours to build and test the automation, you recoup that investment within the first two months and save hundreds of hours every year after that.

Beyond time savings, automated reports are more accurate. They do not have off days, they do not forget steps, and they do not accidentally overwrite formulas. The quality improvement is often as valuable as the time savings.

Ready to learn Python from scratch? Read our free Python for Business Beginners textbook — no coding experience required.