E-commerce Sales Analyzer

Project Overview & Use Case

In the real world, businesses generate massive amounts of data stored in spreadsheets or databases. Processing this data row-by-row using standard Python loops is extremely inefficient. Pandas is the industry standard tool for data manipulation and analysis.

The Use Case: Imagine you manage an online retail store. You have just downloaded a messy CSV file containing hundreds of recent sales transactions. You need to quickly figure out your total revenue, which product category is making the most money, and identify your top-selling items.

The Output: This script first generates a realistic CSV file of mock sales data. Then, it uses Pandas to read the file, clean the data, calculate new financial metrics, and print a business intelligence report.

System Workflow (How It Works)

Data Generation: To make the script self-contained, it first uses basic Python to create a raw_sales_data.csv file containing 100 random transactions.

Data Loading: Pandas reads the CSV file and converts it into a DataFrame (think of this as a highly powerful, programmatic Excel spreadsheet).

Feature Engineering: The original data only contains the “Quantity” and “Unit Price” of items. Pandas creates a brand-new column called “Total Revenue” by instantly multiplying these two columns together.

Aggregation & Grouping: The script uses the .groupby() function to clump all identical product categories together and sum up their revenues.

Sorting & Reporting: It sorts the results from highest to lowest and prints a formatted summary to the console.

Source Code

sales_analyzer.py


import pandas as pd
import random
import os

# --- Helper Function to Generate Mock Data ---
def generate_mock_csv(filename="raw_sales_data.csv"):
  """Generates a CSV file with 100 rows of random e-commerce data."""
  print(f"⚙️ Generating raw data file: {filename}...
")
  
  categories = {
      "Electronics": ["Laptop", "Smartphone", "Monitor", "Tablet"],
      "Furniture": ["Desk", "Office Chair", "Bookshelf"],
      "Accessories": ["Mouse", "Keyboard", "Webcam", "USB Hub"]
  }
  
  with open(filename, "w") as file:
      file.write("OrderID,Category,Product,Quantity,UnitPrice
") # Header
      
      for i in range(1001, 1101): # Generate 100 orders
          category = random.choice(list(categories.keys()))
          product = random.choice(categories[category])
          quantity = random.randint(1, 5)
          
          # Assign realistic prices based on category
          if category == "Electronics": price = random.uniform(200.0, 1500.0)
          elif category == "Furniture": price = random.uniform(50.0, 300.0)
          else: price = random.uniform(10.0, 80.0)
          
          file.write(f"{i},{category},{product},{quantity},{price:.2f}
")

# --- Main Pandas Analysis ---
def analyze_sales_data(filename="raw_sales_data.csv"):
  """Uses Pandas to read, clean, and analyze the sales data."""
  
  # 1. Load the CSV into a Pandas DataFrame
  df = pd.read_csv(filename)
  
  print("-" * 40)
  print("📊 DATA PREVIEW (First 5 Rows):")
  print("-" * 40)
  print(df.head()) # .head() prints the first 5 rows
  print("
")

  # 2. Feature Engineering: Create a new column
  # Vectorized operation: Multiplies the entire column at once
  df['TotalRevenue'] = df['Quantity'] * df['UnitPrice']

  # 3. Calculate Overall Metrics
  total_sales_volume = df['Quantity'].sum()
  total_gross_revenue = df['TotalRevenue'].sum()

  print("-" * 40)
  print("📈 BUSINESS INTELLIGENCE REPORT")
  print("-" * 40)
  print(f"🔹 Total Items Sold:  {total_sales_volume}")
  print(f"🔹 Total Revenue:     $ {total_gross_revenue:,.2f}
")

  # 4. GroupBy & Aggregation
  # Group data by Category, focus on TotalRevenue, and sum it up
  revenue_by_category = df.groupby('Category')['TotalRevenue'].sum()
  
  # Sort the results descending (highest revenue first)
  revenue_by_category = revenue_by_category.sort_values(ascending=False)

  print("🏆 REVENUE BY CATEGORY:")
  for category, revenue in revenue_by_category.items():
      print(f"   - {category}: $ {revenue:,.2f}")
  
  print("
🥇 TOP 3 BEST-SELLING PRODUCTS (By Quantity):")
  # Group by product, sum the quantities, sort descending, and grab the top 3
  top_products = df.groupby('Product')['Quantity'].sum().sort_values(ascending=False).head(3)
  
  for product, qty in top_products.items():
      print(f"   - {product}: {qty} units sold")
  print("-" * 40)

if __name__ == "__main__":
  file_name = "raw_sales_data.csv"
  
  # Generate the file if it doesn't exist
  if not os.path.exists(file_name):
      generate_mock_csv(file_name)
      
  # Run the Pandas analysis
  analyze_sales_data(file_name)

Code Explanation (Pandas Concepts)

pd.read_csv(): This is the gateway into Pandas. It reads a raw text file and neatly organizes it into a DataFrame (rows and columns) that Python can easily manipulate.

df.head(): A crucial debugging tool. It prints the top 5 rows of your dataset so you can visually verify that the data loaded correctly.

Creating a New Column (df[‘NewCol’] = …): Unlike standard Python where you would iterate through a list to calculate totals, Pandas allows you to define a new column (TotalRevenue) by simply stating the mathematical relationship between two existing columns (Quantity * UnitPrice).

.groupby(‘ColumnName’): This is the most powerful feature in Pandas. It gathers all rows sharing the same value in a specific column (e.g., pulling all “Electronics” rows together).

Chaining Methods: Look at this line: df.groupby(‘Product’)[‘Quantity’].sum().sort_values(ascending=False).head(3). Pandas allows you to chain commands together. In one line, it groups by product, isolates the quantity, adds them up, sorts them from largest to smallest, and then slices off just the top 3 results.

Execution Guide

Install Pandas: Open your terminal or command prompt and run: pip install pandas

Save the file: Create a new Python file named sales_analyzer.py and paste the provided code.

Run the script: Navigate to the folder in your terminal and execute: python sales_analyzer.py

Review Output: You will see the raw data preview, followed by the calculated business metrics. Check your folder, and you will also see the raw_sales_data.csv file that the script successfully generated and analyzed.