HR Analytics & Salary Visualizer

Project Overview & Use Case

While Matplotlib is great for basic plotting, it often requires a lot of code to make charts look modern or to calculate statistical trends. Seaborn is built on top of Matplotlib and is designed for producing polished statistical graphics, often in one or two lines of code. It works directly with Pandas DataFrames.

The Use Case: Imagine you are a Data Scientist working for the Human Resources department of a mid-sized company. You need to present a visually appealing report showing how salaries are distributed, how pay compares across different departments, and how years of experience correlate with compensation.

The Output: This script generates a mock dataset of 200 employees, applies a modern Seaborn visual theme, and creates a 4-panel dashboard of statistical charts: a histogram with a density curve, a box plot, a scatter plot, and a correlation heatmap.

System Workflow (How It Works)

Data Generation: The script first creates a Pandas DataFrame containing realistic, randomized employee data (Department, Years of Experience, and Salary).

Theming: Before drawing anything, it calls sns.set_theme() to set the aesthetics (fonts, gridlines, background colors) of all charts without manual configuration.

Subplot Setup: Just like in Matplotlib, it creates a 2x2 grid (4 total chart areas) to hold the Seaborn plots.

Advanced Plotting: Each subplot uses a different Seaborn function to visualize the data in a unique way:

  • Chart 1: Shows the overall shape of the salary data.

  • Chart 2: Compares salary ranges and outliers across different departments.

  • Chart 3: Plots salary against years of experience, color-coded by department and with dot size scaled by salary.

  • Chart 4: Calculates the mathematical correlation between numeric variables and visualizes it as a color-coded heatmap.
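
As a minimal sketch of the Subplot Setup step: plt.subplots(nrows=2, ncols=2) returns a 2x2 NumPy array of Axes, indexed as axes[row, column]. The Agg backend and the panel titles below are assumptions for illustration, chosen so the snippet runs without opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))

# axes is a 2x2 NumPy array; index it as axes[row, column]
grid_shape = axes.shape
axes[0, 0].set_title("top left")
axes[0, 1].set_title("top right")
axes[1, 0].set_title("bottom left")
axes[1, 1].set_title("bottom right")

# axes.flat walks the panels row by row, which is handy for loops
titles = [ax.get_title() for ax in axes.flat]
print(grid_shape, titles)
plt.close(fig)
```

Each Seaborn call in the script receives one of these Axes through its ax= argument, which is how four independent plots end up in one figure.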

Source Code

File: hr_analytics.py

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

def generate_hr_data(num_employees=200):
  """Generates a realistic Pandas DataFrame of employee data."""
  print("⚙️ Generating mock HR dataset...")
  np.random.seed(42) # Ensures we get the same random data every time we run it
  
  departments = np.random.choice(['Engineering', 'Sales', 'HR', 'Marketing'], num_employees)
  experience = np.random.randint(1, 25, num_employees)
  
  # Base salary of $45k, plus $2.5k per year of experience, plus some random noise
  salaries = 45000 + (experience * 2500) + np.random.normal(0, 8000, num_employees)
  
  # Create the DataFrame
  df = pd.DataFrame({
      'Department': departments,
      'YearsExperience': experience,
      'Salary': salaries
  })
  return df

def create_hr_dashboard(df):
  """Uses Seaborn to create a 4-panel statistical dashboard."""
  print("📊 Building Seaborn visualizations...")

  # 1. Apply a global Seaborn theme (Instantly makes plots look professional)
  sns.set_theme(style="whitegrid", palette="muted")

  # 2. Set up the Matplotlib canvas (2 rows, 2 columns)
  fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(14, 10))
  # Plain-text title: Matplotlib's default fonts may not include emoji glyphs
  fig.suptitle('Corporate HR Analytics Dashboard', fontsize=18, fontweight='bold', y=0.98)

  # --- TOP LEFT: Histogram with KDE (Kernel Density Estimate) ---
  # Shows the distribution of salaries. kde=True overlays a smooth density curve.
  sns.histplot(data=df, x="Salary", kde=True, color="indigo", ax=axes[0, 0])
  axes[0, 0].set_title("Overall Salary Distribution", fontsize=12)
  axes[0, 0].set_xlabel("Salary ($)")

  # --- TOP RIGHT: Box Plot ---
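  # Great for finding outliers and seeing the median/range per category.
  # Note: seaborn 0.13+ warns when palette is passed without hue; on those
  # versions, add hue="Department", legend=False to keep the same colors.
  sns.boxplot(data=df, x="Department", y="Salary", palette="Set2", ax=axes[0, 1])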
  axes[0, 1].set_title("Salary Ranges by Department", fontsize=12)

  # --- BOTTOM LEFT: Scatter Plot with Hue ---
  # Maps experience vs. salary, color-codes by department (hue), and scales dot size by salary (size)
  sns.scatterplot(data=df, x="YearsExperience", y="Salary", hue="Department", 
                  size="Salary", sizes=(20, 150), alpha=0.8, ax=axes[1, 0])
  axes[1, 0].set_title("Experience vs. Compensation", fontsize=12)
  axes[1, 0].set_xlabel("Years of Experience")
  
  # Move the legend outside the plot so it doesn't cover data
  axes[1, 0].legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)

  # --- BOTTOM RIGHT: Correlation Heatmap ---
  # Calculate how strongly numbers are related (e.g., as experience goes up, does salary go up?)
  numeric_df = df[['YearsExperience', 'Salary']]
  correlation_matrix = numeric_df.corr()
  
  # annot=True puts the actual numbers inside the colored boxes
  sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", 
              vmin=-1, vmax=1, square=True, ax=axes[1, 1])
  axes[1, 1].set_title("Variable Correlation Heatmap", fontsize=12)

  # 3. Final Layout Adjustments
  plt.tight_layout()
  # Add some extra padding at the top so the main title isn't cut off
  plt.subplots_adjust(top=0.90) 
  
  print("✅ Dashboard ready! Close the window to end the script.")
  plt.show()

if __name__ == "__main__":
  # Generate data and pass it to the visualization function
  employee_data = generate_hr_data()
  create_hr_dashboard(employee_data)
  

Code Explanation (Seaborn Concepts)

sns.set_theme(): Calling this before any plotting replaces Matplotlib’s default, slightly dated visuals by updating its global style settings (rcParams). Fonts, background, and gridlines change for every chart drawn afterwards.
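
To make the effect of sns.set_theme() concrete, the sketch below checks one Matplotlib setting before and after the call. It assumes the Agg backend (so no window is needed) and resets to Matplotlib's built-in defaults first so the starting point is known; axes.grid is just one of several settings the theme changes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for a headless run
import matplotlib.pyplot as plt
import seaborn as sns

matplotlib.rcdefaults()  # start from Matplotlib's built-in defaults
grid_before = plt.rcParams["axes.grid"]

# Same call as in the script: this rewrites global rcParams
sns.set_theme(style="whitegrid", palette="muted")
grid_after = plt.rcParams["axes.grid"]

print("gridlines before:", grid_before, "| after:", grid_after)
```

Because the change is global, it affects every figure created later in the same Python session, not only the dashboard.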

The data, x, and y arguments: Notice how almost every Seaborn function asks for data=df. Because Seaborn is heavily integrated with Pandas, you hand it the whole DataFrame and then name the columns you want to plot (e.g., x="Department"). You don’t have to extract the columns manually.
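
A small side-by-side sketch of the two calling styles. The five-row DataFrame is hypothetical and only reuses the script's column names; the Agg backend is assumed so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Tiny hypothetical dataset using the same column names as the script
df = pd.DataFrame({
    "YearsExperience": [1, 3, 5, 8, 12],
    "Salary": [48000, 52000, 58000, 66000, 75000],
})

fig, (ax_named, ax_manual) = plt.subplots(ncols=2, figsize=(10, 4))

# Seaborn style: pass the DataFrame and refer to columns by name
sns.scatterplot(data=df, x="YearsExperience", y="Salary", ax=ax_named)

# Plain Matplotlib: extract the columns and label the axes yourself
ax_manual.scatter(df["YearsExperience"], df["Salary"])
ax_manual.set_xlabel("YearsExperience")
ax_manual.set_ylabel("Salary")

# Seaborn took the axis labels from the column names
labels = (ax_named.get_xlabel(), ax_named.get_ylabel())
print(labels)
plt.close(fig)
```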

The hue argument: This is one of Seaborn’s most powerful features. In the scatter plot, adding hue="Department" tells Seaborn to color-code every dot by the employee’s department, and it builds the matching legend for you.
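
The following sketch shows the automatic legend in isolation. The six-row dataset is hypothetical, and the Agg backend is assumed so nothing needs to be displayed; the legend entries are read back from the Axes to show what Seaborn generated.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample with the script's column names
df = pd.DataFrame({
    "Department": ["Engineering", "Sales", "HR", "Marketing", "Sales", "HR"],
    "YearsExperience": [2, 4, 6, 9, 12, 15],
    "Salary": [52000, 55000, 61000, 68000, 74000, 83000],
})

fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(data=df, x="YearsExperience", y="Salary",
                hue="Department", ax=ax)

# No legend code was written above; Seaborn attached one to the Axes
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
plt.close(fig)
```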

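sns.boxplot(): A box plot automatically calculates statistical quartiles. The “box” shows where the middle 50% of the data lies (from the first to the third quartile), and the line inside is the median. The whiskers extend to the most extreme points that are not outliers; by default, any point more than 1.5 times the interquartile range beyond the box edges is drawn as a separate dot and treated as an outlier.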
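
The numbers behind a box plot can be reproduced by hand. The sketch below uses a hypothetical list of seven salaries (in thousands of dollars) and the conventional 1.5 x IQR rule, which is the default for Matplotlib and Seaborn box plots.

```python
import pandas as pd

# Hypothetical salaries in $1000s, with one unusually high value
salaries = pd.Series([40, 50, 55, 60, 65, 70, 150])

q1 = salaries.quantile(0.25)      # lower edge of the box
median = salaries.quantile(0.50)  # line inside the box
q3 = salaries.quantile(0.75)      # upper edge of the box
iqr = q3 - q1                     # height of the box

# Points beyond these fences are drawn as individual outlier dots
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = salaries[(salaries < lower_fence) | (salaries > upper_fence)].tolist()

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"fences=({lower_fence}, {upper_fence}), outliers={outliers}")
```

Here the box spans 52.5 to 67.5, the fences sit at 30 and 90, and only the value 150 falls outside them.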

sns.heatmap(): Heatmaps are widely used in Machine Learning and Data Science. The function takes a matrix of numbers (like our correlation matrix) and maps each value to a color, making it easy to see which variables are strongly linked. With vmin=-1 and vmax=1, the color scale covers the full range a correlation coefficient can take.
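
To show what the heatmap's numbers mean, this sketch computes the Pearson correlation by hand and compares it with DataFrame.corr(), which uses Pearson by default. The data follows the same salary formula as the script, but the generator and seed (np.random.default_rng(0)) are assumptions for illustration, so the values differ from the script's dataset.

```python
import numpy as np
import pandas as pd

# Assumption: a separate seeded generator, not the script's np.random.seed(42)
rng = np.random.default_rng(0)
experience = rng.integers(1, 25, size=200)
salary = 45000 + experience * 2500 + rng.normal(0, 8000, size=200)
df = pd.DataFrame({"YearsExperience": experience, "Salary": salary})

# Pearson r: covariance of the two columns divided by the product
# of their standard deviations, written with centered values
x = df["YearsExperience"] - df["YearsExperience"].mean()
y = df["Salary"] - df["Salary"].mean()
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

# The same value as it appears in the matrix passed to sns.heatmap()
r_pandas = df.corr().loc["YearsExperience", "Salary"]

print(f"manual r = {r_manual:.4f}, pandas r = {r_pandas:.4f}")
```

The diagonal of the correlation matrix is always 1, because every variable is perfectly correlated with itself; the off-diagonal cell is the value of interest.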

Execution Guide

Install Requirements: Open your terminal or command prompt and run: pip install seaborn pandas matplotlib numpy

Save the file: Create a new Python file named hr_analytics.py and paste the provided code.

Run the script: Navigate to the folder in your terminal and execute: python hr_analytics.py

Review Output: A large dashboard window will appear. Notice how much detail is present in the charts (automatic legends, color gradients, a density curve) compared to the small amount of code required to generate them.
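
On a machine without a display (for example a remote server), plt.show() has no window to open. One possible workaround, sketched here under assumptions, is to select the non-interactive Agg backend and write the figure to an image file with fig.savefig(). The file name hr_dashboard_demo.png, the temporary directory, and the small dataset are all hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless environment, so render to a file
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the dashboard data
df = pd.DataFrame({
    "Department": ["Engineering", "Engineering", "Sales", "Sales", "HR", "HR"],
    "Salary": [82000, 91000, 60000, 67000, 55000, 58000],
})

fig, ax = plt.subplots(figsize=(6, 4))
sns.boxplot(data=df, x="Department", y="Salary", ax=ax)
ax.set_title("Salary Ranges by Department")

# Save instead of showing; bbox_inches="tight" trims surrounding whitespace
out_path = os.path.join(tempfile.gettempdir(), "hr_dashboard_demo.png")
fig.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close(fig)

print("Saved to", out_path)
```

The same fig.savefig() call could be placed before plt.show() in create_hr_dashboard() if a saved copy of the full dashboard is wanted.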