Weather Station Data Analyzer

Project Overview & Use Case

When working in data science or engineering, you rarely process large datasets one element at a time with standard Python loops, because that is too slow. Instead, you use NumPy to apply an operation to an entire array at once (a concept called vectorization).

The Use Case: Imagine you are a meteorologist who just received an entire year’s worth of daily temperature readings from a weather station. You need to analyze this data instantly to find averages, extremes, and specific trends (like how many days were dangerously hot).

The Output: This script simulates 365 days of temperature data, processes it entirely using NumPy arrays, and prints a statistical meteorological report in milliseconds.

System Workflow (How It Works)

Data Generation: Since we do not have a real weather sensor connected, the script uses NumPy’s random module to generate 365 realistic daily temperatures following a normal (Gaussian) distribution.

Vectorized Rounding: It instantly rounds all 365 floating-point numbers to one decimal place without a single for loop.

Statistical Analysis: It uses NumPy functions (np.mean, np.max, np.min) to find the yearly average and extreme temperatures, and np.argmax and np.argmin to find the days on which those extremes occurred.

Boolean Masking (Filtering): It creates “masks” to filter the array and count exactly how many days fell into specific categories (e.g., at or above 30°C, or at or below 10°C).

Source Code

weather_analyzer.py


import numpy as np

def generate_weather_data(days=365):
  """
  Simulates daily temperature readings (in Celsius) for a given number of days.
  Uses a normal distribution: Mean = 22.0°C, Standard Deviation = 7.5°C.
  """
  print(f"📡 Downloading data from weather station for {days} days...\n")
  
  # Generate random temperatures
  raw_data = np.random.normal(loc=22.0, scale=7.5, size=days)
  
  # Round all values to 1 decimal place using vectorization
  clean_data = np.round(raw_data, 1)
  
  return clean_data

def analyze_data(temperatures):
  """
  Performs statistical analysis on the NumPy array of temperatures.
  """
  print("-" * 35)
  print("   🌤️ ANNUAL WEATHER REPORT 🌤️")
  print("-" * 35)

  # 1. Basic Statistics (Fast C-level operations)
  avg_temp = np.mean(temperatures)
  max_temp = np.max(temperatures)
  min_temp = np.min(temperatures)
  
  # np.argmax returns the INDEX of the highest value. 
  # We add 1 because arrays are 0-indexed, but days of the year start at 1.
  hottest_day = np.argmax(temperatures) + 1 
  coldest_day = np.argmin(temperatures) + 1

  print(f"🔹 Yearly Average: {avg_temp:.1f}°C")
  print(f"🔺 Highest Temp:   {max_temp}°C (Recorded on Day {hottest_day})")
  print(f"🔻 Lowest Temp:    {min_temp}°C (Recorded on Day {coldest_day})")
  print("-" * 35)

  # 2. Boolean Indexing (Filtering data without loops)
  # This creates a new array containing ONLY the values that meet the condition
  hot_days = temperatures[temperatures >= 30.0]
  cold_days = temperatures[temperatures <= 10.0]
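  # Combine conditions with & (element-wise AND); each comparison needs its own parentheses.
  # Both bounds are exclusive here: strictly above 15.0 and strictly below 25.0.
  pleasant_days = temperatures[(temperatures > 15.0) & (temperatures < 25.0)]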

  # Use .size to get the count of items in the filtered arrays
  print(f"🔥 Hot Days (>= 30°C):      {hot_days.size} days")
  print(f"❄️ Cold Days (<= 10°C):      {cold_days.size} days")
  print(f"😎 Pleasant Days (15-25°C):  {pleasant_days.size} days")
  print("-" * 35)

if __name__ == "__main__":
  # Generate the simulated array
  yearly_temperatures = generate_weather_data()
  
  # Pass the array to our analysis function
  analyze_data(yearly_temperatures)

Code Explanation (NumPy Concepts)

np.random.normal(loc, scale, size): This function draws random samples from a normal (Gaussian) distribution. The values are still random, but unlike uniformly distributed numbers (where every value in a range is equally likely), they cluster around the mean and form a bell curve. loc is the average (mean) temperature, scale is the standard deviation (how far temperatures typically swing from the average), and size is the number of data points (365).
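
To see what loc and scale control, the following minimal sketch draws a large sample and checks that its mean and standard deviation land near the requested values. It uses np.random.default_rng, NumPy's newer generator interface, rather than the np.random.normal call in the script; the seed 0 and the sample size of 100,000 are arbitrary choices made for illustration.

```python
import numpy as np

# Assumption: a seeded generator, used here only so the sketch is repeatable.
rng = np.random.default_rng(0)

# Draw many samples so the sample statistics settle near loc and scale.
samples = rng.normal(loc=22.0, scale=7.5, size=100_000)

sample_mean = samples.mean()
sample_std = samples.std()

# For a normal distribution, roughly 68% of values fall within
# one standard deviation of the mean.
within_one_std = np.mean(np.abs(samples - 22.0) <= 7.5)

print(f"mean = {sample_mean:.2f}, std = {sample_std:.2f}")
print(f"share within one std of the mean = {within_one_std:.2%}")
```

With only 365 samples, as in the script, the sample mean and standard deviation will wander further from 22.0 and 7.5 than they do here.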

Vectorization: Notice how np.round(raw_data, 1) applies to the entire array at once. In standard Python, you would have to write a for loop (or a list comprehension) to round each number individually. NumPy runs the loop inside compiled C code instead of the Python interpreter, which is much faster on large arrays.
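
As a small illustration of the difference, the sketch below rounds the same values twice: once with an explicit Python loop and once with a single np.round call. The four readings are made up for this example and are not taken from the script.

```python
import numpy as np

# Hypothetical sample readings, chosen only for illustration.
raw = np.array([21.34, 18.06, 30.48, 9.91])

# Plain Python: round each value individually inside a loop.
looped = []
for value in raw:
    looped.append(round(float(value), 1))

# NumPy: one call rounds every element of the array.
vectorized = np.round(raw, 1)

print(looped)
print(vectorized)
```

Both approaches give the same numbers here; the vectorized version simply moves the loop out of Python, which is where the speed advantage on large arrays comes from.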

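np.argmax() and np.argmin(): Instead of telling you what the highest number in the array is, np.argmax() tells you where it is (its index position); np.argmin() does the same for the lowest number. The script adds 1 to each result to turn the 0-based index into a day of the year.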
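
The index-to-day conversion can be sketched on a short array. The seven readings below are hypothetical, and the maximum is repeated on purpose to show that np.argmax reports the first occurrence when there is a tie.

```python
import numpy as np

# Hypothetical week of readings; 31.5 appears twice on purpose.
week = np.array([19.8, 31.5, 24.0, 12.3, 31.5, 27.1, 15.6])

hottest_index = np.argmax(week)  # 0-based position of the maximum
coldest_index = np.argmin(week)  # 0-based position of the minimum

# When the maximum occurs more than once, argmax returns the first position.
print(f"Hottest: {week[hottest_index]}°C on day {hottest_index + 1}")
print(f"Coldest: {week[coldest_index]}°C on day {coldest_index + 1}")
```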

Boolean Masking (temperatures[temperatures >= 30.0]): This is one of the most important concepts in NumPy. The inner expression temperatures >= 30.0 creates an array of True and False values, one per element. When you use that mask as an index into the original array, NumPy keeps only the elements at positions where the mask is True and returns them as a new array containing only the hot days.
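
The two steps (build the mask, then index with it) can be seen separately in the sketch below. The readings are hypothetical, and np.count_nonzero is shown as an alternative way to count matches directly from a mask; the script itself counts with .size on the filtered array.

```python
import numpy as np

# Hypothetical readings, chosen only for illustration.
temps = np.array([8.5, 16.2, 30.0, 24.9, 33.1, 10.0, 25.0])

# Step 1: the comparison produces one True/False value per element.
hot_mask = temps >= 30.0

# Step 2: indexing with the mask keeps only the True positions.
hot_days = temps[hot_mask]

# Conditions are combined with & and each comparison is parenthesized.
pleasant_mask = (temps > 15.0) & (temps < 25.0)

# Counting True values in a mask avoids building the filtered array.
hot_count = np.count_nonzero(hot_mask)
pleasant_count = np.count_nonzero(pleasant_mask)

print(hot_mask)
print(hot_days)
print(hot_count, pleasant_count)
```

Note that 25.0 is not counted as pleasant, because the upper bound uses a strict less-than comparison.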

Execution Guide

Install NumPy: Open your terminal or command prompt and run: pip install numpy

Save the file: Create a file named weather_analyzer.py and paste the code above.

Run the script: In your terminal, navigate to the folder where you saved the file and type: python weather_analyzer.py

Observe the Output: Every time you run the script, you will get slightly different results because the weather data is randomly generated, but it will always follow realistic statistical patterns!
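
If you want the same report on every run (for example, while debugging), one option is to seed NumPy's random number generator before generating the data. The sketch below is an assumption about how you might do that, not part of the original script: simulate_year is a hypothetical helper, and the seeds 42 and 7 are arbitrary.

```python
import numpy as np

def simulate_year(seed, days=365):
    # Hypothetical helper: seeds NumPy's global generator, then draws
    # temperatures with the same parameters the script uses.
    np.random.seed(seed)
    return np.round(np.random.normal(loc=22.0, scale=7.5, size=days), 1)

first_run = simulate_year(seed=42)
second_run = simulate_year(seed=42)
other_run = simulate_year(seed=7)

# The same seed reproduces the same data; a different seed does not.
print(np.array_equal(first_run, second_run))
print(np.array_equal(first_run, other_run))
```

In weather_analyzer.py itself, a single np.random.seed call placed before generate_weather_data() would have the same effect.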