🐼 Pandas Lesson 1 – Preview
A 4-stage series plus this preview page; Lessons 2–5 dive deep.
🔹 Introduction
Pandas is Python’s go-to toolkit for working with tabular data—spreadsheets, CSVs, databases, and APIs. It provides Series (single column) and DataFrame (table) with powerful tools for loading, cleaning, transforming, and analyzing data.
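A minimal taste of the two core structures (the data here is made up for illustration):

```python
import pandas as pd

# A Series is a single labeled column.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame is a table of named columns sharing one index.
df = pd.DataFrame({"item": ["pen", "book"], "price": [5, 120]})

print(s["b"])             # label-based lookup -> 20
print(df["price"].sum())  # vectorized column sum -> 125
```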
🔹 Importance
- Industry standard for data analysis in Python.
- Bridges spreadsheets ↔ Python and databases ↔ ML pipelines.
- Rich I/O: CSV, Excel, Pickle, JSON, Parquet, SQL; integrates with NumPy, Matplotlib, scikit-learn.
- Fast iteration: filter, group, aggregate, pivot, visualize quickly.
🔑 Keywords
- Series, DataFrame, Index, dtypes
- loc, iloc, query, mask, filter
- groupby, aggregate, pivot, merge/join
- CSV, Excel, Pickle, JSON, Parquet
- SQL (MySQL, SQLite, PostgreSQL), MongoDB
- Missing values (NaN), vectorization, broadcasting
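Two of these keywords in action — vectorization and missing values (a small sketch with invented data):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
print(s * 2)           # vectorization: every element doubled in one step
print(s.isna().sum())  # count missing values -> 1
print(s.fillna(0))     # replace NaN with 0
```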
🔌 Data I/O at a Glance (Files & Databases)
CSV
import pandas as pd
df = pd.read_csv("data.csv") # Load
df.to_csv("out.csv", index=False) # Save
Excel
Install engine if needed: pip install openpyxl
import pandas as pd
df = pd.read_excel("book.xlsx", sheet_name="Sheet1") # Load
df.to_excel("out.xlsx", index=False) # Save
Pickle (fast Python-native)
import pandas as pd
df = pd.read_csv("data.csv")
df.to_pickle("df.pkl") # Save (binary)
df2 = pd.read_pickle("df.pkl") # Load
MySQL (via SQLAlchemy)
Install: pip install sqlalchemy pymysql
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("mysql+pymysql://user:password@localhost:3306/dbname")
df = pd.read_sql("SELECT * FROM sales", engine) # Read table/query
df.to_sql("sales_copy", engine, if_exists="replace", index=False) # Write
MongoDB (via PyMongo)
Install: pip install pymongo
import pandas as pd
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
db = client["shop"]
coll = db["orders"]
docs = list(coll.find({}, {"_id": 0})) # Drop _id for cleaner DataFrame
df = pd.DataFrame(docs)
JSON / Parquet (columnar)
Parquet needs an engine: pip install pyarrow or fastparquet
import pandas as pd
# JSON
df = pd.read_json("data.json")
df.to_json("out.json", orient="records")
# Parquet
df.to_parquet("data.parquet") # Save
df2 = pd.read_parquet("data.parquet") # Load
🗺️ Roadmap (4 Stages After Preview)
Lesson 2 — Series (Stage 2)
- Create Series (list, dict, scalar), indexes
- Selection, slicing, vector ops
- Real-world mini tasks (prices, attendance)
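A quick preview of what Lesson 2 covers — Series creation, selection, filtering, and vector ops (prices are hypothetical):

```python
import pandas as pd

# Create a Series from a dict: keys become the index.
prices = pd.Series({"pen": 5, "book": 120, "bag": 450})

print(prices["book"])        # label lookup -> 120
print(prices[prices > 100])  # boolean filter keeps book and bag
print(prices * 2)            # vector op: doubles every price at once
```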
Lesson 3 — DataFrames (Stage 3)
- Create DataFrames (dicts, CSV, Excel)
- Add/remove columns, dtypes, missing values
- Merges & concatenation basics
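A taste of the Lesson 3 topics — building DataFrames from dicts and joining them (tables and names are invented):

```python
import pandas as pd

# Two hypothetical tables sharing an "id" key.
students = pd.DataFrame({"id": [1, 2], "name": ["Asha", "Ravi"]})
marks = pd.DataFrame({"id": [1, 2], "marks": [88, 74]})

merged = students.merge(marks, on="id")   # join on the shared key
stacked = pd.concat([students, students]) # concatenation stacks rows
print(merged)
```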
Lesson 4 — Access & Analysis (Stage 4)
- loc vs iloc, boolean filters, query()
- Sort, rank, groupby, aggregate
- Top-N problems, KPIs
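A sketch of the Lesson 4 pattern — group, aggregate, then pick the top N (the sales records are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["A", "A", "B", "B", "B"],
    "sales": [10, 30, 5, 25, 40],
})

totals = df.groupby("category")["sales"].sum()     # aggregate per group
top = totals.sort_values(ascending=False).head(1)  # top-N (here N=1)
print(top)  # category B leads with 70
```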
Lesson 5 — I/O & Pipelines (Stage 5)
- Files: CSV, Excel, Pickle, JSON, Parquet
- Databases: MySQL (SQLAlchemy), MongoDB (PyMongo)
- Clean → Analyze → Visualize pipeline
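The pipeline shape Lesson 5 builds toward, sketched with placeholder data standing in for a `pd.read_csv(...)` load:

```python
import pandas as pd

# Placeholder data; in practice this would come from a file or database.
df = pd.DataFrame({
    "category": ["A", "B", "A", None],
    "price": [10.0, None, 30.0, 20.0],
})

clean = df.dropna()                                  # clean: drop incomplete rows
summary = clean.groupby("category")["price"].mean()  # analyze: average per category
# summary.plot(kind="bar")                           # visualize (needs matplotlib)
print(summary)
```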
🧠 MCQs
1. A pandas Series is closest to which of the following?
- A) Full spreadsheet
- B) Single column with an index
- C) Python set
- D) 3D array
Answer: B
2. Correct way to read a CSV?
- A) df = pd.csv("file.csv")
- B) df = pd.read_csv("file.csv")
- C) df = read.csv("file.csv")
- D) df = pandas.read("file.csv")
Answer: B
3. Which is label-based indexing?
- A) iloc
- B) loc
- C) ix
- D) at
Answer: B
🧩 Assignments
Assignment 1. Read sales.csv, compute revenue (= Units × Price), show top 3 items by revenue.
import pandas as pd
df = pd.read_csv("sales.csv")
df["Revenue"] = df["Units"] * df["Price"]
print(df.sort_values("Revenue", ascending=False).head(3))
Assignment 2. From book.xlsx (Sheet1), filter rows where Marks ≥ 80 and export to toppers.xlsx.
import pandas as pd
df = pd.read_excel("book.xlsx", sheet_name="Sheet1")
toppers = df[df["Marks"] >= 80]
toppers.to_excel("toppers.xlsx", index=False)
Assignment 3. Read from MySQL table orders and print revenue by category (sum of qty*price).
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine("mysql+pymysql://user:pass@localhost:3306/shop")
df = pd.read_sql("SELECT category, qty, price FROM orders", engine)
df["revenue"] = df["qty"] * df["price"]
print(df.groupby("category")["revenue"].sum().sort_values(ascending=False))