🐼 Pandas – Data Analysis in Python (Preview)
A four-stage series. This page is the preview; Lessons 2–5 dive deeper.
🔹 Introduction
Pandas is Python’s go-to toolkit for working with tabular data—spreadsheets, CSVs, databases, APIs. It provides Series (single column) and DataFrame (table) with powerful tools for loading, cleaning, transforming, and analyzing data.
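A minimal sketch of the two core objects, using toy data for illustration:

```python
import pandas as pd

# A Series: a single labeled column
prices = pd.Series([10.0, 12.5, 9.0], index=["tea", "coffee", "juice"])

# A DataFrame: a table of named columns sharing one index
df = pd.DataFrame(
    {"Units": [3, 2, 5], "Price": [10.0, 12.5, 9.0]},
    index=["tea", "coffee", "juice"],
)

print(prices["coffee"])  # label-based access -> 12.5
print(df.shape)          # (3, 2)
```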
🔹 Importance
- Industry standard for data analysis in Python.
- Bridges spreadsheets ↔ Python, databases ↔ ML pipelines.
- Rich I/O: CSV, Excel, Pickle, JSON, Parquet, SQL; works nicely with NumPy, Matplotlib, scikit-learn.
- Fast iteration: filter, group, aggregate, pivot, and visualize quickly.
🔑 Keywords
- Series, DataFrame, Index, dtypes
- loc, iloc, query, mask, filter
- groupby, aggregate, pivot, merge/join
- CSV, Excel, Pickle, JSON, Parquet
- SQL (MySQL, SQLite, PostgreSQL), MongoDB
- Missing values (NaN), vectorization, broadcasting
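To preview a few of these keywords in action, here is a small example of vectorization and missing-value handling (the data is invented):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Vectorized arithmetic applies element-wise; NaN propagates
doubled = s * 2           # [2.0, NaN, 6.0]

# Missing-value helpers
print(s.isna().sum())     # 1 missing value
print(s.fillna(0).sum())  # 4.0 after filling NaN with 0
```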
🔌 Data I/O at a Glance (Files & Databases)
CSV
```python
import pandas as pd

df = pd.read_csv("data.csv")       # Load
df.to_csv("out.csv", index=False)  # Save
```
Excel
Install engine if needed: pip install openpyxl
```python
import pandas as pd

df = pd.read_excel("book.xlsx", sheet_name="Sheet1")  # Load
df.to_excel("out.xlsx", index=False)                  # Save
```
Pickle (fast Python-native)
```python
import pandas as pd

df = pd.read_csv("data.csv")
df.to_pickle("df.pkl")          # Save (binary)
df2 = pd.read_pickle("df.pkl")  # Load
```
MySQL (via SQLAlchemy)
Install: pip install sqlalchemy pymysql
```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:password@localhost:3306/dbname")
df = pd.read_sql("SELECT * FROM sales", engine)                    # Read table/query
df.to_sql("sales_copy", engine, if_exists="replace", index=False)  # Write
```
MongoDB (via PyMongo)
Install: pip install pymongo
```python
import pandas as pd
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["shop"]
coll = db["orders"]
docs = list(coll.find({}, {"_id": 0}))  # Drop _id for a cleaner DataFrame
df = pd.DataFrame(docs)
```
JSON / Parquet (columnar)
Parquet needs an engine: pip install pyarrow (or fastparquet)
```python
import pandas as pd

# JSON
df = pd.read_json("data.json")
df.to_json("out.json", orient="records")

# Parquet
df.to_parquet("data.parquet")          # Save
df2 = pd.read_parquet("data.parquet")  # Load
```
🗺️ Roadmap (4 Stages After Preview)
Lesson 2 — Series (Stage 2)
- Create Series (list, dict, scalar), indexes
- Selection, slicing, vector ops
- Real-world mini tasks (prices, attendance)
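A taste of what Lesson 2 covers; the names and values below are made up:

```python
import pandas as pd

# Create from a dict: keys become the index
attendance = pd.Series({"Mon": 24, "Tue": 27, "Wed": 25, "Thu": 30})

# Label slicing includes both endpoints
print(attendance["Tue":"Thu"])  # Tue, Wed, Thu

# Vectorized ops: no explicit loop
print((attendance / attendance.max()).round(2))
```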
Lesson 3 — DataFrames (Stage 3)
- Create DataFrames (dicts, CSV, Excel)
- Add/remove columns, dtypes, missing values
- Merges & concatenation basics
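A sketch of the merge and concatenation basics from Lesson 3, using two toy tables:

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2], "name": ["Asha", "Ben"]})
right = pd.DataFrame({"id": [2, 3], "score": [88, 95]})

# Inner merge keeps only ids present in both tables
merged = pd.merge(left, right, on="id", how="inner")
print(merged)        # only id 2 survives

# Concatenation stacks rows
stacked = pd.concat([left, left], ignore_index=True)
print(len(stacked))  # 4
```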
Lesson 4 — Access & Analysis (Stage 4)
- loc vs iloc, boolean filters, query()
- Sort, rank, groupby, aggregate
- Top-N problems, KPIs
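The Lesson 4 topics in miniature, with invented sales rows:

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["fruit", "fruit", "veg", "veg", "veg"],
    "revenue":  [120, 80, 60, 150, 40],
})

# loc is label-based, iloc is position-based
print(df.loc[0, "category"])  # 'fruit'
print(df.iloc[-1, 1])         # 40

# Boolean filter via query()
big = df.query("revenue > 100")

# Group, aggregate, then take the top category (a top-N problem with N=1)
totals = df.groupby("category")["revenue"].sum()
print(totals.nlargest(1))     # veg: 250
```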
Lesson 5 — I/O & Pipelines (Stage 5)
- Files: CSV, Excel, Pickle, JSON, Parquet
- Databases: MySQL (SQLAlchemy), MongoDB (PyMongo)
- Clean → Analyze → Visualize pipeline
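One way the Lesson 5 pipeline could be sketched end to end; the column names are assumptions, the input is inlined instead of read from a file, and the visualize step is only indicated in a comment:

```python
import pandas as pd

# Clean: in a real pipeline this would come from pd.read_csv("raw.csv")
raw = pd.DataFrame({
    "item":  ["a", "b", None, "c"],
    "qty":   [2, None, 1, 4],
    "price": [5.0, 3.0, 2.0, 1.0],
})
clean = raw.dropna(subset=["item"]).fillna({"qty": 0})

# Analyze: derive a column, then aggregate
clean["revenue"] = clean["qty"] * clean["price"]
summary = clean.groupby("item")["revenue"].sum()

# Visualize/export: e.g. summary.plot.bar() or clean.to_parquet("clean.parquet")
print(summary)
```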
🧠 MCQs
1. Pandas Series is closest to?
- A) Full spreadsheet
- B) Single column with an index
- C) Python set
- D) 3D array
Answer: B
2. Correct way to read a CSV?
- A) df = pd.csv("file.csv")
- B) df = pd.read_csv("file.csv")
- C) df = read.csv("file.csv")
- D) df = pandas.read("file.csv")
Answer: B
3. Which is label-based indexing?
- A) iloc
- B) loc
- C) ix
- D) at
Answer: B
🧩 Assignments
Assignment 1. Read sales.csv, compute revenue (= Units × Price), and show the top 3 items by revenue.
```python
import pandas as pd

df = pd.read_csv("sales.csv")
df["Revenue"] = df["Units"] * df["Price"]
print(df.sort_values("Revenue", ascending=False).head(3))
```
Assignment 2. From book.xlsx (Sheet1), filter rows where Marks ≥ 80 and export them to toppers.xlsx.
```python
import pandas as pd

df = pd.read_excel("book.xlsx", sheet_name="Sheet1")
toppers = df[df["Marks"] >= 80]
toppers.to_excel("toppers.xlsx", index=False)
```
Assignment 3. Read from the MySQL table orders and print revenue by category (sum of qty × price).
```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:pass@localhost:3306/shop")
df = pd.read_sql("SELECT category, qty, price FROM orders", engine)
df["revenue"] = df["qty"] * df["price"]
print(df.groupby("category")["revenue"].sum().sort_values(ascending=False))