🐼 Pandas – Data Analysis in Python (Preview)
A four-stage series. This page is the preview; Lessons 2–5 dive deeper.
🔹 Introduction
Pandas is Python’s go-to toolkit for working with tabular data—spreadsheets, CSVs, databases, APIs. It provides Series (single column) and DataFrame (table) with powerful tools for loading, cleaning, transforming, and analyzing data.
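A minimal sketch of the two core objects, using toy data for illustration:

```python
import pandas as pd

# A Series: a single labeled column
prices = pd.Series([10.0, 12.5, 9.0], index=["tea", "coffee", "juice"])

# A DataFrame: a table of named columns sharing one index
df = pd.DataFrame(
    {"Units": [3, 2, 5], "Price": [10.0, 12.5, 9.0]},
    index=["tea", "coffee", "juice"],
)

print(prices["coffee"])  # label-based access -> 12.5
print(df.shape)          # (3, 2)
```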
🔹 Importance
- Industry standard for data analysis in Python.
- Bridges spreadsheets ↔ Python, databases ↔ ML pipelines.
- Rich I/O: CSV, Excel, Pickle, JSON, Parquet, SQL; works nicely with NumPy, Matplotlib, scikit-learn.
- Fast iteration: filter, group, aggregate, pivot, and visualize quickly.
🔑 Keywords
- Series, DataFrame, Index, dtypes
- loc, iloc, query, mask, filter
- groupby, aggregate, pivot, merge/join
- CSV, Excel, Pickle, JSON, Parquet
- SQL (MySQL, SQLite, PostgreSQL), MongoDB
- Missing values (NaN), vectorization, broadcasting
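To preview a few of these keywords in action, here is a small example of vectorization and missing-value handling (the data is invented):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Vectorized arithmetic applies element-wise; NaN propagates
doubled = s * 2           # [2.0, NaN, 6.0]

# Missing-value helpers
print(s.isna().sum())     # 1 missing value
print(s.fillna(0).sum())  # 4.0 after filling NaN with 0
```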
🔌 Data I/O at a Glance (Files & Databases)
CSV
```python
import pandas as pd

df = pd.read_csv("data.csv")       # Load
df.to_csv("out.csv", index=False)  # Save
```
Excel
Install engine if needed: pip install openpyxl
```python
import pandas as pd

df = pd.read_excel("book.xlsx", sheet_name="Sheet1")  # Load
df.to_excel("out.xlsx", index=False)                  # Save
```
Pickle (fast Python-native)
```python
import pandas as pd

df = pd.read_csv("data.csv")
df.to_pickle("df.pkl")          # Save (binary)
df2 = pd.read_pickle("df.pkl")  # Load
```
MySQL (via SQLAlchemy)
Install: pip install sqlalchemy pymysql
```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:password@localhost:3306/dbname")
df = pd.read_sql("SELECT * FROM sales", engine)                    # Read table/query
df.to_sql("sales_copy", engine, if_exists="replace", index=False)  # Write
```
MongoDB (via PyMongo)
Install: pip install pymongo
```python
import pandas as pd
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["shop"]
coll = db["orders"]
docs = list(coll.find({}, {"_id": 0}))  # Drop _id for a cleaner DataFrame
df = pd.DataFrame(docs)
```
JSON / Parquet (columnar)
Parquet needs an engine: pip install pyarrow (or fastparquet)
```python
import pandas as pd

# JSON
df = pd.read_json("data.json")
df.to_json("out.json", orient="records")

# Parquet
df.to_parquet("data.parquet")          # Save
df2 = pd.read_parquet("data.parquet")  # Load
```
🗺️ Roadmap (4 Stages After Preview)
Lesson 2 — Series (Stage 2)
- Create Series (list, dict, scalar), indexes
- Selection, slicing, vector ops
- Real-world mini tasks (prices, attendance)
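A taste of what Lesson 2 covers; the names and values below are made up:

```python
import pandas as pd

# Create from a dict: keys become the index
attendance = pd.Series({"Mon": 24, "Tue": 27, "Wed": 25, "Thu": 30})

# Label slicing includes both endpoints
print(attendance["Tue":"Thu"])  # Tue, Wed, Thu

# Vectorized ops: no explicit loop
print((attendance / attendance.max()).round(2))
```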
Lesson 3 — DataFrames (Stage 3)
- Create DataFrames (dicts, CSV, Excel)
- Add/remove columns, dtypes, missing values
- Merges & concatenation basics
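A sketch of the merge and concatenation basics from Lesson 3, using two toy tables:

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2], "name": ["Asha", "Ben"]})
right = pd.DataFrame({"id": [2, 3], "score": [88, 95]})

# Inner merge keeps only ids present in both tables
merged = pd.merge(left, right, on="id", how="inner")
print(merged)        # only id 2 survives

# Concatenation stacks rows
stacked = pd.concat([left, left], ignore_index=True)
print(len(stacked))  # 4
```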
Lesson 4 — Access & Analysis (Stage 4)
- loc vs iloc, boolean filters, query()
- Sort, rank, groupby, aggregate
- Top-N problems, KPIs
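The Lesson 4 topics in miniature, with invented sales rows:

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["fruit", "fruit", "veg", "veg", "veg"],
    "revenue":  [120, 80, 60, 150, 40],
})

# loc is label-based, iloc is position-based
print(df.loc[0, "category"])  # 'fruit'
print(df.iloc[-1, 1])         # 40

# Boolean filter via query()
big = df.query("revenue > 100")

# Group, aggregate, then take the top category (a top-N problem with N=1)
totals = df.groupby("category")["revenue"].sum()
print(totals.nlargest(1))     # veg: 250
```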
Lesson 5 — I/O & Pipelines (Stage 5)
- Files: CSV, Excel, Pickle, JSON, Parquet
- Databases: MySQL (SQLAlchemy), MongoDB (PyMongo)
- Clean → Analyze → Visualize pipeline
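One way the Lesson 5 pipeline could be sketched end to end; the column names are assumptions, the input is inlined instead of read from a file, and the visualize step is only indicated in a comment:

```python
import pandas as pd

# Clean: in a real pipeline this would come from pd.read_csv("raw.csv")
raw = pd.DataFrame({
    "item":  ["a", "b", None, "c"],
    "qty":   [2, None, 1, 4],
    "price": [5.0, 3.0, 2.0, 1.0],
})
clean = raw.dropna(subset=["item"]).fillna({"qty": 0})

# Analyze: derive a column, then aggregate
clean["revenue"] = clean["qty"] * clean["price"]
summary = clean.groupby("item")["revenue"].sum()

# Visualize/export: e.g. summary.plot.bar() or clean.to_parquet("clean.parquet")
print(summary)
```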
🧠 MCQs
1. Pandas Series is closest to?
- A) Full spreadsheet
- B) Single column with an index
- C) Python set
- D) 3D array
Answer: B
2. Correct way to read a CSV?
- A) df = pd.csv("file.csv")
- B) df = pd.read_csv("file.csv")
- C) df = read.csv("file.csv")
- D) df = pandas.read("file.csv")
Answer: B
3. Which is label-based indexing?
- A) iloc
- B) loc
- C) ix
- D) at
Answer: B
🧩 Assignments
Assignment 1. Read sales.csv, compute revenue (= Units × Price), and show the top 3 items by revenue.
```python
import pandas as pd

df = pd.read_csv("sales.csv")
df["Revenue"] = df["Units"] * df["Price"]
print(df.sort_values("Revenue", ascending=False).head(3))
```
Assignment 2. From book.xlsx (Sheet1), filter rows where Marks ≥ 80 and export them to toppers.xlsx.
```python
import pandas as pd

df = pd.read_excel("book.xlsx", sheet_name="Sheet1")
toppers = df[df["Marks"] >= 80]
toppers.to_excel("toppers.xlsx", index=False)
```
Assignment 3. Read from the MySQL table orders and print revenue by category (sum of qty × price).
```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:pass@localhost:3306/shop")
df = pd.read_sql("SELECT category, qty, price FROM orders", engine)
df["revenue"] = df["qty"] * df["price"]
print(df.groupby("category")["revenue"].sum().sort_values(ascending=False))