Pandas Lesson 1 – Series

🐼 Pandas Lesson 1 – Preview

This preview page introduces a four-stage series. Lessons 2–5 cover each stage in depth.

🔹 Introduction

Pandas is Python’s go-to toolkit for working with tabular data from spreadsheets, CSV files, databases, and APIs. It provides two core structures, Series (a single labeled column) and DataFrame (a table of columns), along with tools for loading, cleaning, transforming, and analyzing data.
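
To make the two structures concrete, here is a minimal sketch. The item names and numbers are made up for illustration only.

```python
import pandas as pd

# A Series: one column of values, each paired with an index label.
prices = pd.Series([120.0, 80.5, 15.0],
                   index=["pen", "book", "eraser"],
                   name="price")

# A DataFrame: a table whose columns share one index.
items = pd.DataFrame(
    {"price": [120.0, 80.5, 15.0], "units": [3, 1, 10]},
    index=["pen", "book", "eraser"],
)

print(prices["book"])        # look up one value by its label
print(type(items["price"]))  # selecting one column gives back a Series
print(items.shape)           # (number of rows, number of columns)
```

Selecting a single column from a DataFrame returns a Series, which is why the Series is covered first in this course.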

🔹 Importance

  • Industry standard for data analysis in Python.
  • Bridges spreadsheets ↔ Python and databases ↔ ML pipelines.
  • Rich I/O: CSV, Excel, Pickle, JSON, Parquet, SQL; integrates with NumPy, Matplotlib, scikit-learn.
  • Fast iteration: filter, group, aggregate, pivot, visualize quickly.
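
As a rough illustration of the "fast iteration" point, the sketch below filters, groups, aggregates, and pivots a tiny table. The `sales` data and its column names are invented for this example.

```python
import pandas as pd

# Made-up sales records, used only to demonstrate the operations.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "product": ["A", "B", "A", "B"],
    "units": [10, 4, 7, 12],
})

# Filter: keep rows where units is at least 7.
big_orders = sales[sales["units"] >= 7]

# Group + aggregate: total units per region.
units_per_region = sales.groupby("region")["units"].sum()

# Pivot: regions as rows, products as columns, units as cell values.
units_table = sales.pivot(index="region", columns="product", values="units")

print(big_orders)
print(units_per_region)
print(units_table)
```

Each step is a single expression, which is what makes interactive exploration quick. Later lessons cover these operations one by one.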

🔑 Keywords

  • Series, DataFrame, Index, dtypes
  • loc, iloc, query, mask, filter
  • groupby, aggregate, pivot, merge/join
  • CSV, Excel, Pickle, JSON, Parquet
  • SQL (MySQL, SQLite, PostgreSQL), MongoDB
  • Missing values (NaN), vectorization, broadcasting

🔌 Data I/O at a Glance (Files & Databases)

CSV
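
A minimal sketch of reading and writing CSV. To keep it self-contained, the CSV text is held in memory instead of a file; the column names and values are made up. With a real file you would pass a path such as "file.csv" to the same functions.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file (made-up data).
csv_text = "item,units,price\npen,3,1.5\nbook,1,12.0\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# to_csv with no path returns the CSV as a string;
# index=False leaves out the row-number column.
csv_out = df.to_csv(index=False)
print(csv_out)

# With a file on disk the calls look like this:
#   df = pd.read_csv("file.csv")
#   df.to_csv("out.csv", index=False)
```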

Excel

Install engine if needed: pip install openpyxl
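
A sketch of an Excel round trip, assuming the openpyxl engine is installed. The helper `excel_roundtrip`, the `marks` table, and the file name `demo.xlsx` are hypothetical names chosen for this example.

```python
import os
import tempfile
import pandas as pd

# Made-up marks table, used only to demonstrate the calls.
marks = pd.DataFrame({"name": ["Asha", "Ravi"], "marks": [91, 78]})

def excel_roundtrip(df, sheet_name="Sheet1"):
    """Write df to a temporary .xlsx file and read it back (needs openpyxl)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.xlsx")
        # index=False leaves out the row-number column.
        df.to_excel(path, sheet_name=sheet_name, index=False)
        return pd.read_excel(path, sheet_name=sheet_name)
```

Calling `excel_roundtrip(marks)` should return the same table. If openpyxl is missing, pandas raises an ImportError that names the package to install.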

Pickle (fast Python-native)
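
A minimal sketch of saving and loading a DataFrame with pickle. The data and the file name `demo.pkl` are made up. Pickle preserves dtypes and the index exactly, but only load pickle files you trust, because unpickling can execute arbitrary code.

```python
import os
import tempfile
import pandas as pd

# Made-up data, used only to demonstrate the calls.
df = pd.DataFrame({"item": ["pen", "book"], "units": [3, 1]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.pkl")
    df.to_pickle(path)              # save in Python's native binary format
    restored = pd.read_pickle(path)  # load it back

print(restored.equals(df))
```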

MySQL (via SQLAlchemy)

Install: pip install sqlalchemy pymysql
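
A sketch of the usual pattern: build a SQLAlchemy engine, then pass it to `pd.read_sql`. The helper names `make_mysql_engine` and `load_query` are hypothetical, and the connection details are placeholders to replace with your own.

```python
import pandas as pd

def make_mysql_engine(user, password, host, database):
    """Build a SQLAlchemy engine for MySQL using the PyMySQL driver."""
    # Imported here so the rest of the sketch works without these packages.
    from sqlalchemy import create_engine
    from sqlalchemy.engine import URL

    # URL.create escapes special characters in the password for us.
    url = URL.create(
        "mysql+pymysql",
        username=user,
        password=password,
        host=host,
        database=database,
    )
    return create_engine(url)

def load_query(sql, con):
    """Run a SQL query and return the result as a DataFrame."""
    return pd.read_sql(sql, con)

# Typical use (placeholder credentials, not run here):
#   engine = make_mysql_engine("user", "secret", "localhost", "shop")
#   orders = load_query("SELECT * FROM orders", engine)
```

`pd.read_sql` also accepts a plain `sqlite3` connection, which is handy for trying out queries without a MySQL server.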

MongoDB (via PyMongo)

Install: pip install pymongo
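
A sketch of loading MongoDB documents into a DataFrame. The helpers `documents_to_frame` and `load_collection` are hypothetical names for this example, and the URI, database, and collection names are placeholders.

```python
import pandas as pd

def documents_to_frame(docs):
    """Turn an iterable of MongoDB-style dicts into a DataFrame."""
    df = pd.DataFrame(list(docs))
    # MongoDB adds an "_id" field to every document; drop it if present.
    return df.drop(columns=["_id"], errors="ignore")

def load_collection(uri, db_name, collection_name, query=None):
    """Fetch matching documents from a collection as a DataFrame."""
    # Imported here so documents_to_frame works without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        cursor = client[db_name][collection_name].find(query or {})
        return documents_to_frame(cursor)
    finally:
        client.close()

# Typical use (placeholder names, not run here):
#   df = load_collection("mongodb://localhost:27017", "shop", "orders")
```

Documents in one collection may have different fields. Any field missing from a document shows up as NaN in that row.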

JSON / Parquet (columnar)

Parquet needs an engine: pip install pyarrow (or pip install fastparquet)
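
A sketch of both formats. The JSON part runs with pandas alone; the Parquet part assumes pyarrow or fastparquet is installed. The data, the helper `parquet_roundtrip`, and the file name `demo.parquet` are made up for illustration.

```python
import io
import os
import tempfile
import pandas as pd

# Made-up data, used only to demonstrate the calls.
df = pd.DataFrame({"item": ["pen", "book"], "units": [3, 1]})

# JSON: orient="records" produces a list of one object per row.
json_text = df.to_json(orient="records")
from_json = pd.read_json(io.StringIO(json_text), orient="records")
print(json_text)

def parquet_roundtrip(frame):
    """Write frame to a temporary Parquet file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.parquet")
        frame.to_parquet(path)  # raises ImportError if no engine is installed
        return pd.read_parquet(path)
```

Parquet stores data column by column and keeps dtypes, so it is usually smaller and faster to load than CSV for large tables.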

🗺️ Roadmap (4 Stages After Preview)

Lesson 2 — Series (Stage 2)

  • Create Series (list, dict, scalar), indexes
  • Selection, slicing, vector ops
  • Real-world mini tasks (prices, attendance)
Go to Lesson 2

Lesson 3 — DataFrames (Stage 3)

  • Create DataFrames (dicts, CSV, Excel)
  • Add/remove columns, dtypes, missing values
  • Merges & concatenation basics
Go to Lesson 3

Lesson 4 — Access & Analysis (Stage 4)

  • loc vs iloc, boolean filters, query()
  • Sort, rank, groupby, aggregate
  • Top-N problems, KPIs
Go to Lesson 4

Lesson 5 — I/O & Pipelines (Stage 5)

  • Files: CSV, Excel, Pickle, JSON, Parquet
  • Databases: MySQL (SQLAlchemy), MongoDB (PyMongo)
  • Clean → Analyze → Visualize pipeline
Go to Lesson 5

🧠 MCQs

1. Pandas Series is closest to?

  • A) Full spreadsheet
  • B) Single column with an index
  • C) Python set
  • D) 3D array

2. Correct way to read a CSV?

A) df = pd.csv("file.csv")
B) df = pd.read_csv("file.csv")
C) df = read.csv("file.csv")
D) df = pandas.read("file.csv")

3. Which is label-based indexing?

  • A) iloc
  • B) loc
  • C) ix
  • D) at always

🧩 Assignments

Assignment 1. Read sales.csv, compute revenue (= Units × Price), show top 3 items by revenue.

Assignment 2. From book.xlsx (Sheet1), filter rows where Marks ≥ 80 and export to toppers.xlsx.

Assignment 3. Read from MySQL table orders and print revenue by category (sum of qty*price).
