📚 AIO2025 - Week 2: Python Lists, SQL & Git Mastery

Welcome to Week 2 of the AIO2025 journey! This week, we’ll explore three fundamental pillars of programming and software development: Python Lists & Data Structures, Database Design with SQL, and Git Version Control.

💡 Learning Objectives: Master Python data structures, design professional databases, and manage source code effectively with Git.

📚 Table of Contents

🐍 1. Python Lists - Dynamic Data Powerhouse
🗄️ 2. Database - SQL (Part 2): Professional Database Design
- 2.1. Entity Relationship Diagram (ERD)
- 2.2. Database Normalization
🔧 3. Advanced Data Structures
🌿 4. Git & GitHub for Version Control
🧠 5. Practical Exercises
🎯 6. Week 2 Learning Assessment Matrix
🎉 Conclusion

🐍 1. Python Lists - Dynamic Data Powerhouse#

Lists are fundamental data structures in Python used to store ordered collections of items. The key characteristic of Lists is their mutability - contents can be changed after creation.

🎯 1.1. Core Characteristics of Lists#

Ordered: The items in a list have a defined order, and that order will not change. If you add new items to a list, the new items will be placed at the end of the list.
Mutable: You can change, add, and remove items in a list after it has been created.
Allow Duplicates: Since lists are indexed, they can have items with the same value.
Heterogeneous: Lists can contain elements of different data types (integers, strings, booleans, even other lists).

📊 Interactive List Structure Visualization#

💡 Interactive: Hover over elements to see exact values!

💻 1.2. Creating and Accessing Lists#

Creating Lists: Lists are created by placing comma-separated values inside square brackets [].

1
# Empty list
2
empty_list = []
3

4
# List of integers
5
numbers = [4, 5, 6, 7, 8, 9]
6

7
# Mixed data types
8
mixed_list = [True, 5, 'some string', 123.45]
9

10
# Nested list
11
nested_list = ["Happy", [2, 0, 1, 5]]

Accessing Elements (Indexing): You can access an element in a list by referring to its index number.

1
data = [4, 5, 6, 7, 8, 9]
2

3
# Access the first element
4
print(data[0])  # Output: 4
5

6
# Access the fourth element
7
print(data[3])  # Output: 7
8

9
# Access the last element using backward indexing
10
print(data[-1]) # Output: 9

Slicing: Slicing allows you to access a range of items in a list. Syntax: list[start:end:step].

1
data = [4, 5, 6, 7, 8, 9]
2

3
# Get items from the beginning to index 2 (end is exclusive)
4
print(data[:3])  # Output: [4, 5, 6]
5

6
# Get items from index 2 to index 3
7
print(data[2:4]) # Output: [6, 7]
8

9
# Get every second item
10
print(data[::2]) # Output: [4, 6, 8]

🛠️ 1.3. List Methods - Powerful Manipulation Tools#

Here’s a comprehensive table of the most common list methods:

Method	Description	Example	Time Complexity
`append(item)`	Add item to the end of list	`lst.append(5)`	O(1)
`insert(index, item)`	Insert item at specified index	`lst.insert(0, 'first')`	O(n)
`extend(iterable)`	Extend list with elements from another iterable	`lst.extend([1,2,3])`	O(k)
`remove(item)`	Remove first occurrence of item	`lst.remove('x')`	O(n)
`pop(index)`	Remove and return item at index (default: last)	`item = lst.pop()`	O(1) end, O(n) beginning
`clear()`	Remove all elements	`lst.clear()`	O(n)
`sort()`	Sort ascending (or `reverse=True`)	`lst.sort()`	O(n log n)
`reverse()`	Reverse the order of elements	`lst.reverse()`	O(n)
`copy()`	Create a shallow copy	`new_lst = lst.copy()`	O(n)
`count(item)`	Count occurrences of item	`lst.count('a')`	O(n)
`index(item)`	Find index of first occurrence	`idx = lst.index('x')`	O(n)

⚠️ Note: Operations at the beginning of list (insert(0, ...), pop(0)) have O(n) complexity due to element shifting.

Practical Examples:

1
# Adding elements
2
data = [6, 5, 7, 1]
3
data.append(4)      # data = [6, 5, 7, 1, 4]
4
data.insert(0, 10)  # data = [10, 6, 5, 7, 1, 4]
5
data.extend([9, 2]) # data = [10, 6, 5, 7, 1, 4, 9, 2]
6

7
# Updating and removing elements
8
data[1] = 99         # Update element at index 1
9
print(data)          # [10, 99, 5, 7, 1, 4, 9, 2]
10

11
data.remove(99)      # Remove value 99
12
popped_item = data.pop(2)  # Remove element at index 2
13
print(popped_item)   # 7
14

15
# Sorting
16
data.sort()
17
print(data)          # [1, 2, 4, 5, 9, 10]

🔧 1.4. Built-in Functions for Lists#

Function	Description	Example	Use Case
`len(list)`	Returns number of items in list	`len([1,2,3])` → 3	Size checking, loops
`min(list)`	Returns smallest value item	`min([5,1,9])` → 1	Finding minimum
`max(list)`	Returns largest value item	`max([5,1,9])` → 9	Finding maximum
`sum(list)`	Returns sum of numeric items	`sum([1,2,3])` → 6	Statistical calculations
`sorted(list)`	Returns new sorted list (original unchanged)	`sorted([3,1,2])` → [1,2,3]	Safe sorting
`enumerate(list)`	Returns (index, value) pairs	`list(enumerate(['a','b']))` → [(0,‘a’),(1,‘b’)]	Loops with index
`zip(list1, list2)`	Combines elements from multiple lists	`list(zip([1,2],[3,4]))` → [(1,3),(2,4)]	Parallel processing
`reversed(list)`	Returns reverse iterator	`list(reversed([1,2,3]))` → [3,2,1]	Reverse traversal

Practical Examples:

1
data = [6, 5, 7, 1, 9, 2]
2
print(len(data))      # Output: 6
3
print(sum(data))      # Output: 30
4
print(sorted(data))   # Output: [1, 2, 5, 6, 7, 9]
5

6
# Enumerate - loop with index
7
for index, value in enumerate(data):
8
    print(f"Index: {index}, Value: {value}")
9

10
# Zip - combine multiple lists
11
names = ['Alice', 'Bob']
12
ages = [25, 30]
13
for name, age in zip(names, ages):
14
    print(f"{name} is {age} years old.")

🚀 1.5. List Comprehension - Pythonic Power#

List comprehension is a concise and Pythonic way to create new lists from existing iterables.

Syntax: [expression for item in iterable if condition]

1
# Square numbers from 0 to 4
2
squares = [x**2 for x in range(5)]
3
# Output: [0, 1, 4, 9, 16]
4

5
# Filter even numbers from a list
6
data = [1, 5, -4, 3, -2]
7
evens = [x for x in data if x % 2 == 0]
8
# Output: [-4, -2]
9

10
# Apply ReLU activation function
11
def relu(x):
12
    return x if x > 0 else 0
13

14
relu_applied = [relu(x) for x in data]
15
# Output: [1, 5, 0, 3, 0]
16

17
# Nested list comprehension - 2D Matrix
18
matrix = [[i+j for j in range(3)] for i in range(3)]
19
# Output: [[0,1,2], [1,2,3], [2,3,4]]

💡 Tip: List comprehensions are typically faster than traditional for loops and result in more concise code!

📊 1.6. 2D Lists (Matrices) - Two-Dimensional Data Structure#

2D Lists are lists where each element is another list. This structure is used to represent grids, tables, or matrices in mathematics and data science.

Creating and Accessing 2D Lists:

1
# 3x3 Matrix
2
matrix = [
3
    [1, 2, 3],
4
    [4, 5, 6],
5
    [7, 8, 9]
6
]
7

8
# Access element (row 1, column 2)
9
print(matrix[1][2])  # Output: 6
10

11
# Iterate through entire matrix
12
num_rows = len(matrix)
13
num_cols = len(matrix[0])
14

15
for r in range(num_rows):
16
    for c in range(num_cols):
17
        print(matrix[r][c], end=' ')
18
    print()  # New line after each row

📈 Interactive 2D Matrix Visualization#

🎯 Interactive: Hover over cells to see [row][col] position and values!

🗄️ 2. Database - SQL (Part 2): Professional Database Design#

This section focuses on the design and structure of relational databases, with emphasis on Entity Relationship Diagrams (ERD) and Database Normalization - two core skills for every Database Developer.

🎯 2.1. Entity Relationship Diagram (ERD) - Visual Language#

ERD is a visual model that represents the structure of a database. It illustrates entities, attributes, and relationships between them.

Component	Symbol	Description	Real-World Example
Entity	Rectangle	Object or concept in the real world	Customer, Product, Order
Weak Entity	Double rectangle	Entity that cannot be uniquely identified on its own	OrderDetail (depends on Order)
Attribute	Oval	Property or characteristic of an entity	first_name, unit_price, email
Key Attribute	Underlined oval	Unique identifier attribute (Primary Key)	customer_id, product_code
Multivalued Attribute	Double oval	Attribute that can contain multiple values	phone_numbers, skills
Derived Attribute	Dashed oval	Attribute that can be calculated from other attributes	age (from birth_date), total_amount
Relationship	Diamond	Connection between two or more entities	Customer places Order, Product belongs to Category

🔄 Weak Entities in ERD#

A Weak Entity is an entity that cannot be uniquely identified by its attributes alone and must rely on a relationship with another entity.

Characteristics:

Represented by a double rectangle (double-lined box)
Depends on a strong (parent) entity
Primary key is a combination of foreign key (link to strong entity) and a discriminator (partial key)

Real-world Examples:

OrderDetail as a weak entity dependent on Order
Seat in a Flight (seat numbers only make sense within the context of a specific flight)
RoomNumber in a Building (room numbers are only meaningful within a specific building)

The relationship between weak entities and their strong entities:

Strong Entity	Relationship	Weak Entity	Discriminator
Order	Contains	OrderDetail	Product (part of composite key)
Book	Has	Chapter	Chapter Number
Employee	Manages	Dependent	Dependent Name

💡 Note: A weak entity always has an existence dependency relationship with its strong entity. This means that if the strong entity is deleted, all associated weak entities must also be deleted.

📊 Interactive ERD Visualization - Student Enrollment System#

💡 Interactive: Hover over components to see details about Entity, Attribute, and Relationship!

🔗 Cardinality (Relationships)#

Cardinality defines the number of entities that can participate in a relationship:

Type	Symbol	Description	Real-World Example
One-to-One (1:1)	1:1	Each entity relates to exactly one other entity	Person - Passport
One-to-Many (1)	1	One entity can relate to multiple other entities	Customer - Orders
Many-to-Many (M)	M	Multiple entities can relate to multiple other entities	Student - Courses

⚠️ Note: M relationships are typically resolved by creating a junction table (associative entity).

🏗️ 2.2. Database Normalization - Structural Excellence#

Normalization is the process of organizing columns and tables in a relational database to minimize data redundancy. It involves dividing large tables into smaller, well-structured tables and defining relationships between them.

Why Normalize?

✅ Minimize Redundancy: Avoid storing the same data in multiple places.

✅ Prevent Anomalies:

🚫 Database Anomalies - Why Normalization Matters#

Non-normalized data typically leads to three types of anomalies:

1. Insertion Anomaly:

Problem: Cannot add new data because related information is missing
Example: In a combined StudentCourses table, you cannot add a new student if they haven’t enrolled in any courses yet (because course information is required)
Solution: Split into separate Students and Enrollments tables

2. Update Anomaly:

Problem: A change requires updating multiple rows, easy to create inconsistency
Example: Changing an instructor’s name requires updating it in every student record for that course
Solution: Extract instructor information to a separate table and use foreign keys

3. Deletion Anomaly:

Problem: Deleting a record accidentally removes other essential information
Example: Removing the record of the last student in a course would also delete all information about that course
Solution: Separate data into linked tables

⚠️ Real-world Impact: These anomalies not only cause data integrity issues but can lead to permanent data loss, maintenance difficulties, and challenges when scaling your database.

Normal Forms:

🥇 First Normal Form (1NF)#

✅ Table must have a Primary Key
✅ Each cell contains only single, atomic values (no lists or repeating groups)
✅ All columns must contain values of the same data type

🥈 Second Normal Form (2NF)#

✅ Must satisfy 1NF
✅ All non-key attributes must be fully functionally dependent on the entire Primary Key
✅ Eliminate Partial Dependencies (applies to Composite Primary Keys)

🥉 Third Normal Form (3NF)#

✅ Must satisfy 2NF
✅ No Transitive Dependencies
✅ Non-key attributes must not depend on other non-key attributes
✅ Example: StudentID → ZipCode → City is a transitive dependency

📊 Interactive Normalization Process Visualization#

🔧 3. Advanced Data Structures#

This section delves into other built-in Python data structures: Tuples, Sets, and Dictionaries.

🎯 3.1. Tuples - Immutable Sequences#

Tuples are ordered collections of items, similar to lists. The key difference is that tuples are immutable, meaning they cannot be changed after creation.

Characteristics:

Ordered: Items have a defined order and maintain that order
Immutable: Cannot be changed after creation
Allow Duplicates: Can contain duplicate values
Heterogeneous: Can store different data types

Syntax: Created with parentheses ()

Use Cases:

Ideal for data that should not change
Returning multiple values from functions
Using as dictionary keys (since they’re immutable)
Representing coordinates or fixed data structures

1
# Creating a tuple
2
person = ("Camedinh", 2004, "Student")
3

4
# Accessing elements (same as lists)
5
print(person[0])  # Output: Camedinh
6

7
# Tuples are immutable
8
# person[0] = "Hacked"  # This will raise a TypeError
9

10
# Unpacking a tuple
11
name, age, occupation = person
12
print(age)  # Output: 2004
13

14
# Creating a single-item tuple (note the comma)
15
single_tuple = (42,)
16

17
# Tuple methods
18
coordinates = (10, 20, 10, 30)
19
print(coordinates.count(10))  # Output: 2
20
print(coordinates.index(20))  # Output: 1

🎯 3.2. Sets - Unique Collections#

Sets are unordered collections of unique items. They are mutable and perfect for mathematical set operations.

Characteristics:

Unordered: No defined order (items can appear in any sequence)
Mutable: Can add and remove items
No Duplicates: Automatically removes duplicate values
Fast Membership Testing: O(1) average time complexity

Syntax: Created with curly braces {} or set()

Use Cases:

Membership testing
Removing duplicates from data
Mathematical set operations (union, intersection, difference)
Finding unique elements

1
# Creating a set (duplicates are automatically removed)
2
animals = {"cat", "dog", "tiger", "cat"}
3
print(animals)  # Output: {'cat', 'dog', 'tiger'}
4

5
# Adding items
6
animals.add("bear")
7
animals.update({"chicken", "duck"})
8

9
# Set Operations
10
set1 = {1, 2, 3}
11
set2 = {3, 4, 5}
12

13
# Union (all unique elements from both sets)
14
print(set1 | set2)  # Output: {1, 2, 3, 4, 5}
15
print(set1.union(set2))  # Alternative syntax
16

17
# Intersection (common elements)
18
print(set1 & set2)  # Output: {3}
19
print(set1.intersection(set2))  # Alternative syntax
20

21
# Difference (elements in set1 but not in set2)
22
print(set1 - set2)  # Output: {1, 2}
23
print(set1.difference(set2))  # Alternative syntax
24

25
# Symmetric difference (elements in either set, but not both)
26
print(set1 ^ set2)  # Output: {1, 2, 4, 5}
27

28
# Membership testing (very fast)
29
print("cat" in animals)  # Output: True

🎯 3.3. Dictionaries - Key-Value Powerhouse#

Dictionaries are ordered (as of Python 3.7+) collections of key-value pairs. They are mutable and provide fast lookups.

Characteristics:

Ordered: Maintain insertion order (Python 3.7+)
Mutable: Can add, modify, and remove items
Keys must be unique: Each key can appear only once
Keys must be immutable: Strings, numbers, tuples (but not lists)
Fast Access: O(1) average time complexity for lookups

Syntax: Created with curly braces {key: value}

Use Cases:

Storing structured data with fast lookups
Mapping relationships between data
Caching and memoization
Configuration settings
Counting and frequency analysis

🔄 3.3.1. Shallow vs Deep Copy in Dictionaries#

When working with dictionaries, especially those containing nested data structures, understanding the difference between shallow and deep copying is crucial:

Copy methods:

Assignment (=): Creates a new reference to the same dictionary object

1
original_dict = {'name': 'Alice', 'grades': [90, 85, 95]}
2
reference = original_dict  # Same reference to one object
3
reference['name'] = 'Bob'  # Change affects original_dict
4
print(original_dict['name'])  # Output: 'Bob'

Shallow Copy (.copy()): Creates a new dictionary object, but for nested objects (like lists), only references are copied

1
original_dict = {'name': 'Alice', 'grades': [90, 85, 95]}
2
shallow = original_dict.copy()  # or dict(original_dict)
3

4
# Changing simple values doesn't affect the original
5
shallow['name'] = 'Bob'
6
print(original_dict['name'])  # Output: 'Alice'
7

8
# But changing nested objects WILL affect original!
9
shallow['grades'].append(70)
10
print(original_dict['grades'])  # Output: [90, 85, 95, 70]

Deep Copy (copy.deepcopy()): Creates a completely independent copy of the dictionary and all nested objects

1
import copy
2
original_dict = {'name': 'Alice', 'grades': [90, 85, 95]}
3
deep = copy.deepcopy(original_dict)
4

5
# Changes to nested objects will NOT affect original
6
deep['grades'].append(70)
7
print(original_dict['grades'])  # Output: [90, 85, 95]

💡 When to use Deep Copy? Use deep copy when you need a completely independent copy, especially when working with complex data and you don’t want changes to the copy affecting the original data.

📊 Interactive Data Structure Comparison - List vs Dictionary#

💡 Interactive: Hover over list indices or dictionary keys to see access patterns!

Code Examples:

1
# Creating a dictionary
2
params = {
3
    'learning_rate': 0.1,
4
    'optimizer': 'Adam',
5
    'metric': 'Accuracy'
6
}
7

8
# Accessing a value by key
9
print(params['optimizer']) # Output: Adam
10

11
# Using .get() to avoid errors for non-existent keys
12
print(params.get('batch_size', 32))  # Output: 32 (default value)
13

14
# Adding or updating an item
15
params['epochs'] = 100
16
params['learning_rate'] = 0.05
17

18
# Iterating over a dictionary
19
for key, value in params.items():
20
    print(f"{key}: {value}")

🎯 3.4. Data Structure Applications#

Image Histogram with Dictionaries: A dictionary can be used to count pixel intensity frequencies. The pixel value is the key, and its count is the value.

Text Vectorization with Sets/Dictionaries: A set can create a unique vocabulary from a text corpus. A dictionary can then map each unique word to an integer index for machine learning.

Non-Maximum Suppression (NMS) with Tuples/Lists: NMS in object detection uses lists of bounding boxes. Each box can be represented as a tuple (x_min, y_min, x_max, y_max, confidence_score), which is immutable and perfect for this data.

🌿 4. Git & GitHub for Version Control#

Version Control Systems (VCS) track and manage changes to files over time. Git is a powerful, distributed VCS. GitHub is a web-based hosting service for Git repositories.

🎯 4.1. Git Fundamentals#

Core Concepts:

VCS (Version Control System): A system that records changes to a file or set of files over time so that you can recall specific versions later.
Distributed VCS (DVCS): Systems (like Git) where clients don’t just check out the latest snapshot of the files; they fully mirror the repository.
Repository (Repo): A directory where your project’s files and their revision history are stored.
Commit: A snapshot of your files at a specific point in time.
Branch: A lightweight movable pointer to a commit. It allows for developing features in isolation. main is the default branch.
SHA-1 Hash: A unique 40-character identifier for every commit, file, and object in Git, ensuring data integrity.

🔄 Interactive Git Workflow Visualization#

🔄 Interactive: Hover over each stage to understand the Git workflow process!

🛡️ 4.1.1. Core Principles of Git#

Git is built upon revolutionary foundational principles of version control:

Snapshots, Not Differences: Unlike traditional VCS systems that store differences (diffs) between versions, Git stores complete “snapshots” of the entire project at each commit. This enables faster operations and efficient branching.
Data Integrity: Everything in Git is checksummed using SHA-1 hash before it is stored. This means:
- Content cannot be changed without Git detecting it
- Each commit has a unique SHA-1 hash identifier, ensuring consistency
- Protection against transport errors and prevents history tampering
Offline-First Design: Most operations in Git (commits, branches, merges) occur locally without needing internet connection. Benefits include:
- Significantly faster processing
- Flexibility to work without network dependency
- Autonomy in deciding when to synchronize with remote repositories

📘 Design Philosophy: Git was designed by Linus Torvalds with a focus on speed, simple design, support for non-linear development, and the ability to handle large projects efficiently.

💻 Basic Git Workflow#

Initialize or Clone a Repository:

1
# Initialize a new repository in the current directory
2
git init
3

4
# Clone an existing repository from a URL
5
git clone https://github.com/user/repo.git

Make Changes and Stage Them:

1
# Check the status of your files
2
git status
3

4
# Add a specific file to the staging area
5
git add <filename>
6

7
# Add all modified files to the staging area
8
git add .

Commit the Changes:

1
# Commit staged changes with a message
2
git commit -m "Your descriptive commit message"

Work with Remotes:

1
# List your remote repositories (usually 'origin')
2
git remote -v
3

4
# Push your committed changes to the remote repository
5
git push origin main
6

7
# Pull changes from the remote repository
8
git pull origin main

🌿 4.2. Git Branching & Merging#

Branching is a core feature of Git that allows for parallel development.

🌿 Interactive Git Branching & Merging#

🌿 Interactive: Hover over commits to see details and understand the complete branching workflow!

Branching Workflow:

1
# Create a new branch and switch to it
2
git checkout -b feature-x
3

4
# Work on the feature branch (add, commit)
5
git add .
6
git commit -m "Add new feature"
7

8
# Switch back to the main branch
9
git checkout main
10

11
# Merge the feature branch into main
12
git merge feature-x
13

14
# Delete the feature branch (optional)
15
git branch -d feature-x

Conflict Resolution: If the same lines are changed in both branches, a merge conflict occurs. Git will mark the conflicted areas in the file. You must manually edit the file to resolve the conflict, then git add and git commit to finalize the merge.

🔀 4.4. Popular Branching Models#

Model	Description	Best For
GitFlow	Robust model with dedicated branches for features, releases, and hotfixes	Projects with scheduled releases, complex workflows
GitHub Flow	Simple model where `main` is always deployable. Feature branches merged via PRs	Continuous deployment, CI/CD pipelines
Trunk-Based Development	Small, frequent commits to single `main` branch with feature flags	High-frequency releases, advanced teams

🤝 4.5. Collaboration with GitHub Fork & Pull Request Workflow#

Key Concepts:

Fork: Create a personal copy of someone else’s repository
Pull Request (PR): Propose changes from your fork or branch back to the original repository

Complete Workflow:

Fork the target repository on GitHub
Clone your forked repository to your local machine
Create a new branch for your changes
Make your changes, commit, and push to your forked repository
Open a Pull Request from your branch to the original repository’s main branch
Participate in code review and discussion. Once approved, your changes will be merged

1
# Step 1: Fork on GitHub (done through web interface)
2

3
# Step 2: Clone your fork
4
git clone https://github.com/YOUR_USERNAME/REPO_NAME.git
5

6
# Step 3: Create feature branch
7
git checkout -b feature/new-login
8

9
# Step 4: Make changes and commit
10
git add .
11
git commit -m "Add login functionality"
12
git push origin feature/new-login
13

14
# Step 5: Create Pull Request (done through GitHub web interface)

🧠 5. Practical Exercises#

This section summarizes the problems and solution approaches from the TA session.

🔍 5.1. Sliding Window Technique#

Problem: Given a list of numbers and a window size k, find the maximum value in the sliding window as it moves from left to right.

Solution Approach:

Iterate through the list to create slices (sub-lists) of size k
For each sub-list, calculate the max() value
Append the result to a new list
The number of iterations will be len(num_list) - k + 1

1
def max_kernel(num_list, k):
2
    result = []
3
    for i in range(len(num_list) - k + 1):
4
        kernel = num_list[i : i + k]
5
        result.append(max(kernel))
6
    return result
7

8
# Example
9
print(max_kernel([3, 4, 5, 1, -44], 3))  # Output: [5, 5, 5]

📊 5.2. Two Pointers Technique#

Problem: Count the occurrences of each character in a given string.

Solution Approach:

Initialize an empty dictionary
Iterate through each character of the string
Use the character as a key in the dictionary. If the key exists, increment its value. If not, add it to the dictionary with a value of 1
It’s good practice to convert the string to lowercase first to treat A and a as the same character

1
def count_characters(word):
2
    char_stats = {}
3
    for char in word.lower():
4
        char_stats[char] = char_stats.get(char, 0) + 1
5
    return char_stats
6

7
# Example
8
print(count_characters("Baby"))  # Output: {'b': 2, 'a': 1, 'y': 1}

📝 5.3. Dynamic Programming#

Problem: Count the occurrences of each word in a text file.

Solution Approach:

Read the content of the file
Preprocess the text: convert to lowercase, remove punctuation
Split the text into a list of words
Use the same dictionary-based counting method as in character counting

1
def count_words_in_file(filepath):
2
    with open(filepath, 'r') as f:
3
        text = f.read()
4

5
    # Preprocessing
6
    text = text.lower()
7
    text = text.replace('.', '').replace(',', '')
8
    words = text.split()
9

10
    word_counts = {}
11
    for word in words:
12
        word_counts[word] = word_counts.get(word, 0) + 1
13
    return word_counts

🧮 5.4. Levenshtein Distance#

Problem: Calculate the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other.

Solution Approach (Dynamic Programming):

Create a 2D matrix (a list of lists) of size (len(source)+1) x (len(target)+1)
Initialize the first row and column with 0, 1, 2, … representing the cost of transforming an empty string to a prefix of the target/source
Iterate through the matrix. For each cell D[i, j], calculate the cost:
- If source[i-1] == target[j-1], the cost is D[i-1, j-1] (no operation)
- Otherwise, the cost is 1 + min(D[i-1, j], D[i, j-1], D[i-1, j-1]), which corresponds to the minimum cost of a deletion, insertion, or substitution
The final value at D[len(source)][len(target)] is the Levenshtein distance

🧮 5.4.1. Recursive Formula for Levenshtein Distance#

The Levenshtein Distance algorithm is based on dynamic programming principles. The distance is calculated by building a matrix D, where D[i,j] represents the distance between the first i characters of source and the first j characters of target.

Recursive Formula:

Base Cases:
- D[i,0] = i (cost of transforming i characters to an empty string)
- D[0,j] = j (cost of transforming an empty string to j characters)

General Formula: For each cell D[i,j]:

1
D[i,j] = min(
2
    D[i-1,j] + 1,          // Deletion (remove character from source)
3
    D[i,j-1] + 1,          // Insertion (add character to source)
4
    D[i-1,j-1] + cost      // Substitution/Match
5
)

Where cost = 0 if source[i-1] = target[j-1], otherwise cost = 1.

Practical Applications:

Spell checking and correction
Fuzzy search functionality
Plagiarism detection
DNA and protein sequence comparison in computational biology

🔍 Implementation Details: This algorithm has a time complexity of O(m×n) and space complexity of O(m×n), where m and n are the lengths of the two strings. However, when only the final value is needed (not the edit path), the space complexity can be reduced to O(min(m,n)).

🧮 Interactive Levenshtein Distance Matrix#

🎯 6. Week 2 Learning Assessment Matrix#

📊 Interactive Self-Assessment Tool#

🎉 Conclusion#

Week 2 of AIO2025 has covered fundamental building blocks of programming and data management:

🔑 Key Takeaways#

Python Lists Mastery: From basic operations to advanced comprehensions and 2D matrices
Database Design Excellence: ERD modeling and normalization principles for robust data architecture
Advanced Data Structures: Tuples, sets, and dictionaries for efficient data manipulation
Version Control Proficiency: Git and GitHub for collaborative development
Problem-Solving Skills: Practical algorithms from sliding windows to dynamic programming

🚀 Next Steps#

Practice implementing the covered algorithms with different datasets
Design your own database schema using ERD principles
Create a personal Git repository for your AIO2025 journey
Combine these skills in real-world projects

📚 Additional Resources#

Python Documentation: docs.python.org
Database Design: “Database Design for Mere Mortals” by Michael Hernandez
Git Learning: Git Interactive Tutorial
Practice Problems: LeetCode, HackerRank algorithm challenges

💪 Challenge: Try implementing a complete student management system using all the concepts learned this week - from data structures to database design to version control!

Happy coding, and see you in Week 3 of the AIO2025 journey! 🎯