12500 words
63 minutes
📚 AIO2025 - Week 2: Python Lists, SQL & Git Mastery

Welcome to Week 2 of the AIO2025 journey! This week, we’ll explore three fundamental pillars of programming and software development: Python Lists & Data Structures, Database Design with SQL, and Git Version Control.

💡 Learning Objectives: Master Python data structures, design professional databases, and manage source code effectively with Git.

🐍 1. Python Lists - Dynamic Data Powerhouse#

Lists are fundamental data structures in Python used to store ordered collections of items. The key characteristic of Lists is their mutability - contents can be changed after creation.

🎯 1.1. Core Characteristics of Lists#

  • Ordered: The items in a list have a defined order, and that order will not change. If you add new items to a list, the new items will be placed at the end of the list.
  • Mutable: You can change, add, and remove items in a list after it has been created.
  • Allow Duplicates: Since lists are indexed, they can have items with the same value.
  • Heterogeneous: Lists can contain elements of different data types (integers, strings, booleans, even other lists).

📊 Interactive List Structure Visualization#

💡 Interactive: Hover over elements to see exact values!

💻 1.2. Creating and Accessing Lists#

Creating Lists: Lists are created by placing comma-separated values inside square brackets [].

# Empty list
empty_list = []
# List of integers
numbers = [4, 5, 6, 7, 8, 9]
# Mixed data types
mixed_list = [True, 5, 'some string', 123.45]
# Nested list
nested_list = ["Happy", [2, 0, 1, 5]]

Accessing Elements (Indexing): You can access an element in a list by referring to its index number.

data = [4, 5, 6, 7, 8, 9]
# Access the first element
print(data[0]) # Output: 4
# Access the fourth element
print(data[3]) # Output: 7
# Access the last element using backward indexing
print(data[-1]) # Output: 9

Slicing: Slicing allows you to access a range of items in a list. Syntax: list[start:end:step].

data = [4, 5, 6, 7, 8, 9]
# Get items from the beginning to index 2 (end is exclusive)
print(data[:3]) # Output: [4, 5, 6]
# Get items from index 2 to index 3
print(data[2:4]) # Output: [6, 7]
# Get every second item
print(data[::2]) # Output: [4, 6, 8]

🛠️ 1.3. List Methods - Powerful Manipulation Tools#

Here’s a comprehensive table of the most common list methods:

MethodDescriptionExampleTime Complexity
append(item)Add item to the end of listlst.append(5)O(1)
insert(index, item)Insert item at specified indexlst.insert(0, 'first')O(n)
extend(iterable)Extend list with elements from another iterablelst.extend([1,2,3])O(k)
remove(item)Remove first occurrence of itemlst.remove('x')O(n)
pop(index)Remove and return item at index (default: last)item = lst.pop()O(1) end, O(n) beginning
clear()Remove all elementslst.clear()O(n)
sort()Sort ascending (or reverse=True)lst.sort()O(n log n)
reverse()Reverse the order of elementslst.reverse()O(n)
copy()Create a shallow copynew_lst = lst.copy()O(n)
count(item)Count occurrences of itemlst.count('a')O(n)
index(item)Find index of first occurrenceidx = lst.index('x')O(n)

⚠️ Note: Operations at the beginning of list (insert(0, ...), pop(0)) have O(n) complexity due to element shifting.

Practical Examples:

# Adding elements
data = [6, 5, 7, 1]
data.append(4) # data = [6, 5, 7, 1, 4]
data.insert(0, 10) # data = [10, 6, 5, 7, 1, 4]
data.extend([9, 2]) # data = [10, 6, 5, 7, 1, 4, 9, 2]
# Updating and removing elements
data[1] = 99 # Update element at index 1
print(data) # [10, 99, 5, 7, 1, 4, 9, 2]
data.remove(99) # Remove value 99
popped_item = data.pop(2) # Remove element at index 2
print(popped_item) # 7
# Sorting
data.sort()
print(data) # [1, 2, 4, 5, 9, 10]

🔧 1.4. Built-in Functions for Lists#

FunctionDescriptionExampleUse Case
len(list)Returns number of items in listlen([1,2,3]) → 3Size checking, loops
min(list)Returns smallest value itemmin([5,1,9]) → 1Finding minimum
max(list)Returns largest value itemmax([5,1,9]) → 9Finding maximum
sum(list)Returns sum of numeric itemssum([1,2,3]) → 6Statistical calculations
sorted(list)Returns new sorted list (original unchanged)sorted([3,1,2]) → [1,2,3]Safe sorting
enumerate(list)Returns (index, value) pairslist(enumerate(['a','b'])) → [(0,‘a’),(1,‘b’)]Loops with index
zip(list1, list2)Combines elements from multiple listslist(zip([1,2],[3,4])) → [(1,3),(2,4)]Parallel processing
reversed(list)Returns reverse iteratorlist(reversed([1,2,3])) → [3,2,1]Reverse traversal

Practical Examples:

data = [6, 5, 7, 1, 9, 2]
print(len(data)) # Output: 6
print(sum(data)) # Output: 30
print(sorted(data)) # Output: [1, 2, 5, 6, 7, 9]
# Enumerate - loop with index
for index, value in enumerate(data):
print(f"Index: {index}, Value: {value}")
# Zip - combine multiple lists
names = ['Alice', 'Bob']
ages = [25, 30]
for name, age in zip(names, ages):
print(f"{name} is {age} years old.")

🚀 1.5. List Comprehension - Pythonic Power#

List comprehension is a concise and Pythonic way to create new lists from existing iterables.

Syntax: [expression for item in iterable if condition]

# Square numbers from 0 to 4
squares = [x**2 for x in range(5)]
# Output: [0, 1, 4, 9, 16]
# Filter even numbers from a list
data = [1, 5, -4, 3, -2]
evens = [x for x in data if x % 2 == 0]
# Output: [-4, -2]
# Apply ReLU activation function
def relu(x):
return x if x > 0 else 0
relu_applied = [relu(x) for x in data]
# Output: [1, 5, 0, 3, 0]
# Nested list comprehension - 2D Matrix
matrix = [[i+j for j in range(3)] for i in range(3)]
# Output: [[0,1,2], [1,2,3], [2,3,4]]

💡 Tip: List comprehensions are typically faster than traditional for loops and result in more concise code!

📊 1.6. 2D Lists (Matrices) - Two-Dimensional Data Structure#

2D Lists are lists where each element is another list. This structure is used to represent grids, tables, or matrices in mathematics and data science.

Creating and Accessing 2D Lists:

# 3x3 Matrix
matrix = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
]
# Access element (row 1, column 2)
print(matrix[1][2]) # Output: 6
# Iterate through entire matrix
num_rows = len(matrix)
num_cols = len(matrix[0])
for r in range(num_rows):
for c in range(num_cols):
print(matrix[r][c], end=' ')
print() # New line after each row

📈 Interactive 2D Matrix Visualization#

🎯 Interactive: Hover over cells to see [row][col] position and values!


🗄️ 2. Database - SQL (Part 2): Professional Database Design#

This section focuses on the design and structure of relational databases, with emphasis on Entity Relationship Diagrams (ERD) and Database Normalization - two core skills for every Database Developer.

🎯 2.1. Entity Relationship Diagram (ERD) - Visual Language#

ERD is a visual model that represents the structure of a database. It illustrates entities, attributes, and relationships between them.

ComponentSymbolDescriptionReal-World Example
EntityRectangleObject or concept in the real worldCustomer, Product, Order
Weak EntityDouble rectangleEntity that cannot be uniquely identified on its ownOrderDetail (depends on Order)
AttributeOvalProperty or characteristic of an entityfirst_name, unit_price, email
Key AttributeUnderlined ovalUnique identifier attribute (Primary Key)customer_id, product_code
Multivalued AttributeDouble ovalAttribute that can contain multiple valuesphone_numbers, skills
Derived AttributeDashed ovalAttribute that can be calculated from other attributesage (from birth_date), total_amount
RelationshipDiamondConnection between two or more entitiesCustomer places Order, Product belongs to Category

🔄 Weak Entities in ERD#

A Weak Entity is an entity that cannot be uniquely identified by its attributes alone and must rely on a relationship with another entity.

Characteristics:

  • Represented by a double rectangle (double-lined box)
  • Depends on a strong (parent) entity
  • Primary key is a combination of foreign key (link to strong entity) and a discriminator (partial key)

Real-world Examples:

  • OrderDetail as a weak entity dependent on Order
  • Seat in a Flight (seat numbers only make sense within the context of a specific flight)
  • RoomNumber in a Building (room numbers are only meaningful within a specific building)

The relationship between weak entities and their strong entities:

Strong EntityRelationshipWeak EntityDiscriminator
OrderContainsOrderDetailProduct (part of composite key)
BookHasChapterChapter Number
EmployeeManagesDependentDependent Name

💡 Note: A weak entity always has an existence dependency relationship with its strong entity. This means that if the strong entity is deleted, all associated weak entities must also be deleted.

📊 Interactive ERD Visualization - Student Enrollment System#

💡 Interactive: Hover over components to see details about Entity, Attribute, and Relationship!

🔗 Cardinality (Relationships)#

Cardinality defines the number of entities that can participate in a relationship:

TypeSymbolDescriptionReal-World Example
One-to-One (1:1)1:1Each entity relates to exactly one other entityPerson - Passport
One-to-Many (1)1One entity can relate to multiple other entitiesCustomer - Orders
Many-to-Many (M)MMultiple entities can relate to multiple other entitiesStudent - Courses

⚠️ Note: M relationships are typically resolved by creating a junction table (associative entity).

🏗️ 2.2. Database Normalization - Structural Excellence#

Normalization is the process of organizing columns and tables in a relational database to minimize data redundancy. It involves dividing large tables into smaller, well-structured tables and defining relationships between them.

Why Normalize?

Minimize Redundancy: Avoid storing the same data in multiple places.

Prevent Anomalies:

🚫 Database Anomalies - Why Normalization Matters#

Non-normalized data typically leads to three types of anomalies:

1. Insertion Anomaly:

  • Problem: Cannot add new data because related information is missing
  • Example: In a combined StudentCourses table, you cannot add a new student if they haven’t enrolled in any courses yet (because course information is required)
  • Solution: Split into separate Students and Enrollments tables

2. Update Anomaly:

  • Problem: A change requires updating multiple rows, easy to create inconsistency
  • Example: Changing an instructor’s name requires updating it in every student record for that course
  • Solution: Extract instructor information to a separate table and use foreign keys

3. Deletion Anomaly:

  • Problem: Deleting a record accidentally removes other essential information
  • Example: Removing the record of the last student in a course would also delete all information about that course
  • Solution: Separate data into linked tables

⚠️ Real-world Impact: These anomalies not only cause data integrity issues but can lead to permanent data loss, maintenance difficulties, and challenges when scaling your database.

Normal Forms:

🥇 First Normal Form (1NF)#

  • ✅ Table must have a Primary Key
  • ✅ Each cell contains only single, atomic values (no lists or repeating groups)
  • ✅ All columns must contain values of the same data type

🥈 Second Normal Form (2NF)#

  • ✅ Must satisfy 1NF
  • ✅ All non-key attributes must be fully functionally dependent on the entire Primary Key
  • ✅ Eliminate Partial Dependencies (applies to Composite Primary Keys)

🥉 Third Normal Form (3NF)#

  • ✅ Must satisfy 2NF
  • ✅ No Transitive Dependencies
  • ✅ Non-key attributes must not depend on other non-key attributes
  • ✅ Example: StudentID → ZipCode → City is a transitive dependency

📊 Interactive Normalization Process Visualization#


🔧 3. Advanced Data Structures#

This section delves into other built-in Python data structures: Tuples, Sets, and Dictionaries.

🎯 3.1. Tuples - Immutable Sequences#

Tuples are ordered collections of items, similar to lists. The key difference is that tuples are immutable, meaning they cannot be changed after creation.

Characteristics:

  • Ordered: Items have a defined order and maintain that order
  • Immutable: Cannot be changed after creation
  • Allow Duplicates: Can contain duplicate values
  • Heterogeneous: Can store different data types

Syntax: Created with parentheses ()

Use Cases:

  • Ideal for data that should not change
  • Returning multiple values from functions
  • Using as dictionary keys (since they’re immutable)
  • Representing coordinates or fixed data structures
# Creating a tuple
person = ("Camedinh", 2004, "Student")
# Accessing elements (same as lists)
print(person[0]) # Output: Camedinh
# Tuples are immutable
# person[0] = "Hacked" # This will raise a TypeError
# Unpacking a tuple
name, age, occupation = person
print(age) # Output: 2004
# Creating a single-item tuple (note the comma)
single_tuple = (42,)
# Tuple methods
coordinates = (10, 20, 10, 30)
print(coordinates.count(10)) # Output: 2
print(coordinates.index(20)) # Output: 1

🎯 3.2. Sets - Unique Collections#

Sets are unordered collections of unique items. They are mutable and perfect for mathematical set operations.

Characteristics:

  • Unordered: No defined order (items can appear in any sequence)
  • Mutable: Can add and remove items
  • No Duplicates: Automatically removes duplicate values
  • Fast Membership Testing: O(1) average time complexity

Syntax: Created with curly braces {} or set()

Use Cases:

  • Membership testing
  • Removing duplicates from data
  • Mathematical set operations (union, intersection, difference)
  • Finding unique elements
# Creating a set (duplicates are automatically removed)
animals = {"cat", "dog", "tiger", "cat"}
print(animals) # Output: {'cat', 'dog', 'tiger'}
# Adding items
animals.add("bear")
animals.update({"chicken", "duck"})
# Set Operations
set1 = {1, 2, 3}
set2 = {3, 4, 5}
# Union (all unique elements from both sets)
print(set1 | set2) # Output: {1, 2, 3, 4, 5}
print(set1.union(set2)) # Alternative syntax
# Intersection (common elements)
print(set1 & set2) # Output: {3}
print(set1.intersection(set2)) # Alternative syntax
# Difference (elements in set1 but not in set2)
print(set1 - set2) # Output: {1, 2}
print(set1.difference(set2)) # Alternative syntax
# Symmetric difference (elements in either set, but not both)
print(set1 ^ set2) # Output: {1, 2, 4, 5}
# Membership testing (very fast)
print("cat" in animals) # Output: True

🎯 3.3. Dictionaries - Key-Value Powerhouse#

Dictionaries are ordered (as of Python 3.7+) collections of key-value pairs. They are mutable and provide fast lookups.

Characteristics:

  • Ordered: Maintain insertion order (Python 3.7+)
  • Mutable: Can add, modify, and remove items
  • Keys must be unique: Each key can appear only once
  • Keys must be immutable: Strings, numbers, tuples (but not lists)
  • Fast Access: O(1) average time complexity for lookups

Syntax: Created with curly braces {key: value}

Use Cases:

  • Storing structured data with fast lookups
  • Mapping relationships between data
  • Caching and memoization
  • Configuration settings
  • Counting and frequency analysis

🔄 3.3.1. Shallow vs Deep Copy in Dictionaries#

When working with dictionaries, especially those containing nested data structures, understanding the difference between shallow and deep copying is crucial:

Copy methods:

  1. Assignment (=): Creates a new reference to the same dictionary object

    original_dict = {'name': 'Alice', 'grades': [90, 85, 95]}
    reference = original_dict # Same reference to one object
    reference['name'] = 'Bob' # Change affects original_dict
    print(original_dict['name']) # Output: 'Bob'
  2. Shallow Copy (.copy()): Creates a new dictionary object, but for nested objects (like lists), only references are copied

    original_dict = {'name': 'Alice', 'grades': [90, 85, 95]}
    shallow = original_dict.copy() # or dict(original_dict)
    # Changing simple values doesn't affect the original
    shallow['name'] = 'Bob'
    print(original_dict['name']) # Output: 'Alice'
    # But changing nested objects WILL affect original!
    shallow['grades'].append(70)
    print(original_dict['grades']) # Output: [90, 85, 95, 70]
  3. Deep Copy (copy.deepcopy()): Creates a completely independent copy of the dictionary and all nested objects

    import copy
    original_dict = {'name': 'Alice', 'grades': [90, 85, 95]}
    deep = copy.deepcopy(original_dict)
    # Changes to nested objects will NOT affect original
    deep['grades'].append(70)
    print(original_dict['grades']) # Output: [90, 85, 95]

💡 When to use Deep Copy? Use deep copy when you need a completely independent copy, especially when working with complex data and you don’t want changes to the copy affecting the original data.

📊 Interactive Data Structure Comparison - List vs Dictionary#

💡 Interactive: Hover over list indices or dictionary keys to see access patterns!

Code Examples:

# Creating a dictionary
params = {
'learning_rate': 0.1,
'optimizer': 'Adam',
'metric': 'Accuracy'
}
# Accessing a value by key
print(params['optimizer']) # Output: Adam
# Using .get() to avoid errors for non-existent keys
print(params.get('batch_size', 32)) # Output: 32 (default value)
# Adding or updating an item
params['epochs'] = 100
params['learning_rate'] = 0.05
# Iterating over a dictionary
for key, value in params.items():
print(f"{key}: {value}")

🎯 3.4. Data Structure Applications#

Image Histogram with Dictionaries: A dictionary can be used to count pixel intensity frequencies. The pixel value is the key, and its count is the value.

Text Vectorization with Sets/Dictionaries: A set can create a unique vocabulary from a text corpus. A dictionary can then map each unique word to an integer index for machine learning.

Non-Maximum Suppression (NMS) with Tuples/Lists: NMS in object detection uses lists of bounding boxes. Each box can be represented as a tuple (x_min, y_min, x_max, y_max, confidence_score), which is immutable and perfect for this data.


🌿 4. Git & GitHub for Version Control#

Version Control Systems (VCS) track and manage changes to files over time. Git is a powerful, distributed VCS. GitHub is a web-based hosting service for Git repositories.

🎯 4.1. Git Fundamentals#

Core Concepts:

  • VCS (Version Control System): A system that records changes to a file or set of files over time so that you can recall specific versions later.
  • Distributed VCS (DVCS): Systems (like Git) where clients don’t just check out the latest snapshot of the files; they fully mirror the repository.
  • Repository (Repo): A directory where your project’s files and their revision history are stored.
  • Commit: A snapshot of your files at a specific point in time.
  • Branch: A lightweight movable pointer to a commit. It allows for developing features in isolation. main is the default branch.
  • SHA-1 Hash: A unique 40-character identifier for every commit, file, and object in Git, ensuring data integrity.

🔄 Interactive Git Workflow Visualization#

🔄 Interactive: Hover over each stage to understand the Git workflow process!

🛡️ 4.1.1. Core Principles of Git#

Git is built upon revolutionary foundational principles of version control:

  1. Snapshots, Not Differences: Unlike traditional VCS systems that store differences (diffs) between versions, Git stores complete “snapshots” of the entire project at each commit. This enables faster operations and efficient branching.

  2. Data Integrity: Everything in Git is checksummed using SHA-1 hash before it is stored. This means:

    • Content cannot be changed without Git detecting it
    • Each commit has a unique SHA-1 hash identifier, ensuring consistency
    • Protection against transport errors and prevents history tampering
  3. Offline-First Design: Most operations in Git (commits, branches, merges) occur locally without needing internet connection. Benefits include:

    • Significantly faster processing
    • Flexibility to work without network dependency
    • Autonomy in deciding when to synchronize with remote repositories

📘 Design Philosophy: Git was designed by Linus Torvalds with a focus on speed, simple design, support for non-linear development, and the ability to handle large projects efficiently.

💻 Basic Git Workflow#

Initialize or Clone a Repository:

Terminal window
# Initialize a new repository in the current directory
git init
# Clone an existing repository from a URL
git clone https://github.com/user/repo.git

Make Changes and Stage Them:

Terminal window
# Check the status of your files
git status
# Add a specific file to the staging area
git add <filename>
# Add all modified files to the staging area
git add .

Commit the Changes:

Terminal window
# Commit staged changes with a message
git commit -m "Your descriptive commit message"

Work with Remotes:

Terminal window
# List your remote repositories (usually 'origin')
git remote -v
# Push your committed changes to the remote repository
git push origin main
# Pull changes from the remote repository
git pull origin main

🌿 4.2. Git Branching & Merging#

Branching is a core feature of Git that allows for parallel development.

🌿 Interactive Git Branching & Merging#

🌿 Interactive: Hover over commits to see details and understand the complete branching workflow!

Branching Workflow:

Terminal window
# Create a new branch and switch to it
git checkout -b feature-x
# Work on the feature branch (add, commit)
git add .
git commit -m "Add new feature"
# Switch back to the main branch
git checkout main
# Merge the feature branch into main
git merge feature-x
# Delete the feature branch (optional)
git branch -d feature-x

Conflict Resolution: If the same lines are changed in both branches, a merge conflict occurs. Git will mark the conflicted areas in the file. You must manually edit the file to resolve the conflict, then git add and git commit to finalize the merge.

ModelDescriptionBest For
GitFlowRobust model with dedicated branches for features, releases, and hotfixesProjects with scheduled releases, complex workflows
GitHub FlowSimple model where main is always deployable. Feature branches merged via PRsContinuous deployment, CI/CD pipelines
Trunk-Based DevelopmentSmall, frequent commits to single main branch with feature flagsHigh-frequency releases, advanced teams

🤝 4.5. Collaboration with GitHub Fork & Pull Request Workflow#

Key Concepts:

  • Fork: Create a personal copy of someone else’s repository
  • Pull Request (PR): Propose changes from your fork or branch back to the original repository

Complete Workflow:

  1. Fork the target repository on GitHub
  2. Clone your forked repository to your local machine
  3. Create a new branch for your changes
  4. Make your changes, commit, and push to your forked repository
  5. Open a Pull Request from your branch to the original repository’s main branch
  6. Participate in code review and discussion. Once approved, your changes will be merged
Terminal window
# Step 1: Fork on GitHub (done through web interface)
# Step 2: Clone your fork
git clone https://github.com/YOUR_USERNAME/REPO_NAME.git
# Step 3: Create feature branch
git checkout -b feature/new-login
# Step 4: Make changes and commit
git add .
git commit -m "Add login functionality"
git push origin feature/new-login
# Step 5: Create Pull Request (done through GitHub web interface)

🧠 5. Practical Exercises#

This section summarizes the problems and solution approaches from the TA session.

🔍 5.1. Sliding Window Technique#

Problem: Given a list of numbers and a window size k, find the maximum value in the sliding window as it moves from left to right.

Solution Approach:

  1. Iterate through the list to create slices (sub-lists) of size k
  2. For each sub-list, calculate the max() value
  3. Append the result to a new list
  4. The number of iterations will be len(num_list) - k + 1
def max_kernel(num_list, k):
result = []
for i in range(len(num_list) - k + 1):
kernel = num_list[i : i + k]
result.append(max(kernel))
return result
# Example
print(max_kernel([3, 4, 5, 1, -44], 3)) # Output: [5, 5, 5]

📊 5.2. Two Pointers Technique#

Problem: Count the occurrences of each character in a given string.

Solution Approach:

  1. Initialize an empty dictionary
  2. Iterate through each character of the string
  3. Use the character as a key in the dictionary. If the key exists, increment its value. If not, add it to the dictionary with a value of 1
  4. It’s good practice to convert the string to lowercase first to treat A and a as the same character
def count_characters(word):
char_stats = {}
for char in word.lower():
char_stats[char] = char_stats.get(char, 0) + 1
return char_stats
# Example
print(count_characters("Baby")) # Output: {'b': 2, 'a': 1, 'y': 1}

📝 5.3. Dynamic Programming#

Problem: Count the occurrences of each word in a text file.

Solution Approach:

  1. Read the content of the file
  2. Preprocess the text: convert to lowercase, remove punctuation
  3. Split the text into a list of words
  4. Use the same dictionary-based counting method as in character counting
def count_words_in_file(filepath):
with open(filepath, 'r') as f:
text = f.read()
# Preprocessing
text = text.lower()
text = text.replace('.', '').replace(',', '')
words = text.split()
word_counts = {}
for word in words:
word_counts[word] = word_counts.get(word, 0) + 1
return word_counts

🧮 5.4. Levenshtein Distance#

Problem: Calculate the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other.

Solution Approach (Dynamic Programming):

  1. Create a 2D matrix (a list of lists) of size (len(source)+1) x (len(target)+1)
  2. Initialize the first row and column with 0, 1, 2, … representing the cost of transforming an empty string to a prefix of the target/source
  3. Iterate through the matrix. For each cell D[i, j], calculate the cost:
    • If source[i-1] == target[j-1], the cost is D[i-1, j-1] (no operation)
    • Otherwise, the cost is 1 + min(D[i-1, j], D[i, j-1], D[i-1, j-1]), which corresponds to the minimum cost of a deletion, insertion, or substitution
  4. The final value at D[len(source)][len(target)] is the Levenshtein distance

🧮 5.4.1. Recursive Formula for Levenshtein Distance#

The Levenshtein Distance algorithm is based on dynamic programming principles. The distance is calculated by building a matrix D, where D[i,j] represents the distance between the first i characters of source and the first j characters of target.

Recursive Formula:

  1. Base Cases:

    • D[i,0] = i (cost of transforming i characters to an empty string)
    • D[0,j] = j (cost of transforming an empty string to j characters)
  2. General Formula: For each cell D[i,j]:

    D[i,j] = min(
    D[i-1,j] + 1, // Deletion (remove character from source)
    D[i,j-1] + 1, // Insertion (add character to source)
    D[i-1,j-1] + cost // Substitution/Match
    )

    Where cost = 0 if source[i-1] = target[j-1], otherwise cost = 1.

Practical Applications:

  • Spell checking and correction
  • Fuzzy search functionality
  • Plagiarism detection
  • DNA and protein sequence comparison in computational biology

🔍 Implementation Details: This algorithm has a time complexity of O(m×n) and space complexity of O(m×n), where m and n are the lengths of the two strings. However, when only the final value is needed (not the edit path), the space complexity can be reduced to O(min(m,n)).

🧮 Interactive Levenshtein Distance Matrix#


🎯 6. Week 2 Learning Assessment Matrix#

📊 Interactive Self-Assessment Tool#


🎉 Conclusion#

Week 2 of AIO2025 has covered fundamental building blocks of programming and data management:

🔑 Key Takeaways#

  1. Python Lists Mastery: From basic operations to advanced comprehensions and 2D matrices
  2. Database Design Excellence: ERD modeling and normalization principles for robust data architecture
  3. Advanced Data Structures: Tuples, sets, and dictionaries for efficient data manipulation
  4. Version Control Proficiency: Git and GitHub for collaborative development
  5. Problem-Solving Skills: Practical algorithms from sliding windows to dynamic programming

🚀 Next Steps#

  • Practice implementing the covered algorithms with different datasets
  • Design your own database schema using ERD principles
  • Create a personal Git repository for your AIO2025 journey
  • Combine these skills in real-world projects

📚 Additional Resources#

  • Python Documentation: docs.python.org
  • Database Design: “Database Design for Mere Mortals” by Michael Hernandez
  • Git Learning: Git Interactive Tutorial
  • Practice Problems: LeetCode, HackerRank algorithm challenges

💪 Challenge: Try implementing a complete student management system using all the concepts learned this week - from data structures to database design to version control!


Happy coding, and see you in Week 3 of the AIO2025 journey! 🎯

📚 AIO2025 - Week 2: Python Lists, SQL & Git Mastery
https://hung2124.github.io/blog/posts/week2/
Author
AI Blog Pro Team
Published at
2025-06-15
License
CC BY-NC-SA 4.0