Welcome to Week 2 of the AIO2025 journey! This week, we’ll explore three fundamental pillars of programming and software development: Python Lists & Data Structures, Database Design with SQL, and Git Version Control.
💡 Learning Objectives: Master Python data structures, design professional databases, and manage source code effectively with Git.
🐍 1. Python Lists - Dynamic Data Powerhouse
Lists are fundamental data structures in Python used to store ordered collections of items. The key characteristic of Lists is their mutability - contents can be changed after creation.
🎯 1.1. Core Characteristics of Lists
- Ordered: The items in a list have a defined order, and that order will not change. If you add new items to a list, the new items will be placed at the end of the list.
- Mutable: You can change, add, and remove items in a list after it has been created.
- Allow Duplicates: Since lists are indexed, they can have items with the same value.
- Heterogeneous: Lists can contain elements of different data types (integers, strings, booleans, even other lists).
📊 Interactive List Structure Visualization
💡 Interactive: Hover over elements to see exact values!
💻 1.2. Creating and Accessing Lists
Creating Lists:
Lists are created by placing comma-separated values inside square brackets []
.
# Empty listempty_list = []
# List of integersnumbers = [4, 5, 6, 7, 8, 9]
# Mixed data typesmixed_list = [True, 5, 'some string', 123.45]
# Nested listnested_list = ["Happy", [2, 0, 1, 5]]
Accessing Elements (Indexing): You can access an element in a list by referring to its index number.
data = [4, 5, 6, 7, 8, 9]
# Access the first elementprint(data[0]) # Output: 4
# Access the fourth elementprint(data[3]) # Output: 7
# Access the last element using backward indexingprint(data[-1]) # Output: 9
Slicing:
Slicing allows you to access a range of items in a list. Syntax: list[start:end:step]
.
data = [4, 5, 6, 7, 8, 9]
# Get items from the beginning to index 2 (end is exclusive)print(data[:3]) # Output: [4, 5, 6]
# Get items from index 2 to index 3print(data[2:4]) # Output: [6, 7]
# Get every second itemprint(data[::2]) # Output: [4, 6, 8]
🛠️ 1.3. List Methods - Powerful Manipulation Tools
Here’s a comprehensive table of the most common list methods:
Method | Description | Example | Time Complexity |
---|---|---|---|
append(item) | Add item to the end of list | lst.append(5) | O(1) |
insert(index, item) | Insert item at specified index | lst.insert(0, 'first') | O(n) |
extend(iterable) | Extend list with elements from another iterable | lst.extend([1,2,3]) | O(k) |
remove(item) | Remove first occurrence of item | lst.remove('x') | O(n) |
pop(index) | Remove and return item at index (default: last) | item = lst.pop() | O(1) end, O(n) beginning |
clear() | Remove all elements | lst.clear() | O(n) |
sort() | Sort ascending (or reverse=True ) | lst.sort() | O(n log n) |
reverse() | Reverse the order of elements | lst.reverse() | O(n) |
copy() | Create a shallow copy | new_lst = lst.copy() | O(n) |
count(item) | Count occurrences of item | lst.count('a') | O(n) |
index(item) | Find index of first occurrence | idx = lst.index('x') | O(n) |
⚠️ Note: Operations at the beginning of list (
insert(0, ...)
,pop(0)
) have O(n) complexity due to element shifting.
Practical Examples:
# Adding elementsdata = [6, 5, 7, 1]data.append(4) # data = [6, 5, 7, 1, 4]data.insert(0, 10) # data = [10, 6, 5, 7, 1, 4]data.extend([9, 2]) # data = [10, 6, 5, 7, 1, 4, 9, 2]
# Updating and removing elementsdata[1] = 99 # Update element at index 1print(data) # [10, 99, 5, 7, 1, 4, 9, 2]
data.remove(99) # Remove value 99popped_item = data.pop(2) # Remove element at index 2print(popped_item) # 7
# Sortingdata.sort()print(data) # [1, 2, 4, 5, 9, 10]
🔧 1.4. Built-in Functions for Lists
Function | Description | Example | Use Case |
---|---|---|---|
len(list) | Returns number of items in list | len([1,2,3]) → 3 | Size checking, loops |
min(list) | Returns smallest value item | min([5,1,9]) → 1 | Finding minimum |
max(list) | Returns largest value item | max([5,1,9]) → 9 | Finding maximum |
sum(list) | Returns sum of numeric items | sum([1,2,3]) → 6 | Statistical calculations |
sorted(list) | Returns new sorted list (original unchanged) | sorted([3,1,2]) → [1,2,3] | Safe sorting |
enumerate(list) | Returns (index, value) pairs | list(enumerate(['a','b'])) → [(0,‘a’),(1,‘b’)] | Loops with index |
zip(list1, list2) | Combines elements from multiple lists | list(zip([1,2],[3,4])) → [(1,3),(2,4)] | Parallel processing |
reversed(list) | Returns reverse iterator | list(reversed([1,2,3])) → [3,2,1] | Reverse traversal |
Practical Examples:
data = [6, 5, 7, 1, 9, 2]print(len(data)) # Output: 6print(sum(data)) # Output: 30print(sorted(data)) # Output: [1, 2, 5, 6, 7, 9]
# Enumerate - loop with indexfor index, value in enumerate(data): print(f"Index: {index}, Value: {value}")
# Zip - combine multiple listsnames = ['Alice', 'Bob']ages = [25, 30]for name, age in zip(names, ages): print(f"{name} is {age} years old.")
🚀 1.5. List Comprehension - Pythonic Power
List comprehension is a concise and Pythonic way to create new lists from existing iterables.
Syntax: [expression for item in iterable if condition]
# Square numbers from 0 to 4squares = [x**2 for x in range(5)]# Output: [0, 1, 4, 9, 16]
# Filter even numbers from a listdata = [1, 5, -4, 3, -2]evens = [x for x in data if x % 2 == 0]# Output: [-4, -2]
# Apply ReLU activation functiondef relu(x): return x if x > 0 else 0
relu_applied = [relu(x) for x in data]# Output: [1, 5, 0, 3, 0]
# Nested list comprehension - 2D Matrixmatrix = [[i+j for j in range(3)] for i in range(3)]# Output: [[0,1,2], [1,2,3], [2,3,4]]
💡 Tip: List comprehensions are typically faster than traditional
for
loops and result in more concise code!
📊 1.6. 2D Lists (Matrices) - Two-Dimensional Data Structure
2D Lists are lists where each element is another list. This structure is used to represent grids, tables, or matrices in mathematics and data science.
Creating and Accessing 2D Lists:
# 3x3 Matrixmatrix = [ [1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Access element (row 1, column 2)print(matrix[1][2]) # Output: 6
# Iterate through entire matrixnum_rows = len(matrix)num_cols = len(matrix[0])
for r in range(num_rows): for c in range(num_cols): print(matrix[r][c], end=' ') print() # New line after each row
📈 Interactive 2D Matrix Visualization
🎯 Interactive: Hover over cells to see
[row][col]
position and values!
🗄️ 2. Database - SQL (Part 2): Professional Database Design
This section focuses on the design and structure of relational databases, with emphasis on Entity Relationship Diagrams (ERD) and Database Normalization - two core skills for every Database Developer.
🎯 2.1. Entity Relationship Diagram (ERD) - Visual Language
ERD is a visual model that represents the structure of a database. It illustrates entities, attributes, and relationships between them.
Component | Symbol | Description | Real-World Example |
---|---|---|---|
Entity | Rectangle | Object or concept in the real world | Customer, Product, Order |
Weak Entity | Double rectangle | Entity that cannot be uniquely identified on its own | OrderDetail (depends on Order) |
Attribute | Oval | Property or characteristic of an entity | first_name, unit_price, email |
Key Attribute | Underlined oval | Unique identifier attribute (Primary Key) | customer_id, product_code |
Multivalued Attribute | Double oval | Attribute that can contain multiple values | phone_numbers, skills |
Derived Attribute | Dashed oval | Attribute that can be calculated from other attributes | age (from birth_date), total_amount |
Relationship | Diamond | Connection between two or more entities | Customer places Order, Product belongs to Category |
🔄 Weak Entities in ERD
A Weak Entity is an entity that cannot be uniquely identified by its attributes alone and must rely on a relationship with another entity.
Characteristics:
- Represented by a double rectangle (double-lined box)
- Depends on a strong (parent) entity
- Primary key is a combination of foreign key (link to strong entity) and a discriminator (partial key)
Real-world Examples:
OrderDetail
as a weak entity dependent onOrder
Seat
in aFlight
(seat numbers only make sense within the context of a specific flight)RoomNumber
in aBuilding
(room numbers are only meaningful within a specific building)
The relationship between weak entities and their strong entities:
Strong Entity | Relationship | Weak Entity | Discriminator |
---|---|---|---|
Order | Contains | OrderDetail | Product (part of composite key) |
Book | Has | Chapter | Chapter Number |
Employee | Manages | Dependent | Dependent Name |
💡 Note: A weak entity always has an existence dependency relationship with its strong entity. This means that if the strong entity is deleted, all associated weak entities must also be deleted.
📊 Interactive ERD Visualization - Student Enrollment System
💡 Interactive: Hover over components to see details about Entity, Attribute, and Relationship!
🔗 Cardinality (Relationships)
Cardinality defines the number of entities that can participate in a relationship:
Type | Symbol | Description | Real-World Example |
---|---|---|---|
One-to-One (1:1) | 1:1 | Each entity relates to exactly one other entity | Person - Passport |
One-to-Many (1 | 1 | One entity can relate to multiple other entities | Customer - Orders |
Many-to-Many (M | M | Multiple entities can relate to multiple other entities | Student - Courses |
⚠️ Note: M
relationships are typically resolved by creating a junction table (associative entity).
🏗️ 2.2. Database Normalization - Structural Excellence
Normalization is the process of organizing columns and tables in a relational database to minimize data redundancy. It involves dividing large tables into smaller, well-structured tables and defining relationships between them.
Why Normalize?
✅ Minimize Redundancy: Avoid storing the same data in multiple places.
✅ Prevent Anomalies:
🚫 Database Anomalies - Why Normalization Matters
Non-normalized data typically leads to three types of anomalies:
1. Insertion Anomaly:
- Problem: Cannot add new data because related information is missing
- Example: In a combined
StudentCourses
table, you cannot add a new student if they haven’t enrolled in any courses yet (because course information is required) - Solution: Split into separate
Students
andEnrollments
tables
2. Update Anomaly:
- Problem: A change requires updating multiple rows, easy to create inconsistency
- Example: Changing an instructor’s name requires updating it in every student record for that course
- Solution: Extract instructor information to a separate table and use foreign keys
3. Deletion Anomaly:
- Problem: Deleting a record accidentally removes other essential information
- Example: Removing the record of the last student in a course would also delete all information about that course
- Solution: Separate data into linked tables
⚠️ Real-world Impact: These anomalies not only cause data integrity issues but can lead to permanent data loss, maintenance difficulties, and challenges when scaling your database.
Normal Forms:
🥇 First Normal Form (1NF)
- ✅ Table must have a Primary Key
- ✅ Each cell contains only single, atomic values (no lists or repeating groups)
- ✅ All columns must contain values of the same data type
🥈 Second Normal Form (2NF)
- ✅ Must satisfy 1NF
- ✅ All non-key attributes must be fully functionally dependent on the entire Primary Key
- ✅ Eliminate Partial Dependencies (applies to Composite Primary Keys)
🥉 Third Normal Form (3NF)
- ✅ Must satisfy 2NF
- ✅ No Transitive Dependencies
- ✅ Non-key attributes must not depend on other non-key attributes
- ✅ Example:
StudentID → ZipCode → City
is a transitive dependency
📊 Interactive Normalization Process Visualization
🔧 3. Advanced Data Structures
This section delves into other built-in Python data structures: Tuples, Sets, and Dictionaries.
🎯 3.1. Tuples - Immutable Sequences
Tuples are ordered collections of items, similar to lists. The key difference is that tuples are immutable, meaning they cannot be changed after creation.
Characteristics:
- Ordered: Items have a defined order and maintain that order
- Immutable: Cannot be changed after creation
- Allow Duplicates: Can contain duplicate values
- Heterogeneous: Can store different data types
Syntax: Created with parentheses ()
Use Cases:
- Ideal for data that should not change
- Returning multiple values from functions
- Using as dictionary keys (since they’re immutable)
- Representing coordinates or fixed data structures
# Creating a tupleperson = ("Camedinh", 2004, "Student")
# Accessing elements (same as lists)print(person[0]) # Output: Camedinh
# Tuples are immutable# person[0] = "Hacked" # This will raise a TypeError
# Unpacking a tuplename, age, occupation = personprint(age) # Output: 2004
# Creating a single-item tuple (note the comma)single_tuple = (42,)
# Tuple methodscoordinates = (10, 20, 10, 30)print(coordinates.count(10)) # Output: 2print(coordinates.index(20)) # Output: 1
🎯 3.2. Sets - Unique Collections
Sets are unordered collections of unique items. They are mutable and perfect for mathematical set operations.
Characteristics:
- Unordered: No defined order (items can appear in any sequence)
- Mutable: Can add and remove items
- No Duplicates: Automatically removes duplicate values
- Fast Membership Testing: O(1) average time complexity
Syntax: Created with curly braces {}
or set()
Use Cases:
- Membership testing
- Removing duplicates from data
- Mathematical set operations (union, intersection, difference)
- Finding unique elements
# Creating a set (duplicates are automatically removed)animals = {"cat", "dog", "tiger", "cat"}print(animals) # Output: {'cat', 'dog', 'tiger'}
# Adding itemsanimals.add("bear")animals.update({"chicken", "duck"})
# Set Operationsset1 = {1, 2, 3}set2 = {3, 4, 5}
# Union (all unique elements from both sets)print(set1 | set2) # Output: {1, 2, 3, 4, 5}print(set1.union(set2)) # Alternative syntax
# Intersection (common elements)print(set1 & set2) # Output: {3}print(set1.intersection(set2)) # Alternative syntax
# Difference (elements in set1 but not in set2)print(set1 - set2) # Output: {1, 2}print(set1.difference(set2)) # Alternative syntax
# Symmetric difference (elements in either set, but not both)print(set1 ^ set2) # Output: {1, 2, 4, 5}
# Membership testing (very fast)print("cat" in animals) # Output: True
🎯 3.3. Dictionaries - Key-Value Powerhouse
Dictionaries are ordered (as of Python 3.7+) collections of key-value pairs. They are mutable and provide fast lookups.
Characteristics:
- Ordered: Maintain insertion order (Python 3.7+)
- Mutable: Can add, modify, and remove items
- Keys must be unique: Each key can appear only once
- Keys must be immutable: Strings, numbers, tuples (but not lists)
- Fast Access: O(1) average time complexity for lookups
Syntax: Created with curly braces {key: value}
Use Cases:
- Storing structured data with fast lookups
- Mapping relationships between data
- Caching and memoization
- Configuration settings
- Counting and frequency analysis
🔄 3.3.1. Shallow vs Deep Copy in Dictionaries
When working with dictionaries, especially those containing nested data structures, understanding the difference between shallow and deep copying is crucial:
Copy methods:
-
Assignment (=): Creates a new reference to the same dictionary object
original_dict = {'name': 'Alice', 'grades': [90, 85, 95]}reference = original_dict # Same reference to one objectreference['name'] = 'Bob' # Change affects original_dictprint(original_dict['name']) # Output: 'Bob' -
Shallow Copy (.copy()): Creates a new dictionary object, but for nested objects (like lists), only references are copied
original_dict = {'name': 'Alice', 'grades': [90, 85, 95]}shallow = original_dict.copy() # or dict(original_dict)# Changing simple values doesn't affect the originalshallow['name'] = 'Bob'print(original_dict['name']) # Output: 'Alice'# But changing nested objects WILL affect original!shallow['grades'].append(70)print(original_dict['grades']) # Output: [90, 85, 95, 70] -
Deep Copy (copy.deepcopy()): Creates a completely independent copy of the dictionary and all nested objects
import copyoriginal_dict = {'name': 'Alice', 'grades': [90, 85, 95]}deep = copy.deepcopy(original_dict)# Changes to nested objects will NOT affect originaldeep['grades'].append(70)print(original_dict['grades']) # Output: [90, 85, 95]
💡 When to use Deep Copy? Use deep copy when you need a completely independent copy, especially when working with complex data and you don’t want changes to the copy affecting the original data.
📊 Interactive Data Structure Comparison - List vs Dictionary
💡 Interactive: Hover over list indices or dictionary keys to see access patterns!
Code Examples:
# Creating a dictionaryparams = { 'learning_rate': 0.1, 'optimizer': 'Adam', 'metric': 'Accuracy'}
# Accessing a value by keyprint(params['optimizer']) # Output: Adam
# Using .get() to avoid errors for non-existent keysprint(params.get('batch_size', 32)) # Output: 32 (default value)
# Adding or updating an itemparams['epochs'] = 100params['learning_rate'] = 0.05
# Iterating over a dictionaryfor key, value in params.items(): print(f"{key}: {value}")
🎯 3.4. Data Structure Applications
Image Histogram with Dictionaries: A dictionary can be used to count pixel intensity frequencies. The pixel value is the key, and its count is the value.
Text Vectorization with Sets/Dictionaries: A set can create a unique vocabulary from a text corpus. A dictionary can then map each unique word to an integer index for machine learning.
Non-Maximum Suppression (NMS) with Tuples/Lists: NMS in object detection uses lists of bounding boxes. Each box can be represented as a tuple (x_min, y_min, x_max, y_max, confidence_score)
, which is immutable and perfect for this data.
🌿 4. Git & GitHub for Version Control
Version Control Systems (VCS) track and manage changes to files over time. Git is a powerful, distributed VCS. GitHub is a web-based hosting service for Git repositories.
🎯 4.1. Git Fundamentals
Core Concepts:
- VCS (Version Control System): A system that records changes to a file or set of files over time so that you can recall specific versions later.
- Distributed VCS (DVCS): Systems (like Git) where clients don’t just check out the latest snapshot of the files; they fully mirror the repository.
- Repository (Repo): A directory where your project’s files and their revision history are stored.
- Commit: A snapshot of your files at a specific point in time.
- Branch: A lightweight movable pointer to a commit. It allows for developing features in isolation.
main
is the default branch. - SHA-1 Hash: A unique 40-character identifier for every commit, file, and object in Git, ensuring data integrity.
🔄 Interactive Git Workflow Visualization
🔄 Interactive: Hover over each stage to understand the Git workflow process!
🛡️ 4.1.1. Core Principles of Git
Git is built upon revolutionary foundational principles of version control:
-
Snapshots, Not Differences: Unlike traditional VCS systems that store differences (diffs) between versions, Git stores complete “snapshots” of the entire project at each commit. This enables faster operations and efficient branching.
-
Data Integrity: Everything in Git is checksummed using SHA-1 hash before it is stored. This means:
- Content cannot be changed without Git detecting it
- Each commit has a unique SHA-1 hash identifier, ensuring consistency
- Protection against transport errors and prevents history tampering
-
Offline-First Design: Most operations in Git (commits, branches, merges) occur locally without needing internet connection. Benefits include:
- Significantly faster processing
- Flexibility to work without network dependency
- Autonomy in deciding when to synchronize with remote repositories
📘 Design Philosophy: Git was designed by Linus Torvalds with a focus on speed, simple design, support for non-linear development, and the ability to handle large projects efficiently.
💻 Basic Git Workflow
Initialize or Clone a Repository:
# Initialize a new repository in the current directorygit init
# Clone an existing repository from a URLgit clone https://github.com/user/repo.git
Make Changes and Stage Them:
# Check the status of your filesgit status
# Add a specific file to the staging areagit add <filename>
# Add all modified files to the staging areagit add .
Commit the Changes:
# Commit staged changes with a messagegit commit -m "Your descriptive commit message"
Work with Remotes:
# List your remote repositories (usually 'origin')git remote -v
# Push your committed changes to the remote repositorygit push origin main
# Pull changes from the remote repositorygit pull origin main
🌿 4.2. Git Branching & Merging
Branching is a core feature of Git that allows for parallel development.
🌿 Interactive Git Branching & Merging
🌿 Interactive: Hover over commits to see details and understand the complete branching workflow!
Branching Workflow:
# Create a new branch and switch to itgit checkout -b feature-x
# Work on the feature branch (add, commit)git add .git commit -m "Add new feature"
# Switch back to the main branchgit checkout main
# Merge the feature branch into maingit merge feature-x
# Delete the feature branch (optional)git branch -d feature-x
Conflict Resolution:
If the same lines are changed in both branches, a merge conflict occurs. Git will mark the conflicted areas in the file. You must manually edit the file to resolve the conflict, then git add
and git commit
to finalize the merge.
🔀 4.4. Popular Branching Models
Model | Description | Best For |
---|---|---|
GitFlow | Robust model with dedicated branches for features, releases, and hotfixes | Projects with scheduled releases, complex workflows |
GitHub Flow | Simple model where main is always deployable. Feature branches merged via PRs | Continuous deployment, CI/CD pipelines |
Trunk-Based Development | Small, frequent commits to single main branch with feature flags | High-frequency releases, advanced teams |
🤝 4.5. Collaboration with GitHub Fork & Pull Request Workflow
Key Concepts:
- Fork: Create a personal copy of someone else’s repository
- Pull Request (PR): Propose changes from your fork or branch back to the original repository
Complete Workflow:
- Fork the target repository on GitHub
- Clone your forked repository to your local machine
- Create a new branch for your changes
- Make your changes, commit, and push to your forked repository
- Open a Pull Request from your branch to the original repository’s main branch
- Participate in code review and discussion. Once approved, your changes will be merged
# Step 1: Fork on GitHub (done through web interface)
# Step 2: Clone your forkgit clone https://github.com/YOUR_USERNAME/REPO_NAME.git
# Step 3: Create feature branchgit checkout -b feature/new-login
# Step 4: Make changes and commitgit add .git commit -m "Add login functionality"git push origin feature/new-login
# Step 5: Create Pull Request (done through GitHub web interface)
🧠 5. Practical Exercises
This section summarizes the problems and solution approaches from the TA session.
🔍 5.1. Sliding Window Technique
Problem: Given a list of numbers and a window size k, find the maximum value in the sliding window as it moves from left to right.
Solution Approach:
- Iterate through the list to create slices (sub-lists) of size k
- For each sub-list, calculate the
max()
value - Append the result to a new list
- The number of iterations will be
len(num_list) - k + 1
def max_kernel(num_list, k): result = [] for i in range(len(num_list) - k + 1): kernel = num_list[i : i + k] result.append(max(kernel)) return result
# Exampleprint(max_kernel([3, 4, 5, 1, -44], 3)) # Output: [5, 5, 5]
📊 5.2. Two Pointers Technique
Problem: Count the occurrences of each character in a given string.
Solution Approach:
- Initialize an empty dictionary
- Iterate through each character of the string
- Use the character as a key in the dictionary. If the key exists, increment its value. If not, add it to the dictionary with a value of 1
- It’s good practice to convert the string to lowercase first to treat A and a as the same character
def count_characters(word): char_stats = {} for char in word.lower(): char_stats[char] = char_stats.get(char, 0) + 1 return char_stats
# Exampleprint(count_characters("Baby")) # Output: {'b': 2, 'a': 1, 'y': 1}
📝 5.3. Dynamic Programming
Problem: Count the occurrences of each word in a text file.
Solution Approach:
- Read the content of the file
- Preprocess the text: convert to lowercase, remove punctuation
- Split the text into a list of words
- Use the same dictionary-based counting method as in character counting
def count_words_in_file(filepath): with open(filepath, 'r') as f: text = f.read()
# Preprocessing text = text.lower() text = text.replace('.', '').replace(',', '') words = text.split()
word_counts = {} for word in words: word_counts[word] = word_counts.get(word, 0) + 1 return word_counts
🧮 5.4. Levenshtein Distance
Problem: Calculate the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other.
Solution Approach (Dynamic Programming):
- Create a 2D matrix (a list of lists) of size
(len(source)+1) x (len(target)+1)
- Initialize the first row and column with 0, 1, 2, … representing the cost of transforming an empty string to a prefix of the target/source
- Iterate through the matrix. For each cell
D[i, j]
, calculate the cost:- If
source[i-1] == target[j-1]
, the cost isD[i-1, j-1]
(no operation) - Otherwise, the cost is
1 + min(D[i-1, j], D[i, j-1], D[i-1, j-1])
, which corresponds to the minimum cost of a deletion, insertion, or substitution
- If
- The final value at
D[len(source)][len(target)]
is the Levenshtein distance
🧮 5.4.1. Recursive Formula for Levenshtein Distance
The Levenshtein Distance algorithm is based on dynamic programming principles. The distance is calculated by building a matrix D, where D[i,j] represents the distance between the first i characters of source and the first j characters of target.
Recursive Formula:
-
Base Cases:
- D[i,0] = i (cost of transforming i characters to an empty string)
- D[0,j] = j (cost of transforming an empty string to j characters)
-
General Formula: For each cell D[i,j]:
D[i,j] = min(D[i-1,j] + 1, // Deletion (remove character from source)D[i,j-1] + 1, // Insertion (add character to source)D[i-1,j-1] + cost // Substitution/Match)Where
cost = 0
if source[i-1] = target[j-1], otherwisecost = 1
.
Practical Applications:
- Spell checking and correction
- Fuzzy search functionality
- Plagiarism detection
- DNA and protein sequence comparison in computational biology
🔍 Implementation Details: This algorithm has a time complexity of O(m×n) and space complexity of O(m×n), where m and n are the lengths of the two strings. However, when only the final value is needed (not the edit path), the space complexity can be reduced to O(min(m,n)).
🧮 Interactive Levenshtein Distance Matrix
🎯 6. Week 2 Learning Assessment Matrix
📊 Interactive Self-Assessment Tool
🎉 Conclusion
Week 2 of AIO2025 has covered fundamental building blocks of programming and data management:
🔑 Key Takeaways
- Python Lists Mastery: From basic operations to advanced comprehensions and 2D matrices
- Database Design Excellence: ERD modeling and normalization principles for robust data architecture
- Advanced Data Structures: Tuples, sets, and dictionaries for efficient data manipulation
- Version Control Proficiency: Git and GitHub for collaborative development
- Problem-Solving Skills: Practical algorithms from sliding windows to dynamic programming
🚀 Next Steps
- Practice implementing the covered algorithms with different datasets
- Design your own database schema using ERD principles
- Create a personal Git repository for your AIO2025 journey
- Combine these skills in real-world projects
📚 Additional Resources
- Python Documentation: docs.python.org
- Database Design: “Database Design for Mere Mortals” by Michael Hernandez
- Git Learning: Git Interactive Tutorial
- Practice Problems: LeetCode, HackerRank algorithm challenges
💪 Challenge: Try implementing a complete student management system using all the concepts learned this week - from data structures to database design to version control!
Happy coding, and see you in Week 3 of the AIO2025 journey! 🎯