🎯 Learning Objectives: Master NumPy arrays, understand NoSQL databases, implement cosine similarity, and develop systematic problem-solving skills for data science applications.
This comprehensive guide covers the essential topics from Week 5 of the AIO2025 course with interactive visualizations, practical examples, and hands-on exercises designed to solidify your understanding of fundamental data science concepts.
🎯 1. NumPy Basics
NumPy (Numerical Python) is a fundamental library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays.
💻 1.1. Python Lists vs. NumPy Arrays
While Python lists are versatile, NumPy arrays offer significant advantages in terms of performance, memory, and functionality for numerical operations.
Feature | Python List | NumPy Array | Performance Impact |
---|---|---|---|
Data Type | Heterogeneous (mixed types) | Homogeneous (same type) | 🔥 Type checking overhead eliminated |
Memory | Pointers to objects (overhead) | Contiguous memory block | 🚀 50-100x faster access |
Performance | Slower (type checking) | Optimized C code | ⚡ 10-100x faster operations |
Functionality | Basic operations | Vectorized operations | 📊 Broadcasting & advanced math |
Memory Usage | Higher (pointer overhead) | Lower (direct storage) | 💾 2-10x less memory |
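To see the memory difference concretely, here is a minimal sketch (exact byte counts vary by platform, Python version, and default dtype):

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n)

# A list stores pointers plus a separate Python int object per element
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print(f"Python list: ~{list_bytes} bytes")

# The array stores raw values in one contiguous block
print(f"NumPy array: {np_arr.nbytes} bytes")  # 1000 * 8 = 8000 bytes for int64
```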
🔧 1.2. Array Creation
NumPy provides multiple efficient ways to create arrays, each optimized for different use cases.
📋 Creation Methods Comparison
Method | Use Case | Performance | Memory Efficiency |
---|---|---|---|
np.array() | Convert existing data | ⭐⭐⭐ | ⭐⭐⭐ |
np.zeros() | Initialize with zeros | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
np.ones() | Initialize with ones | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
np.arange() | Sequential numbers | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
np.linspace() | Evenly spaced values | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
From a Python list:
```python
import numpy as np

# Create a list
my_list = [2, 0, 2, 5, 7, 1]

# Convert the list to a NumPy array
my_array = np.array(my_list)
print(my_array)  # [2 0 2 5 7 1]
print(f"Type: {my_array.dtype}, Shape: {my_array.shape}")
# Type: int64, Shape: (6,)
```
Using built-in functions:
```python
# Create an array of 5 floats, initialized with zeros
zeros_arr = np.zeros(5)
print(zeros_arr)  # [0. 0. 0. 0. 0.]

# Create an array of shape (2, 3) filled with ones
ones_arr = np.ones((2, 3))
print(ones_arr)
# [[1. 1. 1.]
#  [1. 1. 1.]]

# Create an array with a range of elements
range_arr = np.arange(0, 10, 2)  # (start, stop, step)
print(range_arr)  # [0 2 4 6 8]

# Create evenly spaced values
linspace_arr = np.linspace(0, 1, 5)  # 5 values from 0 to 1
print(linspace_arr)  # [0.   0.25 0.5  0.75 1.  ]
```
💡 Pro Tip: Use `np.zeros()` or `np.ones()` when you need to initialize large arrays efficiently. They're much faster than creating Python lists first!
🔍 1.3. Indexing and Slicing
Indexing and slicing work similarly to Python lists but can be extended to multiple dimensions with powerful capabilities.
📝 Basic Operations
```python
a_data = np.array([4, 5, 6, 7, 8, 9])

# Accessing elements
print(a_data[2])    # 6
print(a_data[-1])   # 9

# Slicing: array[start:stop:step]
print(a_data[:3])   # [4 5 6] (first 3 elements)
print(a_data[3:])   # [7 8 9] (from index 3 to end)
print(a_data[::2])  # [4 6 8] (every other element)
```
⚠️ Critical Concept: Views vs Copies
Important Note: Unlike Python lists, slices of NumPy arrays are views into the original array, not copies. Modifying a slice will modify the original array.
```python
x = np.array([0., 0.25, 0.5, 0.75, 1.])
y = x[1:4]      # Create a slice (view)
y[-1] = 1000.0  # Modify the slice

print(y)  # [   0.25    0.5  1000.  ]
print(x)  # [   0.      0.25    0.5  1000.      1.  ] -> Original is changed!

# To create a copy, use arr.copy()
z = x[1:4].copy()  # This creates an independent copy
z[0] = 999.0
print(x)  # Original array remains unchanged
```
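If you are unsure whether you are holding a view or a copy, `np.shares_memory()` can tell you; a quick check using the arrays above:

```python
# True means the two arrays overlap in memory (a view); False means independent storage
print(np.shares_memory(x, x[1:4]))         # True  -> slice is a view
print(np.shares_memory(x, x[1:4].copy()))  # False -> copy owns its own memory
```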
🎯 Advanced Indexing Examples
```python
# Boolean indexing
arr = np.array([1, 2, 3, 4, 5, 6])
mask = arr > 3
print(arr[mask])     # [4 5 6]

# Fancy indexing with arrays
indices = np.array([0, 2, 4])
print(arr[indices])  # [1 3 5]

# Conditional replacement
arr[arr > 4] = 0
print(arr)           # [1 2 3 4 0 0]
```
The rows below use `arr = np.array([4, 5, 6, 7, 8, 9])` (the `a_data` values from above):

Operation | Syntax | Result | Memory Impact |
---|---|---|---|
Single Element | arr[2] | 6 | No extra memory |
Slice (View) | arr[1:4] | [5 6 7] | Shared memory ⚠️ |
Slice (Copy) | arr[1:4].copy() | [5 6 7] | New memory allocation |
Boolean Mask | arr[arr > 5] | [6 7 8 9] | New array created |
Fancy Index | arr[[0,2,4]] | [4 6 8] | New array created |
⚡ 1.4. Basic Operations & Vectorization
NumPy allows for element-wise operations, which is called vectorization. This is much faster than looping through elements as you would with Python lists.
🔥 Vectorization Performance
```python
import numpy as np
import time

# Compare performance: loop vs. vectorization
size = 1000000
a = np.random.random(size)
b = np.random.random(size)

# Python loop approach
start = time.time()
result_loop = []
for i in range(size):
    result_loop.append(a[i] + b[i])
loop_time = time.time() - start

# NumPy vectorization
start = time.time()
result_vectorized = a + b
vectorized_time = time.time() - start

print(f"Loop time: {loop_time:.4f}s")
print(f"Vectorized time: {vectorized_time:.4f}s")
print(f"Speedup: {loop_time/vectorized_time:.1f}x faster!")
```
📊 Common Vectorized Operations
```python
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Element-wise addition
print(arr1 + arr2)    # [5 7 9]

# Element-wise multiplication
print(arr1 * 3)       # [3 6 9]
print(arr1 * arr2)    # [ 4 10 18]

# More operations
print(arr1 ** 2)      # [1 4 9] (power)
print(np.sqrt(arr1))  # [1.    1.414 1.732] (square root)
print(arr1 > 2)       # [False False  True] (comparison)
```
🧮 Mathematical Functions
Function | Description | Example Input | Example Output |
---|---|---|---|
np.sum() | Sum of elements | [1, 2, 3] | 6 |
np.mean() | Average value | [1, 2, 3] | 2.0 |
np.std() | Standard deviation | [1, 2, 3] | 0.816 |
np.min() | Minimum value | [1, 2, 3] | 1 |
np.max() | Maximum value | [1, 2, 3] | 3 |
np.argmin() | Index of minimum | [3, 1, 2] | 1 |
np.argmax() | Index of maximum | [3, 1, 2] | 0 |
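A quick sketch exercising the functions from the table:

```python
arr = np.array([3, 1, 2])

print(np.sum(arr))                     # 6
print(np.mean(arr))                    # 2.0
print(np.std(arr))                     # 0.816...
print(np.min(arr), np.max(arr))        # 1 3
print(np.argmin(arr), np.argmax(arr))  # 1 0
```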
🚀 Performance Tip: Always use NumPy’s vectorized operations instead of Python loops when working with arrays. It can be 10-100x faster!
🎯 2. NumPy Programming: 2D & 3D Data
📐 2.1. Multi-dimensional Arrays
NumPy arrays can have multiple dimensions, making them perfect for representing data like matrices (2D) and RGB images (3D).
🔢 Dimensional Concepts
- 1D Array (Vector): `[1, 2, 3]` → shape `(3,)`
- 2D Array (Matrix): `[[1, 2], [3, 4], [5, 6]]` → shape `(3, 2)` (3 rows, 2 columns)
- 3D Array (Tensor): often used for a collection of matrices, like an RGB image → shape `(height, width, channels)`
🔧 2.2. Common Functions for Multi-dimensional Arrays
reshape(new_shape): Changes the shape of an array without changing its data. The total number of elements must remain the same.
```python
data = np.arange(6)  # [0 1 2 3 4 5]
data_reshaped = data.reshape((2, 3))
print(data_reshaped)
# [[0 1 2]
#  [3 4 5]]
```
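One convenience worth knowing: passing `-1` lets NumPy infer that dimension from the total element count.

```python
auto = np.arange(12).reshape(3, -1)  # NumPy infers the second dimension (4)
print(auto.shape)  # (3, 4)
```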
flatten(): Collapses a multi-dimensional array into a single 1D array.
```python
data_2d = np.array([[1, 2], [3, 4]])
flat_data = data_2d.flatten()
print(flat_data)  # [1 2 3 4]
```
sum(axis=…), max(axis=…), min(axis=…): Perform aggregation along a specified axis.
- axis=0: collapses the rows, producing one result per column (column-wise aggregation).
- axis=1: collapses the columns, producing one result per row (row-wise aggregation).
```python
data = np.array([[1, 2], [3, 4]])
print(np.sum(data, axis=0))  # [4 6] (column sums)
print(np.sum(data, axis=1))  # [3 7] (row sums)
```
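The same axis logic extends to 3D arrays such as images; a small sketch:

```python
img = np.zeros((4, 5, 3))           # (height, width, channels)
print(img.mean(axis=(0, 1)).shape)  # (3,)   -> one mean per color channel
print(img.sum(axis=2).shape)        # (4, 5) -> per-pixel sum across channels
```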
📡 2.3. Broadcasting
Broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. The smaller array is “broadcast” across the larger array so that they have compatible shapes.
📐 Broadcasting Rules
- Dimension Alignment: If arrays don’t have the same number of dimensions, prepend the shape of the lower-dimensional array with 1s
- Size Compatibility: For each dimension, sizes must be equal, or one of them is 1
- Output Shape: The size of each dimension in the output is the maximum of the input arrays
💻 Code Example
```python
import numpy as np

# Example: adding a vector to each row of a matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])   # shape (2, 3)
vector = np.array([10, 20, 30])  # shape (3,)

result = matrix + vector  # Broadcasting happens here
print(result)
# [[11 22 33]
#  [14 25 36]]

# The vector is conceptually stretched to shape (2, 3) to match the matrix
```
🎯 Broadcasting Examples
Array 1 Shape | Array 2 Shape | Result Shape | Compatible? |
---|---|---|---|
(3, 4) | (4,) | (3, 4) | ✅ Yes |
(3, 4) | (3, 1) | (3, 4) | ✅ Yes |
(3, 4) | (1, 4) | (3, 4) | ✅ Yes |
(3, 4) | (3, 2) | N/A | ❌ No |
(2, 3, 4) | (3, 4) | (2, 3, 4) | ✅ Yes |
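You can verify the table programmatically with `np.broadcast_shapes()` (available in NumPy 1.20+), which raises a `ValueError` for incompatible shapes:

```python
import numpy as np

print(np.broadcast_shapes((3, 4), (4,)))       # (3, 4)
print(np.broadcast_shapes((3, 4), (3, 1)))     # (3, 4)
print(np.broadcast_shapes((2, 3, 4), (3, 4)))  # (2, 3, 4)

try:
    np.broadcast_shapes((3, 4), (3, 2))
except ValueError as e:
    print("Incompatible:", e)
```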
🖼️ 2.4. Application: Image Representation & Manipulation
Grayscale Image: A 2D NumPy array where each element represents the intensity of a pixel (0=black, 255=white). Shape: (height, width).
RGB Image: A 3D NumPy array. Shape: (height, width, 3). The last dimension represents the three color channels (Red, Green, Blue).
Note: Libraries like OpenCV read images in BGR order by default, while Matplotlib expects RGB. You may need to convert between them: `image_rgb = image_bgr[:, :, ::-1]`.
Brightness Adjustment
Image data is often stored as uint8 (unsigned 8-bit integer, 0-255). Simple addition can cause values to “wrap around” (e.g., 250 + 10 becomes 4, not 255).
```python
import cv2
import numpy as np

# Incorrect way: uint8 addition wraps around (e.g. 250 + 10 -> 4)
image = cv2.imread('image.png')
bright_image = image + 100

# Correct way using np.clip
image = image.astype(np.float32)              # Convert to float to avoid overflow
bright_image = image + 100
bright_image = np.clip(bright_image, 0, 255)  # Clip values to the 0-255 range
bright_image = bright_image.astype(np.uint8)  # Convert back to uint8
```
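An alternative sketch, assuming `image` is still the original `uint8` array: OpenCV's `cv2.add()` uses saturating arithmetic, clamping at 255 instead of wrapping.

```python
# Saturating addition: 250 + 100 clamps to 255 rather than wrapping around
bright_image = cv2.add(image, np.full_like(image, 100))
```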
🗄️ 3. Database - NoSQL
📊 3.1. Introduction to Databases
A database is an organized collection of data. A Database Management System (DBMS) is the software that interacts with users, applications, and the database itself to capture and analyze the data.
🔄 3.2. SQL vs. NoSQL
Aspect | SQL (Relational) | NoSQL (Non-relational) |
---|---|---|
Model | Data is stored in tables with rows and columns | Data can be stored in various models (document, key-value, graph, etc.) |
Schema | Predefined, rigid schema (schema-on-write) | Dynamic or flexible schema (schema-on-read) |
Scalability | Typically scales vertically (increasing power of a single server) | Typically scales horizontally (distributing load across many servers) |
Language | Uses Structured Query Language (SQL) | Varies by database; often called “Not Only SQL” |
Examples | MySQL, PostgreSQL, SQL Server | MongoDB, Redis, Cassandra, Neo4j |
🗂️ 3.3. Types of NoSQL Databases
📄 Document Databases
Store data in documents, similar to JSON objects. Each document contains field-value pairs. The values can be a variety of types, including nested documents and arrays.
- Example: MongoDB
- Use Case: Content management, user profiles
🔑 Key-Value Stores
The simplest model. Every item is stored as a key-value pair.
- Example: Redis, Amazon DynamoDB
- Use Case: Caching, session management
🕸️ Graph Databases
Use nodes and edges to represent and store data. Excellent for exploring relationships between entities.
- Example: Neo4j
- Use Case: Social networks, recommendation engines, fraud detection
📊 Column-Family Stores
Store data in columns rather than rows. Optimized for fast queries over large datasets.
- Example: Cassandra, HBase
- Use Case: Big data analytics, time-series data
🍃 3.4. Introduction to MongoDB
MongoDB is a leading document database.
Database: A container for collections.
Collection: A group of MongoDB documents. It is the equivalent of a table in a relational database.
Document: A set of key-value pairs, represented in a format called BSON (Binary JSON). Documents have a dynamic schema. The `_id` field is a unique primary key, added automatically if not provided.
Example Document:
{ "_id": " ObjectId('...') ", "username": "aivn_student", "course": "AIO2025", "enrollment_date": "ISODate('2025-07-01T00:00:00Z')", "scores": [95, 88, 92], "address": { "city": "Hanoi", "country": "Vietnam" }}
🔍 3.5. Basic MongoDB Query Language (MQL)
Insert a document:
```javascript
db.students.insertOne({ name: "Thai", age: 20, likes: ["AI", "Data"] })
```
Find documents:
```javascript
// Find all documents
db.students.find()

// Find documents where age is greater than 21
db.students.find({ age: { $gt: 21 } })

// Find documents matching two conditions (implicit AND)
db.students.find({ age: { $gt: 21 }, likes: "AI" })
```
Update a document:
```javascript
// Find the first student named "Thai" and set their age to 21
db.students.updateOne(
  { name: "Thai" },
  { $set: { age: 21 } }
)
```
Delete a document:
```javascript
db.students.deleteOne({ name: "Thai" })
```
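If you prefer driving MongoDB from Python, here is a minimal sketch using the official PyMongo driver (assumes `pip install pymongo` and a local `mongod` on the default port; the `school` database name is illustrative):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["school"]  # hypothetical database name

# Insert a document, then query it with the same operators as MQL
db.students.insert_one({"name": "Thai", "age": 20, "likes": ["AI", "Data"]})
for doc in db.students.find({"age": {"$gt": 18}, "likes": "AI"}):
    print(doc)
```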
🎯 4. Measuring Data Similarity: Cosine Similarity
🧮 4.1. Vector Dot Product
The dot product of two vectors A and B can be defined in two ways:
📊 Mathematical Definitions
Algebraic: The sum of the products of the corresponding entries.
A · B = Σ(Aᵢ * Bᵢ)
Geometric: The product of the Euclidean magnitudes of the two vectors and the cosine of the angle between them.
A · B = ||A|| * ||B|| * cos(θ)
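A small numeric check that the two definitions agree (vectors chosen arbitrarily):

```python
import numpy as np

A = np.array([1.0, 2.0, 3.0])
B = np.array([4.0, 5.0, 6.0])

# Algebraic: sum of element-wise products
print(np.sum(A * B))  # 32.0
print(np.dot(A, B))   # 32.0 (same result)

# Geometric: recover cos(θ), then rebuild the dot product from the magnitudes
cos_theta = np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))
print(np.linalg.norm(A) * np.linalg.norm(B) * cos_theta)  # 32.0 again
```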
📐 4.2. Cosine Similarity
By rearranging the geometric definition of the dot product, we get the formula for Cosine Similarity. It measures the cosine of the angle between two non-zero vectors, which indicates their directional similarity.
Cosine Similarity (cs) = cos(θ) = (A · B) / (||A|| * ||B||)
🎯 Interpretation Guide
Value | Interpretation | Geometric Meaning | Use Case |
---|---|---|---|
1 | Perfect similarity | Vectors point in exact same direction | Identical documents |
0 | No similarity | Vectors are orthogonal (90°) | Unrelated topics |
-1 | Perfect dissimilarity | Vectors point in opposite directions | Contradictory content |
0.5 to 1 | High similarity | Small angle between vectors | Related documents |
-0.5 to 0.5 | Moderate similarity | Medium angle | Somewhat related |
🔑 Key Property: Cosine similarity is a measure of orientation, not magnitude. Two vectors with the same orientation but different magnitudes will have a cosine similarity of 1. This makes it perfect for text analysis, where document length varies greatly.
💻 4.3. Python Implementation
```python
import numpy as np

def cosine_similarity(v1, v2):
    """Computes the cosine similarity between two vectors."""
    dot_product = np.dot(v1, v2)
    norm_v1 = np.linalg.norm(v1)
    norm_v2 = np.linalg.norm(v2)

    # Avoid division by zero
    if norm_v1 == 0 or norm_v2 == 0:
        return 0.0

    return dot_product / (norm_v1 * norm_v2)

# Example usage
doc1_vector = np.array([1, 1, 0, 1])  # "AI is fun"
doc2_vector = np.array([1, 1, 1, 0])  # "AI is cool"
similarity = cosine_similarity(doc1_vector, doc2_vector)
print(f"Cosine Similarity: {similarity:.4f}")
# Cosine Similarity: 0.6667
```
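Because cosine similarity measures orientation rather than magnitude (the key property above), scaling a vector leaves the score unchanged; a quick check with the function just defined:

```python
v = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(v, 5 * v))  # 1.0  (same direction, different length)
print(cosine_similarity(v, -v))     # -1.0 (opposite direction)
```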
🧠 5. Logic Thinking and Problem Solving
A structured approach to problem-solving is crucial for the success of any Data Science or AI project.
🔄 5.1. The 7-Step Problem-Solving Framework
This is an iterative process to systematically tackle complex problems.
🔍 Step 1: Define the Problem
Goal: Clearly articulate the problem. A problem is the gap between the current state and the desired state.
Techniques:
- 5W1H: Who, What, Where, When, Why, How
- 5 Whys: Repeatedly ask “Why?” to uncover the root cause
- SMART Goals: Ensure the objective is Specific, Measurable, Achievable, Relevant, and Time-bound
🧩 Step 2: Decompose the Problem
Goal: Break down a complex problem into smaller, more manageable components.
Techniques:
- MECE Principle: Mutually Exclusive, Collectively Exhaustive. Ensure sub-problems don’t overlap and that all parts of the original problem are covered
- Logic Trees: A visual tool to structure the decomposition
⭐ Step 3: Prioritize Issues
Goal: Focus resources on the most critical issues.
Techniques:
- Impact-Feasibility Matrix: A 2x2 grid to plot tasks based on their potential impact and ease of implementation
- Pareto Principle (80/20 Rule): Identify the 20% of causes that are responsible for 80% of the effects
🗄️ Step 4: Data Collection
Goal: Gather the necessary data to analyze hypotheses.
Methods: Interviews, surveys, system logs, databases, A/B testing, etc.
Data Quality: Ensure data is Accurate, Complete, Consistent, Timely, and Valid.
📊 Step 5: Data Analysis
Goal: Extract insights from the data.
Process: Clean data → Exploratory Data Analysis (EDA) → Diagnostic Analysis → Generate actionable insights.
💡 Step 6: Design Solution
Goal: Develop potential solutions based on the analysis.
Techniques: Brainstorming, SCAMPER, prototyping, A/B testing.
🚀 Step 7: Implement & Present
Goal: Execute the chosen solution and communicate the results effectively.
Technique:
- Pyramid Principle: Structure your communication by starting with the main conclusion, followed by supporting arguments, and finally the data evidence
🛠️ 6. TA-Exercise: Practical Applications
This section covers the practical exercises applying the concepts learned.
🎨 6.1. Image Processing: Grayscale Conversion
A color image (3 channels: R, G, B) can be converted to a grayscale image (1 channel) using several methods.
Conversion Methods
Lightness Method: Averages the most and least prominent colors.
Grayscale = (max(R, G, B) + min(R, G, B)) / 2
Average Method: Averages all three channels.
Grayscale = (R + G + B) / 3
Luminosity Method: A weighted average that accounts for human perception (we are more sensitive to green). This is generally the best method.
Grayscale = 0.21*R + 0.72*G + 0.07*B
Implementation (Luminosity):
```python
# Assuming 'img' is an RGB NumPy array of shape (H, W, 3)
# (if it came from cv2.imread, convert BGR -> RGB first, as noted in section 2.4)
gray_img = 0.21 * img[:, :, 0] + 0.72 * img[:, :, 1] + 0.07 * img[:, :, 2]
```
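For completeness, the other two methods can be written the same way; a sketch assuming the same RGB `img`:

```python
img_f = img.astype(np.float32)

# Lightness: average of the strongest and weakest channel per pixel
gray_lightness = (img_f.max(axis=2) + img_f.min(axis=2)) / 2

# Average: plain mean of the three channels
gray_average = img_f.mean(axis=2)
```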
🎬 6.2. Image Processing: Background Subtraction
This technique, often used with green screens, involves replacing the background of one image with another.
📝 Steps & Code Implementation:
1. Read Images
Load the object image, original background, and target background. Ensure they are the same size.
```python
import cv2
import numpy as np

obj_img = cv2.imread('Object.png')
bg1_img = cv2.imread('GreenBackground.png')
bg2_img = cv2.imread('NewBackground.jpg')

# Resize the backgrounds to match the object image; cv2.resize takes (width, height)
IMG_SIZE = (obj_img.shape[1], obj_img.shape[0])
bg1_img = cv2.resize(bg1_img, IMG_SIZE)
bg2_img = cv2.resize(bg2_img, IMG_SIZE)
```
2. Compute Difference
Find the absolute difference between the object image and the original background.
```python
diff = cv2.absdiff(bg1_img, obj_img)
diff_single_channel = np.mean(diff, axis=2)
```
3. Create Binary Mask
Threshold the difference image to create a mask that separates the foreground (object) from the background.
```python
# Where the difference is low (background), the mask is 0; where it's high (object), 255
_, binary_mask = cv2.threshold(diff_single_channel.astype(np.uint8), 15, 255, cv2.THRESH_BINARY)

# Expand to 3 channels so the mask can be applied to a color image
binary_mask_3ch = np.stack((binary_mask,) * 3, axis=-1)
```
4. Replace Background
Use np.where to combine the images. Where the mask is 255 (object), use the object image’s pixels. Otherwise, use the new background’s pixels.
```python
output = np.where(binary_mask_3ch == 255, obj_img, bg2_img)
cv2.imwrite('final_output.png', output)
```
📊 6.3. Tabular Data Analysis
Using NumPy to perform quick analysis on tabular data (e.g., from a CSV file).
```python
import pandas as pd
import numpy as np

# Load data using pandas and convert to a NumPy array
df = pd.read_csv('advertising.csv')
data = df.to_numpy()

# Get the 'Sales' column (last column)
sales = data[:, -1]

# 1. Get the maximum sales value
max_sales = np.max(sales)
print(f"Max Sales: {max_sales}")

# 2. Get the average of the 'TV' column (first column)
tv_ads = data[:, 0]
mean_tv = np.mean(tv_ads)
print(f"Average TV spending: {mean_tv:.2f}")

# 3. Count how many records have Sales >= 20
high_sales_count = np.sum(sales >= 20)
print(f"Number of high sales records: {high_sales_count}")

# 4. Average 'Radio' spending (second column) for records where Sales >= 15
radio_ads = data[:, 1]
avg_radio_for_high_sales = np.mean(radio_ads[sales >= 15])
print(f"Average Radio for high sales: {avg_radio_for_high_sales:.2f}")
```
✅ Learning Checklist
Mark your progress as you work through each section:
- NumPy Fundamentals
  - Understand memory layout differences between Python lists and NumPy arrays
  - Create arrays using various methods (`array()`, `zeros()`, `ones()`, `arange()`)
  - Master indexing, slicing, and the view vs. copy concept
  - Apply vectorized operations for performance
- Multi-dimensional Data
  - Work with 1D, 2D, and 3D arrays confidently
  - Use `reshape()`, `flatten()`, and aggregation functions
  - Understand and apply broadcasting rules
  - Process images as NumPy arrays
- Database Knowledge
  - Compare SQL vs. NoSQL approaches
  - Identify appropriate NoSQL database types for different use cases
  - Write basic MongoDB queries (insert, find, update, delete)
  - Understand document-oriented data modeling
- Data Similarity
  - Calculate dot products both algebraically and geometrically
  - Implement cosine similarity from scratch
  - Interpret similarity scores in practical contexts
  - Apply similarity measures to text analysis problems
- Problem-Solving Skills
  - Follow the 7-step problem-solving framework
  - Decompose complex problems using the MECE principle
  - Use prioritization matrices for decision making
  - Structure communication using the Pyramid Principle
🎯 Action Plan
Week 5 Goals:
- Practice Daily: Spend 30 minutes daily on NumPy array manipulation
- Build Projects: Create 2 mini-projects using the concepts learned
- Apply Knowledge: Use cosine similarity in a real text analysis task
- Document Learning: Write summary notes for each major concept
Next Steps:
- Set up a local Python environment with NumPy and MongoDB
- Download sample datasets for practice
- Join online communities for additional practice problems
- Schedule time for hands-on coding exercises
🔗 Additional Resources
📚 Recommended Reading
- NumPy Official Documentation: numpy.org
- MongoDB Tutorial: mongodb.com/docs
- Linear Algebra for ML: Khan Academy Linear Algebra course
- Problem-Solving Methods: “Thinking, Fast and Slow” by Daniel Kahneman
🛠️ Practice Platforms
- Kaggle Learn: Free micro-courses on data science topics
- LeetCode: Array and database problems
- MongoDB University: Free MongoDB courses
- NumPy Exercises: github.com/rougier/numpy-100
🎉 Key Achievements
After completing this study guide, you should be able to:
- ✅ Optimize Performance: Use NumPy for 10-100x faster numerical operations
- ✅ Handle Big Data: Work with multi-dimensional arrays efficiently
- ✅ Choose Databases: Select appropriate database technologies for projects
- ✅ Measure Similarity: Implement and apply cosine similarity in real applications
- ✅ Solve Problems: Apply systematic approaches to complex data science challenges