Image Representation

What is a Digital Image?

A digital image is a 2D array (matrix) of pixels:

  • Grayscale: 2D array, each pixel has a single intensity value (0-255)
  • Color (RGB): 3D array with 3 channels (Red, Green, Blue)
  • Data type: Usually uint8 (unsigned 8-bit integer, range 0-255)

Image coordinates:

  • Origin (0,0) is top-left corner
  • First index = row (y-axis), Second index = column (x-axis)
  • Format: image[row, col] or image[y, x]
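
A minimal sketch of the row-first convention, using a small synthetic array rather than a real image. The array size and the names row and col are illustrative assumptions:

```python
import numpy as np

# Synthetic 4-row x 6-column grayscale image (all black)
img = np.zeros((4, 6), dtype=np.uint8)

row, col = 1, 4          # y = 1, x = 4
img[row, col] = 255      # index as image[y, x], not image[x, y]

height, width = img.shape
print(height, width)     # 4 6
print(img[1, 4])         # 255

# Swapping the indices addresses a different pixel, or fails
# outright when the swapped index is out of range (as it is here).
try:
    img[col, row]
except IndexError as exc:
    print("IndexError:", exc)
```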

Pixel Intensity Values

Grayscale images:

  • 0 = pure black
  • 255 = pure white
  • 128 = middle gray

Color images (BGR in OpenCV):

  • [0, 0, 0] = black
  • [255, 255, 255] = white
  • [255, 0, 0] = blue, [0, 255, 0] = green, [0, 0, 255] = red
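
The channel order matters when the same values are handed to a library that expects RGB. The sketch below uses plain NumPy slicing on a one-pixel array as a stand-in for a loaded image. OpenCV also provides cv2.cvtColor(img, cv2.COLOR_BGR2RGB) for the same conversion.

```python
import numpy as np

# One-pixel "image" holding pure blue in OpenCV's BGR order
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reversing the last axis turns BGR into RGB (and vice versa)
rgb = bgr[..., ::-1]

print(bgr[0, 0].tolist())  # [255, 0, 0]  -> blue when read as BGR
print(rgb[0, 0].tolist())  # [0, 0, 255]  -> blue when read as RGB
```

Note that the slice is a view on the original data, so call .copy() on it if an independent array is needed.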

Image as NumPy Array

  import cv2
  import numpy as np
  # images are NumPy arrays
  img = np.array([[0, 50, 100],
                  [150, 200, 255]], dtype=np.uint8)
  #ans: 2x3 grayscale image
  print(img.shape)
  #ans: (2, 3) - 2 rows, 3 columns
  print(img.dtype)
  #ans: uint8 - values 0-255

Loading Images

  # load a color image
  img = cv2.imread('image.jpg')
  #ans: loads as BGR format (not RGB!)
  # cv2.imread returns None (no exception) if the file cannot be read
  if img is None:
      raise FileNotFoundError('image.jpg')
  print(img.shape)
  #ans: (height, width, channels)
  height, width, channels = img.shape
  #ans: height=rows, width=cols, channels=3
  # load as grayscale
  gray = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
  #ans: loads as grayscale
  print(gray.shape)
  #ans: (height, width) - no channel dimension
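
Because grayscale arrays have no channel dimension, unpacking three values from shape fails on them. One way to handle both cases is sketched below. describe_shape is a hypothetical helper, not an OpenCV function, and the zero arrays stand in for what cv2.imread would return:

```python
import numpy as np

def describe_shape(img):
    """Return (height, width, channels) for a 2D or 3D image array."""
    if img.ndim == 2:
        # Grayscale: shape is (height, width); report a single channel
        height, width = img.shape
        return height, width, 1
    height, width, channels = img.shape
    return height, width, channels

# Stand-ins for a color image and a grayscale image
color = np.zeros((480, 640, 3), dtype=np.uint8)
gray = np.zeros((480, 640), dtype=np.uint8)

print(describe_shape(color))  # (480, 640, 3)
print(describe_shape(gray))   # (480, 640, 1)
```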

Accessing Pixels

  # accessing individual pixels
  img = cv2.imread('image.jpg')
  pixel = img[100, 50]
  #ans: pixel at row 100, col 50
  print(pixel)
  #ans: [B, G, R] values (3 numbers)
  # modify a pixel
  img[100, 50] = [255, 0, 0]
  #ans: sets pixel to blue (BGR format)

Creating Blank Images

  # create blank black image
  black = np.zeros((480, 640, 3), dtype=np.uint8)
  #ans: 480x640 black image (all zeros)
  # create white image
  white = np.ones((480, 640, 3), dtype=np.uint8) * 255
  #ans: 480x640 white image (all 255s)
  # create colored image
  red = np.zeros((480, 640, 3), dtype=np.uint8)
  red[:, :, 2] = 255
  #ans: red image (3rd channel = red in BGR)

Image Memory Size

Formula: height × width × channels × bytes_per_value (1 byte per value for uint8)

Example: 1920×1080 RGB image

  • 1920 × 1080 × 3 × 1 = 6,220,800 bytes ≈ 6.2 MB
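
The formula above can be checked against NumPy's own bookkeeping. In this sketch, image_nbytes is a hypothetical helper name, and the float32 case is added only to show how the data type changes the result:

```python
import numpy as np

def image_nbytes(height, width, channels, dtype=np.uint8):
    """Uncompressed size in bytes of an image held in memory."""
    return height * width * channels * np.dtype(dtype).itemsize

# 1920x1080 image: 1080 rows, 1920 columns, 3 channels
img = np.zeros((1080, 1920, 3), dtype=np.uint8)

print(image_nbytes(1080, 1920, 3))              # 6220800
print(img.nbytes)                               # 6220800
print(image_nbytes(1080, 1920, 3, np.float32))  # 24883200
```

This is the size of the array in memory. A JPEG or PNG file on disk is usually smaller because those formats compress the data.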

Why uint8?

  • Saves memory (1 byte vs 4 bytes for int32)
  • 256 levels per channel are generally enough for gradients to look smooth on typical displays
  • Default bit depth of common image formats (JPEG, and PNG in its usual 8-bit mode)

Exercises - Part 1 (Concepts)

  # what is the shape?
  img = np.zeros((300, 400, 3))
  #ans: (300, 400, 3) - 300 rows, 400 cols, 3 channels
  # how many pixels total?
  #ans: 300 × 400 = 120,000 pixels
  # how many values in total?
  #ans: 300 × 400 × 3 = 360,000 values
  # what color is pixel [255, 255, 0] in BGR?
  #ans: cyan (blue=255, green=255, red=0); yellow would be [0, 255, 255]

Exercises - Part 2 (Concepts)

  # why use uint8 instead of int?
  #ans: saves memory, 0-255 range sufficient for images
  # what's the difference?
  gray = np.zeros((100, 100))
  color = np.zeros((100, 100, 3))
  #ans: gray is 2D, color is 3D with 3 channels
  # which is the height if img.shape returns (480, 640, 3)?
  #ans: 480 is height (first dimension = rows)
  # what happens with value 300 in uint8?
  img = np.array([[300]]).astype(np.uint8)
  #ans: wraps around, 300 % 256 = 44
  # note: recent NumPy versions reject np.array([[300]], dtype=np.uint8)
  # with an OverflowError instead of wrapping silently
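
The overflow question above can be explored with a short sketch that contrasts wraparound with clipping. It casts with astype, which wraps modulo 256, and the sample values are arbitrary:

```python
import numpy as np

# Values outside the uint8 range, stored in a wider integer type first
values = np.array([0, 255, 256, 300, -1])

# astype wraps around modulo 256
wrapped = values.astype(np.uint8)

# Clipping to [0, 255] before the cast saturates instead of wrapping
clipped = np.clip(values, 0, 255).astype(np.uint8)

print(wrapped.tolist())  # [0, 255, 0, 44, 255]
print(clipped.tolist())  # [0, 255, 255, 255, 0]
```

Clipping is usually the safer choice after arithmetic on pixel values, because a wrapped result turns very bright pixels dark and vice versa.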

Exercises - Part 3 (Coding)

  # create 100x100 grayscale image
  gray = np.zeros((100, 100), dtype=np.uint8)
  #ans: black grayscale image
  # create 100x100 color image
  color = np.zeros((100, 100, 3), dtype=np.uint8)
  #ans: black color image with 3 channels
  # access pixel at row 10, col 20
  img = np.random.randint(0, 256, (50, 50, 3), dtype=np.uint8)
  pixel = img[10, 20]
  #ans: array with 3 BGR values

Exercises - Part 4 (Coding)

  # create green image (BGR format)
  green = np.zeros((100, 100, 3), dtype=np.uint8)
  green[:, :, 1] = 255
  #ans: green channel (index 1) set to 255
  # create half-white half-black image
  img = np.zeros((100, 100), dtype=np.uint8)
  img[:50, :] = 255
  #ans: top 50 rows white, bottom 50 rows black
  # create solid cyan image (blue + green)
  cyan = np.zeros((100, 100, 3), dtype=np.uint8)
  cyan[:, :, 0] = 255  # blue
  cyan[:, :, 1] = 255  # green
  #ans: cyan image [255, 255, 0]

Exercises - Part 5 (Mixed)

  # how to get image dimensions?
  height = img.shape[0]
  width = img.shape[1]
  channels = img.shape[2]  # color images only; grayscale has no third dimension
  #ans: shape gives (height, width, channels)
  # what's the memory size of 1024x768 RGB image?
  #ans: 1024 × 768 × 3 = 2,359,296 bytes ≈ 2.4 MB
  # create image using np.full
  img = np.full((100, 100, 3), [255, 0, 0], dtype=np.uint8)
  #ans: blue image (all pixels [255, 0, 0])
  # what is img.ndim for grayscale?
  gray = np.zeros((100, 100))
  print(gray.ndim)
  #ans: 2 (2-dimensional array)
