Perspective Transform

Document Scanning

Problem: A photo of a document taken at an angle

Solution: Apply a perspective transform to obtain a frontal view

Steps:

  1. Detect document corners (4 points)
  2. Define output rectangle
  3. Compute homography
  4. Warp to frontal view

Result: Rectangular, readable document

Finding Document Corners

  import cv2
  import numpy as np

  gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
  # threshold to binary
  _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
  # find contours
  contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
  # find the largest contour
  largest = max(contours, key=cv2.contourArea)
  # approximate it to a polygon
  epsilon = 0.02 * cv2.arcLength(largest, True)
  approx = cv2.approxPolyDP(largest, epsilon, True)
  # if len(approx) == 4, it's a quadrilateral (likely the document)

Ordering Corner Points

  # corners may be in any order; order them TL, TR, BR, BL
  def order_points(pts):
      rect = np.zeros((4, 2), dtype='float32')
      # sum x + y: TL has the smallest, BR the largest
      s = pts.sum(axis=1)
      rect[0] = pts[np.argmin(s)]   # top-left
      rect[2] = pts[np.argmax(s)]   # bottom-right
      # diff y - x: TR has the smallest, BL the largest
      diff = np.diff(pts, axis=1)
      rect[1] = pts[np.argmin(diff)]   # top-right
      rect[3] = pts[np.argmax(diff)]   # bottom-left
      return rect

  ordered_pts = order_points(approx.reshape(4, 2))
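A quick sanity check of the ordering logic, with deliberately scrambled corners of a slightly tilted quadrilateral (the sample coordinates are made up; the function is repeated so the snippet is self-contained):

```python
import numpy as np

def order_points(pts):
    # order 4 corner points as TL, TR, BR, BL
    rect = np.zeros((4, 2), dtype='float32')
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]     # top-left: smallest x + y
    rect[2] = pts[np.argmax(s)]     # bottom-right: largest x + y
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]  # top-right: smallest y - x
    rect[3] = pts[np.argmax(diff)]  # bottom-left: largest y - x
    return rect

# corners supplied in a scrambled order: BR, BL, TL, TR
pts = np.array([[380, 395], [30, 390], [60, 50], [370, 45]], dtype='float32')
print(order_points(pts))
# rows come out as TL, TR, BR, BL: [60,50], [370,45], [380,395], [30,390]
```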

Computing Output Size

  # compute width and height of the output rectangle
  (tl, tr, br, bl) = ordered_pts
  # width = max of bottom and top edge lengths
  widthA = np.sqrt((br[0] - bl[0])**2 + (br[1] - bl[1])**2)
  widthB = np.sqrt((tr[0] - tl[0])**2 + (tr[1] - tl[1])**2)
  maxWidth = int(max(widthA, widthB))
  # height = max of right and left edge lengths
  heightA = np.sqrt((tr[0] - br[0])**2 + (tr[1] - br[1])**2)
  heightB = np.sqrt((tl[0] - bl[0])**2 + (tl[1] - bl[1])**2)
  maxHeight = int(max(heightA, heightB))
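As a sanity check with made-up coordinates: for corners that already form an axis-aligned rectangle, the edge-length formulas above recover its exact width and height.

```python
import numpy as np

# an axis-aligned 300x200 rectangle, already ordered TL, TR, BR, BL
tl, tr, br, bl = np.float32([[10, 20], [310, 20], [310, 220], [10, 220]])

widthA = np.sqrt((br[0] - bl[0])**2 + (br[1] - bl[1])**2)   # bottom edge
widthB = np.sqrt((tr[0] - tl[0])**2 + (tr[1] - tl[1])**2)   # top edge
maxWidth = int(max(widthA, widthB))

heightA = np.sqrt((tr[0] - br[0])**2 + (tr[1] - br[1])**2)  # right edge
heightB = np.sqrt((tl[0] - bl[0])**2 + (tl[1] - bl[1])**2)  # left edge
maxHeight = int(max(heightA, heightB))

print(maxWidth, maxHeight)  # 300 200
```

For a real, perspective-distorted document the four edges differ in length, which is why the max of the opposing pairs is taken.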

Complete Document Scan

  # source points (detected corners)
  src = ordered_pts
  # destination points (frontal rectangle)
  dst = np.array([[0, 0],
                  [maxWidth - 1, 0],
                  [maxWidth - 1, maxHeight - 1],
                  [0, maxHeight - 1]], dtype='float32')
  # compute the perspective transform
  M = cv2.getPerspectiveTransform(src, dst)
  # apply it
  warped = cv2.warpPerspective(img, M, (maxWidth, maxHeight))

Bird's Eye View

Purpose: Top-down view of scene

Use case: Lane detection, parking assistance, sports analysis

Method: Same as document scan

  1. Define region of interest (trapezoid on road)
  2. Map to rectangle (top-down view)

Bird's Eye View Example

  # road region (trapezoid)
  h, w = img.shape[:2]
  src = np.float32([[w*0.45, h*0.6],
                    [w*0.55, h*0.6],
                    [w*0.9, h],
                    [w*0.1, h]])
  # bird's eye view (rectangle)
  dst = np.float32([[0, 0],
                    [w, 0],
                    [w, h],
                    [0, h]])
  # transform
  M = cv2.getPerspectiveTransform(src, dst)
  birds_eye = cv2.warpPerspective(img, M, (w, h))

Inverse Perspective

Purpose: Map bird's eye view back to original perspective

Method: Use inverse of homography matrix

  # forward transform
  M = cv2.getPerspectiveTransform(src, dst)
  birds_eye = cv2.warpPerspective(img, M, (w, h))
  # inverse transform
  M_inv = cv2.getPerspectiveTransform(dst, src)
  original = cv2.warpPerspective(birds_eye, M_inv, (w, h))
  # or invert the matrix directly
  M_inv = np.linalg.inv(M)

Exercises - Part 1 (Concepts)

  # what is perspective transform used for?
  #ans: document scanning, bird's eye view, rectification
  # how many points are needed?
  #ans: 4 point pairs (source and destination)
  # what is a homography?
  #ans: a 3×3 perspective transformation matrix
  # why order the corner points?
  #ans: to ensure correct TL, TR, BR, BL correspondence
  # what is bird's eye view?
  #ans: a top-down perspective of the scene
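To make the "3×3 matrix" answer concrete, here is a numpy-only sketch of what cv2.getPerspectiveTransform computes: with 4 point pairs, the 8 unknown entries of H (the ninth is fixed to 1) are found by solving a linear system. The helper name find_homography is made up for illustration, and this is not OpenCV's exact implementation; the point pairs are the ones from the coding exercise below.

```python
import numpy as np

def find_homography(src, dst):
    # solve the 8 unknowns of H (with H[2,2] fixed to 1) from 4 point pairs:
    # for each pair (x, y) -> (u, v), two linear equations in the entries of H
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

src = [(56, 65), (368, 52), (28, 387), (389, 390)]
dst = [(0, 0), (300, 0), (0, 300), (300, 300)]
H = find_homography(src, dst)

# applying H to a source corner (with perspective division) hits its destination
p = H @ np.array([56, 65, 1.0])
print(p[:2] / p[2])   # ~ [0, 0]
```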

Exercises - Part 2 (Concepts)

  # how to compute the output rectangle size?
  #ans: measure edge lengths in the source, take the max width/height
  # what is inverse perspective?
  #ans: transforming back from the warped view to the original view
  # how to get the inverse homography?
  #ans: cv2.getPerspectiveTransform(dst, src) or np.linalg.inv(M)
  # document scan steps?
  #ans: detect corners, order points, compute homography, warp
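The inverse-homography answer can be checked numerically: mapping a point through H and then through np.linalg.inv(H), with perspective division both times, returns the original point. A minimal numpy-only sketch with an arbitrary invertible H:

```python
import numpy as np

def apply_h(H, pt):
    # apply a 3x3 homography to a 2D point, with perspective division
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([x / w, y / w])

# an arbitrary invertible homography (values chosen for illustration)
H = np.array([[1.2,  0.1,  30.0],
              [0.05, 0.9,  10.0],
              [1e-4, 2e-4, 1.0]])

p = np.array([120.0, 80.0])
q = apply_h(H, p)                    # forward mapping
back = apply_h(np.linalg.inv(H), q)  # inverse mapping: round-trip recovers p
print(back)  # ~ [120, 80]
```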

Exercises - Part 3 (Coding)

  # perspective transform with 4 point pairs
  pts1 = np.float32([[56, 65], [368, 52], [28, 387], [389, 390]])
  pts2 = np.float32([[0, 0], [300, 0], [0, 300], [300, 300]])
  M = cv2.getPerspectiveTransform(pts1, pts2)
  result = cv2.warpPerspective(img, M, (300, 300))

Exercises - Part 4 (Coding)

  # order points: TL, TR, BR, BL
  def order_points(pts):
      rect = np.zeros((4, 2), dtype='float32')
      s = pts.sum(axis=1)
      rect[0] = pts[np.argmin(s)]     # TL
      rect[2] = pts[np.argmax(s)]     # BR
      diff = np.diff(pts, axis=1)
      rect[1] = pts[np.argmin(diff)]  # TR
      rect[3] = pts[np.argmax(diff)]  # BL
      return rect

Exercises - Part 5 (Mixed)

  # bird's eye view transform and its inverse
  h, w = img.shape[:2]
  src = np.float32([[w*0.45, h*0.6], [w*0.55, h*0.6], [w*0.9, h], [w*0.1, h]])
  dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
  M = cv2.getPerspectiveTransform(src, dst)
  birds_eye = cv2.warpPerspective(img, M, (w, h))
  # inverse transform
  M_inv = cv2.getPerspectiveTransform(dst, src)
  original = cv2.warpPerspective(birds_eye, M_inv, (w, h))
