WWDC21 · 18 min · AI & Machine Learning

Detect people, faces, and poses using Vision

Discover the latest updates to the Vision framework to help your apps detect people, faces, and poses. Meet the Person Segmentation API, which helps your app separate people in images from their surroundings, and explore the latest continuous metrics for tracking the pitch, yaw, and roll of the human head. Learn how these capabilities can be combined with other APIs like Core Image to deliver anything from simple virtual backgrounds to rich offline compositing in an image-editing app. To get the most out of this session and to learn more about people analysis, we recommend watching “Detect Body and Hand Pose with Vision” from WWDC20 and “Understanding Images in Vision Framework” from WWDC19.

Watch at developer.apple.com ↗


Code shown on screen · 6 snippets

Get segmentation mask from an image swift · at 8:13 ↗

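import Vision

// Create request
let request = VNGeneratePersonSegmentationRequest()

// Create request handler (imageURL and options are supplied by the caller)
let requestHandler = VNImageRequestHandler(url: imageURL, options: options)

// Process request
try requestHandler.perform([request])

// Review results: the mask is a VNPixelBufferObservation wrapping a CVPixelBuffer.
// The force unwraps crash if the request produced no observation.
let mask = request.results!.first!
let maskBuffer = mask.pixelBuffer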

Configuring the segmentation request swift · at 8:33 ↗

let request = VNGeneratePersonSegmentationRequest()

request.revision = VNGeneratePersonSegmentationRequestRevision1

request.qualityLevel = VNGeneratePersonSegmentationRequest.QualityLevel.accurate

request.outputPixelFormat = kCVPixelFormatType_OneComponent8
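
The request above pins the most accurate (and most expensive) quality level. As a rough sketch of how the quality level might be chosen per use case, the helper below maps a hypothetical `SegmentationUseCase` enum to a configured request. The enum, the function names, and the use-case-to-level mapping are assumptions made for illustration and are not part of Vision; only `.accurate`, `.balanced`, and `.fast` come from the framework. The sketch assumes a deployment target of iOS 15 or macOS 12 or later.

```swift
import Vision
import CoreVideo

// Hypothetical enum for illustration; not part of the Vision framework.
enum SegmentationUseCase {
    case stillPhoto      // offline editing, latency matters less
    case recordedVideo   // frame-by-frame processing of a movie
    case liveStream      // real-time preview
}

// Maps a use case to a Vision quality level.
// This mapping is an assumption; measure on target devices before relying on it.
func qualityLevel(for useCase: SegmentationUseCase)
    -> VNGeneratePersonSegmentationRequest.QualityLevel {
    switch useCase {
    case .stillPhoto:    return .accurate
    case .recordedVideo: return .balanced
    case .liveStream:    return .fast
    }
}

// Builds a person segmentation request configured for the given use case.
func makeSegmentationRequest(for useCase: SegmentationUseCase)
    -> VNGeneratePersonSegmentationRequest {
    let request = VNGeneratePersonSegmentationRequest()
    request.qualityLevel = qualityLevel(for: useCase)
    // Single-channel 8-bit mask, the same format used in the snippet above.
    request.outputPixelFormat = kCVPixelFormatType_OneComponent8
    return request
}
```

A caller would then write, for example, `let request = makeSegmentationRequest(for: .liveStream)` and pass the request to a request handler as in the first snippet.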

Applying a segmentation mask swift · at 12:24 ↗

import CoreImage
import CoreImage.CIFilterBuiltins

// Load the images; CIImage(contentsOf:) returns nil if a file cannot be read
let input = CIImage(contentsOf: imageUrl)!
let mask = CIImage(cvPixelBuffer: maskBuffer)
let background = CIImage(contentsOf: backgroundImageUrl)!

// Scale the mask to the size of the input image
let maskScaleX = input.extent.width / mask.extent.width
let maskScaleY = input.extent.height / mask.extent.height
let maskScaled = mask.transformed(by: CGAffineTransform(scaleX: maskScaleX, y: maskScaleY))

// Scale the background to the size of the input image
let backgroundScaleX = input.extent.width / background.extent.width
let backgroundScaleY = input.extent.height / background.extent.height
let backgroundScaled = background.transformed(by: CGAffineTransform(scaleX: backgroundScaleX, y: backgroundScaleY))

// Blend: the input shows where the mask is red, the background elsewhere
let blendFilter = CIFilter.blendWithRedMask()
blendFilter.inputImage = input
blendFilter.backgroundImage = backgroundScaled
blendFilter.maskImage = maskScaled

let blendedImage = blendFilter.outputImage
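
The mask generation and blending snippets can be combined into one function. The following is a minimal sketch under stated assumptions, not the session's own code: `BackgroundReplacementError`, `scaled(_:toMatch:)`, and `replaceBackground(of:with:)` are hypothetical names introduced here, the request uses its default configuration, and image orientation metadata is ignored. It replaces the force unwraps with thrown errors.

```swift
import Vision
import CoreImage
import CoreImage.CIFilterBuiltins

// Hypothetical error type for this sketch.
enum BackgroundReplacementError: Error {
    case imageLoadFailed(URL)
    case noMask
    case blendFailed
}

// Hypothetical helper: scales `image` so its size matches `target`.
// Assumes both extents start at the origin, as they do for images loaded from files.
func scaled(_ image: CIImage, toMatch target: CGRect) -> CIImage {
    let scaleX = target.width / image.extent.width
    let scaleY = target.height / image.extent.height
    return image.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))
}

// Segments people in the image at `imageURL` and composites them over the
// image at `backgroundURL`. Returns an unrendered CIImage.
func replaceBackground(of imageURL: URL, with backgroundURL: URL) throws -> CIImage {
    guard let input = CIImage(contentsOf: imageURL) else {
        throw BackgroundReplacementError.imageLoadFailed(imageURL)
    }
    guard let background = CIImage(contentsOf: backgroundURL) else {
        throw BackgroundReplacementError.imageLoadFailed(backgroundURL)
    }

    // Generate the person segmentation mask with Vision.
    let request = VNGeneratePersonSegmentationRequest()
    let handler = VNImageRequestHandler(url: imageURL, options: [:])
    try handler.perform([request])
    guard let maskBuffer = request.results?.first?.pixelBuffer else {
        throw BackgroundReplacementError.noMask
    }
    let mask = CIImage(cvPixelBuffer: maskBuffer)

    // Blend the input over the background, both scaled to the input size.
    let blend = CIFilter.blendWithRedMask()
    blend.inputImage = input
    blend.backgroundImage = scaled(background, toMatch: input.extent)
    blend.maskImage = scaled(mask, toMatch: input.extent)
    guard let output = blend.outputImage else {
        throw BackgroundReplacementError.blendFailed
    }
    return output
}
```

The returned `CIImage` is a recipe rather than rendered pixels; to display or save it, render it through a `CIContext`, for example with `createCGImage(_:from:)`.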

Segmentation from AVCapture swift · at 14:37 ↗

private let photoOutput = AVCapturePhotoOutput()
…
if self.photoOutput.isPortraitEffectsMatteDeliverySupported {
    self.photoOutput.isPortraitEffectsMatteDeliveryEnabled = true
}

open class AVCapturePhoto {
    …
    var portraitEffectsMatte: AVPortraitEffectsMatte? { get } // nil if no people in the scene
    …
}

Segmentation in ARKit swift · at 14:58 ↗

if ARWorldTrackingConfiguration.supportsFrameSemantics(.personSegmentationWithDepth) {
    // Proceed with getting Person Segmentation Mask
    …
}

open class ARFrame {
    …
    var segmentationBuffer: CVPixelBuffer? { get }
    …
}

Segmentation in CoreImage swift · at 15:31 ↗

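import CoreImage
import CoreImage.CIFilterBuiltins

// CIImage(contentsOf:) returns nil if the file cannot be read
let input = CIImage(contentsOf: imageUrl)!

let segmentationFilter = CIFilter.personSegmentation()
segmentationFilter.inputImage = input

let mask = segmentationFilter.outputImage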
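
The session description also mentions pitch, yaw, and roll metrics for the human head, which none of the on-screen snippets cover. As a hedged sketch of how those values might be read, the function below runs `VNDetectFaceRectanglesRequest` at revision 3 and prints the three angles of each detected face. The names `degrees(fromRadians:)` and `printHeadPose(forImageAt:)` are hypothetical helpers introduced for this example, and the sketch assumes a deployment target of iOS 15 or macOS 12 or later, where `pitch` is available.

```swift
import Vision

// Hypothetical helper: Vision reports the angles in radians.
func degrees(fromRadians radians: Double) -> Double {
    return radians * 180 / Double.pi
}

// Detects faces in the image at `imageURL` and prints each face's head pose.
func printHeadPose(forImageAt imageURL: URL) throws {
    let request = VNDetectFaceRectanglesRequest()
    request.revision = VNDetectFaceRectanglesRequestRevision3

    let handler = VNImageRequestHandler(url: imageURL, options: [:])
    try handler.perform([request])

    for face in request.results ?? [] {
        // Each angle is an optional NSNumber; it is nil when not computed.
        let angles: [(String, NSNumber?)] = [
            ("roll", face.roll),
            ("yaw", face.yaw),
            ("pitch", face.pitch)
        ]
        for (name, value) in angles {
            if let radians = value?.doubleValue {
                print("\(name): \(degrees(fromRadians: radians))°")
            } else {
                print("\(name): unavailable")
            }
        }
    }
}
```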

Resources

[documentation] Applying Matte Effects to People in Images and Video
[documentation] Vision

Related sessions

What’s new in Vision · WWDC22 · 20 min
Classify hand poses and actions with Create ML · WWDC21 · 27 min · 5 snippets
Detect Body and Hand Pose with Vision · WWDC20 · 24 min · 10 snippets
Understanding Images in Vision Framework · WWDC19 · 40 min
