Face tracking with AVFoundation

Paweł Chmiel

Face tracking is an interesting feature that has been available on iOS since version 5. In this tutorial, I would like to show you how to implement it in Swift 3.0.

The biggest hurdle in implementing this feature is that you have to build camera support with AVFoundation. It is an alternative to UIImagePickerController that is much more customizable and lets you do almost anything with your cameras, but it also requires a bit more time.

OK, so let’s code something.

First of all, we can create an instance of CIDetector, which will be used later for face detection. We create it by setting the detector type to CIDetectorTypeFace (in the same way we can detect rectangles, QR codes or text) and specifying its accuracy (which can be low or high).
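As a minimal sketch of this step, the detector can be created as follows. The name `faceDetector` is only illustrative, and since the initializer returns an optional it is worth handling the `nil` case:

```swift
import CoreImage

// Face detector with high accuracy; CIDetectorAccuracyLow trades precision for speed.
// `faceDetector` is an illustrative name, not something the API requires.
let faceDetector: CIDetector? = CIDetector(
    ofType: CIDetectorTypeFace,
    context: nil,
    options: [CIDetectorAccuracy: CIDetectorAccuracyHigh]
)

if faceDetector == nil {
    print("Face detector could not be created")
}
```

Creating the detector once and reusing it for every frame avoids paying the setup cost again and again.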

Now it’s time to focus on the camera support implementation. We have to create an AVCaptureSession instance and set its sessionPreset, which defines the quality of the captured images.
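A short sketch of the session setup, written with the Swift 3 / iOS 10 names this tutorial targets (later SDKs spell the preset as `.photo`). The choice of the photo preset here is an assumption; any preset the device supports would work the same way:

```swift
import AVFoundation

let session = AVCaptureSession()

// Only apply the preset if the current hardware supports it.
// Swift 3 / iOS 10 spelling; newer SDKs use `session.sessionPreset = .photo`.
if session.canSetSessionPreset(AVCaptureSessionPresetPhoto) {
    session.sessionPreset = AVCaptureSessionPresetPhoto
}
```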

The next step is getting access to an AVCaptureDevice. For face tracking we will use the front camera. We can get it by filtering the list of all available devices (cameras) on a real device.
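The filtering could look roughly like this. The function name `findFrontCamera` is hypothetical, and the device-listing call is the Swift 3 / iOS 10 one (it was deprecated in later SDKs in favour of discovery sessions):

```swift
import AVFoundation

// Returns the front camera, or nil when none is available
// (for example in the simulator, which has no cameras).
func findFrontCamera() -> AVCaptureDevice? {
    guard let devices = AVCaptureDevice.devices(withMediaType: AVMediaTypeVideo) as? [AVCaptureDevice] else {
        return nil
    }
    // Keep the first video device that faces the user.
    return devices.first(where: { $0.position == .front })
}
```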

Once we have a captureDevice instance (the front camera), we have to create an AVCaptureDeviceInput, which will be added to our AVCaptureSession.

We should mark the start of our changes to the session by calling the beginConfiguration method, so that they are applied together as one batch.

A good and safe way to add a new input to our session is to check whether we are able to add it before calling the addInput method.

Once our input is created, we have to create an output to capture data from the camera.

We can use an AVCaptureVideoDataOutput instance for that. We should also set some video settings, such as the pixel format type, and then add the output to the session in the same way as the input.

Once we are finished with the configuration, we have to call commitConfiguration to let our session instance know that everything is set.
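The configuration steps above can be sketched as one function. The name `configure(session:with:)` is hypothetical, and the pixel format is just one reasonable choice rather than a requirement:

```swift
import AVFoundation

// Wires a camera input and a video data output into the session.
// Throws if the device input cannot be created (for example, no camera permission).
func configure(session: AVCaptureSession, with camera: AVCaptureDevice) throws -> AVCaptureVideoDataOutput {
    let input = try AVCaptureDeviceInput(device: camera)

    let output = AVCaptureVideoDataOutput()
    output.videoSettings = [
        kCVPixelBufferPixelFormatTypeKey as String: NSNumber(value: kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)
    ]

    // Everything between begin and commit is applied as a single batch.
    session.beginConfiguration()
    if session.canAddInput(input) {
        session.addInput(input)
    }
    if session.canAddOutput(output) {
        session.addOutput(output)
    }
    session.commitConfiguration()

    return output
}
```

Returning the output makes it easy to attach a sample buffer delegate to it afterwards.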

Because our output will be collecting data all the time once the session is running, we have to create a dedicated dispatch queue for it.

Next, we have to implement AVCaptureVideoDataOutputSampleBufferDelegate to get access to the raw data gathered by the camera.
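A minimal sketch of the queue and the delegate together. `FrameReceiver` is a hypothetical class used only for illustration (in an app the view controller can play this role), and the delegate method uses the Swift 3 / iOS 10 signature, which later SDKs renamed to `captureOutput(_:didOutput:from:)`:

```swift
import AVFoundation

final class FrameReceiver: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {

    // Serial queue on which frames are delivered, keeping the main thread free.
    let queue = DispatchQueue(label: "output.queue")

    // Registers this object as the receiver of the output's frames.
    func attach(to output: AVCaptureVideoDataOutput) {
        output.setSampleBufferDelegate(self, queue: queue)
    }

    // Called once per captured frame, on `queue`.
    func captureOutput(_ captureOutput: AVCaptureOutput!,
                       didOutputSampleBuffer sampleBuffer: CMSampleBuffer!,
                       from connection: AVCaptureConnection!) {
        // Face detection on `sampleBuffer` would go here.
    }
}
```

Because the callback runs off the main thread, any UI update triggered from it has to be dispatched back to the main queue.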

Now the magic begins…

Our face detector looks for features in an instance of CIImage, so we have to convert the sampleBuffer from the delegate method into one.

By features I mean mouths, eyes and heads (yes, we can detect more than one person at once).

To get the CIImage in which faceDetector will look for features, we have to use CMSampleBufferGetImageBuffer.
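The conversion can be sketched as a small helper. `makeImage(from:)` is a hypothetical name, and this version skips the buffer attachments for brevity:

```swift
import CoreImage
import CoreMedia

// Wraps the frame's pixel data in a CIImage.
// Returns nil when the sample buffer carries no image data.
func makeImage(from sampleBuffer: CMSampleBuffer) -> CIImage? {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        return nil
    }
    return CIImage(cvPixelBuffer: pixelBuffer)
}
```

Returning an optional avoids force-unwrapping the pixel buffer inside the delegate callback.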

We also have to create an options object that defines what exactly we want to look for on the faces.

In the example app, which is available on our GitHub (link at the bottom of the page), I focused on detecting smiles and eye blinks.
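A sketch of such an options dictionary. The orientation value `6` is an assumption that the device is held in portrait; in a real app it should follow the current device orientation:

```swift
import CoreImage

// Ask the detector to also classify smiles and closed eyes.
// The EXIF orientation 6 is an assumed value for a device held in portrait.
let options: [String: Any] = [
    CIDetectorImageOrientation: 6,
    CIDetectorSmile: true,
    CIDetectorEyeBlink: true
]
```

Without the smile and eye-blink keys the detector still finds faces, but it does not evaluate those two properties.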

Once we have our ciImage and options set up, we can start the real tracking. The CIDetector object has a function called features, which returns an array of all the features it found.

Now we can loop through the array to examine the bounds of each face and each feature in the faces. I’m focused only on displaying details about one person.

To get access to properties like mouthPosition, hasSmile or leftEyeClosed/rightEyeClosed, we first need to cast the feature to CIFaceFeature.
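Put together, detection and the cast might look like this. `describeFaces` is a hypothetical helper that only collects text, so how the results are displayed is left open:

```swift
import CoreImage

// Runs the detector on one image and describes every face it finds.
func describeFaces(in image: CIImage, using detector: CIDetector, options: [String: Any]) -> [String] {
    var descriptions: [String] = []
    for feature in detector.features(in: image, options: options) {
        // Only face features expose smile and eye state, so skip anything else.
        guard let face = feature as? CIFaceFeature else { continue }
        descriptions.append(
            "bounds: \(face.bounds), smile: \(face.hasSmile), " +
            "left eye closed: \(face.leftEyeClosed), right eye closed: \(face.rightEyeClosed)"
        )
    }
    return descriptions
}
```

To show details for a single person only, it is enough to use the first element of the returned array.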

Inside the for loop, I also have a helper function that calculates the proper face rect and updates the label.
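The idea behind such a helper can be sketched with plain geometry. This is a simplified illustration, not the helper from the example app: `previewRect` is a hypothetical function, and it assumes a portrait UI, a landscape camera buffer, a mirrored front-camera preview, and a preview that shows the whole frame without aspect-fill cropping:

```swift
import Foundation

// Maps a face rectangle from the landscape camera buffer
// into a portrait, mirrored preview of size `previewSize`.
func previewRect(forFaceBounds faceBounds: CGRect, bufferSize: CGSize, previewSize: CGSize) -> CGRect {
    // The buffer is rotated 90 degrees relative to the portrait UI, so swap the axes.
    var rect = CGRect(x: faceBounds.origin.y, y: faceBounds.origin.x,
                      width: faceBounds.height, height: faceBounds.width)

    // Scale from buffer pixels to preview points (buffer height maps to preview width).
    let scaleX = previewSize.width / bufferSize.height
    let scaleY = previewSize.height / bufferSize.width
    rect = CGRect(x: rect.origin.x * scaleX, y: rect.origin.y * scaleY,
                  width: rect.width * scaleX, height: rect.height * scaleY)

    // The front camera preview is mirrored, so flip the rectangle horizontally.
    rect.origin.x = previewSize.width - rect.origin.x - rect.width
    return rect
}

// A face at (100, 50, 200, 100) in a 1280x720 buffer, shown in a 360x640 preview.
let mapped = previewRect(forFaceBounds: CGRect(x: 100, y: 50, width: 200, height: 100),
                         bufferSize: CGSize(width: 1280, height: 720),
                         previewSize: CGSize(width: 360, height: 640))
print(mapped) // origin (285, 50), size (50, 100)
```

A production version also has to account for the part of the video that an aspect-fill preview crops away, which is why real helpers tend to be longer than this.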

So as you can see, implementing face feature tracking is really easy, but only once the AVFoundation camera support is in place.

This post was written by a Droids On Roids team member.

  • David Jenness

    Hi Paweł,
    I am experimenting with your code and find it very cool! I wonder if you can help me with a problem I am having. I would like to be able to click on the red square of the person when they are highlighted which will then crop their face image to be used in another method I have.

    I have most of it working, but I find that for some reason, the touchesBegan method isn’t called very reliably within the parameters of the red box. It is called some of the time, but it seems like the camera interferes with it. When I slow down the frame rate of the capture device (in the commented lines), it seems to be a little more reliable, but still not 100%.

    Here is my ViewController code which will show an XY coordinate if you click outside the red box, and should also show the message “Click Detected within Face Bounds” if you are in the box.

    Have you seen problems like this where the touchesBegan event doesn’t work on the face detected events? Thanks for any insight you can provide and keep up the awesome code samples!


    Here are my 2 files:

    //// Globals.swift
    import UIKit

    var CapturedImage:UIImage = UIImage();
    var CapturedFaceRect:CGRect = CGRect()
    var wasEventCaptured:Bool = false

    // ViewController.swift
    // AutoCamera
    // Created by Pawel Chmiel on 26.09.2016.
    // Copyright © 2016 Pawel Chmiel. All rights reserved.

    import Foundation
    import AVFoundation
    import UIKit

    class ViewController: UIViewController {

        var session: AVCaptureSession?
        var stillOutput = AVCaptureStillImageOutput()
        var borderLayer: CAShapeLayer?

        let detailsView: DetailsView = {
            let detailsView = DetailsView()
            return detailsView
        }()

        lazy var previewLayer: AVCaptureVideoPreviewLayer? = {
            var previewLay = AVCaptureVideoPreviewLayer(session: self.session!)
            previewLay?.videoGravity = AVLayerVideoGravityResizeAspectFill
            return previewLay
        }()

        lazy var frontCamera: AVCaptureDevice? = {
            guard let devices = AVCaptureDevice.devices(withMediaType: AVMediaTypeVideo) as? [AVCaptureDevice] else { return nil }
            return devices.filter { $0.position == .front }.first
        }()

        let faceDetector = CIDetector(ofType: CIDetectorTypeFace, context: nil, options: [CIDetectorAccuracy : CIDetectorAccuracyHigh])

        override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
            if let touch = touches.first {
                let position: CGPoint = touch.location(in: detailsView)
                if wasEventCaptured {
                    CapturedImage = imageRotatedByDegrees(oldImage: CapturedImage, deg: 90.0)
                    wasEventCaptured = false
                }
            }
        }


        func imageRotatedByDegrees(oldImage: UIImage, deg degrees: CGFloat) -> UIImage {
            //Calculate the size of the rotated view’s containing box for our drawing space
            let rotatedViewBox: UIView = UIView(frame: CGRect(x: 0, y: 0, width: oldImage.size.width, height: oldImage.size.height))
            let t: CGAffineTransform = CGAffineTransform(rotationAngle: degrees * CGFloat(Double.pi / 180))
            rotatedViewBox.transform = t
            let rotatedSize: CGSize = rotatedViewBox.frame.size
            //Create the bitmap context
            UIGraphicsBeginImageContext(rotatedSize)
            let bitmap: CGContext = UIGraphicsGetCurrentContext()!
            //Move the origin to the middle of the image so we will rotate and scale around the center.
            bitmap.translateBy(x: rotatedSize.width / 2, y: rotatedSize.height / 2)
            //Rotate the image context
            bitmap.rotate(by: (degrees * CGFloat(Double.pi / 180)))
            //Now, draw the rotated/scaled image into the context
            bitmap.scaleBy(x: 1.0, y: -1.0)
            bitmap.draw(oldImage.cgImage!, in: CGRect(x: -oldImage.size.width / 2, y: -oldImage.size.height / 2, width: oldImage.size.width, height: oldImage.size.height))
            let newImage: UIImage = UIGraphicsGetImageFromCurrentImageContext()!
            //Release the bitmap context again
            UIGraphicsEndImageContext()
            return newImage
        }

        override func viewDidLayoutSubviews() {
            previewLayer?.frame = view.frame
        }

        override func viewDidAppear(_ animated: Bool) {
            guard let previewLayer = previewLayer else { return }
        }

        override func viewDidLoad() {
            self.view.isUserInteractionEnabled = true
        }
    }

    class DetailsView: UIView {

        override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
            print("Click Detected within Face Bounds")
            wasEventCaptured = true
            super.touchesBegan(touches, with: event)
        }

        func setup() {
            layer.borderColor =
            layer.borderWidth = 5.0
        }
    }


    extension ViewController {

        func sessionPrepare() {
            session = AVCaptureSession()

            guard let session = session, let captureDevice = frontCamera else { return }

            session.sessionPreset = AVCaptureSessionPresetPhoto

            do {
                let deviceInput = try AVCaptureDeviceInput(device: captureDevice)
                session.beginConfiguration()

                if session.canAddInput(deviceInput) {
                    session.addInput(deviceInput)
                }

                //Framerate throttle
                // try captureDevice.lockForConfiguration()
                // captureDevice.activeVideoMinFrameDuration = CMTimeMake(1, 2)
                // captureDevice.activeVideoMaxFrameDuration = CMTimeMake(1, 2)
                // captureDevice.unlockForConfiguration()

                let output = AVCaptureVideoDataOutput()
                output.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String : NSNumber(value: kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)]

                output.alwaysDiscardsLateVideoFrames = false

                if session.canAddOutput(output) {
                    session.addOutput(output)
                }

                session.commitConfiguration()

                let queue = DispatchQueue(label: "output.queue")
                output.setSampleBufferDelegate(self, queue: queue)

            } catch {
                print("error with creating AVCaptureDeviceInput")
            }
        }
    }

    extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {

        func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
            let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
            let attachments = CMCopyDictionaryOfAttachments(kCFAllocatorDefault, sampleBuffer, kCMAttachmentMode_ShouldPropagate)
            let ciImage = CIImage(cvImageBuffer: pixelBuffer!, options: attachments as! [String : Any]?)
            let options: [String : Any] = [CIDetectorImageOrientation: exifOrientation(orientation: UIDevice.current.orientation),
                                           CIDetectorSmile: true,
                                           CIDetectorEyeBlink: true]
            let allFeatures = faceDetector?.features(in: ciImage, options: options)

            let formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer)
            let cleanAperture = CMVideoFormatDescriptionGetCleanAperture(formatDescription!, false)

            guard let features = allFeatures else { return }

            for feature in features {
                if let faceFeature = feature as? CIFaceFeature {
                    let faceRect = calculateFaceRect(facePosition: faceFeature.mouthPosition, faceBounds: faceFeature.bounds, clearAperture: cleanAperture)
                    // let featureDetails = ["has smile: \(faceFeature.hasSmile)",
                    // "has closed left eye: \(faceFeature.leftEyeClosed)",
                    // "has closed right eye: \(faceFeature.rightEyeClosed)"]
                    //update(with: faceRect, text: featureDetails.joined(separator: "\n"))
                    update(with: faceRect, text: "")
                }
            }

            if features.count == 0 {
                DispatchQueue.main.async {
                    self.detailsView.alpha = 0.0
                }
            }

            //Capture Image contained within Bounds
            let myOrigin = CapturedFaceRect.origin
            let myTopRight = CGPoint(x: CapturedFaceRect.maxX, y: CapturedFaceRect.minY)
            let myBottomLeft = CGPoint(x: CapturedFaceRect.minX, y: CapturedFaceRect.maxY)
            let myBottomRight = CGPoint(x: CapturedFaceRect.maxX, y: CapturedFaceRect.maxY)

            let croppedFace = cropFaceForPoints(image: ciImage, topLeft: myOrigin, topRight: myTopRight, bottomLeft: myBottomLeft, bottomRight: myBottomRight)
            CapturedImage = convert(cmage: croppedFace)
        }



        func cropFaceForPoints(image: CIImage, topLeft: CGPoint, topRight: CGPoint, bottomLeft: CGPoint, bottomRight: CGPoint) -> CIImage {

            var newImage: CIImage
            newImage = image.applyingFilter(
                withInputParameters: [
                    "inputExtent": CIVector(cgRect: image.extent),
                    "inputTopLeft": CIVector(cgPoint: topLeft),
                    "inputTopRight": CIVector(cgPoint: topRight),
                    "inputBottomLeft": CIVector(cgPoint: bottomLeft),
                    "inputBottomRight": CIVector(cgPoint: bottomRight)])
            newImage = image.cropping(to: newImage.extent)

            return newImage
        }

        func convert(cmage:CIImage) -> UIImage {
            let context:CIContext = CIContext.init(options: nil)
            let cgImage:CGImage = context.createCGImage(cmage, from: cmage.extent)!
            let image:UIImage = UIImage.init(cgImage: cgImage)
            return image
        }

        func exifOrientation(orientation: UIDeviceOrientation) -> Int {
            switch orientation {
            case .portraitUpsideDown:
                return 8
            case .landscapeLeft:
                return 3
            case .landscapeRight:
                return 1
            default:
                return 6
            }
        }

        func videoBox(frameSize: CGSize, apertureSize: CGSize) -> CGRect {
            let apertureRatio = apertureSize.height / apertureSize.width
            let viewRatio = frameSize.width / frameSize.height

            var size = CGSize.zero

            if (viewRatio > apertureRatio) {
                size.width = frameSize.width
                size.height = apertureSize.width * (frameSize.width / apertureSize.height)
            } else {
                size.width = apertureSize.height * (frameSize.height / apertureSize.width)
                size.height = frameSize.height
            }

            var videoBox = CGRect(origin: .zero, size: size)

            if (size.width < frameSize.width) {
                videoBox.origin.x = (frameSize.width - size.width) / 2.0
            } else {
                videoBox.origin.x = (size.width - frameSize.width) / 2.0
            }

            if (size.height < frameSize.height) {
                videoBox.origin.y = (frameSize.height - size.height) / 2.0
            } else {
                videoBox.origin.y = (size.height - frameSize.height) / 2.0
            }

            return videoBox
        }

        func calculateFaceRect(facePosition: CGPoint, faceBounds: CGRect, clearAperture: CGRect) -> CGRect {
            let parentFrameSize = previewLayer!.frame.size
            let previewBox = videoBox(frameSize: parentFrameSize, apertureSize: clearAperture.size)

            var faceRect = faceBounds

            swap(&faceRect.size.width, &faceRect.size.height)
            swap(&faceRect.origin.x, &faceRect.origin.y)

            let widthScaleBy = previewBox.size.width / clearAperture.size.height
            let heightScaleBy = previewBox.size.height / clearAperture.size.width

            faceRect.size.width *= widthScaleBy
            faceRect.size.height *= heightScaleBy
            faceRect.origin.x *= widthScaleBy
            faceRect.origin.y *= heightScaleBy

            faceRect = faceRect.offsetBy(dx: 0.0, dy: previewBox.origin.y)
            let frame = CGRect(x: parentFrameSize.width - faceRect.origin.x - faceRect.size.width / 2.0 - previewBox.origin.x / 2.0, y: faceRect.origin.y, width: faceRect.width, height: faceRect.height)
            CapturedFaceRect = faceBounds

            return frame
        }
    }

    extension ViewController {
        func update(with faceRect: CGRect, text: String) {
            DispatchQueue.main.async {
                UIView.animate(withDuration: 0.2) {
                    //self.detailsView.detailsLabel.text = text
                    self.detailsView.alpha = 1.0
                    self.detailsView.frame = faceRect
                }
            }
        }
    }

  • 倪望龙

    Hi, Paweł Chmiel. I am interested in the function -(CGRect)calculateFaceRectFacePosition:(CGPoint)facePosition FaceBounds:(CGRect)faceBounds ClearAperture:(CGRect)clearAperture. Can you elaborate on the principle behind it?

  • yarn