
How Smiles Kept Our Office Safe

Tommi Urtti
November 24th 2020
Earlier this year a Vincit colleague implemented a custom iPad doorbell for our office. You could tap a huge red button on the screen to let everybody know that somebody is at the door. Much like doorbells tend to work.
In practice, a Slack message was sent to our office channel.
Alternatively, you could pick a person from a list of everyone who works in our office, type in your name, and the iPad would send that person a message (again through Slack) that you had indeed arrived. This was a more data-rich interaction but, on the other hand, required more touch points on the display.

Touch points become a cause for concern

When COVID-19 hit, there were a lot of initial fears about the virus lingering on surfaces, which meant that touch-based input methods, like touch screens, would need to be cleaned far more frequently and were preferably avoided altogether.
Most of our commonly used input methods rely heavily on touch: touchscreens, keyboards, mice, keypads, joysticks, game controllers, remote controls, standalone buttons, you name it. All touch-based.
But there are other options out there when you put your imagination to work. The iPad has a front-facing camera, which sees the person interacting with it. So, we thought, let's use a facial expression.
And what expression would you rather have your visitors make? A smile.

How did we do it?

The high-level overview is this:
  1. The camera grabs frames continuously.
  2. Each image is processed to answer one question: is there a person in the picture, and do they appear to be smiling?
  3. If a person smiles for approximately 2 consecutive seconds, we consider the doorbell pressed.
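Step 3 is essentially a debounce on the per-frame detection result. Here is a minimal sketch of how such a counter could work; the class name, thresholds, and the decay-instead-of-reset behavior are my assumptions, not the exact production code (the `incrementSmile`/`decrementSmile` calls later in the post suggest something along these lines):

```swift
import Foundation

/// Tracks how long a smile has been held. At ~8 processed frames per
/// second, 2 seconds is roughly 16 consecutive frames. A sketch; the
/// exact thresholds and names are assumptions.
final class SmileCounter {
    private let framesNeeded = 16   // ≈ 2 s at 8 FPS of processing
    private var count = 0
    private var fired = false

    /// Call once per processed frame with a smile in it; returns true
    /// exactly once when the smile has been held long enough
    /// ("doorbell pressed").
    func incrementSmile() -> Bool {
        count += 1
        if count >= framesNeeded && !fired {
            fired = true
            return true
        }
        return false
    }

    /// Non-smiling frame: decay the counter instead of resetting it,
    /// so a single missed detection doesn't restart the 2-second wait.
    func decrementSmile() {
        count = max(0, count - 1)
        if count == 0 { fired = false }
    }
}
```

The `fired` flag ensures a single held smile rings the doorbell once rather than on every subsequent frame.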

Grabbing the frame

Initializing the camera and grabbing the frames is a bit clunky, to be honest. I couldn't find a nice, Swifty API for it, so the code is fairly legacy, but it works. There's quite a bit of code involved, but I'll show the CaptureManager here, which is responsible for most of it.
CaptureManager runs its camera sample buffer handling in a dedicated DispatchQueue, set to the user-initiated QoS level to ensure it gets enough processing priority. The camera FPS is set to 30.
// Despite singletons being evil, this class is used as a singleton
class CaptureManager: NSObject {
    internal static let shared = CaptureManager()
    weak var delegate: CaptureManagerDelegate?
    var session: AVCaptureSession?
    var device: AVCaptureDevice?

    // Dedicated queue for sample buffer handling, at user-initiated QoS
    private let avQueue = DispatchQueue(label: "doorbell.avQueue", qos: .userInitiated)
    private let CAMERA_FPS: Double = 30

    override init() {
        super.init()
        session = AVCaptureSession()

        // Set up the camera, choosing the correct front camera here
        if let device = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .front),
           let input = try? AVCaptureDeviceInput(device: device) {
            self.device = device
            print(device.activeFormat.videoSupportedFrameRateRanges)
            session?.addInput(input)

            // Initialize output, setting the appropriate buffer format
            let output = AVCaptureVideoDataOutput()
            output.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA]
            output.setSampleBufferDelegate(self, queue: avQueue)
            session?.addOutput(output)
        }
    }

    func startSession() {
        avQueue.async {
            self.session?.startRunning()
            // set(frameRate:) is a small AVCaptureDevice extension that wraps
            // lockForConfiguration() and activeVideoMin/MaxFrameDuration
            self.device?.set(frameRate: self.CAMERA_FPS)
        }
    }

    func stopSession() {
        avQueue.async {
            self.session?.stopRunning()
        }
    }

    func getImageFromSampleBuffer(sampleBuffer: CMSampleBuffer) -> UIImage? {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
            return nil
        }
        CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
        // Make sure the buffer is unlocked on every exit path
        defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }
        let baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer)
        let width = CVPixelBufferGetWidth(pixelBuffer)
        let height = CVPixelBufferGetHeight(pixelBuffer)
        let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
        let colorSpace = CGColorSpaceCreateDeviceRGB()
        let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.premultipliedFirst.rawValue | CGBitmapInfo.byteOrder32Little.rawValue)
        guard let context = CGContext(data: baseAddress, width: width, height: height, bitsPerComponent: 8, bytesPerRow: bytesPerRow, space: colorSpace, bitmapInfo: bitmapInfo.rawValue),
              let cgImage = context.makeImage() else {
            return nil
        }
        return UIImage(cgImage: cgImage, scale: 1, orientation: .right)
    }
}

extension CaptureManager: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        // Get an image from the sample buffer and pass it to the delegate for processing
        guard let image = getImageFromSampleBuffer(sampleBuffer: sampleBuffer) else { return }
        delegate?.processCapturedImage(image: image)
    }
}
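For context, wiring this up from the UI side might look roughly like the sketch below. The `CaptureManagerDelegate` protocol isn't shown in the post; its presumed shape is inferred from the `processCapturedImage(image:)` call in `captureOutput` above, and the view controller is purely illustrative:

```swift
import UIKit

// Presumed shape of the delegate protocol, based on the call above
protocol CaptureManagerDelegate: AnyObject {
    func processCapturedImage(image: UIImage)
}

class DoorbellViewController: UIViewController, CaptureManagerDelegate {
    override func viewDidLoad() {
        super.viewDidLoad()
        CaptureManager.shared.delegate = self
        CaptureManager.shared.startSession()
    }

    // Called on the capture queue for every frame (30 per second)
    func processCapturedImage(image: UIImage) {
        // Throttle and dispatch to the smile-detection queue here
    }
}
```

Note that the app also needs an `NSCameraUsageDescription` entry in Info.plist, or iOS will refuse to start the capture session.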

Image processing

In a real-time solution, it's essential that the image processing is done on-device. On our previous-generation iPad, we noticed that smile detection couldn't keep up with the camera's 30 FPS. This iPad's processing power topped out at around 8 FPS, so we discard the extra images the camera generates.
You might ask, why not drop the camera FPS rate then?
Well, in our UI experimentation we've been on and off about displaying the camera stream in the UI, and at 8 FPS that would look really clunky. With the current solution, we have the flexibility of showing a smooth 30 FPS stream on screen while only processing at 8 Hz.
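Discarding frames can be as simple as a counter in the frame handler: at 30 FPS from the camera, processing every fourth frame yields roughly 8 FPS of detection. A sketch, assuming a divisor of 4 and a hypothetical `detectSmile(in:)` entry point:

```swift
import UIKit

/// Receives every camera frame (30 FPS) but forwards only every Nth
/// frame to detection. Illustrative sketch; the divisor and helper
/// name are assumptions.
final class FrameThrottler {
    private var frameCounter = 0
    private let processEveryNth = 4   // 30 FPS / 4 ≈ 8 FPS of detection

    func processCapturedImage(image: UIImage) {
        // Every frame is still available here for a smooth 30 FPS preview
        frameCounter += 1
        guard frameCounter % processEveryNth == 0 else { return }
        // Only every Nth frame goes on to smile detection
        detectSmile(in: image)
    }

    private func detectSmile(in image: UIImage) {
        // Hand off to the CIDetector pipeline on its own queue
    }
}
```

This keeps the capture pipeline running at full rate for display purposes while the expensive detection runs at the rate the device can sustain.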
The image processing and smile detection are done in a separate DispatchQueue to keep the user interface running smoothly during processing. The UI keeps the user updated about what's going on with a nice animation.
The API we use to check for faces and smiles is Apple's own Core Image, which offers a very simple `hasSmile` boolean on each detected face when the CIDetector is run with the correct parameters.
DispatchQueue.global(qos: .userInitiated).async {
    // Process the frame.
    // faceDetector is initialised once as
    // CIDetector(ofType: CIDetectorTypeFace, context: nil, options: [CIDetectorAccuracy: CIDetectorAccuracyHigh])!
    let smileDetected = self.faceDetector
        .features(in: CIImage(cgImage: image.cgImage!), options: [CIDetectorSmile: true])
        .contains { ($0 as? CIFaceFeature)?.hasSmile == true }

    // Update smiley states
    if smileDetected {
        self.incrementSmile(image: image)
    } else {
        self.decrementSmile(image: image)
    }
}

Slack notification

Finally, when we determine there’s been an intentional smile at the camera, we trigger the Slack notification.
If you've ever built integrations that post to Slack, this part probably looks very familiar. There's a custom Slack app that we've built, which you can add to the channels you wish to use. We then use the Slack API's files.upload and chat.postMessage endpoints to upload an image of the visitor and post a message that they have arrived.
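A minimal sketch of the chat.postMessage call, assuming a bot token and channel ID from your own Slack app's configuration (the function name and message text are illustrative; files.upload works similarly but takes a multipart/form-data body with the image bytes):

```swift
import Foundation

/// Posts the arrival message via Slack's chat.postMessage Web API
/// endpoint. `token` and `channel` are placeholders for your own
/// Slack app's bot token and channel ID.
func postArrival(visitorName: String, token: String, channel: String) {
    var request = URLRequest(url: URL(string: "https://slack.com/api/chat.postMessage")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json; charset=utf-8", forHTTPHeaderField: "Content-Type")
    let payload = ["channel": channel, "text": "\(visitorName) is at the door!"]
    request.httpBody = try? JSONSerialization.data(withJSONObject: payload)

    URLSession.shared.dataTask(with: request) { data, _, error in
        // Slack responds with {"ok": true, ...} on success; log failures here
    }.resume()
}
```

The Slack app needs the `chat:write` scope (and `files:write` for the image upload) and must be a member of the target channel.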

The dramatic plot twist!

Quite ironically, while COVID was one of the catalysts for this project, it also represents one of the challenges for this technology.
Until image processing gets better at detecting a smile around the eyes, masks effectively prevent detecting the visitor's smile with the code above. So the visitor needs to remove their mask for a moment, or fall back to the touch interface.
