mediapipe/docs/solutions/face_detection.md
MediaPipe Team cf101e62a9 Project import generated by Copybara.
GitOrigin-RevId: 7e1d382a1788ebd8412c5626581b4c4cf2fe75ea
2021-11-16 14:32:04 -05:00

19 KiB

layout title parent nav_order
default Face Detection Solutions 1

MediaPipe Face Detection

{: .no_toc }

Table of contents {: .text-delta } 1. TOC {:toc}
---

Overview

MediaPipe Face Detection is an ultrafast face detection solution that comes with 6 landmarks and multi-face support. It is based on BlazeFace, a lightweight and well-performing face detector tailored for mobile GPU inference. The detector's super-realtime performance enables it to be applied to any live viewfinder experience that requires an accurate facial region of interest as an input for other task-specific models, such as 3D facial keypoint or geometry estimation (e.g., MediaPipe Face Mesh), facial features or expression classification, and face region segmentation. BlazeFace uses a lightweight feature extraction network inspired by, but distinct from MobileNetV1/V2, a GPU-friendly anchor scheme modified from Single Shot MultiBox Detector (SSD), and an improved tie resolution strategy alternative to non-maximum suppression. For more information about BlazeFace, please see the Resources section.

face_detection_android_gpu.gif

Solution APIs

Configuration Options

Naming style and availability may differ slightly across platforms/languages.

model_selection

An integer index 0 or 1. Use 0 to select a short-range model that works best for faces within 2 meters from the camera, and 1 for a full-range model best for faces within 5 meters. For the full-range option, a sparse model is used for its improved inference speed. Please refer to the model cards for details. Default to 0 if not specified.

min_detection_confidence

Minimum confidence value ([0.0, 1.0]) from the face detection model for the detection to be considered successful. Default to 0.5.

Output

Naming style may differ slightly across platforms/languages.

detections

Collection of detected faces, where each face is represented as a detection proto message that contains a bounding box and 6 key points (right eye, left eye, nose tip, mouth center, right ear tragion, and left ear tragion). The bounding box is composed of xmin and width (both normalized to [0.0, 1.0] by the image width) and ymin and height (both normalized to [0.0, 1.0] by the image height). Each key point is composed of x and y, which are normalized to [0.0, 1.0] by the image width and height respectively.

Python Solution API

Please first follow general instructions to install MediaPipe Python package, then learn more in the companion Python Colab and the usage example below.

Supported configuration options:

import cv2
import mediapipe as mp
mp_face_detection = mp.solutions.face_detection
mp_drawing = mp.solutions.drawing_utils

# For static images:
IMAGE_FILES = []
with mp_face_detection.FaceDetection(
    model_selection=1, min_detection_confidence=0.5) as face_detection:
  for idx, file in enumerate(IMAGE_FILES):
    image = cv2.imread(file)
    # Convert the BGR image to RGB and process it with MediaPipe Face Detection.
    results = face_detection.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    # Draw face detections of each face.
    if not results.detections:
      continue
    annotated_image = image.copy()
    for detection in results.detections:
      print('Nose tip:')
      print(mp_face_detection.get_key_point(
          detection, mp_face_detection.FaceKeyPoint.NOSE_TIP))
      mp_drawing.draw_detection(annotated_image, detection)
    cv2.imwrite('/tmp/annotated_image' + str(idx) + '.png', annotated_image)

# For webcam input:
cap = cv2.VideoCapture(0)
with mp_face_detection.FaceDetection(
    model_selection=0, min_detection_confidence=0.5) as face_detection:
  while cap.isOpened():
    success, image = cap.read()
    if not success:
      print("Ignoring empty camera frame.")
      # If loading a video, use 'break' instead of 'continue'.
      continue

    # To improve performance, optionally mark the image as not writeable to
    # pass by reference.
    image.flags.writeable = False
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    results = face_detection.process(image)

    # Draw the face detection annotations on the image.
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    if results.detections:
      for detection in results.detections:
        mp_drawing.draw_detection(image, detection)
    # Flip the image horizontally for a selfie-view display.
    cv2.imshow('MediaPipe Face Detection', cv2.flip(image, 1))
    if cv2.waitKey(5) & 0xFF == 27:
      break
cap.release()

JavaScript Solution API

Please first see general introduction on MediaPipe in JavaScript, then learn more in the companion web demo and the following usage example.

Supported configuration options:

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js" crossorigin="anonymous"></script>
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/control_utils/control_utils.js" crossorigin="anonymous"></script>
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js" crossorigin="anonymous"></script>
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/face_detection/face_detection.js" crossorigin="anonymous"></script>
</head>

<body>
  <div class="container">
    <video class="input_video"></video>
    <canvas class="output_canvas" width="1280px" height="720px"></canvas>
  </div>
</body>
</html>
<script type="module">
const videoElement = document.getElementsByClassName('input_video')[0];
const canvasElement = document.getElementsByClassName('output_canvas')[0];
const canvasCtx = canvasElement.getContext('2d');

function onResults(results) {
  // Draw the overlays.
  canvasCtx.save();
  canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
  canvasCtx.drawImage(
      results.image, 0, 0, canvasElement.width, canvasElement.height);
  if (results.detections.length > 0) {
    drawingUtils.drawRectangle(
        canvasCtx, results.detections[0].boundingBox,
        {color: 'blue', lineWidth: 4, fillColor: '#00000000'});
    drawingUtils.drawLandmarks(canvasCtx, results.detections[0].landmarks, {
      color: 'red',
      radius: 5,
    });
  }
  canvasCtx.restore();
}

const faceDetection = new FaceDetection({locateFile: (file) => {
  return `https://cdn.jsdelivr.net/npm/@mediapipe/face_detection@0.0/${file}`;
}});
faceDetection.setOptions({
  modelSelection: 0,
  minDetectionConfidence: 0.5
});
faceDetection.onResults(onResults);

const camera = new Camera(videoElement, {
  onFrame: async () => {
    await faceDetection.send({image: videoElement});
  },
  width: 1280,
  height: 720
});
camera.start();
</script>

Android Solution API

Please first follow general instructions to add MediaPipe Gradle dependencies and try the Android Solution API in the companion example Android Studio project, and learn more in the usage example below.

Supported configuration options:

Camera Input

// For camera input and result rendering with OpenGL.
FaceDetectionOptions faceDetectionOptions =
    FaceDetectionOptions.builder()
        .setStaticImageMode(false)
        .setModelSelection(0).build();
FaceDetection faceDetection = new FaceDetection(this, faceDetectionOptions);
faceDetection.setErrorListener(
    (message, e) -> Log.e(TAG, "MediaPipe Face Detection error:" + message));

// Initializes a new CameraInput instance and connects it to MediaPipe Face Detection Solution.
CameraInput cameraInput = new CameraInput(this);
cameraInput.setNewFrameListener(
    textureFrame -> faceDetection.send(textureFrame));

// Initializes a new GlSurfaceView with a ResultGlRenderer<FaceDetectionResult> instance
// that provides the interfaces to run user-defined OpenGL rendering code.
// See mediapipe/examples/android/solutions/facedetection/src/main/java/com/google/mediapipe/examples/facedetection/FaceDetectionResultGlRenderer.java
// as an example.
SolutionGlSurfaceView<FaceDetectionResult> glSurfaceView =
    new SolutionGlSurfaceView<>(
        this, faceDetection.getGlContext(), faceDetection.getGlMajorVersion());
glSurfaceView.setSolutionResultRenderer(new FaceDetectionResultGlRenderer());
glSurfaceView.setRenderInputImage(true);
faceDetection.setResultListener(
    faceDetectionResult -> {
      if (faceDetectionResult.multiFaceDetections().isEmpty()) {
        return;
      }
      RelativeKeypoint noseTip =
          faceDetectionResult
              .multiFaceDetections()
              .get(0)
              .getLocationData()
              .getRelativeKeypoints(FaceKeypoint.NOSE_TIP);
      Log.i(
          TAG,
          String.format(
              "MediaPipe Face Detection nose tip normalized coordinates (value range: [0, 1]): x=%f, y=%f",
              noseTip.getX(), noseTip.getY()));
      // Request GL rendering.
      glSurfaceView.setRenderData(faceDetectionResult);
      glSurfaceView.requestRender();
    });

// The runnable to start camera after the GLSurfaceView is attached.
glSurfaceView.post(
    () ->
        cameraInput.start(
            this,
            faceDetection.getGlContext(),
            CameraInput.CameraFacing.FRONT,
            glSurfaceView.getWidth(),
            glSurfaceView.getHeight()));

Image Input

// For reading images from gallery and drawing the output in an ImageView.
FaceDetectionOptions faceDetectionOptions =
    FaceDetectionOptions.builder()
        .setStaticImageMode(true)
        .setModelSelection(0).build();
FaceDetection faceDetection = new FaceDetection(this, faceDetectionOptions);

// Connects MediaPipe Face Detection Solution to the user-defined ImageView
// instance that allows users to have the custom drawing of the output landmarks
// on it. See mediapipe/examples/android/solutions/facedetection/src/main/java/com/google/mediapipe/examples/facedetection/FaceDetectionResultImageView.java
// as an example.
FaceDetectionResultImageView imageView = new FaceDetectionResultImageView(this);
faceDetection.setResultListener(
    faceDetectionResult -> {
      if (faceDetectionResult.multiFaceDetections().isEmpty()) {
        return;
      }
      int width = faceDetectionResult.inputBitmap().getWidth();
      int height = faceDetectionResult.inputBitmap().getHeight();
      RelativeKeypoint noseTip =
          faceDetectionResult
              .multiFaceDetections()
              .get(0)
              .getLocationData()
              .getRelativeKeypoints(FaceKeypoint.NOSE_TIP);
      Log.i(
          TAG,
          String.format(
              "MediaPipe Face Detection nose tip coordinates (pixel values): x=%f, y=%f",
              noseTip.getX() * width, noseTip.getY() * height));
      // Request canvas drawing.
      imageView.setFaceDetectionResult(faceDetectionResult);
      runOnUiThread(() -> imageView.update());
    });
faceDetection.setErrorListener(
    (message, e) -> Log.e(TAG, "MediaPipe Face Detection error:" + message));

// ActivityResultLauncher to get an image from the gallery as Bitmap.
ActivityResultLauncher<Intent> imageGetter =
    registerForActivityResult(
        new ActivityResultContracts.StartActivityForResult(),
        result -> {
          Intent resultIntent = result.getData();
          if (resultIntent != null && result.getResultCode() == RESULT_OK) {
            Bitmap bitmap = null;
            try {
              bitmap =
                  MediaStore.Images.Media.getBitmap(
                      this.getContentResolver(), resultIntent.getData());
              // Please also rotate the Bitmap based on its orientation.
            } catch (IOException e) {
              Log.e(TAG, "Bitmap reading error:" + e);
            }
            if (bitmap != null) {
              faceDetection.send(bitmap);
            }
          }
        });
Intent pickImageIntent = new Intent(Intent.ACTION_PICK);
pickImageIntent.setDataAndType(MediaStore.Images.Media.INTERNAL_CONTENT_URI, "image/*");
imageGetter.launch(pickImageIntent);

Video Input

// For video input and result rendering with OpenGL.
FaceDetectionOptions faceDetectionOptions =
    FaceDetectionOptions.builder()
        .setStaticImageMode(false)
        .setModelSelection(0).build();
FaceDetection faceDetection = new FaceDetection(this, faceDetectionOptions);
faceDetection.setErrorListener(
    (message, e) -> Log.e(TAG, "MediaPipe Face Detection error:" + message));

// Initializes a new VideoInput instance and connects it to MediaPipe Face Detection Solution.
VideoInput videoInput = new VideoInput(this);
videoInput.setNewFrameListener(
    textureFrame -> faceDetection.send(textureFrame));

// Initializes a new GlSurfaceView with a ResultGlRenderer<FaceDetectionResult> instance
// that provides the interfaces to run user-defined OpenGL rendering code.
// See mediapipe/examples/android/solutions/facedetection/src/main/java/com/google/mediapipe/examples/facedetection/FaceDetectionResultGlRenderer.java
// as an example.
SolutionGlSurfaceView<FaceDetectionResult> glSurfaceView =
    new SolutionGlSurfaceView<>(
        this, faceDetection.getGlContext(), faceDetection.getGlMajorVersion());
glSurfaceView.setSolutionResultRenderer(new FaceDetectionResultGlRenderer());
glSurfaceView.setRenderInputImage(true);

faceDetection.setResultListener(
    faceDetectionResult -> {
      if (faceDetectionResult.multiFaceDetections().isEmpty()) {
        return;
      }
      RelativeKeypoint noseTip =
          faceDetectionResult
              .multiFaceDetections()
              .get(0)
              .getLocationData()
              .getRelativeKeypoints(FaceKeypoint.NOSE_TIP);
      Log.i(
          TAG,
          String.format(
              "MediaPipe Face Detection nose tip normalized coordinates (value range: [0, 1]): x=%f, y=%f",
              noseTip.getX(), noseTip.getY()));
      // Request GL rendering.
      glSurfaceView.setRenderData(faceDetectionResult);
      glSurfaceView.requestRender();
    });

ActivityResultLauncher<Intent> videoGetter =
    registerForActivityResult(
        new ActivityResultContracts.StartActivityForResult(),
        result -> {
          Intent resultIntent = result.getData();
          if (resultIntent != null) {
            if (result.getResultCode() == RESULT_OK) {
              glSurfaceView.post(
                  () ->
                      videoInput.start(
                          this,
                          resultIntent.getData(),
                          faceDetection.getGlContext(),
                          glSurfaceView.getWidth(),
                          glSurfaceView.getHeight()));
            }
          }
        });
Intent pickVideoIntent = new Intent(Intent.ACTION_PICK);
pickVideoIntent.setDataAndType(MediaStore.Video.Media.INTERNAL_CONTENT_URI, "video/*");
videoGetter.launch(pickVideoIntent);

Example Apps

Please first see general instructions for Android, iOS and desktop on how to build MediaPipe examples.

Note: To visualize a graph, copy the graph and paste it into MediaPipe Visualizer. For more information on how to visualize its associated subgraphs, please see visualizer documentation.

Mobile

GPU Pipeline

CPU Pipeline

This is very similar to the GPU pipeline except that at the beginning and the end of the pipeline it performs GPU-to-CPU and CPU-to-GPU image transfer respectively. As a result, the rest of graph, which shares the same configuration as the GPU pipeline, runs entirely on CPU.

Desktop

Coral

Please refer to these instructions to cross-compile and run MediaPipe examples on the Coral Dev Board.

Resources