Merge branch 'google:master' into face-landmarker-python
|  | @ -118,9 +118,9 @@ on how to build MediaPipe examples. | |||
| *   With a TensorFlow Model | ||||
| 
 | ||||
|     This uses the | ||||
|     [TensorFlow model](https://github.com/google/mediapipe/tree/master/mediapipe/models/object_detection_saved_model) | ||||
|     [TensorFlow model](https://github.com/google/mediapipe/tree/v0.8.10/mediapipe/models/object_detection_saved_model) | ||||
|     (see also | ||||
|     [model info](https://github.com/google/mediapipe/tree/master/mediapipe/models/object_detection_saved_model/README.md)), | ||||
|     [model info](https://github.com/google/mediapipe/tree/master/mediapipe/g3doc/solutions/object_detection_saved_model.md)), | ||||
|     and the pipeline is implemented in this | ||||
|     [graph](https://github.com/google/mediapipe/tree/master/mediapipe/graphs/object_detection/object_detection_mobile_cpu.pbtxt). | ||||
| 
 | ||||
|  |  | |||
docs/solutions/object_detection_saved_model.md (new file, 62 lines)
						|  | @ -0,0 +1,62 @@ | |||
## TensorFlow/TFLite Object Detection Model

### TensorFlow model

The model is trained on the [MSCOCO 2014](http://cocodataset.org) dataset using the [TensorFlow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection). It is a MobileNetV2-based SSD model with a 0.5 depth multiplier; the detailed training configuration is in the provided `pipeline.config`. The model is relatively compact, reaching `0.171 mAP` while staying fast enough for real-time performance on mobile devices. You can compare it with other models in the [TensorFlow detection model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md).

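For reference (an editorial addition, not in the original write-up), the mAP figure is the standard COCO-style mean average precision: the per-class average precision, i.e. the area under the precision-recall curve, averaged over all classes (COCO additionally averages over IoU thresholds from 0.50 to 0.95):

$$
\mathrm{AP}_c = \int_0^1 p_c(r)\,dr, \qquad
\mathrm{mAP} = \frac{1}{|C|} \sum_{c \in C} \mathrm{AP}_c
$$
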
### TFLite model

The TFLite model is converted from the TensorFlow model above. The steps needed to convert the model are similar to [this tutorial](https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193), with minor modifications. Assume we have a trained TensorFlow model that includes the checkpoint files and the training configuration file, for example the files provided in this repo:

   * `model.ckpt.index`
   * `model.ckpt.meta`
   * `model.ckpt.data-00000-of-00001`
   * `pipeline.config`

Make sure you have installed these [Python libraries](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1.md). Then, to get the frozen graph, run the `export_tflite_ssd_graph.py` script from the `models/research` directory with this command:

```bash
$ PATH_TO_MODEL=path/to/the/model
$ bazel run object_detection:export_tflite_ssd_graph -- \
    --pipeline_config_path ${PATH_TO_MODEL}/pipeline.config \
    --trained_checkpoint_prefix ${PATH_TO_MODEL}/model.ckpt \
    --output_directory ${PATH_TO_MODEL} \
    --add_postprocessing_op=False
```

The exported model contains two files:

   * `tflite_graph.pb`
   * `tflite_graph.pbtxt`

The difference between this step and the one in [the tutorial](https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193) is that we set `add_postprocessing_op` to False. MediaPipe provides all the calculators needed for post-processing, so the custom TFLite post-processing ops in the original graph (e.g., non-maximum suppression) can be excluded. This leaves the flexibility to integrate different post-processing algorithms and implementations.

Optional: you can install and use the [Graph Transform Tool](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms) to inspect the inputs and outputs of the exported model:

```bash
$ bazel run graph_transforms:summarize_graph -- \
    --in_graph=${PATH_TO_MODEL}/tflite_graph.pb
```

You should see that the input image size of the model is 320x320 and that its outputs are:

   * `raw_outputs/box_encodings`
   * `raw_outputs/class_predictions`

The last step is to convert the model to TFLite. You can look at [this guide](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/r1/convert/cmdline_examples.md) for more details. For this example, you just need to run:

```bash
$ tflite_convert \
  --graph_def_file=${PATH_TO_MODEL}/tflite_graph.pb \
  --output_file=${PATH_TO_MODEL}/model.tflite \
  --input_format=TENSORFLOW_GRAPHDEF \
  --output_format=TFLITE \
  --inference_type=FLOAT \
  --input_shapes=1,320,320,3 \
  --input_arrays=normalized_input_image_tensor \
  --output_arrays=raw_outputs/box_encodings,raw_outputs/class_predictions
```

Now you have the TFLite model `model.tflite` ready to use with the MediaPipe object detection graphs. Please see the examples for more details.
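Optional sanity check (an editorial addition, not part of the original doc): a minimal C++ sketch, assuming the TFLite C++ library and headers are available, that loads the converted `model.tflite` and prints its input and output tensors, so you can confirm the `1x320x320x3` float input and the two `raw_outputs/*` outputs listed above:

```cpp
#include <cstdio>
#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
  // Load the converted flatbuffer model.
  std::unique_ptr<tflite::FlatBufferModel> model =
      tflite::FlatBufferModel::BuildFromFile("model.tflite");
  if (!model) return 1;

  // Only builtin ops are needed, since the custom post-processing ops were
  // excluded at export time (add_postprocessing_op=False).
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  if (!interpreter || interpreter->AllocateTensors() != kTfLiteOk) return 1;

  // Expect one 1x320x320x3 float input ...
  const TfLiteTensor* input = interpreter->input_tensor(0);
  std::printf("input: %s [%d, %d, %d, %d]\n", input->name,
              input->dims->data[0], input->dims->data[1],
              input->dims->data[2], input->dims->data[3]);

  // ... and the two raw output tensors listed above.
  for (int i : interpreter->outputs()) {
    std::printf("output: %s\n", interpreter->tensor(i)->name);
  }
  return 0;
}
```
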
|  | @ -269,6 +269,7 @@ Supported configuration options: | |||
| ```python | ||||
| import cv2 | ||||
| import mediapipe as mp | ||||
| import numpy as np | ||||
| mp_drawing = mp.solutions.drawing_utils | ||||
| mp_drawing_styles = mp.solutions.drawing_styles | ||||
| mp_pose = mp.solutions.pose | ||||
|  |  | |||
|  | @ -748,6 +748,7 @@ cc_test( | |||
|         "//mediapipe/calculators/tensor:testdata/image_to_tensor/medium_sub_rect_keep_aspect_with_rotation_border_zero.png", | ||||
|         "//mediapipe/calculators/tensor:testdata/image_to_tensor/medium_sub_rect_with_rotation.png", | ||||
|         "//mediapipe/calculators/tensor:testdata/image_to_tensor/medium_sub_rect_with_rotation_border_zero.png", | ||||
|         "//mediapipe/calculators/tensor:testdata/image_to_tensor/medium_sub_rect_with_rotation_border_zero_interp_cubic.png", | ||||
|         "//mediapipe/calculators/tensor:testdata/image_to_tensor/noop_except_range.png", | ||||
|     ], | ||||
|     tags = ["desktop_only_test"], | ||||
|  |  | |||
|  | @ -29,6 +29,9 @@ class AffineTransformation { | |||
|   // pixels will be calculated.
 | ||||
|   enum class BorderMode { kZero, kReplicate }; | ||||
| 
 | ||||
|   // Pixel sampling interpolation method.
 | ||||
|   enum class Interpolation { kLinear, kCubic }; | ||||
| 
 | ||||
|   struct Size { | ||||
|     int width; | ||||
|     int height; | ||||
|  |  | |||
|  | @ -77,8 +77,11 @@ class GlTextureWarpAffineRunner | |||
|                                           std::unique_ptr<GpuBuffer>> { | ||||
|  public: | ||||
|   GlTextureWarpAffineRunner(std::shared_ptr<GlCalculatorHelper> gl_helper, | ||||
|                             GpuOrigin::Mode gpu_origin) | ||||
|       : gl_helper_(gl_helper), gpu_origin_(gpu_origin) {} | ||||
|                             GpuOrigin::Mode gpu_origin, | ||||
|                             AffineTransformation::Interpolation interpolation) | ||||
|       : gl_helper_(gl_helper), | ||||
|         gpu_origin_(gpu_origin), | ||||
|         interpolation_(interpolation) {} | ||||
|   absl::Status Init() { | ||||
|     return gl_helper_->RunInGlContext([this]() -> absl::Status { | ||||
|       const GLint attr_location[kNumAttributes] = { | ||||
|  | @ -103,28 +106,83 @@ class GlTextureWarpAffineRunner | |||
|             } | ||||
|           )"; | ||||
| 
 | ||||
|       // TODO Move bicubic code to common shared place.
 | ||||
|       constexpr GLchar kFragShader[] = R"( | ||||
|             DEFAULT_PRECISION(highp, float) | ||||
|             in vec2 sample_coordinate; | ||||
|             uniform sampler2D input_texture; | ||||
|         DEFAULT_PRECISION(highp, float) | ||||
| 
 | ||||
|           #ifdef GL_ES | ||||
|             #define fragColor gl_FragColor | ||||
|           #else | ||||
|             out vec4 fragColor; | ||||
|           #endif  // defined(GL_ES);
 | ||||
|         in vec2 sample_coordinate; | ||||
|         uniform sampler2D input_texture; | ||||
|         uniform vec2 input_size; | ||||
| 
 | ||||
|             void main() { | ||||
|               vec4 color = texture2D(input_texture, sample_coordinate); | ||||
|           #ifdef CUSTOM_ZERO_BORDER_MODE | ||||
|               float out_of_bounds = | ||||
|                   float(sample_coordinate.x < 0.0 || sample_coordinate.x > 1.0 || | ||||
|                         sample_coordinate.y < 0.0 || sample_coordinate.y > 1.0); | ||||
|               color = mix(color, vec4(0.0, 0.0, 0.0, 0.0), out_of_bounds); | ||||
|           #endif  // defined(CUSTOM_ZERO_BORDER_MODE)
 | ||||
|               fragColor = color; | ||||
|             } | ||||
|           )"; | ||||
|       #ifdef GL_ES | ||||
|         #define fragColor gl_FragColor | ||||
|       #else | ||||
|         out vec4 fragColor; | ||||
|       #endif  // defined(GL_ES);
 | ||||
| 
 | ||||
|       #ifdef CUBIC_INTERPOLATION | ||||
|         vec4 sample(sampler2D tex, vec2 tex_coord, vec2 tex_size) { | ||||
|           const vec2 halve = vec2(0.5,0.5); | ||||
|           const vec2 one = vec2(1.0,1.0); | ||||
|           const vec2 two = vec2(2.0,2.0); | ||||
|           const vec2 three = vec2(3.0,3.0); | ||||
|           const vec2 six = vec2(6.0,6.0); | ||||
| 
 | ||||
|           // Calculate the fraction and integer.
 | ||||
|           tex_coord = tex_coord * tex_size - halve; | ||||
|           vec2 frac = fract(tex_coord); | ||||
|           vec2 index = tex_coord - frac + halve; | ||||
| 
 | ||||
|           // Calculate weights for Catmull-Rom filter.
 | ||||
|           vec2 w0 = frac * (-halve + frac * (one - halve * frac)); | ||||
|           vec2 w1 = one + frac * frac * (-(two+halve) + three/two * frac); | ||||
|           vec2 w2 = frac * (halve + frac * (two - three/two * frac)); | ||||
|           vec2 w3 = frac * frac * (-halve + halve * frac); | ||||
| 
 | ||||
|           // Calculate weights to take advantage of bilinear texture lookup.
 | ||||
|           vec2 w12 = w1 + w2; | ||||
|           vec2 offset12 = w2 / (w1 + w2); | ||||
| 
 | ||||
|           vec2 index_tl = index - one; | ||||
|           vec2 index_br = index + two; | ||||
|           vec2 index_eq = index + offset12; | ||||
| 
 | ||||
|           index_tl /= tex_size; | ||||
|           index_br /= tex_size; | ||||
|           index_eq /= tex_size; | ||||
| 
 | ||||
|           // 9 texture lookups and linear blending.
 | ||||
|           vec4 color = vec4(0.0); | ||||
|           color += texture2D(tex, vec2(index_tl.x, index_tl.y)) * w0.x * w0.y; | ||||
|           color += texture2D(tex, vec2(index_eq.x, index_tl.y)) * w12.x *w0.y; | ||||
|           color += texture2D(tex, vec2(index_br.x, index_tl.y)) * w3.x * w0.y; | ||||
| 
 | ||||
|           color += texture2D(tex, vec2(index_tl.x, index_eq.y)) * w0.x * w12.y; | ||||
|           color += texture2D(tex, vec2(index_eq.x, index_eq.y)) * w12.x *w12.y; | ||||
|           color += texture2D(tex, vec2(index_br.x, index_eq.y)) * w3.x * w12.y; | ||||
| 
 | ||||
|           color += texture2D(tex, vec2(index_tl.x, index_br.y)) * w0.x * w3.y; | ||||
|           color += texture2D(tex, vec2(index_eq.x, index_br.y)) * w12.x *w3.y; | ||||
|           color += texture2D(tex, vec2(index_br.x, index_br.y)) * w3.x * w3.y; | ||||
|           return color; | ||||
|         } | ||||
|       #else | ||||
|         vec4 sample(sampler2D tex, vec2 tex_coord, vec2 tex_size) { | ||||
|           return texture2D(tex, tex_coord); | ||||
|         } | ||||
|       #endif  // defined(CUBIC_INTERPOLATION)
 | ||||
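          // Editor's note (added): for a fractional offset t in [0,1], the
          // weights above expand to the Catmull-Rom (a = -1/2) basis:
          //   w0(t) = -t/2 +   t^2 -     t^3/2
          //   w1(t) =  1  - 5*t^2/2 + 3*t^3/2
          //   w2(t) =  t/2 + 2*t^2 - 3*t^3/2
          //   w3(t) = -t^2/2 +       t^3/2
          // w1 and w2 are non-negative on [0,1], so each (w1, w2) pair can be
          // fetched with a single hardware bilinear tap at offset w2/(w1+w2),
          // reducing the 16 nearest texels to the 9 lookups used above.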
| 
 | ||||
|         void main() { | ||||
|           vec4 color = sample(input_texture, sample_coordinate, input_size); | ||||
|       #ifdef CUSTOM_ZERO_BORDER_MODE | ||||
|           float out_of_bounds = | ||||
|               float(sample_coordinate.x < 0.0 || sample_coordinate.x > 1.0 || | ||||
|                     sample_coordinate.y < 0.0 || sample_coordinate.y > 1.0); | ||||
|           color = mix(color, vec4(0.0, 0.0, 0.0, 0.0), out_of_bounds); | ||||
|       #endif  // defined(CUSTOM_ZERO_BORDER_MODE)
 | ||||
|           fragColor = color; | ||||
|         } | ||||
|       )"; | ||||
| 
 | ||||
|       // Create program and set parameters.
 | ||||
|       auto create_fn = [&](const std::string& vs, | ||||
|  | @ -137,14 +195,28 @@ class GlTextureWarpAffineRunner | |||
|         glUseProgram(program); | ||||
|         glUniform1i(glGetUniformLocation(program, "input_texture"), 1); | ||||
|         GLint matrix_id = glGetUniformLocation(program, "transform_matrix"); | ||||
|         return Program{.id = program, .matrix_id = matrix_id}; | ||||
|         GLint size_id = glGetUniformLocation(program, "input_size"); | ||||
|         return Program{ | ||||
|             .id = program, .matrix_id = matrix_id, .size_id = size_id}; | ||||
|       }; | ||||
| 
 | ||||
|       const std::string vert_src = | ||||
|           absl::StrCat(mediapipe::kMediaPipeVertexShaderPreamble, kVertShader); | ||||
| 
 | ||||
|       const std::string frag_src = absl::StrCat( | ||||
|           mediapipe::kMediaPipeFragmentShaderPreamble, kFragShader); | ||||
|       std::string interpolation_def; | ||||
|       switch (interpolation_) { | ||||
|         case AffineTransformation::Interpolation::kCubic: | ||||
|           interpolation_def = R"( | ||||
|             #define CUBIC_INTERPOLATION | ||||
|           )"; | ||||
|           break; | ||||
|         case AffineTransformation::Interpolation::kLinear: | ||||
|           break; | ||||
|       } | ||||
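      // Added note: kLinear purposely injects no define, so the shader falls
      // back to the plain texture2D() path, which relies on the sampler's
      // hardware bilinear filtering.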
| 
 | ||||
|       const std::string frag_src = | ||||
|           absl::StrCat(mediapipe::kMediaPipeFragmentShaderPreamble, | ||||
|                        interpolation_def, kFragShader); | ||||
| 
 | ||||
|       ASSIGN_OR_RETURN(program_, create_fn(vert_src, frag_src)); | ||||
| 
 | ||||
|  | @ -152,9 +224,9 @@ class GlTextureWarpAffineRunner | |||
|         std::string custom_zero_border_mode_def = R"( | ||||
|           #define CUSTOM_ZERO_BORDER_MODE | ||||
|         )"; | ||||
|         const std::string frag_custom_zero_src = | ||||
|             absl::StrCat(mediapipe::kMediaPipeFragmentShaderPreamble, | ||||
|                          custom_zero_border_mode_def, kFragShader); | ||||
|         const std::string frag_custom_zero_src = absl::StrCat( | ||||
|             mediapipe::kMediaPipeFragmentShaderPreamble, | ||||
|             custom_zero_border_mode_def, interpolation_def, kFragShader); | ||||
|         return create_fn(vert_src, frag_custom_zero_src); | ||||
|       }; | ||||
| #if GL_CLAMP_TO_BORDER_MAY_BE_SUPPORTED | ||||
|  | @ -256,6 +328,7 @@ class GlTextureWarpAffineRunner | |||
|     } | ||||
|     glUseProgram(program->id); | ||||
| 
 | ||||
|     // uniforms
 | ||||
|     Eigen::Matrix<float, 4, 4, Eigen::RowMajor> eigen_mat(matrix.data()); | ||||
|     if (IsMatrixVerticalFlipNeeded(gpu_origin_)) { | ||||
|       // @matrix describes affine transformation in terms of TOP LEFT origin, so
 | ||||
|  | @ -275,6 +348,10 @@ class GlTextureWarpAffineRunner | |||
|     eigen_mat.transposeInPlace(); | ||||
|     glUniformMatrix4fv(program->matrix_id, 1, GL_FALSE, eigen_mat.data()); | ||||
| 
 | ||||
|     if (interpolation_ == AffineTransformation::Interpolation::kCubic) { | ||||
|       glUniform2f(program->size_id, texture.width(), texture.height()); | ||||
|     } | ||||
| 
 | ||||
|     // vao
 | ||||
|     glBindVertexArray(vao_); | ||||
| 
 | ||||
|  | @ -327,6 +404,7 @@ class GlTextureWarpAffineRunner | |||
|   struct Program { | ||||
|     GLuint id; | ||||
|     GLint matrix_id; | ||||
|     GLint size_id; | ||||
|   }; | ||||
|   std::shared_ptr<GlCalculatorHelper> gl_helper_; | ||||
|   GpuOrigin::Mode gpu_origin_; | ||||
|  | @ -335,6 +413,8 @@ class GlTextureWarpAffineRunner | |||
|   Program program_; | ||||
|   std::optional<Program> program_custom_zero_; | ||||
|   GLuint framebuffer_ = 0; | ||||
|   AffineTransformation::Interpolation interpolation_ = | ||||
|       AffineTransformation::Interpolation::kLinear; | ||||
| }; | ||||
| 
 | ||||
| #undef GL_CLAMP_TO_BORDER_MAY_BE_SUPPORTED | ||||
|  | @ -344,9 +424,10 @@ class GlTextureWarpAffineRunner | |||
| absl::StatusOr<std::unique_ptr< | ||||
|     AffineTransformation::Runner<GpuBuffer, std::unique_ptr<GpuBuffer>>>> | ||||
| CreateAffineTransformationGlRunner( | ||||
|     std::shared_ptr<GlCalculatorHelper> gl_helper, GpuOrigin::Mode gpu_origin) { | ||||
|   auto runner = | ||||
|       absl::make_unique<GlTextureWarpAffineRunner>(gl_helper, gpu_origin); | ||||
|     std::shared_ptr<GlCalculatorHelper> gl_helper, GpuOrigin::Mode gpu_origin, | ||||
|     AffineTransformation::Interpolation interpolation) { | ||||
|   auto runner = absl::make_unique<GlTextureWarpAffineRunner>( | ||||
|       gl_helper, gpu_origin, interpolation); | ||||
|   MP_RETURN_IF_ERROR(runner->Init()); | ||||
|   return runner; | ||||
| } | ||||
|  |  | |||
|  | @ -29,7 +29,8 @@ absl::StatusOr<std::unique_ptr<AffineTransformation::Runner< | |||
|     mediapipe::GpuBuffer, std::unique_ptr<mediapipe::GpuBuffer>>>> | ||||
| CreateAffineTransformationGlRunner( | ||||
|     std::shared_ptr<mediapipe::GlCalculatorHelper> gl_helper, | ||||
|     mediapipe::GpuOrigin::Mode gpu_origin); | ||||
|     mediapipe::GpuOrigin::Mode gpu_origin, | ||||
|     AffineTransformation::Interpolation interpolation); | ||||
| 
 | ||||
| }  // namespace mediapipe
 | ||||
| 
 | ||||
|  |  | |||
|  | @ -39,9 +39,22 @@ cv::BorderTypes GetBorderModeForOpenCv( | |||
|   } | ||||
| } | ||||
| 
 | ||||
| int GetInterpolationForOpenCv( | ||||
|     AffineTransformation::Interpolation interpolation) { | ||||
|   switch (interpolation) { | ||||
|     case AffineTransformation::Interpolation::kLinear: | ||||
|       return cv::INTER_LINEAR; | ||||
|     case AffineTransformation::Interpolation::kCubic: | ||||
|       return cv::INTER_CUBIC; | ||||
|   } | ||||
| } | ||||
| 
 | ||||
| class OpenCvRunner | ||||
|     : public AffineTransformation::Runner<ImageFrame, ImageFrame> { | ||||
|  public: | ||||
|   OpenCvRunner(AffineTransformation::Interpolation interpolation) | ||||
|       : interpolation_(GetInterpolationForOpenCv(interpolation)) {} | ||||
| 
 | ||||
|   absl::StatusOr<ImageFrame> Run( | ||||
|       const ImageFrame& input, const std::array<float, 16>& matrix, | ||||
|       const AffineTransformation::Size& size, | ||||
|  | @ -142,19 +155,23 @@ class OpenCvRunner | |||
| 
 | ||||
|     cv::warpAffine(in_mat, out_mat, cv_affine_transform, | ||||
|                    cv::Size(out_mat.cols, out_mat.rows), | ||||
|                    /*flags=*/cv::INTER_LINEAR | cv::WARP_INVERSE_MAP, | ||||
|                    /*flags=*/interpolation_ | cv::WARP_INVERSE_MAP, | ||||
|                    GetBorderModeForOpenCv(border_mode)); | ||||
| 
 | ||||
|     return out_image; | ||||
|   } | ||||
| 
 | ||||
|  private: | ||||
|   int interpolation_ = cv::INTER_LINEAR; | ||||
| }; | ||||
| 
 | ||||
| }  // namespace
 | ||||
| 
 | ||||
| absl::StatusOr< | ||||
|     std::unique_ptr<AffineTransformation::Runner<ImageFrame, ImageFrame>>> | ||||
| CreateAffineTransformationOpenCvRunner() { | ||||
|   return absl::make_unique<OpenCvRunner>(); | ||||
| CreateAffineTransformationOpenCvRunner( | ||||
|     AffineTransformation::Interpolation interpolation) { | ||||
|   return absl::make_unique<OpenCvRunner>(interpolation); | ||||
| } | ||||
| 
 | ||||
| }  // namespace mediapipe
 | ||||
|  |  | |||
|  | @ -25,7 +25,8 @@ namespace mediapipe { | |||
| 
 | ||||
| absl::StatusOr< | ||||
|     std::unique_ptr<AffineTransformation::Runner<ImageFrame, ImageFrame>>> | ||||
| CreateAffineTransformationOpenCvRunner(); | ||||
| CreateAffineTransformationOpenCvRunner( | ||||
|     AffineTransformation::Interpolation interpolation); | ||||
| 
 | ||||
| }  // namespace mediapipe
 | ||||
| 
 | ||||
|  |  | |||
|  | @ -53,6 +53,17 @@ AffineTransformation::BorderMode GetBorderMode( | |||
|   } | ||||
| } | ||||
| 
 | ||||
| AffineTransformation::Interpolation GetInterpolation( | ||||
|     mediapipe::WarpAffineCalculatorOptions::Interpolation interpolation) { | ||||
|   switch (interpolation) { | ||||
|     case mediapipe::WarpAffineCalculatorOptions::INTER_UNSPECIFIED: | ||||
|     case mediapipe::WarpAffineCalculatorOptions::INTER_LINEAR: | ||||
|       return AffineTransformation::Interpolation::kLinear; | ||||
|     case mediapipe::WarpAffineCalculatorOptions::INTER_CUBIC: | ||||
|       return AffineTransformation::Interpolation::kCubic; | ||||
|   } | ||||
| } | ||||
| 
 | ||||
| template <typename ImageT> | ||||
| class WarpAffineRunnerHolder {}; | ||||
| 
 | ||||
|  | @ -61,16 +72,22 @@ template <> | |||
| class WarpAffineRunnerHolder<ImageFrame> { | ||||
|  public: | ||||
|   using RunnerType = AffineTransformation::Runner<ImageFrame, ImageFrame>; | ||||
|   absl::Status Open(CalculatorContext* cc) { return absl::OkStatus(); } | ||||
|   absl::Status Open(CalculatorContext* cc) { | ||||
|     interpolation_ = GetInterpolation( | ||||
|         cc->Options<mediapipe::WarpAffineCalculatorOptions>().interpolation()); | ||||
|     return absl::OkStatus(); | ||||
|   } | ||||
|   absl::StatusOr<RunnerType*> GetRunner() { | ||||
|     if (!runner_) { | ||||
|       ASSIGN_OR_RETURN(runner_, CreateAffineTransformationOpenCvRunner()); | ||||
|       ASSIGN_OR_RETURN(runner_, | ||||
|                        CreateAffineTransformationOpenCvRunner(interpolation_)); | ||||
|     } | ||||
|     return runner_.get(); | ||||
|   } | ||||
| 
 | ||||
|  private: | ||||
|   std::unique_ptr<RunnerType> runner_; | ||||
|   AffineTransformation::Interpolation interpolation_; | ||||
| }; | ||||
| #endif  // !MEDIAPIPE_DISABLE_OPENCV
 | ||||
| 
 | ||||
|  | @ -85,12 +102,14 @@ class WarpAffineRunnerHolder<mediapipe::GpuBuffer> { | |||
|     gpu_origin_ = | ||||
|         cc->Options<mediapipe::WarpAffineCalculatorOptions>().gpu_origin(); | ||||
|     gl_helper_ = std::make_shared<mediapipe::GlCalculatorHelper>(); | ||||
|     interpolation_ = GetInterpolation( | ||||
|         cc->Options<mediapipe::WarpAffineCalculatorOptions>().interpolation()); | ||||
|     return gl_helper_->Open(cc); | ||||
|   } | ||||
|   absl::StatusOr<RunnerType*> GetRunner() { | ||||
|     if (!runner_) { | ||||
|       ASSIGN_OR_RETURN( | ||||
|           runner_, CreateAffineTransformationGlRunner(gl_helper_, gpu_origin_)); | ||||
|       ASSIGN_OR_RETURN(runner_, CreateAffineTransformationGlRunner( | ||||
|                                     gl_helper_, gpu_origin_, interpolation_)); | ||||
|     } | ||||
|     return runner_.get(); | ||||
|   } | ||||
|  | @ -99,6 +118,7 @@ class WarpAffineRunnerHolder<mediapipe::GpuBuffer> { | |||
|   mediapipe::GpuOrigin::Mode gpu_origin_; | ||||
|   std::shared_ptr<mediapipe::GlCalculatorHelper> gl_helper_; | ||||
|   std::unique_ptr<RunnerType> runner_; | ||||
|   AffineTransformation::Interpolation interpolation_; | ||||
| }; | ||||
| #endif  // !MEDIAPIPE_DISABLE_GPU
 | ||||
| 
 | ||||
|  |  | |||
|  | @ -31,6 +31,13 @@ message WarpAffineCalculatorOptions { | |||
|     BORDER_REPLICATE = 2; | ||||
|   } | ||||
| 
 | ||||
|   // Pixel sampling interpolation methods. See @interpolation. | ||||
|   enum Interpolation { | ||||
|     INTER_UNSPECIFIED = 0; | ||||
|     INTER_LINEAR = 1; | ||||
|     INTER_CUBIC = 2; | ||||
|   } | ||||
| 
 | ||||
|   // Pixel extrapolation method. | ||||
|   // When converting image to tensor it may happen that tensor needs to read | ||||
|   // pixels outside image boundaries. Border mode helps to specify how such | ||||
|  | @ -43,4 +50,10 @@ message WarpAffineCalculatorOptions { | |||
|   // to be flipped vertically as tensors are expected to start at top. | ||||
|   // (DEFAULT or unset interpreted as CONVENTIONAL.) | ||||
|   optional GpuOrigin.Mode gpu_origin = 2; | ||||
| 
 | ||||
|   // Sampling method for neighboring pixels. | ||||
|   // INTER_LINEAR (bilinear) linearly interpolates from the nearest 4 neighbors. | ||||
|   // INTER_CUBIC (bicubic) interpolates a small neighborhood with cubic weights. | ||||
|   // INTER_UNSPECIFIED or unset interpreted as INTER_LINEAR. | ||||
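  // Illustrative usage in a graph config (an added example, mirroring the
  // WarpAffineCalculator test configs in this change):
  //
  //   options {
  //     [mediapipe.WarpAffineCalculatorOptions.ext] {
  //       interpolation: INTER_CUBIC
  //     }
  //   }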
|   optional Interpolation interpolation = 3; | ||||
| } | ||||
|  |  | |||
|  | @ -63,7 +63,8 @@ void RunTest(const std::string& graph_text, const std::string& tag, | |||
|              const cv::Mat& input, cv::Mat expected_result, | ||||
|              float similarity_threshold, std::array<float, 16> matrix, | ||||
|              int out_width, int out_height, | ||||
|              absl::optional<AffineTransformation::BorderMode> border_mode) { | ||||
|              std::optional<AffineTransformation::BorderMode> border_mode, | ||||
|              std::optional<AffineTransformation::Interpolation> interpolation) { | ||||
|   std::string border_mode_str; | ||||
|   if (border_mode) { | ||||
|     switch (*border_mode) { | ||||
|  | @ -75,8 +76,20 @@ void RunTest(const std::string& graph_text, const std::string& tag, | |||
|         break; | ||||
|     } | ||||
|   } | ||||
|   std::string interpolation_str; | ||||
|   if (interpolation) { | ||||
|     switch (*interpolation) { | ||||
|       case AffineTransformation::Interpolation::kLinear: | ||||
|         interpolation_str = "interpolation: INTER_LINEAR"; | ||||
|         break; | ||||
|       case AffineTransformation::Interpolation::kCubic: | ||||
|         interpolation_str = "interpolation: INTER_CUBIC"; | ||||
|         break; | ||||
|     } | ||||
|   } | ||||
|   auto graph_config = mediapipe::ParseTextProtoOrDie<CalculatorGraphConfig>( | ||||
|       absl::Substitute(graph_text, /*$0=*/border_mode_str)); | ||||
|       absl::Substitute(graph_text, /*$0=*/border_mode_str, | ||||
|                        /*$1=*/interpolation_str)); | ||||
| 
 | ||||
|   std::vector<Packet> output_packets; | ||||
|   tool::AddVectorSink("output_image", &graph_config, &output_packets); | ||||
|  | @ -132,7 +145,8 @@ struct SimilarityConfig { | |||
| void RunTest(cv::Mat input, cv::Mat expected_result, | ||||
|              const SimilarityConfig& similarity, std::array<float, 16> matrix, | ||||
|              int out_width, int out_height, | ||||
|              absl::optional<AffineTransformation::BorderMode> border_mode) { | ||||
|              std::optional<AffineTransformation::BorderMode> border_mode, | ||||
|              std::optional<AffineTransformation::Interpolation> interpolation) { | ||||
|   RunTest(R"( | ||||
|         input_stream: "input_image" | ||||
|         input_stream: "output_size" | ||||
|  | @ -146,12 +160,13 @@ void RunTest(cv::Mat input, cv::Mat expected_result, | |||
|           options { | ||||
|             [mediapipe.WarpAffineCalculatorOptions.ext] { | ||||
|               $0 # border mode | ||||
|               $1 # interpolation | ||||
|             } | ||||
|           } | ||||
|         } | ||||
|         )", | ||||
|           "cpu", input, expected_result, similarity.threshold_on_cpu, matrix, | ||||
|           out_width, out_height, border_mode); | ||||
|           out_width, out_height, border_mode, interpolation); | ||||
| 
 | ||||
|   RunTest(R"( | ||||
|         input_stream: "input_image" | ||||
|  | @ -171,6 +186,7 @@ void RunTest(cv::Mat input, cv::Mat expected_result, | |||
|           options { | ||||
|             [mediapipe.WarpAffineCalculatorOptions.ext] { | ||||
|               $0 # border mode | ||||
|               $1 # interpolation | ||||
|             } | ||||
|           } | ||||
|         } | ||||
|  | @ -181,7 +197,7 @@ void RunTest(cv::Mat input, cv::Mat expected_result, | |||
|         } | ||||
|         )", | ||||
|           "cpu_image", input, expected_result, similarity.threshold_on_cpu, | ||||
|           matrix, out_width, out_height, border_mode); | ||||
|           matrix, out_width, out_height, border_mode, interpolation); | ||||
| 
 | ||||
|   RunTest(R"( | ||||
|         input_stream: "input_image" | ||||
|  | @ -201,6 +217,7 @@ void RunTest(cv::Mat input, cv::Mat expected_result, | |||
|           options { | ||||
|             [mediapipe.WarpAffineCalculatorOptions.ext] { | ||||
|               $0 # border mode | ||||
|               $1 # interpolation | ||||
|               gpu_origin: TOP_LEFT | ||||
|             } | ||||
|           } | ||||
|  | @ -212,7 +229,7 @@ void RunTest(cv::Mat input, cv::Mat expected_result, | |||
|         } | ||||
|         )", | ||||
|           "gpu", input, expected_result, similarity.threshold_on_gpu, matrix, | ||||
|           out_width, out_height, border_mode); | ||||
|           out_width, out_height, border_mode, interpolation); | ||||
| 
 | ||||
|   RunTest(R"( | ||||
|         input_stream: "input_image" | ||||
|  | @ -237,6 +254,7 @@ void RunTest(cv::Mat input, cv::Mat expected_result, | |||
|           options { | ||||
|             [mediapipe.WarpAffineCalculatorOptions.ext] { | ||||
|               $0 # border mode | ||||
|               $1 # interpolation | ||||
|               gpu_origin: TOP_LEFT | ||||
|             } | ||||
|           } | ||||
|  | @ -253,7 +271,7 @@ void RunTest(cv::Mat input, cv::Mat expected_result, | |||
|         } | ||||
|         )", | ||||
|           "gpu_image", input, expected_result, similarity.threshold_on_gpu, | ||||
|           matrix, out_width, out_height, border_mode); | ||||
|           matrix, out_width, out_height, border_mode, interpolation); | ||||
| } | ||||
| 
 | ||||
| std::array<float, 16> GetMatrix(cv::Mat input, mediapipe::NormalizedRect roi, | ||||
|  | @ -287,10 +305,11 @@ TEST(WarpAffineCalculatorTest, MediumSubRectKeepAspect) { | |||
|   int out_height = 256; | ||||
|   bool keep_aspect_ratio = true; | ||||
|   std::optional<AffineTransformation::BorderMode> border_mode = {}; | ||||
|   std::optional<AffineTransformation::Interpolation> interpolation = {}; | ||||
|   RunTest(input, expected_output, | ||||
|           {.threshold_on_cpu = 0.99, .threshold_on_gpu = 0.82}, | ||||
|           GetMatrix(input, roi, keep_aspect_ratio, out_width, out_height), | ||||
|           out_width, out_height, border_mode); | ||||
|           out_width, out_height, border_mode, interpolation); | ||||
| } | ||||
| 
 | ||||
| TEST(WarpAffineCalculatorTest, MediumSubRectKeepAspectBorderZero) { | ||||
|  | @ -312,10 +331,11 @@ TEST(WarpAffineCalculatorTest, MediumSubRectKeepAspectBorderZero) { | |||
|   bool keep_aspect_ratio = true; | ||||
|   std::optional<AffineTransformation::BorderMode> border_mode = | ||||
|       AffineTransformation::BorderMode::kZero; | ||||
|   std::optional<AffineTransformation::Interpolation> interpolation = {}; | ||||
|   RunTest(input, expected_output, | ||||
|           {.threshold_on_cpu = 0.99, .threshold_on_gpu = 0.81}, | ||||
|           GetMatrix(input, roi, keep_aspect_ratio, out_width, out_height), | ||||
|           out_width, out_height, border_mode); | ||||
|           out_width, out_height, border_mode, interpolation); | ||||
| } | ||||
| 
 | ||||
| TEST(WarpAffineCalculatorTest, MediumSubRectKeepAspectWithRotation) { | ||||
|  | @ -337,10 +357,11 @@ TEST(WarpAffineCalculatorTest, MediumSubRectKeepAspectWithRotation) { | |||
|   bool keep_aspect_ratio = true; | ||||
|   std::optional<AffineTransformation::BorderMode> border_mode = | ||||
|       AffineTransformation::BorderMode::kReplicate; | ||||
|   std::optional<AffineTransformation::Interpolation> interpolation = {}; | ||||
|   RunTest(input, expected_output, | ||||
|           {.threshold_on_cpu = 0.99, .threshold_on_gpu = 0.77}, | ||||
|           GetMatrix(input, roi, keep_aspect_ratio, out_width, out_height), | ||||
|           out_width, out_height, border_mode); | ||||
|           out_width, out_height, border_mode, interpolation); | ||||
| } | ||||
| 
 | ||||
| TEST(WarpAffineCalculatorTest, MediumSubRectKeepAspectWithRotationBorderZero) { | ||||
|  | @ -362,10 +383,11 @@ TEST(WarpAffineCalculatorTest, MediumSubRectKeepAspectWithRotationBorderZero) { | |||
|   bool keep_aspect_ratio = true; | ||||
|   std::optional<AffineTransformation::BorderMode> border_mode = | ||||
|       AffineTransformation::BorderMode::kZero; | ||||
|   std::optional<AffineTransformation::Interpolation> interpolation = {}; | ||||
|   RunTest(input, expected_output, | ||||
|           {.threshold_on_cpu = 0.99, .threshold_on_gpu = 0.75}, | ||||
|           GetMatrix(input, roi, keep_aspect_ratio, out_width, out_height), | ||||
|           out_width, out_height, border_mode); | ||||
|           out_width, out_height, border_mode, interpolation); | ||||
| } | ||||
| 
 | ||||
| TEST(WarpAffineCalculatorTest, MediumSubRectWithRotation) { | ||||
|  | @ -386,10 +408,11 @@ TEST(WarpAffineCalculatorTest, MediumSubRectWithRotation) { | |||
|   bool keep_aspect_ratio = false; | ||||
|   std::optional<AffineTransformation::BorderMode> border_mode = | ||||
|       AffineTransformation::BorderMode::kReplicate; | ||||
|   std::optional<AffineTransformation::Interpolation> interpolation = {}; | ||||
|   RunTest(input, expected_output, | ||||
|           {.threshold_on_cpu = 0.99, .threshold_on_gpu = 0.81}, | ||||
|           GetMatrix(input, roi, keep_aspect_ratio, out_width, out_height), | ||||
|           out_width, out_height, border_mode); | ||||
|           out_width, out_height, border_mode, interpolation); | ||||
| } | ||||
| 
 | ||||
| TEST(WarpAffineCalculatorTest, MediumSubRectWithRotationBorderZero) { | ||||
|  | @ -411,10 +434,38 @@ TEST(WarpAffineCalculatorTest, MediumSubRectWithRotationBorderZero) { | |||
|   bool keep_aspect_ratio = false; | ||||
|   std::optional<AffineTransformation::BorderMode> border_mode = | ||||
|       AffineTransformation::BorderMode::kZero; | ||||
|   std::optional<AffineTransformation::Interpolation> interpolation = {}; | ||||
|   RunTest(input, expected_output, | ||||
|           {.threshold_on_cpu = 0.99, .threshold_on_gpu = 0.80}, | ||||
|           GetMatrix(input, roi, keep_aspect_ratio, out_width, out_height), | ||||
|           out_width, out_height, border_mode); | ||||
|           out_width, out_height, border_mode, interpolation); | ||||
| } | ||||
| 
 | ||||
| TEST(WarpAffineCalculatorTest, MediumSubRectWithRotationBorderZeroInterpCubic) { | ||||
|   mediapipe::NormalizedRect roi; | ||||
|   roi.set_x_center(0.65f); | ||||
|   roi.set_y_center(0.4f); | ||||
|   roi.set_width(0.5f); | ||||
|   roi.set_height(0.5f); | ||||
|   roi.set_rotation(M_PI * -45.0f / 180.0f); | ||||
|   auto input = GetRgb( | ||||
|       "/mediapipe/calculators/" | ||||
|       "tensor/testdata/image_to_tensor/input.jpg"); | ||||
|   auto expected_output = GetRgb( | ||||
|       "/mediapipe/calculators/" | ||||
|       "tensor/testdata/image_to_tensor/" | ||||
|       "medium_sub_rect_with_rotation_border_zero_interp_cubic.png"); | ||||
|   int out_width = 256; | ||||
|   int out_height = 256; | ||||
|   bool keep_aspect_ratio = false; | ||||
|   std::optional<AffineTransformation::BorderMode> border_mode = | ||||
|       AffineTransformation::BorderMode::kZero; | ||||
|   std::optional<AffineTransformation::Interpolation> interpolation = | ||||
|       AffineTransformation::Interpolation::kCubic; | ||||
|   RunTest(input, expected_output, | ||||
|           {.threshold_on_cpu = 0.99, .threshold_on_gpu = 0.78}, | ||||
|           GetMatrix(input, roi, keep_aspect_ratio, out_width, out_height), | ||||
|           out_width, out_height, border_mode, interpolation); | ||||
| } | ||||
| 
 | ||||
| TEST(WarpAffineCalculatorTest, LargeSubRect) { | ||||
|  | @ -435,10 +486,11 @@ TEST(WarpAffineCalculatorTest, LargeSubRect) { | |||
|   bool keep_aspect_ratio = false; | ||||
|   std::optional<AffineTransformation::BorderMode> border_mode = | ||||
|       AffineTransformation::BorderMode::kReplicate; | ||||
|   std::optional<AffineTransformation::Interpolation> interpolation = {}; | ||||
|   RunTest(input, expected_output, | ||||
|           {.threshold_on_cpu = 0.99, .threshold_on_gpu = 0.95}, | ||||
|           GetMatrix(input, roi, keep_aspect_ratio, out_width, out_height), | ||||
|           out_width, out_height, border_mode); | ||||
|           out_width, out_height, border_mode, interpolation); | ||||
| } | ||||
| 
 | ||||
| TEST(WarpAffineCalculatorTest, LargeSubRectBorderZero) { | ||||
|  | @ -459,10 +511,11 @@ TEST(WarpAffineCalculatorTest, LargeSubRectBorderZero) { | |||
|   bool keep_aspect_ratio = false; | ||||
|   std::optional<AffineTransformation::BorderMode> border_mode = | ||||
|       AffineTransformation::BorderMode::kZero; | ||||
|   std::optional<AffineTransformation::Interpolation> interpolation = {}; | ||||
|   RunTest(input, expected_output, | ||||
|           {.threshold_on_cpu = 0.99, .threshold_on_gpu = 0.92}, | ||||
|           GetMatrix(input, roi, keep_aspect_ratio, out_width, out_height), | ||||
|           out_width, out_height, border_mode); | ||||
|           out_width, out_height, border_mode, interpolation); | ||||
| } | ||||
| 
 | ||||
| TEST(WarpAffineCalculatorTest, LargeSubRectKeepAspect) { | ||||
|  | @ -483,10 +536,11 @@ TEST(WarpAffineCalculatorTest, LargeSubRectKeepAspect) { | |||
|   bool keep_aspect_ratio = true; | ||||
|   std::optional<AffineTransformation::BorderMode> border_mode = | ||||
|       AffineTransformation::BorderMode::kReplicate; | ||||
|   std::optional<AffineTransformation::Interpolation> interpolation = {}; | ||||
|   RunTest(input, expected_output, | ||||
|           {.threshold_on_cpu = 0.99, .threshold_on_gpu = 0.97}, | ||||
|           GetMatrix(input, roi, keep_aspect_ratio, out_width, out_height), | ||||
|           out_width, out_height, border_mode); | ||||
|           out_width, out_height, border_mode, interpolation); | ||||
| } | ||||
| 
 | ||||
| TEST(WarpAffineCalculatorTest, LargeSubRectKeepAspectBorderZero) { | ||||
|  | @ -508,10 +562,11 @@ TEST(WarpAffineCalculatorTest, LargeSubRectKeepAspectBorderZero) { | |||
|   bool keep_aspect_ratio = true; | ||||
|   std::optional<AffineTransformation::BorderMode> border_mode = | ||||
|       AffineTransformation::BorderMode::kZero; | ||||
|   std::optional<AffineTransformation::Interpolation> interpolation = {}; | ||||
|   RunTest(input, expected_output, | ||||
|           {.threshold_on_cpu = 0.99, .threshold_on_gpu = 0.97}, | ||||
|           GetMatrix(input, roi, keep_aspect_ratio, out_width, out_height), | ||||
|           out_width, out_height, border_mode); | ||||
|           out_width, out_height, border_mode, interpolation); | ||||
| } | ||||
| 
 | ||||
| TEST(WarpAffineCalculatorTest, LargeSubRectKeepAspectWithRotation) { | ||||
|  | @ -532,10 +587,11 @@ TEST(WarpAffineCalculatorTest, LargeSubRectKeepAspectWithRotation) { | |||
|   int out_height = 128; | ||||
|   bool keep_aspect_ratio = true; | ||||
|   std::optional<AffineTransformation::BorderMode> border_mode = {}; | ||||
|   std::optional<AffineTransformation::Interpolation> interpolation = {}; | ||||
|   RunTest(input, expected_output, | ||||
|           {.threshold_on_cpu = 0.99, .threshold_on_gpu = 0.91}, | ||||
|           GetMatrix(input, roi, keep_aspect_ratio, out_width, out_height), | ||||
|           out_width, out_height, border_mode); | ||||
|           out_width, out_height, border_mode, interpolation); | ||||
| } | ||||
| 
 | ||||
| TEST(WarpAffineCalculatorTest, LargeSubRectKeepAspectWithRotationBorderZero) { | ||||
|  | @ -557,10 +613,11 @@ TEST(WarpAffineCalculatorTest, LargeSubRectKeepAspectWithRotationBorderZero) { | |||
|   bool keep_aspect_ratio = true; | ||||
|   std::optional<AffineTransformation::BorderMode> border_mode = | ||||
|       AffineTransformation::BorderMode::kZero; | ||||
|   std::optional<AffineTransformation::Interpolation> interpolation = {}; | ||||
|   RunTest(input, expected_output, | ||||
|           {.threshold_on_cpu = 0.99, .threshold_on_gpu = 0.88}, | ||||
|           GetMatrix(input, roi, keep_aspect_ratio, out_width, out_height), | ||||
|           out_width, out_height, border_mode); | ||||
|           out_width, out_height, border_mode, interpolation); | ||||
| } | ||||
| 
 | ||||
| TEST(WarpAffineCalculatorTest, NoOp) { | ||||
|  | @ -581,10 +638,11 @@ TEST(WarpAffineCalculatorTest, NoOp) { | |||
|   bool keep_aspect_ratio = true; | ||||
|   std::optional<AffineTransformation::BorderMode> border_mode = | ||||
|       AffineTransformation::BorderMode::kReplicate; | ||||
|   std::optional<AffineTransformation::Interpolation> interpolation = {}; | ||||
|   RunTest(input, expected_output, | ||||
|           {.threshold_on_cpu = 0.99, .threshold_on_gpu = 0.99}, | ||||
|           GetMatrix(input, roi, keep_aspect_ratio, out_width, out_height), | ||||
|           out_width, out_height, border_mode); | ||||
|           out_width, out_height, border_mode, interpolation); | ||||
| } | ||||
| 
 | ||||
| TEST(WarpAffineCalculatorTest, NoOpBorderZero) { | ||||
|  | @ -605,10 +663,11 @@ TEST(WarpAffineCalculatorTest, NoOpBorderZero) { | |||
|   bool keep_aspect_ratio = true; | ||||
|   std::optional<AffineTransformation::BorderMode> border_mode = | ||||
|       AffineTransformation::BorderMode::kZero; | ||||
|   std::optional<AffineTransformation::Interpolation> interpolation = {}; | ||||
|   RunTest(input, expected_output, | ||||
|           {.threshold_on_cpu = 0.99, .threshold_on_gpu = 0.99}, | ||||
|           GetMatrix(input, roi, keep_aspect_ratio, out_width, out_height), | ||||
|           out_width, out_height, border_mode); | ||||
|           out_width, out_height, border_mode, interpolation); | ||||
| } | ||||
| 
 | ||||
| }  // namespace
 | ||||
|  |  | |||
|  | @ -997,17 +997,20 @@ cc_library( | |||
|             ":image_to_tensor_converter_gl_buffer", | ||||
|             "//mediapipe/gpu:gl_calculator_helper", | ||||
|             "//mediapipe/gpu:gpu_buffer", | ||||
|             "//mediapipe/gpu:gpu_service", | ||||
|         ], | ||||
|         "//mediapipe:apple": [ | ||||
|             ":image_to_tensor_converter_metal", | ||||
|             "//mediapipe/gpu:gl_calculator_helper", | ||||
|             "//mediapipe/gpu:MPPMetalHelper", | ||||
|             "//mediapipe/gpu:gpu_buffer", | ||||
|             "//mediapipe/gpu:gpu_service", | ||||
|         ], | ||||
|         "//conditions:default": [ | ||||
|             ":image_to_tensor_converter_gl_buffer", | ||||
|             "//mediapipe/gpu:gl_calculator_helper", | ||||
|             "//mediapipe/gpu:gpu_buffer", | ||||
|             "//mediapipe/gpu:gpu_service", | ||||
|         ], | ||||
|     }), | ||||
| ) | ||||
|  | @ -1045,6 +1048,10 @@ cc_test( | |||
|         ":image_to_tensor_calculator", | ||||
|         ":image_to_tensor_converter", | ||||
|         ":image_to_tensor_utils", | ||||
|         "@com_google_absl//absl/flags:flag", | ||||
|         "@com_google_absl//absl/memory", | ||||
|         "@com_google_absl//absl/strings", | ||||
|         "@com_google_absl//absl/strings:str_format", | ||||
|         "//mediapipe/framework:calculator_framework", | ||||
|         "//mediapipe/framework:calculator_runner", | ||||
|         "//mediapipe/framework/deps:file_path", | ||||
|  | @ -1061,11 +1068,10 @@ cc_test( | |||
|         "//mediapipe/framework/port:opencv_imgproc", | ||||
|         "//mediapipe/framework/port:parse_text_proto", | ||||
|         "//mediapipe/util:image_test_utils", | ||||
|         "@com_google_absl//absl/flags:flag", | ||||
|         "@com_google_absl//absl/memory", | ||||
|         "@com_google_absl//absl/strings", | ||||
|         "@com_google_absl//absl/strings:str_format", | ||||
|     ], | ||||
|     ] + select({ | ||||
|         "//mediapipe:apple": [], | ||||
|         "//conditions:default": ["//mediapipe/gpu:gl_context"], | ||||
|     }), | ||||
| ) | ||||
| 
 | ||||
| cc_library( | ||||
|  |  | |||
|  | @ -45,9 +45,11 @@ | |||
| #elif MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31 | ||||
| #include "mediapipe/calculators/tensor/image_to_tensor_converter_gl_buffer.h" | ||||
| #include "mediapipe/gpu/gl_calculator_helper.h" | ||||
| #include "mediapipe/gpu/gpu_service.h" | ||||
| #else | ||||
| #include "mediapipe/calculators/tensor/image_to_tensor_converter_gl_texture.h" | ||||
| #include "mediapipe/gpu/gl_calculator_helper.h" | ||||
| #include "mediapipe/gpu/gpu_service.h" | ||||
| #endif  // MEDIAPIPE_METAL_ENABLED
 | ||||
| #endif  // !MEDIAPIPE_DISABLE_GPU
 | ||||
| 
 | ||||
|  | @ -147,7 +149,7 @@ class ImageToTensorCalculator : public Node { | |||
| #if MEDIAPIPE_METAL_ENABLED | ||||
|     MP_RETURN_IF_ERROR([MPPMetalHelper updateContract:cc]); | ||||
| #else | ||||
|     MP_RETURN_IF_ERROR(mediapipe::GlCalculatorHelper::UpdateContract(cc)); | ||||
|     cc->UseService(kGpuService).Optional(); | ||||
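    // Added note: requesting the GPU service as optional (rather than
    // requiring it via GlCalculatorHelper::UpdateContract) lets CPU-only
    // graphs initialize without a GPU context; per the tests in this change,
    // a GPU frame arriving while the service is unavailable fails with
    // "GPU service not available" instead of failing at startup.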
| #endif  // MEDIAPIPE_METAL_ENABLED
 | ||||
| #endif  // MEDIAPIPE_DISABLE_GPU
 | ||||
| 
 | ||||
|  |  | |||
|  | @ -41,6 +41,10 @@ | |||
| #include "mediapipe/framework/port/status_matchers.h" | ||||
| #include "mediapipe/util/image_test_utils.h" | ||||
| 
 | ||||
| #if !MEDIAPIPE_DISABLE_GPU && !MEDIAPIPE_METAL_ENABLED | ||||
| #include "mediapipe/gpu/gl_context.h" | ||||
| #endif  // !MEDIAPIPE_DISABLE_GPU && !MEDIAPIPE_METAL_ENABLED
 | ||||
| 
 | ||||
| namespace mediapipe { | ||||
| namespace { | ||||
| 
 | ||||
|  | @ -507,5 +511,79 @@ TEST(ImageToTensorCalculatorTest, NoOpExceptRangeAndUseInputImageDims) { | |||
|           /*tensor_width=*/std::nullopt, /*tensor_height=*/std::nullopt, | ||||
|           /*keep_aspect=*/false, BorderMode::kZero, roi); | ||||
| } | ||||
| 
 | ||||
| TEST(ImageToTensorCalculatorTest, CanBeUsedWithoutGpuServiceSet) { | ||||
|   auto graph_config = | ||||
|       mediapipe::ParseTextProtoOrDie<CalculatorGraphConfig>(R"pb( | ||||
|         input_stream: "input_image" | ||||
|         node { | ||||
|           calculator: "ImageToTensorCalculator" | ||||
|           input_stream: "IMAGE:input_image" | ||||
|           output_stream: "TENSORS:tensor" | ||||
|           options { | ||||
|             [mediapipe.ImageToTensorCalculatorOptions.ext] { | ||||
|               output_tensor_float_range { min: 0.0f max: 1.0f } | ||||
|             } | ||||
|           } | ||||
|         } | ||||
|       )pb"); | ||||
|   CalculatorGraph graph; | ||||
|   MP_ASSERT_OK(graph.Initialize(graph_config)); | ||||
|   MP_ASSERT_OK(graph.DisallowServiceDefaultInitialization()); | ||||
|   MP_ASSERT_OK(graph.StartRun({})); | ||||
|   auto image_frame = | ||||
|       std::make_shared<ImageFrame>(ImageFormat::SRGBA, 128, 256, 4); | ||||
|   Image image = Image(std::move(image_frame)); | ||||
|   Packet packet = MakePacket<Image>(std::move(image)); | ||||
|   MP_ASSERT_OK( | ||||
|       graph.AddPacketToInputStream("input_image", packet.At(Timestamp(1)))); | ||||
|   MP_ASSERT_OK(graph.WaitUntilIdle()); | ||||
|   MP_ASSERT_OK(graph.CloseAllPacketSources()); | ||||
|   MP_ASSERT_OK(graph.WaitUntilDone()); | ||||
| } | ||||
| 
 | ||||
| #if !MEDIAPIPE_DISABLE_GPU && !MEDIAPIPE_METAL_ENABLED | ||||
| 
 | ||||
| TEST(ImageToTensorCalculatorTest, | ||||
|      FailsGracefullyWhenGpuServiceNeededButNotAvailable) { | ||||
|   auto graph_config = | ||||
|       mediapipe::ParseTextProtoOrDie<CalculatorGraphConfig>(R"pb( | ||||
|         input_stream: "input_image" | ||||
|         node { | ||||
|           calculator: "ImageToTensorCalculator" | ||||
|           input_stream: "IMAGE:input_image" | ||||
|           output_stream: "TENSORS:tensor" | ||||
|           options { | ||||
|             [mediapipe.ImageToTensorCalculatorOptions.ext] { | ||||
|               output_tensor_float_range { min: 0.0f max: 1.0f } | ||||
|             } | ||||
|           } | ||||
|         } | ||||
|       )pb"); | ||||
|   CalculatorGraph graph; | ||||
|   MP_ASSERT_OK(graph.Initialize(graph_config)); | ||||
|   MP_ASSERT_OK(graph.DisallowServiceDefaultInitialization()); | ||||
|   MP_ASSERT_OK(graph.StartRun({})); | ||||
| 
 | ||||
|   MP_ASSERT_OK_AND_ASSIGN(auto context, | ||||
|                           GlContext::Create(nullptr, /*create_thread=*/true)); | ||||
|   Packet packet; | ||||
|   context->Run([&packet]() { | ||||
|     auto image_frame = | ||||
|         std::make_shared<ImageFrame>(ImageFormat::SRGBA, 128, 256, 4); | ||||
|     Image image = Image(std::move(image_frame)); | ||||
|     // Ensure image is available on GPU to force ImageToTensorCalculator to
 | ||||
|     // run on GPU.
 | ||||
|     ASSERT_TRUE(image.ConvertToGpu()); | ||||
|     packet = MakePacket<Image>(std::move(image)); | ||||
|   }); | ||||
|   MP_ASSERT_OK( | ||||
|       graph.AddPacketToInputStream("input_image", packet.At(Timestamp(1)))); | ||||
|   EXPECT_THAT(graph.WaitUntilIdle(), | ||||
|               StatusIs(absl::StatusCode::kInternal, | ||||
|                        HasSubstr("GPU service not available"))); | ||||
| } | ||||
| #endif  // !MEDIAPIPE_DISABLE_GPU && !MEDIAPIPE_METAL_ENABLED
 | ||||
| 
 | ||||
| }  // namespace
 | ||||
| }  // namespace mediapipe
 | ||||
|  |  | |||
(new binary image file: 64 KiB)
|  | @ -138,7 +138,23 @@ void TestWithAspectRatio(const double aspect_ratio, | |||
|     std::string result_image; | ||||
|     MP_ASSERT_OK( | ||||
|         mediapipe::file::GetContents(result_string_path, &result_image)); | ||||
|     EXPECT_EQ(result_image, output_string); | ||||
|     if (result_image != output_string) { | ||||
|       // There may be slight differences due to the way the JPEG was encoded or
 | ||||
|       // the OpenCV version used to generate the reference files. Compare
 | ||||
|       // pixel-by-pixel using the Peak Signal-to-Noise Ratio instead.
 | ||||
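      // Added note: for 8-bit images, PSNR = 10 * log10(255^2 / MSE) dB, so
      // the 45 dB threshold below corresponds to a mean squared error of
      // roughly 2.1 per channel: loose enough to tolerate encoder
      // differences, tight enough to catch real regressions.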
|       cv::Mat result_mat = | ||||
|           cv::imdecode(cv::Mat(1, result_image.size(), CV_8UC1, | ||||
|                                const_cast<char*>(result_image.data())), | ||||
|                        cv::IMREAD_UNCHANGED); | ||||
|       cv::Mat output_mat = | ||||
|           cv::imdecode(cv::Mat(1, output_string.size(), CV_8UC1, | ||||
|                                const_cast<char*>(output_string.data())), | ||||
|                        cv::IMREAD_UNCHANGED); | ||||
|       ASSERT_EQ(result_mat.rows, output_mat.rows); | ||||
|       ASSERT_EQ(result_mat.cols, output_mat.cols); | ||||
|       ASSERT_EQ(result_mat.type(), output_mat.type()); | ||||
|       EXPECT_GT(cv::PSNR(result_mat, output_mat), 45.0); | ||||
|     } | ||||
|   } else { | ||||
|     std::string output_string_path = mediapipe::file::JoinPath( | ||||
|         absl::GetFlag(FLAGS_output_folder), | ||||
|  |  | |||
(four binary image files modified; sizes effectively unchanged: 3.2 KiB, 6.1 KiB, 8.2 KiB, 7.6 KiB)
|  | @ -19,9 +19,6 @@ package(default_visibility = ["//mediapipe/tasks:internal"]) | |||
| cc_library( | ||||
|     name = "face_geometry_from_landmarks_graph", | ||||
|     srcs = ["face_geometry_from_landmarks_graph.cc"], | ||||
|     data = [ | ||||
|         "//mediapipe/tasks/cc/vision/face_geometry/data:geometry_pipeline_metadata_landmarks", | ||||
|     ], | ||||
|     deps = [ | ||||
|         "//mediapipe/calculators/core:begin_loop_calculator", | ||||
|         "//mediapipe/calculators/core:end_loop_calculator", | ||||
|  | @ -39,6 +36,7 @@ cc_library( | |||
|         "//mediapipe/tasks/cc/vision/face_geometry/calculators:geometry_pipeline_calculator_cc_proto", | ||||
|         "//mediapipe/tasks/cc/vision/face_geometry/proto:environment_cc_proto", | ||||
|         "//mediapipe/tasks/cc/vision/face_geometry/proto:face_geometry_cc_proto", | ||||
|         "//mediapipe/tasks/cc/vision/face_geometry/proto:face_geometry_graph_options_cc_proto", | ||||
|         "//mediapipe/util:graph_builder_utils", | ||||
|         "@com_google_absl//absl/status:statusor", | ||||
|     ], | ||||
|  |  | |||
|  | @ -45,6 +45,7 @@ mediapipe_proto_library( | |||
|     srcs = ["geometry_pipeline_calculator.proto"], | ||||
|     deps = [ | ||||
|         "//mediapipe/framework:calculator_options_proto", | ||||
|         "//mediapipe/tasks/cc/core/proto:external_file_proto", | ||||
|     ], | ||||
| ) | ||||
| 
 | ||||
|  | @ -59,6 +60,8 @@ cc_library( | |||
|         "//mediapipe/framework/port:ret_check", | ||||
|         "//mediapipe/framework/port:status", | ||||
|         "//mediapipe/framework/port:statusor", | ||||
|         "//mediapipe/tasks/cc/core:external_file_handler", | ||||
|         "//mediapipe/tasks/cc/core/proto:external_file_cc_proto", | ||||
|         "//mediapipe/tasks/cc/vision/face_geometry/libs:geometry_pipeline", | ||||
|         "//mediapipe/tasks/cc/vision/face_geometry/libs:validation_utils", | ||||
|         "//mediapipe/tasks/cc/vision/face_geometry/proto:environment_cc_proto", | ||||
|  |  | |||
|  | @ -24,6 +24,8 @@ | |||
| #include "mediapipe/framework/port/status.h" | ||||
| #include "mediapipe/framework/port/status_macros.h" | ||||
| #include "mediapipe/framework/port/statusor.h" | ||||
| #include "mediapipe/tasks/cc/core/external_file_handler.h" | ||||
| #include "mediapipe/tasks/cc/core/proto/external_file.pb.h" | ||||
| #include "mediapipe/tasks/cc/vision/face_geometry/calculators/geometry_pipeline_calculator.pb.h" | ||||
| #include "mediapipe/tasks/cc/vision/face_geometry/libs/geometry_pipeline.h" | ||||
| #include "mediapipe/tasks/cc/vision/face_geometry/libs/validation_utils.h" | ||||
|  | @ -69,8 +71,8 @@ using ::mediapipe::tasks::vision::face_geometry::proto:: | |||
| //     A vector of face geometry data.
 | ||||
| //
 | ||||
| // Options:
 | ||||
| //   metadata_path (`string`, optional):
 | ||||
| //     Defines a path for the geometry pipeline metadata file.
 | ||||
| //   metadata_file (`ExternalFile`, optional):
 | ||||
| //     Defines an ExternalFile for the geometry pipeline metadata file.
 | ||||
| //
 | ||||
| //     The geometry pipeline metadata file format must be the binary
 | ||||
| //     `GeometryPipelineMetadata` proto.
 | ||||
|  | @ -95,7 +97,7 @@ class GeometryPipelineCalculator : public CalculatorBase { | |||
| 
 | ||||
|     ASSIGN_OR_RETURN( | ||||
|         GeometryPipelineMetadata metadata, | ||||
|         ReadMetadataFromFile(options.metadata_path()), | ||||
|         ReadMetadataFromFile(options.metadata_file()), | ||||
|         _ << "Failed to read the geometry pipeline metadata from file!"); | ||||
| 
 | ||||
|     MP_RETURN_IF_ERROR(ValidateGeometryPipelineMetadata(metadata)) | ||||
|  | @ -155,32 +157,19 @@ class GeometryPipelineCalculator : public CalculatorBase { | |||
| 
 | ||||
|  private: | ||||
|   static absl::StatusOr<GeometryPipelineMetadata> ReadMetadataFromFile( | ||||
|       const std::string& metadata_path) { | ||||
|     ASSIGN_OR_RETURN(std::string metadata_blob, | ||||
|                      ReadContentBlobFromFile(metadata_path), | ||||
|                      _ << "Failed to read a metadata blob from file!"); | ||||
|       const core::proto::ExternalFile& metadata_file) { | ||||
|     ASSIGN_OR_RETURN( | ||||
|         const auto file_handler, | ||||
|         core::ExternalFileHandler::CreateFromExternalFile(&metadata_file)); | ||||
| 
 | ||||
|     GeometryPipelineMetadata metadata; | ||||
|     RET_CHECK(metadata.ParseFromString(metadata_blob)) | ||||
|     RET_CHECK( | ||||
|         metadata.ParseFromString(std::string(file_handler->GetFileContent()))) | ||||
|         << "Failed to parse a metadata proto from a binary blob!"; | ||||
| 
 | ||||
|     return metadata; | ||||
|   } | ||||
| 
 | ||||
|   static absl::StatusOr<std::string> ReadContentBlobFromFile( | ||||
|       const std::string& unresolved_path) { | ||||
|     ASSIGN_OR_RETURN(std::string resolved_path, | ||||
|                      mediapipe::PathToResourceAsFile(unresolved_path), | ||||
|                      _ << "Failed to resolve path! Path = " << unresolved_path); | ||||
| 
 | ||||
|     std::string content_blob; | ||||
|     MP_RETURN_IF_ERROR( | ||||
|         mediapipe::GetResourceContents(resolved_path, &content_blob)) | ||||
|         << "Failed to read content blob! Resolved path = " << resolved_path; | ||||
| 
 | ||||
|     return content_blob; | ||||
|   } | ||||
| 
 | ||||
|   std::unique_ptr<GeometryPipeline> geometry_pipeline_; | ||||
| }; | ||||
| 
 | ||||
|  |  | |||
|  | @ -17,11 +17,12 @@ syntax = "proto2"; | |||
| package mediapipe.tasks.vision.face_geometry; | ||||
| 
 | ||||
| import "mediapipe/framework/calculator_options.proto"; | ||||
| import "mediapipe/tasks/cc/core/proto/external_file.proto"; | ||||
| 
 | ||||
| message FaceGeometryPipelineCalculatorOptions { | ||||
|   extend mediapipe.CalculatorOptions { | ||||
|     optional FaceGeometryPipelineCalculatorOptions ext = 512499200; | ||||
|   } | ||||
| 
 | ||||
|   optional string metadata_path = 1; | ||||
|   optional core.proto.ExternalFile metadata_file = 1; | ||||
| } | ||||
|  |  | |||
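| A minimal sketch of a node config under the new schema, for orientation; the | ||||
| metadata file name below is a placeholder, not a file shipped with this change: | ||||
| 
 | ||||
| ```cpp | ||||
| #include "mediapipe/framework/calculator.pb.h" | ||||
| #include "mediapipe/framework/port/parse_text_proto.h" | ||||
| 
 | ||||
| // Sketch only: configuring FaceGeometryPipelineCalculator through the new | ||||
| // ExternalFile-based `metadata_file` option instead of `metadata_path`. | ||||
| mediapipe::CalculatorGraphConfig::Node node = | ||||
|     mediapipe::ParseTextProtoOrDie<mediapipe::CalculatorGraphConfig::Node>(R"pb( | ||||
|       calculator: "mediapipe.tasks.vision.face_geometry.FaceGeometryPipelineCalculator" | ||||
|       options { | ||||
|         [mediapipe.tasks.vision.face_geometry.FaceGeometryPipelineCalculatorOptions | ||||
|              .ext] { | ||||
|           metadata_file { file_name: "path/to/metadata.binarypb" } | ||||
|         } | ||||
|       } | ||||
|     )pb"); | ||||
| ``` | ||||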
|  | @ -28,6 +28,7 @@ limitations under the License. | |||
| #include "mediapipe/tasks/cc/vision/face_geometry/calculators/geometry_pipeline_calculator.pb.h" | ||||
| #include "mediapipe/tasks/cc/vision/face_geometry/proto/environment.pb.h" | ||||
| #include "mediapipe/tasks/cc/vision/face_geometry/proto/face_geometry.pb.h" | ||||
| #include "mediapipe/tasks/cc/vision/face_geometry/proto/face_geometry_graph_options.pb.h" | ||||
| #include "mediapipe/util/graph_builder_utils.h" | ||||
| 
 | ||||
| namespace mediapipe::tasks::vision::face_geometry { | ||||
|  | @ -49,10 +50,6 @@ constexpr char kIterableTag[] = "ITERABLE"; | |||
| constexpr char kBatchEndTag[] = "BATCH_END"; | ||||
| constexpr char kItemTag[] = "ITEM"; | ||||
| 
 | ||||
| constexpr char kGeometryPipelineMetadataPath[] = | ||||
|     "mediapipe/tasks/cc/vision/face_geometry/data/" | ||||
|     "geometry_pipeline_metadata_landmarks.binarypb"; | ||||
| 
 | ||||
| struct FaceGeometryOuts { | ||||
|   Stream<std::vector<FaceGeometry>> multi_face_geometry; | ||||
| }; | ||||
|  | @ -127,6 +124,7 @@ class FaceGeometryFromLandmarksGraph : public Subgraph { | |||
|     } | ||||
|     ASSIGN_OR_RETURN(auto outs, | ||||
|                      BuildFaceGeometryFromLandmarksGraph( | ||||
|                          *sc->MutableOptions<proto::FaceGeometryGraphOptions>(), | ||||
|                          graph.In(kFaceLandmarksTag) | ||||
|                              .Cast<std::vector<NormalizedLandmarkList>>(), | ||||
|                          graph.In(kImageSizeTag).Cast<std::pair<int, int>>(), | ||||
|  | @ -138,6 +136,7 @@ class FaceGeometryFromLandmarksGraph : public Subgraph { | |||
| 
 | ||||
|  private: | ||||
|   absl::StatusOr<FaceGeometryOuts> BuildFaceGeometryFromLandmarksGraph( | ||||
|       proto::FaceGeometryGraphOptions& graph_options, | ||||
|       Stream<std::vector<NormalizedLandmarkList>> multi_face_landmarks, | ||||
|       Stream<std::pair<int, int>> image_size, | ||||
|       std::optional<SidePacket<Environment>> environment, Graph& graph) { | ||||
|  | @ -185,7 +184,8 @@ class FaceGeometryFromLandmarksGraph : public Subgraph { | |||
|         "mediapipe.tasks.vision.face_geometry.FaceGeometryPipelineCalculator"); | ||||
|     auto& geometry_pipeline_options = | ||||
|         geometry_pipeline.GetOptions<FaceGeometryPipelineCalculatorOptions>(); | ||||
|     geometry_pipeline_options.set_metadata_path(kGeometryPipelineMetadataPath); | ||||
|     geometry_pipeline_options.Swap( | ||||
|         graph_options.mutable_geometry_pipeline_options()); | ||||
|     image_size >> geometry_pipeline.In(kImageSizeTag); | ||||
|     multi_face_landmarks_no_iris >> | ||||
|         geometry_pipeline.In(kMultiFaceLandmarksTag); | ||||
|  |  | |||
|  | @ -20,6 +20,7 @@ limitations under the License. | |||
| #include "absl/status/statusor.h" | ||||
| #include "absl/strings/str_format.h" | ||||
| #include "absl/strings/string_view.h" | ||||
| #include "absl/strings/substitute.h" | ||||
| #include "mediapipe/framework/api2/port.h" | ||||
| #include "mediapipe/framework/calculator_framework.h" | ||||
| #include "mediapipe/framework/calculator_runner.h" | ||||
|  | @ -31,6 +32,7 @@ limitations under the License. | |||
| #include "mediapipe/framework/port/gtest.h" | ||||
| #include "mediapipe/framework/port/parse_text_proto.h" | ||||
| #include "mediapipe/framework/tool/sink.h" | ||||
| #include "mediapipe/tasks/cc/core/proto/external_file.pb.h" | ||||
| #include "mediapipe/tasks/cc/vision/face_geometry/proto/environment.pb.h" | ||||
| #include "mediapipe/tasks/cc/vision/face_geometry/proto/face_geometry.pb.h" | ||||
| 
 | ||||
|  | @ -49,6 +51,9 @@ constexpr char kTestDataDirectory[] = "/mediapipe/tasks/testdata/vision/"; | |||
| constexpr char kFaceLandmarksFileName[] = | ||||
|     "face_blendshapes_in_landmarks.prototxt"; | ||||
| constexpr char kFaceGeometryFileName[] = "face_geometry_expected_out.pbtxt"; | ||||
| constexpr char kGeometryPipelineMetadataPath[] = | ||||
|     "mediapipe/tasks/cc/vision/face_geometry/data/" | ||||
|     "geometry_pipeline_metadata_landmarks.binarypb"; | ||||
| 
 | ||||
| std::vector<NormalizedLandmarkList> GetLandmarks(absl::string_view filename) { | ||||
|   NormalizedLandmarkList landmarks; | ||||
|  | @ -89,17 +94,25 @@ void MakeInputPacketsAndRunGraph(CalculatorGraph& graph) { | |||
| 
 | ||||
| TEST(FaceGeometryFromLandmarksGraphTest, DefaultEnvironment) { | ||||
|   CalculatorGraphConfig graph_config = ParseTextProtoOrDie< | ||||
|       CalculatorGraphConfig>(R"pb( | ||||
|     input_stream: "FACE_LANDMARKS:face_landmarks" | ||||
|     input_stream: "IMAGE_SIZE:image_size" | ||||
|     output_stream: "FACE_GEOMETRY:face_geometry" | ||||
|     node { | ||||
|       calculator: "mediapipe.tasks.vision.face_geometry.FaceGeometryFromLandmarksGraph" | ||||
|       input_stream: "FACE_LANDMARKS:face_landmarks" | ||||
|       input_stream: "IMAGE_SIZE:image_size" | ||||
|       output_stream: "FACE_GEOMETRY:face_geometry" | ||||
|     } | ||||
|   )pb"); | ||||
|       CalculatorGraphConfig>(absl::Substitute( | ||||
|       R"pb( | ||||
|         input_stream: "FACE_LANDMARKS:face_landmarks" | ||||
|         input_stream: "IMAGE_SIZE:image_size" | ||||
|         output_stream: "FACE_GEOMETRY:face_geometry" | ||||
|         node { | ||||
|           calculator: "mediapipe.tasks.vision.face_geometry.FaceGeometryFromLandmarksGraph" | ||||
|           input_stream: "FACE_LANDMARKS:face_landmarks" | ||||
|           input_stream: "IMAGE_SIZE:image_size" | ||||
|           output_stream: "FACE_GEOMETRY:face_geometry" | ||||
|           options: { | ||||
|             [mediapipe.tasks.vision.face_geometry.proto.FaceGeometryGraphOptions | ||||
|                  .ext] { | ||||
|               geometry_pipeline_options { metadata_file { file_name: "$0" } } | ||||
|             } | ||||
|           } | ||||
|         } | ||||
|       )pb", | ||||
|       kGeometryPipelineMetadataPath)); | ||||
|   std::vector<Packet> output_packets; | ||||
|   tool::AddVectorSink("face_geometry", &graph_config, &output_packets); | ||||
| 
 | ||||
|  | @ -116,19 +129,27 @@ TEST(FaceGeometryFromLandmarksGraphTest, DefaultEnvironment) { | |||
| 
 | ||||
| TEST(FaceGeometryFromLandmarksGraphTest, SideInEnvironment) { | ||||
|   CalculatorGraphConfig graph_config = ParseTextProtoOrDie< | ||||
|       CalculatorGraphConfig>(R"pb( | ||||
|     input_stream: "FACE_LANDMARKS:face_landmarks" | ||||
|     input_stream: "IMAGE_SIZE:image_size" | ||||
|     input_side_packet: "ENVIRONMENT:environment" | ||||
|     output_stream: "FACE_GEOMETRY:face_geometry" | ||||
|     node { | ||||
|       calculator: "mediapipe.tasks.vision.face_geometry.FaceGeometryFromLandmarksGraph" | ||||
|       input_stream: "FACE_LANDMARKS:face_landmarks" | ||||
|       input_stream: "IMAGE_SIZE:image_size" | ||||
|       input_side_packet: "ENVIRONMENT:environment" | ||||
|       output_stream: "FACE_GEOMETRY:face_geometry" | ||||
|     } | ||||
|   )pb"); | ||||
|       CalculatorGraphConfig>(absl::Substitute( | ||||
|       R"pb( | ||||
|         input_stream: "FACE_LANDMARKS:face_landmarks" | ||||
|         input_stream: "IMAGE_SIZE:image_size" | ||||
|         input_side_packet: "ENVIRONMENT:environment" | ||||
|         output_stream: "FACE_GEOMETRY:face_geometry" | ||||
|         node { | ||||
|           calculator: "mediapipe.tasks.vision.face_geometry.FaceGeometryFromLandmarksGraph" | ||||
|           input_stream: "FACE_LANDMARKS:face_landmarks" | ||||
|           input_stream: "IMAGE_SIZE:image_size" | ||||
|           input_side_packet: "ENVIRONMENT:environment" | ||||
|           output_stream: "FACE_GEOMETRY:face_geometry" | ||||
|           options: { | ||||
|             [mediapipe.tasks.vision.face_geometry.proto.FaceGeometryGraphOptions | ||||
|                  .ext] { | ||||
|               geometry_pipeline_options { metadata_file { file_name: "$0" } } | ||||
|             } | ||||
|           } | ||||
|         } | ||||
|       )pb", | ||||
|       kGeometryPipelineMetadataPath)); | ||||
|   std::vector<Packet> output_packets; | ||||
|   tool::AddVectorSink("face_geometry", &graph_config, &output_packets); | ||||
| 
 | ||||
|  |  | |||
|  | @ -44,3 +44,12 @@ mediapipe_proto_library( | |||
|     name = "mesh_3d_proto", | ||||
|     srcs = ["mesh_3d.proto"], | ||||
| ) | ||||
| 
 | ||||
| mediapipe_proto_library( | ||||
|     name = "face_geometry_graph_options_proto", | ||||
|     srcs = ["face_geometry_graph_options.proto"], | ||||
|     deps = [ | ||||
|         "//mediapipe/framework:calculator_options_proto", | ||||
|         "//mediapipe/tasks/cc/vision/face_geometry/calculators:geometry_pipeline_calculator_proto", | ||||
|     ], | ||||
| ) | ||||
|  |  | |||
|  | @ -0,0 +1,28 @@ | |||
| // Copyright 2023 The MediaPipe Authors. | ||||
| // | ||||
| // Licensed under the Apache License, Version 2.0 (the "License"); | ||||
| // you may not use this file except in compliance with the License. | ||||
| // You may obtain a copy of the License at | ||||
| // | ||||
| //      http://www.apache.org/licenses/LICENSE-2.0 | ||||
| // | ||||
| // Unless required by applicable law or agreed to in writing, software | ||||
| // distributed under the License is distributed on an "AS IS" BASIS, | ||||
| // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||
| // See the License for the specific language governing permissions and | ||||
| // limitations under the License. | ||||
| 
 | ||||
| syntax = "proto2"; | ||||
| 
 | ||||
| package mediapipe.tasks.vision.face_geometry.proto; | ||||
| 
 | ||||
| import "mediapipe/framework/calculator_options.proto"; | ||||
| import "mediapipe/tasks/cc/vision/face_geometry/calculators/geometry_pipeline_calculator.proto"; | ||||
| 
 | ||||
| message FaceGeometryGraphOptions { | ||||
|   extend mediapipe.CalculatorOptions { | ||||
|     optional FaceGeometryGraphOptions ext = 515723506; | ||||
|   } | ||||
| 
 | ||||
|   optional FaceGeometryPipelineCalculatorOptions geometry_pipeline_options = 1; | ||||
| } | ||||
|  | @ -210,8 +210,10 @@ cc_library( | |||
|         "//mediapipe/tasks/cc/vision/face_detector:face_detector_graph", | ||||
|         "//mediapipe/tasks/cc/vision/face_detector/proto:face_detector_graph_options_cc_proto", | ||||
|         "//mediapipe/tasks/cc/vision/face_geometry:face_geometry_from_landmarks_graph", | ||||
|         "//mediapipe/tasks/cc/vision/face_geometry/calculators:geometry_pipeline_calculator_cc_proto", | ||||
|         "//mediapipe/tasks/cc/vision/face_geometry/proto:environment_cc_proto", | ||||
|         "//mediapipe/tasks/cc/vision/face_geometry/proto:face_geometry_cc_proto", | ||||
|         "//mediapipe/tasks/cc/vision/face_geometry/proto:face_geometry_graph_options_cc_proto", | ||||
|         "//mediapipe/tasks/cc/vision/face_landmarker/proto:face_blendshapes_graph_options_cc_proto", | ||||
|         "//mediapipe/tasks/cc/vision/face_landmarker/proto:face_landmarker_graph_options_cc_proto", | ||||
|         "//mediapipe/tasks/cc/vision/face_landmarker/proto:face_landmarks_detector_graph_options_cc_proto", | ||||
|  |  | |||
|  | @ -40,8 +40,10 @@ limitations under the License. | |||
| #include "mediapipe/tasks/cc/core/utils.h" | ||||
| #include "mediapipe/tasks/cc/metadata/utils/zip_utils.h" | ||||
| #include "mediapipe/tasks/cc/vision/face_detector/proto/face_detector_graph_options.pb.h" | ||||
| #include "mediapipe/tasks/cc/vision/face_geometry/calculators/geometry_pipeline_calculator.pb.h" | ||||
| #include "mediapipe/tasks/cc/vision/face_geometry/proto/environment.pb.h" | ||||
| #include "mediapipe/tasks/cc/vision/face_geometry/proto/face_geometry.pb.h" | ||||
| #include "mediapipe/tasks/cc/vision/face_geometry/proto/face_geometry_graph_options.pb.h" | ||||
| #include "mediapipe/tasks/cc/vision/face_landmarker/proto/face_blendshapes_graph_options.pb.h" | ||||
| #include "mediapipe/tasks/cc/vision/face_landmarker/proto/face_landmarker_graph_options.pb.h" | ||||
| #include "mediapipe/tasks/cc/vision/face_landmarker/proto/face_landmarks_detector_graph_options.pb.h" | ||||
|  | @ -93,6 +95,8 @@ constexpr char kFaceDetectorTFLiteName[] = "face_detector.tflite"; | |||
| constexpr char kFaceLandmarksDetectorTFLiteName[] = | ||||
|     "face_landmarks_detector.tflite"; | ||||
| constexpr char kFaceBlendshapeTFLiteName[] = "face_blendshapes.tflite"; | ||||
| constexpr char kFaceGeometryPipelineMetadataName[] = | ||||
|     "geometry_pipeline_metadata_landmarks.binarypb"; | ||||
| 
 | ||||
| struct FaceLandmarkerOutputs { | ||||
|   Source<std::vector<NormalizedLandmarkList>> landmark_lists; | ||||
|  | @ -305,6 +309,7 @@ class FaceLandmarkerGraph : public core::ModelTaskGraph { | |||
|   absl::StatusOr<CalculatorGraphConfig> GetConfig( | ||||
|       SubgraphContext* sc) override { | ||||
|     Graph graph; | ||||
|     bool output_geometry = HasOutput(sc->OriginalNode(), kFaceGeometryTag); | ||||
|     if (sc->Options<FaceLandmarkerGraphOptions>() | ||||
|             .base_options() | ||||
|             .has_model_asset()) { | ||||
|  | @ -318,6 +323,18 @@ class FaceLandmarkerGraph : public core::ModelTaskGraph { | |||
|           sc->MutableOptions<FaceLandmarkerGraphOptions>(), | ||||
|           !sc->Service(::mediapipe::tasks::core::kModelResourcesCacheService) | ||||
|                .IsAvailable())); | ||||
|       if (output_geometry) { | ||||
|         // Set the face geometry metadata file for
 | ||||
|         // FaceGeometryFromLandmarksGraph.
 | ||||
|         ASSIGN_OR_RETURN(auto face_geometry_pipeline_metadata_file, | ||||
|                          model_asset_bundle_resources->GetModelFile( | ||||
|                              kFaceGeometryPipelineMetadataName)); | ||||
|         SetExternalFile(face_geometry_pipeline_metadata_file, | ||||
|                         sc->MutableOptions<FaceLandmarkerGraphOptions>() | ||||
|                             ->mutable_face_geometry_graph_options() | ||||
|                             ->mutable_geometry_pipeline_options() | ||||
|                             ->mutable_metadata_file()); | ||||
|       } | ||||
|     } | ||||
|     std::optional<SidePacket<Environment>> environment; | ||||
|     if (HasSideInput(sc->OriginalNode(), kEnvironmentTag)) { | ||||
|  | @ -338,7 +355,6 @@ class FaceLandmarkerGraph : public core::ModelTaskGraph { | |||
|               .face_landmarks_detector_graph_options() | ||||
|               .has_face_blendshapes_graph_options())); | ||||
|     } | ||||
|     bool output_geometry = HasOutput(sc->OriginalNode(), kFaceGeometryTag); | ||||
|     ASSIGN_OR_RETURN( | ||||
|         auto outs, | ||||
|         BuildFaceLandmarkerGraph( | ||||
|  | @ -481,6 +497,9 @@ class FaceLandmarkerGraph : public core::ModelTaskGraph { | |||
|       auto& face_geometry_from_landmarks = graph.AddNode( | ||||
|           "mediapipe.tasks.vision.face_geometry." | ||||
|           "FaceGeometryFromLandmarksGraph"); | ||||
|       face_geometry_from_landmarks | ||||
|           .GetOptions<face_geometry::proto::FaceGeometryGraphOptions>() | ||||
|           .Swap(tasks_options.mutable_face_geometry_graph_options()); | ||||
|       if (environment.has_value()) { | ||||
|         *environment >> face_geometry_from_landmarks.SideIn(kEnvironmentTag); | ||||
|       } | ||||
|  |  | |||
|  | @ -60,5 +60,6 @@ mediapipe_proto_library( | |||
|         "//mediapipe/framework:calculator_proto", | ||||
|         "//mediapipe/tasks/cc/core/proto:base_options_proto", | ||||
|         "//mediapipe/tasks/cc/vision/face_detector/proto:face_detector_graph_options_proto", | ||||
|         "//mediapipe/tasks/cc/vision/face_geometry/proto:face_geometry_graph_options_proto", | ||||
|     ], | ||||
| ) | ||||
|  |  | |||
|  | @ -21,6 +21,7 @@ import "mediapipe/framework/calculator.proto"; | |||
| import "mediapipe/framework/calculator_options.proto"; | ||||
| import "mediapipe/tasks/cc/core/proto/base_options.proto"; | ||||
| import "mediapipe/tasks/cc/vision/face_detector/proto/face_detector_graph_options.proto"; | ||||
| import "mediapipe/tasks/cc/vision/face_geometry/proto/face_geometry_graph_options.proto"; | ||||
| import "mediapipe/tasks/cc/vision/face_landmarker/proto/face_landmarks_detector_graph_options.proto"; | ||||
| 
 | ||||
| option java_package = "com.google.mediapipe.tasks.vision.facelandmarker.proto"; | ||||
|  | @ -45,4 +46,8 @@ message FaceLandmarkerGraphOptions { | |||
|   // Minimum confidence for face landmarks tracking to be considered | ||||
|   // successfully. | ||||
|   optional float min_tracking_confidence = 4 [default = 0.5]; | ||||
| 
 | ||||
|   // Options for FaceGeometryGraph to get facial transformation matrix. | ||||
|   optional face_geometry.proto.FaceGeometryGraphOptions | ||||
|       face_geometry_graph_options = 5; | ||||
| } | ||||
|  |  | |||
|  | @ -47,6 +47,7 @@ cc_library( | |||
|         "//mediapipe/tasks/cc/vision/face_stylizer/proto:face_stylizer_graph_options_cc_proto", | ||||
|         "@com_google_absl//absl/status:statusor", | ||||
|     ], | ||||
|     alwayslink = 1, | ||||
| ) | ||||
| 
 | ||||
| cc_library( | ||||
|  |  | |||
|  | @ -294,7 +294,7 @@ absl::Status TensorsToImageCalculator::MetalProcess(CalculatorContext* cc) { | |||
|                   threadsPerThreadgroup:threads_per_group]; | ||||
|   [compute_encoder endEncoding]; | ||||
|   [command_buffer commit]; | ||||
| 
 | ||||
|   [command_buffer waitUntilCompleted]; | ||||
|   kOutputImage(cc).Send(Image(output)); | ||||
|   return absl::OkStatus(); | ||||
| } | ||||
|  |  | |||
|  | @ -36,6 +36,7 @@ cc_library( | |||
|         ":hand_association_calculator_cc_proto", | ||||
|         "//mediapipe/calculators/util:association_calculator", | ||||
|         "//mediapipe/framework:calculator_framework", | ||||
|         "//mediapipe/framework:collection_item_id", | ||||
|         "//mediapipe/framework/api2:node", | ||||
|         "//mediapipe/framework/formats:rect_cc_proto", | ||||
|         "//mediapipe/framework/port:rectangle", | ||||
|  |  | |||
|  | @ -19,6 +19,7 @@ limitations under the License. | |||
| 
 | ||||
| #include "mediapipe/framework/api2/node.h" | ||||
| #include "mediapipe/framework/calculator_framework.h" | ||||
| #include "mediapipe/framework/collection_item_id.h" | ||||
| #include "mediapipe/framework/formats/rect.pb.h" | ||||
| #include "mediapipe/framework/port/rectangle.h" | ||||
| #include "mediapipe/framework/port/status.h" | ||||
|  | @ -29,30 +30,55 @@ namespace mediapipe::api2 { | |||
| 
 | ||||
| using ::mediapipe::NormalizedRect; | ||||
| 
 | ||||
| // HandAssociationCalculator accepts multiple inputs of vectors of
 | ||||
| // NormalizedRect. The output is a vector of NormalizedRect that contains
 | ||||
| // rects from the input vectors that don't overlap with each other. When two
 | ||||
| // rects overlap, the rect that comes in from an earlier input stream is
 | ||||
| // kept in the output. If a rect has no ID (i.e. from detection stream),
 | ||||
| // then a unique rect ID is assigned to it.
 | ||||
| 
 | ||||
| // The rects in multiple input streams are effectively flattened to a single
 | ||||
| // list.  For example:
 | ||||
| // Stream1: rect 1, rect 2
 | ||||
| // Stream2: rect 3, rect 4
 | ||||
| // Stream3: rect 5, rect 6
 | ||||
| // (Conceptually) flattened list: rect 1, 2, 3, 4, 5, 6
 | ||||
| // In the flattened list, if a rect with a higher index overlaps with a rect
 | ||||
| // with a lower index, beyond a specified IOU threshold, the rect with the
 | ||||
| // lower index will be kept in the output, and the rect with the higher index
 | ||||
| // will be discarded.
 | ||||
| // Input:
 | ||||
| //  BASE_RECTS - Vector of NormalizedRect.
 | ||||
| //  RECTS - Vector of NormalizedRect.
 | ||||
| //
 | ||||
| // Output:
 | ||||
| //  No tag - Vector of NormalizedRect.
 | ||||
| //
 | ||||
| // Example use:
 | ||||
| // node {
 | ||||
| //   calculator: "HandAssociationCalculator"
 | ||||
| //   input_stream: "BASE_RECTS:base_rects"
 | ||||
| //   input_stream: "RECTS:0:rects0"
 | ||||
| //   input_stream: "RECTS:1:rects1"
 | ||||
| //   input_stream: "RECTS:2:rects2"
 | ||||
| //   output_stream: "output_rects"
 | ||||
| //   options {
 | ||||
| //     [mediapipe.HandAssociationCalculatorOptions.ext] {
 | ||||
| //       min_similarity_threshold: 0.1
 | ||||
| //     }
 | ||||
| //   }
 | ||||
| // }
 | ||||
| //
 | ||||
| // IMPORTANT Notes:
 | ||||
| //  - Rects from input streams tagged with "BASE_RECTS" are always preserved.
 | ||||
| //  - This calculator checks for overlap among rects from input streams tagged
 | ||||
| //    with "RECTS". Rects are prioritized based on their index in the vector and
 | ||||
| //    input streams to the calculator. When two rects overlap, the rect that
 | ||||
| //    comes from an input stream with lower tag-index is kept in the output.
 | ||||
| //  - Example of inputs for the node above:
 | ||||
| //      "base_rects": rect 0, rect 1
 | ||||
| //      "rects0": rect 2, rect 3
 | ||||
| //      "rects1": rect 4, rect 5
 | ||||
| //      "rects2": rect 6, rect 7
 | ||||
| //    (Conceptually) flattened list: 0, 1, 2, 3, 4, 5, 6, 7.
 | ||||
| //    Rects 0, 1 will be preserved. Rects 2, 3, 4, 5, 6, 7 will be checked for
 | ||||
| //    overlap. If a rect with a higher index overlaps with a rect with a
 | ||||
| //    lower index, beyond a specified IOU threshold, the rect with the lower
 | ||||
| //    index will be kept in the output, and the rect with the higher index
 | ||||
| //    will be discarded.
 | ||||
| // TODO: Upgrade this to latest API for calculators
 | ||||
| class HandAssociationCalculator : public CalculatorBase { | ||||
|  public: | ||||
|   static absl::Status GetContract(CalculatorContract* cc) { | ||||
|     // Initialize input and output streams.
 | ||||
|     for (auto& input_stream : cc->Inputs()) { | ||||
|       input_stream.Set<std::vector<NormalizedRect>>(); | ||||
|     for (CollectionItemId id = cc->Inputs().BeginId("BASE_RECTS"); | ||||
|          id != cc->Inputs().EndId("BASE_RECTS"); ++id) { | ||||
|       cc->Inputs().Get(id).Set<std::vector<NormalizedRect>>(); | ||||
|     } | ||||
|     for (CollectionItemId id = cc->Inputs().BeginId("RECTS"); | ||||
|          id != cc->Inputs().EndId("RECTS"); ++id) { | ||||
|       cc->Inputs().Get(id).Set<std::vector<NormalizedRect>>(); | ||||
|     } | ||||
|     cc->Outputs().Index(0).Set<std::vector<NormalizedRect>>(); | ||||
| 
 | ||||
|  | @ -89,7 +115,24 @@ class HandAssociationCalculator : public CalculatorBase { | |||
|       CalculatorContext* cc) { | ||||
|     std::vector<NormalizedRect> result; | ||||
| 
 | ||||
|     for (const auto& input_stream : cc->Inputs()) { | ||||
|     for (CollectionItemId id = cc->Inputs().BeginId("BASE_RECTS"); | ||||
|          id != cc->Inputs().EndId("BASE_RECTS"); ++id) { | ||||
|       const auto& input_stream = cc->Inputs().Get(id); | ||||
|       if (input_stream.IsEmpty()) { | ||||
|         continue; | ||||
|       } | ||||
| 
 | ||||
|       for (auto rect : input_stream.Get<std::vector<NormalizedRect>>()) { | ||||
|         if (!rect.has_rect_id()) { | ||||
|           rect.set_rect_id(GetNextRectId()); | ||||
|         } | ||||
|         result.push_back(rect); | ||||
|       } | ||||
|     } | ||||
| 
 | ||||
|     for (CollectionItemId id = cc->Inputs().BeginId("RECTS"); | ||||
|          id != cc->Inputs().EndId("RECTS"); ++id) { | ||||
|       const auto& input_stream = cc->Inputs().Get(id); | ||||
|       if (input_stream.IsEmpty()) { | ||||
|         continue; | ||||
|       } | ||||
|  |  | |||
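| To make the association rule concrete, the logic documented in the calculator | ||||
| comment above boils down to roughly the following self-contained sketch (plain | ||||
| structs and an illustrative IoU helper; the real calculator operates on | ||||
| mediapipe::NormalizedRect, uses its own overlap helper, and also assigns rect | ||||
| IDs): | ||||
| 
 | ||||
| ```cpp | ||||
| #include <algorithm> | ||||
| #include <vector> | ||||
| 
 | ||||
| // Illustrative axis-aligned box; not a MediaPipe type. | ||||
| struct Box { | ||||
|   float x_min, y_min, x_max, y_max; | ||||
| }; | ||||
| 
 | ||||
| float IntersectionOverUnion(const Box& a, const Box& b) { | ||||
|   const float ix = | ||||
|       std::max(0.0f, std::min(a.x_max, b.x_max) - std::max(a.x_min, b.x_min)); | ||||
|   const float iy = | ||||
|       std::max(0.0f, std::min(a.y_max, b.y_max) - std::max(a.y_min, b.y_min)); | ||||
|   const float intersection = ix * iy; | ||||
|   const float union_area = (a.x_max - a.x_min) * (a.y_max - a.y_min) + | ||||
|                            (b.x_max - b.x_min) * (b.y_max - b.y_min) - | ||||
|                            intersection; | ||||
|   return union_area > 0.0f ? intersection / union_area : 0.0f; | ||||
| } | ||||
| 
 | ||||
| // BASE_RECTS are copied to the output unconditionally (and are not checked | ||||
| // against each other). RECTS, already flattened across streams in stream | ||||
| // order, are kept only if they do not overlap any previously kept box beyond | ||||
| // the threshold. | ||||
| std::vector<Box> Associate(const std::vector<Box>& base_rects, | ||||
|                            const std::vector<Box>& rects, | ||||
|                            float min_similarity_threshold) { | ||||
|   std::vector<Box> result = base_rects; | ||||
|   for (const Box& candidate : rects) { | ||||
|     const bool overlaps = std::any_of( | ||||
|         result.begin(), result.end(), [&](const Box& kept) { | ||||
|           return IntersectionOverUnion(candidate, kept) > | ||||
|                  min_similarity_threshold; | ||||
|         }); | ||||
|     if (!overlaps) result.push_back(candidate); | ||||
|   } | ||||
|   return result; | ||||
| } | ||||
| ``` | ||||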
|  | @ -27,6 +27,8 @@ namespace mediapipe { | |||
| namespace { | ||||
| 
 | ||||
| using ::mediapipe::NormalizedRect; | ||||
| using ::testing::ElementsAre; | ||||
| using ::testing::EqualsProto; | ||||
| 
 | ||||
| class HandAssociationCalculatorTest : public testing::Test { | ||||
|  protected: | ||||
|  | @ -87,9 +89,9 @@ class HandAssociationCalculatorTest : public testing::Test { | |||
| TEST_F(HandAssociationCalculatorTest, NormRectAssocTest) { | ||||
|   CalculatorRunner runner(ParseTextProtoOrDie<CalculatorGraphConfig::Node>(R"pb( | ||||
|     calculator: "HandAssociationCalculator" | ||||
|     input_stream: "input_vec_0" | ||||
|     input_stream: "input_vec_1" | ||||
|     input_stream: "input_vec_2" | ||||
|     input_stream: "BASE_RECTS:input_vec_0" | ||||
|     input_stream: "RECTS:0:input_vec_1" | ||||
|     input_stream: "RECTS:1:input_vec_2" | ||||
|     output_stream: "output_vec" | ||||
|     options { | ||||
|       [mediapipe.HandAssociationCalculatorOptions.ext] { | ||||
|  | @ -103,20 +105,23 @@ TEST_F(HandAssociationCalculatorTest, NormRectAssocTest) { | |||
|   input_vec_0->push_back(nr_0_); | ||||
|   input_vec_0->push_back(nr_1_); | ||||
|   input_vec_0->push_back(nr_2_); | ||||
|   runner.MutableInputs()->Index(0).packets.push_back( | ||||
|       Adopt(input_vec_0.release()).At(Timestamp(1))); | ||||
|   runner.MutableInputs() | ||||
|       ->Tag("BASE_RECTS") | ||||
|       .packets.push_back(Adopt(input_vec_0.release()).At(Timestamp(1))); | ||||
| 
 | ||||
|   // Input Stream 1: nr_3, nr_4.
 | ||||
|   auto input_vec_1 = std::make_unique<std::vector<NormalizedRect>>(); | ||||
|   input_vec_1->push_back(nr_3_); | ||||
|   input_vec_1->push_back(nr_4_); | ||||
|   runner.MutableInputs()->Index(1).packets.push_back( | ||||
|   auto index_id = runner.MutableInputs()->GetId("RECTS", 0); | ||||
|   runner.MutableInputs()->Get(index_id).packets.push_back( | ||||
|       Adopt(input_vec_1.release()).At(Timestamp(1))); | ||||
| 
 | ||||
|   // Input Stream 2: nr_5.
 | ||||
|   auto input_vec_2 = std::make_unique<std::vector<NormalizedRect>>(); | ||||
|   input_vec_2->push_back(nr_5_); | ||||
|   runner.MutableInputs()->Index(2).packets.push_back( | ||||
|   index_id = runner.MutableInputs()->GetId("RECTS", 1); | ||||
|   runner.MutableInputs()->Get(index_id).packets.push_back( | ||||
|       Adopt(input_vec_2.release()).At(Timestamp(1))); | ||||
| 
 | ||||
|   MP_ASSERT_OK(runner.Run()) << "Calculator execution failed."; | ||||
|  | @ -134,25 +139,18 @@ TEST_F(HandAssociationCalculatorTest, NormRectAssocTest) { | |||
|   EXPECT_EQ(3, assoc_rects.size()); | ||||
| 
 | ||||
|   // Check that IDs are filled in and contents match.
 | ||||
|   EXPECT_EQ(assoc_rects[0].rect_id(), 1); | ||||
|   assoc_rects[0].clear_rect_id(); | ||||
|   EXPECT_THAT(assoc_rects[0], testing::EqualsProto(nr_0_)); | ||||
| 
 | ||||
|   EXPECT_EQ(assoc_rects[1].rect_id(), 2); | ||||
|   assoc_rects[1].clear_rect_id(); | ||||
|   EXPECT_THAT(assoc_rects[1], testing::EqualsProto(nr_1_)); | ||||
| 
 | ||||
|   EXPECT_EQ(assoc_rects[2].rect_id(), 3); | ||||
|   assoc_rects[2].clear_rect_id(); | ||||
|   EXPECT_THAT(assoc_rects[2], testing::EqualsProto(nr_2_)); | ||||
|   nr_0_.set_rect_id(1); | ||||
|   nr_1_.set_rect_id(2); | ||||
|   nr_2_.set_rect_id(3); | ||||
|   EXPECT_THAT(assoc_rects, ElementsAre(EqualsProto(nr_0_), EqualsProto(nr_1_), | ||||
|                                        EqualsProto(nr_2_))); | ||||
| } | ||||
| 
 | ||||
| TEST_F(HandAssociationCalculatorTest, NormRectAssocTestWithTrackedHands) { | ||||
|   CalculatorRunner runner(ParseTextProtoOrDie<CalculatorGraphConfig::Node>(R"pb( | ||||
|     calculator: "HandAssociationCalculator" | ||||
|     input_stream: "input_vec_0" | ||||
|     input_stream: "input_vec_1" | ||||
|     input_stream: "input_vec_2" | ||||
|     input_stream: "BASE_RECTS:input_vec_0" | ||||
|     input_stream: "RECTS:0:input_vec_1" | ||||
|     output_stream: "output_vec" | ||||
|     options { | ||||
|       [mediapipe.HandAssociationCalculatorOptions.ext] { | ||||
|  | @ -169,14 +167,15 @@ TEST_F(HandAssociationCalculatorTest, NormRectAssocTestWithTrackedHands) { | |||
|   input_vec_0->push_back(nr_0_); | ||||
|   nr_1_.set_rect_id(-1); | ||||
|   input_vec_0->push_back(nr_1_); | ||||
|   runner.MutableInputs()->Index(0).packets.push_back( | ||||
|       Adopt(input_vec_0.release()).At(Timestamp(1))); | ||||
|   runner.MutableInputs() | ||||
|       ->Tag("BASE_RECTS") | ||||
|       .packets.push_back(Adopt(input_vec_0.release()).At(Timestamp(1))); | ||||
| 
 | ||||
|   // Input Stream 1: nr_2, nr_3. Newly detected palms.
 | ||||
|   auto input_vec_1 = std::make_unique<std::vector<NormalizedRect>>(); | ||||
|   input_vec_1->push_back(nr_2_); | ||||
|   input_vec_1->push_back(nr_3_); | ||||
|   runner.MutableInputs()->Index(1).packets.push_back( | ||||
|   runner.MutableInputs()->Tag("RECTS").packets.push_back( | ||||
|       Adopt(input_vec_1.release()).At(Timestamp(1))); | ||||
| 
 | ||||
|   MP_ASSERT_OK(runner.Run()) << "Calculator execution failed."; | ||||
|  | @ -192,23 +191,17 @@ TEST_F(HandAssociationCalculatorTest, NormRectAssocTestWithTrackedHands) { | |||
|   EXPECT_EQ(3, assoc_rects.size()); | ||||
| 
 | ||||
|   // Check that IDs are filled in and contents match.
 | ||||
|   EXPECT_EQ(assoc_rects[0].rect_id(), -2); | ||||
|   EXPECT_THAT(assoc_rects[0], testing::EqualsProto(nr_0_)); | ||||
| 
 | ||||
|   EXPECT_EQ(assoc_rects[1].rect_id(), -1); | ||||
|   EXPECT_THAT(assoc_rects[1], testing::EqualsProto(nr_1_)); | ||||
| 
 | ||||
|   EXPECT_EQ(assoc_rects[2].rect_id(), 1); | ||||
|   assoc_rects[2].clear_rect_id(); | ||||
|   EXPECT_THAT(assoc_rects[2], testing::EqualsProto(nr_2_)); | ||||
|   nr_2_.set_rect_id(1); | ||||
|   EXPECT_THAT(assoc_rects, ElementsAre(EqualsProto(nr_0_), EqualsProto(nr_1_), | ||||
|                                        EqualsProto(nr_2_))); | ||||
| } | ||||
| 
 | ||||
| TEST_F(HandAssociationCalculatorTest, NormRectAssocTestReverse) { | ||||
|   CalculatorRunner runner(ParseTextProtoOrDie<CalculatorGraphConfig::Node>(R"pb( | ||||
|     calculator: "HandAssociationCalculator" | ||||
|     input_stream: "input_vec_0" | ||||
|     input_stream: "input_vec_1" | ||||
|     input_stream: "input_vec_2" | ||||
|     input_stream: "BASE_RECTS:input_vec_0" | ||||
|     input_stream: "RECTS:0:input_vec_1" | ||||
|     input_stream: "RECTS:1:input_vec_2" | ||||
|     output_stream: "output_vec" | ||||
|     options { | ||||
|       [mediapipe.HandAssociationCalculatorOptions.ext] { | ||||
|  | @ -220,14 +213,16 @@ TEST_F(HandAssociationCalculatorTest, NormRectAssocTestReverse) { | |||
|   // Input Stream 0: nr_5.
 | ||||
|   auto input_vec_0 = std::make_unique<std::vector<NormalizedRect>>(); | ||||
|   input_vec_0->push_back(nr_5_); | ||||
|   runner.MutableInputs()->Index(0).packets.push_back( | ||||
|       Adopt(input_vec_0.release()).At(Timestamp(1))); | ||||
|   runner.MutableInputs() | ||||
|       ->Tag("BASE_RECTS") | ||||
|       .packets.push_back(Adopt(input_vec_0.release()).At(Timestamp(1))); | ||||
| 
 | ||||
|   // Input Stream 1: nr_4, nr_3
 | ||||
|   auto input_vec_1 = std::make_unique<std::vector<NormalizedRect>>(); | ||||
|   input_vec_1->push_back(nr_4_); | ||||
|   input_vec_1->push_back(nr_3_); | ||||
|   runner.MutableInputs()->Index(1).packets.push_back( | ||||
|   auto index_id = runner.MutableInputs()->GetId("RECTS", 0); | ||||
|   runner.MutableInputs()->Get(index_id).packets.push_back( | ||||
|       Adopt(input_vec_1.release()).At(Timestamp(1))); | ||||
| 
 | ||||
|   // Input Stream 2: nr_2, nr_1, nr_0.
 | ||||
|  | @ -235,7 +230,8 @@ TEST_F(HandAssociationCalculatorTest, NormRectAssocTestReverse) { | |||
|   input_vec_2->push_back(nr_2_); | ||||
|   input_vec_2->push_back(nr_1_); | ||||
|   input_vec_2->push_back(nr_0_); | ||||
|   runner.MutableInputs()->Index(2).packets.push_back( | ||||
|   index_id = runner.MutableInputs()->GetId("RECTS", 1); | ||||
|   runner.MutableInputs()->Get(index_id).packets.push_back( | ||||
|       Adopt(input_vec_2.release()).At(Timestamp(1))); | ||||
| 
 | ||||
|   MP_ASSERT_OK(runner.Run()) << "Calculator execution failed."; | ||||
|  | @ -253,23 +249,78 @@ TEST_F(HandAssociationCalculatorTest, NormRectAssocTestReverse) { | |||
|   EXPECT_EQ(3, assoc_rects.size()); | ||||
| 
 | ||||
|   // Outputs are in same order as inputs, and IDs are filled in.
 | ||||
|   EXPECT_EQ(assoc_rects[0].rect_id(), 1); | ||||
|   assoc_rects[0].clear_rect_id(); | ||||
|   EXPECT_THAT(assoc_rects[0], testing::EqualsProto(nr_5_)); | ||||
|   nr_5_.set_rect_id(1); | ||||
|   nr_4_.set_rect_id(2); | ||||
|   nr_0_.set_rect_id(3); | ||||
|   EXPECT_THAT(assoc_rects, ElementsAre(EqualsProto(nr_5_), EqualsProto(nr_4_), | ||||
|                                        EqualsProto(nr_0_))); | ||||
| } | ||||
| 
 | ||||
|   EXPECT_EQ(assoc_rects[1].rect_id(), 2); | ||||
|   assoc_rects[1].clear_rect_id(); | ||||
|   EXPECT_THAT(assoc_rects[1], testing::EqualsProto(nr_4_)); | ||||
| TEST_F(HandAssociationCalculatorTest, NormRectAssocTestPreservesBaseRects) { | ||||
|   CalculatorRunner runner(ParseTextProtoOrDie<CalculatorGraphConfig::Node>(R"pb( | ||||
|     calculator: "HandAssociationCalculator" | ||||
|     input_stream: "BASE_RECTS:input_vec_0" | ||||
|     input_stream: "RECTS:0:input_vec_1" | ||||
|     input_stream: "RECTS:1:input_vec_2" | ||||
|     output_stream: "output_vec" | ||||
|     options { | ||||
|       [mediapipe.HandAssociationCalculatorOptions.ext] { | ||||
|         min_similarity_threshold: 0.1 | ||||
|       } | ||||
|     } | ||||
|   )pb")); | ||||
| 
 | ||||
|   EXPECT_EQ(assoc_rects[2].rect_id(), 3); | ||||
|   assoc_rects[2].clear_rect_id(); | ||||
|   EXPECT_THAT(assoc_rects[2], testing::EqualsProto(nr_0_)); | ||||
|   // Input Stream 0: nr_5, nr_3, nr_1.
 | ||||
|   auto input_vec_0 = std::make_unique<std::vector<NormalizedRect>>(); | ||||
|   input_vec_0->push_back(nr_5_); | ||||
|   input_vec_0->push_back(nr_3_); | ||||
|   input_vec_0->push_back(nr_1_); | ||||
|   runner.MutableInputs() | ||||
|       ->Tag("BASE_RECTS") | ||||
|       .packets.push_back(Adopt(input_vec_0.release()).At(Timestamp(1))); | ||||
| 
 | ||||
|   // Input Stream 1: nr_4.
 | ||||
|   auto input_vec_1 = std::make_unique<std::vector<NormalizedRect>>(); | ||||
|   input_vec_1->push_back(nr_4_); | ||||
|   auto index_id = runner.MutableInputs()->GetId("RECTS", 0); | ||||
|   runner.MutableInputs()->Get(index_id).packets.push_back( | ||||
|       Adopt(input_vec_1.release()).At(Timestamp(1))); | ||||
| 
 | ||||
|   // Input Stream 2: nr_2, nr_0.
 | ||||
|   auto input_vec_2 = std::make_unique<std::vector<NormalizedRect>>(); | ||||
|   input_vec_2->push_back(nr_2_); | ||||
|   input_vec_2->push_back(nr_0_); | ||||
|   index_id = runner.MutableInputs()->GetId("RECTS", 1); | ||||
|   runner.MutableInputs()->Get(index_id).packets.push_back( | ||||
|       Adopt(input_vec_2.release()).At(Timestamp(1))); | ||||
| 
 | ||||
|   MP_ASSERT_OK(runner.Run()) << "Calculator execution failed."; | ||||
|   const std::vector<Packet>& output = runner.Outputs().Index(0).packets; | ||||
|   EXPECT_EQ(1, output.size()); | ||||
|   auto assoc_rects = output[0].Get<std::vector<NormalizedRect>>(); | ||||
| 
 | ||||
|   // Rectangles are added in the following sequence:
 | ||||
|   // nr_5 is added because it is in BASE_RECTS input stream.
 | ||||
|   // nr_3 is added because it is in BASE_RECTS input stream.
 | ||||
|   // nr_1 is added because it is in BASE_RECTS input stream.
 | ||||
|   // nr_4 is added because it does not overlap with nr_5.
 | ||||
|   // nr_2 is NOT added because it overlaps with nr_4.
 | ||||
|   // nr_0 is NOT added because it overlaps with nr_3.
 | ||||
|   EXPECT_EQ(4, assoc_rects.size()); | ||||
| 
 | ||||
|   // Outputs are in same order as inputs, and IDs are filled in.
 | ||||
|   nr_5_.set_rect_id(1); | ||||
|   nr_3_.set_rect_id(2); | ||||
|   nr_1_.set_rect_id(3); | ||||
|   nr_4_.set_rect_id(4); | ||||
|   EXPECT_THAT(assoc_rects, ElementsAre(EqualsProto(nr_5_), EqualsProto(nr_3_), | ||||
|                                        EqualsProto(nr_1_), EqualsProto(nr_4_))); | ||||
| } | ||||
| 
 | ||||
| TEST_F(HandAssociationCalculatorTest, NormRectAssocSingleInputStream) { | ||||
|   CalculatorRunner runner(ParseTextProtoOrDie<CalculatorGraphConfig::Node>(R"pb( | ||||
|     calculator: "HandAssociationCalculator" | ||||
|     input_stream: "input_vec" | ||||
|     input_stream: "BASE_RECTS:input_vec" | ||||
|     output_stream: "output_vec" | ||||
|     options { | ||||
|       [mediapipe.HandAssociationCalculatorOptions.ext] { | ||||
|  | @ -282,8 +333,9 @@ TEST_F(HandAssociationCalculatorTest, NormRectAssocSingleInputStream) { | |||
|   auto input_vec = std::make_unique<std::vector<NormalizedRect>>(); | ||||
|   input_vec->push_back(nr_3_); | ||||
|   input_vec->push_back(nr_5_); | ||||
|   runner.MutableInputs()->Index(0).packets.push_back( | ||||
|       Adopt(input_vec.release()).At(Timestamp(1))); | ||||
|   runner.MutableInputs() | ||||
|       ->Tag("BASE_RECTS") | ||||
|       .packets.push_back(Adopt(input_vec.release()).At(Timestamp(1))); | ||||
| 
 | ||||
|   MP_ASSERT_OK(runner.Run()) << "Calculator execution failed."; | ||||
|   const std::vector<Packet>& output = runner.Outputs().Index(0).packets; | ||||
|  | @ -292,12 +344,12 @@ TEST_F(HandAssociationCalculatorTest, NormRectAssocSingleInputStream) { | |||
| 
 | ||||
|   // Rectangles are added in the following sequence:
 | ||||
|   // nr_3 is added 1st.
 | ||||
|   // nr_5 is NOT added because it overlaps with nr_3.
 | ||||
|   EXPECT_EQ(1, assoc_rects.size()); | ||||
|   // nr_5 is added 2nd because rects in BASE_RECTS are always preserved.
 | ||||
|   EXPECT_EQ(2, assoc_rects.size()); | ||||
| 
 | ||||
|   EXPECT_EQ(assoc_rects[0].rect_id(), 1); | ||||
|   assoc_rects[0].clear_rect_id(); | ||||
|   EXPECT_THAT(assoc_rects[0], testing::EqualsProto(nr_3_)); | ||||
|   nr_3_.set_rect_id(1); | ||||
|   nr_5_.set_rect_id(2); | ||||
|   EXPECT_THAT(assoc_rects, ElementsAre(EqualsProto(nr_3_), EqualsProto(nr_5_))); | ||||
| } | ||||
| 
 | ||||
| }  // namespace
 | ||||
|  |  | |||
|  | @ -318,9 +318,9 @@ class HandLandmarkerGraph : public core::ModelTaskGraph { | |||
|           .set_min_similarity_threshold( | ||||
|               tasks_options.min_tracking_confidence()); | ||||
|       prev_hand_rects_from_landmarks >> | ||||
|           hand_association[Input<std::vector<NormalizedRect>>::Multiple("")][0]; | ||||
|           hand_association[Input<std::vector<NormalizedRect>>("BASE_RECTS")]; | ||||
|       hand_rects_from_hand_detector >> | ||||
|           hand_association[Input<std::vector<NormalizedRect>>::Multiple("")][1]; | ||||
|           hand_association[Input<std::vector<NormalizedRect>>("RECTS")]; | ||||
|       auto hand_rects = hand_association.Out(""); | ||||
|       hand_rects >> clip_hand_rects.In(""); | ||||
|     } else { | ||||
|  |  | |||
|  | @ -34,18 +34,19 @@ _AUDIO_TASKS_JAVA_PROTO_LITE_TARGETS = [ | |||
| ] | ||||
| 
 | ||||
| _VISION_TASKS_JAVA_PROTO_LITE_TARGETS = [ | ||||
|     "//mediapipe/tasks/cc/vision/object_detector/proto:object_detector_options_java_proto_lite", | ||||
|     "//mediapipe/tasks/cc/vision/image_classifier/proto:image_classifier_graph_options_java_proto_lite", | ||||
|     "//mediapipe/tasks/cc/vision/face_detector/proto:face_detector_graph_options_java_proto_lite", | ||||
|     "//mediapipe/tasks/cc/vision/gesture_recognizer/proto:gesture_classifier_graph_options_java_proto_lite", | ||||
|     "//mediapipe/tasks/cc/vision/gesture_recognizer/proto:gesture_embedder_graph_options_java_proto_lite", | ||||
|     "//mediapipe/tasks/cc/vision/gesture_recognizer/proto:gesture_recognizer_graph_options_java_proto_lite", | ||||
|     "//mediapipe/tasks/cc/vision/gesture_recognizer/proto:hand_gesture_recognizer_graph_options_java_proto_lite", | ||||
|     "//mediapipe/tasks/cc/vision/image_classifier/proto:image_classifier_graph_options_java_proto_lite", | ||||
|     "//mediapipe/tasks/cc/vision/image_embedder/proto:image_embedder_graph_options_java_proto_lite", | ||||
|     "//mediapipe/tasks/cc/vision/image_segmenter/proto:image_segmenter_graph_options_java_proto_lite", | ||||
|     "//mediapipe/tasks/cc/vision/image_segmenter/proto:segmenter_options_java_proto_lite", | ||||
|     "//mediapipe/tasks/cc/vision/hand_detector/proto:hand_detector_graph_options_java_proto_lite", | ||||
|     "//mediapipe/tasks/cc/vision/hand_landmarker/proto:hand_landmarker_graph_options_java_proto_lite", | ||||
|     "//mediapipe/tasks/cc/vision/hand_landmarker/proto:hand_landmarks_detector_graph_options_java_proto_lite", | ||||
|     "//mediapipe/tasks/cc/vision/object_detector/proto:object_detector_options_java_proto_lite", | ||||
| ] | ||||
| 
 | ||||
| _TEXT_TASKS_JAVA_PROTO_LITE_TARGETS = [ | ||||
|  |  | |||
|  | @ -36,11 +36,9 @@ export abstract class AudioTaskRunner<T> extends TaskRunner { | |||
| 
 | ||||
|   /** Sends a single audio clip to the graph and awaits results. */ | ||||
|   protected processAudioClip(audioData: Float32Array, sampleRate?: number): T { | ||||
|     // Increment the timestamp by 1 millisecond to guarantee that we send
 | ||||
|     // monotonically increasing timestamps to the graph.
 | ||||
|     const syntheticTimestamp = this.getLatestOutputTimestamp() + 1; | ||||
|     return this.process( | ||||
|         audioData, sampleRate ?? this.defaultSampleRate, syntheticTimestamp); | ||||
|         audioData, sampleRate ?? this.defaultSampleRate, | ||||
|         this.getSyntheticTimestamp()); | ||||
|   } | ||||
| } | ||||
| 
 | ||||
|  |  | |||
|  | @ -15,6 +15,11 @@ mediapipe_ts_declaration( | |||
|     deps = [":category"], | ||||
| ) | ||||
| 
 | ||||
| mediapipe_ts_declaration( | ||||
|     name = "keypoint", | ||||
|     srcs = ["keypoint.d.ts"], | ||||
| ) | ||||
| 
 | ||||
| mediapipe_ts_declaration( | ||||
|     name = "landmark", | ||||
|     srcs = ["landmark.d.ts"], | ||||
|  |  | |||
							
								
								
									
mediapipe/tasks/web/components/containers/keypoint.d.ts (new file, vendored, 33 lines)
									
								
							
							
						
						|  | @ -0,0 +1,33 @@ | |||
| /** | ||||
|  * Copyright 2023 The MediaPipe Authors. All Rights Reserved. | ||||
|  * | ||||
|  * Licensed under the Apache License, Version 2.0 (the "License"); | ||||
|  * you may not use this file except in compliance with the License. | ||||
|  * You may obtain a copy of the License at | ||||
|  * | ||||
|  *     http://www.apache.org/licenses/LICENSE-2.0
 | ||||
|  * | ||||
|  * Unless required by applicable law or agreed to in writing, software | ||||
|  * distributed under the License is distributed on an "AS IS" BASIS, | ||||
|  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||
|  * See the License for the specific language governing permissions and | ||||
|  * limitations under the License. | ||||
|  */ | ||||
| 
 | ||||
| /** | ||||
|  * A keypoint, defined by the coordinates (x, y), normalized by the image | ||||
|  * dimensions. | ||||
|  */ | ||||
| export declare interface NormalizedKeypoint { | ||||
|   /** X in normalized image coordinates. */ | ||||
|   x: number; | ||||
| 
 | ||||
|   /** Y in normalized image coordinates. */ | ||||
|   y: number; | ||||
| 
 | ||||
|   /** Optional label of the keypoint. */ | ||||
|   label?: string; | ||||
| 
 | ||||
|   /** Optional score of the keypoint. */ | ||||
|   score?: number; | ||||
| } | ||||
|  | @ -175,9 +175,13 @@ export abstract class TaskRunner { | |||
|         Math.max(this.latestOutputTimestamp, timestamp); | ||||
|   } | ||||
| 
 | ||||
|   /** Returns the latest output timestamp. */ | ||||
|   protected getLatestOutputTimestamp() { | ||||
|     return this.latestOutputTimestamp; | ||||
|   /** | ||||
|    * Gets a synthetic timestamp in ms that can be used for the next packet | ||||
|    * sent to the graph. The timestamp is one millisecond past the last | ||||
|    * timestamp received from the graph, which guarantees that timestamps | ||||
|    * sent to the graph are monotonically increasing. | ||||
|    */ | ||||
|   protected getSyntheticTimestamp(): number { | ||||
|     return this.latestOutputTimestamp + 1; | ||||
|   } | ||||
| 
 | ||||
|   /** Throws the error from the error listener if an error was raised. */ | ||||
|  |  | |||
|  | @ -131,11 +131,9 @@ export class TextClassifier extends TaskRunner { | |||
|    * @return The classification result of the text | ||||
|    */ | ||||
|   classify(text: string): TextClassifierResult { | ||||
|     // Increment the timestamp by 1 millisecond to guarantee that we send
 | ||||
|     // monotonically increasing timestamps to the graph.
 | ||||
|     const syntheticTimestamp = this.getLatestOutputTimestamp() + 1; | ||||
|     this.classificationResult = {classifications: []}; | ||||
|     this.graphRunner.addStringToStream(text, INPUT_STREAM, syntheticTimestamp); | ||||
|     this.graphRunner.addStringToStream( | ||||
|         text, INPUT_STREAM, this.getSyntheticTimestamp()); | ||||
|     this.finishProcessing(); | ||||
|     return this.classificationResult; | ||||
|   } | ||||
|  |  | |||
|  | @ -135,10 +135,8 @@ export class TextEmbedder extends TaskRunner { | |||
|    * @return The embedding results of the text | ||||
|    */ | ||||
|   embed(text: string): TextEmbedderResult { | ||||
|     // Increment the timestamp by 1 millisecond to guarantee that we send
 | ||||
|     // monotonically increasing timestamps to the graph.
 | ||||
|     const syntheticTimestamp = this.getLatestOutputTimestamp() + 1; | ||||
|     this.graphRunner.addStringToStream(text, INPUT_STREAM, syntheticTimestamp); | ||||
|     this.graphRunner.addStringToStream( | ||||
|         text, INPUT_STREAM, this.getSyntheticTimestamp()); | ||||
|     this.finishProcessing(); | ||||
|     return this.embeddingResult; | ||||
|   } | ||||
|  |  | |||
|  | @ -2,23 +2,42 @@ | |||
| 
 | ||||
| This package contains the vision tasks for MediaPipe. | ||||
| 
 | ||||
| ## Object Detection | ||||
| ## Gesture Recognition | ||||
| 
 | ||||
| The MediaPipe Object Detector task lets you detect the presence and location of | ||||
| multiple classes of objects within images or videos. | ||||
| The MediaPipe Gesture Recognizer task lets you recognize hand gestures in real | ||||
| time, and provides the recognized hand gesture results along with the landmarks | ||||
| of the detected hands. You can use this task to recognize specific hand gestures | ||||
| from a user, and invoke application features that correspond to those gestures. | ||||
| 
 | ||||
| ``` | ||||
| const vision = await FilesetResolver.forVisionTasks( | ||||
|     "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm" | ||||
| ); | ||||
| const objectDetector = await ObjectDetector.createFromModelPath(vision, | ||||
|     "https://storage.googleapis.com/mediapipe-tasks/object_detector/efficientdet_lite0_uint8.tflite" | ||||
| const gestureRecognizer = await GestureRecognizer.createFromModelPath(vision, | ||||
|     "https://storage.googleapis.com/mediapipe-tasks/gesture_recognizer/gesture_recognizer.task" | ||||
| ); | ||||
| const image = document.getElementById("image") as HTMLImageElement; | ||||
| const detections = objectDetector.detect(image); | ||||
| const recognitions = gestureRecognizer.recognize(image); | ||||
| ``` | ||||
| 
 | ||||
| For more information, refer to the [Object Detector](https://developers.google.com/mediapipe/solutions/vision/object_detector/web_js) documentation. | ||||
| ## Hand Landmark Detection | ||||
| 
 | ||||
| The MediaPipe Hand Landmarker task lets you detect the landmarks of the hands in | ||||
| an image. You can use this Task to localize key points of the hands and render | ||||
| visual effects over the hands. | ||||
| 
 | ||||
| ``` | ||||
| const vision = await FilesetResolver.forVisionTasks( | ||||
|     "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm" | ||||
| ); | ||||
| const handLandmarker = await HandLandmarker.createFromModelPath(vision, | ||||
|     "https://storage.googleapis.com/mediapipe-tasks/hand_landmarker/hand_landmarker.task" | ||||
| ); | ||||
| const image = document.getElementById("image") as HTMLImageElement; | ||||
| const landmarks = handLandmarker.detect(image); | ||||
| ``` | ||||
| 
 | ||||
| For more information, refer to the [Hand Landmark Detection](https://developers.google.com/mediapipe/solutions/vision/hand_landmarker/web_js) documentation. | ||||
| 
 | ||||
| ## Image Classification | ||||
| 
 | ||||
|  | @ -56,40 +75,21 @@ imageSegmenter.segment(image, (masks, width, height) => { | |||
| }); | ||||
| ``` | ||||
| 
 | ||||
| ## Gesture Recognition | ||||
| ## Object Detection | ||||
| 
 | ||||
| The MediaPipe Gesture Recognizer task lets you recognize hand gestures in real | ||||
| time, and provides the recognized hand gesture results along with the landmarks | ||||
| of the detected hands. You can use this task to recognize specific hand gestures | ||||
| from a user, and invoke application features that correspond to those gestures. | ||||
| The MediaPipe Object Detector task lets you detect the presence and location of | ||||
| multiple classes of objects within images or videos. | ||||
| 
 | ||||
| ``` | ||||
| const vision = await FilesetResolver.forVisionTasks( | ||||
|     "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm" | ||||
| ); | ||||
| const gestureRecognizer = await GestureRecognizer.createFromModelPath(vision, | ||||
|     "https://storage.googleapis.com/mediapipe-tasks/gesture_recognizer/gesture_recognizer.task" | ||||
| const objectDetector = await ObjectDetector.createFromModelPath(vision, | ||||
|     "https://storage.googleapis.com/mediapipe-tasks/object_detector/efficientdet_lite0_uint8.tflite" | ||||
| ); | ||||
| const image = document.getElementById("image") as HTMLImageElement; | ||||
| const recognitions = gestureRecognizer.recognize(image); | ||||
| const detections = objectDetector.detect(image); | ||||
| ``` | ||||
| 
 | ||||
| ## Handlandmark Detection | ||||
| 
 | ||||
| The MediaPipe Hand Landmarker task lets you detect the landmarks of the hands in | ||||
| an image. You can use this Task to localize key points of the hands and render | ||||
| visual effects over the hands. | ||||
| 
 | ||||
| ``` | ||||
| const vision = await FilesetResolver.forVisionTasks( | ||||
|     "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm" | ||||
| ); | ||||
| const handLandmarker = await HandLandmarker.createFromModelPath(vision, | ||||
|     "https://storage.googleapis.com/mediapipe-tasks/hand_landmarker/hand_landmarker.task" | ||||
| ); | ||||
| const image = document.getElementById("image") as HTMLImageElement; | ||||
| const landmarks = handLandmarker.detect(image); | ||||
| ``` | ||||
| 
 | ||||
| For more information, refer to the [Handlandmark Detection](https://developers.google.com/mediapipe/solutions/vision/hand_landmarker/web_js) documentation. | ||||
| For more information, refer to the [Object Detector](https://developers.google.com/mediapipe/solutions/vision/object_detector/web_js) documentation. | ||||
| 
 | ||||
|  |  | |||
|  | @ -21,6 +21,14 @@ mediapipe_ts_declaration( | |||
|     ], | ||||
| ) | ||||
| 
 | ||||
| mediapipe_ts_declaration( | ||||
|     name = "types", | ||||
|     srcs = ["types.d.ts"], | ||||
|     deps = [ | ||||
|         "//mediapipe/tasks/web/components/containers:keypoint", | ||||
|     ], | ||||
| ) | ||||
| 
 | ||||
| mediapipe_ts_library( | ||||
|     name = "vision_task_runner", | ||||
|     srcs = ["vision_task_runner.ts"], | ||||
|  | @ -51,6 +59,11 @@ mediapipe_ts_library( | |||
|     ], | ||||
| ) | ||||
| 
 | ||||
| mediapipe_ts_library( | ||||
|     name = "render_utils", | ||||
|     srcs = ["render_utils.ts"], | ||||
| ) | ||||
| 
 | ||||
| jasmine_node_test( | ||||
|     name = "vision_task_runner_test", | ||||
|     deps = [":vision_task_runner_test_lib"], | ||||
|  |  | |||
							
								
								
									
mediapipe/tasks/web/vision/core/render_utils.ts (new file, 78 lines)
									
								
							
							
						
						|  | @ -0,0 +1,78 @@ | |||
| /** @fileoverview Utility functions used in the vision demos. */ | ||||
| 
 | ||||
| /** | ||||
|  * Copyright 2023 The MediaPipe Authors. All Rights Reserved. | ||||
|  * | ||||
|  * Licensed under the Apache License, Version 2.0 (the "License"); | ||||
|  * you may not use this file except in compliance with the License. | ||||
|  * You may obtain a copy of the License at | ||||
|  * | ||||
|  *     http://www.apache.org/licenses/LICENSE-2.0
 | ||||
|  * | ||||
|  * Unless required by applicable law or agreed to in writing, software | ||||
|  * distributed under the License is distributed on an "AS IS" BASIS, | ||||
|  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||
|  * See the License for the specific language governing permissions and | ||||
|  * limitations under the License. | ||||
|  */ | ||||
| 
 | ||||
| // Pre-baked color table for a maximum of 12 classes.
 | ||||
| const CM_ALPHA = 128; | ||||
| const COLOR_MAP = [ | ||||
|   [0, 0, 0, CM_ALPHA],        // class 0 is BG = transparent
 | ||||
|   [255, 0, 0, CM_ALPHA],      // class 1 is red
 | ||||
|   [0, 255, 0, CM_ALPHA],      // class 2 is light green
 | ||||
|   [0, 0, 255, CM_ALPHA],      // class 3 is blue
 | ||||
|   [255, 255, 0, CM_ALPHA],    // class 4 is yellow
 | ||||
|   [255, 0, 255, CM_ALPHA],    // class 5 is light purple / magenta
 | ||||
|   [0, 255, 255, CM_ALPHA],    // class 6 is light blue / aqua
 | ||||
|   [128, 128, 128, CM_ALPHA],  // class 7 is gray
 | ||||
|   [255, 128, 0, CM_ALPHA],    // class 8 is orange
 | ||||
|   [128, 0, 255, CM_ALPHA],    // class 9 is dark purple
 | ||||
|   [0, 128, 0, CM_ALPHA],      // class 10 is dark green
 | ||||
|   [255, 255, 255, CM_ALPHA]   // class 11 is white; could do black instead?
 | ||||
| ]; | ||||
| 
 | ||||
| 
 | ||||
| /** Helper function to draw a confidence mask */ | ||||
| export function drawConfidenceMask( | ||||
|     ctx: CanvasRenderingContext2D, image: Float32Array|Uint8Array, | ||||
|     width: number, height: number): void { | ||||
|   const uint8ClampedArray = new Uint8ClampedArray(width * height * 4); | ||||
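|   // Dark-red overlay; the per-pixel confidence value drives the alpha channel.
 | ||||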
|   for (let i = 0; i < image.length; i++) { | ||||
|     uint8ClampedArray[4 * i] = 128; | ||||
|     uint8ClampedArray[4 * i + 1] = 0; | ||||
|     uint8ClampedArray[4 * i + 2] = 0; | ||||
|     uint8ClampedArray[4 * i + 3] = image[i] * 255; | ||||
|   } | ||||
|   ctx.putImageData(new ImageData(uint8ClampedArray, width, height), 0, 0); | ||||
| } | ||||
| 
 | ||||
| /** | ||||
|  * Helper function to draw a category mask. For GPU, we only have F32Arrays | ||||
|  * for now. | ||||
|  */ | ||||
| export function drawCategoryMask( | ||||
|     ctx: CanvasRenderingContext2D, image: Float32Array|Uint8Array, | ||||
|     width: number, height: number): void { | ||||
|   const uint8ClampedArray = new Uint8ClampedArray(width * height * 4); | ||||
|   const isFloatArray = image instanceof Float32Array; | ||||
|   for (let i = 0; i < image.length; i++) { | ||||
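|     // Float-valued (GPU) masks encode the class index as index / 255; recover
 | ||||
|     // the integer index before the color table lookup.
 | ||||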
|     const colorIndex = isFloatArray ? Math.round(image[i] * 255) : image[i]; | ||||
|     const color = COLOR_MAP[colorIndex]; | ||||
| 
 | ||||
|     // When we're given a confidence mask by accident, we just log and return.
 | ||||
|     // TODO: We should fix this.
 | ||||
|     if (!color) { | ||||
|       console.warn('No color for ', colorIndex); | ||||
|       return; | ||||
|     } | ||||
| 
 | ||||
|     uint8ClampedArray[4 * i] = color[0]; | ||||
|     uint8ClampedArray[4 * i + 1] = color[1]; | ||||
|     uint8ClampedArray[4 * i + 2] = color[2]; | ||||
|     uint8ClampedArray[4 * i + 3] = color[3]; | ||||
|   } | ||||
|   ctx.putImageData(new ImageData(uint8ClampedArray, width, height), 0, 0); | ||||
| } | ||||
							
								
								
									
mediapipe/tasks/web/vision/core/types.d.ts (new file, vendored, 42 lines)
									
								
							
							
						
						|  | @ -0,0 +1,42 @@ | |||
| /** | ||||
|  * Copyright 2023 The MediaPipe Authors. All Rights Reserved. | ||||
|  * | ||||
|  * Licensed under the Apache License, Version 2.0 (the "License"); | ||||
|  * you may not use this file except in compliance with the License. | ||||
|  * You may obtain a copy of the License at | ||||
|  * | ||||
|  *     http://www.apache.org/licenses/LICENSE-2.0 | ||||
|  * | ||||
|  * Unless required by applicable law or agreed to in writing, software | ||||
|  * distributed under the License is distributed on an "AS IS" BASIS, | ||||
|  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||||
|  * See the License for the specific language governing permissions and | ||||
|  * limitations under the License. | ||||
|  */ | ||||
|  | ||||
| import {NormalizedKeypoint} from '../../../../tasks/web/components/containers/keypoint'; | ||||
|  | ||||
| /** | ||||
|  * The segmentation tasks return the segmentation result as a Uint8Array | ||||
|  * (when the default mode of `CATEGORY_MASK` is used) or as a Float32Array (for | ||||
|  * output type `CONFIDENCE_MASK`). The `WebGLTexture` output type is reserved | ||||
|  * for future usage. | ||||
|  */ | ||||
| export type SegmentationMask = Uint8Array|Float32Array|WebGLTexture; | ||||
|  | ||||
| /** | ||||
|  * A callback that receives the computed masks from the segmentation tasks. The | ||||
|  * callback either receives a single element array with a category mask (as a | ||||
|  * `[Uint8Array]`) or multiple confidence masks (as a `Float32Array[]`). | ||||
|  * The returned data is only valid for the duration of the callback. If | ||||
|  * asynchronous processing is needed, all data needs to be copied before the | ||||
|  * callback returns. | ||||
|  */ | ||||
| export type SegmentationMaskCallback = | ||||
|     (masks: SegmentationMask[], width: number, height: number) => void; | ||||
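Because the mask buffers are only valid while the callback runs, any asynchronous consumer must work on copies. A minimal sketch, with the downstream `consume` function purely hypothetical:

```ts
declare function consume(
    masks: SegmentationMask[], width: number, height: number): void;

const onMasks: SegmentationMaskCallback = (masks, width, height) => {
  // Copy each typed array before the callback returns; the underlying
  // buffers may be reused by the graph afterwards.
  const copies = masks.map(
      mask => mask instanceof WebGLTexture ? mask : mask.slice());
  setTimeout(() => consume(copies, width, height));
};
```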
|  | ||||
| /** A Region-Of-Interest (ROI) to represent a region within an image. */ | ||||
| export declare interface RegionOfInterest { | ||||
|   /** The ROI in keypoint format. */ | ||||
|   keypoint: NormalizedKeypoint; | ||||
| } | ||||
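For illustration, an ROI at the image center might look as follows, assuming `NormalizedKeypoint` carries normalized `x`/`y` coordinates in [0, 1]:

```ts
// (0.5, 0.5) addresses the center of the image in normalized coordinates.
const roi: RegionOfInterest = {keypoint: {x: 0.5, y: 0.5}};
```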
|  | @ -74,11 +74,7 @@ export abstract class VisionTaskRunner extends TaskRunner { | |||
|           'Task is not initialized with image mode. ' + | ||||
|           '\'runningMode\' must be set to \'IMAGE\'.'); | ||||
|     } | ||||
|  | ||||
|     // Increment the timestamp by 1 millisecond to guarantee that we send | ||||
|     // monotonically increasing timestamps to the graph. | ||||
|     const syntheticTimestamp = this.getLatestOutputTimestamp() + 1; | ||||
|     this.process(image, imageProcessingOptions, syntheticTimestamp); | ||||
|     this.process(image, imageProcessingOptions, this.getSyntheticTimestamp()); | ||||
|   } | ||||
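The inline bookkeeping moves into a shared helper that this hunk doesn't show. Judging by the comment being removed, it presumably amounts to the following sketch; its placement on `VisionTaskRunner` and its visibility are assumptions:

```ts
/**
 * Assumed shape of the shared helper: advance the latest output timestamp
 * by 1 ms so the graph always receives monotonically increasing timestamps.
 */
protected getSyntheticTimestamp(): number {
  return this.getLatestOutputTimestamp() + 1;
}
```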
|  | ||||
|   /** Sends a single video frame to the graph and awaits results. */ | ||||
|  |  | |||
|  | @ -19,8 +19,8 @@ mediapipe_ts_library( | |||
|         "//mediapipe/tasks/cc/vision/image_segmenter/proto:segmenter_options_jspb_proto", | ||||
|         "//mediapipe/tasks/web/core", | ||||
|         "//mediapipe/tasks/web/vision/core:image_processing_options", | ||||
|         "//mediapipe/tasks/web/vision/core:types", | ||||
|         "//mediapipe/tasks/web/vision/core:vision_task_runner", | ||||
|         "//mediapipe/web/graph_runner:graph_runner_image_lib_ts", | ||||
|         "//mediapipe/web/graph_runner:graph_runner_ts", | ||||
|     ], | ||||
| ) | ||||
|  |  | |||
|  | @ -21,6 +21,7 @@ import {ImageSegmenterGraphOptions as ImageSegmenterGraphOptionsProto} from '../ | |||
| import {SegmenterOptions as SegmenterOptionsProto} from '../../../../tasks/cc/vision/image_segmenter/proto/segmenter_options_pb'; | ||||
| import {WasmFileset} from '../../../../tasks/web/core/wasm_fileset'; | ||||
| import {ImageProcessingOptions} from '../../../../tasks/web/vision/core/image_processing_options'; | ||||
| import {SegmentationMask, SegmentationMaskCallback} from '../../../../tasks/web/vision/core/types'; | ||||
| import {VisionGraphRunner, VisionTaskRunner} from '../../../../tasks/web/vision/core/vision_task_runner'; | ||||
| import {ImageSource, WasmModule} from '../../../../web/graph_runner/graph_runner'; | ||||
| // Placeholder for internal dependency on trusted resource url | ||||
|  | @ -28,27 +29,9 @@ import {ImageSource, WasmModule} from '../../../../web/graph_runner/graph_runner | |||
| import {ImageSegmenterOptions} from './image_segmenter_options'; | ||||
|  | ||||
| export * from './image_segmenter_options'; | ||||
| export {SegmentationMask, SegmentationMaskCallback}; | ||||
| export {ImageSource};  // Used in the public API | ||||
|  | ||||
| /** | ||||
|  * The ImageSegmenter returns the segmentation result as a Uint8Array (when | ||||
|  * the default mode of `CATEGORY_MASK` is used) or as a Float32Array (for | ||||
|  * output type `CONFIDENCE_MASK`). The `WebGLTexture` output type is reserved | ||||
|  * for future usage. | ||||
|  */ | ||||
| export type SegmentationMask = Uint8Array|Float32Array|WebGLTexture; | ||||
|  | ||||
| /** | ||||
|  * A callback that receives the computed masks from the image segmenter. The | ||||
|  * callback either receives a single element array with a category mask (as a | ||||
|  * `[Uint8Array]`) or multiple confidence masks (as a `Float32Array[]`). | ||||
|  * The returned data is only valid for the duration of the callback. If | ||||
|  * asynchronous processing is needed, all data needs to be copied before the | ||||
|  * callback returns. | ||||
|  */ | ||||
| export type SegmentationMaskCallback = | ||||
|     (masks: SegmentationMask[], width: number, height: number) => void; | ||||
|  | ||||
| const IMAGE_STREAM = 'image_in'; | ||||
| const NORM_RECT_STREAM = 'norm_rect'; | ||||
| const GROUPED_SEGMENTATIONS_STREAM = 'segmented_masks'; | ||||
|  |  | |||
							
								
								
									
third_party/BUILD (vendored, 1 line changed)
						|  | @ -112,6 +112,7 @@ cmake_external( | |||
|         "WITH_JPEG": "ON", | ||||
|         "WITH_PNG": "ON", | ||||
|         "WITH_TIFF": "ON", | ||||
|         "WITH_OPENCL": "OFF", | ||||
|         "WITH_WEBP": "OFF", | ||||
|         # Optimization flags | ||||
|         "CV_ENABLE_INTRINSICS": "ON", | ||||
|  |  | |||
							
								
								
									
third_party/external_files.bzl (vendored, 10 lines changed)
						|  | @ -67,7 +67,7 @@ def external_files(): | |||
|     http_file( | ||||
|         name = "com_google_mediapipe_BUILD", | ||||
|         sha256 = "d2b2a8346202691d7f831887c84e9642e974f64ed67851d9a58cf15c94b1f6b3", | ||||
|         urls = ["https://storage.googleapis.com/mediapipe-assets/BUILD?generation=1661875663693976"], | ||||
|         urls = ["https://storage.googleapis.com/mediapipe-assets/BUILD?generation=16618756636939761678323576393653"], | ||||
|     ) | ||||
|  | ||||
|     http_file( | ||||
|  | @ -318,8 +318,8 @@ def external_files(): | |||
|  | ||||
|     http_file( | ||||
|         name = "com_google_mediapipe_face_landmarker_with_blendshapes_task", | ||||
|         sha256 = "a75c1ba70e4b8568000af2ad0b355ed559ab5d5793db50fa9ad241f8dc4fad5f", | ||||
|         urls = ["https://storage.googleapis.com/mediapipe-assets/face_landmarker_with_blendshapes.task?generation=1678323586260800"], | ||||
|         sha256 = "b44e4cae6f5822456d60f33e7c852640d78c7e342aee7eacc22589451a0b9dc2", | ||||
|         urls = ["https://storage.googleapis.com/mediapipe-assets/face_landmarker_with_blendshapes.task?generation=1678504998301299"], | ||||
|     ) | ||||
|  | ||||
|     http_file( | ||||
|  | @ -822,8 +822,8 @@ def external_files(): | |||
|  | ||||
|     http_file( | ||||
|         name = "com_google_mediapipe_portrait_expected_face_geometry_with_attention_pbtxt", | ||||
|         sha256 = "5cc57b8da3ad0527dce581fe1309f6b36043e5837e3f4f5af5e24005a99dc52a", | ||||
|         urls = ["https://storage.googleapis.com/mediapipe-assets/portrait_expected_face_geometry_with_attention.pbtxt?generation=1678323601064393"], | ||||
|         sha256 = "7ed1eed98e61e0a10811bb611c895d87c8023f398a36db01b6d9ba2e1ab09e16", | ||||
|         urls = ["https://storage.googleapis.com/mediapipe-assets/portrait_expected_face_geometry_with_attention.pbtxt?generation=1678505004840652"], | ||||
|     ) | ||||
|  | ||||
|     http_file( | ||||
|  |  | |||