Yafei Zhao 2021-09-23 15:41:04 +08:00
commit a715f1ec4d
580 changed files with 29291 additions and 4481 deletions


@ -0,0 +1,27 @@
---
name: "Build/Installation Issue"
about: Use this template for build/installation issues
labels: type:build/install
---
<em>Please make sure that this is a build/installation issue and also refer to the [troubleshooting](https://google.github.io/mediapipe/getting_started/troubleshooting.html) documentation before raising any issues.</em>
**System information** (Please provide as much relevant information as possible)
- OS Platform and Distribution (e.g. Linux Ubuntu 16.04, Android 11, iOS 14.4):
- Compiler version (e.g. gcc/g++ 8, Apple clang version 12.0.0):
- Programming Language and version (e.g. C++ 14, Python 3.6, Java):
- Installed using virtualenv? pip? Conda? (if Python):
- [MediaPipe version](https://github.com/google/mediapipe/releases):
- Bazel version:
- Xcode and Tulsi versions (if iOS):
- Android SDK and NDK versions (if Android):
- Android [AAR](https://google.github.io/mediapipe/getting_started/android_archive_library.html) (if Android):
- OpenCV version (if running on desktop):
**Describe the problem**:
**[Provide the exact sequence of commands / steps that you executed before running into the problem](https://google.github.io/mediapipe/getting_started/getting_started.html):**
**Complete Logs:**
Include Complete Log information or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached:


@ -0,0 +1,26 @@
---
name: "Solution Issue"
about: Use this template for assistance with a specific MediaPipe solution, such as "Pose" or "Iris", including inference model usage/training, solution-specific calculators, etc.
labels: type:support
---
<em>Please make sure that this is a [solution](https://google.github.io/mediapipe/solutions/solutions.html) issue.</em>
**System information** (Please provide as much relevant information as possible)
- Have I written custom code (as opposed to using a stock example script provided in MediaPipe):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04, Android 11, iOS 14.4):
- [MediaPipe version](https://github.com/google/mediapipe/releases):
- Bazel version:
- Solution (e.g. FaceMesh, Pose, Holistic):
- Programming Language and version (e.g. C++, Python, Java):
**Describe the expected behavior:**
**Standalone code you may have used to try to get what you need:**
If there is a problem, provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to a Colab notebook, a repo, or any other notebook:
**Other info / Complete Logs:**
Include any logs or source code that would be helpful to
diagnose the problem. If including tracebacks, please include the full
traceback. Large logs and files should be attached:


@ -0,0 +1,51 @@
---
name: "Documentation Issue"
about: Use this template for documentation related issues
labels: type:docs
---
Thank you for submitting a MediaPipe documentation issue.
The MediaPipe docs are open source! To get involved, read the documentation Contributor Guide.
## URL(s) with the issue:
Please provide a link to the documentation entry, for example: https://github.com/google/mediapipe/blob/master/docs/solutions/face_mesh.md#models
## Description of issue (what needs changing):
Kinds of documentation problems:
### Clear description
For example, why should someone use this method? How is it useful?
### Correct links
Is the link to the source code correct?
### Parameters defined
Are all parameters defined and formatted correctly?
### Returns defined
Are return values defined?
### Raises listed and defined
Are the errors defined?
### Usage example
Is there a usage example?
See the API guide on how to write testable usage examples.
### Request visuals, if applicable
Are there currently visuals? If not, would adding them clarify the content?
### Submit a pull request?
Are you planning to also submit a pull request to fix the issue? See the docs
https://github.com/google/mediapipe/blob/master/CONTRIBUTING.md

.github/ISSUE_TEMPLATE/30-bug-issue.md vendored Normal file

@ -0,0 +1,32 @@
---
name: "Bug Issue"
about: Use this template for reporting a bug
labels: type:bug
---
<em>Please make sure that this is a bug and also refer to the [troubleshooting](https://google.github.io/mediapipe/getting_started/troubleshooting.html) and FAQ documentation before raising any issues.</em>
**System information** (Please provide as much relevant information as possible)
- Have I written custom code (as opposed to using a stock example script provided in MediaPipe):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04, Android 11, iOS 14.4):
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on a mobile device:
- Browser and version (e.g. Google Chrome, Safari) if the issue happens in a browser:
- Programming Language and version (e.g. C++, Python, Java):
- [MediaPipe version](https://github.com/google/mediapipe/releases):
- Bazel version (if compiling from source):
- Solution (e.g. FaceMesh, Pose, Holistic):
- Android Studio, NDK, SDK versions (if issue is related to building in Android environment):
- Xcode & Tulsi version (if issue is related to building for iOS):
**Describe the current behavior:**
**Describe the expected behavior:**
**Standalone code to reproduce the issue:**
Provide a reproducible test case that is the bare minimum necessary to replicate the problem. If possible, please share a link to a Colab notebook, a repo, or any other notebook:
**Other info / Complete Logs:**
Include any logs or source code that would be helpful to
diagnose the problem. If including tracebacks, please include the full
traceback. Large logs and files should be attached.


@ -0,0 +1,24 @@
---
name: "Feature Request"
about: Use this template for raising a feature request
labels: type:feature
---
<em>Please make sure that this is a feature request.</em>
**System information** (Please provide as much relevant information as possible)
- MediaPipe Solution (you are using):
- Programming language: C++ / TypeScript / Python / Objective-C / Android Java
- Are you willing to contribute it (Yes/No):
**Describe the feature and the current behavior/state:**
**Will this change the current API? How?**
**Who will benefit from this feature?**
**Please specify the use cases for this feature:**
**Any other info:**


@ -0,0 +1,14 @@
---
name: "Other Issue"
about: Use this template for any other non-support related issues.
labels: type:others
---
This template is for miscellaneous issues not covered by the other issue categories.
For questions on how to work with MediaPipe, or support for problems that are not verified bugs in MediaPipe, please go to the [StackOverflow](https://stackoverflow.com/questions/tagged/mediapipe) and [Slack](https://mediapipe.page.link/joinslack) communities.
If you are reporting a vulnerability, please use the [dedicated reporting process](https://github.com/google/mediapipe/security).
For high-level discussions about MediaPipe, please post to discuss@mediapipe.org. For questions about the development or internal workings of MediaPipe, or if you would like to know how to contribute to MediaPipe, please post to developers@mediapipe.org.

.github/bot_config.yml vendored Normal file

@ -0,0 +1,18 @@
# Copyright 2021 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# A list of assignees
assignees:
- sgowroji

.github/stale.yml vendored Normal file

@ -0,0 +1,34 @@
# Copyright 2021 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
#
# This file was assembled from multiple pieces, whose use is documented
# throughout. Please refer to the TensorFlow dockerfiles documentation
# for more information.
# Number of days of inactivity before an Issue or Pull Request becomes stale
daysUntilStale: 7
# Number of days of inactivity before a stale Issue or Pull Request is closed
daysUntilClose: 7
# Only issues or pull requests with all of these labels are checked if stale. Defaults to `[]` (disabled)
onlyLabels:
- stat:awaiting response
# Comment to post when marking as stale. Set to `false` to disable
markComment: >
This issue has been automatically marked as stale because it has not had
recent activity. It will be closed if no further activity occurs. Thank you.
# Comment to post when removing the stale label. Set to `false` to disable
unmarkComment: false
closeComment: >
Closing as stale. Please reopen if you'd like to work on this further.


@ -8,6 +8,7 @@ include README.md
include requirements.txt
recursive-include mediapipe/modules *.tflite *.txt *.binarypb
exclude mediapipe/modules/face_detection/face_detection_full_range.tflite
exclude mediapipe/modules/objectron/object_detection_3d_chair_1stage.tflite
exclude mediapipe/modules/objectron/object_detection_3d_sneakers_1stage.tflite
exclude mediapipe/modules/objectron/object_detection_3d_sneakers.tflite


@ -40,11 +40,12 @@ Hair Segmentation
[Hands](https://google.github.io/mediapipe/solutions/hands) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Pose](https://google.github.io/mediapipe/solutions/pose) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Holistic](https://google.github.io/mediapipe/solutions/holistic) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Selfie Segmentation](https://google.github.io/mediapipe/solutions/selfie_segmentation) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Hair Segmentation](https://google.github.io/mediapipe/solutions/hair_segmentation) | ✅ | | ✅ | | |
[Object Detection](https://google.github.io/mediapipe/solutions/object_detection) | ✅ | ✅ | ✅ | | | ✅
[Box Tracking](https://google.github.io/mediapipe/solutions/box_tracking) | ✅ | ✅ | ✅ | | |
[Instant Motion Tracking](https://google.github.io/mediapipe/solutions/instant_motion_tracking) | ✅ | | | | |
[Objectron](https://google.github.io/mediapipe/solutions/objectron) | ✅ | | ✅ | ✅ | |
[Objectron](https://google.github.io/mediapipe/solutions/objectron) | ✅ | | ✅ | ✅ | |
[KNIFT](https://google.github.io/mediapipe/solutions/knift) | ✅ | | | | |
[AutoFlip](https://google.github.io/mediapipe/solutions/autoflip) | | | ✅ | | |
[MediaSequence](https://google.github.io/mediapipe/solutions/media_sequence) | | | ✅ | | |
@ -54,46 +55,22 @@ See also
[MediaPipe Models and Model Cards](https://google.github.io/mediapipe/solutions/models)
for ML models released in MediaPipe.
## MediaPipe in Python
MediaPipe offers customizable Python solutions as a prebuilt Python package on
[PyPI](https://pypi.org/project/mediapipe/), which can be installed simply with
`pip install mediapipe`. It also provides tools for users to build their own
solutions. Please see
[MediaPipe in Python](https://google.github.io/mediapipe/getting_started/python)
for more info.
## MediaPipe on the Web
MediaPipe on the Web is an effort to run the same ML solutions built for mobile
and desktop also in web browsers. The official API is under construction, but
the core technology has been proven effective. Please see
[MediaPipe on the Web](https://developers.googleblog.com/2020/01/mediapipe-on-web.html)
in Google Developers Blog for details.
You can use the following links to load a demo in the MediaPipe Visualizer, and
then click the "Runner" icon in the top bar, as shown below. The demos use your
webcam video as input, which is processed entirely locally in real time and
never leaves your device.
![visualizer_runner](docs/images/visualizer_runner.png)
* [MediaPipe Face Detection](https://viz.mediapipe.dev/demo/face_detection)
* [MediaPipe Iris](https://viz.mediapipe.dev/demo/iris_tracking)
* [MediaPipe Iris: Depth-from-Iris](https://viz.mediapipe.dev/demo/iris_depth)
* [MediaPipe Hands](https://viz.mediapipe.dev/demo/hand_tracking)
* [MediaPipe Hands (palm/hand detection only)](https://viz.mediapipe.dev/demo/hand_detection)
* [MediaPipe Pose](https://viz.mediapipe.dev/demo/pose_tracking)
* [MediaPipe Hair Segmentation](https://viz.mediapipe.dev/demo/hair_segmentation)
## Getting started
Learn how to [install](https://google.github.io/mediapipe/getting_started/install)
MediaPipe and
[build example applications](https://google.github.io/mediapipe/getting_started/building_examples),
and start exploring our ready-to-use
[solutions](https://google.github.io/mediapipe/solutions/solutions) that you can
further extend and customize.
To start using MediaPipe
[solutions](https://google.github.io/mediapipe/solutions/solutions) with only a few
lines of code, see example code and demos in
[MediaPipe in Python](https://google.github.io/mediapipe/getting_started/python) and
[MediaPipe in JavaScript](https://google.github.io/mediapipe/getting_started/javascript).
To use MediaPipe in C++, Android and iOS, which allow further customization of
the [solutions](https://google.github.io/mediapipe/solutions/solutions) as well as
building your own, learn how to
[install](https://google.github.io/mediapipe/getting_started/install) MediaPipe and
start building example applications in
[C++](https://google.github.io/mediapipe/getting_started/cpp),
[Android](https://google.github.io/mediapipe/getting_started/android) and
[iOS](https://google.github.io/mediapipe/getting_started/ios).
The source code is hosted in the
[MediaPipe Github repository](https://github.com/google/mediapipe), and you can
@ -167,6 +144,13 @@ bash build_macos_desktop_examples.sh --cpu i386 --app face_detection -r
## Publications
* [Bringing artworks to life with AR](https://developers.googleblog.com/2021/07/bringing-artworks-to-life-with-ar.html)
in Google Developers Blog
* [Prosthesis control via Mirru App using MediaPipe hand tracking](https://developers.googleblog.com/2021/05/control-your-mirru-prosthesis-with-mediapipe-hand-tracking.html)
in Google Developers Blog
* [SignAll SDK: Sign language interface using MediaPipe is now available for
developers](https://developers.googleblog.com/2021/04/signall-sdk-sign-language-interface-using-mediapipe-now-available.html)
in Google Developers Blog
* [MediaPipe Holistic - Simultaneous Face, Hand and Pose Prediction, on Device](https://ai.googleblog.com/2020/12/mediapipe-holistic-simultaneous-face.html)
in Google AI Blog
* [Background Features in Google Meet, Powered by Web ML](https://ai.googleblog.com/2020/10/background-features-in-google-meet.html)


@ -65,26 +65,19 @@ rules_foreign_cc_dependencies()
all_content = """filegroup(name = "all", srcs = glob(["**"]), visibility = ["//visibility:public"])"""
# GoogleTest/GoogleMock framework. Used by most unit-tests.
# Last updated 2020-06-30.
# Last updated 2021-07-02.
http_archive(
name = "com_google_googletest",
urls = ["https://github.com/google/googletest/archive/aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e.zip"],
patches = [
# fix for https://github.com/google/googletest/issues/2817
"@//third_party:com_google_googletest_9d580ea80592189e6d44fa35bcf9cdea8bf620d6.diff"
],
patch_args = [
"-p1",
],
strip_prefix = "googletest-aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e",
sha256 = "04a1751f94244307cebe695a69cc945f9387a80b0ef1af21394a490697c5c895",
urls = ["https://github.com/google/googletest/archive/4ec4cd23f486bf70efcc5d2caa40f24368f752e3.zip"],
strip_prefix = "googletest-4ec4cd23f486bf70efcc5d2caa40f24368f752e3",
sha256 = "de682ea824bfffba05b4e33b67431c247397d6175962534305136aa06f92e049",
)
# Google Benchmark library.
http_archive(
name = "com_google_benchmark",
urls = ["https://github.com/google/benchmark/archive/master.zip"],
strip_prefix = "benchmark-master",
urls = ["https://github.com/google/benchmark/archive/main.zip"],
strip_prefix = "benchmark-main",
build_file = "@//third_party:benchmark.BUILD",
)
@ -176,11 +169,11 @@ http_archive(
http_archive(
name = "pybind11",
urls = [
"https://storage.googleapis.com/mirror.tensorflow.org/github.com/pybind/pybind11/archive/v2.4.3.tar.gz",
"https://github.com/pybind/pybind11/archive/v2.4.3.tar.gz",
"https://storage.googleapis.com/mirror.tensorflow.org/github.com/pybind/pybind11/archive/v2.7.1.tar.gz",
"https://github.com/pybind/pybind11/archive/v2.7.1.tar.gz",
],
sha256 = "1eed57bc6863190e35637290f97a20c81cfe4d9090ac0a24f3bbf08f265eb71d",
strip_prefix = "pybind11-2.4.3",
sha256 = "616d1c42e4cf14fa27b2a4ff759d7d7b33006fdc5ad8fd603bb2c22622f27020",
strip_prefix = "pybind11-2.7.1",
build_file = "@pybind11_bazel//:pybind11.BUILD",
)
@ -254,6 +247,20 @@ http_archive(
url = "https://github.com/opencv/opencv/releases/download/3.2.0/opencv-3.2.0-ios-framework.zip",
)
http_archive(
name = "stblib",
strip_prefix = "stb-b42009b3b9d4ca35bc703f5310eedc74f584be58",
sha256 = "13a99ad430e930907f5611325ec384168a958bf7610e63e60e2fd8e7b7379610",
urls = ["https://github.com/nothings/stb/archive/b42009b3b9d4ca35bc703f5310eedc74f584be58.tar.gz"],
build_file = "@//third_party:stblib.BUILD",
patches = [
"@//third_party:stb_image_impl.diff"
],
patch_args = [
"-p1",
],
)
# You may run setup_android.sh to install Android SDK and NDK.
android_ndk_repository(
name = "androidndk",
@ -336,7 +343,9 @@ load("@rules_jvm_external//:defs.bzl", "maven_install")
maven_install(
artifacts = [
"androidx.concurrent:concurrent-futures:1.0.0-alpha03",
"androidx.lifecycle:lifecycle-common:2.2.0",
"androidx.lifecycle:lifecycle-common:2.3.1",
"androidx.activity:activity:1.2.2",
"androidx.fragment:fragment:1.3.4",
"androidx.annotation:annotation:aar:1.1.0",
"androidx.appcompat:appcompat:aar:1.1.0-rc01",
"androidx.camera:camera-core:1.0.0-beta10",
@ -349,11 +358,11 @@ maven_install(
"androidx.test.espresso:espresso-core:3.1.1",
"com.github.bumptech.glide:glide:4.11.0",
"com.google.android.material:material:aar:1.0.0-rc01",
"com.google.auto.value:auto-value:1.6.4",
"com.google.auto.value:auto-value-annotations:1.6.4",
"com.google.code.findbugs:jsr305:3.0.2",
"com.google.flogger:flogger-system-backend:0.3.1",
"com.google.flogger:flogger:0.3.1",
"com.google.auto.value:auto-value:1.8.1",
"com.google.auto.value:auto-value-annotations:1.8.1",
"com.google.code.findbugs:jsr305:latest.release",
"com.google.flogger:flogger-system-backend:latest.release",
"com.google.flogger:flogger:latest.release",
"com.google.guava:guava:27.0.1-android",
"com.google.guava:listenablefuture:1.0",
"junit:junit:4.12",
@ -381,9 +390,9 @@ http_archive(
)
# Tensorflow repo should always go after the other external dependencies.
# 2021-04-30
_TENSORFLOW_GIT_COMMIT = "5bd3c57ef184543d22e34e36cff9d9bea608e06d"
_TENSORFLOW_SHA256= "9a45862834221aafacf6fb275f92b3876bc89443cbecc51be93f13839a6609f0"
# 2021-07-29
_TENSORFLOW_GIT_COMMIT = "52a2905cbc21034766c08041933053178c5d10e3"
_TENSORFLOW_SHA256 = "06d4691bcdb700f3275fa0971a1585221c2b9f3dffe867963be565a6643d7f56"
http_archive(
name = "org_tensorflow",
urls = [
@ -404,3 +413,18 @@ load("@org_tensorflow//tensorflow:workspace3.bzl", "tf_workspace3")
tf_workspace3()
load("@org_tensorflow//tensorflow:workspace2.bzl", "tf_workspace2")
tf_workspace2()
# Edge TPU
http_archive(
name = "libedgetpu",
sha256 = "14d5527a943a25bc648c28a9961f954f70ba4d79c0a9ca5ae226e1831d72fe80",
strip_prefix = "libedgetpu-3164995622300286ef2bb14d7fdc2792dae045b7",
urls = [
"https://github.com/google-coral/libedgetpu/archive/3164995622300286ef2bb14d7fdc2792dae045b7.tar.gz"
],
)
load("@libedgetpu//:workspace.bzl", "libedgetpu_dependencies")
libedgetpu_dependencies()
load("@coral_crosstool//:configure.bzl", "cc_crosstool")
cc_crosstool(name = "crosstool")


@ -97,6 +97,7 @@ for app in ${apps}; do
if [[ ${target_name} == "holistic_tracking" ||
${target_name} == "iris_tracking" ||
${target_name} == "pose_tracking" ||
${target_name} == "selfie_segmentation" ||
${target_name} == "upper_body_pose_tracking" ]]; then
graph_suffix="cpu"
else


@ -248,12 +248,70 @@ absl::Status MyCalculator::Process() {
}
```
## Calculator options
Calculators accept processing parameters through (1) input stream packets, (2)
input side packets, and (3) calculator options. Calculator options, if
specified, appear as literal values in the `node_options` field of the
`CalculatorGraphConfiguration.Node` message.
```
node {
calculator: "TfLiteInferenceCalculator"
input_stream: "TENSORS:main_model_input"
output_stream: "TENSORS:main_model_output"
node_options: {
[type.googleapis.com/mediapipe.TfLiteInferenceCalculatorOptions] {
model_path: "mediapipe/models/detection_model.tflite"
}
}
}
```
The `node_options` field accepts the proto3 syntax. Alternatively, calculator
options can be specified in the `options` field using proto2 syntax.
```
node {
calculator: "TfLiteInferenceCalculator"
input_stream: "TENSORS:main_model_input"
output_stream: "TENSORS:main_model_output"
options: {
[mediapipe.TfLiteInferenceCalculatorOptions.ext] {
model_path: "mediapipe/models/detection_model.tflite"
}
}
}
```
Not all calculators accept calculator options. In order to accept options, a
calculator will normally define a new protobuf message type to represent its
options, such as `PacketClonerCalculatorOptions`. The calculator will then
read that protobuf message in its `CalculatorBase::Open` method, and possibly
also in its `CalculatorBase::GetContract` function or its
`CalculatorBase::Process` method. Normally, the new protobuf message type will
be defined as a protobuf schema using a ".proto" file and a
`mediapipe_proto_library()` build rule.
```
mediapipe_proto_library(
name = "packet_cloner_calculator_proto",
srcs = ["packet_cloner_calculator.proto"],
visibility = ["//visibility:public"],
deps = [
"//mediapipe/framework:calculator_options_proto",
"//mediapipe/framework:calculator_proto",
],
)
```
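As an illustrative sketch (not taken from the MediaPipe sources), reading these options in `Open` could look roughly like the following; the member variable and the option field shown are assumed for the example:
```c++
// Sketch only: read the options proto attached to this node in the graph
// config and cache a value for later use in Process().
absl::Status PacketClonerCalculator::Open(CalculatorContext* cc) {
  const auto& options = cc->Options<PacketClonerCalculatorOptions>();
  // Field name is illustrative; use whatever fields the options proto defines.
  copy_output_only_when_all_inputs_received_ =
      options.copy_output_only_when_all_inputs_received();
  return absl::OkStatus();
}
```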
## Example calculator
This section discusses the implementation of `PacketClonerCalculator`, which
does a relatively simple job, and is used in many calculator graphs.
`PacketClonerCalculator` simply produces a copy of its most recent input
packets on demand.
`PacketClonerCalculator` simply produces a copy of its most recent input packets
on demand.
`PacketClonerCalculator` is useful when the timestamps of arriving data packets
are not aligned perfectly. Suppose we have a room with a microphone, light
@ -279,8 +337,8 @@ input streams:
imageframe of video data representing video collected from camera in the
room with timestamp.
Below is the implementation of the `PacketClonerCalculator`. You can see
the `GetContract()`, `Open()`, and `Process()` methods as well as the instance
Below is the implementation of the `PacketClonerCalculator`. You can see the
`GetContract()`, `Open()`, and `Process()` methods as well as the instance
variable `current_` which holds the most recent input packets.
```c++
@ -401,6 +459,6 @@ node {
The diagram below shows how the `PacketClonerCalculator` defines its output
packets (bottom) based on its series of input packets (top).
| ![Graph using PacketClonerCalculator](../images/packet_cloner_calculator.png) |
| :---------------------------------------------------------------------------: |
| *Each time it receives a packet on its TICK input stream, the PacketClonerCalculator outputs the most recent packet from each of its input streams. The sequence of output packets (bottom) is determined by the sequence of input packets (top) and their timestamps. The timestamps are shown along the right side of the diagram.* |
![Graph using PacketClonerCalculator](../images/packet_cloner_calculator.png) |
:--------------------------------------------------------------------------: |
*Each time it receives a packet on its TICK input stream, the PacketClonerCalculator outputs the most recent packet from each of its input streams. The sequence of output packets (bottom) is determined by the sequence of input packets (top) and their timestamps. The timestamps are shown along the right side of the diagram.* |


@ -111,11 +111,11 @@ component known as an InputStreamHandler.
See [Synchronization](synchronization.md) for more details.
### Realtime data streams
### Real-time streams
MediaPipe calculator graphs are often used to process streams of video or audio
frames for interactive applications. Normally, each Calculator runs as soon as
all of its input packets for a given timestamp become available. Calculators
used in realtime graphs need to define output timestamp bounds based on input
used in real-time graphs need to define output timestamp bounds based on input
timestamp bounds in order to allow downstream calculators to be scheduled
promptly. See [Realtime data streams](realtime.md) for details.
promptly. See [Real-time Streams](realtime_streams.md) for details.


@ -1,29 +1,28 @@
---
layout: default
title: Processing real-time data streams
title: Real-time Streams
parent: Framework Concepts
nav_order: 6
has_children: true
has_toc: false
---
# Processing real-time data streams
# Real-time Streams
{: .no_toc }
1. TOC
{:toc}
---
## Realtime timestamps
## Real-time timestamps
MediaPipe calculator graphs are often used to process streams of video or audio
frames for interactive applications. The MediaPipe framework requires only that
successive packets be assigned monotonically increasing timestamps. By
convention, realtime calculators and graphs use the recording time or the
convention, real-time calculators and graphs use the recording time or the
presentation time of each frame as its timestamp, with each timestamp indicating
the microseconds since `Jan/1/1970:00:00:00`. This allows packets from various
sources to be processed in a globally consistent sequence.
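As a small sketch (not part of the original text), timestamping a frame with the current wall-clock time before sending it into a running graph could look like this; the `graph`, `frame`, and stream name are assumed for the example:
```c++
// Sketch only: microseconds since the Unix epoch as a MediaPipe Timestamp.
const mediapipe::Timestamp ts(absl::ToUnixMicros(absl::Now()));
// `graph` is a running mediapipe::CalculatorGraph; `frame` is a
// std::unique_ptr<mediapipe::ImageFrame> holding the current camera frame.
absl::Status status = graph.AddPacketToInputStream(
    "input_video", mediapipe::Adopt(frame.release()).At(ts));
```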
## Realtime scheduling
## Real-time scheduling
Normally, each Calculator runs as soon as all of its input packets for a given
timestamp become available. Normally, this happens when the calculator has
@ -38,7 +37,7 @@ When a calculator does not produce any output packets for a given timestamp, it
can instead output a "timestamp bound" indicating that no packet will be
produced for that timestamp. This indication is necessary to allow downstream
calculators to run at that timestamp, even though no packet has arrived for
certain streams for that timestamp. This is especially important for realtime
certain streams for that timestamp. This is especially important for real-time
graphs in interactive applications, where it is crucial that each calculator
begin processing as soon as possible.
@ -83,12 +82,12 @@ For example, `Timestamp(1).NextAllowedInStream() == Timestamp(2)`.
## Propagating timestamp bounds
Calculators that will be used in realtime graphs need to define output timestamp
bounds based on input timestamp bounds in order to allow downstream calculators
to be scheduled promptly. A common pattern is for calculators to output packets
with the same timestamps as their input packets. In this case, simply outputting
a packet on every call to `Calculator::Process` is sufficient to define output
timestamp bounds.
Calculators that will be used in real-time graphs need to define output
timestamp bounds based on input timestamp bounds in order to allow downstream
calculators to be scheduled promptly. A common pattern is for calculators to
output packets with the same timestamps as their input packets. In this case,
simply outputting a packet on every call to `Calculator::Process` is sufficient
to define output timestamp bounds.
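As an illustrative sketch (not part of the original text), a `Process` method following this pattern, with a fallback that only advances the timestamp bound when there is nothing to emit, might look roughly like this:
```c++
// Sketch only: either emit an output at the input timestamp, or advance the
// output timestamp bound so downstream calculators can be scheduled promptly.
absl::Status MyCalculator::Process(CalculatorContext* cc) {
  if (!cc->Inputs().Index(0).IsEmpty()) {
    // Common pattern: forward a packet with the same timestamp as the input.
    cc->Outputs().Index(0).AddPacket(cc->Inputs().Index(0).Value());
  } else {
    // No output for this timestamp: publish a timestamp bound instead.
    cc->Outputs().Index(0).SetNextTimestampBound(
        cc->InputTimestamp().NextAllowedInStream());
  }
  return absl::OkStatus();
}
```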
However, calculators are not required to follow this common pattern for output
timestamps; they are only required to choose monotonically increasing output


@ -16,12 +16,14 @@ nav_order: 1
Please follow instructions below to build Android example apps in the supported
MediaPipe [solutions](../solutions/solutions.md). To learn more about these
example apps, start from [Hello World! on Android](./hello_world_android.md). To
incorporate MediaPipe into an existing Android Studio project, see these
[instructions](./android_archive_library.md) that use Android Archive (AAR) and
Gradle.
example apps, start from [Hello World! on Android](./hello_world_android.md).
## Building Android example apps
To incorporate MediaPipe into Android Studio projects, see these
[instructions](./android_solutions.md) to use the MediaPipe Android Solution
APIs (currently in alpha) that are now available in
[Google's Maven Repository](https://maven.google.com/web/index.html?#com.google.mediapipe).
## Building Android example apps with Bazel
### Prerequisite
@ -51,16 +53,6 @@ $YOUR_INTENDED_API_LEVEL` in android_ndk_repository() and/or
android_sdk_repository() in the
[`WORKSPACE`](https://github.com/google/mediapipe/blob/master/WORKSPACE) file.
Please verify all the necessary packages are installed.
* Android SDK Platform API Level 28 or 29
* Android SDK Build-Tools 28 or 29
* Android SDK Platform-Tools 28 or 29
* Android SDK Tools 26.1.1
* Android NDK 19c or above
### Option 1: Build with Bazel in Command Line
Tip: You can run this
[script](https://github.com/google/mediapipe/blob/master/build_android_examples.sh)
to build (and install) all MediaPipe Android example apps.
@ -84,108 +76,3 @@ to build (and install) all MediaPipe Android example apps.
```bash
adb install bazel-bin/mediapipe/examples/android/src/java/com/google/mediapipe/apps/handtrackinggpu/handtrackinggpu.apk
```
### Option 2: Build with Bazel in Android Studio
The MediaPipe project can be imported into Android Studio using the Bazel
plugins. This allows the MediaPipe examples to be built and modified in Android
Studio.
To incorporate MediaPipe into an existing Android Studio project, see these
[instructions](./android_archive_library.md) that use Android Archive (AAR) and
Gradle.
The steps below use Android Studio 3.5 to build and install a MediaPipe example
app:
1. Install and launch Android Studio 3.5.
2. Select `Configure` -> `SDK Manager` -> `SDK Platforms`.
* Verify that Android SDK Platform API Level 28 or 29 is installed.
* Take note of the Android SDK Location, e.g.,
`/usr/local/home/Android/Sdk`.
3. Select `Configure` -> `SDK Manager` -> `SDK Tools`.
* Verify that Android SDK Build-Tools 28 or 29 is installed.
* Verify that Android SDK Platform-Tools 28 or 29 is installed.
* Verify that Android SDK Tools 26.1.1 is installed.
* Verify that Android NDK 19c or above is installed.
* Take note of the Android NDK Location, e.g.,
`/usr/local/home/Android/Sdk/ndk-bundle` or
`/usr/local/home/Android/Sdk/ndk/20.0.5594570`.
4. Set environment variables `$ANDROID_HOME` and `$ANDROID_NDK_HOME` to point
to the installed SDK and NDK.
```bash
export ANDROID_HOME=/usr/local/home/Android/Sdk
# If the NDK libraries are installed by a previous version of Android Studio, do
export ANDROID_NDK_HOME=/usr/local/home/Android/Sdk/ndk-bundle
# If the NDK libraries are installed by Android Studio 3.5, do
export ANDROID_NDK_HOME=/usr/local/home/Android/Sdk/ndk/<version number>
```
5. Select `Configure` -> `Plugins` to install `Bazel`.
6. On Linux, select `File` -> `Settings` -> `Bazel settings`. On macOS, select
`Android Studio` -> `Preferences` -> `Bazel settings`. Then, modify `Bazel
binary location` to be the same as the output of `$ which bazel`.
7. Select `Import Bazel Project`.
* Select `Workspace`: `/path/to/mediapipe` and select `Next`.
* Select `Generate from BUILD file`: `/path/to/mediapipe/BUILD` and select
`Next`.
* Modify `Project View` to be the following and select `Finish`.
```
directories:
# read project settings, e.g., .bazelrc
.
-mediapipe/objc
-mediapipe/examples/ios
targets:
//mediapipe/examples/android/...:all
//mediapipe/java/...:all
android_sdk_platform: android-29
sync_flags:
--host_crosstool_top=@bazel_tools//tools/cpp:toolchain
```
8. Select `Bazel` -> `Sync` -> `Sync project with Build files`.
Note: Even after doing step 4, if you still see the error: `"no such package
'@androidsdk//': Either the path attribute of android_sdk_repository or the
ANDROID_HOME environment variable must be set."`, please modify the
[`WORKSPACE`](https://github.com/google/mediapipe/blob/master/WORKSPACE)
file to point to your SDK and NDK library locations, as below:
```
android_sdk_repository(
name = "androidsdk",
path = "/path/to/android/sdk"
)
android_ndk_repository(
name = "androidndk",
path = "/path/to/android/ndk"
)
```
9. Connect an Android device to the workstation.
10. Select `Run...` -> `Edit Configurations...`.
* Select `Templates` -> `Bazel Command`.
* Enter Target Expression:
`//mediapipe/examples/android/src/java/com/google/mediapipe/apps/handtrackinggpu:handtrackinggpu`
* Enter Bazel command: `mobile-install`.
* Enter Bazel flags: `-c opt --config=android_arm64`.
* Press the `[+]` button to add the new configuration.
* Select `Run` to run the example app on the connected Android device.


@ -3,7 +3,7 @@ layout: default
title: MediaPipe Android Archive
parent: MediaPipe on Android
grand_parent: Getting Started
nav_order: 2
nav_order: 3
---
# MediaPipe Android Archive
@ -92,12 +92,12 @@ each project.
and copy
[the binary graph](https://github.com/google/mediapipe/blob/master/mediapipe/examples/android/src/java/com/google/mediapipe/apps/facedetectiongpu/BUILD#L41)
and
[the face detection tflite model](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_front.tflite).
[the face detection tflite model](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_short_range.tflite).
```bash
bazel build -c opt mediapipe/graphs/face_detection:face_detection_mobile_gpu_binary_graph
cp bazel-bin/mediapipe/graphs/face_detection/face_detection_mobile_gpu.binarypb /path/to/your/app/src/main/assets/
cp mediapipe/modules/face_detection/face_detection_front.tflite /path/to/your/app/src/main/assets/
cp mediapipe/modules/face_detection/face_detection_short_range.tflite /path/to/your/app/src/main/assets/
```
![Screenshot](../images/mobile/assets_location.png)
@ -113,10 +113,9 @@ each project.
androidTestImplementation 'androidx.test.ext:junit:1.1.0'
androidTestImplementation 'androidx.test.espresso:espresso-core:3.1.1'
// MediaPipe deps
implementation 'com.google.flogger:flogger:0.3.1'
implementation 'com.google.flogger:flogger-system-backend:0.3.1'
implementation 'com.google.code.findbugs:jsr305:3.0.2'
implementation 'com.google.guava:guava:27.0.1-android'
implementation 'com.google.flogger:flogger:latest.release'
implementation 'com.google.flogger:flogger-system-backend:latest.release'
implementation 'com.google.code.findbugs:jsr305:latest.release'
implementation 'com.google.guava:guava:27.0.1-android'
implementation 'com.google.protobuf:protobuf-java:3.11.4'
// CameraX core library
@ -125,7 +124,7 @@ each project.
implementation "androidx.camera:camera-camera2:$camerax_version"
implementation "androidx.camera:camera-lifecycle:$camerax_version"
// AutoValue
def auto_value_version = "1.6.4"
def auto_value_version = "1.8.1"
implementation "com.google.auto.value:auto-value-annotations:$auto_value_version"
annotationProcessor "com.google.auto.value:auto-value:$auto_value_version"
}


@ -0,0 +1,79 @@
---
layout: default
title: Android Solutions
parent: MediaPipe on Android
grand_parent: Getting Started
nav_order: 2
---
# Android Solution APIs
{: .no_toc }
1. TOC
{:toc}
---
Please follow instructions below to use the MediaPipe Solution APIs in Android
Studio projects and build the Android example apps in the supported MediaPipe
[solutions](../solutions/solutions.md).
## Integrate MediaPipe Android Solutions in Android Studio
MediaPipe Android Solution APIs (currently in alpha) are now available in
[Google's Maven Repository](https://maven.google.com/web/index.html?#com.google.mediapipe).
To incorporate MediaPipe Android Solutions into an Android Studio project, add
the following into the project's Gradle dependencies:
```
dependencies {
// MediaPipe solution-core is the foundation of any MediaPipe solutions.
implementation 'com.google.mediapipe:solution-core:latest.release'
// Optional: MediaPipe Hands solution.
implementation 'com.google.mediapipe:hands:latest.release'
// Optional: MediaPipe FaceMesh solution.
implementation 'com.google.mediapipe:facemesh:latest.release'
// MediaPipe deps
implementation 'com.google.flogger:flogger:latest.release'
implementation 'com.google.flogger:flogger-system-backend:latest.release'
implementation 'com.google.guava:guava:27.0.1-android'
implementation 'com.google.protobuf:protobuf-java:3.11.4'
// CameraX core library
def camerax_version = "1.0.0-beta10"
implementation "androidx.camera:camera-core:$camerax_version"
implementation "androidx.camera:camera-camera2:$camerax_version"
implementation "androidx.camera:camera-lifecycle:$camerax_version"
}
```
See the detailed solutions API usage examples for different use cases in the
solution example apps'
[source code](https://github.com/google/mediapipe/tree/master/mediapipe/examples/android/solutions).
If the prebuilt Maven packages are not sufficient, build the MediaPipe
Android archive library locally by following these
[instructions](./android_archive_library.md).
## Build solution example apps in Android Studio
1. Open Android Studio Arctic Fox on Linux, macOS, or Windows.
2. Import the mediapipe/examples/android/solutions directory into Android Studio.
![Screenshot](../images/import_mp_android_studio_project.png)
3. For Windows users, run `create_win_symlinks.bat` as administrator to create
res directory symlinks.
![Screenshot](../images/run_create_win_symlinks.png)
4. Select "File" -> "Sync Project with Gradle Files" to sync project.
5. Run solution example app in Android Studio.
![Screenshot](../images/run_android_solution_app.png)
6. (Optional) Run solutions on CPU.
MediaPipe solution example apps run the pipeline and the model inference on
GPU by default. If needed, for example to run the apps on the Android
Emulator, set the `RUN_ON_GPU` boolean variable to `false` in the app's
`MainActivity.java` to run the pipeline and the model inference on CPU.


@ -31,8 +31,8 @@ stream on an Android device.
## Setup
1. Install MediaPipe on your system, see [MediaPipe installation guide] for
details.
1. Install MediaPipe on your system, see
[MediaPipe installation guide](./install.md) for details.
2. Install Android Development SDK and Android NDK. See how to do so also in
[MediaPipe installation guide].
3. Enable [developer options] on your Android device.
@ -770,7 +770,6 @@ If you ran into any issues, please see the full code of the tutorial
[`ExternalTextureConverter`]:https://github.com/google/mediapipe/tree/master/mediapipe/java/com/google/mediapipe/components/ExternalTextureConverter.java
[`FrameLayout`]:https://developer.android.com/reference/android/widget/FrameLayout
[`FrameProcessor`]:https://github.com/google/mediapipe/tree/master/mediapipe/java/com/google/mediapipe/components/FrameProcessor.java
[MediaPipe installation guide]:./install.md
[`PermissionHelper`]: https://github.com/google/mediapipe/tree/master/mediapipe/java/com/google/mediapipe/components/PermissionHelper.java
[`SurfaceHolder.Callback`]:https://developer.android.com/reference/android/view/SurfaceHolder.Callback.html
[`SurfaceView`]:https://developer.android.com/reference/android/view/SurfaceView


@ -31,8 +31,8 @@ stream on an iOS device.
## Setup
1. Install MediaPipe on your system, see [MediaPipe installation guide] for
details.
1. Install MediaPipe on your system, see
[MediaPipe installation guide](./install.md) for details.
2. Setup your iOS device for development.
3. Setup [Bazel] on your system to build and deploy the iOS app.
@ -113,6 +113,10 @@ bazel to build the iOS application. The content of the
5. `Main.storyboard` and `Launch.storyboard`
6. `Assets.xcassets` directory.
Note: In newer versions of Xcode, you may see additional files `SceneDelegate.h`
and `SceneDelegate.m`. Make sure to copy them too and add them to the `BUILD`
file mentioned below.
Copy these files into a directory named `HelloWorld` at a location that can access
the MediaPipe source code. For example, the source code of the application that
we will build in this tutorial is located in
@ -247,6 +251,12 @@ We need to get frames from the `_cameraSource` into our application
`MPPInputSourceDelegate`. So our application `ViewController` can be a delegate
of `_cameraSource`.
Update the interface definition of `ViewController` accordingly:
```
@interface ViewController () <MPPInputSourceDelegate>
```
To handle camera setup and process incoming frames, we should use a queue
different from the main queue. Add the following to the implementation block of
the `ViewController`:
@ -288,6 +298,12 @@ utility called `MPPLayerRenderer` to display images on the screen. This utility
can be used to display `CVPixelBufferRef` objects, which is the type of the
images provided by `MPPCameraInputSource` to its delegates.
In `ViewController.m`, add the following import line:
```
#import "mediapipe/objc/MPPLayerRenderer.h"
```
To display images on the screen, we need to add a new `UIView` object called
`_liveView` to the `ViewController`.
@ -411,6 +427,12 @@ Objective-C++.
### Use the graph in `ViewController`
In `ViewController.m`, add the following import line:
```
#import "mediapipe/objc/MPPGraph.h"
```
Declare a static constant with the name of the graph, the input stream and the
output stream:
@ -549,6 +571,12 @@ method to receive packets on this output stream and display them on the screen:
}
```
Update the interface definition of `ViewController` with `MPPGraphDelegate`:
```
@interface ViewController () <MPPGraphDelegate, MPPInputSourceDelegate>
```
And that is all! Build and run the app on your iOS device. You should see the
results of running the edge detection graph on a live video feed. Congrats!
@ -560,6 +588,5 @@ appropriate `BUILD` file dependencies for the edge detection graph.
[Bazel]:https://bazel.build/
[`edge_detection_mobile_gpu.pbtxt`]:https://github.com/google/mediapipe/tree/master/mediapipe/graphs/edge_detection/edge_detection_mobile_gpu.pbtxt
[MediaPipe installation guide]:./install.md
[common]:(https://github.com/google/mediapipe/tree/master/mediapipe/examples/ios/common)
[helloworld]:(https://github.com/google/mediapipe/tree/master/mediapipe/examples/ios/helloworld)
[common]:https://github.com/google/mediapipe/tree/master/mediapipe/examples/ios/common
[helloworld]:https://github.com/google/mediapipe/tree/master/mediapipe/examples/ios/helloworld


@ -43,104 +43,189 @@ install --user six`.
3. Install OpenCV and FFmpeg.
Option 1. Use package manager tool to install the pre-compiled OpenCV
libraries. FFmpeg will be installed via libopencv-video-dev.
**Option 1**. Use package manager tool to install the pre-compiled OpenCV
libraries. FFmpeg will be installed via `libopencv-video-dev`.
Note: Debian 9 and Ubuntu 16.04 provide OpenCV 2.4.9. You may want to take
option 2 or 3 to install OpenCV 3 or above.
OS | OpenCV
-------------------- | ------
Debian 9 (stretch) | 2.4
Debian 10 (buster) | 3.2
Debian 11 (bullseye) | 4.5
Ubuntu 16.04 LTS | 2.4
Ubuntu 18.04 LTS | 3.2
Ubuntu 20.04 LTS | 4.2
Ubuntu 20.04 LTS | 4.2
Ubuntu 21.04 | 4.5
```bash
$ sudo apt-get install libopencv-core-dev libopencv-highgui-dev \
libopencv-calib3d-dev libopencv-features2d-dev \
libopencv-imgproc-dev libopencv-video-dev
$ sudo apt-get install -y \
libopencv-core-dev \
libopencv-highgui-dev \
libopencv-calib3d-dev \
libopencv-features2d-dev \
libopencv-imgproc-dev \
libopencv-video-dev
```
Debian 9 and Ubuntu 18.04 install the packages in
`/usr/lib/x86_64-linux-gnu`. MediaPipe's [`opencv_linux.BUILD`] and
[`ffmpeg_linux.BUILD`] are configured for this library path. Ubuntu 20.04
may install the OpenCV and FFmpeg packages in `/usr/local`. Please follow
option 3 below to modify the [`WORKSPACE`], [`opencv_linux.BUILD`] and
[`ffmpeg_linux.BUILD`] files accordingly.
Moreover, for Nvidia Jetson and Raspberry Pi devices with ARM Ubuntu, the
library path needs to be modified like the following:
MediaPipe's [`opencv_linux.BUILD`] and [`WORKSPACE`] are already configured
for OpenCV 2/3 and should work correctly on any architecture:
```bash
sed -i "s/x86_64-linux-gnu/aarch64-linux-gnu/g" third_party/opencv_linux.BUILD
# WORKSPACE
new_local_repository(
name = "linux_opencv",
build_file = "@//third_party:opencv_linux.BUILD",
path = "/usr",
)
# opencv_linux.BUILD for OpenCV 2/3 installed from Debian package
cc_library(
name = "opencv",
linkopts = [
"-l:libopencv_core.so",
"-l:libopencv_calib3d.so",
"-l:libopencv_features2d.so",
"-l:libopencv_highgui.so",
"-l:libopencv_imgcodecs.so",
"-l:libopencv_imgproc.so",
"-l:libopencv_video.so",
"-l:libopencv_videoio.so",
],
)
```
Option 2. Run [`setup_opencv.sh`] to automatically build OpenCV from source
and modify MediaPipe's OpenCV config.
For OpenCV 4 you need to modify [`opencv_linux.BUILD`] taking into account
current architecture:
Option 3. Follow OpenCV's
```bash
# WORKSPACE
new_local_repository(
name = "linux_opencv",
build_file = "@//third_party:opencv_linux.BUILD",
path = "/usr",
)
# opencv_linux.BUILD for OpenCV 4 installed from Debian package
cc_library(
name = "opencv",
hdrs = glob([
# Uncomment according to your multiarch value (gcc -print-multiarch):
# "include/aarch64-linux-gnu/opencv4/opencv2/cvconfig.h",
# "include/arm-linux-gnueabihf/opencv4/opencv2/cvconfig.h",
# "include/x86_64-linux-gnu/opencv4/opencv2/cvconfig.h",
"include/opencv4/opencv2/**/*.h*",
]),
includes = [
# Uncomment according to your multiarch value (gcc -print-multiarch):
# "include/aarch64-linux-gnu/opencv4/",
# "include/arm-linux-gnueabihf/opencv4/",
# "include/x86_64-linux-gnu/opencv4/",
"include/opencv4/",
],
linkopts = [
"-l:libopencv_core.so",
"-l:libopencv_calib3d.so",
"-l:libopencv_features2d.so",
"-l:libopencv_highgui.so",
"-l:libopencv_imgcodecs.so",
"-l:libopencv_imgproc.so",
"-l:libopencv_video.so",
"-l:libopencv_videoio.so",
],
)
```
**Option 2**. Run [`setup_opencv.sh`] to automatically build OpenCV from
source and modify MediaPipe's OpenCV config. This option will do all steps
defined in Option 3 automatically.
**Option 3**. Follow OpenCV's
[documentation](https://docs.opencv.org/3.4.6/d7/d9f/tutorial_linux_install.html)
to manually build OpenCV from source code.
Note: You may need to modify [`WORKSPACE`], [`opencv_linux.BUILD`] and
[`ffmpeg_linux.BUILD`] to point MediaPipe to your own OpenCV and FFmpeg
libraries. For example if OpenCV and FFmpeg are both manually installed in
"/usr/local/", you will need to update: (1) the "linux_opencv" and
"linux_ffmpeg" new_local_repository rules in [`WORKSPACE`], (2) the "opencv"
cc_library rule in [`opencv_linux.BUILD`], and (3) the "libffmpeg"
cc_library rule in [`ffmpeg_linux.BUILD`]. These 3 changes are shown below:
You may need to modify [`WORKSPACE`] and [`opencv_linux.BUILD`] to point
MediaPipe to your own OpenCV libraries. The examples below assume OpenCV is
installed to `/usr/local/`, which is OpenCV's default install prefix.
OpenCV 2/3 setup:
```bash
# WORKSPACE
new_local_repository(
name = "linux_opencv",
build_file = "@//third_party:opencv_linux.BUILD",
path = "/usr/local",
)
# opencv_linux.BUILD for OpenCV 2/3 installed to /usr/local
cc_library(
name = "opencv",
linkopts = [
"-L/usr/local/lib",
"-l:libopencv_core.so",
"-l:libopencv_calib3d.so",
"-l:libopencv_features2d.so",
"-l:libopencv_highgui.so",
"-l:libopencv_imgcodecs.so",
"-l:libopencv_imgproc.so",
"-l:libopencv_video.so",
"-l:libopencv_videoio.so",
],
)
```
OpenCV 4 setup:
```bash
# WORKSPACE
new_local_repository(
name = "linux_ffmpeg",
build_file = "@//third_party:ffmpeg_linux.BUILD",
name = "linux_opencv",
build_file = "@//third_party:opencv_linux.BUILD",
path = "/usr/local",
)
# opencv_linux.BUILD for OpenCV 4 installed to /usr/local
cc_library(
name = "opencv",
srcs = glob(
[
"lib/libopencv_core.so",
"lib/libopencv_highgui.so",
"lib/libopencv_imgcodecs.so",
"lib/libopencv_imgproc.so",
"lib/libopencv_video.so",
"lib/libopencv_videoio.so",
],
),
hdrs = glob([
# For OpenCV 3.x
"include/opencv2/**/*.h*",
# For OpenCV 4.x
# "include/opencv4/opencv2/**/*.h*",
"include/opencv4/opencv2/**/*.h*",
]),
includes = [
# For OpenCV 3.x
"include/",
# For OpenCV 4.x
# "include/opencv4/",
"include/opencv4/",
],
linkstatic = 1,
visibility = ["//visibility:public"],
linkopts = [
"-L/usr/local/lib",
"-l:libopencv_core.so",
"-l:libopencv_calib3d.so",
"-l:libopencv_features2d.so",
"-l:libopencv_highgui.so",
"-l:libopencv_imgcodecs.so",
"-l:libopencv_imgproc.so",
"-l:libopencv_video.so",
"-l:libopencv_videoio.so",
],
)
```
Current FFmpeg setup is defined in [`ffmpeg_linux.BUILD`] and should work
for any architecture:
```bash
# WORKSPACE
new_local_repository(
name = "linux_ffmpeg",
build_file = "@//third_party:ffmpeg_linux.BUILD",
path = "/usr"
)
# ffmpeg_linux.BUILD for FFmpeg installed from Debian package
cc_library(
name = "libffmpeg",
srcs = glob(
[
"lib/libav*.so",
],
),
hdrs = glob(["include/libav*/*.h"]),
includes = ["include"],
linkopts = [
"-lavcodec",
"-lavformat",
"-lavutil",
"-l:libavcodec.so",
"-l:libavformat.so",
"-l:libavutil.so",
],
linkstatic = 1,
visibility = ["//visibility:public"],
)
```
@ -711,7 +796,7 @@ This will use a Docker image that will isolate mediapipe's installation from the
```bash
$ docker run -it --name mediapipe mediapipe:latest
root@bca08b91ff63:/mediapipe# GLOG_logtostderr=1 bazel run --define MEDIAPIPE_DISABLE_GPU=1 mediapipe/examples/desktop/hello_world:hello_world
root@bca08b91ff63:/mediapipe# GLOG_logtostderr=1 bazelisk run --define MEDIAPIPE_DISABLE_GPU=1 mediapipe/examples/desktop/hello_world:hello_world
# Should print:
# Hello World!


@ -17,16 +17,28 @@ nav_order: 4
MediaPipe currently offers the following solutions:
Solution | NPM Package | Example
----------------- | ----------------------------- | -------
--------------------------- | --------------------------------------- | -------
[Face Mesh][F-pg] | [@mediapipe/face_mesh][F-npm] | [mediapipe.dev/demo/face_mesh][F-demo]
[Face Detection][Fd-pg] | [@mediapipe/face_detection][Fd-npm] | [mediapipe.dev/demo/face_detection][Fd-demo]
[Hands][H-pg] | [@mediapipe/hands][H-npm] | [mediapipe.dev/demo/hands][H-demo]
[Holistic][Ho-pg] | [@mediapipe/holistic][Ho-npm] | [mediapipe.dev/demo/holistic][Ho-demo]
[Objectron][Ob-pg] | [@mediapipe/objectron][Ob-npm] | [mediapipe.dev/demo/objectron][Ob-demo]
[Pose][P-pg] | [@mediapipe/pose][P-npm] | [mediapipe.dev/demo/pose][P-demo]
[Selfie Segmentation][S-pg] | [@mediapipe/selfie_segmentation][S-npm] | [mediapipe.dev/demo/selfie_segmentation][S-demo]
Click on a solution link above for more information, including API and code
snippets.
### Supported platforms:
| Browser | Platform                | Notes                                               |
| ------- | ----------------------- | --------------------------------------------------- |
| Chrome  | Android / Windows / Mac | Pixel 4 and older unsupported. Fuchsia unsupported. |
| Chrome  | iOS                     | Camera unavailable in Chrome on iOS.                |
| Safari  | iPad / iPhone / Mac     | iOS and Safari on iPad / iPhone / MacBook.          |
The quickest way to get acclimated is to look at the examples above. Each demo
has a link to a [CodePen][codepen] so that you can edit the code and try it
yourself. We have included a number of utility packages to help you get started:
@ -66,29 +78,25 @@ affecting your work, restrict your request to a `<minor>` number. e.g.,
[F-pg]: ../solutions/face_mesh#javascript-solution-api
[Fd-pg]: ../solutions/face_detection#javascript-solution-api
[H-pg]: ../solutions/hands#javascript-solution-api
[Ob-pg]: ../solutions/objectron#javascript-solution-api
[P-pg]: ../solutions/pose#javascript-solution-api
[S-pg]: ../solutions/selfie_segmentation#javascript-solution-api
[Ho-npm]: https://www.npmjs.com/package/@mediapipe/holistic
[F-npm]: https://www.npmjs.com/package/@mediapipe/face_mesh
[Fd-npm]: https://www.npmjs.com/package/@mediapipe/face_detection
[H-npm]: https://www.npmjs.com/package/@mediapipe/hands
[Ob-npm]: https://www.npmjs.com/package/@mediapipe/objectron
[P-npm]: https://www.npmjs.com/package/@mediapipe/pose
[S-npm]: https://www.npmjs.com/package/@mediapipe/selfie_segmentation
[draw-npm]: https://www.npmjs.com/package/@mediapipe/drawing_utils
[cam-npm]: https://www.npmjs.com/package/@mediapipe/camera_utils
[ctrl-npm]: https://www.npmjs.com/package/@mediapipe/control_utils
[Ho-jsd]: https://www.jsdelivr.com/package/npm/@mediapipe/holistic
[F-jsd]: https://www.jsdelivr.com/package/npm/@mediapipe/face_mesh
[Fd-jsd]: https://www.jsdelivr.com/package/npm/@mediapipe/face_detection
[H-jsd]: https://www.jsdelivr.com/package/npm/@mediapipe/hands
[P-jsd]: https://www.jsdelivr.com/package/npm/@mediapipe/pose
[Ho-pen]: https://code.mediapipe.dev/codepen/holistic
[F-pen]: https://code.mediapipe.dev/codepen/face_mesh
[Fd-pen]: https://code.mediapipe.dev/codepen/face_detection
[H-pen]: https://code.mediapipe.dev/codepen/hands
[P-pen]: https://code.mediapipe.dev/codepen/pose
[Ho-demo]: https://mediapipe.dev/demo/holistic
[F-demo]: https://mediapipe.dev/demo/face_mesh
[Fd-demo]: https://mediapipe.dev/demo/face_detection
[H-demo]: https://mediapipe.dev/demo/hands
[Ob-demo]: https://mediapipe.dev/demo/objectron
[P-demo]: https://mediapipe.dev/demo/pose
[S-demo]: https://mediapipe.dev/demo/selfie_segmentation
[npm]: https://www.npmjs.com/package/@mediapipe
[codepen]: https://code.mediapipe.dev/codepen


@ -51,6 +51,7 @@ details in each solution via the links below:
* [MediaPipe Holistic](../solutions/holistic#python-solution-api)
* [MediaPipe Objectron](../solutions/objectron#python-solution-api)
* [MediaPipe Pose](../solutions/pose#python-solution-api)
* [MediaPipe Selfie Segmentation](../solutions/selfie_segmentation#python-solution-api)
## MediaPipe on Google Colab
@ -62,6 +63,7 @@ details in each solution via the links below:
* [MediaPipe Pose Colab](https://mediapipe.page.link/pose_py_colab)
* [MediaPipe Pose Classification Colab (Basic)](https://mediapipe.page.link/pose_classification_basic)
* [MediaPipe Pose Classification Colab (Extended)](https://mediapipe.page.link/pose_classification_extended)
* [MediaPipe Selfie Segmentation Colab](https://mediapipe.page.link/selfie_segmentation_py_colab)
## MediaPipe Python Framework


@ -74,7 +74,7 @@ Mapping\[str, Packet\] | std::map<std::string, Packet> | create_st
np.ndarray<br>(cv.mat and PIL.Image) | mp::ImageFrame | create_image_frame(<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;format=ImageFormat.SRGB,<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;data=mat) | get_image_frame(packet)
np.ndarray | mp::Matrix | create_matrix(data) | get_matrix(packet)
Google Proto Message | Google Proto Message | create_proto(proto) | get_proto(packet)
List\[Proto\] | std::vector\<Proto\> | create_proto_vector(proto_list) | get_proto_list(packet)
List\[Proto\] | std::vector\<Proto\> | n/a | get_proto_list(packet)
It's not uncommon that users create custom C++ classes and send those into
the graphs and calculators. To allow the custom classes to be used in Python

Binary image files changed (not shown): several new images were added (128 KiB, 258 KiB, and 51 KiB), one image was updated (56 KiB → 77 KiB), and a few other binary files changed without size information.


@ -40,11 +40,12 @@ Hair Segmentation
[Hands](https://google.github.io/mediapipe/solutions/hands) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Pose](https://google.github.io/mediapipe/solutions/pose) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Holistic](https://google.github.io/mediapipe/solutions/holistic) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Selfie Segmentation](https://google.github.io/mediapipe/solutions/selfie_segmentation) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Hair Segmentation](https://google.github.io/mediapipe/solutions/hair_segmentation) | ✅ | | ✅ | | |
[Object Detection](https://google.github.io/mediapipe/solutions/object_detection) | ✅ | ✅ | ✅ | | | ✅
[Box Tracking](https://google.github.io/mediapipe/solutions/box_tracking) | ✅ | ✅ | ✅ | | |
[Instant Motion Tracking](https://google.github.io/mediapipe/solutions/instant_motion_tracking) | ✅ | | | | |
[Objectron](https://google.github.io/mediapipe/solutions/objectron) | ✅ | | ✅ | ✅ | |
[Objectron](https://google.github.io/mediapipe/solutions/objectron) | ✅ | | ✅ | ✅ | |
[KNIFT](https://google.github.io/mediapipe/solutions/knift) | ✅ | | | | |
[AutoFlip](https://google.github.io/mediapipe/solutions/autoflip) | | | ✅ | | |
[MediaSequence](https://google.github.io/mediapipe/solutions/media_sequence) | | | ✅ | | |
@ -54,46 +55,22 @@ See also
[MediaPipe Models and Model Cards](https://google.github.io/mediapipe/solutions/models)
for ML models released in MediaPipe.
## MediaPipe in Python
MediaPipe offers customizable Python solutions as a prebuilt Python package on
[PyPI](https://pypi.org/project/mediapipe/), which can be installed simply with
`pip install mediapipe`. It also provides tools for users to build their own
solutions. Please see
[MediaPipe in Python](https://google.github.io/mediapipe/getting_started/python)
for more info.
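As a quick illustration (a sketch; the image path below is a placeholder), a solution can be run in just a few lines once the package is installed:

```python
import cv2
import mediapipe as mp

# Run hand tracking on a single image (replace the path with your own file).
with mp.solutions.hands.Hands(static_image_mode=True) as hands:
  image = cv2.imread('hand.jpg')
  results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
  print(results.multi_hand_landmarks)
```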
## MediaPipe on the Web
MediaPipe on the Web is an effort to run the same ML solutions built for mobile
and desktop also in web browsers. The official API is under construction, but
the core technology has been proven effective. Please see
[MediaPipe on the Web](https://developers.googleblog.com/2020/01/mediapipe-on-web.html)
in Google Developers Blog for details.
You can use the following links to load a demo in the MediaPipe Visualizer, then
click the "Runner" icon in the top bar as shown below. The demos use your webcam
video as input, which is processed entirely locally in real time and never
leaves your device.
![visualizer_runner](images/visualizer_runner.png)
* [MediaPipe Face Detection](https://viz.mediapipe.dev/demo/face_detection)
* [MediaPipe Iris](https://viz.mediapipe.dev/demo/iris_tracking)
* [MediaPipe Iris: Depth-from-Iris](https://viz.mediapipe.dev/demo/iris_depth)
* [MediaPipe Hands](https://viz.mediapipe.dev/demo/hand_tracking)
* [MediaPipe Hands (palm/hand detection only)](https://viz.mediapipe.dev/demo/hand_detection)
* [MediaPipe Pose](https://viz.mediapipe.dev/demo/pose_tracking)
* [MediaPipe Hair Segmentation](https://viz.mediapipe.dev/demo/hair_segmentation)
## Getting started
Learn how to [install](https://google.github.io/mediapipe/getting_started/install)
MediaPipe and
[build example applications](https://google.github.io/mediapipe/getting_started/building_examples),
and start exploring our ready-to-use
[solutions](https://google.github.io/mediapipe/solutions/solutions) that you can
further extend and customize.
To start using MediaPipe
[solutions](https://google.github.io/mediapipe/solutions/solutions) with only a few
lines of code, see example code and demos in
[MediaPipe in Python](https://google.github.io/mediapipe/getting_started/python) and
[MediaPipe in JavaScript](https://google.github.io/mediapipe/getting_started/javascript).
To use MediaPipe in C++, Android and iOS, which allow further customization of
the [solutions](https://google.github.io/mediapipe/solutions/solutions) as well as
building your own, learn how to
[install](https://google.github.io/mediapipe/getting_started/install) MediaPipe and
start building example applications in
[C++](https://google.github.io/mediapipe/getting_started/cpp),
[Android](https://google.github.io/mediapipe/getting_started/android) and
[iOS](https://google.github.io/mediapipe/getting_started/ios).
The source code is hosted in the
[MediaPipe Github repository](https://github.com/google/mediapipe), and you can
@ -102,6 +79,13 @@ run code search using
## Publications
* [Bringing artworks to life with AR](https://developers.googleblog.com/2021/07/bringing-artworks-to-life-with-ar.html)
in Google Developers Blog
* [Prosthesis control via Mirru App using MediaPipe hand tracking](https://developers.googleblog.com/2021/05/control-your-mirru-prosthesis-with-mediapipe-hand-tracking.html)
in Google Developers Blog
* [SignAll SDK: Sign language interface using MediaPipe is now available for
developers](https://developers.googleblog.com/2021/04/signall-sdk-sign-language-interface-using-mediapipe-now-available.html)
in Google Developers Blog
* [MediaPipe Holistic - Simultaneous Face, Hand and Pose Prediction, on Device](https://ai.googleblog.com/2020/12/mediapipe-holistic-simultaneous-face.html)
in Google AI Blog
* [Background Features in Google Meet, Powered by Web ML](https://ai.googleblog.com/2020/10/background-features-in-google-meet.html)

View File

@ -2,7 +2,7 @@
layout: default
title: AutoFlip (Saliency-aware Video Cropping)
parent: Solutions
nav_order: 13
nav_order: 14
---
# AutoFlip: Saliency-aware Video Cropping

View File

@ -2,7 +2,7 @@
layout: default
title: Box Tracking
parent: Solutions
nav_order: 9
nav_order: 10
---
# MediaPipe Box Tracking

View File

@ -45,6 +45,15 @@ section.
Naming style and availability may differ slightly across platforms/languages.
#### model_selection
An integer index `0` or `1`. Use `0` to select a short-range model that works
best for faces within 2 meters from the camera, and `1` for a full-range model
best for faces within 5 meters. For the full-range option, a sparse model is
used for its improved inference speed. Please refer to the
[model cards](./models.md#face_detection) for details. Default to `0` if not
specified.
#### min_detection_confidence
Minimum confidence value (`[0.0, 1.0]`) from the face detection model for the
@ -68,10 +77,11 @@ normalized to `[0.0, 1.0]` by the image width and height respectively.
Please first follow general [instructions](../getting_started/python.md) to
install MediaPipe Python package, then learn more in the companion
[Python Colab](#resources) and the following usage example.
[Python Colab](#resources) and the usage example below.
Supported configuration options:
* [model_selection](#model_selection)
* [min_detection_confidence](#min_detection_confidence)
```python
@ -81,9 +91,10 @@ mp_face_detection = mp.solutions.face_detection
mp_drawing = mp.solutions.drawing_utils
# For static images:
IMAGE_FILES = []
with mp_face_detection.FaceDetection(
min_detection_confidence=0.5) as face_detection:
for idx, file in enumerate(file_list):
model_selection=1, min_detection_confidence=0.5) as face_detection:
for idx, file in enumerate(IMAGE_FILES):
image = cv2.imread(file)
# Convert the BGR image to RGB and process it with MediaPipe Face Detection.
results = face_detection.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
@ -102,7 +113,7 @@ with mp_face_detection.FaceDetection(
# For webcam input:
cap = cv2.VideoCapture(0)
with mp_face_detection.FaceDetection(
min_detection_confidence=0.5) as face_detection:
model_selection=0, min_detection_confidence=0.5) as face_detection:
while cap.isOpened():
success, image = cap.read()
if not success:
@ -138,6 +149,7 @@ and the following usage example.
Supported configuration options:
* [modelSelection](#model_selection)
* [minDetectionConfidence](#min_detection_confidence)
```html
@ -188,6 +200,7 @@ const faceDetection = new FaceDetection({locateFile: (file) => {
return `https://cdn.jsdelivr.net/npm/@mediapipe/face_detection@0.0/${file}`;
}});
faceDetection.setOptions({
modelSelection: 0,
minDetectionConfidence: 0.5
});
faceDetection.onResults(onResults);
@ -254,10 +267,6 @@ same configuration as the GPU pipeline, runs entirely on CPU.
* Target:
[`mediapipe/examples/desktop/face_detection:face_detection_gpu`](https://github.com/google/mediapipe/tree/master/mediapipe/examples/desktop/face_detection/BUILD)
### Web
Please refer to [these instructions](../index.md#mediapipe-on-the-web).
### Coral
Please refer to

View File

@ -69,7 +69,7 @@ and renders using a dedicated
The
[face landmark subgraph](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_landmark/face_landmark_front_gpu.pbtxt)
internally uses a
[face_detection_subgraph](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_front_gpu.pbtxt)
[face_detection_subgraph](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_short_range_gpu.pbtxt)
from the
[face detection module](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection).
@ -265,7 +265,7 @@ magnitude of `z` uses roughly the same scale as `x`.
Please first follow general [instructions](../getting_started/python.md) to
install MediaPipe Python package, then learn more in the companion
[Python Colab](#resources) and the following usage example.
[Python Colab](#resources) and the usage example below.
Supported configuration options:
@ -278,15 +278,17 @@ Supported configuration options:
import cv2
import mediapipe as mp
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_face_mesh = mp.solutions.face_mesh
# For static images:
IMAGE_FILES = []
drawing_spec = mp_drawing.DrawingSpec(thickness=1, circle_radius=1)
with mp_face_mesh.FaceMesh(
static_image_mode=True,
max_num_faces=1,
min_detection_confidence=0.5) as face_mesh:
for idx, file in enumerate(file_list):
for idx, file in enumerate(IMAGE_FILES):
image = cv2.imread(file)
# Convert the BGR image to RGB before processing.
results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
@ -300,9 +302,17 @@ with mp_face_mesh.FaceMesh(
mp_drawing.draw_landmarks(
image=annotated_image,
landmark_list=face_landmarks,
connections=mp_face_mesh.FACE_CONNECTIONS,
landmark_drawing_spec=drawing_spec,
connection_drawing_spec=drawing_spec)
connections=mp_face_mesh.FACEMESH_TESSELATION,
landmark_drawing_spec=None,
connection_drawing_spec=mp_drawing_styles
.get_default_face_mesh_tesselation_style())
mp_drawing.draw_landmarks(
image=annotated_image,
landmark_list=face_landmarks,
connections=mp_face_mesh.FACEMESH_CONTOURS,
landmark_drawing_spec=None,
connection_drawing_spec=mp_drawing_styles
.get_default_face_mesh_contours_style())
cv2.imwrite('/tmp/annotated_image' + str(idx) + '.png', annotated_image)
# For webcam input:
@ -334,9 +344,17 @@ with mp_face_mesh.FaceMesh(
mp_drawing.draw_landmarks(
image=image,
landmark_list=face_landmarks,
connections=mp_face_mesh.FACE_CONNECTIONS,
landmark_drawing_spec=drawing_spec,
connection_drawing_spec=drawing_spec)
connections=mp_face_mesh.FACEMESH_TESSELATION,
landmark_drawing_spec=None,
connection_drawing_spec=mp_drawing_styles
.get_default_face_mesh_tesselation_style())
mp_drawing.draw_landmarks(
image=image,
landmark_list=face_landmarks,
connections=mp_face_mesh.FACEMESH_CONTOURS,
landmark_drawing_spec=None,
connection_drawing_spec=mp_drawing_styles
.get_default_face_mesh_contours_style())
cv2.imshow('MediaPipe FaceMesh', image)
if cv2.waitKey(5) & 0xFF == 27:
break
@ -422,6 +440,200 @@ camera.start();
</script>
```
### Android Solution API
Please first follow general
[instructions](../getting_started/android_solutions.md#integrate-mediapipe-android-solutions-api)
to add MediaPipe Gradle dependencies, then try the FaceMesh solution API in the
companion
[example Android Studio project](https://github.com/google/mediapipe/tree/master/mediapipe/examples/android/solutions/facemesh)
following
[these instructions](../getting_started/android_solutions.md#build-solution-example-apps-in-android-studio)
and learn more in the usage example below.
Supported configuration options:
* [staticImageMode](#static_image_mode)
* [maxNumFaces](#max_num_faces)
* runOnGpu: Run the pipeline and the model inference on GPU or CPU.
#### Camera Input
```java
// For camera input and result rendering with OpenGL.
FaceMeshOptions faceMeshOptions =
FaceMeshOptions.builder()
.setMode(FaceMeshOptions.STREAMING_MODE) // API soon to become
.setMaxNumFaces(1) // setStaticImageMode(false)
.setRunOnGpu(true).build();
FaceMesh facemesh = new FaceMesh(this, faceMeshOptions);
facemesh.setErrorListener(
(message, e) -> Log.e(TAG, "MediaPipe FaceMesh error:" + message));
// Initializes a new CameraInput instance and connects it to MediaPipe FaceMesh.
CameraInput cameraInput = new CameraInput(this);
cameraInput.setNewFrameListener(
textureFrame -> facemesh.send(textureFrame));
// Initializes a new GlSurfaceView with a ResultGlRenderer<FaceMeshResult> instance
// that provides the interfaces to run user-defined OpenGL rendering code.
// See mediapipe/examples/android/solutions/facemesh/src/main/java/com/google/mediapipe/examples/facemesh/FaceMeshResultGlRenderer.java
// as an example.
SolutionGlSurfaceView<FaceMeshResult> glSurfaceView =
new SolutionGlSurfaceView<>(
this, facemesh.getGlContext(), facemesh.getGlMajorVersion());
glSurfaceView.setSolutionResultRenderer(new FaceMeshResultGlRenderer());
glSurfaceView.setRenderInputImage(true);
facemesh.setResultListener(
faceMeshResult -> {
NormalizedLandmark noseLandmark =
faceMeshResult.multiFaceLandmarks().get(0).getLandmarkList().get(1);
Log.i(
TAG,
String.format(
"MediaPipe FaceMesh nose normalized coordinates (value range: [0, 1]): x=%f, y=%f",
noseLandmark.getX(), noseLandmark.getY()));
// Request GL rendering.
glSurfaceView.setRenderData(faceMeshResult);
glSurfaceView.requestRender();
});
// The runnable to start camera after the GLSurfaceView is attached.
glSurfaceView.post(
() ->
cameraInput.start(
this,
facemesh.getGlContext(),
CameraInput.CameraFacing.FRONT,
glSurfaceView.getWidth(),
glSurfaceView.getHeight()));
```
#### Image Input
```java
// For reading images from gallery and drawing the output in an ImageView.
FaceMeshOptions faceMeshOptions =
FaceMeshOptions.builder()
.setMode(FaceMeshOptions.STATIC_IMAGE_MODE) // API soon to become
.setMaxNumFaces(1) // setStaticImageMode(true)
.setRunOnGpu(true).build();
FaceMesh facemesh = new FaceMesh(this, faceMeshOptions);
// Connects MediaPipe FaceMesh to the user-defined ImageView instance that allows
// users to have the custom drawing of the output landmarks on it.
// See mediapipe/examples/android/solutions/facemesh/src/main/java/com/google/mediapipe/examples/facemesh/FaceMeshResultImageView.java
// as an example.
FaceMeshResultImageView imageView = new FaceMeshResultImageView(this);
facemesh.setResultListener(
faceMeshResult -> {
int width = faceMeshResult.inputBitmap().getWidth();
int height = faceMeshResult.inputBitmap().getHeight();
NormalizedLandmark noseLandmark =
faceMeshResult.multiFaceLandmarks().get(0).getLandmarkList().get(1);
Log.i(
TAG,
String.format(
"MediaPipe FaceMesh nose coordinates (pixel values): x=%f, y=%f",
noseLandmark.getX() * width, noseLandmark.getY() * height));
// Request canvas drawing.
imageView.setFaceMeshResult(faceMeshResult);
runOnUiThread(() -> imageView.update());
});
facemesh.setErrorListener(
(message, e) -> Log.e(TAG, "MediaPipe FaceMesh error:" + message));
// ActivityResultLauncher to get an image from the gallery as Bitmap.
ActivityResultLauncher<Intent> imageGetter =
registerForActivityResult(
new ActivityResultContracts.StartActivityForResult(),
result -> {
Intent resultIntent = result.getData();
if (resultIntent != null && result.getResultCode() == RESULT_OK) {
Bitmap bitmap = null;
try {
bitmap =
MediaStore.Images.Media.getBitmap(
this.getContentResolver(), resultIntent.getData());
} catch (IOException e) {
Log.e(TAG, "Bitmap reading error:" + e);
}
if (bitmap != null) {
facemesh.send(bitmap);
}
}
});
Intent gallery = new Intent(
Intent.ACTION_PICK, MediaStore.Images.Media.INTERNAL_CONTENT_URI);
imageGetter.launch(gallery);
```
#### Video Input
```java
// For video input and result rendering with OpenGL.
FaceMeshOptions faceMeshOptions =
FaceMeshOptions.builder()
.setMode(FaceMeshOptions.STREAMING_MODE) // API soon to become
.setMaxNumFaces(1) // setStaticImageMode(false)
.setRunOnGpu(true).build();
FaceMesh facemesh = new FaceMesh(this, faceMeshOptions);
facemesh.setErrorListener(
(message, e) -> Log.e(TAG, "MediaPipe FaceMesh error:" + message));
// Initializes a new VideoInput instance and connects it to MediaPipe FaceMesh.
VideoInput videoInput = new VideoInput(this);
videoInput.setNewFrameListener(
textureFrame -> facemesh.send(textureFrame));
// Initializes a new GlSurfaceView with a ResultGlRenderer<FaceMeshResult> instance
// that provides the interfaces to run user-defined OpenGL rendering code.
// See mediapipe/examples/android/solutions/facemesh/src/main/java/com/google/mediapipe/examples/facemesh/FaceMeshResultGlRenderer.java
// as an example.
SolutionGlSurfaceView<FaceMeshResult> glSurfaceView =
new SolutionGlSurfaceView<>(
this, facemesh.getGlContext(), facemesh.getGlMajorVersion());
glSurfaceView.setSolutionResultRenderer(new FaceMeshResultGlRenderer());
glSurfaceView.setRenderInputImage(true);
facemesh.setResultListener(
faceMeshResult -> {
NormalizedLandmark noseLandmark =
faceMeshResult.multiFaceLandmarks().get(0).getLandmarkList().get(1);
Log.i(
TAG,
String.format(
"MediaPipe FaceMesh nose normalized coordinates (value range: [0, 1]): x=%f, y=%f",
noseLandmark.getX(), noseLandmark.getY()));
// Request GL rendering.
glSurfaceView.setRenderData(faceMeshResult);
glSurfaceView.requestRender();
});
ActivityResultLauncher<Intent> videoGetter =
registerForActivityResult(
new ActivityResultContracts.StartActivityForResult(),
result -> {
Intent resultIntent = result.getData();
if (resultIntent != null) {
if (result.getResultCode() == RESULT_OK) {
glSurfaceView.post(
() ->
videoInput.start(
this,
resultIntent.getData(),
facemesh.getGlContext(),
glSurfaceView.getWidth(),
glSurfaceView.getHeight()));
}
}
});
Intent gallery =
new Intent(Intent.ACTION_PICK, MediaStore.Video.Media.INTERNAL_CONTENT_URI);
videoGetter.launch(gallery);
```
## Example Apps
Please first see general instructions for

View File

@ -2,7 +2,7 @@
layout: default
title: Hair Segmentation
parent: Solutions
nav_order: 7
nav_order: 8
---
# MediaPipe Hair Segmentation
@ -51,7 +51,14 @@ to visualize its associated subgraphs, please see
### Web
Please refer to [these instructions](../index.md#mediapipe-on-the-web).
Use [this link](https://viz.mediapipe.dev/demo/hair_segmentation) to load a demo
in the MediaPipe Visualizer, then click the "Runner" icon in the top bar as
shown below. The demo uses your webcam video as input, which is processed
entirely locally in real time and never leaves your device. Please see
[MediaPipe on the Web](https://developers.googleblog.com/2020/01/mediapipe-on-web.html)
in Google Developers Blog for details.
![visualizer_runner](../images/visualizer_runner.png)
## Resources

View File

@ -206,7 +206,7 @@ is not the case, please swap the handedness output in the application.
Please first follow general [instructions](../getting_started/python.md) to
install MediaPipe Python package, then learn more in the companion
[Python Colab](#resources) and the following usage example.
[Python Colab](#resources) and the usage example below.
Supported configuration options:
@ -219,14 +219,16 @@ Supported configuration options:
import cv2
import mediapipe as mp
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands
# For static images:
IMAGE_FILES = []
with mp_hands.Hands(
static_image_mode=True,
max_num_hands=2,
min_detection_confidence=0.5) as hands:
for idx, file in enumerate(file_list):
for idx, file in enumerate(IMAGE_FILES):
# Read an image, flip it around y-axis for correct handedness output (see
# above).
image = cv2.flip(cv2.imread(file), 1)
@ -247,7 +249,11 @@ with mp_hands.Hands(
f'{hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].y * image_height})'
)
mp_drawing.draw_landmarks(
annotated_image, hand_landmarks, mp_hands.HAND_CONNECTIONS)
annotated_image,
hand_landmarks,
mp_hands.HAND_CONNECTIONS,
mp_drawing_styles.get_default_hand_landmarks_style(),
mp_drawing_styles.get_default_hand_connections_style())
cv2.imwrite(
'/tmp/annotated_image' + str(idx) + '.png', cv2.flip(annotated_image, 1))
@ -277,7 +283,11 @@ with mp_hands.Hands(
if results.multi_hand_landmarks:
for hand_landmarks in results.multi_hand_landmarks:
mp_drawing.draw_landmarks(
image, hand_landmarks, mp_hands.HAND_CONNECTIONS)
image,
hand_landmarks,
mp_hands.HAND_CONNECTIONS,
mp_drawing_styles.get_default_hand_landmarks_style(),
mp_drawing_styles.get_default_hand_connections_style())
cv2.imshow('MediaPipe Hands', image)
if cv2.waitKey(5) & 0xFF == 27:
break
@ -358,6 +368,200 @@ camera.start();
</script>
```
### Android Solution API
Please first follow general
[instructions](../getting_started/android_solutions.md#integrate-mediapipe-android-solutions-api)
to add MediaPipe Gradle dependencies, then try the Hands solution API in the
companion
[example Android Studio project](https://github.com/google/mediapipe/tree/master/mediapipe/examples/android/solutions/hands)
following
[these instructions](../getting_started/android_solutions.md#build-solution-example-apps-in-android-studio)
and learn more in the usage example below.
Supported configuration options:
* [staticImageMode](#static_image_mode)
* [maxNumHands](#max_num_hands)
* runOnGpu: Run the pipeline and the model inference on GPU or CPU.
#### Camera Input
```java
// For camera input and result rendering with OpenGL.
HandsOptions handsOptions =
HandsOptions.builder()
.setMode(HandsOptions.STREAMING_MODE) // API soon to become
.setMaxNumHands(1) // setStaticImageMode(false)
.setRunOnGpu(true).build();
Hands hands = new Hands(this, handsOptions);
hands.setErrorListener(
(message, e) -> Log.e(TAG, "MediaPipe Hands error:" + message));
// Initializes a new CameraInput instance and connects it to MediaPipe Hands.
CameraInput cameraInput = new CameraInput(this);
cameraInput.setNewFrameListener(
textureFrame -> hands.send(textureFrame));
// Initializes a new GlSurfaceView with a ResultGlRenderer<HandsResult> instance
// that provides the interfaces to run user-defined OpenGL rendering code.
// See mediapipe/examples/android/solutions/hands/src/main/java/com/google/mediapipe/examples/hands/HandsResultGlRenderer.java
// as an example.
SolutionGlSurfaceView<HandsResult> glSurfaceView =
new SolutionGlSurfaceView<>(
this, hands.getGlContext(), hands.getGlMajorVersion());
glSurfaceView.setSolutionResultRenderer(new HandsResultGlRenderer());
glSurfaceView.setRenderInputImage(true);
hands.setResultListener(
handsResult -> {
NormalizedLandmark wristLandmark = Hands.getHandLandmark(
handsResult, 0, HandLandmark.WRIST);
Log.i(
TAG,
String.format(
"MediaPipe Hand wrist normalized coordinates (value range: [0, 1]): x=%f, y=%f",
wristLandmark.getX(), wristLandmark.getY()));
// Request GL rendering.
glSurfaceView.setRenderData(handsResult);
glSurfaceView.requestRender();
});
// The runnable to start camera after the GLSurfaceView is attached.
glSurfaceView.post(
() ->
cameraInput.start(
this,
hands.getGlContext(),
CameraInput.CameraFacing.FRONT,
glSurfaceView.getWidth(),
glSurfaceView.getHeight()));
```
#### Image Input
```java
// For reading images from gallery and drawing the output in an ImageView.
HandsOptions handsOptions =
HandsOptions.builder()
.setMode(HandsOptions.STATIC_IMAGE_MODE) // API soon to become
.setMaxNumHands(1) // setStaticImageMode(true)
.setRunOnGpu(true).build();
Hands hands = new Hands(this, handsOptions);
// Connects MediaPipe Hands to the user-defined ImageView instance that allows
// users to have the custom drawing of the output landmarks on it.
// See mediapipe/examples/android/solutions/hands/src/main/java/com/google/mediapipe/examples/hands/HandsResultImageView.java
// as an example.
HandsResultImageView imageView = new HandsResultImageView(this);
hands.setResultListener(
handsResult -> {
int width = handsResult.inputBitmap().getWidth();
int height = handsResult.inputBitmap().getHeight();
NormalizedLandmark wristLandmark = Hands.getHandLandmark(
handsResult, 0, HandLandmark.WRIST);
Log.i(
TAG,
String.format(
"MediaPipe Hand wrist coordinates (pixel values): x=%f, y=%f",
wristLandmark.getX() * width, wristLandmark.getY() * height));
// Request canvas drawing.
imageView.setHandsResult(handsResult);
runOnUiThread(() -> imageView.update());
});
hands.setErrorListener(
(message, e) -> Log.e(TAG, "MediaPipe Hands error:" + message));
// ActivityResultLauncher to get an image from the gallery as Bitmap.
ActivityResultLauncher<Intent> imageGetter =
registerForActivityResult(
new ActivityResultContracts.StartActivityForResult(),
result -> {
Intent resultIntent = result.getData();
if (resultIntent != null && result.getResultCode() == RESULT_OK) {
Bitmap bitmap = null;
try {
bitmap =
MediaStore.Images.Media.getBitmap(
this.getContentResolver(), resultIntent.getData());
} catch (IOException e) {
Log.e(TAG, "Bitmap reading error:" + e);
}
if (bitmap != null) {
hands.send(bitmap);
}
}
});
Intent gallery = new Intent(
Intent.ACTION_PICK, MediaStore.Images.Media.INTERNAL_CONTENT_URI);
imageGetter.launch(gallery);
```
#### Video Input
```java
// For video input and result rendering with OpenGL.
HandsOptions handsOptions =
HandsOptions.builder()
.setMode(HandsOptions.STREAMING_MODE) // API soon to become
.setMaxNumHands(1) // setStaticImageMode(false)
.setRunOnGpu(true).build();
Hands hands = new Hands(this, handsOptions);
hands.setErrorListener(
(message, e) -> Log.e(TAG, "MediaPipe Hands error:" + message));
// Initializes a new VideoInput instance and connects it to MediaPipe Hands.
VideoInput videoInput = new VideoInput(this);
videoInput.setNewFrameListener(
textureFrame -> hands.send(textureFrame));
// Initializes a new GlSurfaceView with a ResultGlRenderer<HandsResult> instance
// that provides the interfaces to run user-defined OpenGL rendering code.
// See mediapipe/examples/android/solutions/hands/src/main/java/com/google/mediapipe/examples/hands/HandsResultGlRenderer.java
// as an example.
SolutionGlSurfaceView<HandsResult> glSurfaceView =
new SolutionGlSurfaceView<>(
this, hands.getGlContext(), hands.getGlMajorVersion());
glSurfaceView.setSolutionResultRenderer(new HandsResultGlRenderer());
glSurfaceView.setRenderInputImage(true);
hands.setResultListener(
handsResult -> {
NormalizedLandmark wristLandmark = Hands.getHandLandmark(
handsResult, 0, HandLandmark.WRIST);
Log.i(
TAG,
String.format(
"MediaPipe Hand wrist normalized coordinates (value range: [0, 1]): x=%f, y=%f",
wristLandmark.getX(), wristLandmark.getY()));
// Request GL rendering.
glSurfaceView.setRenderData(handsResult);
glSurfaceView.requestRender();
});
ActivityResultLauncher<Intent> videoGetter =
registerForActivityResult(
new ActivityResultContracts.StartActivityForResult(),
result -> {
Intent resultIntent = result.getData();
if (resultIntent != null) {
if (result.getResultCode() == RESULT_OK) {
glSurfaceView.post(
() ->
videoInput.start(
this,
resultIntent.getData(),
hands.getGlContext(),
glSurfaceView.getWidth(),
glSurfaceView.getHeight()));
}
}
});
Intent gallery =
new Intent(Intent.ACTION_PICK, MediaStore.Video.Media.INTERNAL_CONTENT_URI);
videoGetter.launch(gallery);
```
## Example Apps
Please first see general instructions for

View File

@ -176,6 +176,16 @@ A list of pose landmarks. Each landmark consists of the following:
* `visibility`: A value in `[0.0, 1.0]` indicating the likelihood of the
landmark being visible (present and not occluded) in the image.
#### pose_world_landmarks
Another list of pose landmarks in world coordinates. Each landmark consists of
the following:
* `x`, `y` and `z`: Real-world 3D coordinates in meters with the origin at the
center between hips.
* `visibility`: Identical to that defined in the corresponding
[pose_landmarks](#pose_landmarks).
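For example, a short sketch of reading one world landmark (the input path is hypothetical; the flow mirrors the usage example further below):

```python
import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic

with mp_holistic.Holistic(static_image_mode=True) as holistic:
  image = cv2.imread('person.jpg')  # placeholder path
  results = holistic.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.pose_world_landmarks:
  # World landmarks are in meters, origin at the midpoint between the hips.
  nose = results.pose_world_landmarks.landmark[mp_holistic.PoseLandmark.NOSE]
  print(f'Nose: x={nose.x:.3f} m, y={nose.y:.3f} m, z={nose.z:.3f} m')
```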
#### face_landmarks
A list of 468 face landmarks. Each landmark consists of `x`, `y` and `z`. `x`
@ -201,7 +211,7 @@ A list of 21 hand landmarks on the right hand, in the same representation as
Please first follow general [instructions](../getting_started/python.md) to
install MediaPipe Python package, then learn more in the companion
[Python Colab](#resources) and the following usage example.
[Python Colab](#resources) and the usage example below.
Supported configuration options:
@ -215,13 +225,15 @@ Supported configuration options:
import cv2
import mediapipe as mp
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_holistic = mp.solutions.holistic
# For static images:
IMAGE_FILES = []
with mp_holistic.Holistic(
static_image_mode=True,
model_complexity=2) as holistic:
for idx, file in enumerate(file_list):
for idx, file in enumerate(IMAGE_FILES):
image = cv2.imread(file)
image_height, image_width, _ = image.shape
# Convert the BGR image to RGB before processing.
@ -236,14 +248,22 @@ with mp_holistic.Holistic(
# Draw pose, left and right hands, and face landmarks on the image.
annotated_image = image.copy()
mp_drawing.draw_landmarks(
annotated_image, results.face_landmarks, mp_holistic.FACE_CONNECTIONS)
annotated_image,
results.face_landmarks,
mp_holistic.FACEMESH_TESSELATION,
landmark_drawing_spec=None,
connection_drawing_spec=mp_drawing_styles
.get_default_face_mesh_tesselation_style())
mp_drawing.draw_landmarks(
annotated_image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
mp_drawing.draw_landmarks(
annotated_image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
mp_drawing.draw_landmarks(
annotated_image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS)
annotated_image,
results.pose_landmarks,
mp_holistic.POSE_CONNECTIONS,
landmark_drawing_spec=mp_drawing_styles.
get_default_pose_landmarks_style())
cv2.imwrite('/tmp/annotated_image' + str(idx) + '.png', annotated_image)
# Plot pose world landmarks.
mp_drawing.plot_landmarks(
results.pose_world_landmarks, mp_holistic.POSE_CONNECTIONS)
# For webcam input:
cap = cv2.VideoCapture(0)
@ -269,13 +289,18 @@ with mp_holistic.Holistic(
image.flags.writeable = True
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
mp_drawing.draw_landmarks(
image, results.face_landmarks, mp_holistic.FACE_CONNECTIONS)
image,
results.face_landmarks,
mp_holistic.FACEMESH_CONTOURS,
landmark_drawing_spec=None,
connection_drawing_spec=mp_drawing_styles
.get_default_face_mesh_contours_style())
mp_drawing.draw_landmarks(
image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
mp_drawing.draw_landmarks(
image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
mp_drawing.draw_landmarks(
image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS)
image,
results.pose_landmarks,
mp_holistic.POSE_CONNECTIONS,
landmark_drawing_spec=mp_drawing_styles
.get_default_pose_landmarks_style())
cv2.imshow('MediaPipe Holistic', image)
if cv2.waitKey(5) & 0xFF == 27:
break

View File

@ -2,7 +2,7 @@
layout: default
title: Instant Motion Tracking
parent: Solutions
nav_order: 10
nav_order: 11
---
# MediaPipe Instant Motion Tracking

View File

@ -69,7 +69,7 @@ and renders using a dedicated
The
[face landmark subgraph](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_landmark/face_landmark_front_gpu.pbtxt)
internally uses a
[face detection subgraph](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_front_gpu.pbtxt)
[face detection subgraph](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_short_range_gpu.pbtxt)
from the
[face detection module](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection).
@ -193,7 +193,17 @@ on how to build MediaPipe examples.
### Web
Please refer to [these instructions](../index.md#mediapipe-on-the-web).
You can use the following links to load a demo in the MediaPipe Visualizer, then
click the "Runner" icon in the top bar as shown below. The demos use your webcam
video as input, which is processed entirely locally in real time and never
leaves your device. Please see
[MediaPipe on the Web](https://developers.googleblog.com/2020/01/mediapipe-on-web.html)
in Google Developers Blog for details.
![visualizer_runner](../images/visualizer_runner.png)
* [MediaPipe Iris](https://viz.mediapipe.dev/demo/iris_tracking)
* [MediaPipe Iris: Depth-from-Iris](https://viz.mediapipe.dev/demo/iris_depth)
## Resources

View File

@ -2,7 +2,7 @@
layout: default
title: KNIFT (Template-based Feature Matching)
parent: Solutions
nav_order: 12
nav_order: 13
---
# MediaPipe KNIFT

View File

@ -2,7 +2,7 @@
layout: default
title: Dataset Preparation with MediaSequence
parent: Solutions
nav_order: 14
nav_order: 15
---
# Dataset Preparation with MediaSequence

View File

@ -14,12 +14,27 @@ nav_order: 30
### [Face Detection](https://google.github.io/mediapipe/solutions/face_detection)
* Face detection model for front-facing/selfie camera:
[TFLite model](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_front.tflite),
[TFLite model quantized for EdgeTPU/Coral](https://github.com/google/mediapipe/tree/master/mediapipe/examples/coral/models/face-detector-quantized_edgetpu.tflite)
* Face detection model for back-facing camera:
[TFLite model ](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_back.tflite)
* [Model card](https://mediapipe.page.link/blazeface-mc)
* Short-range model (best for faces within 2 meters from the camera):
[TFLite model](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_short_range.tflite),
[TFLite model quantized for EdgeTPU/Coral](https://github.com/google/mediapipe/tree/master/mediapipe/examples/coral/models/face-detector-quantized_edgetpu.tflite),
[Model card](https://mediapipe.page.link/blazeface-mc)
* Full-range model (dense, best for faces within 5 meters from the camera):
[TFLite model](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_full_range.tflite),
[Model card](https://mediapipe.page.link/blazeface-back-mc)
* Full-range model (sparse, best for faces within 5 meters from the camera):
[TFLite model](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_full_range_sparse.tflite),
[Model card](https://mediapipe.page.link/blazeface-back-sparse-mc)
The full-range dense and sparse models have the same quality in terms of
[F-score](https://en.wikipedia.org/wiki/F-score), but differ in underlying
metrics. The dense model is slightly better in
[Recall](https://en.wikipedia.org/wiki/Precision_and_recall), whereas the sparse
model outperforms the dense one in
[Precision](https://en.wikipedia.org/wiki/Precision_and_recall). Speed-wise, the
sparse model is ~30% faster when executing on CPU via
[XNNPACK](https://github.com/google/XNNPACK), whereas on GPU the models
demonstrate comparable latencies. Depending on your application, you may prefer
one over the other.
### [Face Mesh](https://google.github.io/mediapipe/solutions/face_mesh)
@ -60,6 +75,12 @@ nav_order: 30
* Hand recrop model:
[TFLite model](https://github.com/google/mediapipe/tree/master/mediapipe/modules/holistic_landmark/hand_recrop.tflite)
### [Selfie Segmentation](https://google.github.io/mediapipe/solutions/selfie_segmentation)
* [TFLite model (general)](https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation/selfie_segmentation.tflite)
* [TFLite model (landscape)](https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation/selfie_segmentation_landscape.tflite)
* [Model card](https://mediapipe.page.link/selfiesegmentation-mc)
### [Hair Segmentation](https://google.github.io/mediapipe/solutions/hair_segmentation)
* [TFLite model](https://github.com/google/mediapipe/tree/master/mediapipe/models/hair_segmentation.tflite)

View File

@ -2,7 +2,7 @@
layout: default
title: Object Detection
parent: Solutions
nav_order: 8
nav_order: 9
---
# MediaPipe Object Detection

View File

@ -2,7 +2,7 @@
layout: default
title: Objectron (3D Object Detection)
parent: Solutions
nav_order: 11
nav_order: 12
---
# MediaPipe Objectron
@ -224,29 +224,33 @@ where object detection simply runs on every image. Default to `0.99`.
#### model_name
Name of the model to use for predicting 3D bounding box landmarks. Currently supports
`{'Shoe', 'Chair', 'Cup', 'Camera'}`.
Name of the model to use for predicting 3D bounding box landmarks. Currently
supports `{'Shoe', 'Chair', 'Cup', 'Camera'}`. Default to `Shoe`.
#### focal_length
Camera focal length `(fx, fy)`, by default is defined in
[NDC space](#ndc-space). To use focal length `(fx_pixel, fy_pixel)` in
[pixel space](#pixel-space), users should provide `image_size` = `(image_width,
image_height)` to enable conversions inside the API. For further details about
NDC and pixel space, please see [Coordinate Systems](#coordinate-systems).
By default, the camera focal length is defined in [NDC space](#ndc-space), i.e.,
`(fx, fy)`. Default to `(1.0, 1.0)`. To specify the focal length in
[pixel space](#pixel-space) instead, i.e., `(fx_pixel, fy_pixel)`, users should
provide [`image_size`](#image_size) = `(image_width, image_height)` to enable
conversions inside the API. For further details about NDC and pixel space,
please see [Coordinate Systems](#coordinate-systems).
#### principal_point
Camera principal point `(px, py)`, by default is defined in
[NDC space](#ndc-space). To use principal point `(px_pixel, py_pixel)` in
[pixel space](#pixel-space), users should provide `image_size` = `(image_width,
image_height)` to enable conversions inside the API. For further details about
NDC and pixel space, please see [Coordinate Systems](#coordinate-systems).
By default, the camera principal point is defined in [NDC space](#ndc-space),
i.e., `(px, py)`. Default to `(0.0, 0.0)`. To specify the principal point in
[pixel space](#pixel-space), i.e., `(px_pixel, py_pixel)`, users should provide
[`image_size`](#image_size) = `(image_width, image_height)` to enable
conversions inside the API. For further details about NDC and pixel space,
please see [Coordinate Systems](#coordinate-systems).
#### image_size
(**Optional**) size `(image_width, image_height)` of the input image, **ONLY**
needed when use `focal_length` and `principal_point` in pixel space.
**Specify only when [`focal_length`](#focal_length) and
[`principal_point`](#principal_point) are specified in pixel space.**
Size of the input image, i.e., `(image_width, image_height)`.
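For instance, a minimal sketch of configuring the Python Solution API with pixel-space intrinsics (the focal length and principal point values below are hypothetical and would come from your own camera calibration):

```python
import mediapipe as mp

mp_objectron = mp.solutions.objectron

# image_size is required so the API can convert the pixel-space intrinsics
# to NDC space internally.
with mp_objectron.Objectron(
    static_image_mode=True,
    model_name='Shoe',
    focal_length=(600.0, 600.0),      # (fx_pixel, fy_pixel)
    principal_point=(320.0, 240.0),   # (px_pixel, py_pixel)
    image_size=(640, 480)) as objectron:
  pass  # Call objectron.process(image) as in the usage example below.
```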
### Output
@ -277,7 +281,7 @@ following:
Please first follow general [instructions](../getting_started/python.md) to
install MediaPipe Python package, then learn more in the companion
[Python Colab](#resources) and the following usage example.
[Python Colab](#resources) and the usage example below.
Supported configuration options:
@ -297,11 +301,12 @@ mp_drawing = mp.solutions.drawing_utils
mp_objectron = mp.solutions.objectron
# For static images:
IMAGE_FILES = []
with mp_objectron.Objectron(static_image_mode=True,
max_num_objects=5,
min_detection_confidence=0.5,
model_name='Shoe') as objectron:
for idx, file in enumerate(file_list):
for idx, file in enumerate(IMAGE_FILES):
image = cv2.imread(file)
# Convert the BGR image to RGB and process it with MediaPipe Objectron.
results = objectron.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
@ -355,6 +360,89 @@ with mp_objectron.Objectron(static_image_mode=False,
cap.release()
```
## JavaScript Solution API
Please first see general [introduction](../getting_started/javascript.md) on
MediaPipe in JavaScript, then learn more in the companion [web demo](#resources)
and the following usage example.
Supported configuration options:
* [staticImageMode](#static_image_mode)
* [maxNumObjects](#max_num_objects)
* [minDetectionConfidence](#min_detection_confidence)
* [minTrackingConfidence](#min_tracking_confidence)
* [modelName](#model_name)
* [focalLength](#focal_length)
* [principalPoint](#principal_point)
* [imageSize](#image_size)
```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/control_utils/control_utils.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/control_utils_3d.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/objectron/objectron.js" crossorigin="anonymous"></script>
</head>
<body>
<div class="container">
<video class="input_video"></video>
<canvas class="output_canvas" width="1280px" height="720px"></canvas>
</div>
</body>
</html>
```
```javascript
<script type="module">
const videoElement = document.getElementsByClassName('input_video')[0];
const canvasElement = document.getElementsByClassName('output_canvas')[0];
const canvasCtx = canvasElement.getContext('2d');
function onResults(results) {
canvasCtx.save();
canvasCtx.drawImage(
results.image, 0, 0, canvasElement.width, canvasElement.height);
if (!!results.objectDetections) {
for (const detectedObject of results.objectDetections) {
// Reformat keypoint information as landmarks, for easy drawing.
const landmarks = detectedObject.keypoints.map(x => x.point2d);
// Draw bounding box.
drawingUtils.drawConnectors(canvasCtx, landmarks,
mpObjectron.BOX_CONNECTIONS, {color: '#FF0000'});
// Draw centroid.
drawingUtils.drawLandmarks(canvasCtx, [landmarks[0]], {color: '#FFFFFF'});
}
}
canvasCtx.restore();
}
const objectron = new Objectron({locateFile: (file) => {
return `https://cdn.jsdelivr.net/npm/@mediapipe/objectron/${file}`;
}});
objectron.setOptions({
modelName: 'Chair',
maxNumObjects: 3,
});
objectron.onResults(onResults);
const camera = new Camera(videoElement, {
onFrame: async () => {
await objectron.send({image: videoElement});
},
width: 1280,
height: 720
});
camera.start();
</script>
```
## Example Apps
Please first see general instructions for
@ -441,7 +529,7 @@ Example app bounding boxes are rendered with [GlAnimationOverlayCalculator](http
> ```
> and then run
>
> ```build
> ```bash
> bazel run -c opt mediapipe/graphs/object_detection_3d/obj_parser:ObjParser -- input_dir=[INTERMEDIATE_OUTPUT_DIR] output_dir=[OUTPUT_DIR]
> ```
> INPUT_DIR should be the folder with initial asset .obj files to be processed,
@ -560,11 +648,15 @@ py = -py_pixel * 2.0 / image_height + 1.0
[Announcing the Objectron Dataset](https://ai.googleblog.com/2020/11/announcing-objectron-dataset.html)
* Google AI Blog:
[Real-Time 3D Object Detection on Mobile Devices with MediaPipe](https://ai.googleblog.com/2020/03/real-time-3d-object-detection-on-mobile.html)
* Paper: [Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations](https://arxiv.org/abs/2012.09988), to appear in CVPR 2021
* Paper: [Objectron: A Large Scale Dataset of Object-Centric Videos in the
Wild with Pose Annotations](https://arxiv.org/abs/2012.09988), to appear in
CVPR 2021
* Paper: [MobilePose: Real-Time Pose Estimation for Unseen Objects with Weak
Shape Supervision](https://arxiv.org/abs/2003.03522)
* Paper:
[Instant 3D Object Tracking with Applications in Augmented Reality](https://drive.google.com/open?id=1O_zHmlgXIzAdKljp20U_JUkEHOGG52R8)
([presentation](https://www.youtube.com/watch?v=9ndF1AIo7h0)), Fourth Workshop on Computer Vision for AR/VR, CVPR 2020
([presentation](https://www.youtube.com/watch?v=9ndF1AIo7h0)), Fourth
Workshop on Computer Vision for AR/VR, CVPR 2020
* [Models and model cards](./models.md#objectron)
* [Web demo](https://code.mediapipe.dev/codepen/objectron)
* [Python Colab](https://mediapipe.page.link/objectron_py_colab)

View File

@ -30,7 +30,8 @@ overlay of digital content and information on top of the physical world in
augmented reality.
MediaPipe Pose is a ML solution for high-fidelity body pose tracking, inferring
33 3D landmarks on the whole body from RGB video frames utilizing our
33 3D landmarks and background segmentation mask on the whole body from RGB
video frames utilizing our
[BlazePose](https://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html)
research that also powers the
[ML Kit Pose Detection API](https://developers.google.com/ml-kit/vision/pose-detection).
@ -49,11 +50,11 @@ The solution utilizes a two-step detector-tracker ML pipeline, proven to be
effective in our [MediaPipe Hands](./hands.md) and
[MediaPipe Face Mesh](./face_mesh.md) solutions. Using a detector, the pipeline
first locates the person/pose region-of-interest (ROI) within the frame. The
tracker subsequently predicts the pose landmarks within the ROI using the
ROI-cropped frame as input. Note that for video use cases the detector is
invoked only as needed, i.e., for the very first frame and when the tracker
could no longer identify body pose presence in the previous frame. For other
frames the pipeline simply derives the ROI from the previous frames pose
tracker subsequently predicts the pose landmarks and segmentation mask within
the ROI using the ROI-cropped frame as input. Note that for video use cases the
detector is invoked only as needed, i.e., for the very first frame and when the
tracker could no longer identify body pose presence in the previous frame. For
other frames the pipeline simply derives the ROI from the previous frames pose
landmarks.
The pipeline is implemented as a MediaPipe
@ -87,11 +88,11 @@ from [COCO topology](https://cocodataset.org/#keypoints-2020).
Method | Yoga <br/> [`mAP`] | Yoga <br/> [`PCK@0.2`] | Dance <br/> [`mAP`] | Dance <br/> [`PCK@0.2`] | HIIT <br/> [`mAP`] | HIIT <br/> [`PCK@0.2`]
----------------------------------------------------------------------------------------------------- | -----------------: | ---------------------: | ------------------: | ----------------------: | -----------------: | ---------------------:
BlazePose.Heavy | 68.1 | **96.4** | 73.0 | **97.2** | 74.0 | **97.5**
BlazePose.Full | 62.6 | **95.5** | 67.4 | **96.3** | 68.0 | **95.7**
BlazePose.Lite | 45.0 | **90.2** | 53.6 | **92.5** | 53.8 | **93.5**
[AlphaPose.ResNet50](https://github.com/MVIG-SJTU/AlphaPose) | 63.4 | **96.0** | 57.8 | **95.5** | 63.4 | **96.0**
[Apple.Vision](https://developer.apple.com/documentation/vision/detecting_human_body_poses_in_images) | 32.8 | **82.7** | 36.4 | **91.4** | 44.5 | **88.6**
BlazePose GHUM Heavy | 68.1 | **96.4** | 73.0 | **97.2** | 74.0 | **97.5**
BlazePose GHUM Full | 62.6 | **95.5** | 67.4 | **96.3** | 68.0 | **95.7**
BlazePose GHUM Lite | 45.0 | **90.2** | 53.6 | **92.5** | 53.8 | **93.5**
[AlphaPose ResNet50](https://github.com/MVIG-SJTU/AlphaPose) | 63.4 | **96.0** | 57.8 | **95.5** | 63.4 | **96.0**
[Apple Vision](https://developer.apple.com/documentation/vision/detecting_human_body_poses_in_images) | 32.8 | **82.7** | 36.4 | **91.4** | 44.5 | **88.6**
![pose_tracking_pck_chart.png](../images/mobile/pose_tracking_pck_chart.png) |
:--------------------------------------------------------------------------: |
@ -101,10 +102,10 @@ We designed our models specifically for live perception use cases, so all of
them work in real-time on the majority of modern devices.
Method | Latency <br/> Pixel 3 [TFLite GPU](https://www.tensorflow.org/lite/performance/gpu_advanced) | Latency <br/> MacBook Pro (15-inch 2017)
--------------- | -------------------------------------------------------------------------------------------: | ---------------------------------------:
BlazePose.Heavy | 53 ms | 38 ms
BlazePose.Full | 25 ms | 27 ms
BlazePose.Lite | 20 ms | 25 ms
-------------------- | -------------------------------------------------------------------------------------------: | ---------------------------------------:
BlazePose GHUM Heavy | 53 ms | 38 ms
BlazePose GHUM Full | 25 ms | 27 ms
BlazePose GHUM Lite | 20 ms | 25 ms
## Models
@ -129,16 +130,19 @@ hip midpoints.
The landmark model in MediaPipe Pose predicts the location of 33 pose landmarks
(see figure below).
Please find more detail in the
[BlazePose Google AI Blog](https://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html),
this [paper](https://arxiv.org/abs/2006.10204) and
[the model card](./models.md#pose), and the attributes in each landmark
[below](#pose_landmarks).
![pose_tracking_full_body_landmarks.png](../images/mobile/pose_tracking_full_body_landmarks.png) |
:----------------------------------------------------------------------------------------------: |
*Fig 4. 33 pose landmarks.* |
Optionally, MediaPipe Pose can also predict a full-body
[segmentation mask](#segmentation_mask) represented as a two-class segmentation
(human or background).
Please find more detail in the
[BlazePose Google AI Blog](https://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html),
this [paper](https://arxiv.org/abs/2006.10204),
[the model card](./models.md#pose) and the [Output](#output) section below.
## Solution APIs
### Cross-platform Configuration Options
@ -167,6 +171,18 @@ If set to `true`, the solution filters pose landmarks across different input
images to reduce jitter, but ignored if [static_image_mode](#static_image_mode)
is also set to `true`. Default to `true`.
#### enable_segmentation
If set to `true`, in addition to the pose landmarks the solution also generates
the segmentation mask. Default to `false`.
#### smooth_segmentation
If set to `true`, the solution filters segmentation masks across different input
images to reduce jitter. Ignored if [enable_segmentation](#enable_segmentation)
is `false` or [static_image_mode](#static_image_mode) is `true`. Default to
`true`.
#### min_detection_confidence
Minimum confidence value (`[0.0, 1.0]`) from the person-detection model for the
@ -187,28 +203,56 @@ Naming style may differ slightly across platforms/languages.
#### pose_landmarks
A list of pose landmarks. Each lanmark consists of the following:
A list of pose landmarks. Each landmark consists of the following:
* `x` and `y`: Landmark coordinates normalized to `[0.0, 1.0]` by the image
width and height respectively.
* `z`: Represents the landmark depth with the depth at the midpoint of hips
being the origin, and the smaller the value the closer the landmark is to
the camera. The magnitude of `z` uses roughly the same scale as `x`.
* `visibility`: A value in `[0.0, 1.0]` indicating the likelihood of the
landmark being visible (present and not occluded) in the image.
#### pose_world_landmarks
*Fig 5. Example of MediaPipe Pose real-world 3D coordinates.* |
:-----------------------------------------------------------: |
<video autoplay muted loop preload style="height: auto; width: 480px"><source src="../images/mobile/pose_world_landmarks.mp4" type="video/mp4"></video> |
Another list of pose landmarks in world coordinates. Each landmark consists of
the following:
* `x`, `y` and `z`: Real-world 3D coordinates in meters with the origin at the
center between hips.
* `visibility`: Identical to that defined in the corresponding
[pose_landmarks](#pose_landmarks).
#### segmentation_mask
The output segmentation mask, predicted only when
[enable_segmentation](#enable_segmentation) is set to `true`. The mask has the
same width and height as the input image, and contains values in `[0.0, 1.0]`
where `1.0` and `0.0` indicate high certainty of a "human" and "background"
pixel respectively. Please refer to the platform-specific usage examples below
for usage details.
*Fig 6. Example of MediaPipe Pose segmentation mask.* |
:---------------------------------------------------: |
<video autoplay muted loop preload style="height: auto; width: 480px"><source src="../images/mobile/pose_segmentation.mp4" type="video/mp4"></video> |
### Python Solution API
Please first follow general [instructions](../getting_started/python.md) to
install MediaPipe Python package, then learn more in the companion
[Python Colab](#resources) and the following usage example.
[Python Colab](#resources) and the usage example below.
Supported configuration options:
* [static_image_mode](#static_image_mode)
* [model_complexity](#model_complexity)
* [smooth_landmarks](#smooth_landmarks)
* [enable_segmentation](#enable_segmentation)
* [smooth_segmentation](#smooth_segmentation)
* [min_detection_confidence](#min_detection_confidence)
* [min_tracking_confidence](#min_tracking_confidence)
@ -216,14 +260,18 @@ Supported configuration options:
import cv2
import mediapipe as mp
import numpy as np
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_pose = mp.solutions.pose
# For static images:
IMAGE_FILES = []
BG_COLOR = (192, 192, 192) # gray
with mp_pose.Pose(
static_image_mode=True,
model_complexity=2,
enable_segmentation=True,
min_detection_confidence=0.5) as pose:
for idx, file in enumerate(file_list):
for idx, file in enumerate(IMAGE_FILES):
image = cv2.imread(file)
image_height, image_width, _ = image.shape
# Convert the BGR image to RGB before processing.
@ -233,14 +281,28 @@ with mp_pose.Pose(
continue
print(
f'Nose coordinates: ('
f'{results.pose_landmarks.landmark[mp_holistic.PoseLandmark.NOSE].x * image_width}, '
f'{results.pose_landmarks.landmark[mp_holistic.PoseLandmark.NOSE].y * image_height})'
f'{results.pose_landmarks.landmark[mp_pose.PoseLandmark.NOSE].x * image_width}, '
f'{results.pose_landmarks.landmark[mp_pose.PoseLandmark.NOSE].y * image_height})'
)
# Draw pose landmarks on the image.
annotated_image = image.copy()
# Draw segmentation on the image.
# To improve segmentation around boundaries, consider applying a joint
# bilateral filter to "results.segmentation_mask" with "image".
condition = np.stack((results.segmentation_mask,) * 3, axis=-1) > 0.1
bg_image = np.zeros(image.shape, dtype=np.uint8)
bg_image[:] = BG_COLOR
annotated_image = np.where(condition, annotated_image, bg_image)
# Draw pose landmarks on the image.
mp_drawing.draw_landmarks(
annotated_image, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
annotated_image,
results.pose_landmarks,
mp_pose.POSE_CONNECTIONS,
landmark_drawing_spec=mp_drawing_styles.get_default_pose_landmarks_style())
cv2.imwrite('/tmp/annotated_image' + str(idx) + '.png', annotated_image)
# Plot pose world landmarks.
mp_drawing.plot_landmarks(
results.pose_world_landmarks, mp_pose.POSE_CONNECTIONS)
# For webcam input:
cap = cv2.VideoCapture(0)
@ -266,7 +328,10 @@ with mp_pose.Pose(
image.flags.writeable = True
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
mp_drawing.draw_landmarks(
image, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
image,
results.pose_landmarks,
mp_pose.POSE_CONNECTIONS,
landmark_drawing_spec=mp_drawing_styles.get_default_pose_landmarks_style())
cv2.imshow('MediaPipe Pose', image)
if cv2.waitKey(5) & 0xFF == 27:
break
@ -283,6 +348,8 @@ Supported configuration options:
* [modelComplexity](#model_complexity)
* [smoothLandmarks](#smooth_landmarks)
* [enableSegmentation](#enable_segmentation)
* [smoothSegmentation](#smooth_segmentation)
* [minDetectionConfidence](#min_detection_confidence)
* [minTrackingConfidence](#min_tracking_confidence)
@ -293,6 +360,7 @@ Supported configuration options:
<meta charset="utf-8">
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/control_utils/control_utils.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/control_utils_3d.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/pose/pose.js" crossorigin="anonymous"></script>
</head>
@ -301,6 +369,7 @@ Supported configuration options:
<div class="container">
<video class="input_video"></video>
<canvas class="output_canvas" width="1280px" height="720px"></canvas>
<div class="landmark-grid-container"></div>
</div>
</body>
</html>
@ -311,17 +380,38 @@ Supported configuration options:
const videoElement = document.getElementsByClassName('input_video')[0];
const canvasElement = document.getElementsByClassName('output_canvas')[0];
const canvasCtx = canvasElement.getContext('2d');
const landmarkContainer = document.getElementsByClassName('landmark-grid-container')[0];
const grid = new LandmarkGrid(landmarkContainer);
function onResults(results) {
if (!results.poseLandmarks) {
grid.updateLandmarks([]);
return;
}
canvasCtx.save();
canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
canvasCtx.drawImage(results.segmentationMask, 0, 0,
canvasElement.width, canvasElement.height);
// Only overwrite existing pixels.
canvasCtx.globalCompositeOperation = 'source-in';
canvasCtx.fillStyle = '#00FF00';
canvasCtx.fillRect(0, 0, canvasElement.width, canvasElement.height);
// Only overwrite missing pixels.
canvasCtx.globalCompositeOperation = 'destination-atop';
canvasCtx.drawImage(
results.image, 0, 0, canvasElement.width, canvasElement.height);
canvasCtx.globalCompositeOperation = 'source-over';
drawConnectors(canvasCtx, results.poseLandmarks, POSE_CONNECTIONS,
{color: '#00FF00', lineWidth: 4});
drawLandmarks(canvasCtx, results.poseLandmarks,
{color: '#FF0000', lineWidth: 2});
canvasCtx.restore();
grid.updateLandmarks(results.poseWorldLandmarks);
}
const pose = new Pose({locateFile: (file) => {
@ -330,6 +420,8 @@ const pose = new Pose({locateFile: (file) => {
pose.setOptions({
modelComplexity: 1,
smoothLandmarks: true,
enableSegmentation: true,
smoothSegmentation: true,
minDetectionConfidence: 0.5,
minTrackingConfidence: 0.5
});

View File

@ -0,0 +1,290 @@
---
layout: default
title: Selfie Segmentation
parent: Solutions
nav_order: 7
---
# MediaPipe Selfie Segmentation
{: .no_toc }
<details close markdown="block">
<summary>
Table of contents
</summary>
{: .text-delta }
1. TOC
{:toc}
</details>
---
## Overview
*Fig 1. Example of MediaPipe Selfie Segmentation.* |
:------------------------------------------------: |
<video autoplay muted loop preload style="height: auto; width: 480px"><source src="../images/selfie_segmentation_web.mp4" type="video/mp4"></video> |
MediaPipe Selfie Segmentation segments the prominent humans in the scene. It can
run in real-time on both smartphones and laptops. The intended use cases include
selfie effects and video conferencing, where the person is close (< 2m) to the
camera.
## Models
In this solution, we provide two models: general and landscape. Both models are
based on
[MobileNetV3](https://ai.googleblog.com/2019/11/introducing-next-generation-on-device.html),
with modifications to make them more efficient. The general model operates on a
256x256x3 (HWC) tensor, and outputs a 256x256x1 tensor representing the
segmentation mask. The landscape model is similar to the general model, but
operates on a 144x256x3 (HWC) tensor. It has fewer FLOPs than the general model
and therefore runs faster. Note that MediaPipe Selfie Segmentation
automatically resizes the input image to the desired tensor dimensions before
feeding it into the ML models.
The general model also powers
[ML Kit](https://developers.google.com/ml-kit/vision/selfie-segmentation), and a
variant of the landscape model powers
[Google Meet](https://ai.googleblog.com/2020/10/background-features-in-google-meet.html).
Please find more detail about the models in the
[model card](./models.md#selfie-segmentation).
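
As a quick, hedged illustration of the note above (a sketch, not an excerpt from
the official documentation): the internal resize is transparent to callers, and
the mask returned by the Python Solution API described below matches the
resolution of the frame passed in, regardless of which model is selected.

```python
import mediapipe as mp
import numpy as np

mp_selfie_segmentation = mp.solutions.selfie_segmentation

# Illustrative 720p RGB frame; any resolution works.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
with mp_selfie_segmentation.SelfieSegmentation(
    model_selection=1) as selfie_segmentation:
  results = selfie_segmentation.process(frame)
  if results.segmentation_mask is not None:
    # A float32 mask with the same height/width as the input frame, even though
    # the landscape model internally runs on a 144x256 tensor.
    print(results.segmentation_mask.shape, results.segmentation_mask.dtype)
```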
## ML Pipeline
The pipeline is implemented as a MediaPipe
[graph](https://github.com/google/mediapipe/tree/master/mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt)
that uses a
[selfie segmentation subgraph](https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation/selfie_segmentation_gpu.pbtxt)
from the
[selfie segmentation module](https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation).
Note: To visualize a graph, copy the graph and paste it into
[MediaPipe Visualizer](https://viz.mediapipe.dev/). For more information on how
to visualize its associated subgraphs, please see
[visualizer documentation](../tools/visualizer.md).
## Solution APIs
### Cross-platform Configuration Options
Naming style and availability may differ slightly across platforms/languages.
#### model_selection
An integer index `0` or `1`. Use `0` to select the general model, and `1` to
select the landscape model (see details in [Models](#models)). Defaults to `0`
if not specified.
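
As a hypothetical rule of thumb (an illustrative sketch only, not an official
recommendation), one might pick the landscape model for wide,
video-conferencing-style frames and the general model otherwise:

```python
def pick_model_selection(frame_width, frame_height):
  # Hypothetical heuristic: use the landscape model (1) for roughly 16:9 or
  # wider frames, and the general model (0) otherwise.
  return 1 if frame_width / frame_height >= 16 / 9 else 0

model_selection = pick_model_selection(1280, 720)  # -> 1 (landscape)
```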
### Output
Naming style may differ slightly across platforms/languages.
#### segmentation_mask
The output segmentation mask, which has the same width and height as the input
image.
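
For instance, a minimal Python sketch (using the Solution API described below;
`image` is assumed to be a BGR frame read with OpenCV) that reuses the mask as
an alpha channel to export a transparent cut-out of the person:

```python
import cv2
import numpy as np

def save_cutout(image, segmentation_mask, path='/tmp/cutout.png'):
  # Scale the float mask in [0, 1] to an 8-bit alpha channel.
  alpha = (np.clip(segmentation_mask, 0.0, 1.0) * 255).astype(np.uint8)
  bgra = cv2.cvtColor(image, cv2.COLOR_BGR2BGRA)
  bgra[:, :, 3] = alpha
  cv2.imwrite(path, bgra)  # PNG preserves the alpha channel.
```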
### Python Solution API
Please first follow the general [instructions](../getting_started/python.md) to
install the MediaPipe Python package, then learn more in the companion
[Python Colab](#resources) and the usage example below.
Supported configuration options:
* [model_selection](#model_selection)
```python
import cv2
import mediapipe as mp
import numpy as np
mp_drawing = mp.solutions.drawing_utils
mp_selfie_segmentation = mp.solutions.selfie_segmentation
# For static images:
IMAGE_FILES = []
BG_COLOR = (192, 192, 192) # gray
MASK_COLOR = (255, 255, 255) # white
with mp_selfie_segmentation.SelfieSegmentation(
model_selection=0) as selfie_segmentation:
for idx, file in enumerate(IMAGE_FILES):
image = cv2.imread(file)
image_height, image_width, _ = image.shape
# Convert the BGR image to RGB before processing.
results = selfie_segmentation.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
# Draw selfie segmentation on the background image.
# To improve segmentation around boundaries, consider applying a joint
# bilateral filter to "results.segmentation_mask" with "image".
condition = np.stack((results.segmentation_mask,) * 3, axis=-1) > 0.1
# Generate solid color images for showing the output selfie segmentation mask.
fg_image = np.zeros(image.shape, dtype=np.uint8)
fg_image[:] = MASK_COLOR
bg_image = np.zeros(image.shape, dtype=np.uint8)
bg_image[:] = BG_COLOR
output_image = np.where(condition, fg_image, bg_image)
cv2.imwrite('/tmp/selfie_segmentation_output' + str(idx) + '.png', output_image)
# For webcam input:
BG_COLOR = (192, 192, 192) # gray
cap = cv2.VideoCapture(0)
with mp_selfie_segmentation.SelfieSegmentation(
model_selection=1) as selfie_segmentation:
bg_image = None
while cap.isOpened():
success, image = cap.read()
if not success:
print("Ignoring empty camera frame.")
# If loading a video, use 'break' instead of 'continue'.
continue
# Flip the image horizontally for a later selfie-view display, and convert
# the BGR image to RGB.
image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB)
# To improve performance, optionally mark the image as not writeable to
# pass by reference.
image.flags.writeable = False
results = selfie_segmentation.process(image)
image.flags.writeable = True
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
# Draw selfie segmentation on the background image.
# To improve segmentation around boundaries, consider applying a joint
# bilateral filter to "results.segmentation_mask" with "image".
condition = np.stack(
(results.segmentation_mask,) * 3, axis=-1) > 0.1
# The background can be customized.
# a) Load an image (with the same width and height of the input image) to
# be the background, e.g., bg_image = cv2.imread('/path/to/image/file')
# b) Blur the input image by applying image filtering, e.g.,
# bg_image = cv2.GaussianBlur(image,(55,55),0)
if bg_image is None:
bg_image = np.zeros(image.shape, dtype=np.uint8)
bg_image[:] = BG_COLOR
output_image = np.where(condition, image, bg_image)
cv2.imshow('MediaPipe Selfie Segmentation', output_image)
if cv2.waitKey(5) & 0xFF == 27:
break
cap.release()
```
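
The comments above suggest a joint bilateral filter to refine the mask around
boundaries. A possible sketch using `cv2.ximgproc.jointBilateralFilter` (this
assumes the `opencv-contrib-python` package; the filter parameters are
illustrative, not values tuned by MediaPipe):

```python
import cv2
import numpy as np

def refine_mask(image_bgr, segmentation_mask):
  # Requires opencv-contrib-python for the cv2.ximgproc module.
  # The guide ("joint") image and the source must share the same depth,
  # so convert both to float32.
  guide = image_bgr.astype(np.float32) / 255.0
  mask = segmentation_mask.astype(np.float32)
  # Positional args: joint (guide), src, d, sigmaColor, sigmaSpace.
  # The values below are illustrative, untuned choices.
  return cv2.ximgproc.jointBilateralFilter(guide, mask, 9, 0.1, 7)
```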
### JavaScript Solution API
Please first see the general [introduction](../getting_started/javascript.md) to
MediaPipe in JavaScript, then learn more in the companion [web demo](#resources)
and the following usage example.
Supported configuration options:
* [modelSelection](#model_selection)
```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/control_utils/control_utils.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/selfie_segmentation.js" crossorigin="anonymous"></script>
</head>
<body>
<div class="container">
<video class="input_video"></video>
<canvas class="output_canvas" width="1280px" height="720px"></canvas>
</div>
</body>
</html>
```
```javascript
<script type="module">
const videoElement = document.getElementsByClassName('input_video')[0];
const canvasElement = document.getElementsByClassName('output_canvas')[0];
const canvasCtx = canvasElement.getContext('2d');
function onResults(results) {
canvasCtx.save();
canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
canvasCtx.drawImage(results.segmentationMask, 0, 0,
canvasElement.width, canvasElement.height);
// Only overwrite existing pixels.
canvasCtx.globalCompositeOperation = 'source-in';
canvasCtx.fillStyle = '#00FF00';
canvasCtx.fillRect(0, 0, canvasElement.width, canvasElement.height);
// Only overwrite missing pixels.
canvasCtx.globalCompositeOperation = 'destination-atop';
canvasCtx.drawImage(
results.image, 0, 0, canvasElement.width, canvasElement.height);
canvasCtx.restore();
}
const selfieSegmentation = new SelfieSegmentation({locateFile: (file) => {
return `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${file}`;
}});
selfieSegmentation.setOptions({
modelSelection: 1,
});
selfieSegmentation.onResults(onResults);
const camera = new Camera(videoElement, {
onFrame: async () => {
await selfieSegmentation.send({image: videoElement});
},
width: 1280,
height: 720
});
camera.start();
</script>
```
## Example Apps
Please first see the general instructions for
[Android](../getting_started/android.md), [iOS](../getting_started/ios.md), and
[desktop](../getting_started/cpp.md) on how to build MediaPipe examples.
Note: To visualize a graph, copy the graph and paste it into
[MediaPipe Visualizer](https://viz.mediapipe.dev/). For more information on how
to visualize its associated subgraphs, please see
[visualizer documentation](../tools/visualizer.md).
### Mobile
* Graph:
[`mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt`](https://github.com/google/mediapipe/tree/master/mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt)
* Android target:
[(or download prebuilt ARM64 APK)](https://drive.google.com/file/d/1DoeyGzMmWUsjfVgZfGGecrn7GKzYcEAo/view?usp=sharing)
[`mediapipe/examples/android/src/java/com/google/mediapipe/apps/selfiesegmentationgpu:selfiesegmentationgpu`](https://github.com/google/mediapipe/tree/master/mediapipe/examples/android/src/java/com/google/mediapipe/apps/selfiesegmentationgpu/BUILD)
* iOS target:
[`mediapipe/examples/ios/selfiesegmentationgpu:SelfieSegmentationGpuApp`](https://github.com/google/mediapipe/tree/master/mediapipe/examples/ios/selfiesegmentationgpu/BUILD)
### Desktop
Please first see the general instructions for [desktop](../getting_started/cpp.md)
on how to build MediaPipe examples.
* Running on CPU
* Graph:
[`mediapipe/graphs/selfie_segmentation/selfie_segmentation_cpu.pbtxt`](https://github.com/google/mediapipe/tree/master/mediapipe/graphs/selfie_segmentation/selfie_segmentation_cpu.pbtxt)
* Target:
[`mediapipe/examples/desktop/selfie_segmentation:selfie_segmentation_cpu`](https://github.com/google/mediapipe/tree/master/mediapipe/examples/desktop/selfie_segmentation/BUILD)
* Running on GPU
* Graph:
[`mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt`](https://github.com/google/mediapipe/tree/master/mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt)
* Target:
[`mediapipe/examples/desktop/selfie_segmentation:selfie_segmentation_gpu`](https://github.com/google/mediapipe/tree/master/mediapipe/examples/desktop/selfie_segmentation/BUILD)
## Resources
* Google AI Blog:
[Background Features in Google Meet, Powered by Web ML](https://ai.googleblog.com/2020/10/background-features-in-google-meet.html)
* [ML Kit Selfie Segmentation API](https://developers.google.com/ml-kit/vision/selfie-segmentation)
* [Models and model cards](./models.md#selfie-segmentation)
* [Web demo](https://code.mediapipe.dev/codepen/selfie_segmentation)
* [Python Colab](https://mediapipe.page.link/selfie_segmentation_py_colab)

View File

@ -13,6 +13,9 @@ has_toc: false
{:toc}
---
MediaPipe offers open source cross-platform, customizable ML solutions for live
and streaming media.
<!-- []() in the first cell is needed to preserve table formatting in GitHub Pages. -->
<!-- Whenever this table is updated, paste a copy to ../external_index.md. -->
@ -24,11 +27,12 @@ has_toc: false
[Hands](https://google.github.io/mediapipe/solutions/hands) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Pose](https://google.github.io/mediapipe/solutions/pose) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Holistic](https://google.github.io/mediapipe/solutions/holistic) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Selfie Segmentation](https://google.github.io/mediapipe/solutions/selfie_segmentation) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Hair Segmentation](https://google.github.io/mediapipe/solutions/hair_segmentation) | ✅ | | ✅ | | |
[Object Detection](https://google.github.io/mediapipe/solutions/object_detection) | ✅ | ✅ | ✅ | | | ✅
[Box Tracking](https://google.github.io/mediapipe/solutions/box_tracking) | ✅ | ✅ | ✅ | | |
[Instant Motion Tracking](https://google.github.io/mediapipe/solutions/instant_motion_tracking) | ✅ | | | | |
[Objectron](https://google.github.io/mediapipe/solutions/objectron) | ✅ | | ✅ | ✅ | |
[Objectron](https://google.github.io/mediapipe/solutions/objectron) | ✅ | | ✅ | ✅ | |
[KNIFT](https://google.github.io/mediapipe/solutions/knift) | ✅ | | | | |
[AutoFlip](https://google.github.io/mediapipe/solutions/autoflip) | | | ✅ | | |
[MediaSequence](https://google.github.io/mediapipe/solutions/media_sequence) | | | ✅ | | |

View File

@ -2,7 +2,7 @@
layout: default
title: YouTube-8M Feature Extraction and Model Inference
parent: Solutions
nav_order: 15
nav_order: 16
---
# YouTube-8M Feature Extraction and Model Inference

View File

@ -16,6 +16,7 @@
"mediapipe/examples/ios/objectdetectiongpu/BUILD",
"mediapipe/examples/ios/objectdetectiontrackinggpu/BUILD",
"mediapipe/examples/ios/posetrackinggpu/BUILD",
"mediapipe/examples/ios/selfiesegmentationgpu/BUILD",
"mediapipe/framework/BUILD",
"mediapipe/gpu/BUILD",
"mediapipe/objc/BUILD",
@ -35,6 +36,7 @@
"//mediapipe/examples/ios/objectdetectiongpu:ObjectDetectionGpuApp",
"//mediapipe/examples/ios/objectdetectiontrackinggpu:ObjectDetectionTrackingGpuApp",
"//mediapipe/examples/ios/posetrackinggpu:PoseTrackingGpuApp",
"//mediapipe/examples/ios/selfiesegmentationgpu:SelfieSegmentationGpuApp",
"//mediapipe/objc:mediapipe_framework_ios"
],
"optionSet" : {
@ -103,6 +105,7 @@
"mediapipe/examples/ios/objectdetectioncpu",
"mediapipe/examples/ios/objectdetectiongpu",
"mediapipe/examples/ios/posetrackinggpu",
"mediapipe/examples/ios/selfiesegmentationgpu",
"mediapipe/framework",
"mediapipe/framework/deps",
"mediapipe/framework/formats",
@ -120,6 +123,7 @@
"mediapipe/graphs/hand_tracking",
"mediapipe/graphs/object_detection",
"mediapipe/graphs/pose_tracking",
"mediapipe/graphs/selfie_segmentation",
"mediapipe/models",
"mediapipe/modules",
"mediapipe/objc",

View File

@ -22,6 +22,7 @@
"mediapipe/examples/ios/objectdetectiongpu",
"mediapipe/examples/ios/objectdetectiontrackinggpu",
"mediapipe/examples/ios/posetrackinggpu",
"mediapipe/examples/ios/selfiesegmentationgpu",
"mediapipe/objc"
],
"projectName" : "Mediapipe",

View File

@ -140,6 +140,16 @@ mediapipe_proto_library(
],
)
mediapipe_proto_library(
name = "graph_profile_calculator_proto",
srcs = ["graph_profile_calculator.proto"],
visibility = ["//visibility:public"],
deps = [
"//mediapipe/framework:calculator_options_proto",
"//mediapipe/framework:calculator_proto",
],
)
cc_library(
name = "add_header_calculator",
srcs = ["add_header_calculator.cc"],
@ -419,6 +429,23 @@ cc_library(
alwayslink = 1,
)
cc_test(
name = "make_pair_calculator_test",
size = "small",
srcs = ["make_pair_calculator_test.cc"],
deps = [
":make_pair_calculator",
"//mediapipe/framework:calculator_framework",
"//mediapipe/framework:calculator_runner",
"//mediapipe/framework:timestamp",
"//mediapipe/framework/port:gtest_main",
"//mediapipe/framework/port:status",
"//mediapipe/framework/tool:validate_type",
"//mediapipe/util:packet_test_util",
"//mediapipe/util:time_series_test_util",
],
)
cc_library(
name = "matrix_multiply_calculator",
srcs = ["matrix_multiply_calculator.cc"],
@ -933,8 +960,8 @@ cc_test(
)
cc_library(
name = "split_normalized_landmark_list_calculator",
srcs = ["split_normalized_landmark_list_calculator.cc"],
name = "split_landmarks_calculator",
srcs = ["split_landmarks_calculator.cc"],
visibility = ["//visibility:public"],
deps = [
":split_vector_calculator_cc_proto",
@ -948,10 +975,10 @@ cc_library(
)
cc_test(
name = "split_normalized_landmark_list_calculator_test",
srcs = ["split_normalized_landmark_list_calculator_test.cc"],
name = "split_landmarks_calculator_test",
srcs = ["split_landmarks_calculator_test.cc"],
deps = [
":split_normalized_landmark_list_calculator",
":split_landmarks_calculator",
":split_vector_calculator_cc_proto",
"//mediapipe/framework:calculator_framework",
"//mediapipe/framework:calculator_runner",
@ -1183,3 +1210,45 @@ cc_test(
"@com_google_absl//absl/strings",
],
)
cc_library(
name = "graph_profile_calculator",
srcs = ["graph_profile_calculator.cc"],
visibility = ["//visibility:public"],
deps = [
":graph_profile_calculator_cc_proto",
"//mediapipe/framework:calculator_framework",
"//mediapipe/framework:calculator_profile_cc_proto",
"//mediapipe/framework/api2:node",
"//mediapipe/framework/api2:packet",
"//mediapipe/framework/api2:port",
"//mediapipe/framework/port:ret_check",
"//mediapipe/framework/port:status",
],
alwayslink = 1,
)
cc_test(
name = "graph_profile_calculator_test",
srcs = ["graph_profile_calculator_test.cc"],
deps = [
":graph_profile_calculator",
"//mediapipe/framework:calculator_cc_proto",
"//mediapipe/framework:calculator_framework",
"//mediapipe/framework:calculator_profile_cc_proto",
"//mediapipe/framework:test_calculators",
"//mediapipe/framework/deps:clock",
"//mediapipe/framework/deps:message_matchers",
"//mediapipe/framework/port:core_proto",
"//mediapipe/framework/port:gtest_main",
"//mediapipe/framework/port:integral_types",
"//mediapipe/framework/port:logging",
"//mediapipe/framework/port:parse_text_proto",
"//mediapipe/framework/port:threadpool",
"//mediapipe/framework/tool:simulation_clock_executor",
"//mediapipe/framework/tool:sink",
"@com_google_absl//absl/status",
"@com_google_absl//absl/strings",
"@com_google_absl//absl/time",
],
)

View File

@ -24,6 +24,9 @@
namespace mediapipe {
constexpr char kDataTag[] = "DATA";
constexpr char kHeaderTag[] = "HEADER";
class AddHeaderCalculatorTest : public ::testing::Test {};
TEST_F(AddHeaderCalculatorTest, HeaderStream) {
@ -36,11 +39,11 @@ TEST_F(AddHeaderCalculatorTest, HeaderStream) {
CalculatorRunner runner(node);
// Set header and add 5 packets.
runner.MutableInputs()->Tag("HEADER").header =
runner.MutableInputs()->Tag(kHeaderTag).header =
Adopt(new std::string("my_header"));
for (int i = 0; i < 5; ++i) {
Packet packet = Adopt(new int(i)).At(Timestamp(i * 1000));
runner.MutableInputs()->Tag("DATA").packets.push_back(packet);
runner.MutableInputs()->Tag(kDataTag).packets.push_back(packet);
}
// Run calculator.
@ -85,13 +88,14 @@ TEST_F(AddHeaderCalculatorTest, NoPacketsOnHeaderStream) {
CalculatorRunner runner(node);
// Set header and add 5 packets.
runner.MutableInputs()->Tag("HEADER").header =
runner.MutableInputs()->Tag(kHeaderTag).header =
Adopt(new std::string("my_header"));
runner.MutableInputs()->Tag("HEADER").packets.push_back(
Adopt(new std::string("not allowed")));
runner.MutableInputs()
->Tag(kHeaderTag)
.packets.push_back(Adopt(new std::string("not allowed")));
for (int i = 0; i < 5; ++i) {
Packet packet = Adopt(new int(i)).At(Timestamp(i * 1000));
runner.MutableInputs()->Tag("DATA").packets.push_back(packet);
runner.MutableInputs()->Tag(kDataTag).packets.push_back(packet);
}
// Run calculator.
@ -108,11 +112,11 @@ TEST_F(AddHeaderCalculatorTest, InputSidePacket) {
CalculatorRunner runner(node);
// Set header and add 5 packets.
runner.MutableSidePackets()->Tag("HEADER") =
runner.MutableSidePackets()->Tag(kHeaderTag) =
Adopt(new std::string("my_header"));
for (int i = 0; i < 5; ++i) {
Packet packet = Adopt(new int(i)).At(Timestamp(i * 1000));
runner.MutableInputs()->Tag("DATA").packets.push_back(packet);
runner.MutableInputs()->Tag(kDataTag).packets.push_back(packet);
}
// Run calculator.
@ -143,13 +147,13 @@ TEST_F(AddHeaderCalculatorTest, UsingBothSideInputAndStream) {
CalculatorRunner runner(node);
// Set both headers and add 5 packets.
runner.MutableSidePackets()->Tag("HEADER") =
runner.MutableSidePackets()->Tag(kHeaderTag) =
Adopt(new std::string("my_header"));
runner.MutableSidePackets()->Tag("HEADER") =
runner.MutableSidePackets()->Tag(kHeaderTag) =
Adopt(new std::string("my_header"));
for (int i = 0; i < 5; ++i) {
Packet packet = Adopt(new int(i)).At(Timestamp(i * 1000));
runner.MutableInputs()->Tag("DATA").packets.push_back(packet);
runner.MutableInputs()->Tag(kDataTag).packets.push_back(packet);
}
// Run should fail because header can only be provided one way.

View File

@ -42,4 +42,9 @@ REGISTER_CALCULATOR(BeginLoopDetectionCalculator);
typedef BeginLoopCalculator<std::vector<Matrix>> BeginLoopMatrixCalculator;
REGISTER_CALCULATOR(BeginLoopMatrixCalculator);
// A calculator to process std::vector<std::vector<Matrix>>.
typedef BeginLoopCalculator<std::vector<std::vector<Matrix>>>
BeginLoopMatrixVectorCalculator;
REGISTER_CALCULATOR(BeginLoopMatrixVectorCalculator);
} // namespace mediapipe

View File

@ -19,6 +19,13 @@
namespace mediapipe {
constexpr char kIncrementTag[] = "INCREMENT";
constexpr char kInitialValueTag[] = "INITIAL_VALUE";
constexpr char kBatchSizeTag[] = "BATCH_SIZE";
constexpr char kErrorCountTag[] = "ERROR_COUNT";
constexpr char kMaxCountTag[] = "MAX_COUNT";
constexpr char kErrorOnOpenTag[] = "ERROR_ON_OPEN";
// Source calculator that produces MAX_COUNT*BATCH_SIZE int packets of
// sequential numbers from INITIAL_VALUE (default 0) with a common
// difference of INCREMENT (default 1) between successive numbers (with
@ -33,53 +40,53 @@ class CountingSourceCalculator : public CalculatorBase {
static absl::Status GetContract(CalculatorContract* cc) {
cc->Outputs().Index(0).Set<int>();
if (cc->InputSidePackets().HasTag("ERROR_ON_OPEN")) {
cc->InputSidePackets().Tag("ERROR_ON_OPEN").Set<bool>();
if (cc->InputSidePackets().HasTag(kErrorOnOpenTag)) {
cc->InputSidePackets().Tag(kErrorOnOpenTag).Set<bool>();
}
RET_CHECK(cc->InputSidePackets().HasTag("MAX_COUNT") ||
cc->InputSidePackets().HasTag("ERROR_COUNT"));
if (cc->InputSidePackets().HasTag("MAX_COUNT")) {
cc->InputSidePackets().Tag("MAX_COUNT").Set<int>();
RET_CHECK(cc->InputSidePackets().HasTag(kMaxCountTag) ||
cc->InputSidePackets().HasTag(kErrorCountTag));
if (cc->InputSidePackets().HasTag(kMaxCountTag)) {
cc->InputSidePackets().Tag(kMaxCountTag).Set<int>();
}
if (cc->InputSidePackets().HasTag("ERROR_COUNT")) {
cc->InputSidePackets().Tag("ERROR_COUNT").Set<int>();
if (cc->InputSidePackets().HasTag(kErrorCountTag)) {
cc->InputSidePackets().Tag(kErrorCountTag).Set<int>();
}
if (cc->InputSidePackets().HasTag("BATCH_SIZE")) {
cc->InputSidePackets().Tag("BATCH_SIZE").Set<int>();
if (cc->InputSidePackets().HasTag(kBatchSizeTag)) {
cc->InputSidePackets().Tag(kBatchSizeTag).Set<int>();
}
if (cc->InputSidePackets().HasTag("INITIAL_VALUE")) {
cc->InputSidePackets().Tag("INITIAL_VALUE").Set<int>();
if (cc->InputSidePackets().HasTag(kInitialValueTag)) {
cc->InputSidePackets().Tag(kInitialValueTag).Set<int>();
}
if (cc->InputSidePackets().HasTag("INCREMENT")) {
cc->InputSidePackets().Tag("INCREMENT").Set<int>();
if (cc->InputSidePackets().HasTag(kIncrementTag)) {
cc->InputSidePackets().Tag(kIncrementTag).Set<int>();
}
return absl::OkStatus();
}
absl::Status Open(CalculatorContext* cc) override {
if (cc->InputSidePackets().HasTag("ERROR_ON_OPEN") &&
cc->InputSidePackets().Tag("ERROR_ON_OPEN").Get<bool>()) {
if (cc->InputSidePackets().HasTag(kErrorOnOpenTag) &&
cc->InputSidePackets().Tag(kErrorOnOpenTag).Get<bool>()) {
return absl::NotFoundError("expected error");
}
if (cc->InputSidePackets().HasTag("ERROR_COUNT")) {
error_count_ = cc->InputSidePackets().Tag("ERROR_COUNT").Get<int>();
if (cc->InputSidePackets().HasTag(kErrorCountTag)) {
error_count_ = cc->InputSidePackets().Tag(kErrorCountTag).Get<int>();
RET_CHECK_LE(0, error_count_);
}
if (cc->InputSidePackets().HasTag("MAX_COUNT")) {
max_count_ = cc->InputSidePackets().Tag("MAX_COUNT").Get<int>();
if (cc->InputSidePackets().HasTag(kMaxCountTag)) {
max_count_ = cc->InputSidePackets().Tag(kMaxCountTag).Get<int>();
RET_CHECK_LE(0, max_count_);
}
if (cc->InputSidePackets().HasTag("BATCH_SIZE")) {
batch_size_ = cc->InputSidePackets().Tag("BATCH_SIZE").Get<int>();
if (cc->InputSidePackets().HasTag(kBatchSizeTag)) {
batch_size_ = cc->InputSidePackets().Tag(kBatchSizeTag).Get<int>();
RET_CHECK_LT(0, batch_size_);
}
if (cc->InputSidePackets().HasTag("INITIAL_VALUE")) {
counter_ = cc->InputSidePackets().Tag("INITIAL_VALUE").Get<int>();
if (cc->InputSidePackets().HasTag(kInitialValueTag)) {
counter_ = cc->InputSidePackets().Tag(kInitialValueTag).Get<int>();
}
if (cc->InputSidePackets().HasTag("INCREMENT")) {
increment_ = cc->InputSidePackets().Tag("INCREMENT").Get<int>();
if (cc->InputSidePackets().HasTag(kIncrementTag)) {
increment_ = cc->InputSidePackets().Tag(kIncrementTag).Get<int>();
RET_CHECK_LT(0, increment_);
}
RET_CHECK(error_count_ >= 0 || max_count_ >= 0);

View File

@ -35,11 +35,14 @@
// }
namespace mediapipe {
constexpr char kFloatVectorTag[] = "FLOAT_VECTOR";
constexpr char kEncodedTag[] = "ENCODED";
class DequantizeByteArrayCalculator : public CalculatorBase {
public:
static absl::Status GetContract(CalculatorContract* cc) {
cc->Inputs().Tag("ENCODED").Set<std::string>();
cc->Outputs().Tag("FLOAT_VECTOR").Set<std::vector<float>>();
cc->Inputs().Tag(kEncodedTag).Set<std::string>();
cc->Outputs().Tag(kFloatVectorTag).Set<std::vector<float>>();
return absl::OkStatus();
}
@ -66,7 +69,7 @@ class DequantizeByteArrayCalculator : public CalculatorBase {
absl::Status Process(CalculatorContext* cc) final {
const std::string& encoded =
cc->Inputs().Tag("ENCODED").Value().Get<std::string>();
cc->Inputs().Tag(kEncodedTag).Value().Get<std::string>();
std::vector<float> float_vector;
float_vector.reserve(encoded.length());
for (int i = 0; i < encoded.length(); ++i) {
@ -74,7 +77,7 @@ class DequantizeByteArrayCalculator : public CalculatorBase {
static_cast<unsigned char>(encoded.at(i)) * scalar_ + bias_);
}
cc->Outputs()
.Tag("FLOAT_VECTOR")
.Tag(kFloatVectorTag)
.AddPacket(MakePacket<std::vector<float>>(float_vector)
.At(cc->InputTimestamp()));
return absl::OkStatus();

View File

@ -25,6 +25,9 @@
namespace mediapipe {
constexpr char kFloatVectorTag[] = "FLOAT_VECTOR";
constexpr char kEncodedTag[] = "ENCODED";
TEST(QuantizeFloatVectorCalculatorTest, WrongConfig) {
CalculatorGraphConfig::Node node_config =
ParseTextProtoOrDie<CalculatorGraphConfig::Node>(R"pb(
@ -39,7 +42,9 @@ TEST(QuantizeFloatVectorCalculatorTest, WrongConfig) {
)pb");
CalculatorRunner runner(node_config);
std::string empty_string;
runner.MutableInputs()->Tag("ENCODED").packets.push_back(
runner.MutableInputs()
->Tag(kEncodedTag)
.packets.push_back(
MakePacket<std::string>(empty_string).At(Timestamp(0)));
auto status = runner.Run();
EXPECT_FALSE(status.ok());
@ -64,7 +69,9 @@ TEST(QuantizeFloatVectorCalculatorTest, WrongConfig2) {
)pb");
CalculatorRunner runner(node_config);
std::string empty_string;
runner.MutableInputs()->Tag("ENCODED").packets.push_back(
runner.MutableInputs()
->Tag(kEncodedTag)
.packets.push_back(
MakePacket<std::string>(empty_string).At(Timestamp(0)));
auto status = runner.Run();
EXPECT_FALSE(status.ok());
@ -89,7 +96,9 @@ TEST(QuantizeFloatVectorCalculatorTest, WrongConfig3) {
)pb");
CalculatorRunner runner(node_config);
std::string empty_string;
runner.MutableInputs()->Tag("ENCODED").packets.push_back(
runner.MutableInputs()
->Tag(kEncodedTag)
.packets.push_back(
MakePacket<std::string>(empty_string).At(Timestamp(0)));
auto status = runner.Run();
EXPECT_FALSE(status.ok());
@ -114,14 +123,16 @@ TEST(DequantizeByteArrayCalculatorTest, TestDequantization) {
)pb");
CalculatorRunner runner(node_config);
unsigned char input[4] = {0x7F, 0xFF, 0x00, 0x01};
runner.MutableInputs()->Tag("ENCODED").packets.push_back(
runner.MutableInputs()
->Tag(kEncodedTag)
.packets.push_back(
MakePacket<std::string>(
std::string(reinterpret_cast<char const*>(input), 4))
.At(Timestamp(0)));
auto status = runner.Run();
MP_ASSERT_OK(runner.Run());
const std::vector<Packet>& outputs =
runner.Outputs().Tag("FLOAT_VECTOR").packets;
runner.Outputs().Tag(kFloatVectorTag).packets;
EXPECT_EQ(1, outputs.size());
const std::vector<float>& result = outputs[0].Get<std::vector<float>>();
ASSERT_FALSE(result.empty());

View File

@ -28,6 +28,10 @@ typedef EndLoopCalculator<std::vector<::mediapipe::NormalizedRect>>
EndLoopNormalizedRectCalculator;
REGISTER_CALCULATOR(EndLoopNormalizedRectCalculator);
typedef EndLoopCalculator<std::vector<::mediapipe::LandmarkList>>
EndLoopLandmarkListVectorCalculator;
REGISTER_CALCULATOR(EndLoopLandmarkListVectorCalculator);
typedef EndLoopCalculator<std::vector<::mediapipe::NormalizedLandmarkList>>
EndLoopNormalizedLandmarkListVectorCalculator;
REGISTER_CALCULATOR(EndLoopNormalizedLandmarkListVectorCalculator);

View File

@ -24,6 +24,11 @@
namespace mediapipe {
constexpr char kFinishedTag[] = "FINISHED";
constexpr char kAllowTag[] = "ALLOW";
constexpr char kMaxInFlightTag[] = "MAX_IN_FLIGHT";
constexpr char kOptionsTag[] = "OPTIONS";
// FlowLimiterCalculator is used to limit the number of frames in flight
// by dropping input frames when necessary.
//
@ -69,16 +74,19 @@ class FlowLimiterCalculator : public CalculatorBase {
public:
static absl::Status GetContract(CalculatorContract* cc) {
auto& side_inputs = cc->InputSidePackets();
side_inputs.Tag("OPTIONS").Set<FlowLimiterCalculatorOptions>().Optional();
cc->Inputs().Tag("OPTIONS").Set<FlowLimiterCalculatorOptions>().Optional();
side_inputs.Tag(kOptionsTag).Set<FlowLimiterCalculatorOptions>().Optional();
cc->Inputs()
.Tag(kOptionsTag)
.Set<FlowLimiterCalculatorOptions>()
.Optional();
RET_CHECK_GE(cc->Inputs().NumEntries(""), 1);
for (int i = 0; i < cc->Inputs().NumEntries(""); ++i) {
cc->Inputs().Get("", i).SetAny();
cc->Outputs().Get("", i).SetSameAs(&(cc->Inputs().Get("", i)));
}
cc->Inputs().Get("FINISHED", 0).SetAny();
cc->InputSidePackets().Tag("MAX_IN_FLIGHT").Set<int>().Optional();
cc->Outputs().Tag("ALLOW").Set<bool>().Optional();
cc->InputSidePackets().Tag(kMaxInFlightTag).Set<int>().Optional();
cc->Outputs().Tag(kAllowTag).Set<bool>().Optional();
cc->SetInputStreamHandler("ImmediateInputStreamHandler");
cc->SetProcessTimestampBounds(true);
return absl::OkStatus();
@ -87,9 +95,9 @@ class FlowLimiterCalculator : public CalculatorBase {
absl::Status Open(CalculatorContext* cc) final {
options_ = cc->Options<FlowLimiterCalculatorOptions>();
options_ = tool::RetrieveOptions(options_, cc->InputSidePackets());
if (cc->InputSidePackets().HasTag("MAX_IN_FLIGHT")) {
if (cc->InputSidePackets().HasTag(kMaxInFlightTag)) {
options_.set_max_in_flight(
cc->InputSidePackets().Tag("MAX_IN_FLIGHT").Get<int>());
cc->InputSidePackets().Tag(kMaxInFlightTag).Get<int>());
}
input_queues_.resize(cc->Inputs().NumEntries(""));
RET_CHECK_OK(CopyInputHeadersToOutputs(cc->Inputs(), &(cc->Outputs())));
@ -104,8 +112,8 @@ class FlowLimiterCalculator : public CalculatorBase {
// Outputs a packet indicating whether a frame was sent or dropped.
void SendAllow(bool allow, Timestamp ts, CalculatorContext* cc) {
if (cc->Outputs().HasTag("ALLOW")) {
cc->Outputs().Tag("ALLOW").AddPacket(MakePacket<bool>(allow).At(ts));
if (cc->Outputs().HasTag(kAllowTag)) {
cc->Outputs().Tag(kAllowTag).AddPacket(MakePacket<bool>(allow).At(ts));
}
}
@ -155,7 +163,7 @@ class FlowLimiterCalculator : public CalculatorBase {
options_ = tool::RetrieveOptions(options_, cc->Inputs());
// Process the FINISHED input stream.
Packet finished_packet = cc->Inputs().Tag("FINISHED").Value();
Packet finished_packet = cc->Inputs().Tag(kFinishedTag).Value();
if (finished_packet.Timestamp() == cc->InputTimestamp()) {
while (!frames_in_flight_.empty() &&
frames_in_flight_.front() <= finished_packet.Timestamp()) {
@ -210,8 +218,8 @@ class FlowLimiterCalculator : public CalculatorBase {
Timestamp bound =
cc->Inputs().Get("", 0).Value().Timestamp().NextAllowedInStream();
SetNextTimestampBound(bound, &cc->Outputs().Get("", 0));
if (cc->Outputs().HasTag("ALLOW")) {
SetNextTimestampBound(bound, &cc->Outputs().Tag("ALLOW"));
if (cc->Outputs().HasTag(kAllowTag)) {
SetNextTimestampBound(bound, &cc->Outputs().Tag(kAllowTag));
}
}

View File

@ -36,6 +36,13 @@
namespace mediapipe {
namespace {
constexpr char kDropTimestampsTag[] = "DROP_TIMESTAMPS";
constexpr char kClockTag[] = "CLOCK";
constexpr char kWarmupTimeTag[] = "WARMUP_TIME";
constexpr char kSleepTimeTag[] = "SLEEP_TIME";
constexpr char kPacketTag[] = "PACKET";
// A simple Semaphore for synchronizing test threads.
class AtomicSemaphore {
public:
@ -204,17 +211,17 @@ TEST_F(FlowLimiterCalculatorSemaphoreTest, FramesDropped) {
class SleepCalculator : public CalculatorBase {
public:
static absl::Status GetContract(CalculatorContract* cc) {
cc->Inputs().Tag("PACKET").SetAny();
cc->Outputs().Tag("PACKET").SetSameAs(&cc->Inputs().Tag("PACKET"));
cc->InputSidePackets().Tag("SLEEP_TIME").Set<int64>();
cc->InputSidePackets().Tag("WARMUP_TIME").Set<int64>();
cc->InputSidePackets().Tag("CLOCK").Set<mediapipe::Clock*>();
cc->Inputs().Tag(kPacketTag).SetAny();
cc->Outputs().Tag(kPacketTag).SetSameAs(&cc->Inputs().Tag(kPacketTag));
cc->InputSidePackets().Tag(kSleepTimeTag).Set<int64>();
cc->InputSidePackets().Tag(kWarmupTimeTag).Set<int64>();
cc->InputSidePackets().Tag(kClockTag).Set<mediapipe::Clock*>();
cc->SetTimestampOffset(0);
return absl::OkStatus();
}
absl::Status Open(CalculatorContext* cc) final {
clock_ = cc->InputSidePackets().Tag("CLOCK").Get<mediapipe::Clock*>();
clock_ = cc->InputSidePackets().Tag(kClockTag).Get<mediapipe::Clock*>();
return absl::OkStatus();
}
@ -222,10 +229,12 @@ class SleepCalculator : public CalculatorBase {
++packet_count;
absl::Duration sleep_time = absl::Microseconds(
packet_count == 1
? cc->InputSidePackets().Tag("WARMUP_TIME").Get<int64>()
: cc->InputSidePackets().Tag("SLEEP_TIME").Get<int64>());
? cc->InputSidePackets().Tag(kWarmupTimeTag).Get<int64>()
: cc->InputSidePackets().Tag(kSleepTimeTag).Get<int64>());
clock_->Sleep(sleep_time);
cc->Outputs().Tag("PACKET").AddPacket(cc->Inputs().Tag("PACKET").Value());
cc->Outputs()
.Tag(kPacketTag)
.AddPacket(cc->Inputs().Tag(kPacketTag).Value());
return absl::OkStatus();
}
@ -240,24 +249,27 @@ REGISTER_CALCULATOR(SleepCalculator);
class DropCalculator : public CalculatorBase {
public:
static absl::Status GetContract(CalculatorContract* cc) {
cc->Inputs().Tag("PACKET").SetAny();
cc->Outputs().Tag("PACKET").SetSameAs(&cc->Inputs().Tag("PACKET"));
cc->InputSidePackets().Tag("DROP_TIMESTAMPS").Set<bool>();
cc->Inputs().Tag(kPacketTag).SetAny();
cc->Outputs().Tag(kPacketTag).SetSameAs(&cc->Inputs().Tag(kPacketTag));
cc->InputSidePackets().Tag(kDropTimestampsTag).Set<bool>();
cc->SetProcessTimestampBounds(true);
return absl::OkStatus();
}
absl::Status Process(CalculatorContext* cc) final {
if (!cc->Inputs().Tag("PACKET").Value().IsEmpty()) {
if (!cc->Inputs().Tag(kPacketTag).Value().IsEmpty()) {
++packet_count;
}
bool drop = (packet_count == 3);
if (!drop && !cc->Inputs().Tag("PACKET").Value().IsEmpty()) {
cc->Outputs().Tag("PACKET").AddPacket(cc->Inputs().Tag("PACKET").Value());
if (!drop && !cc->Inputs().Tag(kPacketTag).Value().IsEmpty()) {
cc->Outputs()
.Tag(kPacketTag)
.AddPacket(cc->Inputs().Tag(kPacketTag).Value());
}
if (!drop || !cc->InputSidePackets().Tag("DROP_TIMESTAMPS").Get<bool>()) {
cc->Outputs().Tag("PACKET").SetNextTimestampBound(
cc->InputTimestamp().NextAllowedInStream());
if (!drop || !cc->InputSidePackets().Tag(kDropTimestampsTag).Get<bool>()) {
cc->Outputs()
.Tag(kPacketTag)
.SetNextTimestampBound(cc->InputTimestamp().NextAllowedInStream());
}
return absl::OkStatus();
}

View File

@ -21,6 +21,11 @@
namespace mediapipe {
namespace {
constexpr char kStateChangeTag[] = "STATE_CHANGE";
constexpr char kDisallowTag[] = "DISALLOW";
constexpr char kAllowTag[] = "ALLOW";
enum GateState {
GATE_UNINITIALIZED,
GATE_ALLOW,
@ -59,8 +64,9 @@ std::string ToString(GateState state) {
// ALLOW or DISALLOW can also be specified as an input side packet. The rules
// for evaluation remain the same as above.
//
// ALLOW/DISALLOW inputs must be specified either using input stream or
// via input side packet but not both.
// ALLOW/DISALLOW inputs must be specified either via an input stream or via an
// input side packet, but not both. If neither is specified, the behavior is
// determined by the "allow" field in the calculator options.
//
// Intended to be used with the default input stream handler, which synchronizes
// all data input streams with the ALLOW/DISALLOW control input stream.
@ -83,30 +89,33 @@ class GateCalculator : public CalculatorBase {
GateCalculator() {}
static absl::Status CheckAndInitAllowDisallowInputs(CalculatorContract* cc) {
bool input_via_side_packet = cc->InputSidePackets().HasTag("ALLOW") ||
cc->InputSidePackets().HasTag("DISALLOW");
bool input_via_side_packet = cc->InputSidePackets().HasTag(kAllowTag) ||
cc->InputSidePackets().HasTag(kDisallowTag);
bool input_via_stream =
cc->Inputs().HasTag("ALLOW") || cc->Inputs().HasTag("DISALLOW");
// Only one of input_side_packet or input_stream may specify ALLOW/DISALLOW
// input.
RET_CHECK(input_via_side_packet ^ input_via_stream);
cc->Inputs().HasTag(kAllowTag) || cc->Inputs().HasTag(kDisallowTag);
// Only one of input_side_packet or input_stream may specify
// ALLOW/DISALLOW input.
if (input_via_side_packet) {
RET_CHECK(cc->InputSidePackets().HasTag("ALLOW") ^
cc->InputSidePackets().HasTag("DISALLOW"));
RET_CHECK(!input_via_stream);
RET_CHECK(cc->InputSidePackets().HasTag(kAllowTag) ^
cc->InputSidePackets().HasTag(kDisallowTag));
if (cc->InputSidePackets().HasTag("ALLOW")) {
cc->InputSidePackets().Tag("ALLOW").Set<bool>();
if (cc->InputSidePackets().HasTag(kAllowTag)) {
cc->InputSidePackets().Tag(kAllowTag).Set<bool>().Optional();
} else {
cc->InputSidePackets().Tag("DISALLOW").Set<bool>();
cc->InputSidePackets().Tag(kDisallowTag).Set<bool>().Optional();
}
} else {
RET_CHECK(cc->Inputs().HasTag("ALLOW") ^ cc->Inputs().HasTag("DISALLOW"));
}
if (input_via_stream) {
RET_CHECK(!input_via_side_packet);
RET_CHECK(cc->Inputs().HasTag(kAllowTag) ^
cc->Inputs().HasTag(kDisallowTag));
if (cc->Inputs().HasTag("ALLOW")) {
cc->Inputs().Tag("ALLOW").Set<bool>();
if (cc->Inputs().HasTag(kAllowTag)) {
cc->Inputs().Tag(kAllowTag).Set<bool>();
} else {
cc->Inputs().Tag("DISALLOW").Set<bool>();
cc->Inputs().Tag(kDisallowTag).Set<bool>();
}
}
return absl::OkStatus();
@ -125,23 +134,22 @@ class GateCalculator : public CalculatorBase {
cc->Outputs().Get("", i).SetSameAs(&cc->Inputs().Get("", i));
}
if (cc->Outputs().HasTag("STATE_CHANGE")) {
cc->Outputs().Tag("STATE_CHANGE").Set<bool>();
if (cc->Outputs().HasTag(kStateChangeTag)) {
cc->Outputs().Tag(kStateChangeTag).Set<bool>();
}
return absl::OkStatus();
}
absl::Status Open(CalculatorContext* cc) final {
use_side_packet_for_allow_disallow_ = false;
if (cc->InputSidePackets().HasTag("ALLOW")) {
if (cc->InputSidePackets().HasTag(kAllowTag)) {
use_side_packet_for_allow_disallow_ = true;
allow_by_side_packet_decision_ =
cc->InputSidePackets().Tag("ALLOW").Get<bool>();
} else if (cc->InputSidePackets().HasTag("DISALLOW")) {
cc->InputSidePackets().Tag(kAllowTag).Get<bool>();
} else if (cc->InputSidePackets().HasTag(kDisallowTag)) {
use_side_packet_for_allow_disallow_ = true;
allow_by_side_packet_decision_ =
!cc->InputSidePackets().Tag("DISALLOW").Get<bool>();
!cc->InputSidePackets().Tag(kDisallowTag).Get<bool>();
}
cc->SetOffset(TimestampDiff(0));
@ -152,26 +160,34 @@ class GateCalculator : public CalculatorBase {
const auto& options = cc->Options<::mediapipe::GateCalculatorOptions>();
empty_packets_as_allow_ = options.empty_packets_as_allow();
if (!use_side_packet_for_allow_disallow_ &&
!cc->Inputs().HasTag(kAllowTag) && !cc->Inputs().HasTag(kDisallowTag)) {
use_option_for_allow_disallow_ = true;
allow_by_option_decision_ = options.allow();
}
return absl::OkStatus();
}
absl::Status Process(CalculatorContext* cc) final {
bool allow = empty_packets_as_allow_;
if (use_side_packet_for_allow_disallow_) {
if (use_option_for_allow_disallow_) {
allow = allow_by_option_decision_;
} else if (use_side_packet_for_allow_disallow_) {
allow = allow_by_side_packet_decision_;
} else {
if (cc->Inputs().HasTag("ALLOW") &&
!cc->Inputs().Tag("ALLOW").IsEmpty()) {
allow = cc->Inputs().Tag("ALLOW").Get<bool>();
if (cc->Inputs().HasTag(kAllowTag) &&
!cc->Inputs().Tag(kAllowTag).IsEmpty()) {
allow = cc->Inputs().Tag(kAllowTag).Get<bool>();
}
if (cc->Inputs().HasTag("DISALLOW") &&
!cc->Inputs().Tag("DISALLOW").IsEmpty()) {
allow = !cc->Inputs().Tag("DISALLOW").Get<bool>();
if (cc->Inputs().HasTag(kDisallowTag) &&
!cc->Inputs().Tag(kDisallowTag).IsEmpty()) {
allow = !cc->Inputs().Tag(kDisallowTag).Get<bool>();
}
}
const GateState new_gate_state = allow ? GATE_ALLOW : GATE_DISALLOW;
if (cc->Outputs().HasTag("STATE_CHANGE")) {
if (cc->Outputs().HasTag(kStateChangeTag)) {
if (last_gate_state_ != GATE_UNINITIALIZED &&
last_gate_state_ != new_gate_state) {
VLOG(2) << "State transition in " << cc->NodeName() << " @ "
@ -179,7 +195,7 @@ class GateCalculator : public CalculatorBase {
<< ToString(last_gate_state_) << " to "
<< ToString(new_gate_state);
cc->Outputs()
.Tag("STATE_CHANGE")
.Tag(kStateChangeTag)
.AddPacket(MakePacket<bool>(allow).At(cc->InputTimestamp()));
}
}
@ -211,8 +227,10 @@ class GateCalculator : public CalculatorBase {
GateState last_gate_state_ = GATE_UNINITIALIZED;
int num_data_streams_;
bool empty_packets_as_allow_;
bool use_side_packet_for_allow_disallow_;
bool use_side_packet_for_allow_disallow_ = false;
bool allow_by_side_packet_decision_;
bool use_option_for_allow_disallow_ = false;
bool allow_by_option_decision_;
};
REGISTER_CALCULATOR(GateCalculator);

View File

@ -29,4 +29,8 @@ message GateCalculatorOptions {
// disallowing the corresponding packets in the data input streams. Setting
// this option to true inverts that, allowing the data packets to go through.
optional bool empty_packets_as_allow = 1;
// Whether to allow or disallow the input streams to pass when no
// ALLOW/DISALLOW input or side input is specified.
optional bool allow = 2 [default = false];
}

View File

@ -22,6 +22,9 @@ namespace mediapipe {
namespace {
constexpr char kDisallowTag[] = "DISALLOW";
constexpr char kAllowTag[] = "ALLOW";
class GateCalculatorTest : public ::testing::Test {
protected:
// Helper to run a graph and return status.
@ -110,6 +113,68 @@ TEST_F(GateCalculatorTest, InvalidInputs) {
)")));
}
TEST_F(GateCalculatorTest, AllowByALLOWOptionToTrue) {
SetRunner(R"(
calculator: "GateCalculator"
input_stream: "test_input"
output_stream: "test_output"
options: {
[mediapipe.GateCalculatorOptions.ext] {
allow: true
}
}
)");
constexpr int64 kTimestampValue0 = 42;
RunTimeStep(kTimestampValue0, true);
constexpr int64 kTimestampValue1 = 43;
RunTimeStep(kTimestampValue1, false);
const std::vector<Packet>& output = runner()->Outputs().Get("", 0).packets;
ASSERT_EQ(2, output.size());
EXPECT_EQ(kTimestampValue0, output[0].Timestamp().Value());
EXPECT_EQ(kTimestampValue1, output[1].Timestamp().Value());
EXPECT_EQ(true, output[0].Get<bool>());
EXPECT_EQ(false, output[1].Get<bool>());
}
TEST_F(GateCalculatorTest, DisallowByALLOWOptionSetToFalse) {
SetRunner(R"(
calculator: "GateCalculator"
input_stream: "test_input"
output_stream: "test_output"
options: {
[mediapipe.GateCalculatorOptions.ext] {
allow: false
}
}
)");
constexpr int64 kTimestampValue0 = 42;
RunTimeStep(kTimestampValue0, true);
constexpr int64 kTimestampValue1 = 43;
RunTimeStep(kTimestampValue1, false);
const std::vector<Packet>& output = runner()->Outputs().Get("", 0).packets;
ASSERT_EQ(0, output.size());
}
TEST_F(GateCalculatorTest, DisallowByALLOWOptionNotSet) {
SetRunner(R"(
calculator: "GateCalculator"
input_stream: "test_input"
output_stream: "test_output"
)");
constexpr int64 kTimestampValue0 = 42;
RunTimeStep(kTimestampValue0, true);
constexpr int64 kTimestampValue1 = 43;
RunTimeStep(kTimestampValue1, false);
const std::vector<Packet>& output = runner()->Outputs().Get("", 0).packets;
ASSERT_EQ(0, output.size());
}
TEST_F(GateCalculatorTest, AllowByALLOWSidePacketSetToTrue) {
SetRunner(R"(
calculator: "GateCalculator"
@ -117,7 +182,7 @@ TEST_F(GateCalculatorTest, AllowByALLOWSidePacketSetToTrue) {
input_stream: "test_input"
output_stream: "test_output"
)");
runner()->MutableSidePackets()->Tag("ALLOW") = Adopt(new bool(true));
runner()->MutableSidePackets()->Tag(kAllowTag) = Adopt(new bool(true));
constexpr int64 kTimestampValue0 = 42;
RunTimeStep(kTimestampValue0, true);
@ -139,7 +204,7 @@ TEST_F(GateCalculatorTest, AllowByDisallowSidePacketSetToFalse) {
input_stream: "test_input"
output_stream: "test_output"
)");
runner()->MutableSidePackets()->Tag("DISALLOW") = Adopt(new bool(false));
runner()->MutableSidePackets()->Tag(kDisallowTag) = Adopt(new bool(false));
constexpr int64 kTimestampValue0 = 42;
RunTimeStep(kTimestampValue0, true);
@ -161,7 +226,7 @@ TEST_F(GateCalculatorTest, DisallowByALLOWSidePacketSetToFalse) {
input_stream: "test_input"
output_stream: "test_output"
)");
runner()->MutableSidePackets()->Tag("ALLOW") = Adopt(new bool(false));
runner()->MutableSidePackets()->Tag(kAllowTag) = Adopt(new bool(false));
constexpr int64 kTimestampValue0 = 42;
RunTimeStep(kTimestampValue0, true);
@ -179,7 +244,7 @@ TEST_F(GateCalculatorTest, DisallowByDISALLOWSidePacketSetToTrue) {
input_stream: "test_input"
output_stream: "test_output"
)");
runner()->MutableSidePackets()->Tag("DISALLOW") = Adopt(new bool(true));
runner()->MutableSidePackets()->Tag(kDisallowTag) = Adopt(new bool(true));
constexpr int64 kTimestampValue0 = 42;
RunTimeStep(kTimestampValue0, true);

View File

@ -0,0 +1,70 @@
// Copyright 2019 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <memory>
#include "mediapipe/calculators/core/graph_profile_calculator.pb.h"
#include "mediapipe/framework/api2/node.h"
#include "mediapipe/framework/api2/packet.h"
#include "mediapipe/framework/api2/port.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/calculator_profile.pb.h"
#include "mediapipe/framework/port/ret_check.h"
#include "mediapipe/framework/port/status.h"
namespace mediapipe {
namespace api2 {
// This calculator periodically copies the GraphProfile from
// mediapipe::GraphProfiler::CaptureProfile to the "PROFILE" output stream.
//
// Example config:
// node {
// calculator: "GraphProfileCalculator"
// output_stream: "FRAME:any_frame"
// output_stream: "PROFILE:graph_profile"
// }
//
class GraphProfileCalculator : public Node {
public:
static constexpr Input<AnyType>::Multiple kFrameIn{"FRAME"};
static constexpr Output<GraphProfile> kProfileOut{"PROFILE"};
MEDIAPIPE_NODE_CONTRACT(kFrameIn, kProfileOut);
static absl::Status UpdateContract(CalculatorContract* cc) {
return absl::OkStatus();
}
absl::Status Process(CalculatorContext* cc) final {
auto options = cc->Options<::mediapipe::GraphProfileCalculatorOptions>();
if (prev_profile_ts_ == Timestamp::Unset() ||
cc->InputTimestamp() - prev_profile_ts_ >= options.profile_interval()) {
prev_profile_ts_ = cc->InputTimestamp();
GraphProfile result;
MP_RETURN_IF_ERROR(cc->GetProfilingContext()->CaptureProfile(&result));
kProfileOut(cc).Send(result);
}
return absl::OkStatus();
}
private:
Timestamp prev_profile_ts_;
};
MEDIAPIPE_REGISTER_NODE(GraphProfileCalculator);
} // namespace api2
} // namespace mediapipe

View File

@ -0,0 +1,30 @@
// Copyright 2019 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
syntax = "proto2";
package mediapipe;
import "mediapipe/framework/calculator.proto";
option objc_class_prefix = "MediaPipe";
message GraphProfileCalculatorOptions {
extend mediapipe.CalculatorOptions {
optional GraphProfileCalculatorOptions ext = 367481815;
}
// The interval in microseconds between successive reported GraphProfiles.
optional int64 profile_interval = 1 [default = 1000000];
}

View File

@ -0,0 +1,211 @@
// Copyright 2019 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <memory>
#include <string>
#include <vector>
#include "absl/status/status.h"
#include "absl/strings/str_cat.h"
#include "absl/time/time.h"
#include "mediapipe/framework/calculator.pb.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/calculator_profile.pb.h"
#include "mediapipe/framework/deps/clock.h"
#include "mediapipe/framework/deps/message_matchers.h"
#include "mediapipe/framework/port/gmock.h"
#include "mediapipe/framework/port/gtest.h"
#include "mediapipe/framework/port/integral_types.h"
#include "mediapipe/framework/port/logging.h"
#include "mediapipe/framework/port/parse_text_proto.h"
#include "mediapipe/framework/port/proto_ns.h"
#include "mediapipe/framework/port/status_matchers.h"
#include "mediapipe/framework/port/threadpool.h"
#include "mediapipe/framework/tool/simulation_clock_executor.h"
// Tests for GraphProfileCalculator.
using testing::ElementsAre;
namespace mediapipe {
namespace {
constexpr char kClockTag[] = "CLOCK";
using mediapipe::Clock;
// A Calculator with a fixed Process call latency.
class SleepCalculator : public CalculatorBase {
public:
static absl::Status GetContract(CalculatorContract* cc) {
cc->InputSidePackets().Tag(kClockTag).Set<std::shared_ptr<Clock>>();
cc->Inputs().Index(0).SetAny();
cc->Outputs().Index(0).SetSameAs(&cc->Inputs().Index(0));
cc->SetTimestampOffset(TimestampDiff(0));
return absl::OkStatus();
}
absl::Status Open(CalculatorContext* cc) final {
clock_ =
cc->InputSidePackets().Tag(kClockTag).Get<std::shared_ptr<Clock>>();
return absl::OkStatus();
}
absl::Status Process(CalculatorContext* cc) final {
clock_->Sleep(absl::Milliseconds(5));
cc->Outputs().Index(0).AddPacket(cc->Inputs().Index(0).Value());
return absl::OkStatus();
}
std::shared_ptr<::mediapipe::Clock> clock_ = nullptr;
};
REGISTER_CALCULATOR(SleepCalculator);
// Tests showing GraphProfileCalculator reporting GraphProfile output packets.
class GraphProfileCalculatorTest : public ::testing::Test {
protected:
void SetUpProfileGraph() {
ASSERT_TRUE(proto_ns::TextFormat::ParseFromString(R"(
input_stream: "input_packets_0"
node {
calculator: 'SleepCalculator'
input_side_packet: 'CLOCK:sync_clock'
input_stream: 'input_packets_0'
output_stream: 'output_packets_1'
}
node {
calculator: "GraphProfileCalculator"
options: {
[mediapipe.GraphProfileCalculatorOptions.ext]: {
profile_interval: 25000
}
}
input_stream: "FRAME:output_packets_1"
output_stream: "PROFILE:output_packets_0"
}
)",
&graph_config_));
}
static Packet PacketAt(int64 ts) {
return Adopt(new int64(999)).At(Timestamp(ts));
}
static Packet None() { return Packet().At(Timestamp::OneOverPostStream()); }
static bool IsNone(const Packet& packet) {
return packet.Timestamp() == Timestamp::OneOverPostStream();
}
// Return the values of the timestamps of a vector of Packets.
static std::vector<int64> TimestampValues(
const std::vector<Packet>& packets) {
std::vector<int64> result;
for (const Packet& p : packets) {
result.push_back(p.Timestamp().Value());
}
return result;
}
// Runs a CalculatorGraph with a series of packet sets.
// Returns a vector of packets from each graph output stream.
void RunGraph(const std::vector<std::vector<Packet>>& input_sets,
std::vector<Packet>* output_packets) {
// Register output packet observers.
tool::AddVectorSink("output_packets_0", &graph_config_, output_packets);
// Start running the graph.
std::shared_ptr<SimulationClockExecutor> executor(
new SimulationClockExecutor(3 /*num_threads*/));
CalculatorGraph graph;
MP_ASSERT_OK(graph.SetExecutor("", executor));
graph.profiler()->SetClock(executor->GetClock());
MP_ASSERT_OK(graph.Initialize(graph_config_));
executor->GetClock()->ThreadStart();
MP_ASSERT_OK(graph.StartRun({
{"sync_clock",
Adopt(new std::shared_ptr<::mediapipe::Clock>(executor->GetClock()))},
}));
// Send each packet to the graph in the specified order.
for (int t = 0; t < input_sets.size(); t++) {
const std::vector<Packet>& input_set = input_sets[t];
for (int i = 0; i < input_set.size(); i++) {
const Packet& packet = input_set[i];
if (!IsNone(packet)) {
MP_EXPECT_OK(graph.AddPacketToInputStream(
absl::StrCat("input_packets_", i), packet));
}
executor->GetClock()->Sleep(absl::Milliseconds(10));
}
}
MP_ASSERT_OK(graph.CloseAllInputStreams());
executor->GetClock()->Sleep(absl::Milliseconds(100));
executor->GetClock()->ThreadFinish();
MP_ASSERT_OK(graph.WaitUntilDone());
}
CalculatorGraphConfig graph_config_;
};
TEST_F(GraphProfileCalculatorTest, GraphProfile) {
SetUpProfileGraph();
auto profiler_config = graph_config_.mutable_profiler_config();
profiler_config->set_enable_profiler(true);
profiler_config->set_trace_enabled(false);
profiler_config->set_trace_log_disabled(true);
profiler_config->set_enable_stream_latency(true);
profiler_config->set_calculator_filter(".*Calculator");
// Run the graph with a series of packet sets.
std::vector<std::vector<Packet>> input_sets = {
{PacketAt(10000)}, //
{PacketAt(20000)}, //
{PacketAt(30000)}, //
{PacketAt(40000)},
};
std::vector<Packet> output_packets;
RunGraph(input_sets, &output_packets);
// Validate the output packets.
EXPECT_THAT(TimestampValues(output_packets), //
ElementsAre(10000, 40000));
GraphProfile expected_profile =
mediapipe::ParseTextProtoOrDie<GraphProfile>(R"pb(
calculator_profiles {
name: "GraphProfileCalculator"
open_runtime: 0
process_runtime { total: 0 count: 3 }
process_input_latency { total: 15000 count: 3 }
process_output_latency { total: 15000 count: 3 }
input_stream_profiles {
name: "output_packets_1"
back_edge: false
latency { total: 0 count: 3 }
}
}
calculator_profiles {
name: "SleepCalculator"
open_runtime: 0
process_runtime { total: 15000 count: 3 }
process_input_latency { total: 0 count: 3 }
process_output_latency { total: 15000 count: 3 }
input_stream_profiles {
name: "input_packets_0"
back_edge: false
latency { total: 0 count: 3 }
}
})pb");
EXPECT_THAT(output_packets[1].Get<GraphProfile>(),
mediapipe::EqualsProto(expected_profile));
}
} // namespace
} // namespace mediapipe

View File

@ -0,0 +1,70 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/calculator_runner.h"
#include "mediapipe/framework/port/canonical_errors.h"
#include "mediapipe/framework/port/gmock.h"
#include "mediapipe/framework/port/gtest.h"
#include "mediapipe/framework/port/status.h"
#include "mediapipe/framework/port/status_matchers.h"
#include "mediapipe/framework/timestamp.h"
#include "mediapipe/framework/tool/validate_type.h"
#include "mediapipe/util/packet_test_util.h"
#include "mediapipe/util/time_series_test_util.h"
namespace mediapipe {
class MakePairCalculatorTest
: public mediapipe::TimeSeriesCalculatorTest<mediapipe::NoOptions> {
protected:
void SetUp() override {
calculator_name_ = "MakePairCalculator";
num_input_streams_ = 2;
}
};
TEST_F(MakePairCalculatorTest, ProducesExpectedPairs) {
InitializeGraph();
AppendInputPacket(new std::string("first packet"), Timestamp(1),
/* input_index= */ 0);
AppendInputPacket(new std::string("second packet"), Timestamp(5),
/* input_index= */ 0);
AppendInputPacket(new int(10), Timestamp(1), /* input_index= */ 1);
AppendInputPacket(new int(20), Timestamp(5), /* input_index= */ 1);
MP_ASSERT_OK(RunGraph());
EXPECT_THAT(
output().packets,
::testing::ElementsAre(
mediapipe::PacketContainsTimestampAndPayload<
std::pair<Packet, Packet>>(
Timestamp(1),
::testing::Pair(
mediapipe::PacketContainsTimestampAndPayload<std::string>(
Timestamp(1), std::string("first packet")),
mediapipe::PacketContainsTimestampAndPayload<int>(
Timestamp(1), 10))),
mediapipe::PacketContainsTimestampAndPayload<
std::pair<Packet, Packet>>(
Timestamp(5),
::testing::Pair(
mediapipe::PacketContainsTimestampAndPayload<std::string>(
Timestamp(5), std::string("second packet")),
mediapipe::PacketContainsTimestampAndPayload<int>(
Timestamp(5), 20)))));
}
} // namespace mediapipe

View File

@ -29,6 +29,9 @@
namespace mediapipe {
namespace {
constexpr char kMinuendTag[] = "MINUEND";
constexpr char kSubtrahendTag[] = "SUBTRAHEND";
// A 3x4 Matrix of random integers in [0,1000).
const char kMatrixText[] =
"rows: 3\n"
@ -104,12 +107,13 @@ TEST(MatrixSubtractCalculatorTest, SubtractFromInput) {
CalculatorRunner runner(node_config);
Matrix* side_matrix = new Matrix();
MatrixFromTextProto(kMatrixText, side_matrix);
runner.MutableSidePackets()->Tag("SUBTRAHEND") = Adopt(side_matrix);
runner.MutableSidePackets()->Tag(kSubtrahendTag) = Adopt(side_matrix);
Matrix* input_matrix = new Matrix();
MatrixFromTextProto(kMatrixText2, input_matrix);
runner.MutableInputs()->Tag("MINUEND").packets.push_back(
Adopt(input_matrix).At(Timestamp(0)));
runner.MutableInputs()
->Tag(kMinuendTag)
.packets.push_back(Adopt(input_matrix).At(Timestamp(0)));
MP_ASSERT_OK(runner.Run());
EXPECT_EQ(1, runner.Outputs().Index(0).packets.size());
@ -133,12 +137,12 @@ TEST(MatrixSubtractCalculatorTest, SubtractFromSideMatrix) {
CalculatorRunner runner(node_config);
Matrix* side_matrix = new Matrix();
MatrixFromTextProto(kMatrixText, side_matrix);
runner.MutableSidePackets()->Tag("MINUEND") = Adopt(side_matrix);
runner.MutableSidePackets()->Tag(kMinuendTag) = Adopt(side_matrix);
Matrix* input_matrix = new Matrix();
MatrixFromTextProto(kMatrixText2, input_matrix);
runner.MutableInputs()
->Tag("SUBTRAHEND")
->Tag(kSubtrahendTag)
.packets.push_back(Adopt(input_matrix).At(Timestamp(0)));
MP_ASSERT_OK(runner.Run());

View File

@ -17,6 +17,9 @@
namespace mediapipe {
constexpr char kPresenceTag[] = "PRESENCE";
constexpr char kPacketTag[] = "PACKET";
// For each non-empty input packet, emits a single output packet containing the
// boolean value "true"; emits "false" in response to empty packets (a.k.a.
// timestamp bound updates). This can be used to "flag" the presence of an arbitrary packet
@ -58,8 +61,8 @@ namespace mediapipe {
class PacketPresenceCalculator : public CalculatorBase {
public:
static absl::Status GetContract(CalculatorContract* cc) {
cc->Inputs().Tag("PACKET").SetAny();
cc->Outputs().Tag("PRESENCE").Set<bool>();
cc->Inputs().Tag(kPacketTag).SetAny();
cc->Outputs().Tag(kPresenceTag).Set<bool>();
// Process() function is invoked in response to input stream timestamp
// bound updates.
cc->SetProcessTimestampBounds(true);
@ -73,8 +76,8 @@ class PacketPresenceCalculator : public CalculatorBase {
absl::Status Process(CalculatorContext* cc) final {
cc->Outputs()
.Tag("PRESENCE")
.AddPacket(MakePacket<bool>(!cc->Inputs().Tag("PACKET").IsEmpty())
.Tag(kPresenceTag)
.AddPacket(MakePacket<bool>(!cc->Inputs().Tag(kPacketTag).IsEmpty())
.At(cc->InputTimestamp()));
return absl::OkStatus();
}
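// For illustration only, a node using this calculator would typically be wired
// as below; the stream names are hypothetical and simply follow the
// PACKET/PRESENCE tags declared in GetContract() above.
//
// node {
//   calculator: "PacketPresenceCalculator"
//   input_stream: "PACKET:value"
//   output_stream: "PRESENCE:value_presence"
// }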

View File

@ -39,6 +39,11 @@ namespace mediapipe {
REGISTER_CALCULATOR(PacketResamplerCalculator);
namespace {
constexpr char kSeedTag[] = "SEED";
constexpr char kVideoHeaderTag[] = "VIDEO_HEADER";
constexpr char kOptionsTag[] = "OPTIONS";
// Returns a TimestampDiff (assuming microseconds) corresponding to the
// given time in seconds.
TimestampDiff TimestampDiffFromSeconds(double seconds) {
@ -50,16 +55,16 @@ TimestampDiff TimestampDiffFromSeconds(double seconds) {
absl::Status PacketResamplerCalculator::GetContract(CalculatorContract* cc) {
const auto& resampler_options =
cc->Options<PacketResamplerCalculatorOptions>();
if (cc->InputSidePackets().HasTag("OPTIONS")) {
cc->InputSidePackets().Tag("OPTIONS").Set<CalculatorOptions>();
if (cc->InputSidePackets().HasTag(kOptionsTag)) {
cc->InputSidePackets().Tag(kOptionsTag).Set<CalculatorOptions>();
}
CollectionItemId input_data_id = cc->Inputs().GetId("DATA", 0);
if (!input_data_id.IsValid()) {
input_data_id = cc->Inputs().GetId("", 0);
}
cc->Inputs().Get(input_data_id).SetAny();
if (cc->Inputs().HasTag("VIDEO_HEADER")) {
cc->Inputs().Tag("VIDEO_HEADER").Set<VideoHeader>();
if (cc->Inputs().HasTag(kVideoHeaderTag)) {
cc->Inputs().Tag(kVideoHeaderTag).Set<VideoHeader>();
}
CollectionItemId output_data_id = cc->Outputs().GetId("DATA", 0);
@ -67,15 +72,15 @@ absl::Status PacketResamplerCalculator::GetContract(CalculatorContract* cc) {
output_data_id = cc->Outputs().GetId("", 0);
}
cc->Outputs().Get(output_data_id).SetSameAs(&cc->Inputs().Get(input_data_id));
if (cc->Outputs().HasTag("VIDEO_HEADER")) {
cc->Outputs().Tag("VIDEO_HEADER").Set<VideoHeader>();
if (cc->Outputs().HasTag(kVideoHeaderTag)) {
cc->Outputs().Tag(kVideoHeaderTag).Set<VideoHeader>();
}
if (resampler_options.jitter() != 0.0) {
RET_CHECK_GT(resampler_options.jitter(), 0.0);
RET_CHECK_LE(resampler_options.jitter(), 1.0);
RET_CHECK(cc->InputSidePackets().HasTag("SEED"));
cc->InputSidePackets().Tag("SEED").Set<std::string>();
RET_CHECK(cc->InputSidePackets().HasTag(kSeedTag));
cc->InputSidePackets().Tag(kSeedTag).Set<std::string>();
}
return absl::OkStatus();
}
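// A minimal sketch of a resampler node that enables jitter and therefore must
// also supply the "SEED" input side packet required above; the stream and side
// packet names here are hypothetical.
//
// node {
//   calculator: "PacketResamplerCalculator"
//   input_stream: "DATA:input_frames"
//   output_stream: "DATA:sampled_frames"
//   input_side_packet: "SEED:resampler_seed"
//   options {
//     [mediapipe.PacketResamplerCalculatorOptions.ext] {
//       frame_rate: 30
//       jitter: 0.25
//     }
//   }
// }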
@ -143,9 +148,9 @@ absl::Status PacketResamplerCalculator::Open(CalculatorContext* cc) {
absl::Status PacketResamplerCalculator::Process(CalculatorContext* cc) {
if (cc->InputTimestamp() == Timestamp::PreStream() &&
cc->Inputs().UsesTags() && cc->Inputs().HasTag("VIDEO_HEADER") &&
!cc->Inputs().Tag("VIDEO_HEADER").IsEmpty()) {
video_header_ = cc->Inputs().Tag("VIDEO_HEADER").Get<VideoHeader>();
cc->Inputs().UsesTags() && cc->Inputs().HasTag(kVideoHeaderTag) &&
!cc->Inputs().Tag(kVideoHeaderTag).IsEmpty()) {
video_header_ = cc->Inputs().Tag(kVideoHeaderTag).Get<VideoHeader>();
video_header_.frame_rate = frame_rate_;
if (cc->Inputs().Get(input_data_id_).IsEmpty()) {
return absl::OkStatus();
@ -234,7 +239,7 @@ absl::Status LegacyJitterWithReflectionStrategy::Open(CalculatorContext* cc) {
"ignored, because we are adding jitter.";
}
const auto& seed = cc->InputSidePackets().Tag("SEED").Get<std::string>();
const auto& seed = cc->InputSidePackets().Tag(kSeedTag).Get<std::string>();
random_ = CreateSecureRandom(seed);
if (random_ == nullptr) {
return absl::InvalidArgumentError(
@ -357,7 +362,7 @@ absl::Status ReproducibleJitterWithReflectionStrategy::Open(
"ignored, because we are adding jitter.";
}
const auto& seed = cc->InputSidePackets().Tag("SEED").Get<std::string>();
const auto& seed = cc->InputSidePackets().Tag(kSeedTag).Get<std::string>();
random_ = CreateSecureRandom(seed);
if (random_ == nullptr) {
return absl::InvalidArgumentError(
@ -504,7 +509,7 @@ absl::Status JitterWithoutReflectionStrategy::Open(CalculatorContext* cc) {
"ignored, because we are adding jitter.";
}
const auto& seed = cc->InputSidePackets().Tag("SEED").Get<std::string>();
const auto& seed = cc->InputSidePackets().Tag(kSeedTag).Get<std::string>();
random_ = CreateSecureRandom(seed);
if (random_ == nullptr) {
return absl::InvalidArgumentError(
@ -635,9 +640,9 @@ absl::Status NoJitterStrategy::Process(CalculatorContext* cc) {
base_timestamp_ +
TimestampDiffFromSeconds(first_index / calculator_->frame_rate_);
}
if (cc->Outputs().UsesTags() && cc->Outputs().HasTag("VIDEO_HEADER")) {
if (cc->Outputs().UsesTags() && cc->Outputs().HasTag(kVideoHeaderTag)) {
cc->Outputs()
.Tag("VIDEO_HEADER")
.Tag(kVideoHeaderTag)
.Add(new VideoHeader(calculator_->video_header_),
Timestamp::PreStream());
}

View File

@ -32,6 +32,12 @@ namespace mediapipe {
using ::testing::ElementsAre;
namespace {
constexpr char kOptionsTag[] = "OPTIONS";
constexpr char kSeedTag[] = "SEED";
constexpr char kVideoHeaderTag[] = "VIDEO_HEADER";
constexpr char kDataTag[] = "DATA";
// A simple version of CalculatorRunner with built-in convenience
// methods for setting inputs from a vector and checking outputs
// against expected outputs (both timestamps and contents).
@ -464,7 +470,7 @@ TEST(PacketResamplerCalculatorTest, SetVideoHeader) {
)pb"));
for (const int64 ts : {0, 5000, 10010, 15001, 19990}) {
runner.MutableInputs()->Tag("DATA").packets.push_back(
runner.MutableInputs()->Tag(kDataTag).packets.push_back(
Adopt(new std::string(absl::StrCat("Frame #", ts))).At(Timestamp(ts)));
}
VideoHeader video_header_in;
@ -474,16 +480,16 @@ TEST(PacketResamplerCalculatorTest, SetVideoHeader) {
video_header_in.duration = 1.0;
video_header_in.format = ImageFormat::SRGB;
runner.MutableInputs()
->Tag("VIDEO_HEADER")
->Tag(kVideoHeaderTag)
.packets.push_back(
Adopt(new VideoHeader(video_header_in)).At(Timestamp::PreStream()));
MP_ASSERT_OK(runner.Run());
ASSERT_EQ(1, runner.Outputs().Tag("VIDEO_HEADER").packets.size());
ASSERT_EQ(1, runner.Outputs().Tag(kVideoHeaderTag).packets.size());
EXPECT_EQ(Timestamp::PreStream(),
runner.Outputs().Tag("VIDEO_HEADER").packets[0].Timestamp());
runner.Outputs().Tag(kVideoHeaderTag).packets[0].Timestamp());
const VideoHeader& video_header_out =
runner.Outputs().Tag("VIDEO_HEADER").packets[0].Get<VideoHeader>();
runner.Outputs().Tag(kVideoHeaderTag).packets[0].Get<VideoHeader>();
EXPECT_EQ(video_header_in.width, video_header_out.width);
EXPECT_EQ(video_header_in.height, video_header_out.height);
EXPECT_DOUBLE_EQ(50.0, video_header_out.frame_rate);
@ -725,7 +731,7 @@ TEST(PacketResamplerCalculatorTest, OptionsSidePacket) {
[mediapipe.PacketResamplerCalculatorOptions.ext] {
frame_rate: 30
})pb"));
runner.MutableSidePackets()->Tag("OPTIONS") = Adopt(options);
runner.MutableSidePackets()->Tag(kOptionsTag) = Adopt(options);
runner.SetInput({-222, 15000, 32000, 49999, 150000});
MP_ASSERT_OK(runner.Run());
EXPECT_EQ(6, runner.Outputs().Index(0).packets.size());
@ -740,7 +746,7 @@ TEST(PacketResamplerCalculatorTest, OptionsSidePacket) {
frame_rate: 30
base_timestamp: 0
})pb"));
runner.MutableSidePackets()->Tag("OPTIONS") = Adopt(options);
runner.MutableSidePackets()->Tag(kOptionsTag) = Adopt(options);
runner.SetInput({-222, 15000, 32000, 49999, 150000});
MP_ASSERT_OK(runner.Run());

View File

@ -217,6 +217,7 @@ absl::Status PacketThinnerCalculator::Open(CalculatorContext* cc) {
header->format = video_header.format;
header->width = video_header.width;
header->height = video_header.height;
header->duration = video_header.duration;
header->frame_rate = new_frame_rate;
cc->Outputs().Index(0).SetHeader(Adopt(header.release()));
} else {

View File

@ -29,6 +29,8 @@
namespace mediapipe {
namespace {
constexpr char kPeriodTag[] = "PERIOD";
// A simple version of CalculatorRunner with built-in convenience methods for
// setting inputs from a vector and checking outputs against a vector of
// expected outputs.
@ -121,7 +123,7 @@ TEST(PacketThinnerCalculatorTest, ASyncUniformStreamThinningTestBySidePacket) {
SimpleRunner runner(node);
runner.SetInput({2, 4, 6, 8, 10, 12, 14});
runner.MutableSidePackets()->Tag("PERIOD") = MakePacket<int64>(5);
runner.MutableSidePackets()->Tag(kPeriodTag) = MakePacket<int64>(5);
MP_ASSERT_OK(runner.Run());
const std::vector<int64> expected_timestamps = {2, 8, 14};
@ -160,7 +162,7 @@ TEST(PacketThinnerCalculatorTest, SyncUniformStreamThinningTestBySidePacket1) {
SimpleRunner runner(node);
runner.SetInput({2, 4, 6, 8, 10, 12, 14});
runner.MutableSidePackets()->Tag("PERIOD") = MakePacket<int64>(5);
runner.MutableSidePackets()->Tag(kPeriodTag) = MakePacket<int64>(5);
MP_ASSERT_OK(runner.Run());
const std::vector<int64> expected_timestamps = {2, 6, 10, 14};

View File

@ -39,6 +39,8 @@ using ::testing::Pair;
using ::testing::Value;
namespace {
constexpr char kDisallowTag[] = "DISALLOW";
// Returns the timestamp values for a vector of Packets.
// TODO: put this kind of test util in a common place.
std::vector<int64> TimestampValues(const std::vector<Packet>& packets) {
@ -702,14 +704,14 @@ class DroppingGateCalculator : public CalculatorBase {
public:
static absl::Status GetContract(CalculatorContract* cc) {
cc->Inputs().Index(0).SetAny();
cc->Inputs().Tag("DISALLOW").Set<bool>();
cc->Inputs().Tag(kDisallowTag).Set<bool>();
cc->Outputs().Index(0).SetSameAs(&cc->Inputs().Index(0));
return absl::OkStatus();
}
absl::Status Process(CalculatorContext* cc) final {
if (!cc->Inputs().Index(0).IsEmpty() &&
!cc->Inputs().Tag("DISALLOW").Get<bool>()) {
!cc->Inputs().Tag(kDisallowTag).Get<bool>()) {
cc->Outputs().Index(0).AddPacket(cc->Inputs().Index(0).Value());
}
return absl::OkStatus();

View File

@ -41,11 +41,14 @@
// }
namespace mediapipe {
constexpr char kEncodedTag[] = "ENCODED";
constexpr char kFloatVectorTag[] = "FLOAT_VECTOR";
class QuantizeFloatVectorCalculator : public CalculatorBase {
public:
static absl::Status GetContract(CalculatorContract* cc) {
cc->Inputs().Tag("FLOAT_VECTOR").Set<std::vector<float>>();
cc->Outputs().Tag("ENCODED").Set<std::string>();
cc->Inputs().Tag(kFloatVectorTag).Set<std::vector<float>>();
cc->Outputs().Tag(kEncodedTag).Set<std::string>();
return absl::OkStatus();
}
@ -70,7 +73,7 @@ class QuantizeFloatVectorCalculator : public CalculatorBase {
absl::Status Process(CalculatorContext* cc) final {
const std::vector<float>& float_vector =
cc->Inputs().Tag("FLOAT_VECTOR").Value().Get<std::vector<float>>();
cc->Inputs().Tag(kFloatVectorTag).Value().Get<std::vector<float>>();
int feature_size = float_vector.size();
std::string encoded_features;
encoded_features.reserve(feature_size);
@ -86,7 +89,9 @@ class QuantizeFloatVectorCalculator : public CalculatorBase {
(old_value - min_quantized_value_) * (255.0 / range_));
encoded_features += encoded;
}
cc->Outputs().Tag("ENCODED").AddPacket(
cc->Outputs()
.Tag(kEncodedTag)
.AddPacket(
MakePacket<std::string>(encoded_features).At(cc->InputTimestamp()));
return absl::OkStatus();
}
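// Worked example of the quantization formula above, assuming the calculator
// options set the minimum quantized value to -64 and the maximum to 64
// (range_ == 128), as in the accompanying tests:
//   -64.0f -> (  0 * 255 / 128) = 0
//     0.0f -> ( 64 * 255 / 128) = 127.5  -> 127
//    32.0f -> ( 96 * 255 / 128) = 191.25 -> 191
//    64.0f -> (128 * 255 / 128) = 255
// Values outside [-64, 64] are expected to saturate to 0 or 255 (see the
// TestSaturation case in the corresponding test file).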

View File

@ -25,6 +25,9 @@
namespace mediapipe {
constexpr char kEncodedTag[] = "ENCODED";
constexpr char kFloatVectorTag[] = "FLOAT_VECTOR";
TEST(QuantizeFloatVectorCalculatorTest, WrongConfig) {
CalculatorGraphConfig::Node node_config =
ParseTextProtoOrDie<CalculatorGraphConfig::Node>(R"pb(
@ -40,7 +43,7 @@ TEST(QuantizeFloatVectorCalculatorTest, WrongConfig) {
CalculatorRunner runner(node_config);
std::vector<float> empty_vector;
runner.MutableInputs()
->Tag("FLOAT_VECTOR")
->Tag(kFloatVectorTag)
.packets.push_back(
MakePacket<std::vector<float>>(empty_vector).At(Timestamp(0)));
auto status = runner.Run();
@ -67,7 +70,7 @@ TEST(QuantizeFloatVectorCalculatorTest, WrongConfig2) {
CalculatorRunner runner(node_config);
std::vector<float> empty_vector;
runner.MutableInputs()
->Tag("FLOAT_VECTOR")
->Tag(kFloatVectorTag)
.packets.push_back(
MakePacket<std::vector<float>>(empty_vector).At(Timestamp(0)));
auto status = runner.Run();
@ -94,7 +97,7 @@ TEST(QuantizeFloatVectorCalculatorTest, WrongConfig3) {
CalculatorRunner runner(node_config);
std::vector<float> empty_vector;
runner.MutableInputs()
->Tag("FLOAT_VECTOR")
->Tag(kFloatVectorTag)
.packets.push_back(
MakePacket<std::vector<float>>(empty_vector).At(Timestamp(0)));
auto status = runner.Run();
@ -121,11 +124,12 @@ TEST(QuantizeFloatVectorCalculatorTest, TestEmptyVector) {
CalculatorRunner runner(node_config);
std::vector<float> empty_vector;
runner.MutableInputs()
->Tag("FLOAT_VECTOR")
->Tag(kFloatVectorTag)
.packets.push_back(
MakePacket<std::vector<float>>(empty_vector).At(Timestamp(0)));
MP_ASSERT_OK(runner.Run());
const std::vector<Packet>& outputs = runner.Outputs().Tag("ENCODED").packets;
const std::vector<Packet>& outputs =
runner.Outputs().Tag(kEncodedTag).packets;
EXPECT_EQ(1, outputs.size());
EXPECT_TRUE(outputs[0].Get<std::string>().empty());
EXPECT_EQ(Timestamp(0), outputs[0].Timestamp());
@ -147,11 +151,12 @@ TEST(QuantizeFloatVectorCalculatorTest, TestNonEmptyVector) {
CalculatorRunner runner(node_config);
std::vector<float> vector = {0.0f, -64.0f, 64.0f, -32.0f, 32.0f};
runner.MutableInputs()
->Tag("FLOAT_VECTOR")
->Tag(kFloatVectorTag)
.packets.push_back(
MakePacket<std::vector<float>>(vector).At(Timestamp(0)));
MP_ASSERT_OK(runner.Run());
const std::vector<Packet>& outputs = runner.Outputs().Tag("ENCODED").packets;
const std::vector<Packet>& outputs =
runner.Outputs().Tag(kEncodedTag).packets;
EXPECT_EQ(1, outputs.size());
const std::string& result = outputs[0].Get<std::string>();
ASSERT_FALSE(result.empty());
@ -185,11 +190,12 @@ TEST(QuantizeFloatVectorCalculatorTest, TestSaturation) {
CalculatorRunner runner(node_config);
std::vector<float> vector = {-65.0f, 65.0f};
runner.MutableInputs()
->Tag("FLOAT_VECTOR")
->Tag(kFloatVectorTag)
.packets.push_back(
MakePacket<std::vector<float>>(vector).At(Timestamp(0)));
MP_ASSERT_OK(runner.Run());
const std::vector<Packet>& outputs = runner.Outputs().Tag("ENCODED").packets;
const std::vector<Packet>& outputs =
runner.Outputs().Tag(kEncodedTag).packets;
EXPECT_EQ(1, outputs.size());
const std::string& result = outputs[0].Get<std::string>();
ASSERT_FALSE(result.empty());

View File

@ -23,6 +23,9 @@
namespace mediapipe {
constexpr char kAllowTag[] = "ALLOW";
constexpr char kMaxInFlightTag[] = "MAX_IN_FLIGHT";
// RealTimeFlowLimiterCalculator is used to limit the number of pipelined
// processing operations in a section of the graph.
//
@ -86,11 +89,11 @@ class RealTimeFlowLimiterCalculator : public CalculatorBase {
cc->Outputs().Get("", i).SetSameAs(&(cc->Inputs().Get("", i)));
}
cc->Inputs().Get("FINISHED", 0).SetAny();
if (cc->InputSidePackets().HasTag("MAX_IN_FLIGHT")) {
cc->InputSidePackets().Tag("MAX_IN_FLIGHT").Set<int>();
if (cc->InputSidePackets().HasTag(kMaxInFlightTag)) {
cc->InputSidePackets().Tag(kMaxInFlightTag).Set<int>();
}
if (cc->Outputs().HasTag("ALLOW")) {
cc->Outputs().Tag("ALLOW").Set<bool>();
if (cc->Outputs().HasTag(kAllowTag)) {
cc->Outputs().Tag(kAllowTag).Set<bool>();
}
cc->SetInputStreamHandler("ImmediateInputStreamHandler");
@ -101,8 +104,8 @@ class RealTimeFlowLimiterCalculator : public CalculatorBase {
absl::Status Open(CalculatorContext* cc) final {
finished_id_ = cc->Inputs().GetId("FINISHED", 0);
max_in_flight_ = 1;
if (cc->InputSidePackets().HasTag("MAX_IN_FLIGHT")) {
max_in_flight_ = cc->InputSidePackets().Tag("MAX_IN_FLIGHT").Get<int>();
if (cc->InputSidePackets().HasTag(kMaxInFlightTag)) {
max_in_flight_ = cc->InputSidePackets().Tag(kMaxInFlightTag).Get<int>();
}
RET_CHECK_GE(max_in_flight_, 1);
num_in_flight_ = 0;

View File

@ -33,6 +33,9 @@
namespace mediapipe {
namespace {
constexpr char kFinishedTag[] = "FINISHED";
// A simple Semaphore for synchronizing test threads.
class AtomicSemaphore {
public:
@ -112,7 +115,7 @@ TEST(RealTimeFlowLimiterCalculator, BasicTest) {
Timestamp timestamp =
Timestamp((i + 1) * Timestamp::kTimestampUnitsPerSecond);
runner.MutableInputs()
->Tag("FINISHED")
->Tag(kFinishedTag)
.packets.push_back(MakePacket<bool>(true).At(timestamp));
}

View File

@ -22,6 +22,8 @@ namespace mediapipe {
namespace {
constexpr char kPacketOffsetTag[] = "PACKET_OFFSET";
// Adds packets containing integers equal to their original timestamp.
void AddPackets(CalculatorRunner* runner) {
for (int i = 0; i < 10; ++i) {
@ -111,7 +113,7 @@ TEST(SequenceShiftCalculatorTest, SidePacketOffset) {
CalculatorRunner runner(node);
AddPackets(&runner);
runner.MutableSidePackets()->Tag("PACKET_OFFSET") = Adopt(new int(-2));
runner.MutableSidePackets()->Tag(kPacketOffsetTag) = Adopt(new int(-2));
MP_ASSERT_OK(runner.Run());
const std::vector<Packet>& input_packets =
runner.MutableInputs()->Index(0).packets;

View File

@ -12,8 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MEDIAPIPE_CALCULATORS_CORE_SPLIT_NORMALIZED_LANDMARK_LIST_CALCULATOR_H_ // NOLINT
#define MEDIAPIPE_CALCULATORS_CORE_SPLIT_NORMALIZED_LANDMARK_LIST_CALCULATOR_H_ // NOLINT
#ifndef MEDIAPIPE_CALCULATORS_CORE_SPLIT_LANDMARKS_CALCULATOR_H_ // NOLINT
#define MEDIAPIPE_CALCULATORS_CORE_SPLIT_LANDMARKS_CALCULATOR_H_ // NOLINT
#include "mediapipe/calculators/core/split_vector_calculator.pb.h"
#include "mediapipe/framework/calculator_framework.h"
@ -24,29 +24,30 @@
namespace mediapipe {
// Splits an input packet with NormalizedLandmarkList into
// multiple NormalizedLandmarkList output packets using the [begin, end) ranges
// Splits an input packet with LandmarkListType into
// multiple LandmarkListType output packets using the [begin, end) ranges
// specified in SplitVectorCalculatorOptions. If the option "element_only" is
// set to true, all ranges should be of size 1 and all outputs will be elements
// of type NormalizedLandmark. If "element_only" is false, ranges can be
// non-zero in size and all outputs will be of type NormalizedLandmarkList.
// of type LandmarkType. If "element_only" is false, ranges can be of any
// non-zero size and all outputs will be of type LandmarkListType.
// If the option "combine_outputs" is set to true, only one output stream can be
// specified and all ranges of elements will be combined into one
// NormalizedLandmarkList.
class SplitNormalizedLandmarkListCalculator : public CalculatorBase {
// LandmarkListType.
template <typename LandmarkType, typename LandmarkListType>
class SplitLandmarksCalculator : public CalculatorBase {
public:
static absl::Status GetContract(CalculatorContract* cc) {
RET_CHECK(cc->Inputs().NumEntries() == 1);
RET_CHECK(cc->Outputs().NumEntries() != 0);
cc->Inputs().Index(0).Set<NormalizedLandmarkList>();
cc->Inputs().Index(0).Set<LandmarkListType>();
const auto& options =
cc->Options<::mediapipe::SplitVectorCalculatorOptions>();
if (options.combine_outputs()) {
RET_CHECK_EQ(cc->Outputs().NumEntries(), 1);
cc->Outputs().Index(0).Set<NormalizedLandmarkList>();
cc->Outputs().Index(0).Set<LandmarkListType>();
for (int i = 0; i < options.ranges_size() - 1; ++i) {
for (int j = i + 1; j < options.ranges_size(); ++j) {
const auto& range_0 = options.ranges(i);
@ -81,9 +82,9 @@ class SplitNormalizedLandmarkListCalculator : public CalculatorBase {
return absl::InvalidArgumentError(
"Since element_only is true, all ranges should be of size 1.");
}
cc->Outputs().Index(i).Set<NormalizedLandmark>();
cc->Outputs().Index(i).Set<LandmarkType>();
} else {
cc->Outputs().Index(i).Set<NormalizedLandmarkList>();
cc->Outputs().Index(i).Set<LandmarkListType>();
}
}
}
@ -110,40 +111,39 @@ class SplitNormalizedLandmarkListCalculator : public CalculatorBase {
}
absl::Status Process(CalculatorContext* cc) override {
const NormalizedLandmarkList& input =
cc->Inputs().Index(0).Get<NormalizedLandmarkList>();
const LandmarkListType& input =
cc->Inputs().Index(0).Get<LandmarkListType>();
RET_CHECK_GE(input.landmark_size(), max_range_end_)
<< "Max range end " << max_range_end_ << " exceeds landmarks size "
<< input.landmark_size();
if (combine_outputs_) {
NormalizedLandmarkList output;
LandmarkListType output;
for (int i = 0; i < ranges_.size(); ++i) {
for (int j = ranges_[i].first; j < ranges_[i].second; ++j) {
const NormalizedLandmark& input_landmark = input.landmark(j);
const LandmarkType& input_landmark = input.landmark(j);
*output.add_landmark() = input_landmark;
}
}
RET_CHECK_EQ(output.landmark_size(), total_elements_);
cc->Outputs().Index(0).AddPacket(
MakePacket<NormalizedLandmarkList>(output).At(cc->InputTimestamp()));
MakePacket<LandmarkListType>(output).At(cc->InputTimestamp()));
} else {
if (element_only_) {
for (int i = 0; i < ranges_.size(); ++i) {
cc->Outputs().Index(i).AddPacket(
MakePacket<NormalizedLandmark>(input.landmark(ranges_[i].first))
MakePacket<LandmarkType>(input.landmark(ranges_[i].first))
.At(cc->InputTimestamp()));
}
} else {
for (int i = 0; i < ranges_.size(); ++i) {
NormalizedLandmarkList output;
LandmarkListType output;
for (int j = ranges_[i].first; j < ranges_[i].second; ++j) {
const NormalizedLandmark& input_landmark = input.landmark(j);
const LandmarkType& input_landmark = input.landmark(j);
*output.add_landmark() = input_landmark;
}
cc->Outputs().Index(i).AddPacket(
MakePacket<NormalizedLandmarkList>(output).At(
cc->InputTimestamp()));
MakePacket<LandmarkListType>(output).At(cc->InputTimestamp()));
}
}
}
@ -159,9 +159,15 @@ class SplitNormalizedLandmarkListCalculator : public CalculatorBase {
bool combine_outputs_ = false;
};
typedef SplitLandmarksCalculator<NormalizedLandmark, NormalizedLandmarkList>
SplitNormalizedLandmarkListCalculator;
REGISTER_CALCULATOR(SplitNormalizedLandmarkListCalculator);
typedef SplitLandmarksCalculator<Landmark, LandmarkList>
SplitLandmarkListCalculator;
REGISTER_CALCULATOR(SplitLandmarkListCalculator);
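// For illustration, a hypothetical node splitting a NormalizedLandmarkList
// into two output lists using the [begin, end) ranges from
// SplitVectorCalculatorOptions (stream names are placeholders):
//
// node {
//   calculator: "SplitNormalizedLandmarkListCalculator"
//   input_stream: "all_landmarks"
//   output_stream: "first_four_landmarks"
//   output_stream: "remaining_landmarks"
//   options {
//     [mediapipe.SplitVectorCalculatorOptions.ext] {
//       ranges: { begin: 0 end: 4 }
//       ranges: { begin: 4 end: 11 }
//     }
//   }
// }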
} // namespace mediapipe
// NOLINTNEXTLINE
#endif // MEDIAPIPE_CALCULATORS_CORE_SPLIT_NORMALIZED_LANDMARK_LIST_CALCULATOR_H_
#endif // MEDIAPIPE_CALCULATORS_CORE_SPLIT_LANDMARKS_CALCULATOR_H_

View File

@ -80,6 +80,16 @@ mediapipe_proto_library(
],
)
mediapipe_proto_library(
name = "segmentation_smoothing_calculator_proto",
srcs = ["segmentation_smoothing_calculator.proto"],
visibility = ["//visibility:public"],
deps = [
"//mediapipe/framework:calculator_options_proto",
"//mediapipe/framework:calculator_proto",
],
)
cc_library(
name = "color_convert_calculator",
srcs = ["color_convert_calculator.cc"],
@ -602,3 +612,187 @@ cc_test(
"//mediapipe/framework/port:parse_text_proto",
],
)
cc_library(
name = "segmentation_smoothing_calculator",
srcs = ["segmentation_smoothing_calculator.cc"],
visibility = ["//visibility:public"],
deps = [
":segmentation_smoothing_calculator_cc_proto",
"//mediapipe/framework:calculator_options_cc_proto",
"//mediapipe/framework/formats:image_format_cc_proto",
"//mediapipe/framework:calculator_framework",
"//mediapipe/framework/formats:image_frame",
"//mediapipe/framework/formats:image_frame_opencv",
"//mediapipe/framework/formats:image",
"//mediapipe/framework/formats:image_opencv",
"//mediapipe/framework/port:logging",
"//mediapipe/framework/port:opencv_core",
"//mediapipe/framework/port:status",
"//mediapipe/framework/port:vector",
] + select({
"//mediapipe/gpu:disable_gpu": [],
"//conditions:default": [
"//mediapipe/gpu:gl_calculator_helper",
"//mediapipe/gpu:gl_simple_shaders",
"//mediapipe/gpu:gl_quad_renderer",
"//mediapipe/gpu:shader_util",
],
}),
alwayslink = 1,
)
cc_test(
name = "segmentation_smoothing_calculator_test",
srcs = ["segmentation_smoothing_calculator_test.cc"],
deps = [
":image_clone_calculator",
":image_clone_calculator_cc_proto",
":segmentation_smoothing_calculator",
":segmentation_smoothing_calculator_cc_proto",
"//mediapipe/framework:calculator_framework",
"//mediapipe/framework:calculator_runner",
"//mediapipe/framework/deps:file_path",
"//mediapipe/framework/formats:image_frame",
"//mediapipe/framework/formats:image_opencv",
"//mediapipe/framework/port:gtest_main",
"//mediapipe/framework/port:opencv_imgcodecs",
"//mediapipe/framework/port:opencv_imgproc",
"//mediapipe/framework/port:parse_text_proto",
],
)
cc_library(
name = "affine_transformation",
hdrs = ["affine_transformation.h"],
deps = ["@com_google_absl//absl/status:statusor"],
)
cc_library(
name = "affine_transformation_runner_gl",
srcs = ["affine_transformation_runner_gl.cc"],
hdrs = ["affine_transformation_runner_gl.h"],
deps = [
":affine_transformation",
"//mediapipe/framework:calculator_framework",
"//mediapipe/framework/port:ret_check",
"//mediapipe/gpu:gl_calculator_helper",
"//mediapipe/gpu:gl_simple_shaders",
"//mediapipe/gpu:gpu_buffer",
"//mediapipe/gpu:gpu_origin_cc_proto",
"//mediapipe/gpu:shader_util",
"@com_google_absl//absl/memory",
"@com_google_absl//absl/status",
"@com_google_absl//absl/status:statusor",
"@eigen_archive//:eigen3",
],
)
cc_library(
name = "affine_transformation_runner_opencv",
srcs = ["affine_transformation_runner_opencv.cc"],
hdrs = ["affine_transformation_runner_opencv.h"],
deps = [
":affine_transformation",
"//mediapipe/framework:calculator_framework",
"//mediapipe/framework/formats:image_frame",
"//mediapipe/framework/formats:image_frame_opencv",
"//mediapipe/framework/port:opencv_core",
"//mediapipe/framework/port:opencv_imgproc",
"//mediapipe/framework/port:ret_check",
"@com_google_absl//absl/memory",
"@com_google_absl//absl/status:statusor",
"@eigen_archive//:eigen3",
],
)
mediapipe_proto_library(
name = "warp_affine_calculator_proto",
srcs = ["warp_affine_calculator.proto"],
visibility = ["//visibility:public"],
deps = [
"//mediapipe/framework:calculator_options_proto",
"//mediapipe/framework:calculator_proto",
"//mediapipe/gpu:gpu_origin_proto",
],
)
cc_library(
name = "warp_affine_calculator",
srcs = ["warp_affine_calculator.cc"],
hdrs = ["warp_affine_calculator.h"],
visibility = ["//visibility:public"],
deps = [
":affine_transformation",
":affine_transformation_runner_opencv",
":warp_affine_calculator_cc_proto",
"@com_google_absl//absl/status",
"@com_google_absl//absl/status:statusor",
"//mediapipe/framework:calculator_framework",
"//mediapipe/framework/api2:node",
"//mediapipe/framework/api2:port",
"//mediapipe/framework/formats:image",
"//mediapipe/framework/formats:image_frame",
"//mediapipe/framework/port:ret_check",
"//mediapipe/framework/port:status",
] + select({
"//mediapipe/gpu:disable_gpu": [],
"//conditions:default": [
"//mediapipe/gpu:gl_calculator_helper",
"//mediapipe/gpu:gpu_buffer",
":affine_transformation_runner_gl",
],
}),
alwayslink = 1,
)
cc_test(
name = "warp_affine_calculator_test",
srcs = ["warp_affine_calculator_test.cc"],
data = [
"//mediapipe/calculators/tensor:testdata/image_to_tensor/input.jpg",
"//mediapipe/calculators/tensor:testdata/image_to_tensor/large_sub_rect.png",
"//mediapipe/calculators/tensor:testdata/image_to_tensor/large_sub_rect_border_zero.png",
"//mediapipe/calculators/tensor:testdata/image_to_tensor/large_sub_rect_keep_aspect.png",
"//mediapipe/calculators/tensor:testdata/image_to_tensor/large_sub_rect_keep_aspect_border_zero.png",
"//mediapipe/calculators/tensor:testdata/image_to_tensor/large_sub_rect_keep_aspect_with_rotation.png",
"//mediapipe/calculators/tensor:testdata/image_to_tensor/large_sub_rect_keep_aspect_with_rotation_border_zero.png",
"//mediapipe/calculators/tensor:testdata/image_to_tensor/medium_sub_rect_keep_aspect.png",
"//mediapipe/calculators/tensor:testdata/image_to_tensor/medium_sub_rect_keep_aspect_border_zero.png",
"//mediapipe/calculators/tensor:testdata/image_to_tensor/medium_sub_rect_keep_aspect_with_rotation.png",
"//mediapipe/calculators/tensor:testdata/image_to_tensor/medium_sub_rect_keep_aspect_with_rotation_border_zero.png",
"//mediapipe/calculators/tensor:testdata/image_to_tensor/medium_sub_rect_with_rotation.png",
"//mediapipe/calculators/tensor:testdata/image_to_tensor/medium_sub_rect_with_rotation_border_zero.png",
"//mediapipe/calculators/tensor:testdata/image_to_tensor/noop_except_range.png",
],
tags = ["desktop_only_test"],
deps = [
":affine_transformation",
":warp_affine_calculator",
"//mediapipe/calculators/image:image_transformation_calculator",
"//mediapipe/calculators/tensor:image_to_tensor_converter",
"//mediapipe/calculators/tensor:image_to_tensor_utils",
"//mediapipe/calculators/util:from_image_calculator",
"//mediapipe/calculators/util:to_image_calculator",
"//mediapipe/framework:calculator_framework",
"//mediapipe/framework:calculator_runner",
"//mediapipe/framework/deps:file_path",
"//mediapipe/framework/formats:image",
"//mediapipe/framework/formats:image_format_cc_proto",
"//mediapipe/framework/formats:image_frame",
"//mediapipe/framework/formats:image_frame_opencv",
"//mediapipe/framework/formats:rect_cc_proto",
"//mediapipe/framework/formats:tensor",
"//mediapipe/framework/port:gtest_main",
"//mediapipe/framework/port:integral_types",
"//mediapipe/framework/port:opencv_core",
"//mediapipe/framework/port:opencv_imgcodecs",
"//mediapipe/framework/port:opencv_imgproc",
"//mediapipe/framework/port:parse_text_proto",
"//mediapipe/gpu:gpu_buffer_to_image_frame_calculator",
"//mediapipe/gpu:image_frame_to_gpu_buffer_calculator",
"@com_google_absl//absl/flags:flag",
"@com_google_absl//absl/memory",
"@com_google_absl//absl/strings",
],
)

View File

@ -0,0 +1,55 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MEDIAPIPE_CALCULATORS_IMAGE_AFFINE_TRANSFORMATION_H_
#define MEDIAPIPE_CALCULATORS_IMAGE_AFFINE_TRANSFORMATION_H_
#include <array>
#include "absl/status/statusor.h"
namespace mediapipe {
class AffineTransformation {
public:
// Pixel extrapolation method.
// When converting an image to a tensor, it may be necessary to read pixels
// outside the image boundaries. The border mode specifies how such pixels
// are calculated.
enum class BorderMode { kZero, kReplicate };
struct Size {
int width;
int height;
};
template <typename InputT, typename OutputT>
class Runner {
public:
virtual ~Runner() = default;
// Transforms input into output using @matrix as following:
// output(x, y) = input(matrix[0] * x + matrix[1] * y + matrix[3],
// matrix[4] * x + matrix[5] * y + matrix[7])
// where x and y ranges are defined by @output_size.
virtual absl::StatusOr<OutputT> Run(const InputT& input,
const std::array<float, 16>& matrix,
const Size& output_size,
BorderMode border_mode) = 0;
};
};
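// A minimal sketch (hypothetical helper, not referenced elsewhere) of how a
// Runner implementation interprets the row-major 4x4 @matrix documented above:
// for an output pixel (x, y), rows 0 and 1 yield the sampled input coordinate.
inline std::array<float, 2> ExampleAffineSourceCoordinate(
    const std::array<float, 16>& matrix, float x, float y) {
  return {matrix[0] * x + matrix[1] * y + matrix[3],
          matrix[4] * x + matrix[5] * y + matrix[7]};
}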
} // namespace mediapipe
#endif // MEDIAPIPE_CALCULATORS_IMAGE_AFFINE_TRANSFORMATION_H_

View File

@ -0,0 +1,354 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mediapipe/calculators/image/affine_transformation_runner_gl.h"
#include <memory>
#include <optional>
#include "Eigen/Core"
#include "Eigen/Geometry"
#include "Eigen/LU"
#include "absl/memory/memory.h"
#include "absl/status/status.h"
#include "absl/status/statusor.h"
#include "mediapipe/calculators/image/affine_transformation.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/port/ret_check.h"
#include "mediapipe/gpu/gl_calculator_helper.h"
#include "mediapipe/gpu/gl_simple_shaders.h"
#include "mediapipe/gpu/gpu_buffer.h"
#include "mediapipe/gpu/gpu_origin.pb.h"
#include "mediapipe/gpu/shader_util.h"
namespace mediapipe {
namespace {
using mediapipe::GlCalculatorHelper;
using mediapipe::GlhCreateProgram;
using mediapipe::GlTexture;
using mediapipe::GpuBuffer;
using mediapipe::GpuOrigin;
bool IsMatrixVerticalFlipNeeded(GpuOrigin::Mode gpu_origin) {
switch (gpu_origin) {
case GpuOrigin::DEFAULT:
case GpuOrigin::CONVENTIONAL:
#ifdef __APPLE__
return false;
#else
return true;
#endif // __APPLE__
case GpuOrigin::TOP_LEFT:
return false;
}
}
#ifdef __APPLE__
#define GL_CLAMP_TO_BORDER_MAY_BE_SUPPORTED 0
#else
#define GL_CLAMP_TO_BORDER_MAY_BE_SUPPORTED 1
#endif // __APPLE__
bool IsGlClampToBorderSupported(const mediapipe::GlContext& gl_context) {
return gl_context.gl_major_version() > 3 ||
(gl_context.gl_major_version() == 3 &&
gl_context.gl_minor_version() >= 2);
}
constexpr int kAttribVertex = 0;
constexpr int kAttribTexturePosition = 1;
constexpr int kNumAttributes = 2;
class GlTextureWarpAffineRunner
: public AffineTransformation::Runner<GpuBuffer,
std::unique_ptr<GpuBuffer>> {
public:
GlTextureWarpAffineRunner(std::shared_ptr<GlCalculatorHelper> gl_helper,
GpuOrigin::Mode gpu_origin)
: gl_helper_(gl_helper), gpu_origin_(gpu_origin) {}
absl::Status Init() {
return gl_helper_->RunInGlContext([this]() -> absl::Status {
const GLint attr_location[kNumAttributes] = {
kAttribVertex,
kAttribTexturePosition,
};
const GLchar* attr_name[kNumAttributes] = {
"position",
"texture_coordinate",
};
constexpr GLchar kVertShader[] = R"(
in vec4 position;
in mediump vec4 texture_coordinate;
out mediump vec2 sample_coordinate;
uniform mat4 transform_matrix;
void main() {
gl_Position = position;
vec4 tc = transform_matrix * texture_coordinate;
sample_coordinate = tc.xy;
}
)";
constexpr GLchar kFragShader[] = R"(
DEFAULT_PRECISION(mediump, float)
in vec2 sample_coordinate;
uniform sampler2D input_texture;
#ifdef GL_ES
#define fragColor gl_FragColor
#else
out vec4 fragColor;
#endif // defined(GL_ES);
void main() {
vec4 color = texture2D(input_texture, sample_coordinate);
#ifdef CUSTOM_ZERO_BORDER_MODE
float out_of_bounds =
float(sample_coordinate.x < 0.0 || sample_coordinate.x > 1.0 ||
sample_coordinate.y < 0.0 || sample_coordinate.y > 1.0);
color = mix(color, vec4(0.0, 0.0, 0.0, 0.0), out_of_bounds);
#endif // defined(CUSTOM_ZERO_BORDER_MODE)
fragColor = color;
}
)";
// Create program and set parameters.
auto create_fn = [&](const std::string& vs,
const std::string& fs) -> absl::StatusOr<Program> {
GLuint program = 0;
GlhCreateProgram(vs.c_str(), fs.c_str(), kNumAttributes, &attr_name[0],
attr_location, &program);
RET_CHECK(program) << "Problem initializing warp affine program.";
glUseProgram(program);
glUniform1i(glGetUniformLocation(program, "input_texture"), 1);
GLint matrix_id = glGetUniformLocation(program, "transform_matrix");
return Program{.id = program, .matrix_id = matrix_id};
};
const std::string vert_src =
absl::StrCat(mediapipe::kMediaPipeVertexShaderPreamble, kVertShader);
const std::string frag_src = absl::StrCat(
mediapipe::kMediaPipeFragmentShaderPreamble, kFragShader);
ASSIGN_OR_RETURN(program_, create_fn(vert_src, frag_src));
auto create_custom_zero_fn = [&]() -> absl::StatusOr<Program> {
std::string custom_zero_border_mode_def = R"(
#define CUSTOM_ZERO_BORDER_MODE
)";
const std::string frag_custom_zero_src =
absl::StrCat(mediapipe::kMediaPipeFragmentShaderPreamble,
custom_zero_border_mode_def, kFragShader);
return create_fn(vert_src, frag_custom_zero_src);
};
#if GL_CLAMP_TO_BORDER_MAY_BE_SUPPORTED
if (!IsGlClampToBorderSupported(gl_helper_->GetGlContext())) {
ASSIGN_OR_RETURN(program_custom_zero_, create_custom_zero_fn());
}
#else
ASSIGN_OR_RETURN(program_custom_zero_, create_custom_zero_fn());
#endif // GL_CLAMP_TO_BORDER_MAY_BE_SUPPORTED
glGenFramebuffers(1, &framebuffer_);
// vertex storage
glGenBuffers(2, vbo_);
glGenVertexArrays(1, &vao_);
// vbo 0
glBindBuffer(GL_ARRAY_BUFFER, vbo_[0]);
glBufferData(GL_ARRAY_BUFFER, sizeof(mediapipe::kBasicSquareVertices),
mediapipe::kBasicSquareVertices, GL_STATIC_DRAW);
// vbo 1
glBindBuffer(GL_ARRAY_BUFFER, vbo_[1]);
glBufferData(GL_ARRAY_BUFFER, sizeof(mediapipe::kBasicTextureVertices),
mediapipe::kBasicTextureVertices, GL_STATIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);
return absl::OkStatus();
});
}
absl::StatusOr<std::unique_ptr<GpuBuffer>> Run(
const GpuBuffer& input, const std::array<float, 16>& matrix,
const AffineTransformation::Size& size,
AffineTransformation::BorderMode border_mode) override {
std::unique_ptr<GpuBuffer> gpu_buffer;
MP_RETURN_IF_ERROR(
gl_helper_->RunInGlContext([this, &input, &matrix, &size, &border_mode,
&gpu_buffer]() -> absl::Status {
auto input_texture = gl_helper_->CreateSourceTexture(input);
auto output_texture = gl_helper_->CreateDestinationTexture(
size.width, size.height, input.format());
MP_RETURN_IF_ERROR(
RunInternal(input_texture, matrix, border_mode, &output_texture));
gpu_buffer = output_texture.GetFrame<GpuBuffer>();
return absl::OkStatus();
}));
return gpu_buffer;
}
absl::Status RunInternal(const GlTexture& texture,
const std::array<float, 16>& matrix,
AffineTransformation::BorderMode border_mode,
GlTexture* output) {
glDisable(GL_DEPTH_TEST);
glBindFramebuffer(GL_FRAMEBUFFER, framebuffer_);
glViewport(0, 0, output->width(), output->height());
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, output->name());
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D,
output->name(), 0);
glActiveTexture(GL_TEXTURE1);
glBindTexture(texture.target(), texture.name());
// a) Filtering.
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
// b) Clamping.
std::optional<Program> program = program_;
switch (border_mode) {
case AffineTransformation::BorderMode::kReplicate: {
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
break;
}
case AffineTransformation::BorderMode::kZero: {
#if GL_CLAMP_TO_BORDER_MAY_BE_SUPPORTED
if (program_custom_zero_) {
program = program_custom_zero_;
} else {
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_BORDER);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_BORDER);
glTexParameterfv(GL_TEXTURE_2D, GL_TEXTURE_BORDER_COLOR,
std::array<float, 4>{0.0f, 0.0f, 0.0f, 0.0f}.data());
}
#else
RET_CHECK(program_custom_zero_)
<< "Program must have been initialized.";
program = program_custom_zero_;
#endif // GL_CLAMP_TO_BORDER_MAY_BE_SUPPORTED
break;
}
}
glUseProgram(program->id);
Eigen::Matrix<float, 4, 4, Eigen::RowMajor> eigen_mat(matrix.data());
if (IsMatrixVerticalFlipNeeded(gpu_origin_)) {
// @matrix describes an affine transformation with a TOP-LEFT origin, so in
// some cases / on some platforms an extra vertical flip should be done before
// and after.
const Eigen::Matrix<float, 4, 4, Eigen::RowMajor> flip_y(
{{1.0f, 0.0f, 0.0f, 0.0f},
{0.0f, -1.0f, 0.0f, 1.0f},
{0.0f, 0.0f, 1.0f, 0.0f},
{0.0f, 0.0f, 0.0f, 1.0f}});
eigen_mat = flip_y * eigen_mat * flip_y;
}
// If GL context is ES2, then GL_FALSE must be used for 'transpose'
// GLboolean in glUniformMatrix4fv, or else INVALID_VALUE error is reported.
// Hence, transposing the matrix and always passing transposed.
eigen_mat.transposeInPlace();
glUniformMatrix4fv(program->matrix_id, 1, GL_FALSE, eigen_mat.data());
// vao
glBindVertexArray(vao_);
// vbo 0
glBindBuffer(GL_ARRAY_BUFFER, vbo_[0]);
glEnableVertexAttribArray(kAttribVertex);
glVertexAttribPointer(kAttribVertex, 2, GL_FLOAT, 0, 0, nullptr);
// vbo 1
glBindBuffer(GL_ARRAY_BUFFER, vbo_[1]);
glEnableVertexAttribArray(kAttribTexturePosition);
glVertexAttribPointer(kAttribTexturePosition, 2, GL_FLOAT, 0, 0, nullptr);
// draw
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
// Resetting to MediaPipe texture param defaults.
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glDisableVertexAttribArray(kAttribVertex);
glDisableVertexAttribArray(kAttribTexturePosition);
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindVertexArray(0);
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D, 0);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, 0);
return absl::OkStatus();
}
~GlTextureWarpAffineRunner() override {
gl_helper_->RunInGlContext([this]() {
// Release OpenGL resources.
if (framebuffer_ != 0) glDeleteFramebuffers(1, &framebuffer_);
if (program_.id != 0) glDeleteProgram(program_.id);
if (program_custom_zero_ && program_custom_zero_->id != 0) {
glDeleteProgram(program_custom_zero_->id);
}
if (vao_ != 0) glDeleteVertexArrays(1, &vao_);
glDeleteBuffers(2, vbo_);
});
}
private:
struct Program {
GLuint id;
GLint matrix_id;
};
std::shared_ptr<GlCalculatorHelper> gl_helper_;
GpuOrigin::Mode gpu_origin_;
GLuint vao_ = 0;
GLuint vbo_[2] = {0, 0};
Program program_;
std::optional<Program> program_custom_zero_;
GLuint framebuffer_ = 0;
};
#undef GL_CLAMP_TO_BORDER_MAY_BE_SUPPORTED
} // namespace
absl::StatusOr<std::unique_ptr<
AffineTransformation::Runner<GpuBuffer, std::unique_ptr<GpuBuffer>>>>
CreateAffineTransformationGlRunner(
std::shared_ptr<GlCalculatorHelper> gl_helper, GpuOrigin::Mode gpu_origin) {
auto runner =
absl::make_unique<GlTextureWarpAffineRunner>(gl_helper, gpu_origin);
MP_RETURN_IF_ERROR(runner->Init());
return runner;
}
} // namespace mediapipe

View File

@ -0,0 +1,36 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MEDIAPIPE_CALCULATORS_IMAGE_AFFINE_TRANSFORMATION_RUNNER_GL_H_
#define MEDIAPIPE_CALCULATORS_IMAGE_AFFINE_TRANSFORMATION_RUNNER_GL_H_
#include <memory>
#include "absl/status/statusor.h"
#include "mediapipe/calculators/image/affine_transformation.h"
#include "mediapipe/gpu/gl_calculator_helper.h"
#include "mediapipe/gpu/gpu_buffer.h"
#include "mediapipe/gpu/gpu_origin.pb.h"
namespace mediapipe {
absl::StatusOr<std::unique_ptr<AffineTransformation::Runner<
mediapipe::GpuBuffer, std::unique_ptr<mediapipe::GpuBuffer>>>>
CreateAffineTransformationGlRunner(
std::shared_ptr<mediapipe::GlCalculatorHelper> gl_helper,
mediapipe::GpuOrigin::Mode gpu_origin);
} // namespace mediapipe
#endif // MEDIAPIPE_CALCULATORS_IMAGE_AFFINE_TRANSFORMATION_RUNNER_GL_H_

View File

@ -0,0 +1,160 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include "mediapipe/calculators/image/affine_transformation_runner_opencv.h"
#include <memory>
#include "absl/memory/memory.h"
#include "absl/status/statusor.h"
#include "mediapipe/calculators/image/affine_transformation.h"
#include "mediapipe/framework/formats/image_frame.h"
#include "mediapipe/framework/formats/image_frame_opencv.h"
#include "mediapipe/framework/port/opencv_core_inc.h"
#include "mediapipe/framework/port/opencv_imgproc_inc.h"
#include "mediapipe/framework/port/ret_check.h"
namespace mediapipe {
namespace {
cv::BorderTypes GetBorderModeForOpenCv(
AffineTransformation::BorderMode border_mode) {
switch (border_mode) {
case AffineTransformation::BorderMode::kZero:
return cv::BORDER_CONSTANT;
case AffineTransformation::BorderMode::kReplicate:
return cv::BORDER_REPLICATE;
}
}
class OpenCvRunner
: public AffineTransformation::Runner<ImageFrame, ImageFrame> {
public:
absl::StatusOr<ImageFrame> Run(
const ImageFrame& input, const std::array<float, 16>& matrix,
const AffineTransformation::Size& size,
AffineTransformation::BorderMode border_mode) override {
// OpenCV warpAffine works in absolute coordinates, so the transform (which
// accepts and produces relative coordinates) should be adjusted to first
// normalize coordinates and then scale them.
// clang-format off
cv::Matx44f normalize_dst_coordinate({
1.0f / size.width, 0.0f, 0.0f, 0.0f,
0.0f, 1.0f / size.height, 0.0f, 0.0f,
0.0f, 0.0f, 1.0f, 0.0f,
0.0f, 0.0f, 0.0f, 1.0f});
cv::Matx44f scale_src_coordinate({
1.0f * input.Width(), 0.0f, 0.0f, 0.0f,
0.0f, 1.0f * input.Height(), 0.0f, 0.0f,
0.0f, 0.0f, 1.0f, 0.0f,
0.0f, 0.0f, 0.0f, 1.0f});
// clang-format on
cv::Matx44f adjust_dst_coordinate;
cv::Matx44f adjust_src_coordinate;
// TODO: update to always use accurate implementation.
constexpr bool kOpenCvCompatibility = true;
if (kOpenCvCompatibility) {
adjust_dst_coordinate = normalize_dst_coordinate;
adjust_src_coordinate = scale_src_coordinate;
} else {
// To do an accurate affine image transformation and keep the "on-cpu" and
// "on-gpu" calculations aligned, an extra offset is required to select the
// correct pixels.
//
// Each destination pixel corresponds to some pixel region from the source
// image. (In case of downscaling there can be more than one pixel.) The
// offset for x and y is calculated so that the pixel in the middle of the
// region is selected.
//
// For simplicity's sake, let's consider downscaling from 100x50 to 10x10
// without a rotation:
// 1. Each destination pixel corresponds to 10x5 region
// X range: [0, .. , 9]
// Y range: [0, .. , 4]
// 2. Considering we have __discrete__ pixels, the center of the region is
// between (4, 2) and (5, 2) pixels, let's assume it's a "pixel"
// (4.5, 2).
// 3. When using the above as an offset for every pixel selected while
//    downscaling, the resulting pixels are:
// (4.5, 2), (14.5, 2), .. , (94.5, 2)
// (4.5, 7), (14.5, 7), .. , (94.5, 7)
// ..
// (4.5, 47), (14.5, 47), .., (94.5, 47)
// instead of:
// (0, 0), (10, 0), .. , (90, 0)
//    (0, 5), (10, 5), .. , (90, 5)
// ..
// (0, 45), (10, 45), .., (90, 45)
// The latter looks shifted.
//
// Offsets are needed so that the __discrete__ pixel at (0, 0) corresponds to
// the same pixel as a __non-discrete__ pixel at (0.5, 0.5) would. Hence, the
// transformation matrix should shift coordinates by (0.5, 0.5) as the very
// first step.
//
// Due to the above shift, transformed coordinates would be valid for
// float coordinates where pixel (0, 0) spans [0.0, 1.0) x [0.0, 1.0).
// To make it valid for __discrete__ pixels, the transformation matrix should
// shift coordinates by (-0.5f, -0.5f) as the very last step. (E.g. if we
// get (0.5f, 0.5f), then it's the (0, 0) __discrete__ pixel.)
// clang-format off
cv::Matx44f shift_dst({1.0f, 0.0f, 0.0f, 0.5f,
0.0f, 1.0f, 0.0f, 0.5f,
0.0f, 0.0f, 1.0f, 0.0f,
0.0f, 0.0f, 0.0f, 1.0f});
cv::Matx44f shift_src({1.0f, 0.0f, 0.0f, -0.5f,
0.0f, 1.0f, 0.0f, -0.5f,
0.0f, 0.0f, 1.0f, 0.0f,
0.0f, 0.0f, 0.0f, 1.0f});
// clang-format on
adjust_dst_coordinate = normalize_dst_coordinate * shift_dst;
adjust_src_coordinate = shift_src * scale_src_coordinate;
}
cv::Matx44f transform(matrix.data());
cv::Matx44f transform_absolute =
adjust_src_coordinate * transform * adjust_dst_coordinate;
cv::Mat in_mat = formats::MatView(&input);
cv::Mat cv_affine_transform(2, 3, CV_32F);
cv_affine_transform.at<float>(0, 0) = transform_absolute.val[0];
cv_affine_transform.at<float>(0, 1) = transform_absolute.val[1];
cv_affine_transform.at<float>(0, 2) = transform_absolute.val[3];
cv_affine_transform.at<float>(1, 0) = transform_absolute.val[4];
cv_affine_transform.at<float>(1, 1) = transform_absolute.val[5];
cv_affine_transform.at<float>(1, 2) = transform_absolute.val[7];
ImageFrame out_image(input.Format(), size.width, size.height);
cv::Mat out_mat = formats::MatView(&out_image);
cv::warpAffine(in_mat, out_mat, cv_affine_transform,
cv::Size(out_mat.cols, out_mat.rows),
/*flags=*/cv::INTER_LINEAR | cv::WARP_INVERSE_MAP,
GetBorderModeForOpenCv(border_mode));
return out_image;
}
};
} // namespace
absl::StatusOr<
std::unique_ptr<AffineTransformation::Runner<ImageFrame, ImageFrame>>>
CreateAffineTransformationOpenCvRunner() {
return absl::make_unique<OpenCvRunner>();
}
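// A minimal usage sketch (hypothetical, for illustration only): run an
// identity warp over an input ImageFrame with the OpenCV runner created above.
absl::StatusOr<ImageFrame> ExampleIdentityWarp(const ImageFrame& input) {
  auto runner_or = CreateAffineTransformationOpenCvRunner();
  if (!runner_or.ok()) return runner_or.status();
  // Row-major 4x4 identity: each output pixel samples the same relative
  // location in the input.
  const std::array<float, 16> identity = {1, 0, 0, 0,  //
                                          0, 1, 0, 0,  //
                                          0, 0, 1, 0,  //
                                          0, 0, 0, 1};
  return (*runner_or)
      ->Run(input, identity,
            AffineTransformation::Size{input.Width(), input.Height()},
            AffineTransformation::BorderMode::kZero);
}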
} // namespace mediapipe

View File

@ -0,0 +1,32 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MEDIAPIPE_CALCULATORS_IMAGE_AFFINE_TRANSFORMATION_RUNNER_OPENCV_H_
#define MEDIAPIPE_CALCULATORS_IMAGE_AFFINE_TRANSFORMATION_RUNNER_OPENCV_H_
#include <memory>
#include "absl/status/statusor.h"
#include "mediapipe/calculators/image/affine_transformation.h"
#include "mediapipe/framework/formats/image_frame.h"
namespace mediapipe {
absl::StatusOr<
std::unique_ptr<AffineTransformation::Runner<ImageFrame, ImageFrame>>>
CreateAffineTransformationOpenCvRunner();
} // namespace mediapipe
#endif // MEDIAPIPE_CALCULATORS_IMAGE_AFFINE_TRANSFORMATION_RUNNER_OPENCV_H_

View File

@ -240,7 +240,7 @@ absl::Status BilateralFilterCalculator::RenderCpu(CalculatorContext* cc) {
auto input_mat = mediapipe::formats::MatView(&input_frame);
// Only 1 or 3 channel images supported by OpenCV.
if ((input_mat.channels() == 1 || input_mat.channels() == 3)) {
if (!(input_mat.channels() == 1 || input_mat.channels() == 3)) {
return absl::InternalError(
"CPU filtering supports only 1 or 3 channel input images.");
}

View File

@ -36,7 +36,7 @@ using GpuBuffer = mediapipe::GpuBuffer;
// stored on the target storage (CPU vs GPU) specified in the calculator option.
//
// The clone shares ownership of the input pixel data on the existing storage.
// If the target storage is diffrent from the existing one, then the data is
// If the target storage is different from the existing one, then the data is
// further copied there.
//
// Example usage:

View File

@ -102,6 +102,10 @@ mediapipe::ScaleMode_Mode ParseScaleMode(
// IMAGE: ImageFrame representing the input image.
// IMAGE_GPU: GpuBuffer representing the input image.
//
// OUTPUT_DIMENSIONS (optional): The output width and height in pixels as
// pair<int, int>. If set, it will override the corresponding fields in the
// calculator options and input side packet.
//
// ROTATION_DEGREES (optional): The counterclockwise rotation angle in
// degrees. This allows different rotation angles for different frames. It has
// to be a multiple of 90 degrees. If provided, it overrides the
@ -221,6 +225,10 @@ absl::Status ImageTransformationCalculator::GetContract(
}
#endif // !MEDIAPIPE_DISABLE_GPU
if (cc->Inputs().HasTag("OUTPUT_DIMENSIONS")) {
cc->Inputs().Tag("OUTPUT_DIMENSIONS").Set<std::pair<int, int>>();
}
if (cc->Inputs().HasTag("ROTATION_DEGREES")) {
cc->Inputs().Tag("ROTATION_DEGREES").Set<int>();
}
@ -329,6 +337,13 @@ absl::Status ImageTransformationCalculator::Process(CalculatorContext* cc) {
!cc->Inputs().Tag("FLIP_VERTICALLY").IsEmpty()) {
flip_vertically_ = cc->Inputs().Tag("FLIP_VERTICALLY").Get<bool>();
}
if (cc->Inputs().HasTag("OUTPUT_DIMENSIONS") &&
!cc->Inputs().Tag("OUTPUT_DIMENSIONS").IsEmpty()) {
const auto& image_size =
cc->Inputs().Tag("OUTPUT_DIMENSIONS").Get<std::pair<int, int>>();
output_width_ = image_size.first;
output_height_ = image_size.second;
}
if (use_gpu_) {
#if !MEDIAPIPE_DISABLE_GPU

View File

@ -37,6 +37,22 @@ constexpr char kImageFrameTag[] = "IMAGE";
constexpr char kMaskCpuTag[] = "MASK";
constexpr char kGpuBufferTag[] = "IMAGE_GPU";
constexpr char kMaskGpuTag[] = "MASK_GPU";
inline cv::Vec3b Blend(const cv::Vec3b& color1, const cv::Vec3b& color2,
float weight, int invert_mask,
int adjust_with_luminance) {
weight = (1 - invert_mask) * weight + invert_mask * (1.0f - weight);
float luminance =
(1 - adjust_with_luminance) * 1.0f +
adjust_with_luminance *
(color1[0] * 0.299 + color1[1] * 0.587 + color1[2] * 0.114) / 255;
float mix_value = weight * luminance;
return color1 * (1.0 - mix_value) + color2 * mix_value;
}
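// Worked example: with invert_mask == 0 and adjust_with_luminance == 0 the
// mix_value equals the mask weight, so weight 0 keeps color1 untouched and
// weight 1 yields pure color2. With invert_mask == 1 those roles are swapped,
// and with adjust_with_luminance == 1 darker pixels (low luminance) receive
// proportionally less of color2, which helps preserve image texture.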
} // namespace
namespace mediapipe {
@ -44,15 +60,14 @@ namespace mediapipe {
// A calculator to recolor a masked area of an image to a specified color.
//
// A mask image is used to specify where to overlay a user defined color.
// The luminance of the input image is used to adjust the blending weight,
// to help preserve image textures.
//
// Inputs:
// One of the following IMAGE tags:
// IMAGE: An ImageFrame input image, RGB or RGBA.
// IMAGE: An ImageFrame input image in ImageFormat::SRGB.
// IMAGE_GPU: A GpuBuffer input image, RGBA.
// One of the following MASK tags:
// MASK: An ImageFrame input mask, Gray, RGB or RGBA.
// MASK: An ImageFrame input mask in ImageFormat::GRAY8, SRGB, SRGBA, or
// VEC32F1
// MASK_GPU: A GpuBuffer input mask, RGBA.
// Output:
// One of the following IMAGE tags:
@@ -98,10 +113,12 @@ class RecolorCalculator : public CalculatorBase {
void GlRender();
bool initialized_ = false;
std::vector<float> color_;
std::vector<uint8> color_;
mediapipe::RecolorCalculatorOptions::MaskChannel mask_channel_;
bool use_gpu_ = false;
bool invert_mask_ = false;
bool adjust_with_luminance_ = false;
#if !MEDIAPIPE_DISABLE_GPU
mediapipe::GlCalculatorHelper gpu_helper_;
GLuint program_ = 0;
@@ -233,11 +250,15 @@ absl::Status RecolorCalculator::RenderCpu(CalculatorContext* cc) {
}
cv::Mat mask_full;
cv::resize(mask_mat, mask_full, input_mat.size());
const cv::Vec3b recolor = {color_[0], color_[1], color_[2]};
auto output_img = absl::make_unique<ImageFrame>(
input_img.Format(), input_mat.cols, input_mat.rows);
cv::Mat output_mat = mediapipe::formats::MatView(output_img.get());
const int invert_mask = invert_mask_ ? 1 : 0;
const int adjust_with_luminance = adjust_with_luminance_ ? 1 : 0;
// From GPU shader:
/*
vec4 weight = texture2D(mask, sample_coordinate);
@@ -249,18 +270,23 @@ absl::Status RecolorCalculator::RenderCpu(CalculatorContext* cc) {
fragColor = mix(color1, color2, mix_value);
*/
if (mask_img.Format() == ImageFormat::VEC32F1) {
for (int i = 0; i < output_mat.rows; ++i) {
for (int j = 0; j < output_mat.cols; ++j) {
float weight = mask_full.at<uchar>(i, j) * (1.0 / 255.0);
cv::Vec3f color1 = input_mat.at<cv::Vec3b>(i, j);
cv::Vec3f color2 = {color_[0], color_[1], color_[2]};
float luminance =
(color1[0] * 0.299 + color1[1] * 0.587 + color1[2] * 0.114) / 255;
float mix_value = weight * luminance;
cv::Vec3b mix_color = color1 * (1.0 - mix_value) + color2 * mix_value;
output_mat.at<cv::Vec3b>(i, j) = mix_color;
const float weight = mask_full.at<float>(i, j);
output_mat.at<cv::Vec3b>(i, j) =
Blend(input_mat.at<cv::Vec3b>(i, j), recolor, weight, invert_mask,
adjust_with_luminance);
}
}
} else {
for (int i = 0; i < output_mat.rows; ++i) {
for (int j = 0; j < output_mat.cols; ++j) {
const float weight = mask_full.at<uchar>(i, j) * (1.0 / 255.0);
output_mat.at<cv::Vec3b>(i, j) =
Blend(input_mat.at<cv::Vec3b>(i, j), recolor, weight, invert_mask,
adjust_with_luminance);
}
}
}
@@ -385,6 +411,9 @@ absl::Status RecolorCalculator::LoadOptions(CalculatorContext* cc) {
color_.push_back(options.color().g());
color_.push_back(options.color().b());
invert_mask_ = options.invert_mask();
adjust_with_luminance_ = options.adjust_with_luminance();
return absl::OkStatus();
}
@@ -435,13 +464,20 @@ absl::Status RecolorCalculator::InitGpu(CalculatorContext* cc) {
uniform sampler2D frame;
uniform sampler2D mask;
uniform vec3 recolor;
uniform float invert_mask;
uniform float adjust_with_luminance;
void main() {
vec4 weight = texture2D(mask, sample_coordinate);
vec4 color1 = texture2D(frame, sample_coordinate);
vec4 color2 = vec4(recolor, 1.0);
float luminance = dot(color1.rgb, vec3(0.299, 0.587, 0.114));
weight = mix(weight, 1.0 - weight, invert_mask);
float luminance = mix(1.0,
dot(color1.rgb, vec3(0.299, 0.587, 0.114)),
adjust_with_luminance);
float mix_value = weight.MASK_COMPONENT * luminance;
fragColor = mix(color1, color2, mix_value);
@@ -458,6 +494,10 @@ absl::Status RecolorCalculator::InitGpu(CalculatorContext* cc) {
glUniform1i(glGetUniformLocation(program_, "mask"), 2);
glUniform3f(glGetUniformLocation(program_, "recolor"), color_[0] / 255.0,
color_[1] / 255.0, color_[2] / 255.0);
glUniform1f(glGetUniformLocation(program_, "invert_mask"),
invert_mask_ ? 1.0f : 0.0f);
glUniform1f(glGetUniformLocation(program_, "adjust_with_luminance"),
adjust_with_luminance_ ? 1.0f : 0.0f);
#endif // !MEDIAPIPE_DISABLE_GPU
return absl::OkStatus();
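
Editor's illustration, not part of this commit, mirroring the Blend() helper and the updated shader above: invert_mask flips the mask weight, and adjust_with_luminance scales it by the pixel's BT.601 luminance so that darker pixels keep more of their original color. The helper name and sample values are illustrative only.

#include <cstdio>

// weight is the mask value in [0, 1]; luminance_norm is the BT.601 luminance
// of the input pixel, also normalized to [0, 1].
float MixValue(float weight, float luminance_norm, bool invert_mask,
               bool adjust_with_luminance) {
  if (invert_mask) weight = 1.0f - weight;
  const float luminance = adjust_with_luminance ? luminance_norm : 1.0f;
  return weight * luminance;  // fraction of the recolor color mixed in
}

int main() {
  // Mid-gray pixel (luminance ~0.5) under a fully "on" mask (weight = 1).
  std::printf("default blend:    %.2f\n", MixValue(1.0f, 0.5f, false, true));
  std::printf("ignore luminance: %.2f\n", MixValue(1.0f, 0.5f, false, false));
  std::printf("inverted mask:    %.2f\n", MixValue(1.0f, 0.5f, true, true));
  return 0;
}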

View File

@@ -36,4 +36,11 @@ message RecolorCalculatorOptions {
// Color to blend into input image where mask is > 0.
// The blending is based on the input image luminosity.
optional Color color = 2;
// Swap the meaning of mask values for foreground/background.
optional bool invert_mask = 3 [default = false];
// Whether to use the luminance of the input image to further adjust the
// blending weight, to help preserve image textures.
optional bool adjust_with_luminance = 4 [default = true];
}
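
Editor's sketch of populating the two new fields from C++ through the generated proto setters; the include path is assumed to follow the calculator's location, and the color choice is arbitrary.

#include "mediapipe/calculators/image/recolor_calculator.pb.h"

mediapipe::RecolorCalculatorOptions MakeRecolorOptions() {
  mediapipe::RecolorCalculatorOptions options;
  options.mutable_color()->set_r(0);
  options.mutable_color()->set_g(255);
  options.mutable_color()->set_b(0);
  // Recolor where the mask is 0 instead of where it is 1.
  options.set_invert_mask(true);
  // Use a flat blend weight instead of scaling by the image luminance.
  options.set_adjust_with_luminance(false);
  return options;
}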

View File

@@ -262,6 +262,7 @@ absl::Status ScaleImageCalculator::InitializeFrameInfo(CalculatorContext* cc) {
scale_image::FindOutputDimensions(crop_width_, crop_height_, //
options_.target_width(), //
options_.target_height(), //
options_.target_max_area(), //
options_.preserve_aspect_ratio(), //
options_.scale_to_multiple_of(), //
&output_width_, &output_height_));

View File

@@ -28,6 +28,11 @@ message ScaleImageCalculatorOptions {
optional int32 target_width = 1;
optional int32 target_height = 2;
// If set, a target_width and target_height are computed automatically so that
// the output area (width * height) stays below the target max area. Aspect
// ratio preservation cannot be disabled.
optional int32 target_max_area = 15;
// If true, the image is scaled up or down proportionally so that it
// fits inside the box represented by target_width and target_height.
// Otherwise it is scaled to fit target_width and target_height

View File

@@ -92,12 +92,21 @@ absl::Status FindOutputDimensions(int input_width, //
int input_height, //
int target_width, //
int target_height, //
int target_max_area, //
bool preserve_aspect_ratio, //
int scale_to_multiple_of, //
int* output_width, int* output_height) {
CHECK(output_width);
CHECK(output_height);
if (target_max_area > 0 && input_width * input_height > target_max_area) {
preserve_aspect_ratio = true;
target_height = static_cast<int>(sqrt(static_cast<double>(target_max_area) /
(static_cast<double>(input_width) /
static_cast<double>(input_height))));
target_width = -1; // Resize width to preserve aspect ratio.
}
if (preserve_aspect_ratio) {
RET_CHECK(scale_to_multiple_of == 2)
<< "FindOutputDimensions always outputs width and height that are "
@@ -164,5 +173,17 @@ absl::Status FindOutputDimensions(int input_width, //
<< "Unable to set output dimensions based on target dimensions.";
}
absl::Status FindOutputDimensions(int input_width, //
int input_height, //
int target_width, //
int target_height, //
bool preserve_aspect_ratio, //
int scale_to_multiple_of, //
int* output_width, int* output_height) {
return FindOutputDimensions(
input_width, input_height, target_width, target_height, -1,
preserve_aspect_ratio, scale_to_multiple_of, output_width, output_height);
}
} // namespace scale_image
} // namespace mediapipe
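
Editor's worked example of the target_max_area branch added above, with illustrative numbers: a 1920x1080 input capped at 500000 pixels yields an aspect-preserving target of roughly 942x530 (about 499k pixels). In the calculator the target width is left at -1 and recomputed downstream; the sketch computes it inline only to show the resulting size.

#include <cmath>
#include <cstdio>

int main() {
  const int input_width = 1920, input_height = 1080;
  const int target_max_area = 500000;  // pixels

  if (target_max_area > 0 && input_width * input_height > target_max_area) {
    const double aspect = static_cast<double>(input_width) /
                          static_cast<double>(input_height);
    // Same derivation as FindOutputDimensions: height from the area cap...
    const int target_height =
        static_cast<int>(std::sqrt(target_max_area / aspect));
    // ...and width from the preserved aspect ratio.
    const int target_width = static_cast<int>(target_height * aspect);
    std::printf("%d x %d capped at %d px -> about %d x %d\n", input_width,
                input_height, target_max_area, target_width, target_height);
  }
  return 0;
}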

View File

@@ -34,15 +34,25 @@ absl::Status FindCropDimensions(int input_width, int input_height, //
int* crop_width, int* crop_height, //
int* col_start, int* row_start);
// Given an input width and height, a target width and height, whether to
// preserve the aspect ratio, and whether to round-down to the multiple of a
// given number nearest to the targets, determine the output width and height.
// If target_width or target_height is non-positive, then they will be set to
// the input_width and input_height respectively. If scale_to_multiple_of is
// less than 1, it will be treated like 1. The output_width and
// output_height will be reduced as necessary to preserve_aspect_ratio if the
// option is specified. If preserving the aspect ratio is desired, you must set
// scale_to_multiple_of to 2.
// Given an input width and height, a target width and height or max area,
// whether to preserve the aspect ratio, and whether to round-down to the
// multiple of a given number nearest to the targets, determine the output width
// and height. If target_width or target_height is non-positive, then they will
// be set to the input_width and input_height respectively. If target_max_area
// is non-positive, it will be ignored. If scale_to_multiple_of is less than
// 1, it will be treated like 1. The output_width and output_height will be
// reduced as necessary to preserve_aspect_ratio if the option is specified. If
// preserving the aspect ratio is desired, you must set scale_to_multiple_of
// to 2.
absl::Status FindOutputDimensions(int input_width, int input_height, //
int target_width,
int target_height, //
int target_max_area, //
bool preserve_aspect_ratio, //
int scale_to_multiple_of, //
int* output_width, int* output_height);
// Backwards compatible helper.
absl::Status FindOutputDimensions(int input_width, int input_height, //
int target_width,
int target_height, //

View File

@@ -79,49 +79,49 @@ TEST(ScaleImageUtilsTest, FindOutputDimensionsPreserveRatio) {
int output_width;
int output_height;
// Not scale.
MP_ASSERT_OK(FindOutputDimensions(200, 100, -1, -1, true, 2, &output_width,
&output_height));
MP_ASSERT_OK(FindOutputDimensions(200, 100, -1, -1, -1, true, 2,
&output_width, &output_height));
EXPECT_EQ(200, output_width);
EXPECT_EQ(100, output_height);
// Not scale with odd input size.
MP_ASSERT_OK(FindOutputDimensions(201, 101, -1, -1, false, 1, &output_width,
&output_height));
MP_ASSERT_OK(FindOutputDimensions(201, 101, -1, -1, -1, false, 1,
&output_width, &output_height));
EXPECT_EQ(201, output_width);
EXPECT_EQ(101, output_height);
// Scale down by 1/2.
MP_ASSERT_OK(FindOutputDimensions(200, 100, 100, -1, true, 2, &output_width,
&output_height));
MP_ASSERT_OK(FindOutputDimensions(200, 100, 100, -1, -1, true, 2,
&output_width, &output_height));
EXPECT_EQ(100, output_width);
EXPECT_EQ(50, output_height);
// Scale up, doubling dimensions.
MP_ASSERT_OK(FindOutputDimensions(200, 100, -1, 200, true, 2, &output_width,
&output_height));
MP_ASSERT_OK(FindOutputDimensions(200, 100, -1, 200, -1, true, 2,
&output_width, &output_height));
EXPECT_EQ(400, output_width);
EXPECT_EQ(200, output_height);
// Fits a 2:1 image into a 150 x 150 box. Output dimensions are always
// divisible by 2.
MP_ASSERT_OK(FindOutputDimensions(200, 100, 150, 150, true, 2, &output_width,
&output_height));
MP_ASSERT_OK(FindOutputDimensions(200, 100, 150, 150, -1, true, 2,
&output_width, &output_height));
EXPECT_EQ(150, output_width);
EXPECT_EQ(74, output_height);
// Fits a 2:1 image into a 400 x 50 box.
MP_ASSERT_OK(FindOutputDimensions(200, 100, 400, 50, true, 2, &output_width,
&output_height));
MP_ASSERT_OK(FindOutputDimensions(200, 100, 400, 50, -1, true, 2,
&output_width, &output_height));
EXPECT_EQ(100, output_width);
EXPECT_EQ(50, output_height);
// Scale to a multiple of 2 with an odd target size.
MP_ASSERT_OK(FindOutputDimensions(200, 100, 101, -1, true, 2, &output_width,
&output_height));
MP_ASSERT_OK(FindOutputDimensions(200, 100, 101, -1, -1, true, 2,
&output_width, &output_height));
EXPECT_EQ(100, output_width);
EXPECT_EQ(50, output_height);
// Scale to a multiple of 2 with an odd target size.
MP_ASSERT_OK(FindOutputDimensions(200, 100, 101, -1, true, 2, &output_width,
&output_height));
MP_ASSERT_OK(FindOutputDimensions(200, 100, 101, -1, -1, true, 2,
&output_width, &output_height));
EXPECT_EQ(100, output_width);
EXPECT_EQ(50, output_height);
// Scale to odd size.
MP_ASSERT_OK(FindOutputDimensions(200, 100, 151, 101, false, 1, &output_width,
&output_height));
MP_ASSERT_OK(FindOutputDimensions(200, 100, 151, 101, -1, false, 1,
&output_width, &output_height));
EXPECT_EQ(151, output_width);
EXPECT_EQ(101, output_height);
}
@@ -131,18 +131,18 @@ TEST(ScaleImageUtilsTest, FindOutputDimensionsNoAspectRatio) {
int output_width;
int output_height;
// Scale width only.
MP_ASSERT_OK(FindOutputDimensions(200, 100, 100, -1, false, 2, &output_width,
&output_height));
MP_ASSERT_OK(FindOutputDimensions(200, 100, 100, -1, -1, false, 2,
&output_width, &output_height));
EXPECT_EQ(100, output_width);
EXPECT_EQ(100, output_height);
// Scale height only.
MP_ASSERT_OK(FindOutputDimensions(200, 100, -1, 200, false, 2, &output_width,
&output_height));
MP_ASSERT_OK(FindOutputDimensions(200, 100, -1, 200, -1, false, 2,
&output_width, &output_height));
EXPECT_EQ(200, output_width);
EXPECT_EQ(200, output_height);
// Scale both dimensions.
MP_ASSERT_OK(FindOutputDimensions(200, 100, 150, 200, false, 2, &output_width,
&output_height));
MP_ASSERT_OK(FindOutputDimensions(200, 100, 150, 200, -1, false, 2,
&output_width, &output_height));
EXPECT_EQ(150, output_width);
EXPECT_EQ(200, output_height);
}
@@ -152,41 +152,78 @@ TEST(ScaleImageUtilsTest, FindOutputDimensionsDownScaleToMultipleOf) {
int output_width;
int output_height;
// Set no targets, downscale to a multiple of 8.
MP_ASSERT_OK(FindOutputDimensions(100, 100, -1, -1, false, 8, &output_width,
&output_height));
MP_ASSERT_OK(FindOutputDimensions(100, 100, -1, -1, -1, false, 8,
&output_width, &output_height));
EXPECT_EQ(96, output_width);
EXPECT_EQ(96, output_height);
// Set width target, downscale to a multiple of 8.
MP_ASSERT_OK(FindOutputDimensions(200, 100, 100, -1, false, 8, &output_width,
&output_height));
MP_ASSERT_OK(FindOutputDimensions(200, 100, 100, -1, -1, false, 8,
&output_width, &output_height));
EXPECT_EQ(96, output_width);
EXPECT_EQ(96, output_height);
// Set height target, downscale to a multiple of 8.
MP_ASSERT_OK(FindOutputDimensions(201, 101, -1, 201, false, 8, &output_width,
&output_height));
MP_ASSERT_OK(FindOutputDimensions(201, 101, -1, 201, -1, false, 8,
&output_width, &output_height));
EXPECT_EQ(200, output_width);
EXPECT_EQ(200, output_height);
// Set both targets, downscale to a multiple of 8.
MP_ASSERT_OK(FindOutputDimensions(200, 100, 150, 200, false, 8, &output_width,
&output_height));
MP_ASSERT_OK(FindOutputDimensions(200, 100, 150, 200, -1, false, 8,
&output_width, &output_height));
EXPECT_EQ(144, output_width);
EXPECT_EQ(200, output_height);
// Doesn't throw error if keep aspect is true and downscale multiple is 2.
MP_ASSERT_OK(FindOutputDimensions(200, 100, 400, 200, true, 2, &output_width,
&output_height));
MP_ASSERT_OK(FindOutputDimensions(200, 100, 400, 200, -1, true, 2,
&output_width, &output_height));
EXPECT_EQ(400, output_width);
EXPECT_EQ(200, output_height);
// Throws error if keep aspect is true, but downscale multiple is not 2.
ASSERT_THAT(FindOutputDimensions(200, 100, 400, 200, true, 4, &output_width,
&output_height),
ASSERT_THAT(FindOutputDimensions(200, 100, 400, 200, -1, true, 4,
&output_width, &output_height),
testing::Not(testing::status::IsOk()));
// Downscaling to multiple ignored if multiple is less than 2.
MP_ASSERT_OK(FindOutputDimensions(200, 100, 401, 201, false, 1, &output_width,
&output_height));
MP_ASSERT_OK(FindOutputDimensions(200, 100, 401, 201, -1, false, 1,
&output_width, &output_height));
EXPECT_EQ(401, output_width);
EXPECT_EQ(201, output_height);
}
// Tests scaling to a target maximum area.
TEST(ScaleImageUtilsTest, FindOutputDimensionsMaxArea) {
int output_width;
int output_height;
// Smaller area.
MP_ASSERT_OK(FindOutputDimensions(200, 100, -1, -1, 9000, false, 2,
&output_width, &output_height));
EXPECT_NEAR(
200 / 100,
static_cast<double>(output_width) / static_cast<double>(output_height),
0.1f);
EXPECT_LE(output_width * output_height, 9000);
// Close to original area.
MP_ASSERT_OK(FindOutputDimensions(200, 100, -1, -1, 19999, false, 2,
&output_width, &output_height));
EXPECT_NEAR(
200.0 / 100.0,
static_cast<double>(output_width) / static_cast<double>(output_height),
0.1f);
EXPECT_LE(output_width * output_height, 19999);
// Don't scale with larger area.
MP_ASSERT_OK(FindOutputDimensions(200, 100, -1, -1, 20001, false, 2,
&output_width, &output_height));
EXPECT_EQ(200, output_width);
EXPECT_EQ(100, output_height);
// Don't scale with equal area.
MP_ASSERT_OK(FindOutputDimensions(200, 100, -1, -1, 20000, false, 2,
&output_width, &output_height));
EXPECT_EQ(200, output_width);
EXPECT_EQ(100, output_height);
// Don't scale at all.
MP_ASSERT_OK(FindOutputDimensions(200, 100, -1, -1, -1, false, 2,
&output_width, &output_height));
EXPECT_EQ(200, output_width);
EXPECT_EQ(100, output_height);
}
} // namespace
} // namespace scale_image
} // namespace mediapipe

View File

@@ -0,0 +1,429 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <algorithm>
#include <memory>
#include "mediapipe/calculators/image/segmentation_smoothing_calculator.pb.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/calculator_options.pb.h"
#include "mediapipe/framework/formats/image.h"
#include "mediapipe/framework/formats/image_format.pb.h"
#include "mediapipe/framework/formats/image_frame.h"
#include "mediapipe/framework/formats/image_frame_opencv.h"
#include "mediapipe/framework/formats/image_opencv.h"
#include "mediapipe/framework/port/logging.h"
#include "mediapipe/framework/port/opencv_core_inc.h"
#include "mediapipe/framework/port/status.h"
#include "mediapipe/framework/port/vector.h"
#if !MEDIAPIPE_DISABLE_GPU
#include "mediapipe/gpu/gl_calculator_helper.h"
#include "mediapipe/gpu/gl_simple_shaders.h"
#include "mediapipe/gpu/shader_util.h"
#endif // !MEDIAPIPE_DISABLE_GPU
namespace mediapipe {
namespace {
constexpr char kCurrentMaskTag[] = "MASK";
constexpr char kPreviousMaskTag[] = "MASK_PREVIOUS";
constexpr char kOutputMaskTag[] = "MASK_SMOOTHED";
enum { ATTRIB_VERTEX, ATTRIB_TEXTURE_POSITION, NUM_ATTRIBUTES };
} // namespace
// A calculator for mixing two segmentation masks together,
// based on an uncertainty probability estimate.
//
// Inputs:
// MASK - Image containing the new/current mask.
// [ImageFormat::VEC32F1, or
// GpuBufferFormat::kBGRA32/kRGB24/kGrayHalf16/kGrayFloat32]
// MASK_PREVIOUS - Image containing previous mask.
// [Same format as MASK]
// * If the input has more than one channel, only the first channel (R) is
//   used as the mask.
//
// Output:
// MASK_SMOOTHED - Blended mask.
// [Same format as MASK]
// * The resulting filtered mask will be stored in R channel,
// and duplicated in A if 4 channels.
//
// Options:
// combine_with_previous_ratio - Amount of previous to blend with current.
//
// Example:
// node {
// calculator: "SegmentationSmoothingCalculator"
// input_stream: "MASK:mask"
// input_stream: "MASK_PREVIOUS:mask_previous"
// output_stream: "MASK_SMOOTHED:mask_smoothed"
// options: {
// [mediapipe.SegmentationSmoothingCalculatorOptions.ext] {
// combine_with_previous_ratio: 0.9
// }
// }
// }
//
class SegmentationSmoothingCalculator : public CalculatorBase {
public:
SegmentationSmoothingCalculator() = default;
static absl::Status GetContract(CalculatorContract* cc);
// From Calculator.
absl::Status Open(CalculatorContext* cc) override;
absl::Status Process(CalculatorContext* cc) override;
absl::Status Close(CalculatorContext* cc) override;
private:
absl::Status RenderGpu(CalculatorContext* cc);
absl::Status RenderCpu(CalculatorContext* cc);
absl::Status GlSetup(CalculatorContext* cc);
void GlRender(CalculatorContext* cc);
float combine_with_previous_ratio_;
bool gpu_initialized_ = false;
#if !MEDIAPIPE_DISABLE_GPU
mediapipe::GlCalculatorHelper gpu_helper_;
GLuint program_ = 0;
#endif // !MEDIAPIPE_DISABLE_GPU
};
REGISTER_CALCULATOR(SegmentationSmoothingCalculator);
absl::Status SegmentationSmoothingCalculator::GetContract(
CalculatorContract* cc) {
CHECK_GE(cc->Inputs().NumEntries(), 1);
cc->Inputs().Tag(kCurrentMaskTag).Set<Image>();
cc->Inputs().Tag(kPreviousMaskTag).Set<Image>();
cc->Outputs().Tag(kOutputMaskTag).Set<Image>();
#if !MEDIAPIPE_DISABLE_GPU
MP_RETURN_IF_ERROR(mediapipe::GlCalculatorHelper::UpdateContract(cc));
#endif // !MEDIAPIPE_DISABLE_GPU
return absl::OkStatus();
}
absl::Status SegmentationSmoothingCalculator::Open(CalculatorContext* cc) {
cc->SetOffset(TimestampDiff(0));
auto options =
cc->Options<mediapipe::SegmentationSmoothingCalculatorOptions>();
combine_with_previous_ratio_ = options.combine_with_previous_ratio();
#if !MEDIAPIPE_DISABLE_GPU
MP_RETURN_IF_ERROR(gpu_helper_.Open(cc));
#endif // !MEDIAPIPE_DISABLE_GPU
return absl::OkStatus();
}
absl::Status SegmentationSmoothingCalculator::Process(CalculatorContext* cc) {
if (cc->Inputs().Tag(kCurrentMaskTag).IsEmpty()) {
return absl::OkStatus();
}
if (cc->Inputs().Tag(kPreviousMaskTag).IsEmpty()) {
// Pass through current image if previous is not available.
cc->Outputs()
.Tag(kOutputMaskTag)
.AddPacket(cc->Inputs().Tag(kCurrentMaskTag).Value());
return absl::OkStatus();
}
// Run on GPU if incoming data is on GPU.
const bool use_gpu = cc->Inputs().Tag(kCurrentMaskTag).Get<Image>().UsesGpu();
if (use_gpu) {
#if !MEDIAPIPE_DISABLE_GPU
MP_RETURN_IF_ERROR(gpu_helper_.RunInGlContext([this, cc]() -> absl::Status {
if (!gpu_initialized_) {
MP_RETURN_IF_ERROR(GlSetup(cc));
gpu_initialized_ = true;
}
MP_RETURN_IF_ERROR(RenderGpu(cc));
return absl::OkStatus();
}));
#else
return absl::InternalError("GPU processing is disabled.");
#endif // !MEDIAPIPE_DISABLE_GPU
} else {
MP_RETURN_IF_ERROR(RenderCpu(cc));
}
return absl::OkStatus();
}
absl::Status SegmentationSmoothingCalculator::Close(CalculatorContext* cc) {
#if !MEDIAPIPE_DISABLE_GPU
gpu_helper_.RunInGlContext([this] {
if (program_) glDeleteProgram(program_);
program_ = 0;
});
#endif // !MEDIAPIPE_DISABLE_GPU
return absl::OkStatus();
}
absl::Status SegmentationSmoothingCalculator::RenderCpu(CalculatorContext* cc) {
// Setup source images.
const auto& current_frame = cc->Inputs().Tag(kCurrentMaskTag).Get<Image>();
const cv::Mat current_mat = mediapipe::formats::MatView(&current_frame);
RET_CHECK_EQ(current_mat.type(), CV_32FC1)
<< "Only 1-channel float input image is supported.";
const auto& previous_frame = cc->Inputs().Tag(kPreviousMaskTag).Get<Image>();
const cv::Mat previous_mat = mediapipe::formats::MatView(&previous_frame);
RET_CHECK_EQ(previous_mat.type(), current_mat.type())
<< "Warning: mixing input format types: " << previous_mat.type()
<< " != " << current_mat.type();
RET_CHECK_EQ(current_mat.rows, previous_mat.rows);
RET_CHECK_EQ(current_mat.cols, previous_mat.cols);
// Setup destination image.
auto output_frame = std::make_shared<ImageFrame>(
current_frame.image_format(), current_mat.cols, current_mat.rows);
cv::Mat output_mat = mediapipe::formats::MatView(output_frame.get());
output_mat.setTo(cv::Scalar(0));
// Blending function.
const auto blending_fn = [&](const float prev_mask_value,
const float new_mask_value) {
/*
* Assume p := new_mask_value
* H(p) := 1 + (p * log(p) + (1-p) * log(1-p)) / log(2)
* uncertainty alpha(p) =
* Clamp(1 - (1 - H(p)) * (1 - H(p)), 0, 1) [squaring the uncertainty]
*
* The following polynomial approximates uncertainty alpha as a function
* of (p + 0.5):
*/
const float c1 = 5.68842;
const float c2 = -0.748699;
const float c3 = -57.8051;
const float c4 = 291.309;
const float c5 = -624.717;
const float t = new_mask_value - 0.5f;
const float x = t * t;
const float uncertainty =
1.0f -
std::min(1.0f, x * (c1 + x * (c2 + x * (c3 + x * (c4 + x * c5)))));
return new_mask_value + (prev_mask_value - new_mask_value) *
(uncertainty * combine_with_previous_ratio_);
};
// Write directly to the first channel of output.
for (int i = 0; i < output_mat.rows; ++i) {
float* out_ptr = output_mat.ptr<float>(i);
const float* curr_ptr = current_mat.ptr<float>(i);
const float* prev_ptr = previous_mat.ptr<float>(i);
for (int j = 0; j < output_mat.cols; ++j) {
const float new_mask_value = curr_ptr[j];
const float prev_mask_value = prev_ptr[j];
out_ptr[j] = blending_fn(prev_mask_value, new_mask_value);
}
}
cc->Outputs()
.Tag(kOutputMaskTag)
.AddPacket(MakePacket<Image>(output_frame).At(cc->InputTimestamp()));
return absl::OkStatus();
}
absl::Status SegmentationSmoothingCalculator::RenderGpu(CalculatorContext* cc) {
#if !MEDIAPIPE_DISABLE_GPU
// Setup source textures.
const auto& current_frame = cc->Inputs().Tag(kCurrentMaskTag).Get<Image>();
RET_CHECK(
(current_frame.format() == mediapipe::GpuBufferFormat::kBGRA32 ||
current_frame.format() == mediapipe::GpuBufferFormat::kGrayHalf16 ||
current_frame.format() == mediapipe::GpuBufferFormat::kGrayFloat32 ||
current_frame.format() == mediapipe::GpuBufferFormat::kRGB24))
<< "Only RGBA, RGB, or 1-channel Float input image supported.";
auto current_texture = gpu_helper_.CreateSourceTexture(current_frame);
const auto& previous_frame = cc->Inputs().Tag(kPreviousMaskTag).Get<Image>();
if (previous_frame.format() != current_frame.format()) {
LOG(ERROR) << "Warning: mixing input format types. ";
}
auto previous_texture = gpu_helper_.CreateSourceTexture(previous_frame);
// Setup destination texture.
const int width = current_frame.width(), height = current_frame.height();
auto output_texture = gpu_helper_.CreateDestinationTexture(
width, height, current_frame.format());
// Process shader.
{
gpu_helper_.BindFramebuffer(output_texture);
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D, current_texture.name());
glActiveTexture(GL_TEXTURE2);
glBindTexture(GL_TEXTURE_2D, previous_texture.name());
GlRender(cc);
glActiveTexture(GL_TEXTURE2);
glBindTexture(GL_TEXTURE_2D, 0);
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D, 0);
}
glFlush();
// Send out image as GPU packet.
auto output_frame = output_texture.GetFrame<Image>();
cc->Outputs()
.Tag(kOutputMaskTag)
.Add(output_frame.release(), cc->InputTimestamp());
#endif // !MEDIAPIPE_DISABLE_GPU
return absl::OkStatus();
}
void SegmentationSmoothingCalculator::GlRender(CalculatorContext* cc) {
#if !MEDIAPIPE_DISABLE_GPU
static const GLfloat square_vertices[] = {
-1.0f, -1.0f, // bottom left
1.0f, -1.0f, // bottom right
-1.0f, 1.0f, // top left
1.0f, 1.0f, // top right
};
static const GLfloat texture_vertices[] = {
0.0f, 0.0f, // bottom left
1.0f, 0.0f, // bottom right
0.0f, 1.0f, // top left
1.0f, 1.0f, // top right
};
// program
glUseProgram(program_);
// vertex storage
GLuint vbo[2];
glGenBuffers(2, vbo);
GLuint vao;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);
// vbo 0
glBindBuffer(GL_ARRAY_BUFFER, vbo[0]);
glBufferData(GL_ARRAY_BUFFER, 4 * 2 * sizeof(GLfloat), square_vertices,
GL_STATIC_DRAW);
glEnableVertexAttribArray(ATTRIB_VERTEX);
glVertexAttribPointer(ATTRIB_VERTEX, 2, GL_FLOAT, 0, 0, nullptr);
// vbo 1
glBindBuffer(GL_ARRAY_BUFFER, vbo[1]);
glBufferData(GL_ARRAY_BUFFER, 4 * 2 * sizeof(GLfloat), texture_vertices,
GL_STATIC_DRAW);
glEnableVertexAttribArray(ATTRIB_TEXTURE_POSITION);
glVertexAttribPointer(ATTRIB_TEXTURE_POSITION, 2, GL_FLOAT, 0, 0, nullptr);
// draw
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
// cleanup
glDisableVertexAttribArray(ATTRIB_VERTEX);
glDisableVertexAttribArray(ATTRIB_TEXTURE_POSITION);
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindVertexArray(0);
glDeleteVertexArrays(1, &vao);
glDeleteBuffers(2, vbo);
#endif // !MEDIAPIPE_DISABLE_GPU
}
absl::Status SegmentationSmoothingCalculator::GlSetup(CalculatorContext* cc) {
#if !MEDIAPIPE_DISABLE_GPU
const GLint attr_location[NUM_ATTRIBUTES] = {
ATTRIB_VERTEX,
ATTRIB_TEXTURE_POSITION,
};
const GLchar* attr_name[NUM_ATTRIBUTES] = {
"position",
"texture_coordinate",
};
// Shader to blend in previous mask based on computed uncertainty probability.
const std::string frag_src =
absl::StrCat(std::string(mediapipe::kMediaPipeFragmentShaderPreamble),
R"(
DEFAULT_PRECISION(mediump, float)
#ifdef GL_ES
#define fragColor gl_FragColor
#else
out vec4 fragColor;
#endif // defined(GL_ES);
in vec2 sample_coordinate;
uniform sampler2D current_mask;
uniform sampler2D previous_mask;
uniform float combine_with_previous_ratio;
void main() {
vec4 current_pix = texture2D(current_mask, sample_coordinate);
vec4 previous_pix = texture2D(previous_mask, sample_coordinate);
float new_mask_value = current_pix.r;
float prev_mask_value = previous_pix.r;
// Assume p := new_mask_value
// H(p) := 1 + (p * log(p) + (1-p) * log(1-p)) / log(2)
// uncertainty alpha(p) =
// Clamp(1 - (1 - H(p)) * (1 - H(p)), 0, 1) [squaring the uncertainty]
//
// The following polynomial approximates uncertainty alpha as a function
// of (p + 0.5):
const float c1 = 5.68842;
const float c2 = -0.748699;
const float c3 = -57.8051;
const float c4 = 291.309;
const float c5 = -624.717;
float t = new_mask_value - 0.5;
float x = t * t;
float uncertainty =
1.0 - min(1.0, x * (c1 + x * (c2 + x * (c3 + x * (c4 + x * c5)))));
new_mask_value +=
(prev_mask_value - new_mask_value) * (uncertainty * combine_with_previous_ratio);
fragColor = vec4(new_mask_value, 0.0, 0.0, new_mask_value);
}
)");
// Create shader program and set parameters.
mediapipe::GlhCreateProgram(mediapipe::kBasicVertexShader, frag_src.c_str(),
NUM_ATTRIBUTES, (const GLchar**)&attr_name[0],
attr_location, &program_);
RET_CHECK(program_) << "Problem initializing the program.";
glUseProgram(program_);
glUniform1i(glGetUniformLocation(program_, "current_mask"), 1);
glUniform1i(glGetUniformLocation(program_, "previous_mask"), 2);
glUniform1f(glGetUniformLocation(program_, "combine_with_previous_ratio"),
combine_with_previous_ratio_);
#endif // !MEDIAPIPE_DISABLE_GPU
return absl::OkStatus();
}
} // namespace mediapipe
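
Editor's standalone illustration of the blend shared by the CPU and GPU paths above; the polynomial constants are copied from the calculator, while the helper name and sample values are illustrative. A confident current mask value changes only slightly, whereas an uncertain one (near 0.5) is pulled most of the way toward the previous mask when the ratio is high.

#include <algorithm>
#include <cstdio>

// Uncertainty-weighted blend of a previous and a current mask value, using the
// same polynomial approximation of the squared-entropy uncertainty as above.
float SmoothMask(float prev, float curr, float combine_with_previous_ratio) {
  const float c1 = 5.68842f, c2 = -0.748699f, c3 = -57.8051f, c4 = 291.309f,
              c5 = -624.717f;
  const float t = curr - 0.5f;
  const float x = t * t;
  const float uncertainty =
      1.0f -
      std::min(1.0f, x * (c1 + x * (c2 + x * (c3 + x * (c4 + x * c5)))));
  return curr + (prev - curr) * (uncertainty * combine_with_previous_ratio);
}

int main() {
  std::printf("confident: %.3f\n", SmoothMask(0.2f, 0.95f, 0.9f));  // ~0.895
  std::printf("uncertain: %.3f\n", SmoothMask(0.2f, 0.55f, 0.9f));  // ~0.239
  return 0;
}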

View File

@@ -0,0 +1,35 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
syntax = "proto2";
package mediapipe;
import "mediapipe/framework/calculator.proto";
message SegmentationSmoothingCalculatorOptions {
extend CalculatorOptions {
optional SegmentationSmoothingCalculatorOptions ext = 377425128;
}
// How much to blend in previous mask, based on a probability estimate.
// Range: [0-1]
// 0 = Use only current frame (no blending).
// 1 = Blend in the previous mask based on uncertainty estimate.
// With ratio at 1, the uncertainty estimate is trusted completely.
// When uncertainty is high, the previous mask is given higher weight.
// Therefore, if both ratio and uncertainty are 1, only the old mask is used.
// A pixel is 'uncertain' if its value is close to the middle (0.5 or 127).
optional float combine_with_previous_ratio = 1 [default = 0.0];
}
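
Editor's sketch of setting the new option from C++ via the generated setter (the include matches the header already used by the calculator above). As a usage note, MASK_PREVIOUS is commonly the calculator's own smoothed output from the prior frame, fed back through a loopback mechanism such as PreviousLoopbackCalculator; that wiring is outside this snippet.

#include "mediapipe/calculators/image/segmentation_smoothing_calculator.pb.h"

mediapipe::SegmentationSmoothingCalculatorOptions MakeSmoothingOptions() {
  mediapipe::SegmentationSmoothingCalculatorOptions options;
  // 0.0 keeps only the current mask; values near 1.0 lean heavily on the
  // previous mask wherever the current value is uncertain (near 0.5).
  options.set_combine_with_previous_ratio(0.9f);
  return options;
}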

Some files were not shown because too many files have changed in this diff.