Project import generated by Copybara.

GitOrigin-RevId: 08c2016a4df5aef571b464a4d4491f38c6b2af10
MediaPipe Team 2021-06-03 13:13:30 -07:00 committed by chuoling
parent ae05ad04b3
commit 8b57bf879b
118 changed files with 3999 additions and 391 deletions

View File

@ -0,0 +1,21 @@
<em>Please make sure that this is a build/installation issue and also refer to the [troubleshooting](https://google.github.io/mediapipe/getting_started/troubleshooting.html) documentation before raising any issues.</em>
**System information** (Please provide as much relevant information as possible)
- OS Platform and Distribution (e.g. Linux Ubuntu 16.04, Android 11, iOS 14.4):
- Compiler version (e.g. gcc/g++ 8 /Apple clang version 12.0.0):
- Programming Language and version ( e.g. C++ 14, Python 3.6, Java ):
- Installed using virtualenv? pip? Conda? (if python):
- [MediaPipe version](https://github.com/google/mediapipe/releases):
- Bazel version:
- XCode and Tulsi versions (if iOS):
- Android SDK and NDK versions (if android):
- Android [AAR](https://google.github.io/mediapipe/getting_started/android_archive_library.html) ( if android):
- OpenCV version (if running on desktop):
**Describe the problem**:
**[Provide the exact sequence of commands / steps that you executed before running into the problem](https://google.github.io/mediapipe/getting_started/getting_started.html):**
**Complete Logs:**
Include Complete Log information or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached:

View File

@ -0,0 +1,20 @@
<em>Please make sure that this is a [solution](https://google.github.io/mediapipe/solutions/solutions.html) issue.</em>
**System information** (Please provide as much relevant information as possible)
- Have I written custom code (as opposed to using a stock example script provided in MediaPipe):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04, Android 11, iOS 14.4):
- [MediaPipe version](https://github.com/google/mediapipe/releases):
- Bazel version:
- Solution (e.g. FaceMesh, Pose, Holistic):
- Programming Language and version ( e.g. C++, Python, Java):
**Describe the expected behavior:**
**Standalone code you may have used to try to get what you need :**
If there is a problem, provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to Colab/repo link /any notebook:
**Other info / Complete Logs :**
Include any logs or source code that would be helpful to
diagnose the problem. If including tracebacks, please include the full
traceback. Large logs and files should be attached:

View File

@ -0,0 +1,45 @@
Thank you for submitting a MediaPipe documentation issue.
The MediaPipe docs are open source! To get involved, read the documentation Contributor Guide
## URL(s) with the issue:
Please provide a link to the documentation entry, for example: https://github.com/google/mediapipe/blob/master/docs/solutions/face_mesh.md#models
## Description of issue (what needs changing):
Kinds of documentation problems:
### Clear description
For example, why should someone use this method? How is it useful?
### Correct links
Is the link to the source code correct?
### Parameters defined
Are all parameters defined and formatted correctly?
### Returns defined
Are return values defined?
### Raises listed and defined
Are the errors defined? For example,
### Usage example
Is there a usage example?
See the API guide:
on how to write testable usage examples.
### Request visuals, if applicable
Are there currently visuals? If not, will it clarify the content?
### Submit a pull request?
Are you planning to also submit a pull request to fix the issue? See the docs
https://github.com/google/mediapipe/blob/master/CONTRIBUTING.md

18
.github/bot_config.yml vendored Normal file
View File

@ -0,0 +1,18 @@
# Copyright 2021 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
# A list of assignees
assignees:
- sgowroji

34
.github/stale.yml vendored Normal file
View File

@ -0,0 +1,34 @@
# Copyright 2021 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
#
# This file was assembled from multiple pieces, whose use is documented
# throughout. Please refer to the TensorFlow dockerfiles documentation
# for more information.
# Number of days of inactivity before an Issue or Pull Request becomes stale
daysUntilStale: 7
# Number of days of inactivity before a stale Issue or Pull Request is closed
daysUntilClose: 7
# Only issues or pull requests with all of these labels are checked if stale. Defaults to `[]` (disabled)
onlyLabels:
- stat:awaiting response
# Comment to post when marking as stale. Set to `false` to disable
markComment: >
This issue has been automatically marked as stale because it has not had
recent activity. It will be closed if no further activity occurs. Thank you.
# Comment to post when removing the stale label. Set to `false` to disable
unmarkComment: false
closeComment: >
Closing as stale. Please reopen if you'd like to work on this further.

View File

@ -40,6 +40,7 @@ Hair Segmentation
[Hands](https://google.github.io/mediapipe/solutions/hands) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Pose](https://google.github.io/mediapipe/solutions/pose) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Holistic](https://google.github.io/mediapipe/solutions/holistic) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Selfie Segmentation](https://google.github.io/mediapipe/solutions/selfie_segmentation) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Hair Segmentation](https://google.github.io/mediapipe/solutions/hair_segmentation) | ✅ | | ✅ | | |
[Object Detection](https://google.github.io/mediapipe/solutions/object_detection) | ✅ | ✅ | ✅ | | | ✅
[Box Tracking](https://google.github.io/mediapipe/solutions/box_tracking) | ✅ | ✅ | ✅ | | |

View File

@ -71,8 +71,8 @@ http_archive(
# Google Benchmark library.
http_archive(
name = "com_google_benchmark",
urls = ["https://github.com/google/benchmark/archive/master.zip"],
strip_prefix = "benchmark-master",
urls = ["https://github.com/google/benchmark/archive/main.zip"],
strip_prefix = "benchmark-main",
build_file = "@//third_party:benchmark.BUILD",
)
@ -369,9 +369,9 @@ http_archive(
)
# Tensorflow repo should always go after the other external dependencies.
# 2021-04-30
_TENSORFLOW_GIT_COMMIT = "5bd3c57ef184543d22e34e36cff9d9bea608e06d"
_TENSORFLOW_SHA256= "9a45862834221aafacf6fb275f92b3876bc89443cbecc51be93f13839a6609f0"
# 2021-05-27
_TENSORFLOW_GIT_COMMIT = "d6bfcdb0926173dbb7aa02ceba5aae6250b8aaa6"
_TENSORFLOW_SHA256 = "ec40e1462239d8783d02f76a43412c8f80bac71ea20e41e1b7729b990aad6923"
http_archive(
name = "org_tensorflow",
urls = [

View File

@ -97,6 +97,7 @@ for app in ${apps}; do
if [[ ${target_name} == "holistic_tracking" ||
${target_name} == "iris_tracking" ||
${target_name} == "pose_tracking" ||
${target_name} == "selfie_segmentation" ||
${target_name} == "upper_body_pose_tracking" ]]; then
graph_suffix="cpu"
else

View File

@ -248,12 +248,58 @@ absl::Status MyCalculator::Process() {
}
```
## Calculator options
Calculators accept processing parameters through (1) input stream packets, (2)
input side packets, and (3) calculator options. Calculator options, if
specified, appear as literal values in the `node_options` field of the
`CalculatorGraphConfiguration.Node` message.
```
node {
calculator: "TfLiteInferenceCalculator"
input_stream: "TENSORS:main_model_input"
output_stream: "TENSORS:main_model_output"
node_options: {
[type.googleapis.com/mediapipe.TfLiteInferenceCalculatorOptions] {
model_path: "mediapipe/models/active_speaker_detection/audio_visual_model.tflite"
}
}
}
```
The `node_options` field accepts the proto3 syntax. Alternatively, calculator
options can be specified in the `options` field using proto2 syntax.
```
node: {
calculator: "IntervalFilterCalculator"
node_options: {
[type.googleapis.com/mediapipe.IntervalFilterCalculatorOptions] {
intervals {
start_us: 20000
end_us: 40000
}
}
}
}
```
Not all calculators accept calculator options. In order to accept options, a
calculator will normally define a new protobuf message type to represent its
options, such as `IntervalFilterCalculatorOptions`. The calculator will then
read that protobuf message in its `CalculatorBase::Open` method, and possibly
also in the `CalculatorBase::GetContract` function or its
`CalculatorBase::Process` method. Normally, the new protobuf message type will
be defined as a protobuf schema using a ".proto" file and a
`mediapipe_proto_library()` build rule.
## Example calculator
This section discusses the implementation of `PacketClonerCalculator`, which
does a relatively simple job, and is used in many calculator graphs.
`PacketClonerCalculator` simply produces a copy of its most recent input
packets on demand.
`PacketClonerCalculator` simply produces a copy of its most recent input packets
on demand.
`PacketClonerCalculator` is useful when the timestamps of arriving data packets
are not aligned perfectly. Suppose we have a room with a microphone, light
@ -279,8 +325,8 @@ input streams:
imageframe of video data representing video collected from camera in the
room with timestamp.
Below is the implementation of the `PacketClonerCalculator`. You can see
the `GetContract()`, `Open()`, and `Process()` methods as well as the instance
Below is the implementation of the `PacketClonerCalculator`. You can see the
`GetContract()`, `Open()`, and `Process()` methods as well as the instance
variable `current_` which holds the most recent input packets.
```c++
@ -401,6 +447,6 @@ node {
The diagram below shows how the `PacketClonerCalculator` defines its output
packets (bottom) based on its series of input packets (top).
| ![Graph using PacketClonerCalculator](../images/packet_cloner_calculator.png) |
| :---------------------------------------------------------------------------: |
| *Each time it receives a packet on its TICK input stream, the PacketClonerCalculator outputs the most recent packet from each of its input streams. The sequence of output packets (bottom) is determined by the sequence of input packets (top) and their timestamps. The timestamps are shown along the right side of the diagram.* |
![Graph using PacketClonerCalculator](../images/packet_cloner_calculator.png) |
:--------------------------------------------------------------------------: |
*Each time it receives a packet on its TICK input stream, the PacketClonerCalculator outputs the most recent packet from each of its input streams. The sequence of output packets (bottom) is determined by the sequence of input packets (top) and their timestamps. The timestamps are shown along the right side of the diagram.* |
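
The behaviour in the caption can be mimicked with a small toy model. This is only a sketch in plain Python, not the MediaPipe C++ implementation: the class `PacketClonerSketch` and its methods `on_data` and `on_tick` are hypothetical names chosen for illustration, and packets are reduced to plain values with integer timestamps.

```python
from typing import Any, Dict, List, Optional, Tuple


class PacketClonerSketch:
    """Toy stand-in for PacketClonerCalculator (hypothetical, not the real API)."""

    def __init__(self, num_streams: int) -> None:
        # Most recent payload seen on each data stream; None until one arrives.
        self.current: List[Optional[Any]] = [None] * num_streams

    def on_data(self, stream_index: int, payload: Any) -> None:
        # Remember only the newest payload for this stream.
        self.current[stream_index] = payload

    def on_tick(self, timestamp: int) -> Dict[int, Tuple[int, Any]]:
        # For every stream that has seen data, emit a copy of its latest
        # payload, stamped with the timestamp of the tick.
        return {
            index: (timestamp, payload)
            for index, payload in enumerate(self.current)
            if payload is not None
        }


cloner = PacketClonerSketch(num_streams=2)
cloner.on_data(0, "audio@10")
cloner.on_data(1, "video@12")
cloner.on_data(0, "audio@20")  # replaces "audio@10"
print(cloner.on_tick(25))
```

Every output carries the tick timestamp, so downstream consumers see one aligned set of values per tick even though the inputs arrived at different times.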

View File

@ -111,11 +111,11 @@ component known as an InputStreamHandler.
See [Synchronization](synchronization.md) for more details.
### Realtime data streams
### Real-time streams
MediaPipe calculator graphs are often used to process streams of video or audio
frames for interactive applications. Normally, each Calculator runs as soon as
all of its input packets for a given timestamp become available. Calculators
used in realtime graphs need to define output timestamp bounds based on input
used in real-time graphs need to define output timestamp bounds based on input
timestamp bounds in order to allow downstream calculators to be scheduled
promptly. See [Realtime data streams](realtime.md) for details.
promptly. See [Real-time Streams](realtime_streams.md) for details.

View File

@ -1,29 +1,28 @@
---
layout: default
title: Processing real-time data streams
title: Real-time Streams
parent: Framework Concepts
nav_order: 6
has_children: true
has_toc: false
---
# Processing real-time data streams
# Real-time Streams
{: .no_toc }
1. TOC
{:toc}
---
## Realtime timestamps
## Real-time timestamps
MediaPipe calculator graphs are often used to process streams of video or audio
frames for interactive applications. The MediaPipe framework requires only that
successive packets be assigned monotonically increasing timestamps. By
convention, realtime calculators and graphs use the recording time or the
convention, real-time calculators and graphs use the recording time or the
presentation time of each frame as its timestamp, with each timestamp indicating
the microseconds since `Jan/1/1970:00:00:00`. This allows packets from various
sources to be processed in a globally consistent sequence.
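
As a rough illustration of this convention, the sketch below converts a capture time into microseconds since the Unix epoch. The helper name `to_timestamp_us` is an assumption made for this example and is not part of MediaPipe; it expects a timezone-aware `datetime`.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp_us(moment: datetime) -> int:
    """Return whole microseconds elapsed since 1970-01-01 00:00:00 UTC."""
    delta = moment - EPOCH
    # Integer arithmetic avoids the rounding that float seconds would introduce.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


# Two frames recorded 40 ms apart get timestamps that differ by 40000 us.
first_frame = datetime(2021, 6, 3, 20, 13, 30, tzinfo=timezone.utc)
second_frame = first_frame + timedelta(milliseconds=40)
print(to_timestamp_us(second_frame) - to_timestamp_us(first_frame))
```

Because both values are measured against the same epoch, packets from different sources can be ordered by comparing their timestamps directly.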
## Realtime scheduling
## Real-time scheduling
Normally, each Calculator runs as soon as all of its input packets for a given
timestamp become available. Normally, this happens when the calculator has
@ -38,7 +37,7 @@ When a calculator does not produce any output packets for a given timestamp, it
can instead output a "timestamp bound" indicating that no packet will be
produced for that timestamp. This indication is necessary to allow downstream
calculators to run at that timestamp, even though no packet has arrived for
certain streams for that timestamp. This is especially important for realtime
certain streams for that timestamp. This is especially important for real-time
graphs in interactive applications, where it is crucial that each calculator
begin processing as soon as possible.
@ -83,12 +82,12 @@ For example, `Timestamp(1).NextAllowedInStream() == Timestamp(2)`.
## Propagating timestamp bounds
Calculators that will be used in realtime graphs need to define output timestamp
bounds based on input timestamp bounds in order to allow downstream calculators
to be scheduled promptly. A common pattern is for calculators to output packets
with the same timestamps as their input packets. In this case, simply outputting
a packet on every call to `Calculator::Process` is sufficient to define output
timestamp bounds.
Calculators that will be used in real-time graphs need to define output
timestamp bounds based on input timestamp bounds in order to allow downstream
calculators to be scheduled promptly. A common pattern is for calculators to
output packets with the same timestamps as their input packets. In this case,
simply outputting a packet on every call to `Calculator::Process` is sufficient
to define output timestamp bounds.
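
The common pattern can be pictured with a toy model in plain Python. This is a sketch under stated assumptions, not MediaPipe code: `OutputStreamSketch` and `passthrough_process` are hypothetical names, timestamps are plain integers, and the bound after a packet at timestamp `t` is taken to be `t + 1`, in line with the `NextAllowedInStream` example above.

```python
from typing import Any, List, Tuple


class OutputStreamSketch:
    """Toy output stream that tracks its next timestamp bound (hypothetical)."""

    def __init__(self) -> None:
        self.packets: List[Tuple[int, Any]] = []
        # Smallest timestamp that may still appear on this stream.
        self.next_timestamp_bound = 0

    def add_packet(self, timestamp: int, payload: Any) -> None:
        if timestamp < self.next_timestamp_bound:
            raise ValueError("timestamps must increase monotonically")
        self.packets.append((timestamp, payload))
        # Emitting a packet at `timestamp` implicitly advances the bound.
        self.next_timestamp_bound = timestamp + 1

    def set_next_timestamp_bound(self, bound: int) -> None:
        # Used when no packet is produced but the bound should still advance.
        if bound < self.next_timestamp_bound:
            raise ValueError("timestamp bound must not move backwards")
        self.next_timestamp_bound = bound


def passthrough_process(timestamp: int, payload: Any, out: OutputStreamSketch) -> None:
    # One output packet per input packet, with the same timestamp.
    out.add_packet(timestamp, payload)


out = OutputStreamSketch()
passthrough_process(5, "frame", out)
print(out.next_timestamp_bound)
```

In this model, a calculator that skips an input would call `set_next_timestamp_bound` instead, so that downstream consumers know nothing will arrive for that timestamp.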
However, calculators are not required to follow this common pattern for output
timestamps; they are only required to choose monotonically increasing output

View File

@ -17,12 +17,13 @@ nav_order: 4
MediaPipe currently offers the following solutions:
Solution | NPM Package | Example
----------------- | ----------------------------- | -------
--------------------------- | --------------------------------------- | -------
[Face Mesh][F-pg] | [@mediapipe/face_mesh][F-npm] | [mediapipe.dev/demo/face_mesh][F-demo]
[Face Detection][Fd-pg] | [@mediapipe/face_detection][Fd-npm] | [mediapipe.dev/demo/face_detection][Fd-demo]
[Hands][H-pg] | [@mediapipe/hands][H-npm] | [mediapipe.dev/demo/hands][H-demo]
[Holistic][Ho-pg] | [@mediapipe/holistic][Ho-npm] | [mediapipe.dev/demo/holistic][Ho-demo]
[Pose][P-pg] | [@mediapipe/pose][P-npm] | [mediapipe.dev/demo/pose][P-demo]
[Selfie Segmentation][S-pg] | [@mediapipe/selfie_segmentation][S-npm] | [mediapipe.dev/demo/selfie_segmentation][S-demo]
Click on a solution link above for more information, including API and code
snippets.
@ -67,11 +68,13 @@ affecting your work, restrict your request to a `<minor>` number. e.g.,
[Fd-pg]: ../solutions/face_detection#javascript-solution-api
[H-pg]: ../solutions/hands#javascript-solution-api
[P-pg]: ../solutions/pose#javascript-solution-api
[S-pg]: ../solutions/selfie_segmentation#javascript-solution-api
[Ho-npm]: https://www.npmjs.com/package/@mediapipe/holistic
[F-npm]: https://www.npmjs.com/package/@mediapipe/face_mesh
[Fd-npm]: https://www.npmjs.com/package/@mediapipe/face_detection
[H-npm]: https://www.npmjs.com/package/@mediapipe/hands
[P-npm]: https://www.npmjs.com/package/@mediapipe/pose
[S-npm]: https://www.npmjs.com/package/@mediapipe/selfie_segmentation
[draw-npm]: https://www.npmjs.com/package/@mediapipe/drawing_utils
[cam-npm]: https://www.npmjs.com/package/@mediapipe/camera_utils
[ctrl-npm]: https://www.npmjs.com/package/@mediapipe/control_utils
@ -80,15 +83,18 @@ affecting your work, restrict your request to a `<minor>` number. e.g.,
[Fd-jsd]: https://www.jsdelivr.com/package/npm/@mediapipe/face_detection
[H-jsd]: https://www.jsdelivr.com/package/npm/@mediapipe/hands
[P-jsd]: https://www.jsdelivr.com/package/npm/@mediapipe/pose
[S-jsd]: https://www.jsdelivr.com/package/npm/@mediapipe/selfie_segmentation
[Ho-pen]: https://code.mediapipe.dev/codepen/holistic
[F-pen]: https://code.mediapipe.dev/codepen/face_mesh
[Fd-pen]: https://code.mediapipe.dev/codepen/face_detection
[H-pen]: https://code.mediapipe.dev/codepen/hands
[P-pen]: https://code.mediapipe.dev/codepen/pose
[S-pen]: https://code.mediapipe.dev/codepen/selfie_segmentation
[Ho-demo]: https://mediapipe.dev/demo/holistic
[F-demo]: https://mediapipe.dev/demo/face_mesh
[Fd-demo]: https://mediapipe.dev/demo/face_detection
[H-demo]: https://mediapipe.dev/demo/hands
[P-demo]: https://mediapipe.dev/demo/pose
[S-demo]: https://mediapipe.dev/demo/selfie_segmentation
[npm]: https://www.npmjs.com/package/@mediapipe
[codepen]: https://code.mediapipe.dev/codepen

View File

@ -51,6 +51,7 @@ details in each solution via the links below:
* [MediaPipe Holistic](../solutions/holistic#python-solution-api)
* [MediaPipe Objectron](../solutions/objectron#python-solution-api)
* [MediaPipe Pose](../solutions/pose#python-solution-api)
* [MediaPipe Selfie Segmentation](../solutions/selfie_segmentation#python-solution-api)
## MediaPipe on Google Colab
@ -62,6 +63,7 @@ details in each solution via the links below:
* [MediaPipe Pose Colab](https://mediapipe.page.link/pose_py_colab)
* [MediaPipe Pose Classification Colab (Basic)](https://mediapipe.page.link/pose_classification_basic)
* [MediaPipe Pose Classification Colab (Extended)](https://mediapipe.page.link/pose_classification_extended)
* [MediaPipe Selfie Segmentation Colab](https://mediapipe.page.link/selfie_segmentation_py_colab)
## MediaPipe Python Framework

Binary file not shown.

View File

@ -40,6 +40,7 @@ Hair Segmentation
[Hands](https://google.github.io/mediapipe/solutions/hands) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Pose](https://google.github.io/mediapipe/solutions/pose) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Holistic](https://google.github.io/mediapipe/solutions/holistic) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Selfie Segmentation](https://google.github.io/mediapipe/solutions/selfie_segmentation) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Hair Segmentation](https://google.github.io/mediapipe/solutions/hair_segmentation) | ✅ | | ✅ | | |
[Object Detection](https://google.github.io/mediapipe/solutions/object_detection) | ✅ | ✅ | ✅ | | | ✅
[Box Tracking](https://google.github.io/mediapipe/solutions/box_tracking) | ✅ | ✅ | ✅ | | |

View File

@ -2,7 +2,7 @@
layout: default
title: AutoFlip (Saliency-aware Video Cropping)
parent: Solutions
nav_order: 13
nav_order: 14
---
# AutoFlip: Saliency-aware Video Cropping

View File

@ -2,7 +2,7 @@
layout: default
title: Box Tracking
parent: Solutions
nav_order: 9
nav_order: 10
---
# MediaPipe Box Tracking

View File

@ -68,7 +68,7 @@ normalized to `[0.0, 1.0]` by the image width and height respectively.
Please first follow general [instructions](../getting_started/python.md) to
install MediaPipe Python package, then learn more in the companion
[Python Colab](#resources) and the following usage example.
[Python Colab](#resources) and the usage example below.
Supported configuration options:
@ -81,9 +81,10 @@ mp_face_detection = mp.solutions.face_detection
mp_drawing = mp.solutions.drawing_utils
# For static images:
IMAGE_FILES = []
with mp_face_detection.FaceDetection(
min_detection_confidence=0.5) as face_detection:
for idx, file in enumerate(file_list):
for idx, file in enumerate(IMAGE_FILES):
image = cv2.imread(file)
# Convert the BGR image to RGB and process it with MediaPipe Face Detection.
results = face_detection.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

View File

@ -265,7 +265,7 @@ magnitude of `z` uses roughly the same scale as `x`.
Please first follow general [instructions](../getting_started/python.md) to
install MediaPipe Python package, then learn more in the companion
[Python Colab](#resources) and the following usage example.
[Python Colab](#resources) and the usage example below.
Supported configuration options:
@ -281,12 +281,13 @@ mp_drawing = mp.solutions.drawing_utils
mp_face_mesh = mp.solutions.face_mesh
# For static images:
IMAGE_FILES = []
drawing_spec = mp_drawing.DrawingSpec(thickness=1, circle_radius=1)
with mp_face_mesh.FaceMesh(
static_image_mode=True,
max_num_faces=1,
min_detection_confidence=0.5) as face_mesh:
for idx, file in enumerate(file_list):
for idx, file in enumerate(IMAGE_FILES):
image = cv2.imread(file)
# Convert the BGR image to RGB before processing.
results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

View File

@ -2,7 +2,7 @@
layout: default
title: Hair Segmentation
parent: Solutions
nav_order: 7
nav_order: 8
---
# MediaPipe Hair Segmentation

View File

@ -206,7 +206,7 @@ is not the case, please swap the handedness output in the application.
Please first follow general [instructions](../getting_started/python.md) to
install MediaPipe Python package, then learn more in the companion
[Python Colab](#resources) and the following usage example.
[Python Colab](#resources) and the usage example below.
Supported configuration options:
@ -222,11 +222,12 @@ mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands
# For static images:
IMAGE_FILES = []
with mp_hands.Hands(
static_image_mode=True,
max_num_hands=2,
min_detection_confidence=0.5) as hands:
for idx, file in enumerate(file_list):
for idx, file in enumerate(IMAGE_FILES):
# Read an image, flip it around y-axis for correct handedness output (see
# above).
image = cv2.flip(cv2.imread(file), 1)

View File

@ -201,7 +201,7 @@ A list of 21 hand landmarks on the right hand, in the same representation as
Please first follow general [instructions](../getting_started/python.md) to
install MediaPipe Python package, then learn more in the companion
[Python Colab](#resources) and the following usage example.
[Python Colab](#resources) and the usage example below.
Supported configuration options:
@ -218,10 +218,11 @@ mp_drawing = mp.solutions.drawing_utils
mp_holistic = mp.solutions.holistic
# For static images:
IMAGE_FILES = []
with mp_holistic.Holistic(
static_image_mode=True,
model_complexity=2) as holistic:
for idx, file in enumerate(file_list):
for idx, file in enumerate(IMAGE_FILES):
image = cv2.imread(file)
image_height, image_width, _ = image.shape
# Convert the BGR image to RGB before processing.

View File

@ -2,7 +2,7 @@
layout: default
title: Instant Motion Tracking
parent: Solutions
nav_order: 10
nav_order: 11
---
# MediaPipe Instant Motion Tracking

View File

@ -2,7 +2,7 @@
layout: default
title: KNIFT (Template-based Feature Matching)
parent: Solutions
nav_order: 12
nav_order: 13
---
# MediaPipe KNIFT

View File

@ -2,7 +2,7 @@
layout: default
title: Dataset Preparation with MediaSequence
parent: Solutions
nav_order: 14
nav_order: 15
---
# Dataset Preparation with MediaSequence

View File

@ -16,10 +16,15 @@ nav_order: 30
* Face detection model for front-facing/selfie camera:
[TFLite model](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_front.tflite),
[TFLite model quantized for EdgeTPU/Coral](https://github.com/google/mediapipe/tree/master/mediapipe/examples/coral/models/face-detector-quantized_edgetpu.tflite)
[TFLite model quantized for EdgeTPU/Coral](https://github.com/google/mediapipe/tree/master/mediapipe/examples/coral/models/face-detector-quantized_edgetpu.tflite),
[Model card](https://mediapipe.page.link/blazeface-mc)
* Face detection model for back-facing camera:
[TFLite model ](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_back.tflite)
* [Model card](https://mediapipe.page.link/blazeface-mc)
[TFLite model](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_back.tflite),
[Model card](https://mediapipe.page.link/blazeface-back-mc)
* Face detection model for back-facing camera (sparse):
[TFLite model](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_back_sparse.tflite),
[Model card](https://mediapipe.page.link/blazeface-back-sparse-mc)
### [Face Mesh](https://google.github.io/mediapipe/solutions/face_mesh)
@ -60,6 +65,12 @@ nav_order: 30
* Hand recrop model:
[TFLite model](https://github.com/google/mediapipe/tree/master/mediapipe/modules/holistic_landmark/hand_recrop.tflite)
### [Selfie Segmentation](https://google.github.io/mediapipe/solutions/selfie_segmentation)
* [TFLite model (general)](https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation/selfie_segmentation.tflite)
* [TFLite model (landscape)](https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation/selfie_segmentation_landscape.tflite)
* [Model card](https://mediapipe.page.link/selfiesegmentation-mc)
### [Hair Segmentation](https://google.github.io/mediapipe/solutions/hair_segmentation)
* [TFLite model](https://github.com/google/mediapipe/tree/master/mediapipe/models/hair_segmentation.tflite)

View File

@ -2,7 +2,7 @@
layout: default
title: Object Detection
parent: Solutions
nav_order: 8
nav_order: 9
---
# MediaPipe Object Detection

View File

@ -2,7 +2,7 @@
layout: default
title: Objectron (3D Object Detection)
parent: Solutions
nav_order: 11
nav_order: 12
---
# MediaPipe Objectron
@ -277,7 +277,7 @@ following:
Please first follow general [instructions](../getting_started/python.md) to
install MediaPipe Python package, then learn more in the companion
[Python Colab](#resources) and the following usage example.
[Python Colab](#resources) and the usage example below.
Supported configuration options:
@ -297,11 +297,12 @@ mp_drawing = mp.solutions.drawing_utils
mp_objectron = mp.solutions.objectron
# For static images:
IMAGE_FILES = []
with mp_objectron.Objectron(static_image_mode=True,
max_num_objects=5,
min_detection_confidence=0.5,
model_name='Shoe') as objectron:
for idx, file in enumerate(file_list):
for idx, file in enumerate(IMAGE_FILES):
image = cv2.imread(file)
# Convert the BGR image to RGB and process it with MediaPipe Objectron.
results = objectron.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

View File

@ -187,7 +187,7 @@ Naming style may differ slightly across platforms/languages.
#### pose_landmarks
A list of pose landmarks. Each lanmark consists of the following:
A list of pose landmarks. Each landmark consists of the following:
* `x` and `y`: Landmark coordinates normalized to `[0.0, 1.0]` by the image
width and height respectively.
@ -202,7 +202,7 @@ A list of pose landmarks. Each lanmark consists of the following:
Please first follow general [instructions](../getting_started/python.md) to
install MediaPipe Python package, then learn more in the companion
[Python Colab](#resources) and the following usage example.
[Python Colab](#resources) and the usage example below.
Supported configuration options:
@ -219,11 +219,12 @@ mp_drawing = mp.solutions.drawing_utils
mp_pose = mp.solutions.pose
# For static images:
IMAGE_FILES = []
with mp_pose.Pose(
static_image_mode=True,
model_complexity=2,
min_detection_confidence=0.5) as pose:
for idx, file in enumerate(file_list):
for idx, file in enumerate(IMAGE_FILES):
image = cv2.imread(file)
image_height, image_width, _ = image.shape
# Convert the BGR image to RGB before processing.

View File

@ -0,0 +1,286 @@
---
layout: default
title: Selfie Segmentation
parent: Solutions
nav_order: 7
---
# MediaPipe Selfie Segmentation
{: .no_toc }
<details close markdown="block">
<summary>
Table of contents
</summary>
{: .text-delta }
1. TOC
{:toc}
</details>
---
## Overview
*Fig 1. Example of MediaPipe Selfie Segmentation.* |
:------------------------------------------------: |
<video autoplay muted loop preload style="height: auto; width: 480px"><source src="../images/selfie_segmentation_web.mp4" type="video/mp4"></video> |
MediaPipe Selfie Segmentation segments the prominent humans in the scene. It can
run in real time on both smartphones and laptops. The intended use cases include
selfie effects and video conferencing, where the person is close (< 2m) to the
camera.
## Models
In this solution, we provide two models: general and landscape. Both models are
based on
[MobileNetV3](https://ai.googleblog.com/2019/11/introducing-next-generation-on-device.html),
with modifications to make them more efficient. The general model operates on a
256x256x3 (HWC) tensor, and outputs a 256x256x1 tensor representing the
segmentation mask. The landscape model is similar to the general model, but
operates on a 144x256x3 (HWC) tensor. It has fewer FLOPs than the general model,
and therefore, runs faster. Note that MediaPipe Selfie Segmentation
automatically resizes the input image to the desired tensor dimension before
feeding it into the ML models.
The general model is also powering [ML Kit](https://developers.google.com/ml-kit/vision/selfie-segmentation),
and a variant of the landscape model is powering [Google Meet](https://ai.googleblog.com/2020/10/background-features-in-google-meet.html).
Please find more detail about the models in the [model card](./models.md#selfie_segmentation).
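
To get a feel for the size difference between the two models, the sketch below compares their input tensor sizes. The shapes come from the text above and the `0`/`1` keys follow the `model_selection` convention of this solution; the names `MODEL_INPUT_SHAPES` and `input_pixel_count` are made up for this example. Pixel count is only a crude proxy for compute cost and is not a FLOP measurement.

```python
from typing import Dict, Tuple

# (height, width, channels) of the input tensor for each model.
MODEL_INPUT_SHAPES: Dict[int, Tuple[int, int, int]] = {
    0: (256, 256, 3),  # general model
    1: (144, 256, 3),  # landscape model
}


def input_pixel_count(model_selection: int) -> int:
    """Return the number of input pixels (height * width) for a model."""
    height, width, _ = MODEL_INPUT_SHAPES[model_selection]
    return height * width


ratio = input_pixel_count(1) / input_pixel_count(0)
print(ratio)  # 0.5625
```

The landscape model therefore processes a little over half as many input pixels as the general model.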
## ML Pipeline
The pipeline is implemented as a MediaPipe
[graph](https://github.com/google/mediapipe/tree/master/mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt)
that uses a
[selfie segmentation subgraph](https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation/selfie_segmentation_gpu.pbtxt)
from the
[selfie segmentation module](https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation).
Note: To visualize a graph, copy the graph and paste it into
[MediaPipe Visualizer](https://viz.mediapipe.dev/). For more information on how
to visualize its associated subgraphs, please see
[visualizer documentation](../tools/visualizer.md).
## Solution APIs
### Cross-platform Configuration Options
Naming style and availability may differ slightly across platforms/languages.
#### model_selection
An integer index `0` or `1`. Use `0` to select the general model, and `1` to
select the landscape model (see details in [Models](#models)). Defaults to `0` if
not specified.
### Output
Naming style may differ slightly across platforms/languages.
#### segmentation_mask
The output segmentation mask, which has the same dimensions as the input image.
### Python Solution API
Please first follow general [instructions](../getting_started/python.md) to
install MediaPipe Python package, then learn more in the companion
[Python Colab](#resources) and the usage example below.
Supported configuration options:
* [model_selection](#model_selection)
```python
import cv2
import mediapipe as mp
import numpy as np
mp_drawing = mp.solutions.drawing_utils
mp_selfie_segmentation = mp.solutions.selfie_segmentation
# For static images:
IMAGE_FILES = []
BG_COLOR = (192, 192, 192) # gray
MASK_COLOR = (255, 255, 255) # white
with mp_selfie_segmentation.SelfieSegmentation(
    model_selection=0) as selfie_segmentation:
  for idx, file in enumerate(IMAGE_FILES):
    image = cv2.imread(file)
    image_height, image_width, _ = image.shape
    # Convert the BGR image to RGB before processing.
    results = selfie_segmentation.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    # Draw selfie segmentation on the background image.
    # To improve segmentation around boundaries, consider applying a joint
    # bilateral filter to "results.segmentation_mask" with "image".
    condition = np.stack((results.segmentation_mask,) * 3, axis=-1) > 0.1
    # Generate solid color images for showing the output selfie segmentation mask.
    fg_image = np.zeros(image.shape, dtype=np.uint8)
    fg_image[:] = MASK_COLOR
    bg_image = np.zeros(image.shape, dtype=np.uint8)
    bg_image[:] = BG_COLOR
    output_image = np.where(condition, fg_image, bg_image)
    cv2.imwrite('/tmp/selfie_segmentation_output' + str(idx) + '.png', output_image)
# For webcam input:
BG_COLOR = (192, 192, 192) # gray
cap = cv2.VideoCapture(0)
with mp_selfie_segmentation.SelfieSegmentation(
    model_selection=1) as selfie_segmentation:
  bg_image = None
  while cap.isOpened():
    success, image = cap.read()
    if not success:
      print("Ignoring empty camera frame.")
      # If loading a video, use 'break' instead of 'continue'.
      continue
    # Flip the image horizontally for a later selfie-view display, and convert
    # the BGR image to RGB.
    image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB)
    # To improve performance, optionally mark the image as not writeable to
    # pass by reference.
    image.flags.writeable = False
    results = selfie_segmentation.process(image)
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    # Draw selfie segmentation on the background image.
    # To improve segmentation around boundaries, consider applying a joint
    # bilateral filter to "results.segmentation_mask" with "image".
    condition = np.stack(
        (results.segmentation_mask,) * 3, axis=-1) > 0.1
    # The background can be customized.
    #   a) Load an image (with the same width and height of the input image) to
    #      be the background, e.g., bg_image = cv2.imread('/path/to/image/file')
    #   b) Blur the input image by applying image filtering, e.g.,
    #      bg_image = cv2.GaussianBlur(image,(55,55),0)
    if bg_image is None:
      bg_image = np.zeros(image.shape, dtype=np.uint8)
      bg_image[:] = BG_COLOR
    output_image = np.where(condition, image, bg_image)
    cv2.imshow('MediaPipe Selfie Segmentation', output_image)
    if cv2.waitKey(5) & 0xFF == 27:
      break
cap.release()
```
### JavaScript Solution API
Please first see the general [introduction](../getting_started/javascript.md) to
MediaPipe in JavaScript, then learn more in the companion [web demo](#resources)
and the following usage example.
Supported configuration options:
* [modelSelection](#model_selection)
```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/control_utils/control_utils.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/drawing_utils/drawing_utils.js" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/selfie_segmentation.js" crossorigin="anonymous"></script>
</head>
<body>
<div class="container">
<video class="input_video"></video>
<canvas class="output_canvas" width="1280px" height="720px"></canvas>
</div>
</body>
</html>
```
```javascript
<script type="module">
const videoElement = document.getElementsByClassName('input_video')[0];
const canvasElement = document.getElementsByClassName('output_canvas')[0];
const canvasCtx = canvasElement.getContext('2d');
function onResults(results) {
canvasCtx.save();
canvasCtx.clearRect(0, 0, canvasElement.width, canvasElement.height);
canvasCtx.drawImage(results.segmentationMask, 0, 0,
canvasElement.width, canvasElement.height);
// Only overwrite existing pixels.
canvasCtx.globalCompositeOperation = 'source-in';
canvasCtx.fillStyle = '#00FF00';
canvasCtx.fillRect(0, 0, canvasElement.width, canvasElement.height);
// Only overwrite missing pixels.
canvasCtx.globalCompositeOperation = 'destination-atop';
canvasCtx.drawImage(
results.image, 0, 0, canvasElement.width, canvasElement.height);
canvasCtx.restore();
}
const selfieSegmentation = new SelfieSegmentation({locateFile: (file) => {
return `https://cdn.jsdelivr.net/npm/@mediapipe/selfie_segmentation/${file}`;
}});
selfieSegmentation.setOptions({
modelSelection: 1,
});
selfieSegmentation.onResults(onResults);
const camera = new Camera(videoElement, {
onFrame: async () => {
await selfieSegmentation.send({image: videoElement});
},
width: 1280,
height: 720
});
camera.start();
</script>
```
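The option and result names in the JavaScript example (`modelSelection`, `segmentationMask`) are the lowerCamelCase forms of the snake_case names used elsewhere on this page (`model_selection`, `segmentation_mask`). The sketch below shows that mapping; `to_camel_case` is a hypothetical helper, and the convention is inferred only from the names shown here, so it may not hold for every option on every platform.

```python
def to_camel_case(snake_name):
    """Converts a snake_case name to lowerCamelCase."""
    first, *rest = snake_name.split("_")
    # Keep the first word as-is and capitalize each following word.
    return first + "".join(part.capitalize() for part in rest)


for name in ("model_selection", "segmentation_mask"):
    print(name, "->", to_camel_case(name))
```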
## Example Apps
Please first see general instructions for
[Android](../getting_started/android.md), [iOS](../getting_started/ios.md), and
[desktop](../getting_started/cpp.md) on how to build MediaPipe examples.
Note: To visualize a graph, copy the graph and paste it into
[MediaPipe Visualizer](https://viz.mediapipe.dev/). For more information on how
to visualize its associated subgraphs, please see
[visualizer documentation](../tools/visualizer.md).
### Mobile
* Graph:
[`mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt`](https://github.com/google/mediapipe/tree/master/mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt)
* Android target:
[(or download prebuilt ARM64 APK)](https://drive.google.com/file/d/1DoeyGzMmWUsjfVgZfGGecrn7GKzYcEAo/view?usp=sharing)
[`mediapipe/examples/android/src/java/com/google/mediapipe/apps/selfiesegmentationgpu:selfiesegmentationgpu`](https://github.com/google/mediapipe/tree/master/mediapipe/examples/android/src/java/com/google/mediapipe/apps/selfiesegmentationgpu/BUILD)
* iOS target:
[`mediapipe/examples/ios/selfiesegmentationgpu:SelfieSegmentationGpuApp`](https://github.com/google/mediapipe/tree/master/mediapipe/examples/ios/selfiesegmentationgpu/BUILD)
### Desktop
Please first see general instructions for [desktop](../getting_started/cpp.md)
on how to build MediaPipe examples.
* Running on CPU
    * Graph:
      [`mediapipe/graphs/selfie_segmentation/selfie_segmentation_cpu.pbtxt`](https://github.com/google/mediapipe/tree/master/mediapipe/graphs/selfie_segmentation/selfie_segmentation_cpu.pbtxt)
    * Target:
      [`mediapipe/examples/desktop/selfie_segmentation:selfie_segmentation_cpu`](https://github.com/google/mediapipe/tree/master/mediapipe/examples/desktop/selfie_segmentation/BUILD)
* Running on GPU
    * Graph:
      [`mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt`](https://github.com/google/mediapipe/tree/master/mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt)
    * Target:
      [`mediapipe/examples/desktop/selfie_segmentation:selfie_segmentation_gpu`](https://github.com/google/mediapipe/tree/master/mediapipe/examples/desktop/selfie_segmentation/BUILD)
## Resources
* Google AI Blog:
[Background Features in Google Meet, Powered by Web ML](https://ai.googleblog.com/2020/10/background-features-in-google-meet.html)
* [ML Kit Selfie Segmentation API](https://developers.google.com/ml-kit/vision/selfie-segmentation)
* [Models and model cards](./models.md#selfie_segmentation)
* [Web demo](https://code.mediapipe.dev/codepen/selfie_segmentation)
* [Python Colab](https://mediapipe.page.link/selfie_segmentation_py_colab)

View File

@ -24,6 +24,7 @@ has_toc: false
[Hands](https://google.github.io/mediapipe/solutions/hands) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Pose](https://google.github.io/mediapipe/solutions/pose) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Holistic](https://google.github.io/mediapipe/solutions/holistic) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Selfie Segmentation](https://google.github.io/mediapipe/solutions/selfie_segmentation) | ✅ | ✅ | ✅ | ✅ | ✅ |
[Hair Segmentation](https://google.github.io/mediapipe/solutions/hair_segmentation) | ✅ | | ✅ | | |
[Object Detection](https://google.github.io/mediapipe/solutions/object_detection) | ✅ | ✅ | ✅ | | | ✅
[Box Tracking](https://google.github.io/mediapipe/solutions/box_tracking) | ✅ | ✅ | ✅ | | |

View File

@ -2,7 +2,7 @@
layout: default
title: YouTube-8M Feature Extraction and Model Inference
parent: Solutions
nav_order: 15
nav_order: 16
---
# YouTube-8M Feature Extraction and Model Inference

View File

@ -16,6 +16,7 @@
"mediapipe/examples/ios/objectdetectiongpu/BUILD",
"mediapipe/examples/ios/objectdetectiontrackinggpu/BUILD",
"mediapipe/examples/ios/posetrackinggpu/BUILD",
"mediapipe/examples/ios/selfiesegmentationgpu/BUILD",
"mediapipe/framework/BUILD",
"mediapipe/gpu/BUILD",
"mediapipe/objc/BUILD",
@ -35,6 +36,7 @@
"//mediapipe/examples/ios/objectdetectiongpu:ObjectDetectionGpuApp",
"//mediapipe/examples/ios/objectdetectiontrackinggpu:ObjectDetectionTrackingGpuApp",
"//mediapipe/examples/ios/posetrackinggpu:PoseTrackingGpuApp",
"//mediapipe/examples/ios/selfiesegmentationgpu:SelfieSegmentationGpuApp",
"//mediapipe/objc:mediapipe_framework_ios"
],
"optionSet" : {
@ -103,6 +105,7 @@
"mediapipe/examples/ios/objectdetectioncpu",
"mediapipe/examples/ios/objectdetectiongpu",
"mediapipe/examples/ios/posetrackinggpu",
"mediapipe/examples/ios/selfiesegmentationgpu",
"mediapipe/framework",
"mediapipe/framework/deps",
"mediapipe/framework/formats",
@ -120,6 +123,7 @@
"mediapipe/graphs/hand_tracking",
"mediapipe/graphs/object_detection",
"mediapipe/graphs/pose_tracking",
"mediapipe/graphs/selfie_segmentation",
"mediapipe/models",
"mediapipe/modules",
"mediapipe/objc",

View File

@ -22,6 +22,7 @@
"mediapipe/examples/ios/objectdetectiongpu",
"mediapipe/examples/ios/objectdetectiontrackinggpu",
"mediapipe/examples/ios/posetrackinggpu",
"mediapipe/examples/ios/selfiesegmentationgpu",
"mediapipe/objc"
],
"projectName" : "Mediapipe",

View File

@ -37,6 +37,22 @@ constexpr char kImageFrameTag[] = "IMAGE";
constexpr char kMaskCpuTag[] = "MASK";
constexpr char kGpuBufferTag[] = "IMAGE_GPU";
constexpr char kMaskGpuTag[] = "MASK_GPU";
inline cv::Vec3b Blend(const cv::Vec3b& color1, const cv::Vec3b& color2,
float weight, int invert_mask,
int adjust_with_luminance) {
weight = (1 - invert_mask) * weight + invert_mask * (1.0f - weight);
float luminance =
(1 - adjust_with_luminance) * 1.0f +
adjust_with_luminance *
(color1[0] * 0.299 + color1[1] * 0.587 + color1[2] * 0.114) / 255;
float mix_value = weight * luminance;
return color1 * (1.0 - mix_value) + color2 * mix_value;
}
} // namespace
namespace mediapipe {
@ -44,15 +60,14 @@ namespace mediapipe {
// A calculator to recolor a masked area of an image to a specified color.
//
// A mask image is used to specify where to overlay a user defined color.
// The luminance of the input image is used to adjust the blending weight,
// to help preserve image textures.
//
// Inputs:
// One of the following IMAGE tags:
// IMAGE: An ImageFrame input image, RGB or RGBA.
// IMAGE: An ImageFrame input image in ImageFormat::SRGB.
// IMAGE_GPU: A GpuBuffer input image, RGBA.
// One of the following MASK tags:
// MASK: An ImageFrame input mask, Gray, RGB or RGBA.
// MASK: An ImageFrame input mask in ImageFormat::GRAY8, SRGB, SRGBA, or
// VEC32F1
// MASK_GPU: A GpuBuffer input mask, RGBA.
// Output:
// One of the following IMAGE tags:
@ -98,10 +113,12 @@ class RecolorCalculator : public CalculatorBase {
void GlRender();
bool initialized_ = false;
std::vector<float> color_;
std::vector<uint8> color_;
mediapipe::RecolorCalculatorOptions::MaskChannel mask_channel_;
bool use_gpu_ = false;
bool invert_mask_ = false;
bool adjust_with_luminance_ = false;
#if !MEDIAPIPE_DISABLE_GPU
mediapipe::GlCalculatorHelper gpu_helper_;
GLuint program_ = 0;
@ -233,11 +250,15 @@ absl::Status RecolorCalculator::RenderCpu(CalculatorContext* cc) {
}
cv::Mat mask_full;
cv::resize(mask_mat, mask_full, input_mat.size());
const cv::Vec3b recolor = {color_[0], color_[1], color_[2]};
auto output_img = absl::make_unique<ImageFrame>(
input_img.Format(), input_mat.cols, input_mat.rows);
cv::Mat output_mat = mediapipe::formats::MatView(output_img.get());
const int invert_mask = invert_mask_ ? 1 : 0;
const int adjust_with_luminance = adjust_with_luminance_ ? 1 : 0;
// From GPU shader:
/*
vec4 weight = texture2D(mask, sample_coordinate);
@ -249,18 +270,23 @@ absl::Status RecolorCalculator::RenderCpu(CalculatorContext* cc) {
fragColor = mix(color1, color2, mix_value);
*/
if (mask_img.Format() == ImageFormat::VEC32F1) {
for (int i = 0; i < output_mat.rows; ++i) {
for (int j = 0; j < output_mat.cols; ++j) {
float weight = mask_full.at<uchar>(i, j) * (1.0 / 255.0);
cv::Vec3f color1 = input_mat.at<cv::Vec3b>(i, j);
cv::Vec3f color2 = {color_[0], color_[1], color_[2]};
float luminance =
(color1[0] * 0.299 + color1[1] * 0.587 + color1[2] * 0.114) / 255;
float mix_value = weight * luminance;
cv::Vec3b mix_color = color1 * (1.0 - mix_value) + color2 * mix_value;
output_mat.at<cv::Vec3b>(i, j) = mix_color;
const float weight = mask_full.at<float>(i, j);
output_mat.at<cv::Vec3b>(i, j) =
Blend(input_mat.at<cv::Vec3b>(i, j), recolor, weight, invert_mask,
adjust_with_luminance);
}
}
} else {
for (int i = 0; i < output_mat.rows; ++i) {
for (int j = 0; j < output_mat.cols; ++j) {
const float weight = mask_full.at<uchar>(i, j) * (1.0 / 255.0);
output_mat.at<cv::Vec3b>(i, j) =
Blend(input_mat.at<cv::Vec3b>(i, j), recolor, weight, invert_mask,
adjust_with_luminance);
}
}
}
@ -385,6 +411,9 @@ absl::Status RecolorCalculator::LoadOptions(CalculatorContext* cc) {
color_.push_back(options.color().g());
color_.push_back(options.color().b());
invert_mask_ = options.invert_mask();
adjust_with_luminance_ = options.adjust_with_luminance();
return absl::OkStatus();
}
@ -435,13 +464,20 @@ absl::Status RecolorCalculator::InitGpu(CalculatorContext* cc) {
uniform sampler2D frame;
uniform sampler2D mask;
uniform vec3 recolor;
uniform float invert_mask;
uniform float adjust_with_luminance;
void main() {
vec4 weight = texture2D(mask, sample_coordinate);
vec4 color1 = texture2D(frame, sample_coordinate);
vec4 color2 = vec4(recolor, 1.0);
float luminance = dot(color1.rgb, vec3(0.299, 0.587, 0.114));
weight = mix(weight, 1.0 - weight, invert_mask);
float luminance = mix(1.0,
dot(color1.rgb, vec3(0.299, 0.587, 0.114)),
adjust_with_luminance);
float mix_value = weight.MASK_COMPONENT * luminance;
fragColor = mix(color1, color2, mix_value);
@ -458,6 +494,10 @@ absl::Status RecolorCalculator::InitGpu(CalculatorContext* cc) {
glUniform1i(glGetUniformLocation(program_, "mask"), 2);
glUniform3f(glGetUniformLocation(program_, "recolor"), color_[0] / 255.0,
color_[1] / 255.0, color_[2] / 255.0);
glUniform1f(glGetUniformLocation(program_, "invert_mask"),
invert_mask_ ? 1.0f : 0.0f);
glUniform1f(glGetUniformLocation(program_, "adjust_with_luminance"),
adjust_with_luminance_ ? 1.0f : 0.0f);
#endif // !MEDIAPIPE_DISABLE_GPU
return absl::OkStatus();

View File

@ -36,4 +36,11 @@ message RecolorCalculatorOptions {
// Color to blend into input image where mask is > 0.
// The blending is based on the input image luminosity.
optional Color color = 2;
// Swap the meaning of mask values for foreground/background.
optional bool invert_mask = 3 [default = false];
// Whether to use the luminance of the input image to further adjust the
// blending weight, to help preserve image textures.
optional bool adjust_with_luminance = 4 [default = true];
}
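The way the `invert_mask` and `adjust_with_luminance` options combine in the recolor blend can be sketched in Python. This mirrors the arithmetic of the C++ `Blend` helper in the calculator hunk, but it is a simplified illustration: `blend` is a hypothetical function, colors are plain RGB tuples in the range 0-255, and the result is left as floats rather than being converted back to 8-bit values.

```python
def blend(color1, color2, weight, invert_mask=False,
          adjust_with_luminance=True):
    """Mixes `color2` into `color1` according to a mask weight in [0, 1]."""
    if invert_mask:
        # Swap the meaning of foreground and background.
        weight = 1.0 - weight
    luminance = 1.0
    if adjust_with_luminance:
        # Scale the weight by the input pixel's luminance so that darker
        # pixels are recolored less, which helps preserve texture.
        luminance = (0.299 * color1[0] + 0.587 * color1[1] +
                     0.114 * color1[2]) / 255.0
    mix_value = weight * luminance
    return tuple(c1 * (1.0 - mix_value) + c2 * mix_value
                 for c1, c2 in zip(color1, color2))
```

With `adjust_with_luminance` left at its default of true, a fully black input pixel is never recolored, because its luminance is zero and so is the mix value.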

View File

@ -753,3 +753,76 @@ cc_test(
"//mediapipe/framework/port:gtest_main",
],
)
# Copied from /mediapipe/calculators/tflite/BUILD
selects.config_setting_group(
name = "gpu_inference_disabled",
match_any = [
"//mediapipe/gpu:disable_gpu",
],
)
mediapipe_proto_library(
name = "tensors_to_segmentation_calculator_proto",
srcs = ["tensors_to_segmentation_calculator.proto"],
visibility = ["//visibility:public"],
deps = [
"//mediapipe/framework:calculator_options_proto",
"//mediapipe/framework:calculator_proto",
"//mediapipe/gpu:gpu_origin_proto",
],
)
cc_library(
name = "tensors_to_segmentation_calculator",
srcs = ["tensors_to_segmentation_calculator.cc"],
copts = select({
"//mediapipe:apple": [
"-x objective-c++",
"-fobjc-arc", # enable reference-counting
],
"//conditions:default": [],
}),
visibility = ["//visibility:public"],
deps = [
":tensors_to_segmentation_calculator_cc_proto",
"@com_google_absl//absl/strings:str_format",
"@com_google_absl//absl/strings",
"@com_google_absl//absl/types:span",
"//mediapipe/framework/formats:image",
"//mediapipe/framework/formats:image_frame",
"//mediapipe/framework/formats:image_opencv",
"//mediapipe/framework/formats:tensor",
"//mediapipe/framework/port:opencv_imgproc",
"//mediapipe/framework/port:ret_check",
"//mediapipe/framework:calculator_context",
"//mediapipe/framework:calculator_framework",
"//mediapipe/framework:port",
"//mediapipe/util:resource_util",
"@org_tensorflow//tensorflow/lite:framework",
"//mediapipe/gpu:gpu_origin_cc_proto",
"//mediapipe/framework/port:statusor",
] + selects.with_or({
"//mediapipe/gpu:disable_gpu": [],
"//conditions:default": [
"//mediapipe/gpu:gl_calculator_helper",
"//mediapipe/gpu:gl_simple_shaders",
"//mediapipe/gpu:gpu_buffer",
"//mediapipe/gpu:shader_util",
],
}) + selects.with_or({
":gpu_inference_disabled": [],
"//mediapipe:ios": [
"//mediapipe/gpu:MPPMetalUtil",
"//mediapipe/gpu:MPPMetalHelper",
],
"//conditions:default": [
"@org_tensorflow//tensorflow/lite/delegates/gpu:gl_delegate",
"@org_tensorflow//tensorflow/lite/delegates/gpu/gl:gl_program",
"@org_tensorflow//tensorflow/lite/delegates/gpu/gl:gl_shader",
"@org_tensorflow//tensorflow/lite/delegates/gpu/gl:gl_texture",
"@org_tensorflow//tensorflow/lite/delegates/gpu/gl/converters:util",
],
}),
alwayslink = 1,
)

View File

@ -105,6 +105,15 @@ void ConvertAnchorsToRawValues(const std::vector<Anchor>& anchors,
// for anchors (e.g. for SSD models) depend on the outputs of the
// detection model. The size of anchor tensor must be (num_boxes *
// 4).
//
// Input side packet:
// ANCHORS (optional) - The anchors used for decoding the bounding boxes, as a
// vector of `Anchor` protos. Not required if post-processing is built into
// the model.
// IGNORE_CLASSES (optional) - The list of class ids that should be ignored, as
// a vector of integers. It overrides the corresponding field in the
// calculator options.
//
// Output:
// DETECTIONS - Result MediaPipe detections.
//
@ -132,8 +141,11 @@ class TensorsToDetectionsCalculator : public Node {
static constexpr Input<std::vector<Tensor>> kInTensors{"TENSORS"};
static constexpr SideInput<std::vector<Anchor>>::Optional kInAnchors{
"ANCHORS"};
static constexpr SideInput<std::vector<int>>::Optional kSideInIgnoreClasses{
"IGNORE_CLASSES"};
static constexpr Output<std::vector<Detection>> kOutDetections{"DETECTIONS"};
MEDIAPIPE_NODE_CONTRACT(kInTensors, kInAnchors, kOutDetections);
MEDIAPIPE_NODE_CONTRACT(kInTensors, kInAnchors, kSideInIgnoreClasses,
kOutDetections);
static absl::Status UpdateContract(CalculatorContract* cc);
absl::Status Open(CalculatorContext* cc) override;
@ -566,9 +578,16 @@ absl::Status TensorsToDetectionsCalculator::LoadOptions(CalculatorContext* cc) {
kNumCoordsPerBox,
num_coords_);
if (kSideInIgnoreClasses(cc).IsConnected()) {
RET_CHECK(!kSideInIgnoreClasses(cc).IsEmpty());
for (int ignore_class : *kSideInIgnoreClasses(cc)) {
ignore_classes_.insert(ignore_class);
}
} else {
for (int i = 0; i < options_.ignore_classes_size(); ++i) {
ignore_classes_.insert(options_.ignore_classes(i));
}
}
return absl::OkStatus();
}
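The override rule in the `LoadOptions` hunk above can be summarized with a small Python sketch. It is an illustration under stated assumptions: `resolve_ignore_classes` is a hypothetical function, and an unconnected `IGNORE_CLASSES` side packet is modeled as `None`.

```python
def resolve_ignore_classes(option_ignore_classes,
                           side_packet_ignore_classes=None):
    """Returns the set of class ids to ignore.

    A connected side packet (anything other than None) replaces the list
    from the calculator options entirely, even if it is an empty list.
    """
    if side_packet_ignore_classes is not None:
        return set(side_packet_ignore_classes)
    return set(option_ignore_classes)


print(resolve_ignore_classes([1, 2]))       # options only
print(resolve_ignore_classes([1, 2], [7]))  # side packet overrides options
```

The two sources are not merged: when the side packet is connected, the ids listed in the options are not consulted at all.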

View File

@ -56,7 +56,7 @@ message TensorsToDetectionsCalculatorOptions {
// [x_center, y_center, w, h].
optional bool reverse_output_order = 14 [default = false];
// The ids of classes that should be ignored during decoding the score for
// each predicted box.
// each predicted box. Can be overridden with IGNORE_CLASSES side packet.
repeated int32 ignore_classes = 8;
optional bool sigmoid_score = 15 [default = false];

View File

@ -0,0 +1,885 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#include <vector>
#include "absl/strings/str_format.h"
#include "absl/types/span.h"
#include "mediapipe/calculators/tensor/tensors_to_segmentation_calculator.pb.h"
#include "mediapipe/framework/calculator_context.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/image.h"
#include "mediapipe/framework/formats/image_opencv.h"
#include "mediapipe/framework/formats/tensor.h"
#include "mediapipe/framework/port.h"
#include "mediapipe/framework/port/opencv_imgproc_inc.h"
#include "mediapipe/framework/port/ret_check.h"
#include "mediapipe/framework/port/statusor.h"
#include "mediapipe/gpu/gpu_origin.pb.h"
#include "mediapipe/util/resource_util.h"
#include "tensorflow/lite/interpreter.h"
#if !MEDIAPIPE_DISABLE_GPU
#include "mediapipe/gpu/gl_calculator_helper.h"
#include "mediapipe/gpu/gl_simple_shaders.h"
#include "mediapipe/gpu/gpu_buffer.h"
#include "mediapipe/gpu/shader_util.h"
#endif // !MEDIAPIPE_DISABLE_GPU
#if MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
#include "tensorflow/lite/delegates/gpu/gl/converters/util.h"
#include "tensorflow/lite/delegates/gpu/gl/gl_program.h"
#include "tensorflow/lite/delegates/gpu/gl/gl_shader.h"
#include "tensorflow/lite/delegates/gpu/gl/gl_texture.h"
#include "tensorflow/lite/delegates/gpu/gl_delegate.h"
#endif // MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
#if MEDIAPIPE_METAL_ENABLED
#import <CoreVideo/CoreVideo.h>
#import <Metal/Metal.h>
#import <MetalKit/MetalKit.h>
#import "mediapipe/gpu/MPPMetalHelper.h"
#include "mediapipe/gpu/MPPMetalUtil.h"
#endif // MEDIAPIPE_METAL_ENABLED
namespace {
constexpr int kWorkgroupSize = 8; // Block size for GPU shader.
enum { ATTRIB_VERTEX, ATTRIB_TEXTURE_POSITION, NUM_ATTRIBUTES };
// Commonly used to compute the number of blocks to launch in a kernel.
int NumGroups(const int size, const int group_size) { // NOLINT
return (size + group_size - 1) / group_size;
}
bool CanUseGpu() {
#if !MEDIAPIPE_DISABLE_GPU || MEDIAPIPE_METAL_ENABLED
// TODO: Configure GPU usage policy in individual calculators.
constexpr bool kAllowGpuProcessing = true;
return kAllowGpuProcessing;
#else
return false;
#endif // !MEDIAPIPE_DISABLE_GPU || MEDIAPIPE_METAL_ENABLED
}
constexpr char kTensorsTag[] = "TENSORS";
constexpr char kOutputSizeTag[] = "OUTPUT_SIZE";
constexpr char kMaskTag[] = "MASK";
absl::StatusOr<std::tuple<int, int, int>> GetHwcFromDims(
const std::vector<int>& dims) {
if (dims.size() == 3) {
return std::make_tuple(dims[0], dims[1], dims[2]);
} else if (dims.size() == 4) {
// BHWC format: check that B == 1.
RET_CHECK_EQ(1, dims[0]) << "Expected batch to be 1 for BHWC heatmap";
return std::make_tuple(dims[1], dims[2], dims[3]);
} else {
RET_CHECK(false) << "Invalid shape for segmentation tensor " << dims.size();
}
}
} // namespace
namespace mediapipe {
#if MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
using ::tflite::gpu::gl::GlProgram;
using ::tflite::gpu::gl::GlShader;
#endif // MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
// Converts Tensors from a tflite segmentation model to an image mask.
//
// Performs optional upscale to OUTPUT_SIZE dimensions if provided,
// otherwise the mask is the same size as input tensor.
//
// If at least one input tensor is already on GPU, processing happens on GPU and
// the output mask is also stored on GPU. Otherwise, processing and the output
// mask are both on CPU.
//
// On GPU, the mask is an RGBA image with the mask values stored in both the
// R and A channels, scaled 0-1.
// On CPU, the mask is an ImageFormat::VEC32F1 image, with values scaled 0-1.
//
//
// Inputs:
// One of the following TENSORS tags:
// TENSORS: Vector of Tensor,
// The tensor dimensions are specified in this calculator's options.
// OUTPUT_SIZE(optional): std::pair<int, int>,
// If provided, the size to upscale mask to.
//
// Output:
// MASK: An Image output mask, RGBA(GPU) / VEC32F1(CPU).
//
// Options:
// See tensors_to_segmentation_calculator.proto
//
// Usage example:
// node {
// calculator: "TensorsToSegmentationCalculator"
// input_stream: "TENSORS:tensors"
// input_stream: "OUTPUT_SIZE:size"
// output_stream: "MASK:hair_mask"
// node_options: {
// [mediapipe.TensorsToSegmentationCalculatorOptions] {
// output_layer_index: 1
// # gpu_origin: CONVENTIONAL # or TOP_LEFT
// }
// }
// }
//
// Currently only OpenGLES 3.1 and CPU backends supported.
// TODO Refactor and add support for other backends/platforms.
//
class TensorsToSegmentationCalculator : public CalculatorBase {
public:
static absl::Status GetContract(CalculatorContract* cc);
absl::Status Open(CalculatorContext* cc) override;
absl::Status Process(CalculatorContext* cc) override;
absl::Status Close(CalculatorContext* cc) override;
private:
absl::Status LoadOptions(CalculatorContext* cc);
absl::Status InitGpu(CalculatorContext* cc);
absl::Status ProcessGpu(CalculatorContext* cc);
absl::Status ProcessCpu(CalculatorContext* cc);
void GlRender();
bool DoesGpuTextureStartAtBottom() {
return options_.gpu_origin() != mediapipe::GpuOrigin_Mode_TOP_LEFT;
}
template <class T>
absl::Status ApplyActivation(cv::Mat& tensor_mat, cv::Mat* small_mask_mat);
::mediapipe::TensorsToSegmentationCalculatorOptions options_;
#if !MEDIAPIPE_DISABLE_GPU
mediapipe::GlCalculatorHelper gpu_helper_;
GLuint upsample_program_;
#if MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
std::unique_ptr<GlProgram> mask_program_31_;
#else
GLuint mask_program_20_;
#endif // MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
#if MEDIAPIPE_METAL_ENABLED
MPPMetalHelper* metal_helper_ = nullptr;
id<MTLComputePipelineState> mask_program_;
#endif // MEDIAPIPE_METAL_ENABLED
#endif // !MEDIAPIPE_DISABLE_GPU
};
REGISTER_CALCULATOR(TensorsToSegmentationCalculator);
// static
absl::Status TensorsToSegmentationCalculator::GetContract(
CalculatorContract* cc) {
RET_CHECK(!cc->Inputs().GetTags().empty());
RET_CHECK(!cc->Outputs().GetTags().empty());
// Inputs.
cc->Inputs().Tag(kTensorsTag).Set<std::vector<Tensor>>();
if (cc->Inputs().HasTag(kOutputSizeTag)) {
cc->Inputs().Tag(kOutputSizeTag).Set<std::pair<int, int>>();
}
// Outputs.
cc->Outputs().Tag(kMaskTag).Set<Image>();
if (CanUseGpu()) {
#if !MEDIAPIPE_DISABLE_GPU
MP_RETURN_IF_ERROR(mediapipe::GlCalculatorHelper::UpdateContract(cc));
#if MEDIAPIPE_METAL_ENABLED
MP_RETURN_IF_ERROR([MPPMetalHelper updateContract:cc]);
#endif // MEDIAPIPE_METAL_ENABLED
#endif // !MEDIAPIPE_DISABLE_GPU
}
return absl::OkStatus();
}
absl::Status TensorsToSegmentationCalculator::Open(CalculatorContext* cc) {
cc->SetOffset(TimestampDiff(0));
bool use_gpu = false;
if (CanUseGpu()) {
#if !MEDIAPIPE_DISABLE_GPU
use_gpu = true;
MP_RETURN_IF_ERROR(gpu_helper_.Open(cc));
#if MEDIAPIPE_METAL_ENABLED
metal_helper_ = [[MPPMetalHelper alloc] initWithCalculatorContext:cc];
RET_CHECK(metal_helper_);
#endif // MEDIAPIPE_METAL_ENABLED
#endif // !MEDIAPIPE_DISABLE_GPU
}
MP_RETURN_IF_ERROR(LoadOptions(cc));
if (use_gpu) {
#if !MEDIAPIPE_DISABLE_GPU
MP_RETURN_IF_ERROR(InitGpu(cc));
#else
RET_CHECK_FAIL() << "GPU processing disabled.";
#endif // !MEDIAPIPE_DISABLE_GPU
}
return absl::OkStatus();
}
absl::Status TensorsToSegmentationCalculator::Process(CalculatorContext* cc) {
if (cc->Inputs().Tag(kTensorsTag).IsEmpty()) {
return absl::OkStatus();
}
const auto& input_tensors =
cc->Inputs().Tag(kTensorsTag).Get<std::vector<Tensor>>();
bool use_gpu = false;
if (CanUseGpu()) {
// Use GPU processing only if at least one input tensor is already on GPU.
for (const auto& tensor : input_tensors) {
if (tensor.ready_on_gpu()) {
use_gpu = true;
break;
}
}
}
// Validate tensor channels and activation type.
{
RET_CHECK(!input_tensors.empty());
ASSIGN_OR_RETURN(auto hwc, GetHwcFromDims(input_tensors[0].shape().dims));
int tensor_channels = std::get<2>(hwc);
typedef mediapipe::TensorsToSegmentationCalculatorOptions Options;
switch (options_.activation()) {
case Options::NONE:
RET_CHECK_EQ(tensor_channels, 1);
break;
case Options::SIGMOID:
RET_CHECK_EQ(tensor_channels, 1);
break;
case Options::SOFTMAX:
RET_CHECK_EQ(tensor_channels, 2);
break;
}
}
if (use_gpu) {
#if !MEDIAPIPE_DISABLE_GPU
MP_RETURN_IF_ERROR(gpu_helper_.RunInGlContext([this, cc]() -> absl::Status {
MP_RETURN_IF_ERROR(ProcessGpu(cc));
return absl::OkStatus();
}));
#else
RET_CHECK_FAIL() << "GPU processing disabled.";
#endif // !MEDIAPIPE_DISABLE_GPU
} else {
MP_RETURN_IF_ERROR(ProcessCpu(cc));
}
return absl::OkStatus();
}
absl::Status TensorsToSegmentationCalculator::Close(CalculatorContext* cc) {
#if !MEDIAPIPE_DISABLE_GPU
gpu_helper_.RunInGlContext([this] {
if (upsample_program_) glDeleteProgram(upsample_program_);
upsample_program_ = 0;
#if MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
mask_program_31_.reset();
#else
if (mask_program_20_) glDeleteProgram(mask_program_20_);
mask_program_20_ = 0;
#endif // MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
#if MEDIAPIPE_METAL_ENABLED
mask_program_ = nil;
#endif // MEDIAPIPE_METAL_ENABLED
});
#endif // !MEDIAPIPE_DISABLE_GPU
return absl::OkStatus();
}
absl::Status TensorsToSegmentationCalculator::ProcessCpu(
CalculatorContext* cc) {
// Get input streams, and dimensions.
const auto& input_tensors =
cc->Inputs().Tag(kTensorsTag).Get<std::vector<Tensor>>();
ASSIGN_OR_RETURN(auto hwc, GetHwcFromDims(input_tensors[0].shape().dims));
auto [tensor_height, tensor_width, tensor_channels] = hwc;
int output_width = tensor_width, output_height = tensor_height;
if (cc->Inputs().HasTag(kOutputSizeTag)) {
const auto& size =
cc->Inputs().Tag(kOutputSizeTag).Get<std::pair<int, int>>();
output_width = size.first;
output_height = size.second;
}
// Create initial working mask.
cv::Mat small_mask_mat(cv::Size(tensor_width, tensor_height), CV_32FC1);
// Wrap input tensor.
auto raw_input_tensor = &input_tensors[0];
auto raw_input_view = raw_input_tensor->GetCpuReadView();
const float* raw_input_data = raw_input_view.buffer<float>();
cv::Mat tensor_mat(cv::Size(tensor_width, tensor_height),
CV_MAKETYPE(CV_32F, tensor_channels),
const_cast<float*>(raw_input_data));
// Process mask tensor and apply activation function.
if (tensor_channels == 2) {
MP_RETURN_IF_ERROR(ApplyActivation<cv::Vec2f>(tensor_mat, &small_mask_mat));
} else if (tensor_channels == 1) {
RET_CHECK(mediapipe::TensorsToSegmentationCalculatorOptions::SOFTMAX !=
options_.activation()); // Requires 2 channels.
if (mediapipe::TensorsToSegmentationCalculatorOptions::NONE ==
options_.activation()) // Pass-through optimization.
tensor_mat.copyTo(small_mask_mat);
else
MP_RETURN_IF_ERROR(ApplyActivation<float>(tensor_mat, &small_mask_mat));
} else {
RET_CHECK_FAIL() << "Unsupported number of tensor channels "
<< tensor_channels;
}
// Send out image as CPU packet.
std::shared_ptr<ImageFrame> mask_frame = std::make_shared<ImageFrame>(
ImageFormat::VEC32F1, output_width, output_height);
std::unique_ptr<Image> output_mask = absl::make_unique<Image>(mask_frame);
cv::Mat output_mat = formats::MatView(output_mask.get());
// Upsample small mask into output.
cv::resize(small_mask_mat, output_mat, cv::Size(output_width, output_height));
cc->Outputs().Tag(kMaskTag).Add(output_mask.release(), cc->InputTimestamp());
return absl::OkStatus();
}
template <class T>
absl::Status TensorsToSegmentationCalculator::ApplyActivation(
cv::Mat& tensor_mat, cv::Mat* small_mask_mat) {
// Configure activation function.
const int output_layer_index = options_.output_layer_index();
typedef mediapipe::TensorsToSegmentationCalculatorOptions Options;
const auto activation_fn = [&](const cv::Vec2f& mask_value) {
float new_mask_value = 0;
// TODO consider moving switch out of the loop,
// and also avoid float/Vec2f casting.
switch (options_.activation()) {
case Options::NONE: {
new_mask_value = mask_value[0];
break;
}
case Options::SIGMOID: {
const float pixel0 = mask_value[0];
new_mask_value = 1.0 / (std::exp(-pixel0) + 1.0);
break;
}
case Options::SOFTMAX: {
const float pixel0 = mask_value[0];
const float pixel1 = mask_value[1];
const float max_pixel = std::max(pixel0, pixel1);
const float min_pixel = std::min(pixel0, pixel1);
const float softmax_denom =
/*exp(max_pixel - max_pixel)=*/1.0f +
std::exp(min_pixel - max_pixel);
new_mask_value = std::exp(mask_value[output_layer_index] - max_pixel) /
softmax_denom;
break;
}
}
return new_mask_value;
};
// Process mask tensor.
for (int i = 0; i < tensor_mat.rows; ++i) {
for (int j = 0; j < tensor_mat.cols; ++j) {
const T& input_pix = tensor_mat.at<T>(i, j);
const float mask_value = activation_fn(input_pix);
small_mask_mat->at<float>(i, j) = mask_value;
}
}
return absl::OkStatus();
}
// Steps:
// 1. receive tensor
// 2. process segmentation tensor into small mask
// 3. upsample small mask into output mask to be same size as input image
absl::Status TensorsToSegmentationCalculator::ProcessGpu(
CalculatorContext* cc) {
#if !MEDIAPIPE_DISABLE_GPU
// Get input streams, and dimensions.
const auto& input_tensors =
cc->Inputs().Tag(kTensorsTag).Get<std::vector<Tensor>>();
ASSIGN_OR_RETURN(auto hwc, GetHwcFromDims(input_tensors[0].shape().dims));
auto [tensor_height, tensor_width, tensor_channels] = hwc;
int output_width = tensor_width, output_height = tensor_height;
if (cc->Inputs().HasTag(kOutputSizeTag)) {
const auto& size =
cc->Inputs().Tag(kOutputSizeTag).Get<std::pair<int, int>>();
output_width = size.first;
output_height = size.second;
}
// Create initial working mask texture.
#if MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
tflite::gpu::gl::GlTexture small_mask_texture;
#else
mediapipe::GlTexture small_mask_texture;
#endif // MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
// Run shader, process mask tensor.
#if MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
{
MP_RETURN_IF_ERROR(CreateReadWriteRgbaImageTexture(
tflite::gpu::DataType::UINT8, // GL_RGBA8
{tensor_width, tensor_height}, &small_mask_texture));
const int output_index = 0;
glBindImageTexture(output_index, small_mask_texture.id(), 0, GL_FALSE, 0,
GL_WRITE_ONLY, GL_RGBA8);
auto read_view = input_tensors[0].GetOpenGlBufferReadView();
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, read_view.name());
const tflite::gpu::uint3 workgroups = {
NumGroups(tensor_width, kWorkgroupSize),
NumGroups(tensor_height, kWorkgroupSize), 1};
glUseProgram(mask_program_31_->id());
glUniform2i(glGetUniformLocation(mask_program_31_->id(), "out_size"),
tensor_width, tensor_height);
MP_RETURN_IF_ERROR(mask_program_31_->Dispatch(workgroups));
}
#elif MEDIAPIPE_METAL_ENABLED
{
id<MTLCommandBuffer> command_buffer = [metal_helper_ commandBuffer];
command_buffer.label = @"SegmentationKernel";
id<MTLComputeCommandEncoder> command_encoder =
[command_buffer computeCommandEncoder];
[command_encoder setComputePipelineState:mask_program_];
auto read_view = input_tensors[0].GetMtlBufferReadView(command_buffer);
[command_encoder setBuffer:read_view.buffer() offset:0 atIndex:0];
mediapipe::GpuBuffer small_mask_buffer = [metal_helper_
mediapipeGpuBufferWithWidth:tensor_width
height:tensor_height
format:mediapipe::GpuBufferFormat::kBGRA32];
id<MTLTexture> small_mask_texture_metal =
[metal_helper_ metalTextureWithGpuBuffer:small_mask_buffer];
[command_encoder setTexture:small_mask_texture_metal atIndex:1];
unsigned int out_size[] = {static_cast<unsigned int>(tensor_width),
static_cast<unsigned int>(tensor_height)};
[command_encoder setBytes:&out_size length:sizeof(out_size) atIndex:2];
MTLSize threads_per_group = MTLSizeMake(kWorkgroupSize, kWorkgroupSize, 1);
MTLSize threadgroups =
MTLSizeMake(NumGroups(tensor_width, kWorkgroupSize),
NumGroups(tensor_height, kWorkgroupSize), 1);
[command_encoder dispatchThreadgroups:threadgroups
threadsPerThreadgroup:threads_per_group];
[command_encoder endEncoding];
[command_buffer commit];
small_mask_texture = gpu_helper_.CreateSourceTexture(small_mask_buffer);
}
#else
{
small_mask_texture = gpu_helper_.CreateDestinationTexture(
tensor_width, tensor_height,
mediapipe::GpuBufferFormat::kBGRA32); // actually GL_RGBA8
// Go through CPU if not already texture 2D (no direct conversion yet).
// Tensor::GetOpenGlTexture2dReadView() doesn't automatically convert types.
if (!input_tensors[0].ready_as_opengl_texture_2d()) {
(void)input_tensors[0].GetCpuReadView();
}
auto read_view = input_tensors[0].GetOpenGlTexture2dReadView();
gpu_helper_.BindFramebuffer(small_mask_texture);
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D, read_view.name());
glUseProgram(mask_program_20_);
GlRender();
glBindTexture(GL_TEXTURE_2D, 0);
glFlush();
}
#endif // MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
// Upsample small mask into output.
mediapipe::GlTexture output_texture = gpu_helper_.CreateDestinationTexture(
output_width, output_height,
mediapipe::GpuBufferFormat::kBGRA32); // actually GL_RGBA8
// Run shader, upsample result.
{
gpu_helper_.BindFramebuffer(output_texture);
glActiveTexture(GL_TEXTURE1);
#if MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
glBindTexture(GL_TEXTURE_2D, small_mask_texture.id());
#else
glBindTexture(GL_TEXTURE_2D, small_mask_texture.name());
#endif // MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
glUseProgram(upsample_program_);
GlRender();
glBindTexture(GL_TEXTURE_2D, 0);
glFlush();
}
// Send out image as GPU packet.
auto output_image = output_texture.GetFrame<Image>();
cc->Outputs().Tag(kMaskTag).Add(output_image.release(), cc->InputTimestamp());
// Cleanup
output_texture.Release();
#endif // !MEDIAPIPE_DISABLE_GPU
return absl::OkStatus();
}
void TensorsToSegmentationCalculator::GlRender() {
#if !MEDIAPIPE_DISABLE_GPU
static const GLfloat square_vertices[] = {
-1.0f, -1.0f, // bottom left
1.0f, -1.0f, // bottom right
-1.0f, 1.0f, // top left
1.0f, 1.0f, // top right
};
static const GLfloat texture_vertices[] = {
0.0f, 0.0f, // bottom left
1.0f, 0.0f, // bottom right
0.0f, 1.0f, // top left
1.0f, 1.0f, // top right
};
// vertex storage
GLuint vbo[2];
glGenBuffers(2, vbo);
GLuint vao;
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);
// vbo 0
glBindBuffer(GL_ARRAY_BUFFER, vbo[0]);
glBufferData(GL_ARRAY_BUFFER, 4 * 2 * sizeof(GLfloat), square_vertices,
GL_STATIC_DRAW);
glEnableVertexAttribArray(ATTRIB_VERTEX);
glVertexAttribPointer(ATTRIB_VERTEX, 2, GL_FLOAT, 0, 0, nullptr);
// vbo 1
glBindBuffer(GL_ARRAY_BUFFER, vbo[1]);
glBufferData(GL_ARRAY_BUFFER, 4 * 2 * sizeof(GLfloat), texture_vertices,
GL_STATIC_DRAW);
glEnableVertexAttribArray(ATTRIB_TEXTURE_POSITION);
glVertexAttribPointer(ATTRIB_TEXTURE_POSITION, 2, GL_FLOAT, 0, 0, nullptr);
// draw
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
// cleanup
glDisableVertexAttribArray(ATTRIB_VERTEX);
glDisableVertexAttribArray(ATTRIB_TEXTURE_POSITION);
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindVertexArray(0);
glDeleteVertexArrays(1, &vao);
glDeleteBuffers(2, vbo);
#endif // !MEDIAPIPE_DISABLE_GPU
}
absl::Status TensorsToSegmentationCalculator::LoadOptions(
CalculatorContext* cc) {
// Get calculator options specified in the graph.
options_ = cc->Options<::mediapipe::TensorsToSegmentationCalculatorOptions>();
return absl::OkStatus();
}
absl::Status TensorsToSegmentationCalculator::InitGpu(CalculatorContext* cc) {
#if !MEDIAPIPE_DISABLE_GPU
MP_RETURN_IF_ERROR(gpu_helper_.RunInGlContext([this]() -> absl::Status {
// A shader to process a segmentation tensor into an output mask.
// Currently uses 4 channels for output, and sets R+A channels as mask value.
#if MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
// GLES 3.1
const tflite::gpu::uint3 workgroup_size = {kWorkgroupSize, kWorkgroupSize,
1};
const std::string shader_header =
absl::StrCat(tflite::gpu::gl::GetShaderHeader(workgroup_size), R"(
precision highp float;
layout(rgba8, binding = 0) writeonly uniform highp image2D output_texture;
uniform ivec2 out_size;
)");
/* Shader defines will be inserted here. */
const std::string shader_src_main = R"(
layout(std430, binding = 2) readonly buffer B0 {
#ifdef TWO_CHANNEL_INPUT
vec2 elements[];
#else
float elements[];
#endif // TWO_CHANNEL_INPUT
} input_data; // data tensor
void main() {
int out_width = out_size.x;
int out_height = out_size.y;
ivec2 gid = ivec2(gl_GlobalInvocationID.xy);
if (gid.x >= out_width || gid.y >= out_height) { return; }
int linear_index = gid.y * out_width + gid.x;
#ifdef TWO_CHANNEL_INPUT
vec2 input_value = input_data.elements[linear_index];
#else
vec2 input_value = vec2(input_data.elements[linear_index], 0.0);
#endif // TWO_CHANNEL_INPUT
// Run activation function.
// One and only one of FN_SOFTMAX,FN_SIGMOID,FN_NONE will be defined.
#ifdef FN_SOFTMAX
// Only two channel input tensor is supported.
vec2 input_px = input_value.rg;
float shift = max(input_px.r, input_px.g);
float softmax_denom = exp(input_px.r - shift) + exp(input_px.g - shift);
float new_mask_value =
exp(input_px[OUTPUT_LAYER_INDEX] - shift) / softmax_denom;
#endif // FN_SOFTMAX
#ifdef FN_SIGMOID
float new_mask_value = 1.0 / (exp(-input_value.r) + 1.0);
#endif // FN_SIGMOID
#ifdef FN_NONE
float new_mask_value = input_value.r;
#endif // FN_NONE
#ifdef FLIP_Y_COORD
int y_coord = out_height - gid.y - 1;
#else
int y_coord = gid.y;
#endif // defined(FLIP_Y_COORD)
ivec2 output_coordinate = ivec2(gid.x, y_coord);
vec4 out_value = vec4(new_mask_value, 0.0, 0.0, new_mask_value);
imageStore(output_texture, output_coordinate, out_value);
})";
#elif MEDIAPIPE_METAL_ENABLED
// METAL
const std::string shader_header = R"(
#include <metal_stdlib>
using namespace metal;
)";
/* Shader defines will be inserted here. */
const std::string shader_src_main = R"(
kernel void segmentationKernel(
#ifdef TWO_CHANNEL_INPUT
device float2* elements [[ buffer(0) ]],
#else
device float* elements [[ buffer(0) ]],
#endif // TWO_CHANNEL_INPUT
texture2d<float, access::write> output_texture [[ texture(1) ]],
constant uint* out_size [[ buffer(2) ]],
uint2 gid [[ thread_position_in_grid ]])
{
uint out_width = out_size[0];
uint out_height = out_size[1];
if (gid.x >= out_width || gid.y >= out_height) { return; }
uint linear_index = gid.y * out_width + gid.x;
#ifdef TWO_CHANNEL_INPUT
float2 input_value = elements[linear_index];
#else
float2 input_value = float2(elements[linear_index], 0.0);
#endif // TWO_CHANNEL_INPUT
// Run activation function.
// One and only one of FN_SOFTMAX,FN_SIGMOID,FN_NONE will be defined.
#ifdef FN_SOFTMAX
// Only two channel input tensor is supported.
float2 input_px = input_value.xy;
float shift = max(input_px.x, input_px.y);
float softmax_denom = exp(input_px.r - shift) + exp(input_px.g - shift);
float new_mask_value =
exp(input_px[OUTPUT_LAYER_INDEX] - shift) / softmax_denom;
#endif // FN_SOFTMAX
#ifdef FN_SIGMOID
float new_mask_value = 1.0 / (exp(-input_value.x) + 1.0);
#endif // FN_SIGMOID
#ifdef FN_NONE
float new_mask_value = input_value.x;
#endif // FN_NONE
#ifdef FLIP_Y_COORD
int y_coord = out_height - gid.y - 1;
#else
int y_coord = gid.y;
#endif // defined(FLIP_Y_COORD)
uint2 output_coordinate = uint2(gid.x, y_coord);
float4 out_value = float4(new_mask_value, 0.0, 0.0, new_mask_value);
output_texture.write(out_value, output_coordinate);
}
)";
#else
// GLES 2.0
const std::string shader_header = absl::StrCat(
std::string(mediapipe::kMediaPipeFragmentShaderPreamble), R"(
DEFAULT_PRECISION(mediump, float)
)");
/* Shader defines will be inserted here. */
const std::string shader_src_main = R"(
in vec2 sample_coordinate;
uniform sampler2D input_texture;
#ifdef GL_ES
#define fragColor gl_FragColor
#else
out vec4 fragColor;
#endif // defined(GL_ES);
void main() {
vec4 input_value = texture2D(input_texture, sample_coordinate);
vec2 gid = sample_coordinate;
// Run activation function.
// One and only one of FN_SOFTMAX,FN_SIGMOID,FN_NONE will be defined.
#ifdef FN_SOFTMAX
// Only two channel input tensor is supported.
vec2 input_px = input_value.rg;
float shift = max(input_px.r, input_px.g);
float softmax_denom = exp(input_px.r - shift) + exp(input_px.g - shift);
float new_mask_value =
exp(mix(input_px.r, input_px.g, float(OUTPUT_LAYER_INDEX)) - shift) / softmax_denom;
#endif // FN_SOFTMAX
#ifdef FN_SIGMOID
float new_mask_value = 1.0 / (exp(-input_value.r) + 1.0);
#endif // FN_SIGMOID
#ifdef FN_NONE
float new_mask_value = input_value.r;
#endif // FN_NONE
#ifdef FLIP_Y_COORD
float y_coord = 1.0 - gid.y;
#else
float y_coord = gid.y;
#endif // defined(FLIP_Y_COORD)
vec2 output_coordinate = vec2(gid.x, y_coord);
vec4 out_value = vec4(new_mask_value, 0.0, 0.0, new_mask_value);
fragColor = out_value;
})";
#endif // MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
// Shader defines.
typedef mediapipe::TensorsToSegmentationCalculatorOptions Options;
const std::string output_layer_index =
"\n#define OUTPUT_LAYER_INDEX int(" +
std::to_string(options_.output_layer_index()) + ")";
const std::string flip_y_coord =
DoesGpuTextureStartAtBottom() ? "\n#define FLIP_Y_COORD" : "";
const std::string fn_none =
options_.activation() == Options::NONE ? "\n#define FN_NONE" : "";
const std::string fn_sigmoid =
options_.activation() == Options::SIGMOID ? "\n#define FN_SIGMOID" : "";
const std::string fn_softmax =
options_.activation() == Options::SOFTMAX ? "\n#define FN_SOFTMAX" : "";
const std::string two_channel = options_.activation() == Options::SOFTMAX
? "\n#define TWO_CHANNEL_INPUT"
: "";
const std::string shader_defines =
absl::StrCat(output_layer_index, flip_y_coord, fn_softmax, fn_sigmoid,
fn_none, two_channel);
// Build full shader.
const std::string shader_src_no_previous =
absl::StrCat(shader_header, shader_defines, shader_src_main);
// Vertex shader attributes.
const GLint attr_location[NUM_ATTRIBUTES] = {
ATTRIB_VERTEX,
ATTRIB_TEXTURE_POSITION,
};
const GLchar* attr_name[NUM_ATTRIBUTES] = {
"position",
"texture_coordinate",
};
// Main shader program & parameters
#if MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
GlShader shader_without_previous;
MP_RETURN_IF_ERROR(GlShader::CompileShader(
GL_COMPUTE_SHADER, shader_src_no_previous, &shader_without_previous));
mask_program_31_ = absl::make_unique<GlProgram>();
MP_RETURN_IF_ERROR(GlProgram::CreateWithShader(shader_without_previous,
mask_program_31_.get()));
#elif MEDIAPIPE_METAL_ENABLED
id<MTLDevice> device = metal_helper_.mtlDevice;
NSString* library_source =
[NSString stringWithUTF8String:shader_src_no_previous.c_str()];
NSError* error = nil;
id<MTLLibrary> library = [device newLibraryWithSource:library_source
options:nullptr
error:&error];
RET_CHECK(library != nil) << "Couldn't create shader library "
<< [[error localizedDescription] UTF8String];
id<MTLFunction> kernel_func = nil;
kernel_func = [library newFunctionWithName:@"segmentationKernel"];
RET_CHECK(kernel_func != nil) << "Couldn't create kernel function.";
mask_program_ =
[device newComputePipelineStateWithFunction:kernel_func error:&error];
RET_CHECK(mask_program_ != nil) << "Couldn't create pipeline state " <<
[[error localizedDescription] UTF8String];
#else
mediapipe::GlhCreateProgram(
mediapipe::kBasicVertexShader, shader_src_no_previous.c_str(),
NUM_ATTRIBUTES, &attr_name[0], attr_location, &mask_program_20_);
RET_CHECK(mask_program_20_) << "Problem initializing the program.";
glUseProgram(mask_program_20_);
glUniform1i(glGetUniformLocation(mask_program_20_, "input_texture"), 1);
#endif // MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
// Simple pass-through program, used for hardware upsampling.
mediapipe::GlhCreateProgram(
mediapipe::kBasicVertexShader, mediapipe::kBasicTexturedFragmentShader,
NUM_ATTRIBUTES, &attr_name[0], attr_location, &upsample_program_);
RET_CHECK(upsample_program_) << "Problem initializing the program.";
glUseProgram(upsample_program_);
glUniform1i(glGetUniformLocation(upsample_program_, "video_frame"), 1);
return absl::OkStatus();
}));
#endif // !MEDIAPIPE_DISABLE_GPU
return absl::OkStatus();
}
} // namespace mediapipe
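
The GLES 3.1 and Metal paths above launch `NumGroups(size, kWorkgroupSize)` workgroups per axis, and each thread bails out if its id falls outside the tensor. The sketch below emulates that indexing on the CPU. It assumes `NumGroups` is a ceiling division (its definition is not part of this diff); `NumGroupsSketch` and `DispatchSketch` are hypothetical names used only for illustration.

```cpp
#include <vector>

// Hypothetical stand-in for NumGroups: the number of workgroups of
// `group_size` threads needed to cover `size` elements.
// Assumption: ceiling division; the real helper is not shown in this diff.
inline int NumGroupsSketch(int size, int group_size) {
  return (size + group_size - 1) / group_size;
}

// Emulates the compute dispatch on the CPU. Every "thread" has a global id;
// threads outside the tensor are skipped, and the rest copy one element from
// the row-major input into the (optionally vertically flipped) output.
std::vector<float> DispatchSketch(const std::vector<float>& input, int width,
                                  int height, int group_size, bool flip_y) {
  std::vector<float> output(input.size(), 0.0f);
  const int threads_x = NumGroupsSketch(width, group_size) * group_size;
  const int threads_y = NumGroupsSketch(height, group_size) * group_size;
  for (int gy = 0; gy < threads_y; ++gy) {
    for (int gx = 0; gx < threads_x; ++gx) {
      if (gx >= width || gy >= height) continue;  // Same guard as the kernels.
      const int linear_index = gy * width + gx;
      const int y_coord = flip_y ? height - gy - 1 : gy;  // FLIP_Y_COORD case.
      output[y_coord * width + gx] = input[linear_index];
    }
  }
  return output;
}
```

Rounding the group count up means some threads land past the tensor edge whenever a dimension is not a multiple of the workgroup size, which is why the early return in the kernels is needed.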

View File

@ -0,0 +1,46 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
syntax = "proto2";
package mediapipe;
import "mediapipe/framework/calculator.proto";
import "mediapipe/gpu/gpu_origin.proto";
message TensorsToSegmentationCalculatorOptions {
extend mediapipe.CalculatorOptions {
optional TensorsToSegmentationCalculatorOptions ext = 374311106;
}
// For CONVENTIONAL mode in OpenGL, textures start at the bottom and need
// to be flipped vertically, as tensors are expected to start at the top.
// (DEFAULT or unset is interpreted as CONVENTIONAL.)
optional GpuOrigin.Mode gpu_origin = 1;
// Supported activation functions for filtering.
enum Activation {
NONE = 0; // Assumes 1-channel input tensor.
SIGMOID = 1; // Assumes 1-channel input tensor.
SOFTMAX = 2; // Assumes 2-channel input tensor.
}
// Activation function to apply to input tensor.
// Softmax requires a 2-channel tensor, see output_layer_index below.
optional Activation activation = 2 [default = NONE];
// Channel to use for processing tensor.
// Only applies when using activation=SOFTMAX.
// Works on two channel input tensor only.
optional int32 output_layer_index = 3 [default = 1];
}
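
To make the three `Activation` values concrete, here is a minimal standalone sketch of what each one computes per pixel, following the CPU and shader code in the calculator. The names `ActivationSketch`, `SigmoidSketch`, `Softmax2Sketch`, and `ApplyActivationSketch` are hypothetical and not part of MediaPipe.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical mirror of the proto enum, for illustration only.
enum class ActivationSketch { kNone, kSigmoid, kSoftmax };

// Logistic sigmoid of a single channel value.
inline float SigmoidSketch(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Two-channel softmax for the channel picked by output_layer_index (0 or 1).
// Subtracting the larger value first keeps every exponent <= 0, so exp()
// cannot overflow even for large logits.
inline float Softmax2Sketch(float c0, float c1, int output_layer_index) {
  const float shift = std::max(c0, c1);
  const float e0 = std::exp(c0 - shift);
  const float e1 = std::exp(c1 - shift);
  return (output_layer_index == 0 ? e0 : e1) / (e0 + e1);
}

// Per-pixel mask value. NONE and SIGMOID only read the first channel;
// SOFTMAX needs both channels.
inline float ApplyActivationSketch(ActivationSketch fn, float c0, float c1,
                                   int output_layer_index) {
  switch (fn) {
    case ActivationSketch::kNone:
      return c0;
    case ActivationSketch::kSigmoid:
      return SigmoidSketch(c0);
    case ActivationSketch::kSoftmax:
      return Softmax2Sketch(c0, c1, output_layer_index);
  }
  return c0;  // Unreachable for valid enum values.
}
```

With the default `output_layer_index = 1`, the softmax branch returns the probability of the second channel.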

View File

@ -859,6 +859,7 @@ cc_library(
"//mediapipe/framework:calculator_framework",
"//mediapipe/framework:timestamp",
"//mediapipe/framework/formats:landmark_cc_proto",
"//mediapipe/framework/formats:rect_cc_proto",
"//mediapipe/framework/port:ret_check",
"//mediapipe/util/filtering:one_euro_filter",
"//mediapipe/util/filtering:relative_velocity_filter",

View File

@ -323,7 +323,7 @@ absl::Status DetectionsToRectsCalculator::ComputeRotation(
DetectionSpec DetectionsToRectsCalculator::GetDetectionSpec(
const CalculatorContext* cc) {
absl::optional<std::pair<int, int>> image_size;
if (cc->Inputs().HasTag(kImageSizeTag)) {
if (HasTagValue(cc->Inputs(), kImageSizeTag)) {
image_size = cc->Inputs().Tag(kImageSizeTag).Get<std::pair<int, int>>();
}

View File

@ -157,6 +157,12 @@ TEST(DetectionsToRectsCalculatorTest, DetectionKeyPointsToRect) {
/*image_size=*/{640, 480});
MP_ASSERT_OK(status_or_value);
EXPECT_THAT(status_or_value.value(), RectEq(480, 360, 320, 240));
status_or_value = RunDetectionKeyPointsToRectCalculation(
/*detection=*/DetectionWithKeyPoints({{0.25f, 0.25f}, {0.75f, 0.75f}}),
/*image_size=*/{0, 0});
MP_ASSERT_OK(status_or_value);
EXPECT_THAT(status_or_value.value(), RectEq(0, 0, 0, 0));
}
TEST(DetectionsToRectsCalculatorTest, DetectionToNormalizedRect) {

View File

@ -18,6 +18,7 @@
#include "mediapipe/calculators/util/landmarks_smoothing_calculator.pb.h"
#include "mediapipe/framework/calculator_framework.h"
#include "mediapipe/framework/formats/landmark.pb.h"
#include "mediapipe/framework/formats/rect.pb.h"
#include "mediapipe/framework/port/ret_check.h"
#include "mediapipe/framework/timestamp.h"
#include "mediapipe/util/filtering/one_euro_filter.h"
@ -30,6 +31,7 @@ namespace {
constexpr char kNormalizedLandmarksTag[] = "NORM_LANDMARKS";
constexpr char kLandmarksTag[] = "LANDMARKS";
constexpr char kImageSizeTag[] = "IMAGE_SIZE";
constexpr char kObjectScaleRoiTag[] = "OBJECT_SCALE_ROI";
constexpr char kNormalizedFilteredLandmarksTag[] = "NORM_FILTERED_LANDMARKS";
constexpr char kFilteredLandmarksTag[] = "FILTERED_LANDMARKS";
@ -94,6 +96,18 @@ float GetObjectScale(const LandmarkList& landmarks) {
return (object_width + object_height) / 2.0f;
}
float GetObjectScale(const NormalizedRect& roi, const int image_width,
const int image_height) {
const float object_width = roi.width() * image_width;
const float object_height = roi.height() * image_height;
return (object_width + object_height) / 2.0f;
}
float GetObjectScale(const Rect& roi) {
return (roi.width() + roi.height()) / 2.0f;
}
// Abstract class for various landmarks filters.
class LandmarksFilter {
public:
@ -103,6 +117,7 @@ class LandmarksFilter {
virtual absl::Status Apply(const LandmarkList& in_landmarks,
const absl::Duration& timestamp,
const absl::optional<float> object_scale_opt,
LandmarkList* out_landmarks) = 0;
};
@ -111,6 +126,7 @@ class NoFilter : public LandmarksFilter {
public:
absl::Status Apply(const LandmarkList& in_landmarks,
const absl::Duration& timestamp,
const absl::optional<float> object_scale_opt,
LandmarkList* out_landmarks) override {
*out_landmarks = in_landmarks;
return absl::OkStatus();
@ -136,13 +152,15 @@ class VelocityFilter : public LandmarksFilter {
absl::Status Apply(const LandmarkList& in_landmarks,
const absl::Duration& timestamp,
const absl::optional<float> object_scale_opt,
LandmarkList* out_landmarks) override {
// Get value scale as inverse value of the object scale.
// If the object scale is too small, smoothing is disabled and landmarks are
// returned as is.
float value_scale = 1.0f;
if (!disable_value_scaling_) {
const float object_scale = GetObjectScale(in_landmarks);
const float object_scale =
object_scale_opt ? *object_scale_opt : GetObjectScale(in_landmarks);
if (object_scale < min_allowed_object_scale_) {
*out_landmarks = in_landmarks;
return absl::OkStatus();
@ -205,12 +223,14 @@ class VelocityFilter : public LandmarksFilter {
class OneEuroFilterImpl : public LandmarksFilter {
public:
OneEuroFilterImpl(double frequency, double min_cutoff, double beta,
double derivate_cutoff, float min_allowed_object_scale)
double derivate_cutoff, float min_allowed_object_scale,
bool disable_value_scaling)
: frequency_(frequency),
min_cutoff_(min_cutoff),
beta_(beta),
derivate_cutoff_(derivate_cutoff),
min_allowed_object_scale_(min_allowed_object_scale) {}
min_allowed_object_scale_(min_allowed_object_scale),
disable_value_scaling_(disable_value_scaling) {}
absl::Status Reset() override {
x_filters_.clear();
@ -221,16 +241,24 @@ class OneEuroFilterImpl : public LandmarksFilter {
absl::Status Apply(const LandmarkList& in_landmarks,
const absl::Duration& timestamp,
const absl::optional<float> object_scale_opt,
LandmarkList* out_landmarks) override {
// Initialize filters once.
MP_RETURN_IF_ERROR(InitializeFiltersIfEmpty(in_landmarks.landmark_size()));
const float object_scale = GetObjectScale(in_landmarks);
// Get value scale as inverse value of the object scale.
// If the object scale is too small, smoothing is disabled and landmarks are
// returned as is.
float value_scale = 1.0f;
if (!disable_value_scaling_) {
const float object_scale =
object_scale_opt ? *object_scale_opt : GetObjectScale(in_landmarks);
if (object_scale < min_allowed_object_scale_) {
*out_landmarks = in_landmarks;
return absl::OkStatus();
}
const float value_scale = 1.0f / object_scale;
value_scale = 1.0f / object_scale;
}
// Filter landmarks. Every axis of every landmark is filtered separately.
for (int i = 0; i < in_landmarks.landmark_size(); ++i) {
@ -277,6 +305,7 @@ class OneEuroFilterImpl : public LandmarksFilter {
double beta_;
double derivate_cutoff_;
double min_allowed_object_scale_;
bool disable_value_scaling_;
std::vector<OneEuroFilter> x_filters_;
std::vector<OneEuroFilter> y_filters_;
@ -292,6 +321,10 @@ class OneEuroFilterImpl : public LandmarksFilter {
// IMAGE_SIZE: A std::pair<int, int> representation of image width and height.
// Required to perform all computations in absolute coordinates to avoid any
// influence of normalized values.
// OBJECT_SCALE_ROI (optional): A NormalizedRect or Rect (depending on the
// format of input landmarks) used to determine the object scale for some of
// the filters. If not provided, object scale is calculated from the
// landmarks.
//
// Outputs:
// NORM_FILTERED_LANDMARKS: A NormalizedLandmarkList of smoothed landmarks.
@ -301,6 +334,7 @@ class OneEuroFilterImpl : public LandmarksFilter {
// calculator: "LandmarksSmoothingCalculator"
// input_stream: "NORM_LANDMARKS:pose_landmarks"
// input_stream: "IMAGE_SIZE:image_size"
// input_stream: "OBJECT_SCALE_ROI:roi"
// output_stream: "NORM_FILTERED_LANDMARKS:pose_landmarks_filtered"
// options: {
// [mediapipe.LandmarksSmoothingCalculatorOptions.ext] {
@ -330,9 +364,17 @@ absl::Status LandmarksSmoothingCalculator::GetContract(CalculatorContract* cc) {
cc->Outputs()
.Tag(kNormalizedFilteredLandmarksTag)
.Set<NormalizedLandmarkList>();
if (cc->Inputs().HasTag(kObjectScaleRoiTag)) {
cc->Inputs().Tag(kObjectScaleRoiTag).Set<NormalizedRect>();
}
} else {
cc->Inputs().Tag(kLandmarksTag).Set<LandmarkList>();
cc->Outputs().Tag(kFilteredLandmarksTag).Set<LandmarkList>();
if (cc->Inputs().HasTag(kObjectScaleRoiTag)) {
cc->Inputs().Tag(kObjectScaleRoiTag).Set<Rect>();
}
}
return absl::OkStatus();
@ -357,7 +399,8 @@ absl::Status LandmarksSmoothingCalculator::Open(CalculatorContext* cc) {
options.one_euro_filter().min_cutoff(),
options.one_euro_filter().beta(),
options.one_euro_filter().derivate_cutoff(),
options.one_euro_filter().min_allowed_object_scale());
options.one_euro_filter().min_allowed_object_scale(),
options.one_euro_filter().disable_value_scaling());
} else {
RET_CHECK_FAIL()
<< "Landmarks filter is either not specified or not supported";
@ -389,13 +432,20 @@ absl::Status LandmarksSmoothingCalculator::Process(CalculatorContext* cc) {
std::tie(image_width, image_height) =
cc->Inputs().Tag(kImageSizeTag).Get<std::pair<int, int>>();
absl::optional<float> object_scale;
if (cc->Inputs().HasTag(kObjectScaleRoiTag) &&
!cc->Inputs().Tag(kObjectScaleRoiTag).IsEmpty()) {
auto& roi = cc->Inputs().Tag(kObjectScaleRoiTag).Get<NormalizedRect>();
object_scale = GetObjectScale(roi, image_width, image_height);
}
auto in_landmarks = absl::make_unique<LandmarkList>();
NormalizedLandmarksToLandmarks(in_norm_landmarks, image_width, image_height,
in_landmarks.get());
auto out_landmarks = absl::make_unique<LandmarkList>();
MP_RETURN_IF_ERROR(landmarks_filter_->Apply(*in_landmarks, timestamp,
out_landmarks.get()));
MP_RETURN_IF_ERROR(landmarks_filter_->Apply(
*in_landmarks, timestamp, object_scale, out_landmarks.get()));
auto out_norm_landmarks = absl::make_unique<NormalizedLandmarkList>();
LandmarksToNormalizedLandmarks(*out_landmarks, image_width, image_height,
@ -408,9 +458,16 @@ absl::Status LandmarksSmoothingCalculator::Process(CalculatorContext* cc) {
const auto& in_landmarks =
cc->Inputs().Tag(kLandmarksTag).Get<LandmarkList>();
absl::optional<float> object_scale;
if (cc->Inputs().HasTag(kObjectScaleRoiTag) &&
!cc->Inputs().Tag(kObjectScaleRoiTag).IsEmpty()) {
auto& roi = cc->Inputs().Tag(kObjectScaleRoiTag).Get<Rect>();
object_scale = GetObjectScale(roi);
}
auto out_landmarks = absl::make_unique<LandmarkList>();
MP_RETURN_IF_ERROR(
landmarks_filter_->Apply(in_landmarks, timestamp, out_landmarks.get()));
MP_RETURN_IF_ERROR(landmarks_filter_->Apply(
in_landmarks, timestamp, object_scale, out_landmarks.get()));
cc->Outputs()
.Tag(kFilteredLandmarksTag)

View File

@ -41,9 +41,9 @@ message LandmarksSmoothingCalculatorOptions {
optional float min_allowed_object_scale = 3 [default = 1e-6];
// Disable value scaling based on object size and use `1.0` instead.
// Value scale is calculated as inverse value of object size. Object size is
// calculated as maximum side of rectangular bounding box of the object in
// XY plane.
// If not disabled, value scale is calculated as the inverse of object size.
// Object size is calculated as the average of the width and height of the
// rectangular bounding box of the object in the XY plane.
optional bool disable_value_scaling = 4 [default = false];
}
@ -72,6 +72,12 @@ message LandmarksSmoothingCalculatorOptions {
// If calculated object scale is less than given value smoothing will be
// disabled and landmarks will be returned as is.
optional float min_allowed_object_scale = 5 [default = 1e-6];
// Disable value scaling based on object size and use `1.0` instead.
// If not disabled, value scale is calculated as the inverse of object size.
// Object size is calculated as the average of the width and height of the
// rectangular bounding box of the object in the XY plane.
optional bool disable_value_scaling = 6 [default = false];
}
oneof filter_options {

View File

@ -40,7 +40,7 @@ constexpr char kRectTag[] = "NORM_RECT";
// Input:
// LANDMARKS: A LandmarkList representing world landmarks in the rectangle.
// NORM_RECT: A NormalizedRect representing a normalized rectangle in image
// coordinates.
// coordinates. (Optional)
//
// Output:
// LANDMARKS: A LandmarkList representing world landmarks projected (rotated
@ -59,7 +59,9 @@ class WorldLandmarkProjectionCalculator : public CalculatorBase {
public:
static absl::Status GetContract(CalculatorContract* cc) {
cc->Inputs().Tag(kLandmarksTag).Set<LandmarkList>();
if (cc->Inputs().HasTag(kRectTag)) {
cc->Inputs().Tag(kRectTag).Set<NormalizedRect>();
}
cc->Outputs().Tag(kLandmarksTag).Set<LandmarkList>();
return absl::OkStatus();
@ -74,13 +76,24 @@ class WorldLandmarkProjectionCalculator : public CalculatorBase {
absl::Status Process(CalculatorContext* cc) override {
// Check that landmarks and rect are not empty.
if (cc->Inputs().Tag(kLandmarksTag).IsEmpty() ||
cc->Inputs().Tag(kRectTag).IsEmpty()) {
(cc->Inputs().HasTag(kRectTag) &&
cc->Inputs().Tag(kRectTag).IsEmpty())) {
return absl::OkStatus();
}
const auto& in_landmarks =
cc->Inputs().Tag(kLandmarksTag).Get<LandmarkList>();
std::function<void(const Landmark&, Landmark*)> rotate_fn;
if (cc->Inputs().HasTag(kRectTag)) {
const auto& in_rect = cc->Inputs().Tag(kRectTag).Get<NormalizedRect>();
const float cosa = std::cos(in_rect.rotation());
const float sina = std::sin(in_rect.rotation());
rotate_fn = [cosa, sina](const Landmark& in_landmark,
Landmark* out_landmark) {
out_landmark->set_x(cosa * in_landmark.x() - sina * in_landmark.y());
out_landmark->set_y(sina * in_landmark.x() + cosa * in_landmark.y());
};
}
auto out_landmarks = absl::make_unique<LandmarkList>();
for (int i = 0; i < in_landmarks.landmark_size(); ++i) {
@ -89,11 +102,9 @@ class WorldLandmarkProjectionCalculator : public CalculatorBase {
Landmark* out_landmark = out_landmarks->add_landmark();
*out_landmark = in_landmark;
const float angle = in_rect.rotation();
out_landmark->set_x(std::cos(angle) * in_landmark.x() -
std::sin(angle) * in_landmark.y());
out_landmark->set_y(std::sin(angle) * in_landmark.x() +
std::cos(angle) * in_landmark.y());
if (rotate_fn) {
rotate_fn(in_landmark, out_landmark);
}
}
cc->Outputs()
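
The change above makes the rectangle optional: when it is present, each landmark is rotated in the XY plane by the rectangle's rotation angle; when it is absent, landmarks are copied through unchanged. A minimal standalone sketch of that pattern follows, using C++17 `std::optional` for the missing rectangle. `PointSketch` and `ProjectSketch` are hypothetical names used only for illustration.

```cpp
#include <cmath>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-in for a world landmark.
struct PointSketch {
  float x;
  float y;
  float z;
};

// Rotates every point around the Z axis by `rotation_radians` if it is set;
// otherwise returns the points unchanged. Z is never modified.
std::vector<PointSketch> ProjectSketch(
    const std::vector<PointSketch>& in,
    std::optional<float> rotation_radians) {
  std::function<void(const PointSketch&, PointSketch*)> rotate_fn;
  if (rotation_radians) {
    // Compute sin/cos once instead of once per landmark.
    const float cosa = std::cos(*rotation_radians);
    const float sina = std::sin(*rotation_radians);
    rotate_fn = [cosa, sina](const PointSketch& p, PointSketch* out) {
      out->x = cosa * p.x - sina * p.y;
      out->y = sina * p.x + cosa * p.y;
    };
  }
  std::vector<PointSketch> out;
  out.reserve(in.size());
  for (const PointSketch& p : in) {
    PointSketch q = p;  // Start from a copy so untouched fields carry over.
    if (rotate_fn) rotate_fn(p, &q);
    out.push_back(q);
  }
  return out;
}
```

Hoisting the trigonometry out of the loop is the main difference from the removed code, which called `std::cos` and `std::sin` twice each for every landmark.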

View File

@ -0,0 +1,60 @@
# Copyright 2021 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
licenses(["notice"])
package(default_visibility = ["//visibility:private"])
cc_binary(
name = "libmediapipe_jni.so",
linkshared = 1,
linkstatic = 1,
deps = [
"//mediapipe/graphs/selfie_segmentation:selfie_segmentation_gpu_deps",
"//mediapipe/java/com/google/mediapipe/framework/jni:mediapipe_framework_jni",
],
)
cc_library(
name = "mediapipe_jni_lib",
srcs = [":libmediapipe_jni.so"],
alwayslink = 1,
)
android_binary(
name = "selfiesegmentationgpu",
srcs = glob(["*.java"]),
assets = [
"//mediapipe/graphs/selfie_segmentation:selfie_segmentation_gpu.binarypb",
"//mediapipe/modules/selfie_segmentation:selfie_segmentation.tflite",
],
assets_dir = "",
manifest = "//mediapipe/examples/android/src/java/com/google/mediapipe/apps/basic:AndroidManifest.xml",
manifest_values = {
"applicationId": "com.google.mediapipe.apps.selfiesegmentationgpu",
"appName": "Selfie Segmentation",
"mainActivity": "com.google.mediapipe.apps.basic.MainActivity",
"cameraFacingFront": "True",
"binaryGraphName": "selfie_segmentation_gpu.binarypb",
"inputVideoStreamName": "input_video",
"outputVideoStreamName": "output_video",
"flipFramesVertically": "True",
"converterNumBuffers": "2",
},
multidex = "native",
deps = [
":mediapipe_jni_lib",
"//mediapipe/examples/android/src/java/com/google/mediapipe/apps/basic:basic_lib",
],
)


@ -49,7 +49,23 @@ message FaceBoxAdjusterCalculatorOptions {
optional float ipd_face_box_height_ratio = 7 [default = 0.3131];
// The max look up angle before considering the eye distance unstable.
optional float max_head_tilt_angle_deg = 8 [default = 12.0];
optional float max_head_tilt_angle_deg = 8 [default = 5.0];
// The min look up angle (i.e. looking down) before considering the eye
// distance unstable.
optional float min_head_tilt_angle_deg = 10 [default = -18.0];
// The max look right angle before considering the eye distance unstable.
optional float max_head_pan_angle_deg = 11 [default = 25.0];
// The min look right angle (i.e. looking left) before considering the eye
// distance unstable.
optional float min_head_pan_angle_deg = 12 [default = -25.0];
// Update rate for motion history, valid values [0.0, 1.0].
optional float motion_history_alpha = 13 [default = 0.5];
// Max value of head motion (max of current or history) to be considered still
// stable.
optional float head_motion_threshold = 14 [default = 10.0];
// The max amount of time to use an old eye distance when the face look angle
// is unstable.
optional int32 max_facesize_history_us = 9 [default = 8000000];
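
One way the new thresholds could combine is sketched below in Python. This is an illustration under stated assumptions, not the calculator's implementation. The exponential-moving-average update for `motion_history_alpha` and both function names are assumptions; only the field names and defaults come from the hunk above (using 5.0, the second of the two `max_head_tilt_angle_deg` values shown).

```python
from dataclasses import dataclass


@dataclass
class FaceBoxAdjusterOptions:
    """Defaults copied from the option fields in the hunk."""
    max_head_tilt_angle_deg: float = 5.0
    min_head_tilt_angle_deg: float = -18.0
    max_head_pan_angle_deg: float = 25.0
    min_head_pan_angle_deg: float = -25.0
    motion_history_alpha: float = 0.5
    head_motion_threshold: float = 10.0


def update_motion_history(history: float, current: float, alpha: float) -> float:
    # ASSUMPTION: the history is an exponential moving average with rate alpha.
    return alpha * current + (1.0 - alpha) * history


def is_eye_distance_stable(opts: FaceBoxAdjusterOptions, tilt_deg: float,
                           pan_deg: float, motion: float,
                           motion_history: float) -> bool:
    """Hypothetical check: angles inside their ranges and motion under the cap."""
    tilt_ok = opts.min_head_tilt_angle_deg <= tilt_deg <= opts.max_head_tilt_angle_deg
    pan_ok = opts.min_head_pan_angle_deg <= pan_deg <= opts.max_head_pan_angle_deg
    # "max of current or history", as the option comment puts it.
    motion_ok = max(motion, motion_history) <= opts.head_motion_threshold
    return tilt_ok and pan_ok and motion_ok
```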


@ -14,8 +14,8 @@ node: {
output_stream: "LETTERBOX_PADDING:letterbox_padding"
options: {
[mediapipe.ImageTransformationCalculatorOptions.ext] {
output_width: 256
output_height: 256
output_width: 192
output_height: 192
scale_mode: FIT
}
}
@ -50,19 +50,17 @@ node {
output_side_packet: "anchors"
options: {
[mediapipe.SsdAnchorsCalculatorOptions.ext] {
num_layers: 4
min_scale: 0.15625
num_layers: 1
min_scale: 0.1484375
max_scale: 0.75
input_size_height: 256
input_size_width: 256
input_size_height: 192
input_size_width: 192
anchor_offset_x: 0.5
anchor_offset_y: 0.5
strides: 16
strides: 32
strides: 32
strides: 32
strides: 4
aspect_ratios: 1.0
fixed_anchor_size: true
interpolated_scale_aspect_ratio: 0.0
}
}
}
@ -78,7 +76,7 @@ node {
options: {
[mediapipe.TfLiteTensorsToDetectionsCalculatorOptions.ext] {
num_classes: 1
num_boxes: 896
num_boxes: 2304
num_coords: 16
box_coord_offset: 0
keypoint_coord_offset: 4
@ -87,11 +85,11 @@ node {
sigmoid_score: true
score_clipping_thresh: 100.0
reverse_output_order: true
x_scale: 256.0
y_scale: 256.0
h_scale: 256.0
w_scale: 256.0
min_score_thresh: 0.65
x_scale: 192.0
y_scale: 192.0
h_scale: 192.0
w_scale: 192.0
min_score_thresh: 0.6
}
}
}
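
The change of `num_boxes` from 896 to 2304 follows from the anchor settings in the same graph. The Python sketch below reproduces the arithmetic. It assumes that each layer contributes one anchor per aspect ratio per feature-map cell, plus one extra anchor when `interpolated_scale_aspect_ratio` is positive, and that the removed configuration relied on a default of 1.0 for that field. The function name `count_anchors` is hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def count_anchors(input_size: int, strides: Sequence[int],
                  num_aspect_ratios: int = 1,
                  interpolated_scale_aspect_ratio: float = 1.0) -> int:
    """Counts SSD anchors for a square input.

    Layers that share a stride share one feature map, so their anchors
    stack on the same cells. Counter groups all equal strides, which matches
    grouping consecutive equal strides for the configs in this diff.
    """
    per_layer = num_aspect_ratios
    if interpolated_scale_aspect_ratio > 0.0:
        per_layer += 1  # the extra interpolated-scale anchor
    total = 0
    for stride, layers in Counter(strides).items():
        cells = math.ceil(input_size / stride)
        total += cells * cells * per_layer * layers
    return total
```

For the old settings (256 input, strides 16, 32, 32, 32) this gives 16·16·2 + 8·8·6 = 896. For the new settings (192 input, a single stride of 4, `interpolated_scale_aspect_ratio: 0.0`) it gives 48·48·1 = 2304.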


@ -0,0 +1,34 @@
# Copyright 2021 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
licenses(["notice"])
package(default_visibility = ["//mediapipe/examples:__subpackages__"])
cc_binary(
name = "selfie_segmentation_cpu",
deps = [
"//mediapipe/examples/desktop:demo_run_graph_main",
"//mediapipe/graphs/selfie_segmentation:selfie_segmentation_cpu_deps",
],
)
# Linux only
cc_binary(
name = "selfie_segmentation_gpu",
deps = [
"//mediapipe/examples/desktop:demo_run_graph_main_gpu",
"//mediapipe/graphs/selfie_segmentation:selfie_segmentation_gpu_deps",
],
)


@ -0,0 +1,69 @@
# Copyright 2021 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
load(
"@build_bazel_rules_apple//apple:ios.bzl",
"ios_application",
)
load(
"//mediapipe/examples/ios:bundle_id.bzl",
"BUNDLE_ID_PREFIX",
"example_provisioning",
)
licenses(["notice"])
MIN_IOS_VERSION = "10.0"
alias(
name = "selfiesegmentationgpu",
actual = "SelfieSegmentationGpuApp",
)
ios_application(
name = "SelfieSegmentationGpuApp",
app_icons = ["//mediapipe/examples/ios/common:AppIcon"],
bundle_id = BUNDLE_ID_PREFIX + ".SelfieSegmentationGpu",
families = [
"iphone",
"ipad",
],
infoplists = [
"//mediapipe/examples/ios/common:Info.plist",
"Info.plist",
],
minimum_os_version = MIN_IOS_VERSION,
provisioning_profile = example_provisioning(),
deps = [
":SelfieSegmentationGpuAppLibrary",
"@ios_opencv//:OpencvFramework",
],
)
objc_library(
name = "SelfieSegmentationGpuAppLibrary",
data = [
"//mediapipe/graphs/selfie_segmentation:selfie_segmentation_gpu.binarypb",
"//mediapipe/modules/selfie_segmentation:selfie_segmentation.tflite",
],
deps = [
"//mediapipe/examples/ios/common:CommonMediaPipeAppLibrary",
] + select({
"//mediapipe:ios_i386": [],
"//mediapipe:ios_x86_64": [],
"//conditions:default": [
"//mediapipe/graphs/selfie_segmentation:selfie_segmentation_gpu_deps",
],
}),
)


@ -0,0 +1,14 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>CameraPosition</key>
<string>front</string>
<key>GraphOutputStream</key>
<string>output_video</string>
<key>GraphInputStream</key>
<string>input_video</string>
<key>GraphName</key>
<string>selfie_segmentation_gpu</string>
</dict>
</plist>


@ -225,7 +225,7 @@ cc_library(
"//mediapipe/framework:stream_handler_cc_proto",
"//mediapipe/framework/port:any_proto",
"//mediapipe/framework/port:status",
"//mediapipe/framework/tool:options_util",
"//mediapipe/framework/tool:options_map",
"//mediapipe/framework/tool:tag_map",
"@com_google_absl//absl/memory",
],
@ -473,7 +473,7 @@ cc_library(
"//mediapipe/framework:calculator_cc_proto",
"//mediapipe/framework/port:any_proto",
"//mediapipe/framework/port:logging",
"//mediapipe/framework/tool:options_util",
"//mediapipe/framework/tool:options_map",
"@com_google_absl//absl/base:core_headers",
"@com_google_absl//absl/strings",
],


@ -1,4 +1,4 @@
# Experimental new APIs
# New MediaPipe APIs
This directory defines new APIs for MediaPipe:
@ -6,13 +6,12 @@ This directory defines new APIs for MediaPipe:
- Builder API, for assembling CalculatorGraphConfigs with C++, as an alternative
to using the proto API directly.
The code is working, and the new APIs interoperate fully with the existing
framework code. They are considered a work in progress, but are being released
now so we can begin adopting them in our calculators.
The new APIs interoperate fully with the existing framework code, and we are
adopting them in our calculators. We are still making improvements, and the
placement of this code under the `mediapipe::api2` namespace is not final.
Developers are welcome to try out these APIs as early adopters, but should
expect breaking changes. The placement of this code under the `mediapipe::api2`
namespace is not final.
Developers are welcome to try out these APIs as early adopters, but there may be
breaking changes.
## Node API


@ -29,7 +29,7 @@
#include "mediapipe/framework/port.h"
#include "mediapipe/framework/port/any_proto.h"
#include "mediapipe/framework/status_handler.pb.h"
#include "mediapipe/framework/tool/options_util.h"
#include "mediapipe/framework/tool/options_map.h"
namespace mediapipe {


@ -32,7 +32,7 @@
#include "mediapipe/framework/packet_set.h"
#include "mediapipe/framework/port.h"
#include "mediapipe/framework/port/any_proto.h"
#include "mediapipe/framework/tool/options_util.h"
#include "mediapipe/framework/tool/options_map.h"
namespace mediapipe {


@ -154,12 +154,25 @@ cc_test(
],
)
cc_library(
name = "options_map",
hdrs = ["options_map.h"],
visibility = ["//mediapipe/framework:mediapipe_internal"],
deps = [
"//mediapipe/framework:calculator_cc_proto",
"//mediapipe/framework/port:any_proto",
"//mediapipe/framework/port:status",
"//mediapipe/framework/tool:type_util",
],
)
cc_library(
name = "options_util",
srcs = ["options_util.cc"],
hdrs = ["options_util.h"],
visibility = ["//mediapipe/framework:mediapipe_internal"],
deps = [
":options_map",
":proto_util_lite",
"//mediapipe/framework:calculator_cc_proto",
"//mediapipe/framework:collection",
@ -199,17 +212,6 @@ mediapipe_cc_test(
],
)
cc_library(
name = "packet_util",
hdrs = ["packet_util.h"],
visibility = ["//visibility:public"],
deps = [
"//mediapipe/framework:packet",
"//mediapipe/framework/port:statusor",
"@org_tensorflow//tensorflow/core:protos_all_cc",
],
)
cc_library(
name = "proto_util_lite",
srcs = ["proto_util_lite.cc"],
@ -681,6 +683,7 @@ cc_library(
"//mediapipe/framework/port:logging",
"//mediapipe/framework/port:ret_check",
"//mediapipe/framework/port:status",
"//mediapipe/framework/stream_handler:immediate_input_stream_handler",
"//mediapipe/framework/tool:switch_container_cc_proto",
"@com_google_absl//absl/strings",
],
@ -706,6 +709,7 @@ cc_library(
"//mediapipe/framework/port:logging",
"//mediapipe/framework/port:ret_check",
"//mediapipe/framework/port:status",
"//mediapipe/framework/stream_handler:immediate_input_stream_handler",
"//mediapipe/framework/tool:switch_container_cc_proto",
"@com_google_absl//absl/strings",
],


@ -0,0 +1,107 @@
#ifndef MEDIAPIPE_FRAMEWORK_TOOL_OPTIONS_MAP_H_
#define MEDIAPIPE_FRAMEWORK_TOOL_OPTIONS_MAP_H_
#include <map>
#include <memory>
#include <type_traits>
#include "mediapipe/framework/calculator.pb.h"
#include "mediapipe/framework/port/any_proto.h"
#include "mediapipe/framework/port/status.h"
#include "mediapipe/framework/tool/type_util.h"
namespace mediapipe {
namespace tool {
// A compile-time detector for the constant |T::ext|.
template <typename T>
struct IsExtension {
private:
template <typename U>
static char test(decltype(&U::ext));
template <typename>
static int test(...);
public:
static constexpr bool value = (sizeof(test<T>(0)) == sizeof(char));
};
template <class T,
typename std::enable_if<IsExtension<T>::value, int>::type = 0>
void GetExtension(const CalculatorOptions& options, T* result) {
if (options.HasExtension(T::ext)) {
*result = options.GetExtension(T::ext);
}
}
template <class T,
typename std::enable_if<!IsExtension<T>::value, int>::type = 0>
void GetExtension(const CalculatorOptions& options, T* result) {}
template <class T>
void GetNodeOptions(const CalculatorGraphConfig::Node& node_config, T* result) {
#if defined(MEDIAPIPE_PROTO_LITE) && defined(MEDIAPIPE_PROTO_THIRD_PARTY)
// protobuf::Any is unavailable with third_party/protobuf:protobuf-lite.
#else
for (const mediapipe::protobuf::Any& options : node_config.node_options()) {
if (options.Is<T>()) {
options.UnpackTo(result);
}
}
#endif
}
// A map from object type to object.
class TypeMap {
public:
template <class T>
bool Has() const {
return content_.count(TypeId<T>()) > 0;
}
template <class T>
T* Get() const {
if (!Has<T>()) {
content_[TypeId<T>()] = std::make_shared<T>();
}
return static_cast<T*>(content_[TypeId<T>()].get());
}
private:
mutable std::map<TypeIndex, std::shared_ptr<void>> content_;
};
// Extracts the options message of a specified type from a
// CalculatorGraphConfig::Node.
class OptionsMap {
public:
OptionsMap& Initialize(const CalculatorGraphConfig::Node& node_config) {
node_config_ = &node_config;
return *this;
}
// Returns the options data for a CalculatorGraphConfig::Node, from
// either "options" or "node_options" using either GetExtension or UnpackTo.
template <class T>
const T& Get() const {
if (options_.Has<T>()) {
return *options_.Get<T>();
}
T* result = options_.Get<T>();
if (node_config_->has_options()) {
GetExtension(node_config_->options(), result);
} else {
GetNodeOptions(*node_config_, result);
}
return *result;
}
const CalculatorGraphConfig::Node* node_config_;
TypeMap options_;
};
} // namespace tool
} // namespace mediapipe
#endif // MEDIAPIPE_FRAMEWORK_TOOL_OPTIONS_MAP_H_
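
The lookup order in `OptionsMap::Get` can be modeled in a few lines of Python. This is a rough analogue rather than a port. A dict keyed by type stands in for proto extensions on `options`, a list of objects stands in for the `Any` messages in `node_options`, and `RecolorOptions` is a hypothetical options type used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Type, TypeVar

T = TypeVar("T")


class TypeMap:
    """Maps a type to a lazily default-constructed instance of that type."""

    def __init__(self) -> None:
        self._content: Dict[type, Any] = {}

    def has(self, cls: type) -> bool:
        return cls in self._content

    def get(self, cls: Type[T]) -> T:
        if cls not in self._content:
            self._content[cls] = cls()
        return self._content[cls]


class OptionsMap:
    """Resolves options once per type and serves the cached object afterwards."""

    def __init__(self, options: Optional[Dict[type, Any]],
                 node_options: List[Any]) -> None:
        self._options = options            # stand-in for node.options()
        self._node_options = node_options  # stand-in for node.node_options()
        self._cache = TypeMap()

    def get(self, cls: Type[T]) -> T:
        if self._cache.has(cls):
            return self._cache.get(cls)
        result = self._cache.get(cls)  # creates the default instance
        if self._options is not None:
            # Like has_options(): node_options is not consulted on this branch.
            source = self._options.get(cls)
            if source is not None:
                vars(result).update(vars(source))
        else:
            for candidate in self._node_options:
                if isinstance(candidate, cls):
                    vars(result).update(vars(candidate))
        return result


@dataclass
class RecolorOptions:  # hypothetical options type
    invert_mask: bool = False
```

Note that when `options` is present but does not carry the requested type, the sketch returns the default instance without looking at `node_options`, which matches the `if`/`else` in the header.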


@ -20,6 +20,7 @@
#include "mediapipe/framework/packet.h"
#include "mediapipe/framework/packet_set.h"
#include "mediapipe/framework/port/any_proto.h"
#include "mediapipe/framework/tool/options_map.h"
#include "mediapipe/framework/tool/type_util.h"
namespace mediapipe {
@ -34,64 +35,6 @@ inline T MergeOptions(const T& base, const T& options) {
return result;
}
// A compile-time detector for the constant |T::ext|.
template <typename T>
struct IsExtension {
private:
template <typename U>
static char test(decltype(&U::ext));
template <typename>
static int test(...);
public:
static constexpr bool value = (sizeof(test<T>(0)) == sizeof(char));
};
// A map from object type to object.
class TypeMap {
public:
template <class T>
bool Has() const {
return content_.count(TypeId<T>()) > 0;
}
template <class T>
T* Get() const {
if (!Has<T>()) {
content_[TypeId<T>()] = std::make_shared<T>();
}
return static_cast<T*>(content_[TypeId<T>()].get());
}
private:
mutable std::map<TypeIndex, std::shared_ptr<void>> content_;
};
template <class T,
typename std::enable_if<IsExtension<T>::value, int>::type = 0>
void GetExtension(const CalculatorOptions& options, T* result) {
if (options.HasExtension(T::ext)) {
*result = options.GetExtension(T::ext);
}
}
template <class T,
typename std::enable_if<!IsExtension<T>::value, int>::type = 0>
void GetExtension(const CalculatorOptions& options, T* result) {}
template <class T>
void GetNodeOptions(const CalculatorGraphConfig::Node& node_config, T* result) {
#if defined(MEDIAPIPE_PROTO_LITE) && defined(MEDIAPIPE_PROTO_THIRD_PARTY)
// protobuf::Any is unavailable with third_party/protobuf:protobuf-lite.
#else
for (const mediapipe::protobuf::Any& options : node_config.node_options()) {
if (options.Is<T>()) {
options.UnpackTo(result);
}
}
#endif
}
// Combine a base options message with an optional side packet. The specified
// packet can hold either the specified options type T or CalculatorOptions.
// Fields are either replaced or merged depending on field merge_fields.
@ -132,35 +75,6 @@ inline T RetrieveOptions(const T& base, const InputStreamShardSet& stream_set,
return base;
}
// Extracts the options message of a specified type from a
// CalculatorGraphConfig::Node.
class OptionsMap {
public:
OptionsMap& Initialize(const CalculatorGraphConfig::Node& node_config) {
node_config_ = &node_config;
return *this;
}
// Returns the options data for a CalculatorGraphConfig::Node, from
// either "options" or "node_options" using either GetExtension or UnpackTo.
template <class T>
const T& Get() const {
if (options_.Has<T>()) {
return *options_.Get<T>();
}
T* result = options_.Get<T>();
if (node_config_->has_options()) {
GetExtension(node_config_->options(), result);
} else {
GetNodeOptions(*node_config_, result);
}
return *result;
}
const CalculatorGraphConfig::Node* node_config_;
TypeMap options_;
};
// Finds the descriptor for a protobuf.
const proto_ns::Descriptor* GetProtobufDescriptor(const std::string& type_name);


@ -1,57 +0,0 @@
// Copyright 2019 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#ifndef MEDIAPIPE_FRAMEWORK_TOOL_PACKET_UTIL_H_
#define MEDIAPIPE_FRAMEWORK_TOOL_PACKET_UTIL_H_
#include "mediapipe/framework/packet.h"
#include "tensorflow/core/example/example.pb.h"
namespace mediapipe {
namespace tool {
// The CLIF-friendly util functions to create and access a typed MediaPipe
// Packet from MediaPipe Python interface.
// Functions for SequenceExample Packets.
// Make a SequenceExample packet from a serialized SequenceExample.
// The SequenceExample in the Packet is owned by the C++ packet.
Packet CreateSequenceExamplePacketFromString(std::string* serialized_content) {
tensorflow::SequenceExample sequence_example;
sequence_example.ParseFromString(*serialized_content);
return MakePacket<tensorflow::SequenceExample>(sequence_example);
}
// Get a serialized SequenceExample std::string from a Packet.
// The ownership of the returned std::string will be transferred to the Python
// object.
std::unique_ptr<std::string> GetSerializedSequenceExample(Packet* packet) {
return absl::make_unique<std::string>(
packet->Get<tensorflow::SequenceExample>().SerializeAsString());
}
// Make a String packet
Packet CreateStringPacket(std::string* input_string) {
return MakePacket<std::string>(*input_string);
}
// Get the std::string from a Packet<std::string>
std::unique_ptr<std::string> GetString(Packet* packet) {
return absl::make_unique<std::string>(packet->Get<std::string>());
}
} // namespace tool
} // namespace mediapipe
#endif // MEDIAPIPE_FRAMEWORK_TOOL_PACKET_UTIL_H_


@ -16,8 +16,6 @@
#define MEDIAPIPE_FRAMEWORK_TOOL_TYPE_UTIL_H_
#include <cstddef>
#include <string>
#include <typeindex>
#include <typeinfo>
#include "mediapipe/framework/port.h"


@ -142,7 +142,7 @@ def _metal_library_impl(ctx):
if ctx.files.hdrs:
additional_params["header"] = depset([f for f in ctx.files.hdrs])
objc_provider = apple_common.new_objc_provider(
providers = [x.objc for x in ctx.attr.deps if hasattr(x, "objc")],
providers = [x[apple_common.Objc] for x in ctx.attr.deps if apple_common.Objc in x],
**additional_params
)
@ -169,7 +169,7 @@ def _metal_library_impl(ctx):
METAL_LIBRARY_ATTRS = dicts.add(apple_support.action_required_attrs(), {
"srcs": attr.label_list(allow_files = [".metal"], allow_empty = False),
"hdrs": attr.label_list(allow_files = [".h"]),
"deps": attr.label_list(providers = [["objc", CcInfo]]),
"deps": attr.label_list(providers = [["objc", CcInfo], [apple_common.Objc, CcInfo]]),
"copts": attr.string_list(),
"minimum_os_version": attr.string(),
})


@ -40,8 +40,8 @@ node: {
output_stream: "LETTERBOX_PADDING:letterbox_padding"
node_options: {
[type.googleapis.com/mediapipe.ImageTransformationCalculatorOptions] {
output_width: 256
output_height: 256
output_width: 192
output_height: 192
scale_mode: FIT
}
}
@ -76,19 +76,17 @@ node {
output_side_packet: "anchors"
node_options: {
[type.googleapis.com/mediapipe.SsdAnchorsCalculatorOptions] {
num_layers: 4
min_scale: 0.15625
num_layers: 1
min_scale: 0.1484375
max_scale: 0.75
input_size_height: 256
input_size_width: 256
input_size_height: 192
input_size_width: 192
anchor_offset_x: 0.5
anchor_offset_y: 0.5
strides: 16
strides: 32
strides: 32
strides: 32
strides: 4
aspect_ratios: 1.0
fixed_anchor_size: true
interpolated_scale_aspect_ratio: 0.0
}
}
}
@ -104,7 +102,7 @@ node {
node_options: {
[type.googleapis.com/mediapipe.TfLiteTensorsToDetectionsCalculatorOptions] {
num_classes: 1
num_boxes: 896
num_boxes: 2304
num_coords: 16
box_coord_offset: 0
keypoint_coord_offset: 4
@ -113,11 +111,11 @@ node {
sigmoid_score: true
score_clipping_thresh: 100.0
reverse_output_order: true
x_scale: 256.0
y_scale: 256.0
h_scale: 256.0
w_scale: 256.0
min_score_thresh: 0.65
x_scale: 192.0
y_scale: 192.0
h_scale: 192.0
w_scale: 192.0
min_score_thresh: 0.6
}
}
}


@ -41,8 +41,8 @@ node: {
output_stream: "LETTERBOX_PADDING:letterbox_padding"
node_options: {
[type.googleapis.com/mediapipe.ImageTransformationCalculatorOptions] {
output_width: 256
output_height: 256
output_width: 192
output_height: 192
scale_mode: FIT
}
}
@ -77,19 +77,17 @@ node {
output_side_packet: "anchors"
node_options: {
[type.googleapis.com/mediapipe.SsdAnchorsCalculatorOptions] {
num_layers: 4
min_scale: 0.15625
num_layers: 1
min_scale: 0.1484375
max_scale: 0.75
input_size_height: 256
input_size_width: 256
input_size_height: 192
input_size_width: 192
anchor_offset_x: 0.5
anchor_offset_y: 0.5
strides: 16
strides: 32
strides: 32
strides: 32
strides: 4
aspect_ratios: 1.0
fixed_anchor_size: true
interpolated_scale_aspect_ratio: 0.0
}
}
}
@ -105,7 +103,7 @@ node {
node_options: {
[type.googleapis.com/mediapipe.TfLiteTensorsToDetectionsCalculatorOptions] {
num_classes: 1
num_boxes: 896
num_boxes: 2304
num_coords: 16
box_coord_offset: 0
keypoint_coord_offset: 4
@ -114,11 +112,11 @@ node {
sigmoid_score: true
score_clipping_thresh: 100.0
reverse_output_order: true
x_scale: 256.0
y_scale: 256.0
h_scale: 256.0
w_scale: 256.0
min_score_thresh: 0.65
x_scale: 192.0
y_scale: 192.0
h_scale: 192.0
w_scale: 192.0
min_score_thresh: 0.6
}
}
}


@ -15,9 +15,9 @@
#include <cmath>
#include <memory>
#include "Eigen/Core"
#include "Eigen/Dense"
#include "Eigen/src/Core/util/Constants.h"
#include "Eigen/src/Geometry/Quaternion.h"
#include "Eigen/Geometry"
#include "absl/memory/memory.h"
#include "absl/strings/str_cat.h"
#include "absl/strings/str_join.h"


@ -14,9 +14,9 @@
#include <memory>
#include "Eigen/Core"
#include "Eigen/Dense"
#include "Eigen/src/Core/util/Constants.h"
#include "Eigen/src/Geometry/Quaternion.h"
#include "Eigen/Geometry"
#include "absl/memory/memory.h"
#include "absl/strings/str_cat.h"
#include "absl/strings/str_join.h"


@ -0,0 +1,54 @@
# Copyright 2021 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
load(
"//mediapipe/framework/tool:mediapipe_graph.bzl",
"mediapipe_binary_graph",
)
licenses(["notice"])
package(default_visibility = ["//visibility:public"])
cc_library(
name = "selfie_segmentation_gpu_deps",
deps = [
"//mediapipe/calculators/core:flow_limiter_calculator",
"//mediapipe/calculators/image:recolor_calculator",
"//mediapipe/modules/selfie_segmentation:selfie_segmentation_gpu",
],
)
mediapipe_binary_graph(
name = "selfie_segmentation_gpu_binary_graph",
graph = "selfie_segmentation_gpu.pbtxt",
output_name = "selfie_segmentation_gpu.binarypb",
deps = [":selfie_segmentation_gpu_deps"],
)
cc_library(
name = "selfie_segmentation_cpu_deps",
deps = [
"//mediapipe/calculators/core:flow_limiter_calculator",
"//mediapipe/calculators/image:recolor_calculator",
"//mediapipe/modules/selfie_segmentation:selfie_segmentation_cpu",
],
)
mediapipe_binary_graph(
name = "selfie_segmentation_cpu_binary_graph",
graph = "selfie_segmentation_cpu.pbtxt",
output_name = "selfie_segmentation_cpu.binarypb",
deps = [":selfie_segmentation_cpu_deps"],
)


@ -0,0 +1,52 @@
# MediaPipe graph that performs selfie segmentation with TensorFlow Lite on CPU.
# CPU buffer. (ImageFrame)
input_stream: "input_video"
# Output image with rendered results. (ImageFrame)
output_stream: "output_video"
# Throttles the images flowing downstream for flow control. It passes through
# the very first incoming image unaltered, and waits for downstream nodes
# (calculators and subgraphs) in the graph to finish their tasks before it
# passes through another image. All images that come in while waiting are
# dropped, limiting the number of in-flight images in most parts of the graph to
# 1. This prevents the downstream nodes from queuing up incoming images and data
# excessively, which leads to increased latency and memory usage, unwanted in
# real-time mobile applications. It also eliminates unnecessary computation,
# e.g., the output produced by a node may get dropped downstream if the
# subsequent nodes are still busy processing previous inputs.
node {
calculator: "FlowLimiterCalculator"
input_stream: "input_video"
input_stream: "FINISHED:output_video"
input_stream_info: {
tag_index: "FINISHED"
back_edge: true
}
output_stream: "throttled_input_video"
}
# Subgraph that performs selfie segmentation.
node {
calculator: "SelfieSegmentationCpu"
input_stream: "IMAGE:throttled_input_video"
output_stream: "SEGMENTATION_MASK:segmentation_mask"
}
# Colors the selfie segmentation with the color specified in the option.
node {
calculator: "RecolorCalculator"
input_stream: "IMAGE:throttled_input_video"
input_stream: "MASK:segmentation_mask"
output_stream: "IMAGE:output_video"
node_options: {
[type.googleapis.com/mediapipe.RecolorCalculatorOptions] {
color { r: 0 g: 0 b: 255 }
mask_channel: RED
invert_mask: true
adjust_with_luminance: false
}
}
}
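
The throttling behavior described in the graph comment (pass the first image, drop arrivals while one is in flight, admit the next after a `FINISHED` signal) can be sketched as a small Python state machine. This is a toy model under stated assumptions and not `FlowLimiterCalculator` itself. The class name, method names, and the `max_in_flight` parameter are hypothetical.

```python
from typing import Optional


class FlowLimiter:
    """Admits a frame only while fewer than max_in_flight frames are pending."""

    def __init__(self, max_in_flight: int = 1) -> None:
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.dropped = 0

    def on_input(self, frame: int) -> Optional[int]:
        """Returns the frame if it is admitted, or None if it is dropped."""
        if self.in_flight < self.max_in_flight:
            self.in_flight += 1
            return frame
        self.dropped += 1
        return None

    def on_finished(self) -> None:
        """Models the FINISHED back edge: downstream finished one frame."""
        if self.in_flight > 0:
            self.in_flight -= 1
```

Feeding frames 0, 1, 2 without a finish signal admits only frame 0. After one call to `on_finished()`, the next frame is admitted again.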


@ -0,0 +1,52 @@
# MediaPipe graph that performs selfie segmentation with TensorFlow Lite on GPU.
# GPU buffer. (GpuBuffer)
input_stream: "input_video"
# Output image with rendered results. (GpuBuffer)
output_stream: "output_video"
# Throttles the images flowing downstream for flow control. It passes through
# the very first incoming image unaltered, and waits for downstream nodes
# (calculators and subgraphs) in the graph to finish their tasks before it
# passes through another image. All images that come in while waiting are
# dropped, limiting the number of in-flight images in most parts of the graph to
# 1. This prevents the downstream nodes from queuing up incoming images and data
# excessively, which leads to increased latency and memory usage, unwanted in
# real-time mobile applications. It also eliminates unnecessary computation,
# e.g., the output produced by a node may get dropped downstream if the
# subsequent nodes are still busy processing previous inputs.
node {
calculator: "FlowLimiterCalculator"
input_stream: "input_video"
input_stream: "FINISHED:output_video"
input_stream_info: {
tag_index: "FINISHED"
back_edge: true
}
output_stream: "throttled_input_video"
}
# Subgraph that performs selfie segmentation.
node {
calculator: "SelfieSegmentationGpu"
input_stream: "IMAGE:throttled_input_video"
output_stream: "SEGMENTATION_MASK:segmentation_mask"
}
# Colors the selfie segmentation with the color specified in the option.
node {
calculator: "RecolorCalculator"
input_stream: "IMAGE_GPU:throttled_input_video"
input_stream: "MASK_GPU:segmentation_mask"
output_stream: "IMAGE_GPU:output_video"
node_options: {
[type.googleapis.com/mediapipe.RecolorCalculatorOptions] {
color { r: 0 g: 0 b: 255 }
mask_channel: RED
invert_mask: true
adjust_with_luminance: false
}
}
}


@ -0,0 +1,223 @@
// Copyright 2019-2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package com.google.mediapipe.components;
import android.graphics.SurfaceTexture;
import android.opengl.GLES11Ext;
import android.opengl.GLES20;
import android.opengl.GLSurfaceView;
import android.opengl.Matrix;
import android.util.Log;
import com.google.mediapipe.framework.TextureFrame;
import com.google.mediapipe.glutil.CommonShaders;
import com.google.mediapipe.glutil.ShaderUtil;
import java.nio.FloatBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import javax.microedition.khronos.egl.EGLConfig;
import javax.microedition.khronos.opengles.GL10;
/**
* Renderer for a {@link GLSurfaceView}. It displays a texture. The texture is scaled and cropped as
* necessary to fill the view, while maintaining its aspect ratio.
*
* <p>It can render both textures bindable to the normal {@link GLES20#GL_TEXTURE_2D} target as well
* as textures bindable to {@link GLES11Ext#GL_TEXTURE_EXTERNAL_OES}, which is used for Android
* surfaces. Call {@link #setTextureTarget(int)} to choose the correct target.
*
* <p>It can display a {@link SurfaceTexture} (call {@link #setSurfaceTexture(SurfaceTexture)}) or a
* {@link TextureFrame} (call {@link #setNextFrame(TextureFrame)}).
*/
public class GlSurfaceViewRenderer implements GLSurfaceView.Renderer {
private static final String TAG = "DemoRenderer";
private static final int ATTRIB_POSITION = 1;
private static final int ATTRIB_TEXTURE_COORDINATE = 2;
private int surfaceWidth;
private int surfaceHeight;
private int frameWidth = 0;
private int frameHeight = 0;
private int program = 0;
private int frameUniform;
private int textureTarget = GLES11Ext.GL_TEXTURE_EXTERNAL_OES;
private int textureTransformUniform;
// Controls the alignment between frame size and surface size, 0.5f default is centered.
private float alignmentHorizontal = 0.5f;
private float alignmentVertical = 0.5f;
private float[] textureTransformMatrix = new float[16];
private SurfaceTexture surfaceTexture = null;
private final AtomicReference<TextureFrame> nextFrame = new AtomicReference<>();
@Override
public void onSurfaceCreated(GL10 gl, EGLConfig config) {
if (surfaceTexture == null) {
Matrix.setIdentityM(textureTransformMatrix, 0 /* offset */);
}
Map<String, Integer> attributeLocations = new HashMap<>();
attributeLocations.put("position", ATTRIB_POSITION);
attributeLocations.put("texture_coordinate", ATTRIB_TEXTURE_COORDINATE);
Log.d(TAG, "external texture: " + isExternalTexture());
program =
ShaderUtil.createProgram(
CommonShaders.VERTEX_SHADER,
isExternalTexture()
? CommonShaders.FRAGMENT_SHADER_EXTERNAL
: CommonShaders.FRAGMENT_SHADER,
attributeLocations);
frameUniform = GLES20.glGetUniformLocation(program, "video_frame");
textureTransformUniform = GLES20.glGetUniformLocation(program, "texture_transform");
ShaderUtil.checkGlError("glGetUniformLocation");
GLES20.glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
}
@Override
public void onSurfaceChanged(GL10 gl, int width, int height) {
surfaceWidth = width;
surfaceHeight = height;
GLES20.glViewport(0, 0, width, height);
}
@Override
public void onDrawFrame(GL10 gl) {
TextureFrame frame = nextFrame.getAndSet(null);
GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT);
ShaderUtil.checkGlError("glClear");
if (surfaceTexture == null && frame == null) {
return;
}
GLES20.glActiveTexture(GLES20.GL_TEXTURE0);
ShaderUtil.checkGlError("glActiveTexture");
if (surfaceTexture != null) {
surfaceTexture.updateTexImage();
surfaceTexture.getTransformMatrix(textureTransformMatrix);
} else {
GLES20.glBindTexture(textureTarget, frame.getTextureName());
ShaderUtil.checkGlError("glBindTexture");
}
GLES20.glTexParameteri(textureTarget, GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_LINEAR);
GLES20.glTexParameteri(textureTarget, GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_LINEAR);
GLES20.glTexParameteri(textureTarget, GLES20.GL_TEXTURE_WRAP_S, GLES20.GL_CLAMP_TO_EDGE);
GLES20.glTexParameteri(textureTarget, GLES20.GL_TEXTURE_WRAP_T, GLES20.GL_CLAMP_TO_EDGE);
ShaderUtil.checkGlError("texture setup");
GLES20.glUseProgram(program);
GLES20.glUniform1i(frameUniform, 0);
GLES20.glUniformMatrix4fv(textureTransformUniform, 1, false, textureTransformMatrix, 0);
ShaderUtil.checkGlError("glUniformMatrix4fv");
GLES20.glEnableVertexAttribArray(ATTRIB_POSITION);
GLES20.glVertexAttribPointer(
ATTRIB_POSITION, 2, GLES20.GL_FLOAT, false, 0, CommonShaders.SQUARE_VERTICES);
// TODO: compute scale from surfaceTexture size.
float scaleWidth = frameWidth > 0 ? (float) surfaceWidth / (float) frameWidth : 1.0f;
float scaleHeight = frameHeight > 0 ? (float) surfaceHeight / (float) frameHeight : 1.0f;
// Whichever of the two scales is greater corresponds to the dimension where the image
// is proportionally smaller than the view. Dividing both scales by that number results
// in that dimension having scale 1.0, and thus touching the edges of the view, while the
// other is cropped proportionally.
float maxScale = Math.max(scaleWidth, scaleHeight);
scaleWidth /= maxScale;
scaleHeight /= maxScale;
// Alignment controls where the visible section is placed within the full camera frame, with
// (0, 0) being the bottom left, and (1, 1) being the top right.
float textureLeft = (1.0f - scaleWidth) * alignmentHorizontal;
float textureRight = textureLeft + scaleWidth;
float textureBottom = (1.0f - scaleHeight) * alignmentVertical;
float textureTop = textureBottom + scaleHeight;
// Unlike on iOS, there is no need to flip the surfaceTexture here.
// But for regular textures, we will need to flip them.
final FloatBuffer passThroughTextureVertices =
ShaderUtil.floatBuffer(
textureLeft, textureBottom,
textureRight, textureBottom,
textureLeft, textureTop,
textureRight, textureTop);
GLES20.glEnableVertexAttribArray(ATTRIB_TEXTURE_COORDINATE);
GLES20.glVertexAttribPointer(
ATTRIB_TEXTURE_COORDINATE, 2, GLES20.GL_FLOAT, false, 0, passThroughTextureVertices);
ShaderUtil.checkGlError("program setup");
GLES20.glDrawArrays(GLES20.GL_TRIANGLE_STRIP, 0, 4);
ShaderUtil.checkGlError("glDrawArrays");
GLES20.glBindTexture(textureTarget, 0);
ShaderUtil.checkGlError("unbind surfaceTexture");
// We must flush before releasing the frame.
GLES20.glFlush();
if (frame != null) {
frame.release();
}
}
public void setTextureTarget(int target) {
if (program != 0) {
throw new IllegalStateException(
"setTextureTarget must be called before the surface is created");
}
textureTarget = target;
}
public void setSurfaceTexture(SurfaceTexture texture) {
if (!isExternalTexture()) {
throw new IllegalStateException(
"to use a SurfaceTexture, the texture target must be GL_TEXTURE_EXTERNAL_OES");
}
TextureFrame oldFrame = nextFrame.getAndSet(null);
if (oldFrame != null) {
oldFrame.release();
}
surfaceTexture = texture;
}
// Use this when the texture is not a SurfaceTexture.
public void setNextFrame(TextureFrame frame) {
if (surfaceTexture != null) {
Matrix.setIdentityM(textureTransformMatrix, 0 /* offset */);
}
TextureFrame oldFrame = nextFrame.getAndSet(frame);
if (oldFrame != null
&& (frame == null || (oldFrame.getTextureName() != frame.getTextureName()))) {
oldFrame.release();
}
surfaceTexture = null;
}
public void setFrameSize(int width, int height) {
frameWidth = width;
frameHeight = height;
}
/**
* When the aspect ratios between the camera frame and the surface size are mismatched, this
* controls how the image is aligned. 0.0 means aligning the left/bottom edges; 1.0 means aligning
* the right/top edges; 0.5 (default) means aligning the centers.
*/
public void setAlignment(float horizontal, float vertical) {
alignmentHorizontal = horizontal;
alignmentVertical = vertical;
}
private boolean isExternalTexture() {
return textureTarget == GLES11Ext.GL_TEXTURE_EXTERNAL_OES;
}
}
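
The crop-and-align arithmetic in onDrawFrame above can be looked at in isolation. The following is a minimal sketch, not part of this commit: `CropMath` and `textureBounds` are hypothetical names, and the body only restates the scale and alignment steps from the renderer, with no GL dependencies.

```java
/** Hypothetical helper (not a MediaPipe class) that mirrors the crop math in onDrawFrame. */
final class CropMath {
  private CropMath() {}

  /** Returns {left, right, bottom, top} in normalized texture coordinates. */
  static float[] textureBounds(
      int surfaceWidth,
      int surfaceHeight,
      int frameWidth,
      int frameHeight,
      float alignmentHorizontal,
      float alignmentVertical) {
    // Ratio of view size to frame size in each dimension; 1.0 if the frame size is unknown.
    float scaleWidth = frameWidth > 0 ? (float) surfaceWidth / (float) frameWidth : 1.0f;
    float scaleHeight = frameHeight > 0 ? (float) surfaceHeight / (float) frameHeight : 1.0f;
    // Normalize so the larger scale becomes 1.0; the other dimension is cropped.
    float maxScale = Math.max(scaleWidth, scaleHeight);
    scaleWidth /= maxScale;
    scaleHeight /= maxScale;
    // Alignment picks where the visible window sits inside the full frame.
    float left = (1.0f - scaleWidth) * alignmentHorizontal;
    float bottom = (1.0f - scaleHeight) * alignmentVertical;
    return new float[] {left, left + scaleWidth, bottom, bottom + scaleHeight};
  }
}
```

For a 100x100 surface showing a 200x100 frame with centered alignment, this yields left 0.25, right 0.75, bottom 0 and top 1, so the middle half of the frame is visible horizontally.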

View File

@ -16,6 +16,7 @@ package com.google.mediapipe.framework;
import android.graphics.Bitmap;
import java.nio.ByteBuffer;
import java.util.List;
// TODO: use Preconditions in this file.
/**

View File

@ -444,8 +444,16 @@ JNIEXPORT jlong JNICALL PACKET_GETTER_METHOD(nativeGetGpuBuffer)(JNIEnv* env,
mediapipe::android::Graph::GetPacketFromHandle(packet);
mediapipe::GlTextureBufferSharedPtr ptr;
if (mediapipe_packet.ValidateAsType<mediapipe::Image>().ok()) {
const mediapipe::Image& buffer = mediapipe_packet.Get<mediapipe::Image>();
auto mediapipe_graph =
mediapipe::android::Graph::GetContextFromHandle(packet);
auto gl_context = mediapipe_graph->GetGpuResources()->gl_context();
auto status =
gl_context->Run([gl_context, mediapipe_packet, &ptr]() -> absl::Status {
const mediapipe::Image& buffer =
mediapipe_packet.Get<mediapipe::Image>();
ptr = buffer.GetGlTextureBufferSharedPtr();
return absl::OkStatus();
});
} else {
const mediapipe::GpuBuffer& buffer =
mediapipe_packet.Get<mediapipe::GpuBuffer>();

View File

@ -0,0 +1,67 @@
# Copyright 2021 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
package(default_visibility = ["//visibility:public"])
licenses(["notice"])
android_library(
name = "solution_base",
srcs = glob(
["*.java"],
exclude = [
"CameraInput.java",
],
),
visibility = ["//visibility:public"],
deps = [
"//mediapipe/java/com/google/mediapipe/framework:android_framework",
"//mediapipe/java/com/google/mediapipe/glutil",
"//third_party:autovalue",
"@maven//:com_google_code_findbugs_jsr305",
"@maven//:com_google_guava_guava",
],
)
android_library(
name = "camera_input",
srcs = ["CameraInput.java"],
visibility = ["//visibility:public"],
deps = [
"//mediapipe/java/com/google/mediapipe/components:android_camerax_helper",
"//mediapipe/java/com/google/mediapipe/components:android_components",
"//mediapipe/java/com/google/mediapipe/framework:android_framework",
"@maven//:com_google_guava_guava",
],
)
# Native dependencies of all MediaPipe solutions.
cc_binary(
name = "libmediapipe_jni.so",
linkshared = 1,
linkstatic = 1,
# TODO: Add more calculators to support other top-level solutions.
deps = [
"//mediapipe/java/com/google/mediapipe/framework/jni:mediapipe_framework_jni",
"//mediapipe/modules/hand_landmark:hand_landmark_tracking_gpu_image",
],
)
# Converts the .so cc_binary into a cc_library, to be consumed in an android_binary.
cc_library(
name = "mediapipe_jni_lib",
srcs = [":libmediapipe_jni.so"],
visibility = ["//visibility:public"],
alwayslink = 1,
)

View File

@ -0,0 +1,109 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package com.google.mediapipe.solutionbase;
import android.app.Activity;
import com.google.mediapipe.components.CameraHelper;
import com.google.mediapipe.components.CameraXPreviewHelper;
import com.google.mediapipe.components.ExternalTextureConverter;
import com.google.mediapipe.components.PermissionHelper;
import com.google.mediapipe.components.TextureFrameConsumer;
import com.google.mediapipe.framework.MediaPipeException;
import com.google.mediapipe.framework.TextureFrame;
import javax.microedition.khronos.egl.EGLContext;
/**
* The camera component that takes the camera input and produces MediaPipe {@link TextureFrame}
* objects.
*/
public class CameraInput {
private static final String TAG = "CameraInput";
/** Represents the direction the camera faces relative to device screen. */
public static enum CameraFacing {
FRONT,
BACK
};
private final CameraXPreviewHelper cameraHelper;
private TextureFrameConsumer cameraNewFrameListener;
private ExternalTextureConverter converter;
/**
* Initializes CameraInput and requests camera permissions.
*
* @param activity an Android {@link Activity}.
*/
public CameraInput(Activity activity) {
cameraHelper = new CameraXPreviewHelper();
PermissionHelper.checkAndRequestCameraPermissions(activity);
}
/**
* Sets a callback to be invoked when new frames are available.
*
* @param listener the callback.
*/
public void setCameraNewFrameListener(TextureFrameConsumer listener) {
cameraNewFrameListener = listener;
}
/**
* Sets up the external texture converter and starts the camera.
*
* @param activity an Android {@link Activity}.
* @param eglContext an OpenGL {@link EGLContext}.
* @param cameraFacing the direction the camera faces relative to device screen.
* @param width the desired width of the converted texture.
* @param height the desired height of the converted texture.
*/
public void start(
Activity activity, EGLContext eglContext, CameraFacing cameraFacing, int width, int height) {
if (!PermissionHelper.cameraPermissionsGranted(activity)) {
return;
}
if (converter == null) {
converter = new ExternalTextureConverter(eglContext, 2);
}
if (cameraNewFrameListener == null) {
throw new MediaPipeException(
MediaPipeException.StatusCode.FAILED_PRECONDITION.ordinal(),
"cameraNewFrameListener is not set.");
}
converter.setConsumer(cameraNewFrameListener);
cameraHelper.setOnCameraStartedListener(
surfaceTexture ->
converter.setSurfaceTextureAndAttachToGLContext(surfaceTexture, width, height));
cameraHelper.startCamera(
activity,
cameraFacing == CameraFacing.FRONT
? CameraHelper.CameraFacing.FRONT
: CameraHelper.CameraFacing.BACK,
/*unusedSurfaceTexture=*/ null,
null);
}
/** Stops the camera input. */
public void stop() {
if (converter != null) {
converter.close();
}
}
/** Returns a boolean which is true if the camera is in Portrait mode, false in Landscape mode. */
public boolean isCameraRotated() {
return cameraHelper.isCameraRotated();
}
}

View File

@ -0,0 +1,20 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package com.google.mediapipe.solutionbase;
/** Interface for the customizable MediaPipe solution error listener. */
public interface ErrorListener {
void onError(String message, RuntimeException e);
}

View File

@ -0,0 +1,174 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package com.google.mediapipe.solutionbase;
import android.content.Context;
import android.graphics.Bitmap;
import android.util.Log;
import com.google.mediapipe.framework.MediaPipeException;
import com.google.mediapipe.framework.Packet;
import com.google.mediapipe.framework.TextureFrame;
import com.google.mediapipe.glutil.EglManager;
import java.util.concurrent.atomic.AtomicInteger;
import javax.microedition.khronos.egl.EGLContext;
/** The base class of the MediaPipe image solutions. */
// TODO: Consolidate the "send" methods into a single "send(MlImage image)".
public class ImageSolutionBase extends SolutionBase {
public static final String TAG = "ImageSolutionBase";
protected boolean staticImageMode;
private EglManager eglManager;
// Internal fake timestamp for static images.
private final AtomicInteger staticImageTimestamp = new AtomicInteger(0);
/**
* Initializes MediaPipe image solution base with Android context, solution specific settings, and
* solution result handler.
*
* @param context an Android {@link Context}.
* @param solutionInfo a {@link SolutionInfo} that contains the binary graph file path and the
* graph input and output stream names.
* @param outputHandler an {@link OutputHandler} that handles the solution graph output packets
* and runtime exceptions.
*/
@Override
public synchronized void initialize(
Context context,
SolutionInfo solutionInfo,
OutputHandler<? extends SolutionResult> outputHandler) {
staticImageMode = solutionInfo.staticImageMode();
try {
super.initialize(context, solutionInfo, outputHandler);
eglManager = new EglManager(/*parentContext=*/ null);
solutionGraph.setParentGlContext(eglManager.getNativeContext());
} catch (MediaPipeException e) {
throwException("Error occurs when creating MediaPipe image solution graph. ", e);
}
}
/** Returns the managed {@link EGLContext} to share the OpenGL context with other components. */
public EGLContext getGlContext() {
return eglManager.getContext();
}
/** Returns the OpenGL major version number. */
public int getGlMajorVersion() {
return eglManager.getGlMajorVersion();
}
/** Sends a {@link TextureFrame} into solution graph for processing. */
public void send(TextureFrame textureFrame) {
if (!staticImageMode && textureFrame.getTimestamp() == Long.MIN_VALUE) {
throwException(
"Error occurs when calling the solution send method. ",
new MediaPipeException(
MediaPipeException.StatusCode.FAILED_PRECONDITION.ordinal(),
"TextureFrame's timestamp needs to be explicitly set if not in static image mode."));
return;
}
long timestampUs =
staticImageMode ? staticImageTimestamp.getAndIncrement() : textureFrame.getTimestamp();
sendImage(textureFrame, timestampUs);
}
/**
* Sends a {@link Bitmap} with a timestamp into solution graph for processing. In static image
* mode, the timestamp is ignored.
*/
public void send(Bitmap inputBitmap, long timestamp) {
if (staticImageMode) {
Log.w(TAG, "In static image mode, the MediaPipe solution ignores the input timestamp.");
}
sendImage(inputBitmap, staticImageMode ? staticImageTimestamp.getAndIncrement() : timestamp);
}
/** Sends a {@link Bitmap} (static image) into solution graph for processing. */
public void send(Bitmap inputBitmap) {
if (!staticImageMode) {
throwException(
"Error occurs when calling the solution send method. ",
new MediaPipeException(
MediaPipeException.StatusCode.FAILED_PRECONDITION.ordinal(),
"When not in static image mode, a timestamp associated with the image is required."
+ " Use send(Bitmap inputBitmap, long timestamp) instead."));
return;
}
sendImage(inputBitmap, staticImageTimestamp.getAndIncrement());
}
/** Internal implementation of sending Bitmap/TextureFrame into the MediaPipe solution. */
private synchronized <T> void sendImage(T imageObj, long timestamp) {
if (lastTimestamp >= timestamp) {
throwException(
"The received frame has a timestamp that is not greater than the last processed timestamp.",
new MediaPipeException(
MediaPipeException.StatusCode.FAILED_PRECONDITION.ordinal(),
"Receiving a frame with an invalid timestamp."));
return;
}
lastTimestamp = timestamp;
Packet imagePacket = null;
try {
if (imageObj instanceof TextureFrame) {
imagePacket = packetCreator.createImage((TextureFrame) imageObj);
imageObj = null;
} else if (imageObj instanceof Bitmap) {
imagePacket = packetCreator.createRgbaImage((Bitmap) imageObj);
} else {
throwException(
"The input image type is not supported. ",
new MediaPipeException(
MediaPipeException.StatusCode.UNIMPLEMENTED.ordinal(),
"The input image type is not supported."));
}
try {
// addConsumablePacketToInputStream allows the graph to take exclusive ownership of the
// packet, which may allow for more memory optimizations.
solutionGraph.addConsumablePacketToInputStream(
imageInputStreamName, imagePacket, timestamp);
// If addConsumablePacket succeeded, we don't need to release the packet ourselves.
imagePacket = null;
} catch (MediaPipeException e) {
// TODO: do not suppress exceptions here!
if (errorListener == null) {
Log.e(TAG, "Mediapipe error: ", e);
} else {
throw e;
}
}
} catch (RuntimeException e) {
if (errorListener != null) {
errorListener.onError("Mediapipe error: ", e);
} else {
throw e;
}
} finally {
if (imagePacket != null) {
// In case of error, addConsumablePacketToInputStream will not release the packet, so we
// have to release it ourselves. (We could also re-try adding, but we don't).
imagePacket.release();
}
if (imageObj instanceof TextureFrame) {
if (imageObj != null) {
// imagePacket will release frame if it has been created, but if not, we need to
// release it.
((TextureFrame) imageObj).release();
}
}
}
}
}
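
The timestamp rules spread across the send methods and sendImage above can be summarized in a small standalone sketch. `TimestampGate` and `next` are hypothetical names, not MediaPipe APIs. As a simplification, the sketch throws an `IllegalStateException` for a timestamp that does not increase, whereas the class above reports the error through throwException and returns.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in (not a MediaPipe class) for the timestamp handling in ImageSolutionBase. */
final class TimestampGate {
  private final boolean staticImageMode;
  // Fake, monotonically increasing timestamp used for static images.
  private final AtomicInteger staticImageTimestamp = new AtomicInteger(0);
  private long lastTimestamp = Long.MIN_VALUE;

  TimestampGate(boolean staticImageMode) {
    this.staticImageMode = staticImageMode;
  }

  /**
   * Returns the timestamp to attach to the next image. In static image mode the requested value
   * is ignored; otherwise it must be strictly greater than the previous one.
   */
  synchronized long next(long requestedTimestamp) {
    long timestamp = staticImageMode ? staticImageTimestamp.getAndIncrement() : requestedTimestamp;
    if (lastTimestamp >= timestamp) {
      // Assumption: throwing here replaces the listener-or-log reporting of the original.
      throw new IllegalStateException("Timestamp " + timestamp + " is not strictly increasing.");
    }
    lastTimestamp = timestamp;
    return timestamp;
  }
}
```

In static image mode the caller-supplied value never reaches the graph, which is why send(Bitmap, long) only logs a warning there; in streaming mode a repeated or older timestamp is rejected before any packet is created.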

View File

@ -0,0 +1,59 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package com.google.mediapipe.solutionbase;
import android.graphics.Bitmap;
import com.google.mediapipe.framework.AndroidPacketGetter;
import com.google.mediapipe.framework.Packet;
import com.google.mediapipe.framework.PacketGetter;
import com.google.mediapipe.framework.TextureFrame;
/**
* The base class of any MediaPipe image solution result. The base class contains the common parts
* across all image solution results, including the input timestamp and the input image data. A new
* MediaPipe image solution result class should extend ImageSolutionResult.
*/
public class ImageSolutionResult implements SolutionResult {
protected long timestamp;
protected Packet imagePacket;
private Bitmap cachedBitmap;
// Result timestamp, which is set to the timestamp of the corresponding input image. May return
// Long.MIN_VALUE if the input image is not associated with a timestamp.
@Override
public long timestamp() {
return timestamp;
}
// Returns the corresponding input image as a {@link Bitmap}.
public Bitmap inputBitmap() {
if (cachedBitmap != null) {
return cachedBitmap;
}
cachedBitmap = AndroidPacketGetter.getBitmapFromRgba(imagePacket);
return cachedBitmap;
}
// Returns the corresponding input image as a {@link TextureFrame}. The caller must release the
// acquired {@link TextureFrame} after using.
public TextureFrame acquireTextureFrame() {
return PacketGetter.getTextureFrame(imagePacket);
}
// Releases image packet and the underlying data.
void releaseImagePacket() {
imagePacket.release();
}
}

View File

@ -0,0 +1,86 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package com.google.mediapipe.solutionbase;
import android.util.Log;
import com.google.mediapipe.framework.MediaPipeException;
import com.google.mediapipe.framework.Packet;
import java.util.List;
/** Handles MediaPipe solution graph outputs. */
public class OutputHandler<T extends SolutionResult> {
private static final String TAG = "OutputHandler";
/** Interface for converting output packet lists to solution result objects. */
public interface OutputConverter<T extends SolutionResult> {
public abstract T convert(List<Packet> packets);
}
// A solution-specific graph output converter that each solution should implement.
private OutputConverter<T> outputConverter;
// The user-defined solution result listener.
private ResultListener<T> customResultListener;
// The user-defined error listener.
private ErrorListener customErrorListener;
/**
* Sets a callback to be invoked to convert a packet list to a solution result object.
*
* @param converter the solution-defined {@link OutputConverter} callback.
*/
public void setOutputConverter(OutputConverter<T> converter) {
this.outputConverter = converter;
}
/**
* Sets a callback to be invoked when a solution result object becomes available.
*
* @param listener the user-defined {@link ResultListener} callback.
*/
public void setResultListener(ResultListener<T> listener) {
this.customResultListener = listener;
}
/**
* Sets a callback to be invoked when exceptions are thrown in the solution.
*
* @param listener the user-defined {@link ErrorListener} callback.
*/
public void setErrorListener(ErrorListener listener) {
this.customErrorListener = listener;
}
/** Handles a list of output packets. Invoked when packet lists become available. */
public void run(List<Packet> packets) {
T solutionResult = null;
try {
solutionResult = outputConverter.convert(packets);
customResultListener.run(solutionResult);
} catch (MediaPipeException e) {
if (customErrorListener != null) {
customErrorListener.onError("Error occurs when getting MediaPipe solution result. ", e);
} else {
Log.e(TAG, "Error occurs when getting MediaPipe solution result. " + e);
}
} finally {
for (Packet packet : packets) {
packet.release();
}
if (solutionResult instanceof ImageSolutionResult) {
ImageSolutionResult imageSolutionResult = (ImageSolutionResult) solutionResult;
imageSolutionResult.releaseImagePacket();
}
}
}
}
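
The convert, notify and release flow of OutputHandler.run above can be exercised without MediaPipe on the classpath. The sketch below is illustrative only: `HandlerSketch` and its constructor parameters are hypothetical, packets are modeled as a generic type `P`, and the catch clause is widened to `RuntimeException` because `MediaPipeException` is not available here.

```java
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical stand-in (not a MediaPipe class) for the control flow of OutputHandler.run. */
final class HandlerSketch<P, T> {
  private final Function<List<P>, T> converter;
  private final Consumer<T> resultListener;
  private final BiConsumer<String, RuntimeException> errorListener; // may be null
  private final Consumer<P> releaser;

  HandlerSketch(
      Function<List<P>, T> converter,
      Consumer<T> resultListener,
      BiConsumer<String, RuntimeException> errorListener,
      Consumer<P> releaser) {
    this.converter = converter;
    this.resultListener = resultListener;
    this.errorListener = errorListener;
    this.releaser = releaser;
  }

  void run(List<P> packets) {
    String message = "Error occurs when getting MediaPipe solution result. ";
    try {
      // Convert the raw packets, then hand the result to the user callback.
      resultListener.accept(converter.apply(packets));
    } catch (RuntimeException e) {
      // Assumption: the original catches MediaPipeException only.
      if (errorListener != null) {
        errorListener.accept(message, e);
      } else {
        System.err.println(message + e);
      }
    } finally {
      // Packets are released whether or not conversion succeeded.
      for (P packet : packets) {
        releaser.accept(packet);
      }
    }
  }
}
```

The point of the finally block is that packet release does not depend on the converter or the listener succeeding, so a failing callback cannot leak packets.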

View File

@ -0,0 +1,20 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package com.google.mediapipe.solutionbase;
/** Interface for the customizable MediaPipe solution result listener. */
public interface ResultListener<T> {
void run(T result);
}

View File

@ -0,0 +1,150 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package com.google.mediapipe.solutionbase;
import static java.util.concurrent.TimeUnit.MICROSECONDS;
import static java.util.concurrent.TimeUnit.MILLISECONDS;
import android.content.Context;
import android.os.SystemClock;
import android.util.Log;
import com.google.common.collect.ImmutableList;
import com.google.mediapipe.framework.AndroidAssetUtil;
import com.google.mediapipe.framework.AndroidPacketCreator;
import com.google.mediapipe.framework.Graph;
import com.google.mediapipe.framework.MediaPipeException;
import com.google.mediapipe.framework.Packet;
import com.google.mediapipe.framework.PacketGetter;
import com.google.protobuf.Parser;
import java.io.File;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.annotation.Nullable;
/** The base class of the MediaPipe solutions. */
public class SolutionBase {
private static final String TAG = "SolutionBase";
protected Graph solutionGraph;
protected AndroidPacketCreator packetCreator;
protected ErrorListener errorListener;
protected String imageInputStreamName;
protected long lastTimestamp = Long.MIN_VALUE;
protected final AtomicBoolean solutionGraphStarted = new AtomicBoolean(false);
static {
// Load all native libraries needed by the app.
System.loadLibrary("mediapipe_jni");
System.loadLibrary("opencv_java3");
}
/**
* Initializes solution base with Android context, solution specific settings, and solution result
* handler.
*
* @param context an Android {@link Context}.
* @param solutionInfo a {@link SolutionInfo} that contains the binary graph file path and the
* graph input and output stream names.
* @param outputHandler an {@link OutputHandler} that handles both the solution result objects
* and runtime exceptions.
*/
public synchronized void initialize(
Context context,
SolutionInfo solutionInfo,
OutputHandler<? extends SolutionResult> outputHandler) {
this.imageInputStreamName = solutionInfo.imageInputStreamName();
try {
AndroidAssetUtil.initializeNativeAssetManager(context);
solutionGraph = new Graph();
if (new File(solutionInfo.binaryGraphPath()).isAbsolute()) {
solutionGraph.loadBinaryGraph(solutionInfo.binaryGraphPath());
} else {
solutionGraph.loadBinaryGraph(
AndroidAssetUtil.getAssetBytes(context.getAssets(), solutionInfo.binaryGraphPath()));
}
solutionGraph.addMultiStreamCallback(
solutionInfo.outputStreamNames(), outputHandler::run, /*observeTimestampBounds=*/ true);
packetCreator = new AndroidPacketCreator(solutionGraph);
} catch (MediaPipeException e) {
throwException("Error occurs when creating the MediaPipe solution graph. ", e);
}
}
/** Reports the exception to the error listener if one is set; otherwise logs it. */
protected void throwException(String message, MediaPipeException e) {
if (errorListener != null) {
errorListener.onError(message, e);
} else {
Log.e(TAG, message, e);
}
}
/**
* Convenience method that gets a proto list from a packet, or an empty list if the packet is empty.
*/
protected <T> List<T> getProtoVector(Packet packet, Parser<T> messageParser) {
return packet.isEmpty()
? ImmutableList.<T>of()
: PacketGetter.getProtoVector(packet, messageParser);
}
/** Gets current timestamp in microseconds. */
protected long getCurrentTimestampUs() {
return MICROSECONDS.convert(SystemClock.elapsedRealtime(), MILLISECONDS);
}
/** Starts the solution graph by taking an optional input side packets map. */
public synchronized void start(@Nullable Map<String, Packet> inputSidePackets) {
try {
if (inputSidePackets != null) {
solutionGraph.setInputSidePackets(inputSidePackets);
}
if (!solutionGraphStarted.getAndSet(true)) {
solutionGraph.startRunningGraph();
}
} catch (MediaPipeException e) {
throwException("Error occurs when starting the MediaPipe solution graph. ", e);
}
}
/** A blocking API that does not return until the solution finishes processing all pending tasks. */
public void waitUntilIdle() {
try {
solutionGraph.waitUntilGraphIdle();
} catch (MediaPipeException e) {
throwException("Error occurs when waiting until the MediaPipe graph becomes idle. ", e);
}
}
/** Closes and cleans up the solution graph. */
public void close() {
if (solutionGraphStarted.get()) {
try {
solutionGraph.closeAllPacketSources();
solutionGraph.waitUntilGraphDone();
} catch (MediaPipeException e) {
// Note: errors during Process are reported at the earliest opportunity,
// which may be addPacket or waitUntilDone, depending on timing. For consistency,
// we want to always report them using the same async handler if installed.
throwException("Error occurs when closing the Mediapipe solution graph. ", e);
}
try {
solutionGraph.tearDown();
} catch (MediaPipeException e) {
throwException("Error occurs when closing the Mediapipe solution graph. ", e);
}
}
}
}

View File

@ -0,0 +1,48 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package com.google.mediapipe.solutionbase;
import com.google.auto.value.AutoValue;
import com.google.common.collect.ImmutableList;
/** SolutionInfo contains all the information needed to initialize a MediaPipe solution graph. */
@AutoValue
public abstract class SolutionInfo {
public abstract String binaryGraphPath();
public abstract String imageInputStreamName();
public abstract ImmutableList<String> outputStreamNames();
public abstract boolean staticImageMode();
public static Builder builder() {
return new AutoValue_SolutionInfo.Builder();
}
/** Builder for {@link SolutionInfo}. */
@AutoValue.Builder
public abstract static class Builder {
public abstract Builder setBinaryGraphPath(String value);
public abstract Builder setImageInputStreamName(String value);
public abstract Builder setOutputStreamNames(ImmutableList<String> value);
public abstract Builder setStaticImageMode(boolean value);
public abstract SolutionInfo build();
}
}

View File

@ -0,0 +1,23 @@
// Copyright 2021 The MediaPipe Authors.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package com.google.mediapipe.solutionbase;
/**
* Interface of the MediaPipe solution result. Any MediaPipe solution-specific result class should
* implement SolutionResult.
*/
public interface SolutionResult {
long timestamp();
}

View File

@ -83,6 +83,8 @@ mediapipe_simple_subgraph(
exports_files(
srcs = [
"face_detection_back.tflite",
"face_detection_back_sparse.tflite",
"face_detection_front.tflite",
],
)

View File

@ -109,7 +109,7 @@ node {
output_stream: "ensured_landmark_tensors"
}
# Decodes the landmark tensors into a vector of lanmarks, where the landmark
# Decodes the landmark tensors into a vector of landmarks, where the landmark
# coordinates are normalized by the size of the input image to the model.
node {
calculator: "TensorsToLandmarksCalculator"

View File

@ -109,7 +109,7 @@ node {
output_stream: "ensured_landmark_tensors"
}
# Decodes the landmark tensors into a vector of lanmarks, where the landmark
# Decodes the landmark tensors into a vector of landmarks, where the landmark
# coordinates are normalized by the size of the input image to the model.
node {
calculator: "TensorsToLandmarksCalculator"


@ -14,7 +14,7 @@
#include "mediapipe/modules/objectron/calculators/box.h"
#include "Eigen/src/Core/util/Constants.h"
#include "Eigen/Core"
#include "mediapipe/framework/port/logging.h"
namespace mediapipe {


@ -78,7 +78,9 @@ mediapipe_simple_subgraph(
graph = "pose_landmark_filtering.pbtxt",
register_as = "PoseLandmarkFiltering",
deps = [
"//mediapipe/calculators/util:alignment_points_to_rects_calculator",
"//mediapipe/calculators/util:landmarks_smoothing_calculator",
"//mediapipe/calculators/util:landmarks_to_detection_calculator",
"//mediapipe/calculators/util:visibility_smoothing_calculator",
"//mediapipe/framework/tool:switch_container",
],


@ -29,6 +29,29 @@ output_stream: "FILTERED_NORM_LANDMARKS:filtered_landmarks"
# Filtered auxiliary set of normalized landmarks. (NormalizedRect)
output_stream: "FILTERED_AUX_NORM_LANDMARKS:filtered_aux_landmarks"
# Converts landmarks to a detection that tightly encloses all landmarks.
node {
calculator: "LandmarksToDetectionCalculator"
input_stream: "NORM_LANDMARKS:aux_landmarks"
output_stream: "DETECTION:aux_detection"
}
# Converts detection into a rectangle based on center and scale alignment
# points.
node {
calculator: "AlignmentPointsRectsCalculator"
input_stream: "DETECTION:aux_detection"
input_stream: "IMAGE_SIZE:image_size"
output_stream: "NORM_RECT:roi"
options: {
[mediapipe.DetectionsToRectsCalculatorOptions.ext] {
rotation_vector_start_keypoint_index: 0
rotation_vector_end_keypoint_index: 1
rotation_vector_target_angle_degrees: 90
}
}
}
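The node above turns two alignment keypoints into a rotated region of interest. As a rough illustration only, the sketch below assumes keypoint 0 is the ROI center, keypoint 1 lies on the ROI boundary (so the box side is twice their distance), and the rotation is chosen so that the vector from keypoint 0 to keypoint 1 ends up at the 90 degree target angle. The function names and the returned dictionary are hypothetical and are not taken from the calculator's source.

```python
import math


def normalize_radians(angle):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return angle - 2.0 * math.pi * math.floor((angle + math.pi) / (2.0 * math.pi))


def alignment_points_to_roi(center, scale_point, image_size, target_angle_degrees=90.0):
    """Build a normalized, rotated ROI from two normalized keypoints.

    center and scale_point are (x, y) pairs in [0, 1]; image_size is
    (width, height) in pixels. This is an assumed reading of the options
    shown in the graph, not the calculator's actual implementation.
    """
    width, height = image_size
    x0, y0 = center[0] * width, center[1] * height
    x1, y1 = scale_point[0] * width, scale_point[1] * height

    # Assumption: the box side is twice the center-to-scale-point distance.
    box_size = 2.0 * math.hypot(x1 - x0, y1 - y0)

    # Image y grows downward, so negate dy to get a conventional angle.
    target = math.radians(target_angle_degrees)
    rotation = normalize_radians(target - math.atan2(-(y1 - y0), x1 - x0))

    return {
        "x_center": x0 / width,
        "y_center": y0 / height,
        "width": box_size / width,
        "height": box_size / height,
        "rotation": rotation,
    }


roi = alignment_points_to_roi((0.5, 0.5), (0.5, 0.25), (640, 480))
print(roi)
```

With the scale point directly above the center, the keypoint vector already points at 90 degrees, so the sketch yields zero rotation.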
# Smooths pose landmark visibilities to reduce jitter.
node {
calculator: "SwitchContainer"
@ -66,6 +89,7 @@ node {
input_side_packet: "ENABLE:enable"
input_stream: "NORM_LANDMARKS:filtered_visibility"
input_stream: "IMAGE_SIZE:image_size"
input_stream: "OBJECT_SCALE_ROI:roi"
output_stream: "NORM_FILTERED_LANDMARKS:filtered_landmarks"
options: {
[mediapipe.SwitchContainerOptions.ext] {
@ -83,12 +107,12 @@ node {
options: {
[mediapipe.LandmarksSmoothingCalculatorOptions.ext] {
one_euro_filter {
# Min cutoff 0.1 results into ~ 0.02 alpha in landmark EMA filter
# Min cutoff 0.05 results in ~0.01 alpha in landmark EMA filter
# when landmark is static.
min_cutoff: 0.1
# Beta 40.0 in combintation with min_cutoff 0.1 results into ~0.8
# alpha in landmark EMA filter when landmark is moving fast.
beta: 40.0
min_cutoff: 0.05
# Beta 80.0 in combination with min_cutoff 0.05 results in
# ~0.94 alpha in landmark EMA filter when landmark is moving fast.
beta: 80.0
# Derivative cutoff 1.0 results into ~0.17 alpha in landmark
# velocity EMA filter.
derivate_cutoff: 1.0
@ -119,6 +143,7 @@ node {
calculator: "LandmarksSmoothingCalculator"
input_stream: "NORM_LANDMARKS:filtered_aux_visibility"
input_stream: "IMAGE_SIZE:image_size"
input_stream: "OBJECT_SCALE_ROI:roi"
output_stream: "NORM_FILTERED_LANDMARKS:filtered_aux_landmarks"
options: {
[mediapipe.LandmarksSmoothingCalculatorOptions.ext] {
@ -130,9 +155,9 @@ node {
# Min cutoff 0.01 results into ~0.002 alpha in landmark EMA
# filter when landmark is static.
min_cutoff: 0.01
# Beta 1.0 in combintation with min_cutoff 0.01 results into ~0.2
# Beta 10.0 in combination with min_cutoff 0.01 results in ~0.68
# alpha in landmark EMA filter when landmark is moving fast.
beta: 1.0
beta: 10.0
# Derivative cutoff 1.0 results into ~0.17 alpha in landmark
# velocity EMA filter.
derivate_cutoff: 1.0
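The alpha values quoted in these comments can be checked with the standard one-euro filter smoothing factor. The sketch below assumes a 30 Hz frame rate, which the graph does not state, and uses hypothetical helper names. Under that assumption it gives the static-landmark and velocity-filter alphas for the new cutoffs. The fast-motion alphas depend on a landmark speed the comments do not give, so only the cutoff formula is shown for them.

```python
import math


def ema_alpha(cutoff_hz, rate_hz=30.0):
    """Smoothing factor of a first-order low-pass (EMA) filter.

    rate_hz = 30.0 is an assumption; the graph config does not specify
    the frame rate used to derive the alphas in its comments.
    """
    te = 1.0 / rate_hz                        # sampling period
    tau = 1.0 / (2.0 * math.pi * cutoff_hz)   # filter time constant
    return 1.0 / (1.0 + tau / te)


def adaptive_cutoff(min_cutoff, beta, speed):
    """One-euro filter cutoff: grows linearly with the filtered speed."""
    return min_cutoff + beta * abs(speed)


print(f"min_cutoff 0.05     -> alpha {ema_alpha(0.05):.4f}")
print(f"min_cutoff 0.01     -> alpha {ema_alpha(0.01):.4f}")
print(f"derivate_cutoff 1.0 -> alpha {ema_alpha(1.0):.4f}")
```

A static landmark has near-zero speed, so its cutoff stays at `min_cutoff` and the filter smooths heavily. A fast-moving landmark raises the cutoff through `beta`, pushing alpha toward 1 and reducing lag.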


@ -0,0 +1,73 @@
# Copyright 2021 The MediaPipe Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
load(
"//mediapipe/framework/tool:mediapipe_graph.bzl",
"mediapipe_simple_subgraph",
)
licenses(["notice"])
package(default_visibility = ["//visibility:public"])
mediapipe_simple_subgraph(
name = "selfie_segmentation_model_loader",
graph = "selfie_segmentation_model_loader.pbtxt",
register_as = "SelfieSegmentationModelLoader",
deps = [
"//mediapipe/calculators/core:constant_side_packet_calculator",
"//mediapipe/calculators/tflite:tflite_model_calculator",
"//mediapipe/calculators/util:local_file_contents_calculator",
"//mediapipe/framework/tool:switch_container",
],
)
mediapipe_simple_subgraph(
name = "selfie_segmentation_cpu",
graph = "selfie_segmentation_cpu.pbtxt",
register_as = "SelfieSegmentationCpu",
deps = [
":selfie_segmentation_model_loader",
"//mediapipe/calculators/image:image_properties_calculator",
"//mediapipe/calculators/tensor:image_to_tensor_calculator",
"//mediapipe/calculators/tensor:inference_calculator",
"//mediapipe/calculators/tensor:tensors_to_segmentation_calculator",
"//mediapipe/calculators/tflite:tflite_custom_op_resolver_calculator",
"//mediapipe/calculators/util:from_image_calculator",
"//mediapipe/framework/tool:switch_container",
],
)
mediapipe_simple_subgraph(
name = "selfie_segmentation_gpu",
graph = "selfie_segmentation_gpu.pbtxt",
register_as = "SelfieSegmentationGpu",
deps = [
":selfie_segmentation_model_loader",
"//mediapipe/calculators/image:image_properties_calculator",
"//mediapipe/calculators/tensor:image_to_tensor_calculator",
"//mediapipe/calculators/tensor:inference_calculator",
"//mediapipe/calculators/tensor:tensors_to_segmentation_calculator",
"//mediapipe/calculators/tflite:tflite_custom_op_resolver_calculator",
"//mediapipe/calculators/util:from_image_calculator",
"//mediapipe/framework/tool:switch_container",
],
)
exports_files(
srcs = [
"selfie_segmentation.tflite",
"selfie_segmentation_landscape.tflite",
],
)


@ -0,0 +1,6 @@
# selfie_segmentation
Subgraphs|Details
:--- | :---
[`SelfieSegmentationCpu`](https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation/selfie_segmentation_cpu.pbtxt)| Segments the person from background in a selfie image. (CPU input, and inference is executed on CPU.)
[`SelfieSegmentationGpu`](https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation/selfie_segmentation_gpu.pbtxt)| Segments the person from background in a selfie image. (GPU input, and inference is executed on GPU.)


@ -0,0 +1,131 @@
# MediaPipe graph to perform selfie segmentation. (CPU input, and all processing
# and inference are also performed on CPU)
#
# It is required that "selfie_segmentation.tflite" or
# "selfie_segmentation_landscape.tflite" is available at
# "mediapipe/modules/selfie_segmentation/selfie_segmentation.tflite"
# or
# "mediapipe/modules/selfie_segmentation/selfie_segmentation_landscape.tflite"
# path respectively during execution, depending on the specification in the
# MODEL_SELECTION input side packet.
#
# EXAMPLE:
# node {
# calculator: "SelfieSegmentationCpu"
# input_side_packet: "MODEL_SELECTION:model_selection"
# input_stream: "IMAGE:image"
# output_stream: "SEGMENTATION_MASK:segmentation_mask"
# }
type: "SelfieSegmentationCpu"
# CPU image. (ImageFrame)
input_stream: "IMAGE:image"
# An integer 0 or 1. Use 0 to select a general-purpose model (operating on a
# 256x256 tensor), and 1 to select a model (operating on a 256x144 tensor) more
# optimized for landscape images. If unspecified, functions as set to 0. (int)
input_side_packet: "MODEL_SELECTION:model_selection"
# Segmentation mask. (ImageFrame in ImageFormat::VEC32F1)
output_stream: "SEGMENTATION_MASK:segmentation_mask"
# Resizes the input image into a tensor with a dimension desired by the model.
node {
calculator: "SwitchContainer"
input_side_packet: "SELECT:model_selection"
input_stream: "IMAGE:image"
output_stream: "TENSORS:input_tensors"
options: {
[mediapipe.SwitchContainerOptions.ext] {
select: 0
contained_node: {
calculator: "ImageToTensorCalculator"
options: {
[mediapipe.ImageToTensorCalculatorOptions.ext] {
output_tensor_width: 256
output_tensor_height: 256
keep_aspect_ratio: false
output_tensor_float_range {
min: 0.0
max: 1.0
}
border_mode: BORDER_ZERO
}
}
}
contained_node: {
calculator: "ImageToTensorCalculator"
options: {
[mediapipe.ImageToTensorCalculatorOptions.ext] {
output_tensor_width: 256
output_tensor_height: 144
keep_aspect_ratio: false
output_tensor_float_range {
min: 0.0
max: 1.0
}
border_mode: BORDER_ZERO
}
}
}
}
}
}
# Generates a single side packet containing a TensorFlow Lite op resolver that
# supports custom ops needed by the model used in this graph.
node {
calculator: "TfLiteCustomOpResolverCalculator"
output_side_packet: "op_resolver"
}
# Loads the selfie segmentation TF Lite model.
node {
calculator: "SelfieSegmentationModelLoader"
input_side_packet: "MODEL_SELECTION:model_selection"
output_side_packet: "MODEL:model"
}
# Runs model inference on CPU.
node {
calculator: "InferenceCalculator"
input_stream: "TENSORS:input_tensors"
output_stream: "TENSORS:output_tensors"
input_side_packet: "MODEL:model"
input_side_packet: "CUSTOM_OP_RESOLVER:op_resolver"
options: {
[mediapipe.InferenceCalculatorOptions.ext] {
delegate { xnnpack {} }
}
#
}
}
# Retrieves the size of the input image.
node {
calculator: "ImagePropertiesCalculator"
input_stream: "IMAGE_CPU:image"
output_stream: "SIZE:input_size"
}
# Processes the output tensors into a segmentation mask that has the same size
# as the input image into the graph.
node {
calculator: "TensorsToSegmentationCalculator"
input_stream: "TENSORS:output_tensors"
input_stream: "OUTPUT_SIZE:input_size"
output_stream: "MASK:mask_image"
options: {
[mediapipe.TensorsToSegmentationCalculatorOptions.ext] {
activation: NONE
}
}
}
# Converts the incoming Image into the corresponding ImageFrame type.
node: {
calculator: "FromImageCalculator"
input_stream: "IMAGE:mask_image"
output_stream: "IMAGE_CPU:segmentation_mask"
}
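Both `ImageToTensorCalculator` variants in this graph request an `output_tensor_float_range` of 0.0 to 1.0. A minimal sketch of that value mapping follows, assuming 8-bit input pixels in the range 0 to 255 (an assumption about the input, not something the graph states) and using hypothetical helper names.

```python
def value_range_transform(from_min, from_max, to_min, to_max):
    """Return (scale, offset) mapping [from_min, from_max] onto [to_min, to_max]."""
    scale = (to_max - to_min) / (from_max - from_min)
    offset = to_min - from_min * scale
    return scale, offset


# Assumed 8-bit source pixels mapped into the [0.0, 1.0] range from the graph.
SCALE, OFFSET = value_range_transform(0.0, 255.0, 0.0, 1.0)


def to_tensor_value(pixel):
    """Convert one pixel channel value to its float tensor value."""
    return pixel * SCALE + OFFSET


print([round(to_tensor_value(p), 3) for p in (0, 64, 128, 255)])
```

Because `keep_aspect_ratio` is false, the image is stretched to the tensor size rather than letterboxed, so no padding values enter the tensor.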


@ -0,0 +1,133 @@
# MediaPipe graph to perform selfie segmentation. (GPU input, and all processing
# and inference are also performed on GPU)
#
# It is required that "selfie_segmentation.tflite" or
# "selfie_segmentation_landscape.tflite" is available at
# "mediapipe/modules/selfie_segmentation/selfie_segmentation.tflite"
# or
# "mediapipe/modules/selfie_segmentation/selfie_segmentation_landscape.tflite"
# path respectively during execution, depending on the specification in the
# MODEL_SELECTION input side packet.
#
# EXAMPLE:
# node {
# calculator: "SelfieSegmentationGpu"
# input_side_packet: "MODEL_SELECTION:model_selection"
# input_stream: "IMAGE:image"
# output_stream: "SEGMENTATION_MASK:segmentation_mask"
# }
type: "SelfieSegmentationGpu"
# GPU image. (GpuBuffer)
input_stream: "IMAGE:image"
# An integer 0 or 1. Use 0 to select a general-purpose model (operating on a
# 256x256 tensor), and 1 to select a model (operating on a 256x144 tensor) more
# optimized for landscape images. If unspecified, functions as set to 0. (int)
input_side_packet: "MODEL_SELECTION:model_selection"
# Segmentation mask. (GpuBuffer in RGBA, with the same mask values in R and A)
output_stream: "SEGMENTATION_MASK:segmentation_mask"
# Resizes the input image into a tensor with a dimension desired by the model.
node {
calculator: "SwitchContainer"
input_side_packet: "SELECT:model_selection"
input_stream: "IMAGE_GPU:image"
output_stream: "TENSORS:input_tensors"
options: {
[mediapipe.SwitchContainerOptions.ext] {
select: 0
contained_node: {
calculator: "ImageToTensorCalculator"
options: {
[mediapipe.ImageToTensorCalculatorOptions.ext] {
output_tensor_width: 256
output_tensor_height: 256
keep_aspect_ratio: false
output_tensor_float_range {
min: 0.0
max: 1.0
}
border_mode: BORDER_ZERO
gpu_origin: TOP_LEFT
}
}
}
contained_node: {
calculator: "ImageToTensorCalculator"
options: {
[mediapipe.ImageToTensorCalculatorOptions.ext] {
output_tensor_width: 256
output_tensor_height: 144
keep_aspect_ratio: false
output_tensor_float_range {
min: 0.0
max: 1.0
}
border_mode: BORDER_ZERO
gpu_origin: TOP_LEFT
}
}
}
}
}
}
# Generates a single side packet containing a TensorFlow Lite op resolver that
# supports custom ops needed by the model used in this graph.
node {
calculator: "TfLiteCustomOpResolverCalculator"
output_side_packet: "op_resolver"
options: {
[mediapipe.TfLiteCustomOpResolverCalculatorOptions.ext] {
use_gpu: true
}
}
}
# Loads the selfie segmentation TF Lite model.
node {
calculator: "SelfieSegmentationModelLoader"
input_side_packet: "MODEL_SELECTION:model_selection"
output_side_packet: "MODEL:model"
}
# Runs model inference on GPU.
node {
calculator: "InferenceCalculator"
input_stream: "TENSORS:input_tensors"
output_stream: "TENSORS:output_tensors"
input_side_packet: "MODEL:model"
input_side_packet: "CUSTOM_OP_RESOLVER:op_resolver"
}
# Retrieves the size of the input image.
node {
calculator: "ImagePropertiesCalculator"
input_stream: "IMAGE_GPU:image"
output_stream: "SIZE:input_size"
}
# Processes the output tensors into a segmentation mask that has the same size
# as the input image into the graph.
node {
calculator: "TensorsToSegmentationCalculator"
input_stream: "TENSORS:output_tensors"
input_stream: "OUTPUT_SIZE:input_size"
output_stream: "MASK:mask_image"
options: {
[mediapipe.TensorsToSegmentationCalculatorOptions.ext] {
activation: NONE
gpu_origin: TOP_LEFT
}
}
}
# Converts the incoming Image into the corresponding GpuBuffer type.
node: {
calculator: "FromImageCalculator"
input_stream: "IMAGE:mask_image"
output_stream: "IMAGE_GPU:segmentation_mask"
}


@ -0,0 +1,63 @@
# MediaPipe graph to load a selected selfie segmentation TF Lite model.
type: "SelfieSegmentationModelLoader"
# An integer 0 or 1. Use 0 to select a general-purpose model (operating on a
# 256x256 tensor), and 1 to select a model (operating on a 256x144 tensor) more
# optimized for landscape images. If unspecified, functions as set to 0. (int)
input_side_packet: "MODEL_SELECTION:model_selection"
# TF Lite model represented as a FlatBuffer.
# (std::unique_ptr<tflite::FlatBufferModel, std::function<void(tflite::FlatBufferModel*)>>)
output_side_packet: "MODEL:model"
# Determines path to the desired selfie segmentation model file.
node {
calculator: "SwitchContainer"
input_side_packet: "SELECT:model_selection"
output_side_packet: "PACKET:model_path"
options: {
[mediapipe.SwitchContainerOptions.ext] {
select: 0
contained_node: {
calculator: "ConstantSidePacketCalculator"
options: {
[mediapipe.ConstantSidePacketCalculatorOptions.ext]: {
packet {
string_value: "mediapipe/modules/selfie_segmentation/selfie_segmentation.tflite"
}
}
}
}
contained_node: {
calculator: "ConstantSidePacketCalculator"
options: {
[mediapipe.ConstantSidePacketCalculatorOptions.ext]: {
packet {
string_value: "mediapipe/modules/selfie_segmentation/selfie_segmentation_landscape.tflite"
}
}
}
}
}
}
}
# Loads the file in the specified path into a blob.
node {
calculator: "LocalFileContentsCalculator"
input_side_packet: "FILE_PATH:model_path"
output_side_packet: "CONTENTS:model_blob"
options: {
[mediapipe.LocalFileContentsCalculatorOptions.ext]: {
text_mode: false
}
}
}
# Converts the input blob into a TF Lite model.
node {
calculator: "TfLiteModelCalculator"
input_side_packet: "MODEL_BLOB:model_blob"
output_side_packet: "MODEL:model"
}
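The selection logic shared by the loader and the two segmentation graphs can be summarized as a lookup keyed by `MODEL_SELECTION`. The sketch below is illustrative only: the paths and tensor sizes come from the graphs above, while the `ModelChoice` type, the `select_model` helper, and the error raised for other values are hypothetical.

```python
from typing import NamedTuple, Optional


class ModelChoice(NamedTuple):
    model_path: str
    tensor_width: int
    tensor_height: int


# Index 0: general-purpose model; index 1: landscape model.
_CHOICES = (
    ModelChoice(
        "mediapipe/modules/selfie_segmentation/selfie_segmentation.tflite",
        256,
        256,
    ),
    ModelChoice(
        "mediapipe/modules/selfie_segmentation/selfie_segmentation_landscape.tflite",
        256,
        144,
    ),
)


def select_model(model_selection: Optional[int] = None) -> ModelChoice:
    """Mimic the `select: 0` default: an unspecified selection behaves as 0."""
    index = 0 if model_selection is None else model_selection
    if index not in (0, 1):
        # Assumption: this sketch rejects other values outright.
        raise ValueError(f"model_selection must be 0 or 1, got {index}")
    return _CHOICES[index]


print(select_model())
print(select_model(1))
```

Keeping the path and the tensor size in one record mirrors the requirement that the same side packet drives both the `ImageToTensorCalculator` switch and the model loader.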


@ -0,0 +1,26 @@
<em>Please make sure that this is a bug and also refer to the [troubleshooting](https://google.github.io/mediapipe/getting_started/troubleshooting.html), FAQ documentation before raising any issues.</em>
**System information** (Please provide as much relevant information as possible)
- Have I written custom code (as opposed to using a stock example script provided in MediaPipe):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04, Android 11, iOS 14.4):
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
- Browser and version (e.g. Google Chrome, Safari) if the issue happens on browser:
- Programming Language and version ( e.g. C++, Python, Java):
- [MediaPipe version](https://github.com/google/mediapipe/releases):
- Bazel version (if compiling from source):
- Solution ( e.g. FaceMesh, Pose, Holistic ):
- Android Studio, NDK, SDK versions (if issue is related to building in Android environment):
- Xcode & Tulsi version (if issue is related to building for iOS):
**Describe the current behavior:**
**Describe the expected behavior:**
**Standalone code to reproduce the issue:**
Provide a reproducible test case that is the bare minimum necessary to replicate the problem. If possible, please share a link to a Colab, repo, or notebook:
**Other info / Complete Logs :**
Include any logs or source code that would be helpful to
diagnose the problem. If including tracebacks, please include the full
traceback. Large logs and files should be attached.


@ -0,0 +1,18 @@
<em>Please make sure that this is a feature request.</em>
**System information** (Please provide as much relevant information as possible)
- MediaPipe Solution (you are using):
- Programming language: C++/TypeScript/Python/Objective-C/Android Java
- Are you willing to contribute it (Yes/No):
**Describe the feature and the current behavior/state:**
**Will this change the current API? How?**
**Who will benefit from this feature?**
**Please specify the use cases for this feature:**
**Any Other info:**

Some files were not shown because too many files have changed in this diff.