diff --git a/.github/ISSUE_TEMPLATE/00-build-installation-issue.md b/.github/ISSUE_TEMPLATE/00-build-installation-issue.md new file mode 100644 index 000000000..f027c5c85 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/00-build-installation-issue.md @@ -0,0 +1,21 @@ +Please make sure that this is a build/installation issue and also refer to the [troubleshooting](https://google.github.io/mediapipe/getting_started/troubleshooting.html) documentation before raising any issues. + +**System information** (Please provide as much relevant information as possible) +- OS Platform and Distribution (e.g. Linux Ubuntu 16.04, Android 11, iOS 14.4): +- Compiler version (e.g. gcc/g++ 8, Apple clang version 12.0.0): +- Programming Language and version (e.g. C++14, Python 3.6, Java): +- Installed using virtualenv? pip? Conda? (if Python): +- [MediaPipe version](https://github.com/google/mediapipe/releases): +- Bazel version: +- Xcode and Tulsi versions (if iOS): +- Android SDK and NDK versions (if android): +- Android [AAR](https://google.github.io/mediapipe/getting_started/android_archive_library.html) (if android): +- OpenCV version (if running on desktop): + +**Describe the problem**: + + +**[Provide the exact sequence of commands / steps that you executed before running into the problem](https://google.github.io/mediapipe/getting_started/getting_started.html):** + +**Complete Logs:** +Include complete log information or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached: diff --git a/.github/ISSUE_TEMPLATE/10-solution-issue.md b/.github/ISSUE_TEMPLATE/10-solution-issue.md new file mode 100644 index 000000000..49f569c89 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/10-solution-issue.md @@ -0,0 +1,20 @@ +Please make sure that this is a [solution](https://google.github.io/mediapipe/solutions/solutions.html) issue.
+ +**System information** (Please provide as much relevant information as possible) +- Have I written custom code (as opposed to using a stock example script provided in MediaPipe): +- OS Platform and Distribution (e.g., Linux Ubuntu 16.04, Android 11, iOS 14.4): +- [MediaPipe version](https://github.com/google/mediapipe/releases): +- Bazel version: +- Solution (e.g. FaceMesh, Pose, Holistic): +- Programming Language and version (e.g. C++, Python, Java): + +**Describe the expected behavior:** + +**Standalone code you may have used to try to get what you need:** + +If there is a problem, provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to a Colab/repo/notebook: + +**Other info / Complete Logs:** +Include any logs or source code that would be helpful to +diagnose the problem. If including tracebacks, please include the full +traceback. Large logs and files should be attached: diff --git a/.github/ISSUE_TEMPLATE/20-documentation-issue.md b/.github/ISSUE_TEMPLATE/20-documentation-issue.md new file mode 100644 index 000000000..2d1b460f9 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/20-documentation-issue.md @@ -0,0 +1,45 @@ +Thank you for submitting a MediaPipe documentation issue. +The MediaPipe docs are open source! To get involved, read the documentation Contributor Guide. +## URL(s) with the issue: + +Please provide a link to the documentation entry, for example: https://github.com/google/mediapipe/blob/master/docs/solutions/face_mesh.md#models + +## Description of issue (what needs changing): + +Kinds of documentation problems: + +### Clear description + +For example, why should someone use this method? How is it useful? + +### Correct links + +Is the link to the source code correct? + +### Parameters defined +Are all parameters defined and formatted correctly? + +### Returns defined + +Are return values defined? + +### Raises listed and defined + +Are the errors defined?
 + +### Usage example + +Is there a usage example? + +See the API guide +on how to write testable usage examples. + +### Request visuals, if applicable + +Are there currently visuals? If not, would adding them clarify the content? + +### Submit a pull request? + +Are you planning to also submit a pull request to fix the issue? See the docs +https://github.com/google/mediapipe/blob/master/CONTRIBUTING.md + diff --git a/.github/bot_config.yml b/.github/bot_config.yml new file mode 100644 index 000000000..b1b2d98ea --- /dev/null +++ b/.github/bot_config.yml @@ -0,0 +1,18 @@ +# Copyright 2021 The MediaPipe Authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ + +# A list of assignees +assignees: + - sgowroji diff --git a/.github/stale.yml b/.github/stale.yml new file mode 100644 index 000000000..03c67d0f6 --- /dev/null +++ b/.github/stale.yml @@ -0,0 +1,34 @@ +# Copyright 2021 The MediaPipe Authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and +# limitations under the License. +# ============================================================================ +# +# This file was assembled from multiple pieces, whose use is documented +# throughout. Please refer to the TensorFlow dockerfiles documentation +# for more information. + +# Number of days of inactivity before an Issue or Pull Request becomes stale +daysUntilStale: 7 +# Number of days of inactivity before a stale Issue or Pull Request is closed +daysUntilClose: 7 +# Only issues or pull requests with all of these labels are checked if stale. Defaults to `[]` (disabled) +onlyLabels: + - stat:awaiting response +# Comment to post when marking as stale. Set to `false` to disable +markComment: > + This issue has been automatically marked as stale because it has not had + recent activity. It will be closed if no further activity occurs. Thank you. +# Comment to post when removing the stale label. Set to `false` to disable +unmarkComment: false +closeComment: > + Closing as stale. Please reopen if you'd like to work on this further. 
diff --git a/README.md b/README.md index 8c75978a4..ed2fa0772 100644 --- a/README.md +++ b/README.md @@ -40,11 +40,12 @@ Hair Segmentation [Hands](https://google.github.io/mediapipe/solutions/hands) | ✅ | ✅ | ✅ | ✅ | ✅ | [Pose](https://google.github.io/mediapipe/solutions/pose) | ✅ | ✅ | ✅ | ✅ | ✅ | [Holistic](https://google.github.io/mediapipe/solutions/holistic) | ✅ | ✅ | ✅ | ✅ | ✅ | +[Selfie Segmentation](https://google.github.io/mediapipe/solutions/selfie_segmentation) | ✅ | ✅ | ✅ | ✅ | ✅ | [Hair Segmentation](https://google.github.io/mediapipe/solutions/hair_segmentation) | ✅ | | ✅ | | | [Object Detection](https://google.github.io/mediapipe/solutions/object_detection) | ✅ | ✅ | ✅ | | | ✅ [Box Tracking](https://google.github.io/mediapipe/solutions/box_tracking) | ✅ | ✅ | ✅ | | | [Instant Motion Tracking](https://google.github.io/mediapipe/solutions/instant_motion_tracking) | ✅ | | | | | -[Objectron](https://google.github.io/mediapipe/solutions/objectron) | ✅ | | ✅ | ✅ | | +[Objectron](https://google.github.io/mediapipe/solutions/objectron) | ✅ | | ✅ | ✅ | | [KNIFT](https://google.github.io/mediapipe/solutions/knift) | ✅ | | | | | [AutoFlip](https://google.github.io/mediapipe/solutions/autoflip) | | | ✅ | | | [MediaSequence](https://google.github.io/mediapipe/solutions/media_sequence) | | | ✅ | | | diff --git a/WORKSPACE b/WORKSPACE index e797410a7..4b33425ea 100644 --- a/WORKSPACE +++ b/WORKSPACE @@ -71,8 +71,8 @@ http_archive( # Google Benchmark library. http_archive( name = "com_google_benchmark", - urls = ["https://github.com/google/benchmark/archive/master.zip"], - strip_prefix = "benchmark-master", + urls = ["https://github.com/google/benchmark/archive/main.zip"], + strip_prefix = "benchmark-main", build_file = "@//third_party:benchmark.BUILD", ) @@ -369,9 +369,9 @@ http_archive( ) # Tensorflow repo should always go after the other external dependencies. 
-# 2021-04-30 -_TENSORFLOW_GIT_COMMIT = "5bd3c57ef184543d22e34e36cff9d9bea608e06d" -_TENSORFLOW_SHA256= "9a45862834221aafacf6fb275f92b3876bc89443cbecc51be93f13839a6609f0" +# 2021-05-27 +_TENSORFLOW_GIT_COMMIT = "d6bfcdb0926173dbb7aa02ceba5aae6250b8aaa6" +_TENSORFLOW_SHA256 = "ec40e1462239d8783d02f76a43412c8f80bac71ea20e41e1b7729b990aad6923" http_archive( name = "org_tensorflow", urls = [ diff --git a/build_desktop_examples.sh b/build_desktop_examples.sh index a35556cf0..7ff8db29c 100644 --- a/build_desktop_examples.sh +++ b/build_desktop_examples.sh @@ -97,6 +97,7 @@ for app in ${apps}; do if [[ ${target_name} == "holistic_tracking" || ${target_name} == "iris_tracking" || ${target_name} == "pose_tracking" || + ${target_name} == "selfie_segmentation" || ${target_name} == "upper_body_pose_tracking" ]]; then graph_suffix="cpu" else diff --git a/docs/framework_concepts/calculators.md b/docs/framework_concepts/calculators.md index 98bf1def4..634fbab6a 100644 --- a/docs/framework_concepts/calculators.md +++ b/docs/framework_concepts/calculators.md @@ -248,12 +248,58 @@ absl::Status MyCalculator::Process() { } ``` +## Calculator options + +Calculators accept processing parameters through (1) input stream packets, (2) +input side packets, and (3) calculator options. Calculator options, if +specified, appear as literal values in the `node_options` field of the +`CalculatorGraphConfiguration.Node` message. + +``` + node { + calculator: "TfLiteInferenceCalculator" + input_stream: "TENSORS:main_model_input" + output_stream: "TENSORS:main_model_output" + node_options: { + [type.googleapis.com/mediapipe.TfLiteInferenceCalculatorOptions] { + model_path: "mediapipe/models/active_speaker_detection/audio_visual_model.tflite" + } + } + } +``` + +The `node_options` field accepts the proto3 syntax. Alternatively, calculator +options can be specified in the `options` field using proto2 syntax.
+ +``` + node: { + calculator: "IntervalFilterCalculator" + node_options: { + [type.googleapis.com/mediapipe.IntervalFilterCalculatorOptions] { + intervals { + start_us: 20000 + end_us: 40000 + } + } + } + } +``` + +Not all calculators accept calcuator options. In order to accept options, a +calculator will normally define a new protobuf message type to represent its +options, such as `IntervalFilterCalculatorOptions`. The calculator will then +read that protobuf message in its `CalculatorBase::Open` method, and possibly +also in the `CalculatorBase::GetContract` function or its +`CalculatorBase::Process` method. Normally, the new protobuf message type will +be defined as a protobuf schema using a ".proto" file and a +`mediapipe_proto_library()` build rule. + ## Example calculator This section discusses the implementation of `PacketClonerCalculator`, which does a relatively simple job, and is used in many calculator graphs. -`PacketClonerCalculator` simply produces a copy of its most recent input -packets on demand. +`PacketClonerCalculator` simply produces a copy of its most recent input packets +on demand. `PacketClonerCalculator` is useful when the timestamps of arriving data packets are not aligned perfectly. Suppose we have a room with a microphone, light @@ -279,8 +325,8 @@ input streams: imageframe of video data representing video collected from camera in the room with timestamp. -Below is the implementation of the `PacketClonerCalculator`. You can see -the `GetContract()`, `Open()`, and `Process()` methods as well as the instance +Below is the implementation of the `PacketClonerCalculator`. You can see the +`GetContract()`, `Open()`, and `Process()` methods as well as the instance variable `current_` which holds the most recent input packets. ```c++ @@ -401,6 +447,6 @@ node { The diagram below shows how the `PacketClonerCalculator` defines its output packets (bottom) based on its series of input packets (top). 
-| ![Graph using PacketClonerCalculator](../images/packet_cloner_calculator.png) | -| :---------------------------------------------------------------------------: | -| *Each time it receives a packet on its TICK input stream, the PacketClonerCalculator outputs the most recent packet from each of its input streams. The sequence of output packets (bottom) is determined by the sequence of input packets (top) and their timestamps. The timestamps are shown along the right side of the diagram.* | +![Graph using PacketClonerCalculator](../images/packet_cloner_calculator.png) | +:--------------------------------------------------------------------------: | +*Each time it receives a packet on its TICK input stream, the PacketClonerCalculator outputs the most recent packet from each of its input streams. The sequence of output packets (bottom) is determined by the sequence of input packets (top) and their timestamps. The timestamps are shown along the right side of the diagram.* | diff --git a/docs/framework_concepts/framework_concepts.md b/docs/framework_concepts/framework_concepts.md index dcf446a9d..dd43d830c 100644 --- a/docs/framework_concepts/framework_concepts.md +++ b/docs/framework_concepts/framework_concepts.md @@ -111,11 +111,11 @@ component known as an InputStreamHandler. See [Synchronization](synchronization.md) for more details. -### Realtime data streams +### Real-time streams MediaPipe calculator graphs are often used to process streams of video or audio frames for interactive applications. Normally, each Calculator runs as soon as all of its input packets for a given timestamp become available. Calculators -used in realtime graphs need to define output timestamp bounds based on input +used in real-time graphs need to define output timestamp bounds based on input timestamp bounds in order to allow downstream calculators to be scheduled -promptly. See [Realtime data streams](realtime.md) for details. +promptly. 
See [Real-time Streams](realtime_streams.md) for details. diff --git a/docs/framework_concepts/realtime.md b/docs/framework_concepts/realtime_streams.md similarity index 91% rename from docs/framework_concepts/realtime.md rename to docs/framework_concepts/realtime_streams.md index 36b606825..038081453 100644 --- a/docs/framework_concepts/realtime.md +++ b/docs/framework_concepts/realtime_streams.md @@ -1,29 +1,28 @@ --- layout: default -title: Processing real-time data streams +title: Real-time Streams +parent: Framework Concepts nav_order: 6 -has_children: true -has_toc: false --- -# Processing real-time data streams +# Real-time Streams {: .no_toc } 1. TOC {:toc} --- -## Realtime timestamps +## Real-time timestamps MediaPipe calculator graphs are often used to process streams of video or audio frames for interactive applications. The MediaPipe framework requires only that successive packets be assigned monotonically increasing timestamps. By -convention, realtime calculators and graphs use the recording time or the +convention, real-time calculators and graphs use the recording time or the presentation time of each frame as its timestamp, with each timestamp indicating the microseconds since `Jan/1/1970:00:00:00`. This allows packets from various sources to be processed in a globally consistent sequence. -## Realtime scheduling +## Real-time scheduling Normally, each Calculator runs as soon as all of its input packets for a given timestamp become available. Normally, this happens when the calculator has @@ -38,7 +37,7 @@ When a calculator does not produce any output packets for a given timestamp, it can instead output a "timestamp bound" indicating that no packet will be produced for that timestamp. This indication is necessary to allow downstream calculators to run at that timestamp, even though no packet has arrived for -certain streams for that timestamp. This is especially important for realtime +certain streams for that timestamp. 
This is especially important for real-time graphs in interactive applications, where it is crucial that each calculator begin processing as soon as possible. @@ -83,12 +82,12 @@ For example, `Timestamp(1).NextAllowedInStream() == Timestamp(2)`. ## Propagating timestamp bounds -Calculators that will be used in realtime graphs need to define output timestamp -bounds based on input timestamp bounds in order to allow downstream calculators -to be scheduled promptly. A common pattern is for calculators to output packets -with the same timestamps as their input packets. In this case, simply outputting -a packet on every call to `Calculator::Process` is sufficient to define output -timestamp bounds. +Calculators that will be used in real-time graphs need to define output +timestamp bounds based on input timestamp bounds in order to allow downstream +calculators to be scheduled promptly. A common pattern is for calculators to +output packets with the same timestamps as their input packets. In this case, +simply outputting a packet on every call to `Calculator::Process` is sufficient +to define output timestamp bounds. 
However, calculators are not required to follow this common pattern for output timestamps, they are only required to choose monotonically increasing output diff --git a/docs/getting_started/javascript.md b/docs/getting_started/javascript.md index 0c49e1dd4..98a4f19bc 100644 --- a/docs/getting_started/javascript.md +++ b/docs/getting_started/javascript.md @@ -16,13 +16,14 @@ nav_order: 4 MediaPipe currently offers the following solutions: -Solution | NPM Package | Example ------------------ | ----------------------------- | ------- -[Face Mesh][F-pg] | [@mediapipe/face_mesh][F-npm] | [mediapipe.dev/demo/face_mesh][F-demo] -[Face Detection][Fd-pg] | [@mediapipe/face_detection][Fd-npm] | [mediapipe.dev/demo/face_detection][Fd-demo] -[Hands][H-pg] | [@mediapipe/hands][H-npm] | [mediapipe.dev/demo/hands][H-demo] -[Holistic][Ho-pg] | [@mediapipe/holistic][Ho-npm] | [mediapipe.dev/demo/holistic][Ho-demo] -[Pose][P-pg] | [@mediapipe/pose][P-npm] | [mediapipe.dev/demo/pose][P-demo] +Solution | NPM Package | Example +--------------------------- | --------------------------------------- | ------- +[Face Mesh][F-pg] | [@mediapipe/face_mesh][F-npm] | [mediapipe.dev/demo/face_mesh][F-demo] +[Face Detection][Fd-pg] | [@mediapipe/face_detection][Fd-npm] | [mediapipe.dev/demo/face_detection][Fd-demo] +[Hands][H-pg] | [@mediapipe/hands][H-npm] | [mediapipe.dev/demo/hands][H-demo] +[Holistic][Ho-pg] | [@mediapipe/holistic][Ho-npm] | [mediapipe.dev/demo/holistic][Ho-demo] +[Pose][P-pg] | [@mediapipe/pose][P-npm] | [mediapipe.dev/demo/pose][P-demo] +[Selfie Segmentation][S-pg] | [@mediapipe/selfie_segmentation][S-npm] | [mediapipe.dev/demo/selfie_segmentation][S-demo] Click on a solution link above for more information, including API and code snippets. @@ -67,11 +68,13 @@ affecting your work, restrict your request to a `` number. 
e.g., [Fd-pg]: ../solutions/face_detection#javascript-solution-api [H-pg]: ../solutions/hands#javascript-solution-api [P-pg]: ../solutions/pose#javascript-solution-api +[S-pg]: ../solutions/selfie_segmentation#javascript-solution-api [Ho-npm]: https://www.npmjs.com/package/@mediapipe/holistic [F-npm]: https://www.npmjs.com/package/@mediapipe/face_mesh [Fd-npm]: https://www.npmjs.com/package/@mediapipe/face_detection [H-npm]: https://www.npmjs.com/package/@mediapipe/hands [P-npm]: https://www.npmjs.com/package/@mediapipe/pose +[S-npm]: https://www.npmjs.com/package/@mediapipe/selfie_segmentation [draw-npm]: https://www.npmjs.com/package/@mediapipe/drawing_utils [cam-npm]: https://www.npmjs.com/package/@mediapipe/camera_utils [ctrl-npm]: https://www.npmjs.com/package/@mediapipe/control_utils @@ -80,15 +83,18 @@ affecting your work, restrict your request to a `` number. e.g., [Fd-jsd]: https://www.jsdelivr.com/package/npm/@mediapipe/face_detection [H-jsd]: https://www.jsdelivr.com/package/npm/@mediapipe/hands [P-jsd]: https://www.jsdelivr.com/package/npm/@mediapipe/pose +[S-jsd]: https://www.jsdelivr.com/package/npm/@mediapipe/selfie_segmentation [Ho-pen]: https://code.mediapipe.dev/codepen/holistic [F-pen]: https://code.mediapipe.dev/codepen/face_mesh [Fd-pen]: https://code.mediapipe.dev/codepen/face_detection [H-pen]: https://code.mediapipe.dev/codepen/hands [P-pen]: https://code.mediapipe.dev/codepen/pose +[S-pen]: https://code.mediapipe.dev/codepen/selfie_segmentation [Ho-demo]: https://mediapipe.dev/demo/holistic [F-demo]: https://mediapipe.dev/demo/face_mesh [Fd-demo]: https://mediapipe.dev/demo/face_detection [H-demo]: https://mediapipe.dev/demo/hands [P-demo]: https://mediapipe.dev/demo/pose +[S-demo]: https://mediapipe.dev/demo/selfie_segmentation [npm]: https://www.npmjs.com/package/@mediapipe [codepen]: https://code.mediapipe.dev/codepen diff --git a/docs/getting_started/python.md b/docs/getting_started/python.md index d59f35bbf..83550be84 100644 ---
a/docs/getting_started/python.md +++ b/docs/getting_started/python.md @@ -51,6 +51,7 @@ details in each solution via the links below: * [MediaPipe Holistic](../solutions/holistic#python-solution-api) * [MediaPipe Objectron](../solutions/objectron#python-solution-api) * [MediaPipe Pose](../solutions/pose#python-solution-api) +* [MediaPipe Selfie Segmentation](../solutions/selfie_segmentation#python-solution-api) ## MediaPipe on Google Colab @@ -62,6 +63,7 @@ details in each solution via the links below: * [MediaPipe Pose Colab](https://mediapipe.page.link/pose_py_colab) * [MediaPipe Pose Classification Colab (Basic)](https://mediapipe.page.link/pose_classification_basic) * [MediaPipe Pose Classification Colab (Extended)](https://mediapipe.page.link/pose_classification_extended) +* [MediaPipe Selfie Segmentation Colab](https://mediapipe.page.link/selfie_segmentation_py_colab) ## MediaPipe Python Framework diff --git a/docs/images/selfie_segmentation_web.mp4 b/docs/images/selfie_segmentation_web.mp4 new file mode 100644 index 000000000..d9e62838e Binary files /dev/null and b/docs/images/selfie_segmentation_web.mp4 differ diff --git a/docs/index.md b/docs/index.md index 9035bf106..cc624862b 100644 --- a/docs/index.md +++ b/docs/index.md @@ -40,11 +40,12 @@ Hair Segmentation [Hands](https://google.github.io/mediapipe/solutions/hands) | ✅ | ✅ | ✅ | ✅ | ✅ | [Pose](https://google.github.io/mediapipe/solutions/pose) | ✅ | ✅ | ✅ | ✅ | ✅ | [Holistic](https://google.github.io/mediapipe/solutions/holistic) | ✅ | ✅ | ✅ | ✅ | ✅ | +[Selfie Segmentation](https://google.github.io/mediapipe/solutions/selfie_segmentation) | ✅ | ✅ | ✅ | ✅ | ✅ | [Hair Segmentation](https://google.github.io/mediapipe/solutions/hair_segmentation) | ✅ | | ✅ | | | [Object Detection](https://google.github.io/mediapipe/solutions/object_detection) | ✅ | ✅ | ✅ | | | ✅ [Box Tracking](https://google.github.io/mediapipe/solutions/box_tracking) | ✅ | ✅ | ✅ | | | [Instant Motion 
Tracking](https://google.github.io/mediapipe/solutions/instant_motion_tracking) | ✅ | | | | | -[Objectron](https://google.github.io/mediapipe/solutions/objectron) | ✅ | | ✅ | ✅ | | +[Objectron](https://google.github.io/mediapipe/solutions/objectron) | ✅ | | ✅ | ✅ | | [KNIFT](https://google.github.io/mediapipe/solutions/knift) | ✅ | | | | | [AutoFlip](https://google.github.io/mediapipe/solutions/autoflip) | | | ✅ | | | [MediaSequence](https://google.github.io/mediapipe/solutions/media_sequence) | | | ✅ | | | diff --git a/docs/solutions/autoflip.md b/docs/solutions/autoflip.md index 0e118cc55..676abcae8 100644 --- a/docs/solutions/autoflip.md +++ b/docs/solutions/autoflip.md @@ -2,7 +2,7 @@ layout: default title: AutoFlip (Saliency-aware Video Cropping) parent: Solutions -nav_order: 13 +nav_order: 14 --- # AutoFlip: Saliency-aware Video Cropping diff --git a/docs/solutions/box_tracking.md b/docs/solutions/box_tracking.md index 0e7550e7f..b84a015d1 100644 --- a/docs/solutions/box_tracking.md +++ b/docs/solutions/box_tracking.md @@ -2,7 +2,7 @@ layout: default title: Box Tracking parent: Solutions -nav_order: 9 +nav_order: 10 --- # MediaPipe Box Tracking diff --git a/docs/solutions/face_detection.md b/docs/solutions/face_detection.md index 8d5de36eb..e866a8cc3 100644 --- a/docs/solutions/face_detection.md +++ b/docs/solutions/face_detection.md @@ -68,7 +68,7 @@ normalized to `[0.0, 1.0]` by the image width and height respectively. Please first follow general [instructions](../getting_started/python.md) to install MediaPipe Python package, then learn more in the companion -[Python Colab](#resources) and the following usage example. +[Python Colab](#resources) and the usage example below. 
Supported configuration options: @@ -81,9 +81,10 @@ mp_face_detection = mp.solutions.face_detection mp_drawing = mp.solutions.drawing_utils # For static images: +IMAGE_FILES = [] with mp_face_detection.FaceDetection( min_detection_confidence=0.5) as face_detection: - for idx, file in enumerate(file_list): + for idx, file in enumerate(IMAGE_FILES): image = cv2.imread(file) # Convert the BGR image to RGB and process it with MediaPipe Face Detection. results = face_detection.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB)) diff --git a/docs/solutions/face_mesh.md b/docs/solutions/face_mesh.md index 0c620120c..263d9c3ee 100644 --- a/docs/solutions/face_mesh.md +++ b/docs/solutions/face_mesh.md @@ -265,7 +265,7 @@ magnitude of `z` uses roughly the same scale as `x`. Please first follow general [instructions](../getting_started/python.md) to install MediaPipe Python package, then learn more in the companion -[Python Colab](#resources) and the following usage example. +[Python Colab](#resources) and the usage example below. Supported configuration options: @@ -281,12 +281,13 @@ mp_drawing = mp.solutions.drawing_utils mp_face_mesh = mp.solutions.face_mesh # For static images: +IMAGE_FILES = [] drawing_spec = mp_drawing.DrawingSpec(thickness=1, circle_radius=1) with mp_face_mesh.FaceMesh( static_image_mode=True, max_num_faces=1, min_detection_confidence=0.5) as face_mesh: - for idx, file in enumerate(file_list): + for idx, file in enumerate(IMAGE_FILES): image = cv2.imread(file) # Convert the BGR image to RGB before processing. 
results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB)) diff --git a/docs/solutions/hair_segmentation.md b/docs/solutions/hair_segmentation.md index 5e2e4a7c5..6722c6c54 100644 --- a/docs/solutions/hair_segmentation.md +++ b/docs/solutions/hair_segmentation.md @@ -2,7 +2,7 @@ layout: default title: Hair Segmentation parent: Solutions -nav_order: 7 +nav_order: 8 --- # MediaPipe Hair Segmentation diff --git a/docs/solutions/hands.md b/docs/solutions/hands.md index ac10124f2..9dd2898ba 100644 --- a/docs/solutions/hands.md +++ b/docs/solutions/hands.md @@ -206,7 +206,7 @@ is not the case, please swap the handedness output in the application. Please first follow general [instructions](../getting_started/python.md) to install MediaPipe Python package, then learn more in the companion -[Python Colab](#resources) and the following usage example. +[Python Colab](#resources) and the usage example below. Supported configuration options: @@ -222,11 +222,12 @@ mp_drawing = mp.solutions.drawing_utils mp_hands = mp.solutions.hands # For static images: +IMAGE_FILES = [] with mp_hands.Hands( static_image_mode=True, max_num_hands=2, min_detection_confidence=0.5) as hands: - for idx, file in enumerate(file_list): + for idx, file in enumerate(IMAGE_FILES): # Read an image, flip it around y-axis for correct handedness output (see # above). image = cv2.flip(cv2.imread(file), 1) diff --git a/docs/solutions/holistic.md b/docs/solutions/holistic.md index 7c02c8d75..14c13bd2a 100644 --- a/docs/solutions/holistic.md +++ b/docs/solutions/holistic.md @@ -201,7 +201,7 @@ A list of 21 hand landmarks on the right hand, in the same representation as Please first follow general [instructions](../getting_started/python.md) to install MediaPipe Python package, then learn more in the companion -[Python Colab](#resources) and the following usage example. +[Python Colab](#resources) and the usage example below. 
Supported configuration options: @@ -218,10 +218,11 @@ mp_drawing = mp.solutions.drawing_utils mp_holistic = mp.solutions.holistic # For static images: +IMAGE_FILES = [] with mp_holistic.Holistic( static_image_mode=True, model_complexity=2) as holistic: - for idx, file in enumerate(file_list): + for idx, file in enumerate(IMAGE_FILES): image = cv2.imread(file) image_height, image_width, _ = image.shape # Convert the BGR image to RGB before processing. diff --git a/docs/solutions/instant_motion_tracking.md b/docs/solutions/instant_motion_tracking.md index 36e5e83e0..9fea7ec1c 100644 --- a/docs/solutions/instant_motion_tracking.md +++ b/docs/solutions/instant_motion_tracking.md @@ -2,7 +2,7 @@ layout: default title: Instant Motion Tracking parent: Solutions -nav_order: 10 +nav_order: 11 --- # MediaPipe Instant Motion Tracking diff --git a/docs/solutions/knift.md b/docs/solutions/knift.md index 41691c418..b008f1496 100644 --- a/docs/solutions/knift.md +++ b/docs/solutions/knift.md @@ -2,7 +2,7 @@ layout: default title: KNIFT (Template-based Feature Matching) parent: Solutions -nav_order: 12 +nav_order: 13 --- # MediaPipe KNIFT diff --git a/docs/solutions/media_sequence.md b/docs/solutions/media_sequence.md index cd3b7ecef..e6bd5fd44 100644 --- a/docs/solutions/media_sequence.md +++ b/docs/solutions/media_sequence.md @@ -2,7 +2,7 @@ layout: default title: Dataset Preparation with MediaSequence parent: Solutions -nav_order: 14 +nav_order: 15 --- # Dataset Preparation with MediaSequence diff --git a/docs/solutions/models.md b/docs/solutions/models.md index e0ff4d14a..2c5e4389e 100644 --- a/docs/solutions/models.md +++ b/docs/solutions/models.md @@ -16,10 +16,15 @@ nav_order: 30 * Face detection model for front-facing/selfie camera: [TFLite model](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_front.tflite), - [TFLite model quantized for 
EdgeTPU/Coral](https://github.com/google/mediapipe/tree/master/mediapipe/examples/coral/models/face-detector-quantized_edgetpu.tflite) + [TFLite model quantized for EdgeTPU/Coral](https://github.com/google/mediapipe/tree/master/mediapipe/examples/coral/models/face-detector-quantized_edgetpu.tflite), + [Model card](https://mediapipe.page.link/blazeface-mc) * Face detection model for back-facing camera: - [TFLite model ](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_back.tflite) -* [Model card](https://mediapipe.page.link/blazeface-mc) + [TFLite model](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_back.tflite), + [Model card](https://mediapipe.page.link/blazeface-back-mc) +* Face detection model for back-facing camera (sparse): + [TFLite model](https://github.com/google/mediapipe/tree/master/mediapipe/modules/face_detection/face_detection_back_sparse.tflite), + [Model card](https://mediapipe.page.link/blazeface-back-sparse-mc) + ### [Face Mesh](https://google.github.io/mediapipe/solutions/face_mesh) @@ -60,6 +65,12 @@ nav_order: 30 * Hand recrop model: [TFLite model](https://github.com/google/mediapipe/tree/master/mediapipe/modules/holistic_landmark/hand_recrop.tflite) +### [Selfie Segmentation](https://google.github.io/mediapipe/solutions/selfie_segmentation) + +* [TFLite model (general)](https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation/selfie_segmentation.tflite) +* [TFLite model (landscape)](https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation/selfie_segmentation_landscape.tflite) +* [Model card](https://mediapipe.page.link/selfiesegmentation-mc) + ### [Hair Segmentation](https://google.github.io/mediapipe/solutions/hair_segmentation) * [TFLite model](https://github.com/google/mediapipe/tree/master/mediapipe/models/hair_segmentation.tflite) diff --git a/docs/solutions/object_detection.md 
b/docs/solutions/object_detection.md index 044748537..d7cc2cec1 100644 --- a/docs/solutions/object_detection.md +++ b/docs/solutions/object_detection.md @@ -2,7 +2,7 @@ layout: default title: Object Detection parent: Solutions -nav_order: 8 +nav_order: 9 --- # MediaPipe Object Detection diff --git a/docs/solutions/objectron.md b/docs/solutions/objectron.md index 0164e23b3..20dc3cace 100644 --- a/docs/solutions/objectron.md +++ b/docs/solutions/objectron.md @@ -2,7 +2,7 @@ layout: default title: Objectron (3D Object Detection) parent: Solutions -nav_order: 11 +nav_order: 12 --- # MediaPipe Objectron @@ -277,7 +277,7 @@ following: Please first follow general [instructions](../getting_started/python.md) to install MediaPipe Python package, then learn more in the companion -[Python Colab](#resources) and the following usage example. +[Python Colab](#resources) and the usage example below. Supported configuration options: @@ -297,11 +297,12 @@ mp_drawing = mp.solutions.drawing_utils mp_objectron = mp.solutions.objectron # For static images: +IMAGE_FILES = [] with mp_objectron.Objectron(static_image_mode=True, max_num_objects=5, min_detection_confidence=0.5, model_name='Shoe') as objectron: - for idx, file in enumerate(file_list): + for idx, file in enumerate(IMAGE_FILES): image = cv2.imread(file) # Convert the BGR image to RGB and process it with MediaPipe Objectron. results = objectron.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB)) diff --git a/docs/solutions/pose.md b/docs/solutions/pose.md index feed2ad34..48ce218a1 100644 --- a/docs/solutions/pose.md +++ b/docs/solutions/pose.md @@ -187,7 +187,7 @@ Naming style may differ slightly across platforms/languages. #### pose_landmarks -A list of pose landmarks. Each lanmark consists of the following: +A list of pose landmarks. Each landmark consists of the following: * `x` and `y`: Landmark coordinates normalized to `[0.0, 1.0]` by the image width and height respectively. @@ -202,7 +202,7 @@ A list of pose landmarks. 
Each lanmark consists of the following: Please first follow general [instructions](../getting_started/python.md) to install MediaPipe Python package, then learn more in the companion -[Python Colab](#resources) and the following usage example. +[Python Colab](#resources) and the usage example below. Supported configuration options: @@ -219,11 +219,12 @@ mp_drawing = mp.solutions.drawing_utils mp_pose = mp.solutions.pose # For static images: +IMAGE_FILES = [] with mp_pose.Pose( static_image_mode=True, model_complexity=2, min_detection_confidence=0.5) as pose: - for idx, file in enumerate(file_list): + for idx, file in enumerate(IMAGE_FILES): image = cv2.imread(file) image_height, image_width, _ = image.shape # Convert the BGR image to RGB before processing. diff --git a/docs/solutions/selfie_segmentation.md b/docs/solutions/selfie_segmentation.md new file mode 100644 index 000000000..34f38a3a7 --- /dev/null +++ b/docs/solutions/selfie_segmentation.md @@ -0,0 +1,286 @@ +--- +layout: default +title: Selfie Segmentation +parent: Solutions +nav_order: 7 +--- + +# MediaPipe Selfie Segmentation +{: .no_toc } + +
+<details open markdown="block">
+  <summary>
+    Table of contents
+  </summary>
+  {: .text-delta }
+1. TOC
+{:toc}
+</details>
+--- + +## Overview + +*Fig 1. Example of MediaPipe Selfie Segmentation.* | +:------------------------------------------------: | + | + +MediaPipe Selfie Segmentation segments the prominent humans in the scene. It can +run in real-time on both smartphones and laptops. The intended use cases include +selfie effects and video conferencing, where the person is close (< 2m) to the +camera. + +## Models + +In this solution, we provide two models: general and landscape. Both models are +based on +[MobileNetV3](https://ai.googleblog.com/2019/11/introducing-next-generation-on-device.html), +with modifications to make them more efficient. The general model operates on a +256x256x3 (HWC) tensor, and outputs a 256x256x1 tensor representing the +segmentation mask. The landscape model is similar to the general model, but +operates on a 144x256x3 (HWC) tensor. It has fewer FLOPs than the general model, +and therefore, runs faster. Note that MediaPipe Selfie Segmentation +automatically resizes the input image to the desired tensor dimension before +feeding it into the ML models. + +The general model is also powering [ML Kit](https://developers.google.com/ml-kit/vision/selfie-segmentation), +and a variant of the landscape model is powering [Google Meet](https://ai.googleblog.com/2020/10/background-features-in-google-meet.html). +Please find more detail about the models in the [model card](./models.md#selfie_segmentation). + +## ML Pipeline + +The pipeline is implemented as a MediaPipe +[graph](https://github.com/google/mediapipe/tree/master/mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt) +that uses a +[selfie segmentation subgraph](https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation/selfie_segmentation_gpu.pbtxt) +from the +[selfie segmentation module](https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation). 
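To make the input-geometry numbers above concrete, here is a toy NumPy sketch of the resize step — nearest-neighbor only for illustration; it stands in for, and is not, the solution's actual resizer. The model shapes are taken from the Models paragraph (general: 256x256x3, landscape: 144x256x3, both HWC).

```python
import numpy as np

# Input tensor geometry (HWC) per model_selection, from the Models section.
MODEL_INPUT_HWC = {0: (256, 256, 3),   # general
                   1: (144, 256, 3)}   # landscape

def resize_nearest(image, out_h, out_w):
    # Toy nearest-neighbor resize standing in for the automatic resize
    # MediaPipe performs before feeding a frame to the model.
    in_h, in_w = image.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # e.g. a 720p camera frame
h, w, _ = MODEL_INPUT_HWC[1]                       # landscape model
tensor = resize_nearest(frame, h, w)
print(tensor.shape)   # (144, 256, 3)
```

The landscape model's 144x256 input has roughly the 16:9 aspect ratio of typical video-conference frames, which is why it needs fewer FLOPs for the same field of view.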
+ +Note: To visualize a graph, copy the graph and paste it into +[MediaPipe Visualizer](https://viz.mediapipe.dev/). For more information on how +to visualize its associated subgraphs, please see +[visualizer documentation](../tools/visualizer.md). + +## Solution APIs + +### Cross-platform Configuration Options + +Naming style and availability may differ slightly across platforms/languages. + +#### model_selection + +An integer index `0` or `1`. Use `0` to select the general model, and `1` to +select the landscape model (see details in [Models](#models)). Default to `0` if +not specified. + +### Output + +Naming style may differ slightly across platforms/languages. + +#### segmentation_mask + +The output segmentation mask, which has the same dimension as the input image. + +### Python Solution API + +Please first follow general [instructions](../getting_started/python.md) to +install MediaPipe Python package, then learn more in the companion +[Python Colab](#resources) and the usage example below. + +Supported configuration options: + +* [model_selection](#model_selection) + +```python +import cv2 +import mediapipe as mp +mp_drawing = mp.solutions.drawing_utils +mp_selfie_segmentation = mp.solutions.selfie_segmentation + +# For static images: +IMAGE_FILES = [] +BG_COLOR = (192, 192, 192) # gray +MASK_COLOR = (255, 255, 255) # white +with mp_selfie_segmentation.SelfieSegmentation( + model_selection=0) as selfie_segmentation: + for idx, file in enumerate(IMAGE_FILES): + image = cv2.imread(file) + image_height, image_width, _ = image.shape + # Convert the BGR image to RGB before processing. + results = selfie_segmentation.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB)) + + # Draw selfie segmentation on the background image. + # To improve segmentation around boundaries, consider applying a joint + # bilateral filter to "results.segmentation_mask" with "image". 
+ condition = np.stack((results.segmentation_mask,) * 3, axis=-1) > 0.1 + # Generate solid color images for showing the output selfie segmentation mask. + fg_image = np.zeros(image.shape, dtype=np.uint8) + fg_image[:] = MASK_COLOR + bg_image = np.zeros(image.shape, dtype=np.uint8) + bg_image[:] = BG_COLOR + output_image = np.where(condition, fg_image, bg_image) + cv2.imwrite('/tmp/selfie_segmentation_output' + str(idx) + '.png', output_image) + +# For webcam input: +BG_COLOR = (192, 192, 192) # gray +cap = cv2.VideoCapture(0) +with mp_selfie_segmentation.SelfieSegmentation( + model_selection=1) as selfie_segmentation: + bg_image = None + while cap.isOpened(): + success, image = cap.read() + if not success: + print("Ignoring empty camera frame.") + # If loading a video, use 'break' instead of 'continue'. + continue + + # Flip the image horizontally for a later selfie-view display, and convert + # the BGR image to RGB. + image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB) + # To improve performance, optionally mark the image as not writeable to + # pass by reference. + image.flags.writeable = False + results = selfie_segmentation.process(image) + + image.flags.writeable = True + image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR) + + # Draw selfie segmentation on the background image. + # To improve segmentation around boundaries, consider applying a joint + # bilateral filter to "results.segmentation_mask" with "image". + condition = np.stack( + (results.segmentation_mask,) * 3, axis=-1) > 0.1 + # The background can be customized. 
+ # a) Load an image (with the same width and height of the input image) to + # be the background, e.g., bg_image = cv2.imread('/path/to/image/file') + # b) Blur the input image by applying image filtering, e.g., + # bg_image = cv2.GaussianBlur(image,(55,55),0) + if bg_image is None: + bg_image = np.zeros(image.shape, dtype=np.uint8) + bg_image[:] = BG_COLOR + output_image = np.where(condition, image, bg_image) + + cv2.imshow('MediaPipe Selfie Segmentation', output_image) + if cv2.waitKey(5) & 0xFF == 27: + break +cap.release() +``` + +### JavaScript Solution API + +Please first see general [introduction](../getting_started/javascript.md) on +MediaPipe in JavaScript, then learn more in the companion [web demo](#resources) +and the following usage example. + +Supported configuration options: + +* [modelSelection](#model_selection) + +```html + + + + + + + + + + + +
+ + +
+ + +``` + +```javascript + +``` + +## Example Apps + +Please first see general instructions for +[Android](../getting_started/android.md), [iOS](../getting_started/ios.md), and +[desktop](../getting_started/cpp.md) on how to build MediaPipe examples. + +Note: To visualize a graph, copy the graph and paste it into +[MediaPipe Visualizer](https://viz.mediapipe.dev/). For more information on how +to visualize its associated subgraphs, please see +[visualizer documentation](../tools/visualizer.md). + +### Mobile + +* Graph: + [`mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt`](https://github.com/google/mediapipe/tree/master/mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt) +* Android target: + [(or download prebuilt ARM64 APK)](https://drive.google.com/file/d/1DoeyGzMmWUsjfVgZfGGecrn7GKzYcEAo/view?usp=sharing) + [`mediapipe/examples/android/src/java/com/google/mediapipe/apps/selfiesegmentationgpu:selfiesegmentationgpu`](https://github.com/google/mediapipe/tree/master/mediapipe/examples/android/src/java/com/google/mediapipe/apps/selfiesegmentationgpu/BUILD) +* iOS target: + [`mediapipe/examples/ios/selfiesegmentationgpu:SelfieSegmentationGpuApp`](http:/mediapipe/examples/ios/selfiesegmentationgpu/BUILD) + +### Desktop + +Please first see general instructions for [desktop](../getting_started/cpp.md) +on how to build MediaPipe examples. 
+ +* Running on CPU + * Graph: + [`mediapipe/graphs/selfie_segmentation/selfie_segmentation_cpu.pbtxt`](https://github.com/google/mediapipe/tree/master/mediapipe/graphs/selfie_segmentation/selfie_segmentation_cpu.pbtxt) + * Target: + [`mediapipe/examples/desktop/selfie_segmentation:selfie_segmentation_cpu`](https://github.com/google/mediapipe/tree/master/mediapipe/examples/desktop/selfie_segmentation/BUILD) +* Running on GPU + * Graph: + [`mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt`](https://github.com/google/mediapipe/tree/master/mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt) + * Target: + [`mediapipe/examples/desktop/selfie_segmentation:selfie_segmentation_gpu`](https://github.com/google/mediapipe/tree/master/mediapipe/examples/desktop/selfie_segmentation/BUILD) + +## Resources + +* Google AI Blog: + [Background Features in Google Meet, Powered by Web ML](https://ai.googleblog.com/2020/10/background-features-in-google-meet.html) +* [ML Kit Selfie Segmentation API](https://developers.google.com/ml-kit/vision/selfie-segmentation) +* [Models and model cards](./models.md#selfie_segmentation) +* [Web demo](https://code.mediapipe.dev/codepen/selfie_segmentation) +* [Python Colab](https://mediapipe.page.link/selfie_segmentation_py_colab) diff --git a/docs/solutions/solutions.md b/docs/solutions/solutions.md index a95f0c032..98bafe30e 100644 --- a/docs/solutions/solutions.md +++ b/docs/solutions/solutions.md @@ -24,11 +24,12 @@ has_toc: false [Hands](https://google.github.io/mediapipe/solutions/hands) | ✅ | ✅ | ✅ | ✅ | ✅ | [Pose](https://google.github.io/mediapipe/solutions/pose) | ✅ | ✅ | ✅ | ✅ | ✅ | [Holistic](https://google.github.io/mediapipe/solutions/holistic) | ✅ | ✅ | ✅ | ✅ | ✅ | +[Selfie Segmentation](https://google.github.io/mediapipe/solutions/selfie_segmentation) | ✅ | ✅ | ✅ | ✅ | ✅ | [Hair Segmentation](https://google.github.io/mediapipe/solutions/hair_segmentation) | ✅ | | ✅ | | | [Object 
Detection](https://google.github.io/mediapipe/solutions/object_detection) | ✅ | ✅ | ✅ | | | ✅ [Box Tracking](https://google.github.io/mediapipe/solutions/box_tracking) | ✅ | ✅ | ✅ | | | [Instant Motion Tracking](https://google.github.io/mediapipe/solutions/instant_motion_tracking) | ✅ | | | | | -[Objectron](https://google.github.io/mediapipe/solutions/objectron) | ✅ | | ✅ | ✅ | | +[Objectron](https://google.github.io/mediapipe/solutions/objectron) | ✅ | | ✅ | ✅ | | [KNIFT](https://google.github.io/mediapipe/solutions/knift) | ✅ | | | | | [AutoFlip](https://google.github.io/mediapipe/solutions/autoflip) | | | ✅ | | | [MediaSequence](https://google.github.io/mediapipe/solutions/media_sequence) | | | ✅ | | | diff --git a/docs/solutions/youtube_8m.md b/docs/solutions/youtube_8m.md index abef6f1b6..5415c146a 100644 --- a/docs/solutions/youtube_8m.md +++ b/docs/solutions/youtube_8m.md @@ -2,7 +2,7 @@ layout: default title: YouTube-8M Feature Extraction and Model Inference parent: Solutions -nav_order: 15 +nav_order: 16 --- # YouTube-8M Feature Extraction and Model Inference diff --git a/mediapipe/MediaPipe.tulsiproj/Configs/MediaPipe.tulsigen b/mediapipe/MediaPipe.tulsiproj/Configs/MediaPipe.tulsigen index 11daafdcb..f3b74900c 100644 --- a/mediapipe/MediaPipe.tulsiproj/Configs/MediaPipe.tulsigen +++ b/mediapipe/MediaPipe.tulsiproj/Configs/MediaPipe.tulsigen @@ -16,6 +16,7 @@ "mediapipe/examples/ios/objectdetectiongpu/BUILD", "mediapipe/examples/ios/objectdetectiontrackinggpu/BUILD", "mediapipe/examples/ios/posetrackinggpu/BUILD", + "mediapipe/examples/ios/selfiesegmentationgpu/BUILD", "mediapipe/framework/BUILD", "mediapipe/gpu/BUILD", "mediapipe/objc/BUILD", @@ -35,6 +36,7 @@ "//mediapipe/examples/ios/objectdetectiongpu:ObjectDetectionGpuApp", "//mediapipe/examples/ios/objectdetectiontrackinggpu:ObjectDetectionTrackingGpuApp", "//mediapipe/examples/ios/posetrackinggpu:PoseTrackingGpuApp", + "//mediapipe/examples/ios/selfiesegmentationgpu:SelfieSegmentationGpuApp", 
"//mediapipe/objc:mediapipe_framework_ios" ], "optionSet" : { @@ -103,6 +105,7 @@ "mediapipe/examples/ios/objectdetectioncpu", "mediapipe/examples/ios/objectdetectiongpu", "mediapipe/examples/ios/posetrackinggpu", + "mediapipe/examples/ios/selfiesegmentationgpu", "mediapipe/framework", "mediapipe/framework/deps", "mediapipe/framework/formats", @@ -120,6 +123,7 @@ "mediapipe/graphs/hand_tracking", "mediapipe/graphs/object_detection", "mediapipe/graphs/pose_tracking", + "mediapipe/graphs/selfie_segmentation", "mediapipe/models", "mediapipe/modules", "mediapipe/objc", diff --git a/mediapipe/MediaPipe.tulsiproj/project.tulsiconf b/mediapipe/MediaPipe.tulsiproj/project.tulsiconf index 33498e8c1..a2fe886cf 100644 --- a/mediapipe/MediaPipe.tulsiproj/project.tulsiconf +++ b/mediapipe/MediaPipe.tulsiproj/project.tulsiconf @@ -22,6 +22,7 @@ "mediapipe/examples/ios/objectdetectiongpu", "mediapipe/examples/ios/objectdetectiontrackinggpu", "mediapipe/examples/ios/posetrackinggpu", + "mediapipe/examples/ios/selfiesegmentationgpu", "mediapipe/objc" ], "projectName" : "Mediapipe", diff --git a/mediapipe/calculators/image/recolor_calculator.cc b/mediapipe/calculators/image/recolor_calculator.cc index 03d0c3c7a..062fb2cb3 100644 --- a/mediapipe/calculators/image/recolor_calculator.cc +++ b/mediapipe/calculators/image/recolor_calculator.cc @@ -37,6 +37,22 @@ constexpr char kImageFrameTag[] = "IMAGE"; constexpr char kMaskCpuTag[] = "MASK"; constexpr char kGpuBufferTag[] = "IMAGE_GPU"; constexpr char kMaskGpuTag[] = "MASK_GPU"; + +inline cv::Vec3b Blend(const cv::Vec3b& color1, const cv::Vec3b& color2, + float weight, int invert_mask, + int adjust_with_luminance) { + weight = (1 - invert_mask) * weight + invert_mask * (1.0f - weight); + + float luminance = + (1 - adjust_with_luminance) * 1.0f + + adjust_with_luminance * + (color1[0] * 0.299 + color1[1] * 0.587 + color1[2] * 0.114) / 255; + + float mix_value = weight * luminance; + + return color1 * (1.0 - mix_value) + color2 * 
mix_value; +} + } // namespace namespace mediapipe { @@ -44,15 +60,14 @@ namespace mediapipe { // A calculator to recolor a masked area of an image to a specified color. // // A mask image is used to specify where to overlay a user defined color. -// The luminance of the input image is used to adjust the blending weight, -// to help preserve image textures. // // Inputs: // One of the following IMAGE tags: -// IMAGE: An ImageFrame input image, RGB or RGBA. +// IMAGE: An ImageFrame input image in ImageFormat::SRGB. // IMAGE_GPU: A GpuBuffer input image, RGBA. // One of the following MASK tags: -// MASK: An ImageFrame input mask, Gray, RGB or RGBA. +// MASK: An ImageFrame input mask in ImageFormat::GRAY8, SRGB, SRGBA, or +// VEC32F1 // MASK_GPU: A GpuBuffer input mask, RGBA. // Output: // One of the following IMAGE tags: @@ -98,10 +113,12 @@ class RecolorCalculator : public CalculatorBase { void GlRender(); bool initialized_ = false; - std::vector color_; + std::vector color_; mediapipe::RecolorCalculatorOptions::MaskChannel mask_channel_; bool use_gpu_ = false; + bool invert_mask_ = false; + bool adjust_with_luminance_ = false; #if !MEDIAPIPE_DISABLE_GPU mediapipe::GlCalculatorHelper gpu_helper_; GLuint program_ = 0; @@ -233,11 +250,15 @@ absl::Status RecolorCalculator::RenderCpu(CalculatorContext* cc) { } cv::Mat mask_full; cv::resize(mask_mat, mask_full, input_mat.size()); + const cv::Vec3b recolor = {color_[0], color_[1], color_[2]}; auto output_img = absl::make_unique( input_img.Format(), input_mat.cols, input_mat.rows); cv::Mat output_mat = mediapipe::formats::MatView(output_img.get()); + const int invert_mask = invert_mask_ ? 1 : 0; + const int adjust_with_luminance = adjust_with_luminance_ ? 
1 : 0; + // From GPU shader: /* vec4 weight = texture2D(mask, sample_coordinate); @@ -249,18 +270,23 @@ absl::Status RecolorCalculator::RenderCpu(CalculatorContext* cc) { fragColor = mix(color1, color2, mix_value); */ - for (int i = 0; i < output_mat.rows; ++i) { - for (int j = 0; j < output_mat.cols; ++j) { - float weight = mask_full.at(i, j) * (1.0 / 255.0); - cv::Vec3f color1 = input_mat.at(i, j); - cv::Vec3f color2 = {color_[0], color_[1], color_[2]}; - - float luminance = - (color1[0] * 0.299 + color1[1] * 0.587 + color1[2] * 0.114) / 255; - float mix_value = weight * luminance; - - cv::Vec3b mix_color = color1 * (1.0 - mix_value) + color2 * mix_value; - output_mat.at(i, j) = mix_color; + if (mask_img.Format() == ImageFormat::VEC32F1) { + for (int i = 0; i < output_mat.rows; ++i) { + for (int j = 0; j < output_mat.cols; ++j) { + const float weight = mask_full.at(i, j); + output_mat.at(i, j) = + Blend(input_mat.at(i, j), recolor, weight, invert_mask, + adjust_with_luminance); + } + } + } else { + for (int i = 0; i < output_mat.rows; ++i) { + for (int j = 0; j < output_mat.cols; ++j) { + const float weight = mask_full.at(i, j) * (1.0 / 255.0); + output_mat.at(i, j) = + Blend(input_mat.at(i, j), recolor, weight, invert_mask, + adjust_with_luminance); + } } } @@ -385,6 +411,9 @@ absl::Status RecolorCalculator::LoadOptions(CalculatorContext* cc) { color_.push_back(options.color().g()); color_.push_back(options.color().b()); + invert_mask_ = options.invert_mask(); + adjust_with_luminance_ = options.adjust_with_luminance(); + return absl::OkStatus(); } @@ -435,13 +464,20 @@ absl::Status RecolorCalculator::InitGpu(CalculatorContext* cc) { uniform sampler2D frame; uniform sampler2D mask; uniform vec3 recolor; + uniform float invert_mask; + uniform float adjust_with_luminance; void main() { vec4 weight = texture2D(mask, sample_coordinate); vec4 color1 = texture2D(frame, sample_coordinate); vec4 color2 = vec4(recolor, 1.0); - float luminance = dot(color1.rgb, vec3(0.299, 
0.587, 0.114)); + weight = mix(weight, 1.0 - weight, invert_mask); + + float luminance = mix(1.0, + dot(color1.rgb, vec3(0.299, 0.587, 0.114)), + adjust_with_luminance); + float mix_value = weight.MASK_COMPONENT * luminance; fragColor = mix(color1, color2, mix_value); @@ -458,6 +494,10 @@ absl::Status RecolorCalculator::InitGpu(CalculatorContext* cc) { glUniform1i(glGetUniformLocation(program_, "mask"), 2); glUniform3f(glGetUniformLocation(program_, "recolor"), color_[0] / 255.0, color_[1] / 255.0, color_[2] / 255.0); + glUniform1f(glGetUniformLocation(program_, "invert_mask"), + invert_mask_ ? 1.0f : 0.0f); + glUniform1f(glGetUniformLocation(program_, "adjust_with_luminance"), + adjust_with_luminance_ ? 1.0f : 0.0f); #endif // !MEDIAPIPE_DISABLE_GPU return absl::OkStatus(); diff --git a/mediapipe/calculators/image/recolor_calculator.proto b/mediapipe/calculators/image/recolor_calculator.proto index 76326c079..abbf0849d 100644 --- a/mediapipe/calculators/image/recolor_calculator.proto +++ b/mediapipe/calculators/image/recolor_calculator.proto @@ -36,4 +36,11 @@ message RecolorCalculatorOptions { // Color to blend into input image where mask is > 0. // The blending is based on the input image luminosity. optional Color color = 2; + + // Swap the meaning of mask values for foreground/background. + optional bool invert_mask = 3 [default = false]; + + // Whether to use the luminance of the input image to further adjust the + // blending weight, to help preserve image textures. 
+ optional bool adjust_with_luminance = 4 [default = true]; } diff --git a/mediapipe/calculators/tensor/BUILD b/mediapipe/calculators/tensor/BUILD index 2234787c9..3979def5e 100644 --- a/mediapipe/calculators/tensor/BUILD +++ b/mediapipe/calculators/tensor/BUILD @@ -753,3 +753,76 @@ cc_test( "//mediapipe/framework/port:gtest_main", ], ) + +# Copied from /mediapipe/calculators/tflite/BUILD +selects.config_setting_group( + name = "gpu_inference_disabled", + match_any = [ + "//mediapipe/gpu:disable_gpu", + ], +) + +mediapipe_proto_library( + name = "tensors_to_segmentation_calculator_proto", + srcs = ["tensors_to_segmentation_calculator.proto"], + visibility = ["//visibility:public"], + deps = [ + "//mediapipe/framework:calculator_options_proto", + "//mediapipe/framework:calculator_proto", + "//mediapipe/gpu:gpu_origin_proto", + ], +) + +cc_library( + name = "tensors_to_segmentation_calculator", + srcs = ["tensors_to_segmentation_calculator.cc"], + copts = select({ + "//mediapipe:apple": [ + "-x objective-c++", + "-fobjc-arc", # enable reference-counting + ], + "//conditions:default": [], + }), + visibility = ["//visibility:public"], + deps = [ + ":tensors_to_segmentation_calculator_cc_proto", + "@com_google_absl//absl/strings:str_format", + "@com_google_absl//absl/strings", + "@com_google_absl//absl/types:span", + "//mediapipe/framework/formats:image", + "//mediapipe/framework/formats:image_frame", + "//mediapipe/framework/formats:image_opencv", + "//mediapipe/framework/formats:tensor", + "//mediapipe/framework/port:opencv_imgproc", + "//mediapipe/framework/port:ret_check", + "//mediapipe/framework:calculator_context", + "//mediapipe/framework:calculator_framework", + "//mediapipe/framework:port", + "//mediapipe/util:resource_util", + "@org_tensorflow//tensorflow/lite:framework", + "//mediapipe/gpu:gpu_origin_cc_proto", + "//mediapipe/framework/port:statusor", + ] + selects.with_or({ + "//mediapipe/gpu:disable_gpu": [], + "//conditions:default": [ + 
"//mediapipe/gpu:gl_calculator_helper", + "//mediapipe/gpu:gl_simple_shaders", + "//mediapipe/gpu:gpu_buffer", + "//mediapipe/gpu:shader_util", + ], + }) + selects.with_or({ + ":gpu_inference_disabled": [], + "//mediapipe:ios": [ + "//mediapipe/gpu:MPPMetalUtil", + "//mediapipe/gpu:MPPMetalHelper", + ], + "//conditions:default": [ + "@org_tensorflow//tensorflow/lite/delegates/gpu:gl_delegate", + "@org_tensorflow//tensorflow/lite/delegates/gpu/gl:gl_program", + "@org_tensorflow//tensorflow/lite/delegates/gpu/gl:gl_shader", + "@org_tensorflow//tensorflow/lite/delegates/gpu/gl:gl_texture", + "@org_tensorflow//tensorflow/lite/delegates/gpu/gl/converters:util", + ], + }), + alwayslink = 1, +) diff --git a/mediapipe/calculators/tensor/tensors_to_detections_calculator.cc b/mediapipe/calculators/tensor/tensors_to_detections_calculator.cc index 1a27cafce..f161127f5 100644 --- a/mediapipe/calculators/tensor/tensors_to_detections_calculator.cc +++ b/mediapipe/calculators/tensor/tensors_to_detections_calculator.cc @@ -105,6 +105,15 @@ void ConvertAnchorsToRawValues(const std::vector& anchors, // for anchors (e.g. for SSD models) depend on the outputs of the // detection model. The size of anchor tensor must be (num_boxes * // 4). +// +// Input side packet: +// ANCHORS (optional) - The anchors used for decoding the bounding boxes, as a +// vector of `Anchor` protos. Not required if post-processing is built-in +// the model. +// IGNORE_CLASSES (optional) - The list of class ids that should be ignored, as +// a vector of integers. It overrides the corresponding field in the +// calculator options. +// // Output: // DETECTIONS - Result MediaPipe detections. 
// @@ -132,8 +141,11 @@ class TensorsToDetectionsCalculator : public Node { static constexpr Input> kInTensors{"TENSORS"}; static constexpr SideInput>::Optional kInAnchors{ "ANCHORS"}; + static constexpr SideInput>::Optional kSideInIgnoreClasses{ + "IGNORE_CLASSES"}; static constexpr Output> kOutDetections{"DETECTIONS"}; - MEDIAPIPE_NODE_CONTRACT(kInTensors, kInAnchors, kOutDetections); + MEDIAPIPE_NODE_CONTRACT(kInTensors, kInAnchors, kSideInIgnoreClasses, + kOutDetections); static absl::Status UpdateContract(CalculatorContract* cc); absl::Status Open(CalculatorContext* cc) override; @@ -566,8 +578,15 @@ absl::Status TensorsToDetectionsCalculator::LoadOptions(CalculatorContext* cc) { kNumCoordsPerBox, num_coords_); - for (int i = 0; i < options_.ignore_classes_size(); ++i) { - ignore_classes_.insert(options_.ignore_classes(i)); + if (kSideInIgnoreClasses(cc).IsConnected()) { + RET_CHECK(!kSideInIgnoreClasses(cc).IsEmpty()); + for (int ignore_class : *kSideInIgnoreClasses(cc)) { + ignore_classes_.insert(ignore_class); + } + } else { + for (int i = 0; i < options_.ignore_classes_size(); ++i) { + ignore_classes_.insert(options_.ignore_classes(i)); + } } return absl::OkStatus(); diff --git a/mediapipe/calculators/tensor/tensors_to_detections_calculator.proto b/mediapipe/calculators/tensor/tensors_to_detections_calculator.proto index 24c0a5053..364eb5cce 100644 --- a/mediapipe/calculators/tensor/tensors_to_detections_calculator.proto +++ b/mediapipe/calculators/tensor/tensors_to_detections_calculator.proto @@ -56,7 +56,7 @@ message TensorsToDetectionsCalculatorOptions { // [x_center, y_center, w, h]. optional bool reverse_output_order = 14 [default = false]; // The ids of classes that should be ignored during decoding the score for - // each predicted box. + // each predicted box. Can be overridden with IGNORE_CLASSES side packet. 
repeated int32 ignore_classes = 8; optional bool sigmoid_score = 15 [default = false]; diff --git a/mediapipe/calculators/tensor/tensors_to_segmentation_calculator.cc b/mediapipe/calculators/tensor/tensors_to_segmentation_calculator.cc new file mode 100644 index 000000000..23b98618c --- /dev/null +++ b/mediapipe/calculators/tensor/tensors_to_segmentation_calculator.cc @@ -0,0 +1,885 @@ +// Copyright 2021 The MediaPipe Authors. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +#include + +#include "absl/strings/str_format.h" +#include "absl/types/span.h" +#include "mediapipe/calculators/tensor/tensors_to_segmentation_calculator.pb.h" +#include "mediapipe/framework/calculator_context.h" +#include "mediapipe/framework/calculator_framework.h" +#include "mediapipe/framework/formats/image.h" +#include "mediapipe/framework/formats/image_opencv.h" +#include "mediapipe/framework/formats/tensor.h" +#include "mediapipe/framework/port.h" +#include "mediapipe/framework/port/opencv_imgproc_inc.h" +#include "mediapipe/framework/port/ret_check.h" +#include "mediapipe/framework/port/statusor.h" +#include "mediapipe/gpu/gpu_origin.pb.h" +#include "mediapipe/util/resource_util.h" +#include "tensorflow/lite/interpreter.h" + +#if !MEDIAPIPE_DISABLE_GPU +#include "mediapipe/gpu/gl_calculator_helper.h" +#include "mediapipe/gpu/gl_simple_shaders.h" +#include "mediapipe/gpu/gpu_buffer.h" +#include "mediapipe/gpu/shader_util.h" +#endif // !MEDIAPIPE_DISABLE_GPU + +#if 
MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31 +#include "tensorflow/lite/delegates/gpu/gl/converters/util.h" +#include "tensorflow/lite/delegates/gpu/gl/gl_program.h" +#include "tensorflow/lite/delegates/gpu/gl/gl_shader.h" +#include "tensorflow/lite/delegates/gpu/gl/gl_texture.h" +#include "tensorflow/lite/delegates/gpu/gl_delegate.h" +#endif // MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31 + +#if MEDIAPIPE_METAL_ENABLED +#import +#import +#import + +#import "mediapipe/gpu/MPPMetalHelper.h" +#include "mediapipe/gpu/MPPMetalUtil.h" +#endif // MEDIAPIPE_METAL_ENABLED + +namespace { +constexpr int kWorkgroupSize = 8; // Block size for GPU shader. +enum { ATTRIB_VERTEX, ATTRIB_TEXTURE_POSITION, NUM_ATTRIBUTES }; + +// Commonly used to compute the number of blocks to launch in a kernel. +int NumGroups(const int size, const int group_size) { // NOLINT + return (size + group_size - 1) / group_size; +} + +bool CanUseGpu() { +#if !MEDIAPIPE_DISABLE_GPU || MEDIAPIPE_METAL_ENABLED + // TODO: Configure GPU usage policy in individual calculators. 
+ constexpr bool kAllowGpuProcessing = true; + return kAllowGpuProcessing; +#else + return false; +#endif // !MEDIAPIPE_DISABLE_GPU || MEDIAPIPE_METAL_ENABLED +} + +constexpr char kTensorsTag[] = "TENSORS"; +constexpr char kOutputSizeTag[] = "OUTPUT_SIZE"; +constexpr char kMaskTag[] = "MASK"; + +absl::StatusOr> GetHwcFromDims( + const std::vector& dims) { + if (dims.size() == 3) { + return std::make_tuple(dims[0], dims[1], dims[2]); + } else if (dims.size() == 4) { + // BHWC format check B == 1 + RET_CHECK_EQ(1, dims[0]) << "Expected batch to be 1 for BHWC heatmap"; + return std::make_tuple(dims[1], dims[2], dims[3]); + } else { + RET_CHECK(false) << "Invalid shape for segmentation tensor " << dims.size(); + } +} +} // namespace + +namespace mediapipe { + +#if MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31 +using ::tflite::gpu::gl::GlProgram; +using ::tflite::gpu::gl::GlShader; +#endif // MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31 + +// Converts Tensors from a tflite segmentation model to an image mask. +// +// Performs optional upscale to OUTPUT_SIZE dimensions if provided, +// otherwise the mask is the same size as input tensor. +// +// If at least one input tensor is already on GPU, processing happens on GPU and +// the output mask is also stored on GPU. Otherwise, processing and the output +// mask are both on CPU. +// +// On GPU, the mask is an RGBA image, in both the R & A channels, scaled 0-1. +// On CPU, the mask is a ImageFormat::VEC32F1 image, with values scaled 0-1. +// +// +// Inputs: +// One of the following TENSORS tags: +// TENSORS: Vector of Tensor, +// The tensor dimensions are specified in this calculator's options. +// OUTPUT_SIZE(optional): std::pair, +// If provided, the size to upscale mask to. +// +// Output: +// MASK: An Image output mask, RGBA(GPU) / VEC32F1(CPU). 
+//
+// Options:
+//   See tensors_to_segmentation_calculator.proto
+//
+// Usage example:
+// node {
+//   calculator: "TensorsToSegmentationCalculator"
+//   input_stream: "TENSORS:tensors"
+//   input_stream: "OUTPUT_SIZE:size"
+//   output_stream: "MASK:hair_mask"
+//   node_options: {
+//     [mediapipe.TensorsToSegmentationCalculatorOptions] {
+//       output_layer_index: 1
+//       # gpu_origin: CONVENTIONAL # or TOP_LEFT
+//     }
+//   }
+// }
+//
+// Currently only the OpenGL ES 3.1 and CPU backends are supported.
+// TODO Refactor and add support for other backends/platforms.
+//
+class TensorsToSegmentationCalculator : public CalculatorBase {
+ public:
+  static absl::Status GetContract(CalculatorContract* cc);
+
+  absl::Status Open(CalculatorContext* cc) override;
+  absl::Status Process(CalculatorContext* cc) override;
+  absl::Status Close(CalculatorContext* cc) override;
+
+ private:
+  absl::Status LoadOptions(CalculatorContext* cc);
+  absl::Status InitGpu(CalculatorContext* cc);
+  absl::Status ProcessGpu(CalculatorContext* cc);
+  absl::Status ProcessCpu(CalculatorContext* cc);
+  void GlRender();
+
+  bool DoesGpuTextureStartAtBottom() {
+    return options_.gpu_origin() != mediapipe::GpuOrigin_Mode_TOP_LEFT;
+  }
+
+  template <class T>
+  absl::Status ApplyActivation(cv::Mat& tensor_mat, cv::Mat* small_mask_mat);
+
+  ::mediapipe::TensorsToSegmentationCalculatorOptions options_;
+
+#if !MEDIAPIPE_DISABLE_GPU
+  mediapipe::GlCalculatorHelper gpu_helper_;
+  GLuint upsample_program_;
+#if MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
+  std::unique_ptr<GlProgram> mask_program_31_;
+#else
+  GLuint mask_program_20_;
+#endif  // MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
+#if MEDIAPIPE_METAL_ENABLED
+  MPPMetalHelper* metal_helper_ = nullptr;
+  id<MTLComputePipelineState> mask_program_;
+#endif  // MEDIAPIPE_METAL_ENABLED
+#endif  // !MEDIAPIPE_DISABLE_GPU
+};
+REGISTER_CALCULATOR(TensorsToSegmentationCalculator);
+
+// static
+absl::Status TensorsToSegmentationCalculator::GetContract(
+    CalculatorContract* cc) {
+  RET_CHECK(!cc->Inputs().GetTags().empty());
+  RET_CHECK(!cc->Outputs().GetTags().empty());
+
+  // Inputs.
+  cc->Inputs().Tag(kTensorsTag).Set<std::vector<Tensor>>();
+  if (cc->Inputs().HasTag(kOutputSizeTag)) {
+    cc->Inputs().Tag(kOutputSizeTag).Set<std::pair<int, int>>();
+  }
+
+  // Outputs.
+  cc->Outputs().Tag(kMaskTag).Set<Image>();
+
+  if (CanUseGpu()) {
+#if !MEDIAPIPE_DISABLE_GPU
+    MP_RETURN_IF_ERROR(mediapipe::GlCalculatorHelper::UpdateContract(cc));
+#if MEDIAPIPE_METAL_ENABLED
+    MP_RETURN_IF_ERROR([MPPMetalHelper updateContract:cc]);
+#endif  // MEDIAPIPE_METAL_ENABLED
+#endif  // !MEDIAPIPE_DISABLE_GPU
+  }
+
+  return absl::OkStatus();
+}
+
+absl::Status TensorsToSegmentationCalculator::Open(CalculatorContext* cc) {
+  cc->SetOffset(TimestampDiff(0));
+  bool use_gpu = false;
+
+  if (CanUseGpu()) {
+#if !MEDIAPIPE_DISABLE_GPU
+    use_gpu = true;
+    MP_RETURN_IF_ERROR(gpu_helper_.Open(cc));
+#if MEDIAPIPE_METAL_ENABLED
+    metal_helper_ = [[MPPMetalHelper alloc] initWithCalculatorContext:cc];
+    RET_CHECK(metal_helper_);
+#endif  // MEDIAPIPE_METAL_ENABLED
+#endif  // !MEDIAPIPE_DISABLE_GPU
+  }
+
+  MP_RETURN_IF_ERROR(LoadOptions(cc));
+
+  if (use_gpu) {
+#if !MEDIAPIPE_DISABLE_GPU
+    MP_RETURN_IF_ERROR(InitGpu(cc));
+#else
+    RET_CHECK_FAIL() << "GPU processing disabled.";
+#endif  // !MEDIAPIPE_DISABLE_GPU
+  }
+
+  return absl::OkStatus();
+}
+
+absl::Status TensorsToSegmentationCalculator::Process(CalculatorContext* cc) {
+  if (cc->Inputs().Tag(kTensorsTag).IsEmpty()) {
+    return absl::OkStatus();
+  }
+
+  const auto& input_tensors =
+      cc->Inputs().Tag(kTensorsTag).Get<std::vector<Tensor>>();
+
+  bool use_gpu = false;
+  if (CanUseGpu()) {
+    // Use GPU processing only if at least one input tensor is already on GPU.
+    for (const auto& tensor : input_tensors) {
+      if (tensor.ready_on_gpu()) {
+        use_gpu = true;
+        break;
+      }
+    }
+  }
+
+  // Validate tensor channels and activation type.
+  {
+    RET_CHECK(!input_tensors.empty());
+    ASSIGN_OR_RETURN(auto hwc, GetHwcFromDims(input_tensors[0].shape().dims));
+    int tensor_channels = std::get<2>(hwc);
+    typedef mediapipe::TensorsToSegmentationCalculatorOptions Options;
+    switch (options_.activation()) {
+      case Options::NONE:
+        RET_CHECK_EQ(tensor_channels, 1);
+        break;
+      case Options::SIGMOID:
+        RET_CHECK_EQ(tensor_channels, 1);
+        break;
+      case Options::SOFTMAX:
+        RET_CHECK_EQ(tensor_channels, 2);
+        break;
+    }
+  }
+
+  if (use_gpu) {
+#if !MEDIAPIPE_DISABLE_GPU
+    MP_RETURN_IF_ERROR(gpu_helper_.RunInGlContext([this, cc]() -> absl::Status {
+      MP_RETURN_IF_ERROR(ProcessGpu(cc));
+      return absl::OkStatus();
+    }));
+#else
+    RET_CHECK_FAIL() << "GPU processing disabled.";
+#endif  // !MEDIAPIPE_DISABLE_GPU
+  } else {
+    MP_RETURN_IF_ERROR(ProcessCpu(cc));
+  }
+
+  return absl::OkStatus();
+}
+
+absl::Status TensorsToSegmentationCalculator::Close(CalculatorContext* cc) {
+#if !MEDIAPIPE_DISABLE_GPU
+  gpu_helper_.RunInGlContext([this] {
+    if (upsample_program_) glDeleteProgram(upsample_program_);
+    upsample_program_ = 0;
+#if MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
+    mask_program_31_.reset();
+#else
+    if (mask_program_20_) glDeleteProgram(mask_program_20_);
+    mask_program_20_ = 0;
+#endif  // MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
+#if MEDIAPIPE_METAL_ENABLED
+    mask_program_ = nil;
+#endif  // MEDIAPIPE_METAL_ENABLED
+  });
+#endif  // !MEDIAPIPE_DISABLE_GPU
+
+  return absl::OkStatus();
+}
+
+absl::Status TensorsToSegmentationCalculator::ProcessCpu(
+    CalculatorContext* cc) {
+  // Get input streams and dimensions.
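The validation block above pairs each activation with a required channel count (NONE and SIGMOID read one channel, SOFTMAX needs two logits). The mapping alone, as a sketch; the enum mirrors the proto's `Activation` values and the function name is illustrative:

```cpp
#include <cassert>

// Mirrors TensorsToSegmentationCalculatorOptions::Activation in the proto.
enum class Activation { kNone, kSigmoid, kSoftmax };

// Required input-tensor channel count per activation, as enforced in
// Process() above.
int RequiredChannels(Activation activation) {
  switch (activation) {
    case Activation::kNone:
    case Activation::kSigmoid:
      return 1;  // Single mask value per pixel.
    case Activation::kSoftmax:
      return 2;  // Two logits; see output_layer_index.
  }
  return 0;  // Unreachable.
}
```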
+  const auto& input_tensors =
+      cc->Inputs().Tag(kTensorsTag).Get<std::vector<Tensor>>();
+  ASSIGN_OR_RETURN(auto hwc, GetHwcFromDims(input_tensors[0].shape().dims));
+  auto [tensor_height, tensor_width, tensor_channels] = hwc;
+  int output_width = tensor_width, output_height = tensor_height;
+  if (cc->Inputs().HasTag(kOutputSizeTag)) {
+    const auto& size =
+        cc->Inputs().Tag(kOutputSizeTag).Get<std::pair<int, int>>();
+    output_width = size.first;
+    output_height = size.second;
+  }
+
+  // Create initial working mask.
+  cv::Mat small_mask_mat(cv::Size(tensor_width, tensor_height), CV_32FC1);
+
+  // Wrap input tensor.
+  auto raw_input_tensor = &input_tensors[0];
+  auto raw_input_view = raw_input_tensor->GetCpuReadView();
+  const float* raw_input_data = raw_input_view.buffer<float>();
+  cv::Mat tensor_mat(cv::Size(tensor_width, tensor_height),
+                     CV_MAKETYPE(CV_32F, tensor_channels),
+                     const_cast<float*>(raw_input_data));
+
+  // Process mask tensor and apply activation function.
+  if (tensor_channels == 2) {
+    MP_RETURN_IF_ERROR(ApplyActivation<cv::Vec2f>(tensor_mat, &small_mask_mat));
+  } else if (tensor_channels == 1) {
+    RET_CHECK(mediapipe::TensorsToSegmentationCalculatorOptions::SOFTMAX !=
+              options_.activation());  // Requires 2 channels.
+    if (mediapipe::TensorsToSegmentationCalculatorOptions::NONE ==
+        options_.activation())  // Pass-through optimization.
+      tensor_mat.copyTo(small_mask_mat);
+    else
+      MP_RETURN_IF_ERROR(ApplyActivation<float>(tensor_mat, &small_mask_mat));
+  } else {
+    RET_CHECK_FAIL() << "Unsupported number of tensor channels "
+                     << tensor_channels;
+  }
+
+  // Send out image as CPU packet.
+  std::shared_ptr<ImageFrame> mask_frame = std::make_shared<ImageFrame>(
+      ImageFormat::VEC32F1, output_width, output_height);
+  std::unique_ptr<Image> output_mask = absl::make_unique<Image>(mask_frame);
+  cv::Mat output_mat = formats::MatView(output_mask.get());
+  // Upsample small mask into output.
+  cv::resize(small_mask_mat, output_mat, cv::Size(output_width, output_height));
+  cc->Outputs().Tag(kMaskTag).Add(output_mask.release(), cc->InputTimestamp());
+
+  return absl::OkStatus();
+}
+
+template <class T>
+absl::Status TensorsToSegmentationCalculator::ApplyActivation(
+    cv::Mat& tensor_mat, cv::Mat* small_mask_mat) {
+  // Configure activation function.
+  const int output_layer_index = options_.output_layer_index();
+  typedef mediapipe::TensorsToSegmentationCalculatorOptions Options;
+  const auto activation_fn = [&](const cv::Vec2f& mask_value) {
+    float new_mask_value = 0;
+    // TODO consider moving switch out of the loop,
+    // and also avoid float/Vec2f casting.
+    switch (options_.activation()) {
+      case Options::NONE: {
+        new_mask_value = mask_value[0];
+        break;
+      }
+      case Options::SIGMOID: {
+        const float pixel0 = mask_value[0];
+        new_mask_value = 1.0 / (std::exp(-pixel0) + 1.0);
+        break;
+      }
+      case Options::SOFTMAX: {
+        const float pixel0 = mask_value[0];
+        const float pixel1 = mask_value[1];
+        const float max_pixel = std::max(pixel0, pixel1);
+        const float min_pixel = std::min(pixel0, pixel1);
+        const float softmax_denom =
+            /*exp(max_pixel - max_pixel)=*/1.0f +
+            std::exp(min_pixel - max_pixel);
+        new_mask_value = std::exp(mask_value[output_layer_index] - max_pixel) /
+                         softmax_denom;
+        break;
+      }
+    }
+    return new_mask_value;
+  };
+
+  // Process mask tensor.
+  for (int i = 0; i < tensor_mat.rows; ++i) {
+    for (int j = 0; j < tensor_mat.cols; ++j) {
+      const T& input_pix = tensor_mat.at<T>(i, j);
+      const float mask_value = activation_fn(input_pix);
+      small_mask_mat->at<float>(i, j) = mask_value;
+    }
+  }
+
+  return absl::OkStatus();
+}
+
+// Steps:
+// 1. receive tensor
+// 2. process segmentation tensor into small mask
+// 3. upsample small mask into output mask to be same size as input image
+absl::Status TensorsToSegmentationCalculator::ProcessGpu(
+    CalculatorContext* cc) {
+#if !MEDIAPIPE_DISABLE_GPU
+  // Get input streams and dimensions.
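The activation math inside `activation_fn` above can be checked in isolation: the sigmoid for single-channel masks, and the two-class softmax that subtracts the channel maximum before exponentiating so the denominator reduces to `1 + exp(min - max)` and never overflows. A standalone sketch of just those two formulas:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Sigmoid as applied to single-channel masks: 1 / (exp(-x) + 1).
float Sigmoid(float x) { return 1.0f / (std::exp(-x) + 1.0f); }

// Numerically stable two-class softmax probability of the chosen channel,
// matching the max-shift trick in ApplyActivation above.
float TwoClassSoftmax(float pixel0, float pixel1, int output_layer_index) {
  const float max_pixel = std::max(pixel0, pixel1);
  const float min_pixel = std::min(pixel0, pixel1);
  // exp(max - max) == 1, so the denominator is 1 + exp(min - max).
  const float denom = 1.0f + std::exp(min_pixel - max_pixel);
  const float chosen = output_layer_index == 0 ? pixel0 : pixel1;
  return std::exp(chosen - max_pixel) / denom;
}
```

The two channel probabilities always sum to 1, and equal logits give 0.5, which is a quick sanity check on the shift.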
+  const auto& input_tensors =
+      cc->Inputs().Tag(kTensorsTag).Get<std::vector<Tensor>>();
+  ASSIGN_OR_RETURN(auto hwc, GetHwcFromDims(input_tensors[0].shape().dims));
+  auto [tensor_height, tensor_width, tensor_channels] = hwc;
+  int output_width = tensor_width, output_height = tensor_height;
+  if (cc->Inputs().HasTag(kOutputSizeTag)) {
+    const auto& size =
+        cc->Inputs().Tag(kOutputSizeTag).Get<std::pair<int, int>>();
+    output_width = size.first;
+    output_height = size.second;
+  }
+
+  // Create initial working mask texture.
+#if MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
+  tflite::gpu::gl::GlTexture small_mask_texture;
+#else
+  mediapipe::GlTexture small_mask_texture;
+#endif  // MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
+
+  // Run shader, process mask tensor.
+#if MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
+  {
+    MP_RETURN_IF_ERROR(CreateReadWriteRgbaImageTexture(
+        tflite::gpu::DataType::UINT8,  // GL_RGBA8
+        {tensor_width, tensor_height}, &small_mask_texture));
+
+    const int output_index = 0;
+    glBindImageTexture(output_index, small_mask_texture.id(), 0, GL_FALSE, 0,
+                       GL_WRITE_ONLY, GL_RGBA8);
+
+    auto read_view = input_tensors[0].GetOpenGlBufferReadView();
+    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, read_view.name());
+
+    const tflite::gpu::uint3 workgroups = {
+        NumGroups(tensor_width, kWorkgroupSize),
+        NumGroups(tensor_height, kWorkgroupSize), 1};
+
+    glUseProgram(mask_program_31_->id());
+    glUniform2i(glGetUniformLocation(mask_program_31_->id(), "out_size"),
+                tensor_width, tensor_height);
+
+    MP_RETURN_IF_ERROR(mask_program_31_->Dispatch(workgroups));
+  }
+#elif MEDIAPIPE_METAL_ENABLED
+  {
+    id<MTLCommandBuffer> command_buffer = [metal_helper_ commandBuffer];
+    command_buffer.label = @"SegmentationKernel";
+    id<MTLComputeCommandEncoder> command_encoder =
+        [command_buffer computeCommandEncoder];
+    [command_encoder setComputePipelineState:mask_program_];
+
+    auto read_view = input_tensors[0].GetMtlBufferReadView(command_buffer);
+    [command_encoder setBuffer:read_view.buffer() offset:0 atIndex:0];
+
+    mediapipe::GpuBuffer small_mask_buffer = [metal_helper_
+        mediapipeGpuBufferWithWidth:tensor_width
+                             height:tensor_height
+                             format:mediapipe::GpuBufferFormat::kBGRA32];
+    id<MTLTexture> small_mask_texture_metal =
+        [metal_helper_ metalTextureWithGpuBuffer:small_mask_buffer];
+    [command_encoder setTexture:small_mask_texture_metal atIndex:1];
+
+    unsigned int out_size[] = {static_cast<unsigned int>(tensor_width),
+                               static_cast<unsigned int>(tensor_height)};
+    [command_encoder setBytes:&out_size length:sizeof(out_size) atIndex:2];
+
+    MTLSize threads_per_group = MTLSizeMake(kWorkgroupSize, kWorkgroupSize, 1);
+    MTLSize threadgroups =
+        MTLSizeMake(NumGroups(tensor_width, kWorkgroupSize),
+                    NumGroups(tensor_height, kWorkgroupSize), 1);
+    [command_encoder dispatchThreadgroups:threadgroups
+                    threadsPerThreadgroup:threads_per_group];
+    [command_encoder endEncoding];
+    [command_buffer commit];
+
+    small_mask_texture = gpu_helper_.CreateSourceTexture(small_mask_buffer);
+  }
+#else
+  {
+    small_mask_texture = gpu_helper_.CreateDestinationTexture(
+        tensor_width, tensor_height,
+        mediapipe::GpuBufferFormat::kBGRA32);  // actually GL_RGBA8
+
+    // Go through CPU if not already a 2D texture (no direct conversion yet).
+    // Tensor::GetOpenGlTexture2dReadView() doesn't automatically convert types.
+    if (!input_tensors[0].ready_as_opengl_texture_2d()) {
+      (void)input_tensors[0].GetCpuReadView();
+    }
+
+    auto read_view = input_tensors[0].GetOpenGlTexture2dReadView();
+
+    gpu_helper_.BindFramebuffer(small_mask_texture);
+    glActiveTexture(GL_TEXTURE1);
+    glBindTexture(GL_TEXTURE_2D, read_view.name());
+    glUseProgram(mask_program_20_);
+    GlRender();
+    glBindTexture(GL_TEXTURE_2D, 0);
+    glFlush();
+  }
+#endif  // MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
+
+  // Upsample small mask into output.
+  mediapipe::GlTexture output_texture = gpu_helper_.CreateDestinationTexture(
+      output_width, output_height,
+      mediapipe::GpuBufferFormat::kBGRA32);  // actually GL_RGBA8
+
+  // Run shader, upsample result.
+  {
+    gpu_helper_.BindFramebuffer(output_texture);
+    glActiveTexture(GL_TEXTURE1);
+#if MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
+    glBindTexture(GL_TEXTURE_2D, small_mask_texture.id());
+#else
+    glBindTexture(GL_TEXTURE_2D, small_mask_texture.name());
+#endif  // MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
+    glUseProgram(upsample_program_);
+    GlRender();
+    glBindTexture(GL_TEXTURE_2D, 0);
+    glFlush();
+  }
+
+  // Send out image as GPU packet.
+  auto output_image = output_texture.GetFrame();
+  cc->Outputs().Tag(kMaskTag).Add(output_image.release(), cc->InputTimestamp());
+
+  // Cleanup
+  output_texture.Release();
+#endif  // !MEDIAPIPE_DISABLE_GPU
+
+  return absl::OkStatus();
+}
+
+void TensorsToSegmentationCalculator::GlRender() {
+#if !MEDIAPIPE_DISABLE_GPU
+  static const GLfloat square_vertices[] = {
+      -1.0f, -1.0f,  // bottom left
+      1.0f,  -1.0f,  // bottom right
+      -1.0f, 1.0f,   // top left
+      1.0f,  1.0f,   // top right
+  };
+  static const GLfloat texture_vertices[] = {
+      0.0f, 0.0f,  // bottom left
+      1.0f, 0.0f,  // bottom right
+      0.0f, 1.0f,  // top left
+      1.0f, 1.0f,  // top right
+  };
+
+  // vertex storage
+  GLuint vbo[2];
+  glGenBuffers(2, vbo);
+  GLuint vao;
+  glGenVertexArrays(1, &vao);
+  glBindVertexArray(vao);
+
+  // vbo 0
+  glBindBuffer(GL_ARRAY_BUFFER, vbo[0]);
+  glBufferData(GL_ARRAY_BUFFER, 4 * 2 * sizeof(GLfloat), square_vertices,
+               GL_STATIC_DRAW);
+  glEnableVertexAttribArray(ATTRIB_VERTEX);
+  glVertexAttribPointer(ATTRIB_VERTEX, 2, GL_FLOAT, 0, 0, nullptr);
+
+  // vbo 1
+  glBindBuffer(GL_ARRAY_BUFFER, vbo[1]);
+  glBufferData(GL_ARRAY_BUFFER, 4 * 2 * sizeof(GLfloat), texture_vertices,
+               GL_STATIC_DRAW);
+  glEnableVertexAttribArray(ATTRIB_TEXTURE_POSITION);
+  glVertexAttribPointer(ATTRIB_TEXTURE_POSITION, 2, GL_FLOAT, 0, 0, nullptr);
+
+  // draw
+  glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
+
+  // cleanup
+  glDisableVertexAttribArray(ATTRIB_VERTEX);
+  glDisableVertexAttribArray(ATTRIB_TEXTURE_POSITION);
+  glBindBuffer(GL_ARRAY_BUFFER, 0);
+  glBindVertexArray(0);
+  glDeleteVertexArrays(1, &vao);
+  glDeleteBuffers(2, vbo);
+#endif  // !MEDIAPIPE_DISABLE_GPU
+}
+
+absl::Status TensorsToSegmentationCalculator::LoadOptions(
+    CalculatorContext* cc) {
+  // Get calculator options specified in the graph.
+  options_ = cc->Options<::mediapipe::TensorsToSegmentationCalculatorOptions>();
+
+  return absl::OkStatus();
+}
+
+absl::Status TensorsToSegmentationCalculator::InitGpu(CalculatorContext* cc) {
+#if !MEDIAPIPE_DISABLE_GPU
+  MP_RETURN_IF_ERROR(gpu_helper_.RunInGlContext([this]() -> absl::Status {
+    // A shader to process a segmentation tensor into an output mask.
+    // Currently uses 4 channels for output, and sets R+A channels as mask
+    // value.
+#if MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
+    // GLES 3.1
+    const tflite::gpu::uint3 workgroup_size = {kWorkgroupSize, kWorkgroupSize,
+                                               1};
+    const std::string shader_header =
+        absl::StrCat(tflite::gpu::gl::GetShaderHeader(workgroup_size), R"(
+precision highp float;
+
+layout(rgba8, binding = 0) writeonly uniform highp image2D output_texture;
+
+uniform ivec2 out_size;
+)");
+    /* Shader defines will be inserted here. */
+
+    const std::string shader_src_main = R"(
+layout(std430, binding = 2) readonly buffer B0 {
+#ifdef TWO_CHANNEL_INPUT
+  vec2 elements[];
+#else
+  float elements[];
+#endif // TWO_CHANNEL_INPUT
+} input_data;   // data tensor
+
+void main() {
+  int out_width = out_size.x;
+  int out_height = out_size.y;
+
+  ivec2 gid = ivec2(gl_GlobalInvocationID.xy);
+  if (gid.x >= out_width || gid.y >= out_height) { return; }
+  int linear_index = gid.y * out_width + gid.x;
+
+#ifdef TWO_CHANNEL_INPUT
+  vec2 input_value = input_data.elements[linear_index];
+#else
+  vec2 input_value = vec2(input_data.elements[linear_index], 0.0);
+#endif // TWO_CHANNEL_INPUT
+
+// Run activation function.
+// One and only one of FN_SOFTMAX,FN_SIGMOID,FN_NONE will be defined.
+#ifdef FN_SOFTMAX
+  // Only two channel input tensor is supported.
+  vec2 input_px = input_value.rg;
+  float shift = max(input_px.r, input_px.g);
+  float softmax_denom = exp(input_px.r - shift) + exp(input_px.g - shift);
+  float new_mask_value =
+      exp(input_px[OUTPUT_LAYER_INDEX] - shift) / softmax_denom;
+#endif // FN_SOFTMAX
+
+#ifdef FN_SIGMOID
+  float new_mask_value = 1.0 / (exp(-input_value.r) + 1.0);
+#endif // FN_SIGMOID
+
+#ifdef FN_NONE
+  float new_mask_value = input_value.r;
+#endif // FN_NONE
+
+#ifdef FLIP_Y_COORD
+  int y_coord = out_height - gid.y - 1;
+#else
+  int y_coord = gid.y;
+#endif  // defined(FLIP_Y_COORD)
+  ivec2 output_coordinate = ivec2(gid.x, y_coord);
+
+  vec4 out_value = vec4(new_mask_value, 0.0, 0.0, new_mask_value);
+  imageStore(output_texture, output_coordinate, out_value);
+})";
+
+#elif MEDIAPIPE_METAL_ENABLED
+    // METAL
+    const std::string shader_header = R"(
+#include <metal_stdlib>
+using namespace metal;
+)";
+    /* Shader defines will be inserted here. */
+
+    const std::string shader_src_main = R"(
+kernel void segmentationKernel(
+#ifdef TWO_CHANNEL_INPUT
+    device float2* elements [[ buffer(0) ]],
+#else
+    device float* elements  [[ buffer(0) ]],
+#endif // TWO_CHANNEL_INPUT
+    texture2d<float, access::write> output_texture [[ texture(1) ]],
+    constant uint* out_size [[ buffer(2) ]],
+    uint2 gid [[ thread_position_in_grid ]])
+{
+  uint out_width = out_size[0];
+  uint out_height = out_size[1];
+
+  if (gid.x >= out_width || gid.y >= out_height) { return; }
+  uint linear_index = gid.y * out_width + gid.x;
+
+#ifdef TWO_CHANNEL_INPUT
+  float2 input_value = elements[linear_index];
+#else
+  float2 input_value = float2(elements[linear_index], 0.0);
+#endif // TWO_CHANNEL_INPUT
+
+// Run activation function.
+// One and only one of FN_SOFTMAX,FN_SIGMOID,FN_NONE will be defined.
+#ifdef FN_SOFTMAX
+  // Only two channel input tensor is supported.
+  float2 input_px = input_value.xy;
+  float shift = max(input_px.x, input_px.y);
+  float softmax_denom = exp(input_px.r - shift) + exp(input_px.g - shift);
+  float new_mask_value =
+      exp(input_px[OUTPUT_LAYER_INDEX] - shift) / softmax_denom;
+#endif // FN_SOFTMAX
+
+#ifdef FN_SIGMOID
+  float new_mask_value = 1.0 / (exp(-input_value.x) + 1.0);
+#endif // FN_SIGMOID
+
+#ifdef FN_NONE
+  float new_mask_value = input_value.x;
+#endif // FN_NONE
+
+#ifdef FLIP_Y_COORD
+  int y_coord = out_height - gid.y - 1;
+#else
+  int y_coord = gid.y;
+#endif  // defined(FLIP_Y_COORD)
+  uint2 output_coordinate = uint2(gid.x, y_coord);
+
+  float4 out_value = float4(new_mask_value, 0.0, 0.0, new_mask_value);
+  output_texture.write(out_value, output_coordinate);
+}
+)";
+
+#else
+    // GLES 2.0
+    const std::string shader_header = absl::StrCat(
+        std::string(mediapipe::kMediaPipeFragmentShaderPreamble), R"(
+DEFAULT_PRECISION(mediump, float)
+)");
+    /* Shader defines will be inserted here. */
+
+    const std::string shader_src_main = R"(
+in vec2 sample_coordinate;
+
+uniform sampler2D input_texture;
+
+#ifdef GL_ES
+#define fragColor gl_FragColor
+#else
+out vec4 fragColor;
+#endif  // defined(GL_ES)
+
+void main() {
+
+  vec4 input_value = texture2D(input_texture, sample_coordinate);
+  vec2 gid = sample_coordinate;
+
+  // Run activation function.
+  // One and only one of FN_SOFTMAX,FN_SIGMOID,FN_NONE will be defined.
+
+#ifdef FN_SOFTMAX
+  // Only two channel input tensor is supported.
+  vec2 input_px = input_value.rg;
+  float shift = max(input_px.r, input_px.g);
+  float softmax_denom = exp(input_px.r - shift) + exp(input_px.g - shift);
+  float new_mask_value =
+      exp(mix(input_px.r, input_px.g, float(OUTPUT_LAYER_INDEX)) - shift) /
+      softmax_denom;
+#endif // FN_SOFTMAX
+
+#ifdef FN_SIGMOID
+  float new_mask_value = 1.0 / (exp(-input_value.r) + 1.0);
+#endif // FN_SIGMOID
+
+#ifdef FN_NONE
+  float new_mask_value = input_value.r;
+#endif // FN_NONE
+
+#ifdef FLIP_Y_COORD
+  float y_coord = 1.0 - gid.y;
+#else
+  float y_coord = gid.y;
+#endif  // defined(FLIP_Y_COORD)
+  vec2 output_coordinate = vec2(gid.x, y_coord);
+
+  vec4 out_value = vec4(new_mask_value, 0.0, 0.0, new_mask_value);
+  fragColor = out_value;
+})";
+#endif  // MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
+
+    // Shader defines.
+    typedef mediapipe::TensorsToSegmentationCalculatorOptions Options;
+    const std::string output_layer_index =
+        "\n#define OUTPUT_LAYER_INDEX int(" +
+        std::to_string(options_.output_layer_index()) + ")";
+    const std::string flip_y_coord =
+        DoesGpuTextureStartAtBottom() ? "\n#define FLIP_Y_COORD" : "";
+    const std::string fn_none =
+        options_.activation() == Options::NONE ? "\n#define FN_NONE" : "";
+    const std::string fn_sigmoid =
+        options_.activation() == Options::SIGMOID ? "\n#define FN_SIGMOID" : "";
+    const std::string fn_softmax =
+        options_.activation() == Options::SOFTMAX ? "\n#define FN_SOFTMAX" : "";
+    const std::string two_channel = options_.activation() == Options::SOFTMAX
+                                        ? "\n#define TWO_CHANNEL_INPUT"
+                                        : "";
+    const std::string shader_defines =
+        absl::StrCat(output_layer_index, flip_y_coord, fn_softmax, fn_sigmoid,
+                     fn_none, two_channel);
+
+    // Build full shader.
+    const std::string shader_src_no_previous =
+        absl::StrCat(shader_header, shader_defines, shader_src_main);
+
+    // Vertex shader attributes.
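The define-assembly step above selects exactly one of `FN_NONE`/`FN_SIGMOID`/`FN_SOFTMAX` (plus `TWO_CHANNEL_INPUT` for softmax) and splices the result between the shader header and body. The selection logic can be sketched on its own with plain `std::string` concatenation in place of `absl::StrCat`; the enum and function name here are illustrative:

```cpp
#include <cassert>
#include <string>

// Mirrors TensorsToSegmentationCalculatorOptions::Activation.
enum class Activation { kNone, kSigmoid, kSoftmax };

// Builds the preprocessor defines inserted between shader header and body,
// following the selection logic in InitGpu above (illustrative sketch).
std::string BuildShaderDefines(Activation activation, int output_layer_index,
                               bool flip_y) {
  std::string defines = "\n#define OUTPUT_LAYER_INDEX int(" +
                        std::to_string(output_layer_index) + ")";
  if (flip_y) defines += "\n#define FLIP_Y_COORD";
  if (activation == Activation::kSoftmax) defines += "\n#define FN_SOFTMAX";
  if (activation == Activation::kSigmoid) defines += "\n#define FN_SIGMOID";
  if (activation == Activation::kNone) defines += "\n#define FN_NONE";
  if (activation == Activation::kSoftmax)
    defines += "\n#define TWO_CHANNEL_INPUT";
  return defines;
}
```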
+    const GLint attr_location[NUM_ATTRIBUTES] = {
+        ATTRIB_VERTEX,
+        ATTRIB_TEXTURE_POSITION,
+    };
+    const GLchar* attr_name[NUM_ATTRIBUTES] = {
+        "position",
+        "texture_coordinate",
+    };
+
+    // Main shader program & parameters
+#if MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
+    GlShader shader_without_previous;
+    MP_RETURN_IF_ERROR(GlShader::CompileShader(
+        GL_COMPUTE_SHADER, shader_src_no_previous, &shader_without_previous));
+    mask_program_31_ = absl::make_unique<GlProgram>();
+    MP_RETURN_IF_ERROR(GlProgram::CreateWithShader(shader_without_previous,
+                                                   mask_program_31_.get()));
+#elif MEDIAPIPE_METAL_ENABLED
+    id<MTLDevice> device = metal_helper_.mtlDevice;
+    NSString* library_source =
+        [NSString stringWithUTF8String:shader_src_no_previous.c_str()];
+    NSError* error = nil;
+    id<MTLLibrary> library = [device newLibraryWithSource:library_source
+                                                  options:nullptr
+                                                    error:&error];
+    RET_CHECK(library != nil) << "Couldn't create shader library "
+                              << [[error localizedDescription] UTF8String];
+    id<MTLFunction> kernel_func = nil;
+    kernel_func = [library newFunctionWithName:@"segmentationKernel"];
+    RET_CHECK(kernel_func != nil) << "Couldn't create kernel function.";
+    mask_program_ =
+        [device newComputePipelineStateWithFunction:kernel_func error:&error];
+    RET_CHECK(mask_program_ != nil) << "Couldn't create pipeline state "
+                                    << [[error localizedDescription] UTF8String];
+#else
+    mediapipe::GlhCreateProgram(
+        mediapipe::kBasicVertexShader, shader_src_no_previous.c_str(),
+        NUM_ATTRIBUTES, &attr_name[0], attr_location, &mask_program_20_);
+    RET_CHECK(mask_program_20_) << "Problem initializing the program.";
+    glUseProgram(mask_program_20_);
+    glUniform1i(glGetUniformLocation(mask_program_20_, "input_texture"), 1);
+#endif  // MEDIAPIPE_OPENGL_ES_VERSION >= MEDIAPIPE_OPENGL_ES_31
+
+    // Simple pass-through program, used for hardware upsampling.
+    mediapipe::GlhCreateProgram(
+        mediapipe::kBasicVertexShader, mediapipe::kBasicTexturedFragmentShader,
+        NUM_ATTRIBUTES, &attr_name[0], attr_location, &upsample_program_);
+    RET_CHECK(upsample_program_) << "Problem initializing the program.";
+    glUseProgram(upsample_program_);
+    glUniform1i(glGetUniformLocation(upsample_program_, "video_frame"), 1);
+
+    return absl::OkStatus();
+  }));
+#endif  // !MEDIAPIPE_DISABLE_GPU
+
+  return absl::OkStatus();
+}
+
+}  // namespace mediapipe
diff --git a/mediapipe/calculators/tensor/tensors_to_segmentation_calculator.proto b/mediapipe/calculators/tensor/tensors_to_segmentation_calculator.proto
new file mode 100644
index 000000000..1662576b6
--- /dev/null
+++ b/mediapipe/calculators/tensor/tensors_to_segmentation_calculator.proto
@@ -0,0 +1,46 @@
+// Copyright 2021 The MediaPipe Authors.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//      http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+syntax = "proto2";
+
+package mediapipe;
+
+import "mediapipe/framework/calculator.proto";
+import "mediapipe/gpu/gpu_origin.proto";
+
+message TensorsToSegmentationCalculatorOptions {
+  extend mediapipe.CalculatorOptions {
+    optional TensorsToSegmentationCalculatorOptions ext = 374311106;
+  }
+
+  // For CONVENTIONAL mode in OpenGL, textures start at the bottom and need
+  // to be flipped vertically as tensors are expected to start at the top.
+  // (DEFAULT or unset is interpreted as CONVENTIONAL.)
+  optional GpuOrigin.Mode gpu_origin = 1;
+
+  // Supported activation functions for filtering.
+  enum Activation {
+    NONE = 0;     // Assumes 1-channel input tensor.
+    SIGMOID = 1;  // Assumes 1-channel input tensor.
+    SOFTMAX = 2;  // Assumes 2-channel input tensor.
+  }
+  // Activation function to apply to input tensor.
+  // Softmax requires a 2-channel tensor; see output_layer_index below.
+  optional Activation activation = 2 [default = NONE];
+
+  // Channel to use for processing tensor.
+  // Only applies when using activation=SOFTMAX.
+  // Works on two-channel input tensor only.
+  optional int32 output_layer_index = 3 [default = 1];
+}
diff --git a/mediapipe/calculators/util/BUILD b/mediapipe/calculators/util/BUILD
index 1ee0fb9cc..62455f01f 100644
--- a/mediapipe/calculators/util/BUILD
+++ b/mediapipe/calculators/util/BUILD
@@ -859,6 +859,7 @@ cc_library(
         "//mediapipe/framework:calculator_framework",
         "//mediapipe/framework:timestamp",
         "//mediapipe/framework/formats:landmark_cc_proto",
+        "//mediapipe/framework/formats:rect_cc_proto",
         "//mediapipe/framework/port:ret_check",
         "//mediapipe/util/filtering:one_euro_filter",
         "//mediapipe/util/filtering:relative_velocity_filter",
diff --git a/mediapipe/calculators/util/detections_to_rects_calculator.cc b/mediapipe/calculators/util/detections_to_rects_calculator.cc
index 29836cb59..a1b13a8db 100644
--- a/mediapipe/calculators/util/detections_to_rects_calculator.cc
+++ b/mediapipe/calculators/util/detections_to_rects_calculator.cc
@@ -323,7 +323,7 @@ absl::Status DetectionsToRectsCalculator::ComputeRotation(
 DetectionSpec DetectionsToRectsCalculator::GetDetectionSpec(
     const CalculatorContext* cc) {
   absl::optional<std::pair<int, int>> image_size;
-  if (cc->Inputs().HasTag(kImageSizeTag)) {
+  if (HasTagValue(cc->Inputs(), kImageSizeTag)) {
     image_size = cc->Inputs().Tag(kImageSizeTag).Get<std::pair<int, int>>();
   }
diff --git a/mediapipe/calculators/util/detections_to_rects_calculator_test.cc b/mediapipe/calculators/util/detections_to_rects_calculator_test.cc
index f46640ab2..3eae1af9d 100644
--- a/mediapipe/calculators/util/detections_to_rects_calculator_test.cc
+++ b/mediapipe/calculators/util/detections_to_rects_calculator_test.cc
@@ -157,6 +157,12 @@ TEST(DetectionsToRectsCalculatorTest, DetectionKeyPointsToRect) {
                                       /*image_size=*/{640, 480});
   MP_ASSERT_OK(status_or_value);
   EXPECT_THAT(status_or_value.value(), RectEq(480, 360, 320, 240));
+
+  status_or_value = RunDetectionKeyPointsToRectCalculation(
+      /*detection=*/DetectionWithKeyPoints({{0.25f, 0.25f}, {0.75f, 0.75f}}),
+      /*image_size=*/{0, 0});
+  MP_ASSERT_OK(status_or_value);
+  EXPECT_THAT(status_or_value.value(), RectEq(0, 0, 0, 0));
 }
 
 TEST(DetectionsToRectsCalculatorTest, DetectionToNormalizedRect) {
diff --git a/mediapipe/calculators/util/landmarks_smoothing_calculator.cc b/mediapipe/calculators/util/landmarks_smoothing_calculator.cc
index fb2310610..6673816e7 100644
--- a/mediapipe/calculators/util/landmarks_smoothing_calculator.cc
+++ b/mediapipe/calculators/util/landmarks_smoothing_calculator.cc
@@ -18,6 +18,7 @@
 #include "mediapipe/calculators/util/landmarks_smoothing_calculator.pb.h"
 #include "mediapipe/framework/calculator_framework.h"
 #include "mediapipe/framework/formats/landmark.pb.h"
+#include "mediapipe/framework/formats/rect.pb.h"
 #include "mediapipe/framework/port/ret_check.h"
 #include "mediapipe/framework/timestamp.h"
 #include "mediapipe/util/filtering/one_euro_filter.h"
@@ -30,6 +31,7 @@ namespace {
 constexpr char kNormalizedLandmarksTag[] = "NORM_LANDMARKS";
 constexpr char kLandmarksTag[] = "LANDMARKS";
 constexpr char kImageSizeTag[] = "IMAGE_SIZE";
+constexpr char kObjectScaleRoiTag[] = "OBJECT_SCALE_ROI";
 constexpr char kNormalizedFilteredLandmarksTag[] = "NORM_FILTERED_LANDMARKS";
 constexpr char kFilteredLandmarksTag[] = "FILTERED_LANDMARKS";
@@ -94,6 +96,18 @@ float GetObjectScale(const LandmarkList& landmarks) {
   return (object_width + object_height) / 2.0f;
 }
 
+float GetObjectScale(const NormalizedRect& roi, const int image_width,
+                     const int image_height) {
+  const float object_width = roi.width() * image_width;
+  const float object_height =
+      roi.height() * image_height;
+
+  return (object_width + object_height) / 2.0f;
+}
+
+float GetObjectScale(const Rect& roi) {
+  return (roi.width() + roi.height()) / 2.0f;
+}
+
 // Abstract class for various landmarks filters.
 class LandmarksFilter {
  public:
@@ -103,6 +117,7 @@ class LandmarksFilter {
 
   virtual absl::Status Apply(const LandmarkList& in_landmarks,
                              const absl::Duration& timestamp,
+                             const absl::optional<float> object_scale_opt,
                              LandmarkList* out_landmarks) = 0;
 };
 
@@ -111,6 +126,7 @@ class NoFilter : public LandmarksFilter {
  public:
   absl::Status Apply(const LandmarkList& in_landmarks,
                      const absl::Duration& timestamp,
+                     const absl::optional<float> object_scale_opt,
                      LandmarkList* out_landmarks) override {
     *out_landmarks = in_landmarks;
     return absl::OkStatus();
@@ -136,13 +152,15 @@ class VelocityFilter : public LandmarksFilter {
 
   absl::Status Apply(const LandmarkList& in_landmarks,
                      const absl::Duration& timestamp,
+                     const absl::optional<float> object_scale_opt,
                      LandmarkList* out_landmarks) override {
     // Get value scale as inverse value of the object scale.
     // If value is too small smoothing will be disabled and landmarks will be
     // returned as is.
     float value_scale = 1.0f;
     if (!disable_value_scaling_) {
-      const float object_scale = GetObjectScale(in_landmarks);
+      const float object_scale =
+          object_scale_opt ?
+              *object_scale_opt : GetObjectScale(in_landmarks);
       if (object_scale < min_allowed_object_scale_) {
         *out_landmarks = in_landmarks;
         return absl::OkStatus();
@@ -205,12 +223,14 @@ class VelocityFilter : public LandmarksFilter {
 class OneEuroFilterImpl : public LandmarksFilter {
  public:
   OneEuroFilterImpl(double frequency, double min_cutoff, double beta,
-                    double derivate_cutoff, float min_allowed_object_scale)
+                    double derivate_cutoff, float min_allowed_object_scale,
+                    bool disable_value_scaling)
       : frequency_(frequency),
         min_cutoff_(min_cutoff),
         beta_(beta),
         derivate_cutoff_(derivate_cutoff),
-        min_allowed_object_scale_(min_allowed_object_scale) {}
+        min_allowed_object_scale_(min_allowed_object_scale),
+        disable_value_scaling_(disable_value_scaling) {}
 
   absl::Status Reset() override {
     x_filters_.clear();
@@ -221,16 +241,24 @@ class OneEuroFilterImpl : public LandmarksFilter {
 
   absl::Status Apply(const LandmarkList& in_landmarks,
                      const absl::Duration& timestamp,
+                     const absl::optional<float> object_scale_opt,
                      LandmarkList* out_landmarks) override {
     // Initialize filters once.
     MP_RETURN_IF_ERROR(InitializeFiltersIfEmpty(in_landmarks.landmark_size()));
 
-    const float object_scale = GetObjectScale(in_landmarks);
-    if (object_scale < min_allowed_object_scale_) {
-      *out_landmarks = in_landmarks;
-      return absl::OkStatus();
+    // Get value scale as inverse value of the object scale.
+    // If value is too small smoothing will be disabled and landmarks will be
+    // returned as is.
+    float value_scale = 1.0f;
+    if (!disable_value_scaling_) {
+      const float object_scale =
+          object_scale_opt ? *object_scale_opt : GetObjectScale(in_landmarks);
+      if (object_scale < min_allowed_object_scale_) {
+        *out_landmarks = in_landmarks;
+        return absl::OkStatus();
+      }
+      value_scale = 1.0f / object_scale;
     }
-    const float value_scale = 1.0f / object_scale;
 
     // Filter landmarks. Every axis of every landmark is filtered separately.
for (int i = 0; i < in_landmarks.landmark_size(); ++i) { @@ -277,6 +305,7 @@ class OneEuroFilterImpl : public LandmarksFilter { double beta_; double derivate_cutoff_; double min_allowed_object_scale_; + bool disable_value_scaling_; std::vector<OneEuroFilter> x_filters_; std::vector<OneEuroFilter> y_filters_; @@ -292,6 +321,10 @@ // IMAGE_SIZE: A std::pair<int, int> representation of image width and height. // Required to perform all computations in absolute coordinates to avoid any // influence of normalized values. +// OBJECT_SCALE_ROI (optional): A NormRect or Rect (depending on the format of +// input landmarks) used to determine the object scale for some of the +// filters. If not provided - object scale will be calculated from +// landmarks. // // Outputs: // NORM_FILTERED_LANDMARKS: A NormalizedLandmarkList of smoothed landmarks. @@ -301,6 +334,7 @@ // calculator: "LandmarksSmoothingCalculator" // input_stream: "NORM_LANDMARKS:pose_landmarks" // input_stream: "IMAGE_SIZE:image_size" +// input_stream: "OBJECT_SCALE_ROI:roi" // output_stream: "NORM_FILTERED_LANDMARKS:pose_landmarks_filtered" // options: { // [mediapipe.LandmarksSmoothingCalculatorOptions.ext] { @@ -330,9 +364,17 @@ absl::Status LandmarksSmoothingCalculator::GetContract(CalculatorContract* cc) { cc->Outputs() .Tag(kNormalizedFilteredLandmarksTag) .Set<NormalizedLandmarkList>(); + + if (cc->Inputs().HasTag(kObjectScaleRoiTag)) { + cc->Inputs().Tag(kObjectScaleRoiTag).Set<NormalizedRect>(); + } } else { cc->Inputs().Tag(kLandmarksTag).Set<LandmarkList>(); cc->Outputs().Tag(kFilteredLandmarksTag).Set<LandmarkList>(); + + if (cc->Inputs().HasTag(kObjectScaleRoiTag)) { + cc->Inputs().Tag(kObjectScaleRoiTag).Set<Rect>(); + } } return absl::OkStatus(); @@ -357,7 +399,8 @@ absl::Status LandmarksSmoothingCalculator::Open(CalculatorContext* cc) { options.one_euro_filter().min_cutoff(), options.one_euro_filter().beta(), options.one_euro_filter().derivate_cutoff(), - options.one_euro_filter().min_allowed_object_scale()); 
+ options.one_euro_filter().min_allowed_object_scale(), + options.one_euro_filter().disable_value_scaling()); } else { RET_CHECK_FAIL() << "Landmarks filter is either not specified or not supported"; @@ -389,13 +432,20 @@ absl::Status LandmarksSmoothingCalculator::Process(CalculatorContext* cc) { std::tie(image_width, image_height) = cc->Inputs().Tag(kImageSizeTag).Get<std::pair<int, int>>(); + absl::optional<float> object_scale; + if (cc->Inputs().HasTag(kObjectScaleRoiTag) && + !cc->Inputs().Tag(kObjectScaleRoiTag).IsEmpty()) { + auto& roi = cc->Inputs().Tag(kObjectScaleRoiTag).Get<NormalizedRect>(); + object_scale = GetObjectScale(roi, image_width, image_height); + } + auto in_landmarks = absl::make_unique<LandmarkList>(); NormalizedLandmarksToLandmarks(in_norm_landmarks, image_width, image_height, in_landmarks.get()); auto out_landmarks = absl::make_unique<LandmarkList>(); - MP_RETURN_IF_ERROR(landmarks_filter_->Apply(*in_landmarks, timestamp, - out_landmarks.get())); + MP_RETURN_IF_ERROR(landmarks_filter_->Apply( + *in_landmarks, timestamp, object_scale, out_landmarks.get())); auto out_norm_landmarks = absl::make_unique<NormalizedLandmarkList>(); LandmarksToNormalizedLandmarks(*out_landmarks, image_width, image_height, @@ -408,9 +458,16 @@ absl::Status LandmarksSmoothingCalculator::Process(CalculatorContext* cc) { const auto& in_landmarks = cc->Inputs().Tag(kLandmarksTag).Get<LandmarkList>(); + absl::optional<float> object_scale; + if (cc->Inputs().HasTag(kObjectScaleRoiTag) && + !cc->Inputs().Tag(kObjectScaleRoiTag).IsEmpty()) { + auto& roi = cc->Inputs().Tag(kObjectScaleRoiTag).Get<Rect>(); + object_scale = GetObjectScale(roi); + } + auto out_landmarks = absl::make_unique<LandmarkList>(); - MP_RETURN_IF_ERROR( - landmarks_filter_->Apply(in_landmarks, timestamp, out_landmarks.get())); + MP_RETURN_IF_ERROR(landmarks_filter_->Apply( + in_landmarks, timestamp, object_scale, out_landmarks.get())); cc->Outputs() .Tag(kFilteredLandmarksTag) diff --git a/mediapipe/calculators/util/landmarks_smoothing_calculator.proto b/mediapipe/calculators/util/landmarks_smoothing_calculator.proto index 
7699287c9..017facb30 100644 --- a/mediapipe/calculators/util/landmarks_smoothing_calculator.proto +++ b/mediapipe/calculators/util/landmarks_smoothing_calculator.proto @@ -41,9 +41,9 @@ message LandmarksSmoothingCalculatorOptions { optional float min_allowed_object_scale = 3 [default = 1e-6]; // Disable value scaling based on object size and use `1.0` instead. - // Value scale is calculated as inverse value of object size. Object size is - // calculated as maximum side of rectangular bounding box of the object in - // XY plane. + // If not disabled, value scale is calculated as inverse value of object + // size. Object size is calculated as maximum side of rectangular bounding + // box of the object in XY plane. optional bool disable_value_scaling = 4 [default = false]; } @@ -72,6 +72,12 @@ message LandmarksSmoothingCalculatorOptions { // If calculated object scale is less than given value smoothing will be // disabled and landmarks will be returned as is. optional float min_allowed_object_scale = 5 [default = 1e-6]; + + // Disable value scaling based on object size and use `1.0` instead. + // If not disabled, value scale is calculated as inverse value of object + // size. Object size is calculated as maximum side of rectangular bounding + // box of the object in XY plane. + optional bool disable_value_scaling = 6 [default = false]; } oneof filter_options { diff --git a/mediapipe/calculators/util/world_landmark_projection_calculator.cc b/mediapipe/calculators/util/world_landmark_projection_calculator.cc index 28cf9498d..bcd7352a2 100644 --- a/mediapipe/calculators/util/world_landmark_projection_calculator.cc +++ b/mediapipe/calculators/util/world_landmark_projection_calculator.cc @@ -40,7 +40,7 @@ constexpr char kRectTag[] = "NORM_RECT"; // Input: // LANDMARKS: A LandmarkList representing world landmarks in the rectangle. // NORM_RECT: An NormalizedRect representing a normalized rectangle in image -// coordinates. +// coordinates. 
(Optional) // // Output: // LANDMARKS: A LandmarkList representing world landmarks projected (rotated @@ -59,7 +59,9 @@ class WorldLandmarkProjectionCalculator : public CalculatorBase { public: static absl::Status GetContract(CalculatorContract* cc) { cc->Inputs().Tag(kLandmarksTag).Set<LandmarkList>(); - cc->Inputs().Tag(kRectTag).Set<NormalizedRect>(); + if (cc->Inputs().HasTag(kRectTag)) { + cc->Inputs().Tag(kRectTag).Set<NormalizedRect>(); + } cc->Outputs().Tag(kLandmarksTag).Set<LandmarkList>(); return absl::OkStatus(); @@ -74,13 +76,24 @@ class WorldLandmarkProjectionCalculator : public CalculatorBase { absl::Status Process(CalculatorContext* cc) override { // Check that landmarks and rect are not empty. if (cc->Inputs().Tag(kLandmarksTag).IsEmpty() || - cc->Inputs().Tag(kRectTag).IsEmpty()) { + (cc->Inputs().HasTag(kRectTag) && + cc->Inputs().Tag(kRectTag).IsEmpty())) { return absl::OkStatus(); } const auto& in_landmarks = cc->Inputs().Tag(kLandmarksTag).Get<LandmarkList>(); - const auto& in_rect = cc->Inputs().Tag(kRectTag).Get<NormalizedRect>(); + std::function<void(const Landmark&, Landmark*)> rotate_fn; + if (cc->Inputs().HasTag(kRectTag)) { + const auto& in_rect = cc->Inputs().Tag(kRectTag).Get<NormalizedRect>(); + const float cosa = std::cos(in_rect.rotation()); + const float sina = std::sin(in_rect.rotation()); + rotate_fn = [cosa, sina](const Landmark& in_landmark, + Landmark* out_landmark) { + out_landmark->set_x(cosa * in_landmark.x() - sina * in_landmark.y()); + out_landmark->set_y(sina * in_landmark.x() + cosa * in_landmark.y()); + }; + } auto out_landmarks = absl::make_unique<LandmarkList>(); for (int i = 0; i < in_landmarks.landmark_size(); ++i) { @@ -89,11 +102,9 @@ class WorldLandmarkProjectionCalculator : public CalculatorBase { Landmark* out_landmark = out_landmarks->add_landmark(); *out_landmark = in_landmark; - const float angle = in_rect.rotation(); - out_landmark->set_x(std::cos(angle) * in_landmark.x() - - std::sin(angle) * in_landmark.y()); - out_landmark->set_y(std::sin(angle) * in_landmark.x() + - std::cos(angle) * in_landmark.y()); + if (rotate_fn) { + rotate_fn(in_landmark, 
out_landmark); + } } cc->Outputs() diff --git a/mediapipe/examples/android/src/java/com/google/mediapipe/apps/selfiesegmentationgpu/BUILD b/mediapipe/examples/android/src/java/com/google/mediapipe/apps/selfiesegmentationgpu/BUILD new file mode 100644 index 000000000..6bfcf34c1 --- /dev/null +++ b/mediapipe/examples/android/src/java/com/google/mediapipe/apps/selfiesegmentationgpu/BUILD @@ -0,0 +1,60 @@ +# Copyright 2021 The MediaPipe Authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +licenses(["notice"]) + +package(default_visibility = ["//visibility:private"]) + +cc_binary( + name = "libmediapipe_jni.so", + linkshared = 1, + linkstatic = 1, + deps = [ + "//mediapipe/graphs/selfie_segmentation:selfie_segmentation_gpu_deps", + "//mediapipe/java/com/google/mediapipe/framework/jni:mediapipe_framework_jni", + ], +) + +cc_library( + name = "mediapipe_jni_lib", + srcs = [":libmediapipe_jni.so"], + alwayslink = 1, +) + +android_binary( + name = "selfiesegmentationgpu", + srcs = glob(["*.java"]), + assets = [ + "//mediapipe/graphs/selfie_segmentation:selfie_segmentation_gpu.binarypb", + "//mediapipe/modules/selfie_segmentation:selfie_segmentation.tflite", + ], + assets_dir = "", + manifest = "//mediapipe/examples/android/src/java/com/google/mediapipe/apps/basic:AndroidManifest.xml", + manifest_values = { + "applicationId": "com.google.mediapipe.apps.selfiesegmentationgpu", + "appName": "Selfie Segmentation", + "mainActivity": 
"com.google.mediapipe.apps.basic.MainActivity", + "cameraFacingFront": "True", + "binaryGraphName": "selfie_segmentation_gpu.binarypb", + "inputVideoStreamName": "input_video", + "outputVideoStreamName": "output_video", + "flipFramesVertically": "True", + "converterNumBuffers": "2", + }, + multidex = "native", + deps = [ + ":mediapipe_jni_lib", + "//mediapipe/examples/android/src/java/com/google/mediapipe/apps/basic:basic_lib", + ], +) diff --git a/mediapipe/examples/desktop/autoflip/calculators/face_box_adjuster_calculator.proto b/mediapipe/examples/desktop/autoflip/calculators/face_box_adjuster_calculator.proto index dd43de9da..92195f2a0 100644 --- a/mediapipe/examples/desktop/autoflip/calculators/face_box_adjuster_calculator.proto +++ b/mediapipe/examples/desktop/autoflip/calculators/face_box_adjuster_calculator.proto @@ -49,7 +49,23 @@ message FaceBoxAdjusterCalculatorOptions { optional float ipd_face_box_height_ratio = 7 [default = 0.3131]; // The max look up angle before considering the eye distance unstable. - optional float max_head_tilt_angle_deg = 8 [default = 12.0]; + optional float max_head_tilt_angle_deg = 8 [default = 5.0]; + // The min look up angle (i.e. looking down) before considering the eye + // distance unstable. + optional float min_head_tilt_angle_deg = 10 [default = -18.0]; + // The max look right angle before considering the eye distance unstable. + optional float max_head_pan_angle_deg = 11 [default = 25.0]; + // The min look right angle (i.e. looking left) before considering the eye + // distance unstable. + optional float min_head_pan_angle_deg = 12 [default = -25.0]; + + // Update rate for motion history, valid values [0.0, 1.0]. + optional float motion_history_alpha = 13 [default = 0.5]; + + // Max value of head motion (max of current or history) to be considered still + // stable. 
+ optional float head_motion_threshold = 14 [default = 10.0]; + // The max amount of time to use an old eye distance when the face look angle // is unstable. optional int32 max_facesize_history_us = 9 [default = 8000000]; diff --git a/mediapipe/examples/desktop/autoflip/subgraph/face_detection_subgraph.pbtxt b/mediapipe/examples/desktop/autoflip/subgraph/face_detection_subgraph.pbtxt index 2a40f1d06..4024f355a 100644 --- a/mediapipe/examples/desktop/autoflip/subgraph/face_detection_subgraph.pbtxt +++ b/mediapipe/examples/desktop/autoflip/subgraph/face_detection_subgraph.pbtxt @@ -14,8 +14,8 @@ node: { output_stream: "LETTERBOX_PADDING:letterbox_padding" options: { [mediapipe.ImageTransformationCalculatorOptions.ext] { - output_width: 256 - output_height: 256 + output_width: 192 + output_height: 192 scale_mode: FIT } } @@ -50,19 +50,17 @@ node { output_side_packet: "anchors" options: { [mediapipe.SsdAnchorsCalculatorOptions.ext] { - num_layers: 4 - min_scale: 0.15625 + num_layers: 1 + min_scale: 0.1484375 max_scale: 0.75 - input_size_height: 256 - input_size_width: 256 + input_size_height: 192 + input_size_width: 192 anchor_offset_x: 0.5 anchor_offset_y: 0.5 - strides: 16 - strides: 32 - strides: 32 - strides: 32 + strides: 4 aspect_ratios: 1.0 fixed_anchor_size: true + interpolated_scale_aspect_ratio: 0.0 } } } @@ -78,7 +76,7 @@ node { options: { [mediapipe.TfLiteTensorsToDetectionsCalculatorOptions.ext] { num_classes: 1 - num_boxes: 896 + num_boxes: 2304 num_coords: 16 box_coord_offset: 0 keypoint_coord_offset: 4 @@ -87,11 +85,11 @@ node { sigmoid_score: true score_clipping_thresh: 100.0 reverse_output_order: true - x_scale: 256.0 - y_scale: 256.0 - h_scale: 256.0 - w_scale: 256.0 - min_score_thresh: 0.65 + x_scale: 192.0 + y_scale: 192.0 + h_scale: 192.0 + w_scale: 192.0 + min_score_thresh: 0.6 } } } diff --git a/mediapipe/examples/desktop/selfie_segmentation/BUILD b/mediapipe/examples/desktop/selfie_segmentation/BUILD new file mode 100644 index 
000000000..ae93aa94c --- /dev/null +++ b/mediapipe/examples/desktop/selfie_segmentation/BUILD @@ -0,0 +1,34 @@ +# Copyright 2021 The MediaPipe Authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +licenses(["notice"]) + +package(default_visibility = ["//mediapipe/examples:__subpackages__"]) + +cc_binary( + name = "selfie_segmentation_cpu", + deps = [ + "//mediapipe/examples/desktop:demo_run_graph_main", + "//mediapipe/graphs/selfie_segmentation:selfie_segmentation_cpu_deps", + ], +) + +# Linux only +cc_binary( + name = "selfie_segmentation_gpu", + deps = [ + "//mediapipe/examples/desktop:demo_run_graph_main_gpu", + "//mediapipe/graphs/selfie_segmentation:selfie_segmentation_gpu_deps", + ], +) diff --git a/mediapipe/examples/ios/selfiesegmentationgpu/BUILD b/mediapipe/examples/ios/selfiesegmentationgpu/BUILD new file mode 100644 index 000000000..884ac95a5 --- /dev/null +++ b/mediapipe/examples/ios/selfiesegmentationgpu/BUILD @@ -0,0 +1,69 @@ +# Copyright 2021 The MediaPipe Authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +load( + "@build_bazel_rules_apple//apple:ios.bzl", + "ios_application", +) +load( + "//mediapipe/examples/ios:bundle_id.bzl", + "BUNDLE_ID_PREFIX", + "example_provisioning", +) + +licenses(["notice"]) + +MIN_IOS_VERSION = "10.0" + +alias( + name = "selfiesegmentationgpu", + actual = "SelfieSegmentationGpuApp", +) + +ios_application( + name = "SelfieSegmentationGpuApp", + app_icons = ["//mediapipe/examples/ios/common:AppIcon"], + bundle_id = BUNDLE_ID_PREFIX + ".SelfieSegmentationGpu", + families = [ + "iphone", + "ipad", + ], + infoplists = [ + "//mediapipe/examples/ios/common:Info.plist", + "Info.plist", + ], + minimum_os_version = MIN_IOS_VERSION, + provisioning_profile = example_provisioning(), + deps = [ + ":SelfieSegmentationGpuAppLibrary", + "@ios_opencv//:OpencvFramework", + ], +) + +objc_library( + name = "SelfieSegmentationGpuAppLibrary", + data = [ + "//mediapipe/graphs/selfie_segmentation:selfie_segmentation_gpu.binarypb", + "//mediapipe/modules/selfie_segmentation:selfie_segmentation.tflite", + ], + deps = [ + "//mediapipe/examples/ios/common:CommonMediaPipeAppLibrary", + ] + select({ + "//mediapipe:ios_i386": [], + "//mediapipe:ios_x86_64": [], + "//conditions:default": [ + "//mediapipe/graphs/selfie_segmentation:selfie_segmentation_gpu_deps", + ], + }), +) diff --git a/mediapipe/examples/ios/selfiesegmentationgpu/Info.plist b/mediapipe/examples/ios/selfiesegmentationgpu/Info.plist new file mode 100644 index 000000000..e4349f567 --- /dev/null +++ b/mediapipe/examples/ios/selfiesegmentationgpu/Info.plist @@ -0,0 +1,14 @@ + + + + + CameraPosition + front + GraphOutputStream + output_video + GraphInputStream + input_video + GraphName + selfie_segmentation_gpu + + diff --git a/mediapipe/framework/BUILD b/mediapipe/framework/BUILD index f74e09fc6..109625bbb 100644 --- a/mediapipe/framework/BUILD +++ b/mediapipe/framework/BUILD @@ -225,7 +225,7 @@ 
cc_library( "//mediapipe/framework:stream_handler_cc_proto", "//mediapipe/framework/port:any_proto", "//mediapipe/framework/port:status", - "//mediapipe/framework/tool:options_util", + "//mediapipe/framework/tool:options_map", "//mediapipe/framework/tool:tag_map", "@com_google_absl//absl/memory", ], @@ -473,7 +473,7 @@ cc_library( "//mediapipe/framework:calculator_cc_proto", "//mediapipe/framework/port:any_proto", "//mediapipe/framework/port:logging", - "//mediapipe/framework/tool:options_util", + "//mediapipe/framework/tool:options_map", "@com_google_absl//absl/base:core_headers", "@com_google_absl//absl/strings", ], diff --git a/mediapipe/framework/api2/README.md b/mediapipe/framework/api2/README.md index 32a699bd2..eb53dd67e 100644 --- a/mediapipe/framework/api2/README.md +++ b/mediapipe/framework/api2/README.md @@ -1,4 +1,4 @@ -# Experimental new APIs +# New MediaPipe APIs This directory defines new APIs for MediaPipe: @@ -6,13 +6,12 @@ This directory defines new APIs for MediaPipe: - Builder API, for assembling CalculatorGraphConfigs with C++, as an alternative to using the proto API directly. -The code is working, and the new APIs interoperate fully with the existing -framework code. They are considered a work in progress, but are being released -now so we can begin adopting them in our calculators. +The new APIs interoperate fully with the existing framework code, and we are +adopting them in our calculators. We are still making improvements, and the +placement of this code under the `mediapipe::api2` namespace is not final. -Developers are welcome to try out these APIs as early adopters, but should -expect breaking changes. The placement of this code under the `mediapipe::api2` -namespace is not final. +Developers are welcome to try out these APIs as early adopters, but there may be +breaking changes. 
## Node API diff --git a/mediapipe/framework/calculator_contract.h b/mediapipe/framework/calculator_contract.h index 9ff189ffb..a476a6739 100644 --- a/mediapipe/framework/calculator_contract.h +++ b/mediapipe/framework/calculator_contract.h @@ -29,7 +29,7 @@ #include "mediapipe/framework/port.h" #include "mediapipe/framework/port/any_proto.h" #include "mediapipe/framework/status_handler.pb.h" -#include "mediapipe/framework/tool/options_util.h" +#include "mediapipe/framework/tool/options_map.h" namespace mediapipe { diff --git a/mediapipe/framework/calculator_state.h b/mediapipe/framework/calculator_state.h index 8a50f5d8e..f2af95725 100644 --- a/mediapipe/framework/calculator_state.h +++ b/mediapipe/framework/calculator_state.h @@ -32,7 +32,7 @@ #include "mediapipe/framework/packet_set.h" #include "mediapipe/framework/port.h" #include "mediapipe/framework/port/any_proto.h" -#include "mediapipe/framework/tool/options_util.h" +#include "mediapipe/framework/tool/options_map.h" namespace mediapipe { diff --git a/mediapipe/framework/tool/BUILD b/mediapipe/framework/tool/BUILD index 890889a18..b5fca7e9f 100644 --- a/mediapipe/framework/tool/BUILD +++ b/mediapipe/framework/tool/BUILD @@ -154,12 +154,25 @@ cc_test( ], ) +cc_library( + name = "options_map", + hdrs = ["options_map.h"], + visibility = ["//mediapipe/framework:mediapipe_internal"], + deps = [ + "//mediapipe/framework:calculator_cc_proto", + "//mediapipe/framework/port:any_proto", + "//mediapipe/framework/port:status", + "//mediapipe/framework/tool:type_util", + ], +) + cc_library( name = "options_util", srcs = ["options_util.cc"], hdrs = ["options_util.h"], visibility = ["//mediapipe/framework:mediapipe_internal"], deps = [ + ":options_map", ":proto_util_lite", "//mediapipe/framework:calculator_cc_proto", "//mediapipe/framework:collection", @@ -199,17 +212,6 @@ mediapipe_cc_test( ], ) -cc_library( - name = "packet_util", - hdrs = ["packet_util.h"], - visibility = ["//visibility:public"], - deps = [ - 
"//mediapipe/framework:packet", - "//mediapipe/framework/port:statusor", - "@org_tensorflow//tensorflow/core:protos_all_cc", - ], -) - cc_library( name = "proto_util_lite", srcs = ["proto_util_lite.cc"], @@ -681,6 +683,7 @@ cc_library( "//mediapipe/framework/port:logging", "//mediapipe/framework/port:ret_check", "//mediapipe/framework/port:status", + "//mediapipe/framework/stream_handler:immediate_input_stream_handler", "//mediapipe/framework/tool:switch_container_cc_proto", "@com_google_absl//absl/strings", ], @@ -706,6 +709,7 @@ cc_library( "//mediapipe/framework/port:logging", "//mediapipe/framework/port:ret_check", "//mediapipe/framework/port:status", + "//mediapipe/framework/stream_handler:immediate_input_stream_handler", "//mediapipe/framework/tool:switch_container_cc_proto", "@com_google_absl//absl/strings", ], diff --git a/mediapipe/framework/tool/options_map.h b/mediapipe/framework/tool/options_map.h new file mode 100644 index 000000000..242ffe161 --- /dev/null +++ b/mediapipe/framework/tool/options_map.h @@ -0,0 +1,107 @@ +#ifndef MEDIAPIPE_FRAMEWORK_TOOL_OPTIONS_MAP_H_ +#define MEDIAPIPE_FRAMEWORK_TOOL_OPTIONS_MAP_H_ + +#include <map> +#include <memory> +#include <type_traits> + +#include "mediapipe/framework/calculator.pb.h" +#include "mediapipe/framework/port/any_proto.h" +#include "mediapipe/framework/port/status.h" +#include "mediapipe/framework/tool/type_util.h" + +namespace mediapipe { + +namespace tool { + +// A compile-time detector for the constant |T::ext|. 
+template <typename T> +struct IsExtension { + private: + template <typename U> + static char test(decltype(&U::ext)); + + template <typename U> + static int test(...); + + public: + static constexpr bool value = (sizeof(test<T>(0)) == sizeof(char)); +}; + +template <class T, + typename std::enable_if<IsExtension<T>::value, int>::type = 0> +void GetExtension(const CalculatorOptions& options, T* result) { + if (options.HasExtension(T::ext)) { + *result = options.GetExtension(T::ext); + } +} + +template <class T, + typename std::enable_if<!IsExtension<T>::value, int>::type = 0> +void GetExtension(const CalculatorOptions& options, T* result) {} + +template <class T> +void GetNodeOptions(const CalculatorGraphConfig::Node& node_config, T* result) { +#if defined(MEDIAPIPE_PROTO_LITE) && defined(MEDIAPIPE_PROTO_THIRD_PARTY) + // protobuf::Any is unavailable with third_party/protobuf:protobuf-lite. +#else + for (const mediapipe::protobuf::Any& options : node_config.node_options()) { + if (options.Is<T>()) { + options.UnpackTo(result); + } + } +#endif +} + +// A map from object type to object. +class TypeMap { + public: + template <class T> + bool Has() const { + return content_.count(TypeId<T>()) > 0; + } + template <class T> + T* Get() const { + if (!Has<T>()) { + content_[TypeId<T>()] = std::make_shared<T>(); + } + return static_cast<T*>(content_[TypeId<T>()].get()); + } + + private: + mutable std::map<size_t, std::shared_ptr<void>> content_; +}; + +// Extracts the options message of a specified type from a +// CalculatorGraphConfig::Node. +class OptionsMap { + public: + OptionsMap& Initialize(const CalculatorGraphConfig::Node& node_config) { + node_config_ = &node_config; + return *this; + } + + // Returns the options data for a CalculatorGraphConfig::Node, from + // either "options" or "node_options" using either GetExtension or UnpackTo. 
+ template <class T> + const T& Get() const { + if (options_.Has<T>()) { + return *options_.Get<T>(); + } + T* result = options_.Get<T>(); + if (node_config_->has_options()) { + GetExtension(node_config_->options(), result); + } else { + GetNodeOptions(*node_config_, result); + } + return *result; + } + + const CalculatorGraphConfig::Node* node_config_; + TypeMap options_; +}; + +} // namespace tool +} // namespace mediapipe + +#endif // MEDIAPIPE_FRAMEWORK_TOOL_OPTIONS_MAP_H_ diff --git a/mediapipe/framework/tool/options_util.h b/mediapipe/framework/tool/options_util.h index da943a121..520e92a22 100644 --- a/mediapipe/framework/tool/options_util.h +++ b/mediapipe/framework/tool/options_util.h @@ -20,6 +20,7 @@ #include "mediapipe/framework/packet.h" #include "mediapipe/framework/packet_set.h" #include "mediapipe/framework/port/any_proto.h" +#include "mediapipe/framework/tool/options_map.h" #include "mediapipe/framework/tool/type_util.h" namespace mediapipe { @@ -34,64 +35,6 @@ inline T MergeOptions(const T& base, const T& options) { return result; } -// A compile-time detector for the constant |T::ext|. -template <typename T> -struct IsExtension { - private: - template <typename U> - static char test(decltype(&U::ext)); - - template <typename U> - static int test(...); - - public: - static constexpr bool value = (sizeof(test<T>(0)) == sizeof(char)); -}; - -// A map from object type to object. 
-class TypeMap { - public: - template <class T> - bool Has() const { - return content_.count(TypeId<T>()) > 0; - } - template <class T> - T* Get() const { - if (!Has<T>()) { - content_[TypeId<T>()] = std::make_shared<T>(); - } - return static_cast<T*>(content_[TypeId<T>()].get()); - } - - private: - mutable std::map<size_t, std::shared_ptr<void>> content_; -}; - -template <class T, - typename std::enable_if<IsExtension<T>::value, int>::type = 0> -void GetExtension(const CalculatorOptions& options, T* result) { - if (options.HasExtension(T::ext)) { - *result = options.GetExtension(T::ext); - } -} - -template <class T, - typename std::enable_if<!IsExtension<T>::value, int>::type = 0> -void GetExtension(const CalculatorOptions& options, T* result) {} - -template <class T> -void GetNodeOptions(const CalculatorGraphConfig::Node& node_config, T* result) { -#if defined(MEDIAPIPE_PROTO_LITE) && defined(MEDIAPIPE_PROTO_THIRD_PARTY) - // protobuf::Any is unavailable with third_party/protobuf:protobuf-lite. -#else - for (const mediapipe::protobuf::Any& options : node_config.node_options()) { - if (options.Is<T>()) { - options.UnpackTo(result); - } - } -#endif -} - // Combine a base options message with an optional side packet. The specified // packet can hold either the specified options type T or CalculatorOptions. // Fields are either replaced or merged depending on field merge_fields. @@ -132,35 +75,6 @@ inline T RetrieveOptions(const T& base, const InputStreamShardSet& stream_set, return base; } -// Extracts the options message of a specified type from a -// CalculatorGraphConfig::Node. -class OptionsMap { - public: - OptionsMap& Initialize(const CalculatorGraphConfig::Node& node_config) { - node_config_ = &node_config; - return *this; - } - - // Returns the options data for a CalculatorGraphConfig::Node, from - // either "options" or "node_options" using either GetExtension or UnpackTo. 
- template <class T> - const T& Get() const { - if (options_.Has<T>()) { - return *options_.Get<T>(); - } - T* result = options_.Get<T>(); - if (node_config_->has_options()) { - GetExtension(node_config_->options(), result); - } else { - GetNodeOptions(*node_config_, result); - } - return *result; - } - - const CalculatorGraphConfig::Node* node_config_; - TypeMap options_; -}; - // Finds the descriptor for a protobuf. const proto_ns::Descriptor* GetProtobufDescriptor(const std::string& type_name); diff --git a/mediapipe/framework/tool/packet_util.h b/mediapipe/framework/tool/packet_util.h deleted file mode 100644 index 69d296f40..000000000 --- a/mediapipe/framework/tool/packet_util.h +++ /dev/null @@ -1,57 +0,0 @@ -// Copyright 2019 The MediaPipe Authors. -// -// Licensed under the Apache License, Version 2.0 (the "License"); -// you may not use this file except in compliance with the License. -// You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, software -// distributed under the License is distributed on an "AS IS" BASIS, -// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -// See the License for the specific language governing permissions and -// limitations under the License. - -#ifndef MEDIAPIPE_FRAMEWORK_TOOL_PACKET_UTIL_H_ -#define MEDIAPIPE_FRAMEWORK_TOOL_PACKET_UTIL_H_ - -#include "mediapipe/framework/packet.h" -#include "tensorflow/core/example/example.pb.h" - -namespace mediapipe { -namespace tool { -// The CLIF-friendly util functions to create and access a typed MediaPipe -// Packet from MediaPipe Python interface. - -// Functions for SequenceExample Packets. - -// Make a SequenceExample packet from a serialized SequenceExample. -// The SequenceExample in the Packet is owned by the C++ packet. 
-Packet CreateSequenceExamplePacketFromString(std::string* serialized_content) { - tensorflow::SequenceExample sequence_example; - sequence_example.ParseFromString(*serialized_content); - return MakePacket<tensorflow::SequenceExample>(sequence_example); -} - -// Get a serialized SequenceExample std::string from a Packet. -// The ownership of the returned std::string will be transferred to the Python -// object. -std::unique_ptr<std::string> GetSerializedSequenceExample(Packet* packet) { - return absl::make_unique<std::string>( - packet->Get<tensorflow::SequenceExample>().SerializeAsString()); -} - -// Make a String packet -Packet CreateStringPacket(std::string* input_string) { - return MakePacket<std::string>(*input_string); -} - -// Get the std::string from a Packet -std::unique_ptr<std::string> GetString(Packet* packet) { - return absl::make_unique<std::string>(packet->Get<std::string>()); -} - -} // namespace tool -} // namespace mediapipe - -#endif // MEDIAPIPE_FRAMEWORK_TOOL_PACKET_UTIL_H_ diff --git a/mediapipe/framework/tool/type_util.h b/mediapipe/framework/tool/type_util.h index d3c042dc5..cd3540989 100644 --- a/mediapipe/framework/tool/type_util.h +++ b/mediapipe/framework/tool/type_util.h @@ -16,8 +16,6 @@ #define MEDIAPIPE_FRAMEWORK_TOOL_TYPE_UTIL_H_ #include -#include -#include #include #include "mediapipe/framework/port.h" diff --git a/mediapipe/gpu/metal.bzl b/mediapipe/gpu/metal.bzl index 9d5291d95..d623f4c3e 100644 --- a/mediapipe/gpu/metal.bzl +++ b/mediapipe/gpu/metal.bzl @@ -142,7 +142,7 @@ def _metal_library_impl(ctx): if ctx.files.hdrs: additional_params["header"] = depset([f for f in ctx.files.hdrs]) objc_provider = apple_common.new_objc_provider( - providers = [x.objc for x in ctx.attr.deps if hasattr(x, "objc")], + providers = [x[apple_common.Objc] for x in ctx.attr.deps if apple_common.Objc in x], **additional_params ) @@ -169,7 +169,7 @@ def _metal_library_impl(ctx): METAL_LIBRARY_ATTRS = dicts.add(apple_support.action_required_attrs(), { "srcs": attr.label_list(allow_files = [".metal"], allow_empty = False), "hdrs": attr.label_list(allow_files = [".h"]), - "deps": 
attr.label_list(providers = [["objc", CcInfo]]), + "deps": attr.label_list(providers = [["objc", CcInfo], [apple_common.Objc, CcInfo]]), "copts": attr.string_list(), "minimum_os_version": attr.string(), }) diff --git a/mediapipe/graphs/face_detection/face_detection_back_desktop_live.pbtxt b/mediapipe/graphs/face_detection/face_detection_back_desktop_live.pbtxt index a70e4c134..23a2db27e 100644 --- a/mediapipe/graphs/face_detection/face_detection_back_desktop_live.pbtxt +++ b/mediapipe/graphs/face_detection/face_detection_back_desktop_live.pbtxt @@ -40,8 +40,8 @@ node: { output_stream: "LETTERBOX_PADDING:letterbox_padding" node_options: { [type.googleapis.com/mediapipe.ImageTransformationCalculatorOptions] { - output_width: 256 - output_height: 256 + output_width: 192 + output_height: 192 scale_mode: FIT } } @@ -76,19 +76,17 @@ node { output_side_packet: "anchors" node_options: { [type.googleapis.com/mediapipe.SsdAnchorsCalculatorOptions] { - num_layers: 4 - min_scale: 0.15625 + num_layers: 1 + min_scale: 0.1484375 max_scale: 0.75 - input_size_height: 256 - input_size_width: 256 + input_size_height: 192 + input_size_width: 192 anchor_offset_x: 0.5 anchor_offset_y: 0.5 - strides: 16 - strides: 32 - strides: 32 - strides: 32 + strides: 4 aspect_ratios: 1.0 fixed_anchor_size: true + interpolated_scale_aspect_ratio: 0.0 } } } @@ -104,7 +102,7 @@ node { node_options: { [type.googleapis.com/mediapipe.TfLiteTensorsToDetectionsCalculatorOptions] { num_classes: 1 - num_boxes: 896 + num_boxes: 2304 num_coords: 16 box_coord_offset: 0 keypoint_coord_offset: 4 @@ -113,11 +111,11 @@ node { sigmoid_score: true score_clipping_thresh: 100.0 reverse_output_order: true - x_scale: 256.0 - y_scale: 256.0 - h_scale: 256.0 - w_scale: 256.0 - min_score_thresh: 0.65 + x_scale: 192.0 + y_scale: 192.0 + h_scale: 192.0 + w_scale: 192.0 + min_score_thresh: 0.6 } } } diff --git a/mediapipe/graphs/face_detection/face_detection_back_mobile_gpu.pbtxt 
b/mediapipe/graphs/face_detection/face_detection_back_mobile_gpu.pbtxt index 893434190..c69bf50ae 100644 --- a/mediapipe/graphs/face_detection/face_detection_back_mobile_gpu.pbtxt +++ b/mediapipe/graphs/face_detection/face_detection_back_mobile_gpu.pbtxt @@ -41,8 +41,8 @@ node: { output_stream: "LETTERBOX_PADDING:letterbox_padding" node_options: { [type.googleapis.com/mediapipe.ImageTransformationCalculatorOptions] { - output_width: 256 - output_height: 256 + output_width: 192 + output_height: 192 scale_mode: FIT } } @@ -77,19 +77,17 @@ node { output_side_packet: "anchors" node_options: { [type.googleapis.com/mediapipe.SsdAnchorsCalculatorOptions] { - num_layers: 4 - min_scale: 0.15625 + num_layers: 1 + min_scale: 0.1484375 max_scale: 0.75 - input_size_height: 256 - input_size_width: 256 + input_size_height: 192 + input_size_width: 192 anchor_offset_x: 0.5 anchor_offset_y: 0.5 - strides: 16 - strides: 32 - strides: 32 - strides: 32 + strides: 4 aspect_ratios: 1.0 fixed_anchor_size: true + interpolated_scale_aspect_ratio: 0.0 } } } @@ -105,7 +103,7 @@ node { node_options: { [type.googleapis.com/mediapipe.TfLiteTensorsToDetectionsCalculatorOptions] { num_classes: 1 - num_boxes: 896 + num_boxes: 2304 num_coords: 16 box_coord_offset: 0 keypoint_coord_offset: 4 @@ -114,11 +112,11 @@ node { sigmoid_score: true score_clipping_thresh: 100.0 reverse_output_order: true - x_scale: 256.0 - y_scale: 256.0 - h_scale: 256.0 - w_scale: 256.0 - min_score_thresh: 0.65 + x_scale: 192.0 + y_scale: 192.0 + h_scale: 192.0 + w_scale: 192.0 + min_score_thresh: 0.6 } } } diff --git a/mediapipe/graphs/instant_motion_tracking/calculators/matrices_manager_calculator.cc b/mediapipe/graphs/instant_motion_tracking/calculators/matrices_manager_calculator.cc index f8190c506..c003135bd 100644 --- a/mediapipe/graphs/instant_motion_tracking/calculators/matrices_manager_calculator.cc +++ b/mediapipe/graphs/instant_motion_tracking/calculators/matrices_manager_calculator.cc @@ -15,9 +15,9 @@ #include 
#include +#include "Eigen/Core" #include "Eigen/Dense" -#include "Eigen/src/Core/util/Constants.h" -#include "Eigen/src/Geometry/Quaternion.h" +#include "Eigen/Geometry" #include "absl/memory/memory.h" #include "absl/strings/str_cat.h" #include "absl/strings/str_join.h" diff --git a/mediapipe/graphs/object_detection_3d/calculators/annotations_to_model_matrices_calculator.cc b/mediapipe/graphs/object_detection_3d/calculators/annotations_to_model_matrices_calculator.cc index c2166c648..183f6fc85 100644 --- a/mediapipe/graphs/object_detection_3d/calculators/annotations_to_model_matrices_calculator.cc +++ b/mediapipe/graphs/object_detection_3d/calculators/annotations_to_model_matrices_calculator.cc @@ -14,9 +14,9 @@ #include +#include "Eigen/Core" #include "Eigen/Dense" -#include "Eigen/src/Core/util/Constants.h" -#include "Eigen/src/Geometry/Quaternion.h" +#include "Eigen/Geometry" #include "absl/memory/memory.h" #include "absl/strings/str_cat.h" #include "absl/strings/str_join.h" diff --git a/mediapipe/graphs/selfie_segmentation/BUILD b/mediapipe/graphs/selfie_segmentation/BUILD new file mode 100644 index 000000000..ddca178de --- /dev/null +++ b/mediapipe/graphs/selfie_segmentation/BUILD @@ -0,0 +1,54 @@ +# Copyright 2021 The MediaPipe Authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +load( + "//mediapipe/framework/tool:mediapipe_graph.bzl", + "mediapipe_binary_graph", +) + +licenses(["notice"]) + +package(default_visibility = ["//visibility:public"]) + +cc_library( + name = "selfie_segmentation_gpu_deps", + deps = [ + "//mediapipe/calculators/core:flow_limiter_calculator", + "//mediapipe/calculators/image:recolor_calculator", + "//mediapipe/modules/selfie_segmentation:selfie_segmentation_gpu", + ], +) + +mediapipe_binary_graph( + name = "selfie_segmentation_gpu_binary_graph", + graph = "selfie_segmentation_gpu.pbtxt", + output_name = "selfie_segmentation_gpu.binarypb", + deps = [":selfie_segmentation_gpu_deps"], +) + +cc_library( + name = "selfie_segmentation_cpu_deps", + deps = [ + "//mediapipe/calculators/core:flow_limiter_calculator", + "//mediapipe/calculators/image:recolor_calculator", + "//mediapipe/modules/selfie_segmentation:selfie_segmentation_cpu", + ], +) + +mediapipe_binary_graph( + name = "selfie_segmentation_cpu_binary_graph", + graph = "selfie_segmentation_cpu.pbtxt", + output_name = "selfie_segmentation_cpu.binarypb", + deps = [":selfie_segmentation_cpu_deps"], +) diff --git a/mediapipe/graphs/selfie_segmentation/selfie_segmentation_cpu.pbtxt b/mediapipe/graphs/selfie_segmentation/selfie_segmentation_cpu.pbtxt new file mode 100644 index 000000000..db1b479a1 --- /dev/null +++ b/mediapipe/graphs/selfie_segmentation/selfie_segmentation_cpu.pbtxt @@ -0,0 +1,52 @@ +# MediaPipe graph that performs selfie segmentation with TensorFlow Lite on CPU. + +# CPU buffer. (ImageFrame) +input_stream: "input_video" + +# Output image with rendered results. (ImageFrame) +output_stream: "output_video" + +# Throttles the images flowing downstream for flow control. It passes through +# the very first incoming image unaltered, and waits for downstream nodes +# (calculators and subgraphs) in the graph to finish their tasks before it +# passes through another image. 
All images that come in while waiting are +# dropped, limiting the number of in-flight images in most parts of the graph to +# 1. This prevents the downstream nodes from queuing up incoming images and data +# excessively, which leads to increased latency and memory usage, unwanted in +# real-time mobile applications. It also eliminates unnecessary computation, +# e.g., the output produced by a node may get dropped downstream if the +# subsequent nodes are still busy processing previous inputs. +node { + calculator: "FlowLimiterCalculator" + input_stream: "input_video" + input_stream: "FINISHED:output_video" + input_stream_info: { + tag_index: "FINISHED" + back_edge: true + } + output_stream: "throttled_input_video" +} + +# Subgraph that performs selfie segmentation. +node { + calculator: "SelfieSegmentationCpu" + input_stream: "IMAGE:throttled_input_video" + output_stream: "SEGMENTATION_MASK:segmentation_mask" +} + + +# Colors the selfie segmentation with the color specified in the option. +node { + calculator: "RecolorCalculator" + input_stream: "IMAGE:throttled_input_video" + input_stream: "MASK:segmentation_mask" + output_stream: "IMAGE:output_video" + node_options: { + [type.googleapis.com/mediapipe.RecolorCalculatorOptions] { + color { r: 0 g: 0 b: 255 } + mask_channel: RED + invert_mask: true + adjust_with_luminance: false + } + } +} diff --git a/mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt b/mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt new file mode 100644 index 000000000..08d4c36a8 --- /dev/null +++ b/mediapipe/graphs/selfie_segmentation/selfie_segmentation_gpu.pbtxt @@ -0,0 +1,52 @@ +# MediaPipe graph that performs selfie segmentation with TensorFlow Lite on GPU. + +# GPU buffer. (GpuBuffer) +input_stream: "input_video" + +# Output image with rendered results. (GpuBuffer) +output_stream: "output_video" + +# Throttles the images flowing downstream for flow control.
It passes through +# the very first incoming image unaltered, and waits for downstream nodes +# (calculators and subgraphs) in the graph to finish their tasks before it +# passes through another image. All images that come in while waiting are +# dropped, limiting the number of in-flight images in most parts of the graph to +# 1. This prevents the downstream nodes from queuing up incoming images and data +# excessively, which leads to increased latency and memory usage, unwanted in +# real-time mobile applications. It also eliminates unnecessary computation, +# e.g., the output produced by a node may get dropped downstream if the +# subsequent nodes are still busy processing previous inputs. +node { + calculator: "FlowLimiterCalculator" + input_stream: "input_video" + input_stream: "FINISHED:output_video" + input_stream_info: { + tag_index: "FINISHED" + back_edge: true + } + output_stream: "throttled_input_video" +} + +# Subgraph that performs selfie segmentation. +node { + calculator: "SelfieSegmentationGpu" + input_stream: "IMAGE:throttled_input_video" + output_stream: "SEGMENTATION_MASK:segmentation_mask" +} + + +# Colors the selfie segmentation with the color specified in the option. +node { + calculator: "RecolorCalculator" + input_stream: "IMAGE_GPU:throttled_input_video" + input_stream: "MASK_GPU:segmentation_mask" + output_stream: "IMAGE_GPU:output_video" + node_options: { + [type.googleapis.com/mediapipe.RecolorCalculatorOptions] { + color { r: 0 g: 0 b: 255 } + mask_channel: RED + invert_mask: true + adjust_with_luminance: false + } + } +} diff --git a/mediapipe/java/com/google/mediapipe/components/GlSurfaceViewRenderer.java b/mediapipe/java/com/google/mediapipe/components/GlSurfaceViewRenderer.java new file mode 100644 index 000000000..694ffc5d9 --- /dev/null +++ b/mediapipe/java/com/google/mediapipe/components/GlSurfaceViewRenderer.java @@ -0,0 +1,223 @@ +// Copyright 2019-2021 The MediaPipe Authors.
+// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package com.google.mediapipe.components; + +import android.graphics.SurfaceTexture; +import android.opengl.GLES11Ext; +import android.opengl.GLES20; +import android.opengl.GLSurfaceView; +import android.opengl.Matrix; +import android.util.Log; +import com.google.mediapipe.framework.TextureFrame; +import com.google.mediapipe.glutil.CommonShaders; +import com.google.mediapipe.glutil.ShaderUtil; +import java.nio.FloatBuffer; +import java.util.HashMap; +import java.util.Map; +import java.util.concurrent.atomic.AtomicReference; +import javax.microedition.khronos.egl.EGLConfig; +import javax.microedition.khronos.opengles.GL10; + +/** + * Renderer for a {@link GLSurfaceView}. It displays a texture. The texture is scaled and cropped as + * necessary to fill the view, while maintaining its aspect ratio. + * + *
<p>
It can render both textures bindable to the normal {@link GLES20#GL_TEXTURE_2D} target as well + * as textures bindable to {@link GLES11Ext#GL_TEXTURE_EXTERNAL_OES}, which is used for Android + * surfaces. Call {@link #setTextureTarget(int)} to choose the correct target. + * + *
<p>
It can display a {@link SurfaceTexture} (call {@link #setSurfaceTexture(SurfaceTexture)}) or a + * {@link TextureFrame} (call {@link #setNextFrame(TextureFrame)}). + */ +public class GlSurfaceViewRenderer implements GLSurfaceView.Renderer { + private static final String TAG = "DemoRenderer"; + private static final int ATTRIB_POSITION = 1; + private static final int ATTRIB_TEXTURE_COORDINATE = 2; + + private int surfaceWidth; + private int surfaceHeight; + private int frameWidth = 0; + private int frameHeight = 0; + private int program = 0; + private int frameUniform; + private int textureTarget = GLES11Ext.GL_TEXTURE_EXTERNAL_OES; + private int textureTransformUniform; + // Controls the alignment between frame size and surface size, 0.5f default is centered. + private float alignmentHorizontal = 0.5f; + private float alignmentVertical = 0.5f; + private float[] textureTransformMatrix = new float[16]; + private SurfaceTexture surfaceTexture = null; + private final AtomicReference nextFrame = new AtomicReference<>(); + + @Override + public void onSurfaceCreated(GL10 gl, EGLConfig config) { + if (surfaceTexture == null) { + Matrix.setIdentityM(textureTransformMatrix, 0 /* offset */); + } + Map attributeLocations = new HashMap<>(); + attributeLocations.put("position", ATTRIB_POSITION); + attributeLocations.put("texture_coordinate", ATTRIB_TEXTURE_COORDINATE); + Log.d(TAG, "external texture: " + isExternalTexture()); + program = + ShaderUtil.createProgram( + CommonShaders.VERTEX_SHADER, + isExternalTexture() + ? 
CommonShaders.FRAGMENT_SHADER_EXTERNAL + : CommonShaders.FRAGMENT_SHADER, + attributeLocations); + frameUniform = GLES20.glGetUniformLocation(program, "video_frame"); + textureTransformUniform = GLES20.glGetUniformLocation(program, "texture_transform"); + ShaderUtil.checkGlError("glGetUniformLocation"); + + GLES20.glClearColor(0.0f, 0.0f, 0.0f, 1.0f); + } + + @Override + public void onSurfaceChanged(GL10 gl, int width, int height) { + surfaceWidth = width; + surfaceHeight = height; + GLES20.glViewport(0, 0, width, height); + } + + @Override + public void onDrawFrame(GL10 gl) { + TextureFrame frame = nextFrame.getAndSet(null); + + GLES20.glClear(GLES20.GL_COLOR_BUFFER_BIT); + ShaderUtil.checkGlError("glClear"); + + if (surfaceTexture == null && frame == null) { + return; + } + + GLES20.glActiveTexture(GLES20.GL_TEXTURE0); + ShaderUtil.checkGlError("glActiveTexture"); + if (surfaceTexture != null) { + surfaceTexture.updateTexImage(); + surfaceTexture.getTransformMatrix(textureTransformMatrix); + } else { + GLES20.glBindTexture(textureTarget, frame.getTextureName()); + ShaderUtil.checkGlError("glBindTexture"); + } + GLES20.glTexParameteri(textureTarget, GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_LINEAR); + GLES20.glTexParameteri(textureTarget, GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_LINEAR); + GLES20.glTexParameteri(textureTarget, GLES20.GL_TEXTURE_WRAP_S, GLES20.GL_CLAMP_TO_EDGE); + GLES20.glTexParameteri(textureTarget, GLES20.GL_TEXTURE_WRAP_T, GLES20.GL_CLAMP_TO_EDGE); + ShaderUtil.checkGlError("texture setup"); + + GLES20.glUseProgram(program); + GLES20.glUniform1i(frameUniform, 0); + GLES20.glUniformMatrix4fv(textureTransformUniform, 1, false, textureTransformMatrix, 0); + ShaderUtil.checkGlError("glUniformMatrix4fv"); + GLES20.glEnableVertexAttribArray(ATTRIB_POSITION); + GLES20.glVertexAttribPointer( + ATTRIB_POSITION, 2, GLES20.GL_FLOAT, false, 0, CommonShaders.SQUARE_VERTICES); + + // TODO: compute scale from surfaceTexture size. 
+ float scaleWidth = frameWidth > 0 ? (float) surfaceWidth / (float) frameWidth : 1.0f; + float scaleHeight = frameHeight > 0 ? (float) surfaceHeight / (float) frameHeight : 1.0f; + // Whichever of the two scales is greater corresponds to the dimension where the image + // is proportionally smaller than the view. Dividing both scales by that number results + // in that dimension having scale 1.0, and thus touching the edges of the view, while the + // other is cropped proportionally. + float maxScale = Math.max(scaleWidth, scaleHeight); + scaleWidth /= maxScale; + scaleHeight /= maxScale; + + // Alignment controls where the visible section is placed within the full camera frame, with + // (0, 0) being the bottom left, and (1, 1) being the top right. + float textureLeft = (1.0f - scaleWidth) * alignmentHorizontal; + float textureRight = textureLeft + scaleWidth; + float textureBottom = (1.0f - scaleHeight) * alignmentVertical; + float textureTop = textureBottom + scaleHeight; + + // Unlike on iOS, there is no need to flip the surfaceTexture here. + // But for regular textures, we will need to flip them. + final FloatBuffer passThroughTextureVertices = + ShaderUtil.floatBuffer( + textureLeft, textureBottom, + textureRight, textureBottom, + textureLeft, textureTop, + textureRight, textureTop); + GLES20.glEnableVertexAttribArray(ATTRIB_TEXTURE_COORDINATE); + GLES20.glVertexAttribPointer( + ATTRIB_TEXTURE_COORDINATE, 2, GLES20.GL_FLOAT, false, 0, passThroughTextureVertices); + ShaderUtil.checkGlError("program setup"); + + GLES20.glDrawArrays(GLES20.GL_TRIANGLE_STRIP, 0, 4); + ShaderUtil.checkGlError("glDrawArrays"); + GLES20.glBindTexture(textureTarget, 0); + ShaderUtil.checkGlError("unbind surfaceTexture"); + + // We must flush before releasing the frame. 
+ GLES20.glFlush(); + + if (frame != null) { + frame.release(); + } + } + + public void setTextureTarget(int target) { + if (program != 0) { + throw new IllegalStateException( + "setTextureTarget must be called before the surface is created"); + } + textureTarget = target; + } + + public void setSurfaceTexture(SurfaceTexture texture) { + if (!isExternalTexture()) { + throw new IllegalStateException( + "to use a SurfaceTexture, the texture target must be GL_TEXTURE_EXTERNAL_OES"); + } + TextureFrame oldFrame = nextFrame.getAndSet(null); + if (oldFrame != null) { + oldFrame.release(); + } + surfaceTexture = texture; + } + + // Use this when the texture is not a SurfaceTexture. + public void setNextFrame(TextureFrame frame) { + if (surfaceTexture != null) { + Matrix.setIdentityM(textureTransformMatrix, 0 /* offset */); + } + TextureFrame oldFrame = nextFrame.getAndSet(frame); + if (oldFrame != null + && (frame == null || (oldFrame.getTextureName() != frame.getTextureName()))) { + oldFrame.release(); + } + surfaceTexture = null; + } + + public void setFrameSize(int width, int height) { + frameWidth = width; + frameHeight = height; + } + + /** + * When the aspect ratios between the camera frame and the surface size are mismatched, this + * controls how the image is aligned. 0.0 means aligning the left/bottom edges; 1.0 means aligning + * the right/top edges; 0.5 (default) means aligning the centers. 
+ */ + public void setAlignment(float horizontal, float vertical) { + alignmentHorizontal = horizontal; + alignmentVertical = vertical; + } + + private boolean isExternalTexture() { + return textureTarget == GLES11Ext.GL_TEXTURE_EXTERNAL_OES; + } +} diff --git a/mediapipe/java/com/google/mediapipe/framework/AndroidPacketCreator.java b/mediapipe/java/com/google/mediapipe/framework/AndroidPacketCreator.java index 69c0ebeb6..5ddeb98c9 100644 --- a/mediapipe/java/com/google/mediapipe/framework/AndroidPacketCreator.java +++ b/mediapipe/java/com/google/mediapipe/framework/AndroidPacketCreator.java @@ -16,6 +16,7 @@ package com.google.mediapipe.framework; import android.graphics.Bitmap; import java.nio.ByteBuffer; +import java.util.List; // TODO: use Preconditions in this file. /** diff --git a/mediapipe/java/com/google/mediapipe/framework/jni/packet_getter_jni.cc b/mediapipe/java/com/google/mediapipe/framework/jni/packet_getter_jni.cc index 30ec19a25..1a7fd18b0 100644 --- a/mediapipe/java/com/google/mediapipe/framework/jni/packet_getter_jni.cc +++ b/mediapipe/java/com/google/mediapipe/framework/jni/packet_getter_jni.cc @@ -444,8 +444,16 @@ JNIEXPORT jlong JNICALL PACKET_GETTER_METHOD(nativeGetGpuBuffer)(JNIEnv* env, mediapipe::android::Graph::GetPacketFromHandle(packet); mediapipe::GlTextureBufferSharedPtr ptr; if (mediapipe_packet.ValidateAsType().ok()) { - const mediapipe::Image& buffer = mediapipe_packet.Get(); - ptr = buffer.GetGlTextureBufferSharedPtr(); + auto mediapipe_graph = + mediapipe::android::Graph::GetContextFromHandle(packet); + auto gl_context = mediapipe_graph->GetGpuResources()->gl_context(); + auto status = + gl_context->Run([gl_context, mediapipe_packet, &ptr]() -> absl::Status { + const mediapipe::Image& buffer = + mediapipe_packet.Get(); + ptr = buffer.GetGlTextureBufferSharedPtr(); + return absl::OkStatus(); + }); } else { const mediapipe::GpuBuffer& buffer = mediapipe_packet.Get(); diff --git 
a/mediapipe/java/com/google/mediapipe/solutionbase/BUILD b/mediapipe/java/com/google/mediapipe/solutionbase/BUILD new file mode 100644 index 000000000..a3acad5d0 --- /dev/null +++ b/mediapipe/java/com/google/mediapipe/solutionbase/BUILD @@ -0,0 +1,67 @@ +# Copyright 2021 The MediaPipe Authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +package(default_visibility = ["//visibility:public"]) + +licenses(["notice"]) + +android_library( + name = "solution_base", + srcs = glob( + ["*.java"], + exclude = [ + "CameraInput.java", + ], + ), + visibility = ["//visibility:public"], + deps = [ + "//mediapipe/java/com/google/mediapipe/framework:android_framework", + "//mediapipe/java/com/google/mediapipe/glutil", + "//third_party:autovalue", + "@maven//:com_google_code_findbugs_jsr305", + "@maven//:com_google_guava_guava", + ], +) + +android_library( + name = "camera_input", + srcs = ["CameraInput.java"], + visibility = ["//visibility:public"], + deps = [ + "//mediapipe/java/com/google/mediapipe/components:android_camerax_helper", + "//mediapipe/java/com/google/mediapipe/components:android_components", + "//mediapipe/java/com/google/mediapipe/framework:android_framework", + "@maven//:com_google_guava_guava", + ], +) + +# Native dependencies of all MediaPipe solutions. +cc_binary( + name = "libmediapipe_jni.so", + linkshared = 1, + linkstatic = 1, + # TODO: Add more calculators to support other top-level solutions. 
+ deps = [ + "//mediapipe/java/com/google/mediapipe/framework/jni:mediapipe_framework_jni", + "//mediapipe/modules/hand_landmark:hand_landmark_tracking_gpu_image", + ], +) + +# Converts the .so cc_binary into a cc_library, to be consumed in an android_binary. +cc_library( + name = "mediapipe_jni_lib", + srcs = [":libmediapipe_jni.so"], + visibility = ["//visibility:public"], + alwayslink = 1, +) diff --git a/mediapipe/java/com/google/mediapipe/solutionbase/CameraInput.java b/mediapipe/java/com/google/mediapipe/solutionbase/CameraInput.java new file mode 100644 index 000000000..acfbde74d --- /dev/null +++ b/mediapipe/java/com/google/mediapipe/solutionbase/CameraInput.java @@ -0,0 +1,109 @@ +// Copyright 2021 The MediaPipe Authors. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package com.google.mediapipe.solutionbase; + +import android.app.Activity; +import com.google.mediapipe.components.CameraHelper; +import com.google.mediapipe.components.CameraXPreviewHelper; +import com.google.mediapipe.components.ExternalTextureConverter; +import com.google.mediapipe.components.PermissionHelper; +import com.google.mediapipe.components.TextureFrameConsumer; +import com.google.mediapipe.framework.MediaPipeException; +import com.google.mediapipe.framework.TextureFrame; +import javax.microedition.khronos.egl.EGLContext; + +/** + * The camera component that takes the camera input and produces MediaPipe {@link TextureFrame} + * objects. 
+ */ +public class CameraInput { + private static final String TAG = "CameraInput"; + + /** Represents the direction the camera faces relative to the device screen. */ + public static enum CameraFacing { + FRONT, + BACK + }; + + private final CameraXPreviewHelper cameraHelper; + private TextureFrameConsumer cameraNewFrameListener; + private ExternalTextureConverter converter; + + /** + * Initializes CameraInput and requests camera permissions. + * + * @param activity an Android {@link Activity}. + */ + public CameraInput(Activity activity) { + cameraHelper = new CameraXPreviewHelper(); + PermissionHelper.checkAndRequestCameraPermissions(activity); + } + + /** + * Sets a callback to be invoked when new frames are available. + * + * @param listener the callback. + */ + public void setCameraNewFrameListener(TextureFrameConsumer listener) { + cameraNewFrameListener = listener; + } + + /** + * Sets up the external texture converter and starts the camera. + * + * @param activity an Android {@link Activity}. + * @param eglContext an OpenGL {@link EGLContext}. + * @param cameraFacing the direction the camera faces relative to the device screen. + * @param width the desired width of the converted texture. + * @param height the desired height of the converted texture.
+ */ + public void start( + Activity activity, EGLContext eglContext, CameraFacing cameraFacing, int width, int height) { + if (!PermissionHelper.cameraPermissionsGranted(activity)) { + return; + } + if (converter == null) { + converter = new ExternalTextureConverter(eglContext, 2); + } + if (cameraNewFrameListener == null) { + throw new MediaPipeException( + MediaPipeException.StatusCode.FAILED_PRECONDITION.ordinal(), + "cameraNewFrameListener is not set."); + } + converter.setConsumer(cameraNewFrameListener); + cameraHelper.setOnCameraStartedListener( + surfaceTexture -> + converter.setSurfaceTextureAndAttachToGLContext(surfaceTexture, width, height)); + cameraHelper.startCamera( + activity, + cameraFacing == CameraFacing.FRONT + ? CameraHelper.CameraFacing.FRONT + : CameraHelper.CameraFacing.BACK, + /*unusedSurfaceTexture=*/ null, + null); + } + + /** Stops the camera input. */ + public void stop() { + if (converter != null) { + converter.close(); + } + } + + /** Returns a boolean which is true if the camera is in Portrait mode, false in Landscape mode. */ + public boolean isCameraRotated() { + return cameraHelper.isCameraRotated(); + } +} diff --git a/mediapipe/java/com/google/mediapipe/solutionbase/ErrorListener.java b/mediapipe/java/com/google/mediapipe/solutionbase/ErrorListener.java new file mode 100644 index 000000000..62723cdfb --- /dev/null +++ b/mediapipe/java/com/google/mediapipe/solutionbase/ErrorListener.java @@ -0,0 +1,20 @@ +// Copyright 2021 The MediaPipe Authors. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+// See the License for the specific language governing permissions and +// limitations under the License. + +package com.google.mediapipe.solutionbase; + +/** Interface for the customizable MediaPipe solution error listener. */ +public interface ErrorListener { + void onError(String message, RuntimeException e); +} diff --git a/mediapipe/java/com/google/mediapipe/solutionbase/ImageSolutionBase.java b/mediapipe/java/com/google/mediapipe/solutionbase/ImageSolutionBase.java new file mode 100644 index 000000000..11d1808bf --- /dev/null +++ b/mediapipe/java/com/google/mediapipe/solutionbase/ImageSolutionBase.java @@ -0,0 +1,174 @@ +// Copyright 2021 The MediaPipe Authors. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package com.google.mediapipe.solutionbase; + +import android.content.Context; +import android.graphics.Bitmap; +import android.util.Log; +import com.google.mediapipe.framework.MediaPipeException; +import com.google.mediapipe.framework.Packet; +import com.google.mediapipe.framework.TextureFrame; +import com.google.mediapipe.glutil.EglManager; +import java.util.concurrent.atomic.AtomicInteger; +import javax.microedition.khronos.egl.EGLContext; + +/** The base class of the MediaPipe image solutions. */ +// TODO: Consolidates the "send" methods to be a single "send(MlImage image)". 
+public class ImageSolutionBase extends SolutionBase { + public static final String TAG = "ImageSolutionBase"; + protected boolean staticImageMode; + private EglManager eglManager; + // Internal fake timestamp for static images. + private final AtomicInteger staticImageTimestamp = new AtomicInteger(0); + + /** + * Initializes the MediaPipe image solution base with an Android context, solution-specific + * settings, and a solution result handler. + * + * @param context an Android {@link Context}. + * @param solutionInfo a {@link SolutionInfo} that contains the binary graph file path and the + * graph input and output stream names. + * @param outputHandler an {@link OutputHandler} that handles the solution graph output packets + * and runtime exceptions. + */ + @Override + public synchronized void initialize( + Context context, + SolutionInfo solutionInfo, + OutputHandler outputHandler) { + staticImageMode = solutionInfo.staticImageMode(); + try { + super.initialize(context, solutionInfo, outputHandler); + eglManager = new EglManager(/*parentContext=*/ null); + solutionGraph.setParentGlContext(eglManager.getNativeContext()); + } catch (MediaPipeException e) { + throwException("Error occurs when creating MediaPipe image solution graph. ", e); + } + } + + /** Returns the managed {@link EGLContext} to share the OpenGL context with other components. */ + public EGLContext getGlContext() { + return eglManager.getContext(); + } + + + /** Returns the OpenGL major version number. */ + public int getGlMajorVersion() { + return eglManager.getGlMajorVersion(); + } + + /** Sends a {@link TextureFrame} into the solution graph for processing. */ + public void send(TextureFrame textureFrame) { + if (!staticImageMode && textureFrame.getTimestamp() == Long.MIN_VALUE) { + throwException( + "Error occurs when calling the solution send method. 
", + new MediaPipeException( + MediaPipeException.StatusCode.FAILED_PRECONDITION.ordinal(), + "TextureFrame's timestamp needs to be explicitly set if not in static image mode.")); + return; + } + long timestampUs = + staticImageMode ? staticImageTimestamp.getAndIncrement() : textureFrame.getTimestamp(); + sendImage(textureFrame, timestampUs); + } + + /** + * Sends a {@link Bitmap} with a timestamp into solution graph for processing. In static image + * mode, the timestamp is ignored. + */ + public void send(Bitmap inputBitmap, long timestamp) { + if (staticImageMode) { + Log.w(TAG, "In static image mode, the MediaPipe solution ignores the input timestamp."); + } + sendImage(inputBitmap, staticImageMode ? staticImageTimestamp.getAndIncrement() : timestamp); + } + + /** Sends a {@link Bitmap} (static image) into solution graph for processing. */ + public void send(Bitmap inputBitmap) { + if (!staticImageMode) { + throwException( + "Error occurs when calling the solution send method. ", + new MediaPipeException( + MediaPipeException.StatusCode.FAILED_PRECONDITION.ordinal(), + "When not in static image mode, a timestamp associated with the image is required." + + " Use send(Bitmap inputBitmap, long timestamp) instead.")); + return; + } + sendImage(inputBitmap, staticImageTimestamp.getAndIncrement()); + } + + /** Internal implementation of sending Bitmap/TextureFrame into the MediaPipe solution. 
*/ + private synchronized <T> void sendImage(T imageObj, long timestamp) { + if (lastTimestamp >= timestamp) { + throwException( + "The received frame has a smaller timestamp than the last processed timestamp.", + new MediaPipeException( + MediaPipeException.StatusCode.FAILED_PRECONDITION.ordinal(), + "Receiving a frame with an invalid timestamp.")); + return; + } + lastTimestamp = timestamp; + Packet imagePacket = null; + try { + if (imageObj instanceof TextureFrame) { + imagePacket = packetCreator.createImage((TextureFrame) imageObj); + imageObj = null; + } else if (imageObj instanceof Bitmap) { + imagePacket = packetCreator.createRgbaImage((Bitmap) imageObj); + } else { + throwException( + "The input image type is not supported. ", + new MediaPipeException( + MediaPipeException.StatusCode.UNIMPLEMENTED.ordinal(), + "The input image type is not supported.")); + } + + try { + // addConsumablePacketToInputStream allows the graph to take exclusive ownership of the + // packet, which may allow for more memory optimizations. + solutionGraph.addConsumablePacketToInputStream( + imageInputStreamName, imagePacket, timestamp); + // If addConsumablePacket succeeded, we don't need to release the packet ourselves. + imagePacket = null; + } catch (MediaPipeException e) { + // TODO: do not suppress exceptions here! + if (errorListener == null) { + Log.e(TAG, "Mediapipe error: ", e); + } else { + throw e; + } + } + } catch (RuntimeException e) { + if (errorListener != null) { + errorListener.onError("Mediapipe error: ", e); + } else { + throw e; + } + } finally { + if (imagePacket != null) { + // In case of error, addConsumablePacketToInputStream will not release the packet, so we + // have to release it ourselves. (We could also re-try adding, but we don't.) + imagePacket.release(); + } + if (imageObj instanceof TextureFrame) { + if (imageObj != null) { + // imagePacket will release the frame if it has been created, but if not, we need to + // release it. 
+ ((TextureFrame) imageObj).release(); + } + } + } + } +} diff --git a/mediapipe/java/com/google/mediapipe/solutionbase/ImageSolutionResult.java b/mediapipe/java/com/google/mediapipe/solutionbase/ImageSolutionResult.java new file mode 100644 index 000000000..9e8cc11a1 --- /dev/null +++ b/mediapipe/java/com/google/mediapipe/solutionbase/ImageSolutionResult.java @@ -0,0 +1,59 @@ +// Copyright 2021 The MediaPipe Authors. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package com.google.mediapipe.solutionbase; + +import android.graphics.Bitmap; +import com.google.mediapipe.framework.AndroidPacketGetter; +import com.google.mediapipe.framework.Packet; +import com.google.mediapipe.framework.PacketGetter; +import com.google.mediapipe.framework.TextureFrame; + +/** + * The base class of any MediaPipe image solution result. The base class contains the common parts + * across all image solution results, including the input timestamp and the input image data. A new + * MediaPipe image solution result class should extend ImageSolutionResult. + */ +public class ImageSolutionResult implements SolutionResult { + protected long timestamp; + protected Packet imagePacket; + private Bitmap cachedBitmap; + + // Result timestamp, which is set to the timestamp of the corresponding input image. May return + // Long.MIN_VALUE if the input image is not associated with a timestamp. 
+ @Override + public long timestamp() { + return timestamp; + } + + // Returns the corresponding input image as a {@link Bitmap}. + public Bitmap inputBitmap() { + if (cachedBitmap != null) { + return cachedBitmap; + } + cachedBitmap = AndroidPacketGetter.getBitmapFromRgba(imagePacket); + return cachedBitmap; + } + + // Returns the corresponding input image as a {@link TextureFrame}. The caller must release the + // acquired {@link TextureFrame} after use. + public TextureFrame acquireTextureFrame() { + return PacketGetter.getTextureFrame(imagePacket); + } + + // Releases the image packet and the underlying data. + void releaseImagePacket() { + imagePacket.release(); + } +} diff --git a/mediapipe/java/com/google/mediapipe/solutionbase/OutputHandler.java b/mediapipe/java/com/google/mediapipe/solutionbase/OutputHandler.java new file mode 100644 index 000000000..f76d5c7a3 --- /dev/null +++ b/mediapipe/java/com/google/mediapipe/solutionbase/OutputHandler.java @@ -0,0 +1,86 @@ +// Copyright 2021 The MediaPipe Authors. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package com.google.mediapipe.solutionbase; + +import android.util.Log; +import com.google.mediapipe.framework.MediaPipeException; +import com.google.mediapipe.framework.Packet; +import java.util.List; + +/** Handles MediaPipe solution graph outputs. 
*/ +public class OutputHandler<T extends SolutionResult> { + private static final String TAG = "OutputHandler"; + + /** Interface for converting output packet lists to solution result objects. */ + public interface OutputConverter<T extends SolutionResult> { + public abstract T convert(List<Packet> packets); + } + // A solution-specific graph output converter that should be implemented by the solution. + private OutputConverter<T> outputConverter; + // The user-defined solution result listener. + private ResultListener<T> customResultListener; + // The user-defined error listener. + private ErrorListener customErrorListener; + + /** + * Sets a callback to be invoked to convert a packet list to a solution result object. + * + * @param converter the solution-defined {@link OutputConverter} callback. + */ + public void setOutputConverter(OutputConverter<T> converter) { + this.outputConverter = converter; + } + + /** + * Sets a callback to be invoked when a solution result object becomes available. + * + * @param listener the user-defined {@link ResultListener} callback. + */ + public void setResultListener(ResultListener<T> listener) { + this.customResultListener = listener; + } + + /** + * Sets a callback to be invoked when exceptions are thrown in the solution. + * + * @param listener the user-defined {@link ErrorListener} callback. + */ + public void setErrorListener(ErrorListener listener) { + this.customErrorListener = listener; + } + + /** Handles a list of output packets. Invoked when packet lists become available. */ + public void run(List<Packet> packets) { + T solutionResult = null; + try { + solutionResult = outputConverter.convert(packets); + customResultListener.run(solutionResult); + } catch (MediaPipeException e) { + if (customErrorListener != null) { + customErrorListener.onError("Error occurs when getting MediaPipe solution result. ", e); + } else { + Log.e(TAG, "Error occurs when getting MediaPipe solution result. 
" + e); + } + } finally { + for (Packet packet : packets) { + packet.release(); + } + if (solutionResult instanceof ImageSolutionResult) { + ImageSolutionResult imageSolutionResult = (ImageSolutionResult) solutionResult; + imageSolutionResult.releaseImagePacket(); + } + } + } +} diff --git a/mediapipe/java/com/google/mediapipe/solutionbase/ResultListener.java b/mediapipe/java/com/google/mediapipe/solutionbase/ResultListener.java new file mode 100644 index 000000000..938c115f9 --- /dev/null +++ b/mediapipe/java/com/google/mediapipe/solutionbase/ResultListener.java @@ -0,0 +1,20 @@ +// Copyright 2021 The MediaPipe Authors. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package com.google.mediapipe.solutionbase; + +/** Interface for the customizable MediaPipe solution result listener. */ +public interface ResultListener { + void run(T result); +} diff --git a/mediapipe/java/com/google/mediapipe/solutionbase/SolutionBase.java b/mediapipe/java/com/google/mediapipe/solutionbase/SolutionBase.java new file mode 100644 index 000000000..d42194567 --- /dev/null +++ b/mediapipe/java/com/google/mediapipe/solutionbase/SolutionBase.java @@ -0,0 +1,150 @@ +// Copyright 2021 The MediaPipe Authors. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package com.google.mediapipe.solutionbase; + +import static java.util.concurrent.TimeUnit.MICROSECONDS; +import static java.util.concurrent.TimeUnit.MILLISECONDS; + +import android.content.Context; +import android.os.SystemClock; +import android.util.Log; +import com.google.common.collect.ImmutableList; +import com.google.mediapipe.framework.AndroidAssetUtil; +import com.google.mediapipe.framework.AndroidPacketCreator; +import com.google.mediapipe.framework.Graph; +import com.google.mediapipe.framework.MediaPipeException; +import com.google.mediapipe.framework.Packet; +import com.google.mediapipe.framework.PacketGetter; +import com.google.protobuf.Parser; +import java.io.File; +import java.util.List; +import java.util.Map; +import java.util.concurrent.atomic.AtomicBoolean; +import javax.annotation.Nullable; + +/** The base class of the MediaPipe solutions. */ +public class SolutionBase { + private static final String TAG = "SolutionBase"; + protected Graph solutionGraph; + protected AndroidPacketCreator packetCreator; + protected ErrorListener errorListener; + protected String imageInputStreamName; + protected long lastTimestamp = Long.MIN_VALUE; + protected final AtomicBoolean solutionGraphStarted = new AtomicBoolean(false); + + static { + // Load all native libraries needed by the app. + System.loadLibrary("mediapipe_jni"); + System.loadLibrary("opencv_java3"); + } + + /** + * Initializes solution base with Android context, solution specific settings, and solution result + * handler. + * + * @param context an Android {@link Context}. 
+ * @param solutionInfo a {@link SolutionInfo} that contains the binary graph file path and the graph input and + * output stream names. + * @param outputHandler an {@link OutputHandler} that handles both the solution result object and runtime + * exceptions. + */ + public synchronized void initialize( + Context context, + SolutionInfo solutionInfo, + OutputHandler outputHandler) { + this.imageInputStreamName = solutionInfo.imageInputStreamName(); + try { + AndroidAssetUtil.initializeNativeAssetManager(context); + solutionGraph = new Graph(); + if (new File(solutionInfo.binaryGraphPath()).isAbsolute()) { + solutionGraph.loadBinaryGraph(solutionInfo.binaryGraphPath()); + } else { + solutionGraph.loadBinaryGraph( + AndroidAssetUtil.getAssetBytes(context.getAssets(), solutionInfo.binaryGraphPath())); + } + solutionGraph.addMultiStreamCallback( + solutionInfo.outputStreamNames(), outputHandler::run, /*observeTimestampBounds=*/ true); + packetCreator = new AndroidPacketCreator(solutionGraph); + } catch (MediaPipeException e) { + throwException("Error occurs when creating the MediaPipe solution graph. ", e); + } + } + + /** Reports an exception with an error message to the error listener, or logs it if no listener is set. */ + protected void throwException(String message, MediaPipeException e) { + if (errorListener != null) { + errorListener.onError(message, e); + } else { + Log.e(TAG, message, e); + } + } + + /** + * A convenience method to get a proto list from a packet. If the packet is empty, returns an empty list. + */ + protected <T> List<T> getProtoVector(Packet packet, Parser<T> messageParser) { + return packet.isEmpty() + ? ImmutableList.of() + : PacketGetter.getProtoVector(packet, messageParser); + } + + /** Gets the current timestamp in microseconds. */ + protected long getCurrentTimestampUs() { + return MICROSECONDS.convert(SystemClock.elapsedRealtime(), MILLISECONDS); + } + + /** Starts the solution graph by taking an optional input side packets map. 
*/ + public synchronized void start(@Nullable Map<String, Packet> inputSidePackets) { + try { + if (inputSidePackets != null) { + solutionGraph.setInputSidePackets(inputSidePackets); + } + if (!solutionGraphStarted.getAndSet(true)) { + solutionGraph.startRunningGraph(); + } + } catch (MediaPipeException e) { + throwException("Error occurs when starting the MediaPipe solution graph. ", e); + } + } + + /** Blocks until the solution finishes processing all the pending tasks. */ + public void waitUntilIdle() { + try { + solutionGraph.waitUntilGraphIdle(); + } catch (MediaPipeException e) { + throwException("Error occurs when waiting until the MediaPipe graph becomes idle. ", e); + } + } + + /** Closes and cleans up the solution graph. */ + public void close() { + if (solutionGraphStarted.get()) { + try { + solutionGraph.closeAllPacketSources(); + solutionGraph.waitUntilGraphDone(); + } catch (MediaPipeException e) { + // Note: errors during Process are reported at the earliest opportunity, + // which may be addPacket or waitUntilDone, depending on timing. For consistency, + // we want to always report them using the same async handler if installed. + throwException("Error occurs when closing the Mediapipe solution graph. ", e); + } + try { + solutionGraph.tearDown(); + } catch (MediaPipeException e) { + throwException("Error occurs when closing the Mediapipe solution graph. ", e); + } + } + } +} diff --git a/mediapipe/java/com/google/mediapipe/solutionbase/SolutionInfo.java b/mediapipe/java/com/google/mediapipe/solutionbase/SolutionInfo.java new file mode 100644 index 000000000..fed2994b2 --- /dev/null +++ b/mediapipe/java/com/google/mediapipe/solutionbase/SolutionInfo.java @@ -0,0 +1,48 @@ +// Copyright 2021 The MediaPipe Authors. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package com.google.mediapipe.solutionbase; + +import com.google.auto.value.AutoValue; +import com.google.common.collect.ImmutableList; + +/** SolutionInfo contains all needed information to initialize a MediaPipe solution graph. */ +@AutoValue +public abstract class SolutionInfo { + public abstract String binaryGraphPath(); + + public abstract String imageInputStreamName(); + + public abstract ImmutableList<String> outputStreamNames(); + + public abstract boolean staticImageMode(); + + public static Builder builder() { + return new AutoValue_SolutionInfo.Builder(); + } + + /** Builder for {@link SolutionInfo}. */ + @AutoValue.Builder + public abstract static class Builder { + public abstract Builder setBinaryGraphPath(String value); + + public abstract Builder setImageInputStreamName(String value); + + public abstract Builder setOutputStreamNames(ImmutableList<String> value); + + public abstract Builder setStaticImageMode(boolean value); + + public abstract SolutionInfo build(); + } +} diff --git a/mediapipe/java/com/google/mediapipe/solutionbase/SolutionResult.java b/mediapipe/java/com/google/mediapipe/solutionbase/SolutionResult.java new file mode 100644 index 000000000..c77b79847 --- /dev/null +++ b/mediapipe/java/com/google/mediapipe/solutionbase/SolutionResult.java @@ -0,0 +1,23 @@ +// Copyright 2021 The MediaPipe Authors. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. 
+// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package com.google.mediapipe.solutionbase; + +/** + * Interface of the MediaPipe solution result. Any MediaPipe solution-specific result class should + * implement SolutionResult. + */ +public interface SolutionResult { + long timestamp(); +} diff --git a/mediapipe/modules/face_detection/BUILD b/mediapipe/modules/face_detection/BUILD index 374cfeb58..9ee606455 100644 --- a/mediapipe/modules/face_detection/BUILD +++ b/mediapipe/modules/face_detection/BUILD @@ -83,6 +83,8 @@ mediapipe_simple_subgraph( exports_files( srcs = [ + "face_detection_back.tflite", + "face_detection_back_sparse.tflite", "face_detection_front.tflite", ], ) diff --git a/mediapipe/modules/face_detection/face_detection_back.tflite b/mediapipe/modules/face_detection/face_detection_back.tflite index b15764ebd..98c5c16bb 100755 Binary files a/mediapipe/modules/face_detection/face_detection_back.tflite and b/mediapipe/modules/face_detection/face_detection_back.tflite differ diff --git a/mediapipe/modules/face_detection/face_detection_back_sparse.tflite b/mediapipe/modules/face_detection/face_detection_back_sparse.tflite new file mode 100755 index 000000000..9575d8c1f Binary files /dev/null and b/mediapipe/modules/face_detection/face_detection_back_sparse.tflite differ diff --git a/mediapipe/modules/face_landmark/face_landmark_cpu.pbtxt b/mediapipe/modules/face_landmark/face_landmark_cpu.pbtxt index 4eb29be65..a94a8c803 100644 --- a/mediapipe/modules/face_landmark/face_landmark_cpu.pbtxt +++ b/mediapipe/modules/face_landmark/face_landmark_cpu.pbtxt @@ -109,7 +109,7 @@ node { 
output_stream: "ensured_landmark_tensors" } -# Decodes the landmark tensors into a vector of lanmarks, where the landmark +# Decodes the landmark tensors into a vector of landmarks, where the landmark # coordinates are normalized by the size of the input image to the model. node { calculator: "TensorsToLandmarksCalculator" diff --git a/mediapipe/modules/face_landmark/face_landmark_gpu.pbtxt b/mediapipe/modules/face_landmark/face_landmark_gpu.pbtxt index 17c9bd78c..7d8c3bf7d 100644 --- a/mediapipe/modules/face_landmark/face_landmark_gpu.pbtxt +++ b/mediapipe/modules/face_landmark/face_landmark_gpu.pbtxt @@ -109,7 +109,7 @@ node { output_stream: "ensured_landmark_tensors" } -# Decodes the landmark tensors into a vector of lanmarks, where the landmark +# Decodes the landmark tensors into a vector of landmarks, where the landmark # coordinates are normalized by the size of the input image to the model. node { calculator: "TensorsToLandmarksCalculator" diff --git a/mediapipe/modules/objectron/calculators/box.cc b/mediapipe/modules/objectron/calculators/box.cc index a7d0f1460..bd2ce57f9 100644 --- a/mediapipe/modules/objectron/calculators/box.cc +++ b/mediapipe/modules/objectron/calculators/box.cc @@ -14,7 +14,7 @@ #include "mediapipe/modules/objectron/calculators/box.h" -#include "Eigen/src/Core/util/Constants.h" +#include "Eigen/Core" #include "mediapipe/framework/port/logging.h" namespace mediapipe { diff --git a/mediapipe/modules/pose_landmark/BUILD b/mediapipe/modules/pose_landmark/BUILD index 90edbb8a0..f38b2040d 100644 --- a/mediapipe/modules/pose_landmark/BUILD +++ b/mediapipe/modules/pose_landmark/BUILD @@ -78,7 +78,9 @@ mediapipe_simple_subgraph( graph = "pose_landmark_filtering.pbtxt", register_as = "PoseLandmarkFiltering", deps = [ + "//mediapipe/calculators/util:alignment_points_to_rects_calculator", "//mediapipe/calculators/util:landmarks_smoothing_calculator", + "//mediapipe/calculators/util:landmarks_to_detection_calculator", 
"//mediapipe/calculators/util:visibility_smoothing_calculator", "//mediapipe/framework/tool:switch_container", ], diff --git a/mediapipe/modules/pose_landmark/pose_landmark_filtering.pbtxt b/mediapipe/modules/pose_landmark/pose_landmark_filtering.pbtxt index 6f777ed5e..2560dda79 100644 --- a/mediapipe/modules/pose_landmark/pose_landmark_filtering.pbtxt +++ b/mediapipe/modules/pose_landmark/pose_landmark_filtering.pbtxt @@ -29,6 +29,29 @@ output_stream: "FILTERED_NORM_LANDMARKS:filtered_landmarks" # Filtered auxiliary set of normalized landmarks. (NormalizedRect) output_stream: "FILTERED_AUX_NORM_LANDMARKS:filtered_aux_landmarks" +# Converts landmarks to a detection that tightly encloses all landmarks. +node { + calculator: "LandmarksToDetectionCalculator" + input_stream: "NORM_LANDMARKS:aux_landmarks" + output_stream: "DETECTION:aux_detection" +} + +# Converts detection into a rectangle based on center and scale alignment +# points. +node { + calculator: "AlignmentPointsRectsCalculator" + input_stream: "DETECTION:aux_detection" + input_stream: "IMAGE_SIZE:image_size" + output_stream: "NORM_RECT:roi" + options: { + [mediapipe.DetectionsToRectsCalculatorOptions.ext] { + rotation_vector_start_keypoint_index: 0 + rotation_vector_end_keypoint_index: 1 + rotation_vector_target_angle_degrees: 90 + } + } +} + # Smoothes pose landmark visibilities to reduce jitter. node { calculator: "SwitchContainer" @@ -66,6 +89,7 @@ node { input_side_packet: "ENABLE:enable" input_stream: "NORM_LANDMARKS:filtered_visibility" input_stream: "IMAGE_SIZE:image_size" + input_stream: "OBJECT_SCALE_ROI:roi" output_stream: "NORM_FILTERED_LANDMARKS:filtered_landmarks" options: { [mediapipe.SwitchContainerOptions.ext] { @@ -83,12 +107,12 @@ node { options: { [mediapipe.LandmarksSmoothingCalculatorOptions.ext] { one_euro_filter { - # Min cutoff 0.1 results into ~ 0.02 alpha in landmark EMA filter + # Min cutoff 0.1 results into ~0.01 alpha in landmark EMA filter # when landmark is static. 
- min_cutoff: 0.1 - # Beta 40.0 in combintation with min_cutoff 0.1 results into ~0.8 - # alpha in landmark EMA filter when landmark is moving fast. - beta: 40.0 + min_cutoff: 0.05 + # Beta 80.0 in combination with min_cutoff 0.05 results into + # ~0.94 alpha in landmark EMA filter when landmark is moving fast. + beta: 80.0 # Derivative cutoff 1.0 results into ~0.17 alpha in landmark # velocity EMA filter. derivate_cutoff: 1.0 @@ -119,6 +143,7 @@ node { calculator: "LandmarksSmoothingCalculator" input_stream: "NORM_LANDMARKS:filtered_aux_visibility" input_stream: "IMAGE_SIZE:image_size" + input_stream: "OBJECT_SCALE_ROI:roi" output_stream: "NORM_FILTERED_LANDMARKS:filtered_aux_landmarks" options: { [mediapipe.LandmarksSmoothingCalculatorOptions.ext] { @@ -127,12 +152,12 @@ node { # object is not moving but responsive enough in case of sudden # movements. one_euro_filter { - # Min cutoff 0.01 results into ~ 0.002 alpha in landmark EMA + # Min cutoff 0.01 results into ~0.002 alpha in landmark EMA # filter when landmark is static. min_cutoff: 0.01 - # Beta 1.0 in combintation with min_cutoff 0.01 results into ~0.2 + # Beta 10.0 in combination with min_cutoff 0.01 results into ~0.68 # alpha in landmark EMA filter when landmark is moving fast. - beta: 1.0 + beta: 10.0 # Derivative cutoff 1.0 results into ~0.17 alpha in landmark # velocity EMA filter. derivate_cutoff: 1.0 diff --git a/mediapipe/modules/selfie_segmentation/BUILD b/mediapipe/modules/selfie_segmentation/BUILD new file mode 100644 index 000000000..f2babd7c5 --- /dev/null +++ b/mediapipe/modules/selfie_segmentation/BUILD @@ -0,0 +1,73 @@ +# Copyright 2021 The MediaPipe Authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +load( + "//mediapipe/framework/tool:mediapipe_graph.bzl", + "mediapipe_simple_subgraph", +) + +licenses(["notice"]) + +package(default_visibility = ["//visibility:public"]) + +mediapipe_simple_subgraph( + name = "selfie_segmentation_model_loader", + graph = "selfie_segmentation_model_loader.pbtxt", + register_as = "SelfieSegmentationModelLoader", + deps = [ + "//mediapipe/calculators/core:constant_side_packet_calculator", + "//mediapipe/calculators/tflite:tflite_model_calculator", + "//mediapipe/calculators/util:local_file_contents_calculator", + "//mediapipe/framework/tool:switch_container", + ], +) + +mediapipe_simple_subgraph( + name = "selfie_segmentation_cpu", + graph = "selfie_segmentation_cpu.pbtxt", + register_as = "SelfieSegmentationCpu", + deps = [ + ":selfie_segmentation_model_loader", + "//mediapipe/calculators/image:image_properties_calculator", + "//mediapipe/calculators/tensor:image_to_tensor_calculator", + "//mediapipe/calculators/tensor:inference_calculator", + "//mediapipe/calculators/tensor:tensors_to_segmentation_calculator", + "//mediapipe/calculators/tflite:tflite_custom_op_resolver_calculator", + "//mediapipe/calculators/util:from_image_calculator", + "//mediapipe/framework/tool:switch_container", + ], +) + +mediapipe_simple_subgraph( + name = "selfie_segmentation_gpu", + graph = "selfie_segmentation_gpu.pbtxt", + register_as = "SelfieSegmentationGpu", + deps = [ + ":selfie_segmentation_model_loader", + "//mediapipe/calculators/image:image_properties_calculator", + "//mediapipe/calculators/tensor:image_to_tensor_calculator", + 
"//mediapipe/calculators/tensor:inference_calculator", + "//mediapipe/calculators/tensor:tensors_to_segmentation_calculator", + "//mediapipe/calculators/tflite:tflite_custom_op_resolver_calculator", + "//mediapipe/calculators/util:from_image_calculator", + "//mediapipe/framework/tool:switch_container", + ], +) + +exports_files( + srcs = [ + "selfie_segmentation.tflite", + "selfie_segmentation_landscape.tflite", + ], +) diff --git a/mediapipe/modules/selfie_segmentation/README.md b/mediapipe/modules/selfie_segmentation/README.md new file mode 100644 index 000000000..cd6c5e044 --- /dev/null +++ b/mediapipe/modules/selfie_segmentation/README.md @@ -0,0 +1,6 @@ +# selfie_segmentation + +Subgraphs|Details +:--- | :--- +[`SelfieSegmentationCpu`](https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation/selfie_segmentation_cpu.pbtxt)| Segments the person from background in a selfie image. (CPU input, and inference is executed on CPU.) +[`SelfieSegmentationGpu`](https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation/selfie_segmentation_gpu.pbtxt)| Segments the person from background in a selfie image. (GPU input, and inference is executed on GPU.) diff --git a/mediapipe/modules/selfie_segmentation/selfie_segmentation.tflite b/mediapipe/modules/selfie_segmentation/selfie_segmentation.tflite new file mode 100644 index 000000000..374c0720d Binary files /dev/null and b/mediapipe/modules/selfie_segmentation/selfie_segmentation.tflite differ diff --git a/mediapipe/modules/selfie_segmentation/selfie_segmentation_cpu.pbtxt b/mediapipe/modules/selfie_segmentation/selfie_segmentation_cpu.pbtxt new file mode 100644 index 000000000..550cee906 --- /dev/null +++ b/mediapipe/modules/selfie_segmentation/selfie_segmentation_cpu.pbtxt @@ -0,0 +1,131 @@ +# MediaPipe graph to perform selfie segmentation. 
(CPU input, and all processing +# and inference are also performed on CPU) +# +# It is required that "selfie_segmentation.tflite" or +# "selfie_segmentation_landscape.tflite" is available at +# "mediapipe/modules/selfie_segmentation/selfie_segmentation.tflite" +# or +# "mediapipe/modules/selfie_segmentation/selfie_segmentation_landscape.tflite" +# path respectively during execution, depending on the specification in the +# MODEL_SELECTION input side packet. +# +# EXAMPLE: +# node { +# calculator: "SelfieSegmentationCpu" +# input_side_packet: "MODEL_SELECTION:model_selection" +# input_stream: "IMAGE:image" +# output_stream: "SEGMENTATION_MASK:segmentation_mask" +# } + +type: "SelfieSegmentationCpu" + +# CPU image. (ImageFrame) +input_stream: "IMAGE:image" + +# An integer 0 or 1. Use 0 to select a general-purpose model (operating on a +# 256x256 tensor), and 1 to select a model (operating on a 256x144 tensor) more +# optimized for landscape images. If unspecified, functions as set to 0. (int) +input_side_packet: "MODEL_SELECTION:model_selection" + +# Segmentation mask. (ImageFrame in ImageFormat::VEC32F1) +output_stream: "SEGMENTATION_MASK:segmentation_mask" + +# Resizes the input image into a tensor with a dimension desired by the model. 
+node { + calculator: "SwitchContainer" + input_side_packet: "SELECT:model_selection" + input_stream: "IMAGE:image" + output_stream: "TENSORS:input_tensors" + options: { + [mediapipe.SwitchContainerOptions.ext] { + select: 0 + contained_node: { + calculator: "ImageToTensorCalculator" + options: { + [mediapipe.ImageToTensorCalculatorOptions.ext] { + output_tensor_width: 256 + output_tensor_height: 256 + keep_aspect_ratio: false + output_tensor_float_range { + min: 0.0 + max: 1.0 + } + border_mode: BORDER_ZERO + } + } + } + contained_node: { + calculator: "ImageToTensorCalculator" + options: { + [mediapipe.ImageToTensorCalculatorOptions.ext] { + output_tensor_width: 256 + output_tensor_height: 144 + keep_aspect_ratio: false + output_tensor_float_range { + min: 0.0 + max: 1.0 + } + border_mode: BORDER_ZERO + } + } + } + } + } +} + +# Generates a single side packet containing a TensorFlow Lite op resolver that +# supports custom ops needed by the model used in this graph. +node { + calculator: "TfLiteCustomOpResolverCalculator" + output_side_packet: "op_resolver" +} + +# Loads the selfie segmentation TF Lite model. +node { + calculator: "SelfieSegmentationModelLoader" + input_side_packet: "MODEL_SELECTION:model_selection" + output_side_packet: "MODEL:model" +} + +# Runs model inference on CPU. +node { + calculator: "InferenceCalculator" + input_stream: "TENSORS:input_tensors" + output_stream: "TENSORS:output_tensors" + input_side_packet: "MODEL:model" + input_side_packet: "CUSTOM_OP_RESOLVER:op_resolver" + options: { + [mediapipe.InferenceCalculatorOptions.ext] { + delegate { xnnpack {} } + } + # + } +} + +# Retrieves the size of the input image. +node { + calculator: "ImagePropertiesCalculator" + input_stream: "IMAGE_CPU:image" + output_stream: "SIZE:input_size" +} + +# Processes the output tensors into a segmentation mask that has the same size +# as the input image into the graph. 
+node {
+  calculator: "TensorsToSegmentationCalculator"
+  input_stream: "TENSORS:output_tensors"
+  input_stream: "OUTPUT_SIZE:input_size"
+  output_stream: "MASK:mask_image"
+  options: {
+    [mediapipe.TensorsToSegmentationCalculatorOptions.ext] {
+      activation: NONE
+    }
+  }
+}
+
+# Converts the incoming Image into the corresponding ImageFrame type.
+node: {
+  calculator: "FromImageCalculator"
+  input_stream: "IMAGE:mask_image"
+  output_stream: "IMAGE_CPU:segmentation_mask"
+}
diff --git a/mediapipe/modules/selfie_segmentation/selfie_segmentation_gpu.pbtxt b/mediapipe/modules/selfie_segmentation/selfie_segmentation_gpu.pbtxt
new file mode 100644
index 000000000..5f9e55eb7
--- /dev/null
+++ b/mediapipe/modules/selfie_segmentation/selfie_segmentation_gpu.pbtxt
@@ -0,0 +1,133 @@
+# MediaPipe graph to perform selfie segmentation. (GPU input, and all processing
+# and inference are also performed on GPU)
+#
+# It is required that "selfie_segmentation.tflite" or
+# "selfie_segmentation_landscape.tflite" is available at
+# "mediapipe/modules/selfie_segmentation/selfie_segmentation.tflite"
+# or
+# "mediapipe/modules/selfie_segmentation/selfie_segmentation_landscape.tflite"
+# path respectively during execution, depending on the specification in the
+# MODEL_SELECTION input side packet.
+#
+# EXAMPLE:
+#   node {
+#     calculator: "SelfieSegmentationGpu"
+#     input_side_packet: "MODEL_SELECTION:model_selection"
+#     input_stream: "IMAGE:image"
+#     output_stream: "SEGMENTATION_MASK:segmentation_mask"
+#   }
+
+type: "SelfieSegmentationGpu"
+
+# GPU image. (GpuBuffer)
+input_stream: "IMAGE:image"
+
+# An integer 0 or 1. Use 0 to select a general-purpose model (operating on a
+# 256x256 tensor), and 1 to select a model (operating on a 256x144 tensor) more
+# optimized for landscape images. If unspecified, functions as set to 0. (int)
+input_side_packet: "MODEL_SELECTION:model_selection"

+# Segmentation mask. (GpuBuffer in RGBA, with the same mask values in R and A)
+output_stream: "SEGMENTATION_MASK:segmentation_mask"
+
+# Resizes the input image into a tensor with a dimension desired by the model.
+node {
+  calculator: "SwitchContainer"
+  input_side_packet: "SELECT:model_selection"
+  input_stream: "IMAGE_GPU:image"
+  output_stream: "TENSORS:input_tensors"
+  options: {
+    [mediapipe.SwitchContainerOptions.ext] {
+      select: 0
+      contained_node: {
+        calculator: "ImageToTensorCalculator"
+        options: {
+          [mediapipe.ImageToTensorCalculatorOptions.ext] {
+            output_tensor_width: 256
+            output_tensor_height: 256
+            keep_aspect_ratio: false
+            output_tensor_float_range {
+              min: 0.0
+              max: 1.0
+            }
+            border_mode: BORDER_ZERO
+            gpu_origin: TOP_LEFT
+          }
+        }
+      }
+      contained_node: {
+        calculator: "ImageToTensorCalculator"
+        options: {
+          [mediapipe.ImageToTensorCalculatorOptions.ext] {
+            output_tensor_width: 256
+            output_tensor_height: 144
+            keep_aspect_ratio: false
+            output_tensor_float_range {
+              min: 0.0
+              max: 1.0
+            }
+            border_mode: BORDER_ZERO
+            gpu_origin: TOP_LEFT
+          }
+        }
+      }
+    }
+  }
+}
+
+# Generates a single side packet containing a TensorFlow Lite op resolver that
+# supports custom ops needed by the model used in this graph.
+node {
+  calculator: "TfLiteCustomOpResolverCalculator"
+  output_side_packet: "op_resolver"
+  options: {
+    [mediapipe.TfLiteCustomOpResolverCalculatorOptions.ext] {
+      use_gpu: true
+    }
+  }
+}
+
+# Loads the selfie segmentation TF Lite model.
+node {
+  calculator: "SelfieSegmentationModelLoader"
+  input_side_packet: "MODEL_SELECTION:model_selection"
+  output_side_packet: "MODEL:model"
+}
+
+# Runs model inference on GPU.
+node {
+  calculator: "InferenceCalculator"
+  input_stream: "TENSORS:input_tensors"
+  output_stream: "TENSORS:output_tensors"
+  input_side_packet: "MODEL:model"
+  input_side_packet: "CUSTOM_OP_RESOLVER:op_resolver"
+}
+
+# Retrieves the size of the input image.
+node {
+  calculator: "ImagePropertiesCalculator"
+  input_stream: "IMAGE_GPU:image"
+  output_stream: "SIZE:input_size"
+}
+
+# Processes the output tensors into a segmentation mask that has the same size
+# as the input image into the graph.
+node {
+  calculator: "TensorsToSegmentationCalculator"
+  input_stream: "TENSORS:output_tensors"
+  input_stream: "OUTPUT_SIZE:input_size"
+  output_stream: "MASK:mask_image"
+  options: {
+    [mediapipe.TensorsToSegmentationCalculatorOptions.ext] {
+      activation: NONE
+      gpu_origin: TOP_LEFT
+    }
+  }
+}
+
+# Converts the incoming Image into the corresponding GpuBuffer type.
+node: {
+  calculator: "FromImageCalculator"
+  input_stream: "IMAGE:mask_image"
+  output_stream: "IMAGE_GPU:segmentation_mask"
+}
diff --git a/mediapipe/modules/selfie_segmentation/selfie_segmentation_landscape.tflite b/mediapipe/modules/selfie_segmentation/selfie_segmentation_landscape.tflite
new file mode 100755
index 000000000..4ea3f8a10
Binary files /dev/null and b/mediapipe/modules/selfie_segmentation/selfie_segmentation_landscape.tflite differ
diff --git a/mediapipe/modules/selfie_segmentation/selfie_segmentation_model_loader.pbtxt b/mediapipe/modules/selfie_segmentation/selfie_segmentation_model_loader.pbtxt
new file mode 100644
index 000000000..39495f80d
--- /dev/null
+++ b/mediapipe/modules/selfie_segmentation/selfie_segmentation_model_loader.pbtxt
@@ -0,0 +1,63 @@
+# MediaPipe graph to load a selected selfie segmentation TF Lite model.
+
+type: "SelfieSegmentationModelLoader"
+
+# An integer 0 or 1. Use 0 to select a general-purpose model (operating on a
+# 256x256 tensor), and 1 to select a model (operating on a 256x144 tensor) more
+# optimized for landscape images. If unspecified, functions as set to 0. (int)
+input_side_packet: "MODEL_SELECTION:model_selection"
+
+# TF Lite model represented as a FlatBuffer.
+# (std::unique_ptr<tflite::FlatBufferModel,
+# std::function<void(tflite::FlatBufferModel*)>>)
+output_side_packet: "MODEL:model"
+
+# Determines the path to the desired selfie segmentation model file.
+node {
+  calculator: "SwitchContainer"
+  input_side_packet: "SELECT:model_selection"
+  output_side_packet: "PACKET:model_path"
+  options: {
+    [mediapipe.SwitchContainerOptions.ext] {
+      select: 0
+      contained_node: {
+        calculator: "ConstantSidePacketCalculator"
+        options: {
+          [mediapipe.ConstantSidePacketCalculatorOptions.ext]: {
+            packet {
+              string_value: "mediapipe/modules/selfie_segmentation/selfie_segmentation.tflite"
+            }
+          }
+        }
+      }
+      contained_node: {
+        calculator: "ConstantSidePacketCalculator"
+        options: {
+          [mediapipe.ConstantSidePacketCalculatorOptions.ext]: {
+            packet {
+              string_value: "mediapipe/modules/selfie_segmentation/selfie_segmentation_landscape.tflite"
+            }
+          }
+        }
+      }
+    }
+  }
+}
+
+# Loads the file in the specified path into a blob.
+node {
+  calculator: "LocalFileContentsCalculator"
+  input_side_packet: "FILE_PATH:model_path"
+  output_side_packet: "CONTENTS:model_blob"
+  options: {
+    [mediapipe.LocalFileContentsCalculatorOptions.ext]: {
+      text_mode: false
+    }
+  }
+}
+
+# Converts the input blob into a TF Lite model.
+node {
+  calculator: "TfLiteModelCalculator"
+  input_side_packet: "MODEL_BLOB:model_blob"
+  output_side_packet: "MODEL:model"
+}
diff --git a/mediapipe/opensource_only/ISSUE_TEMPLATE/30-bug-issue.md b/mediapipe/opensource_only/ISSUE_TEMPLATE/30-bug-issue.md
new file mode 100644
index 000000000..f31f3649f
--- /dev/null
+++ b/mediapipe/opensource_only/ISSUE_TEMPLATE/30-bug-issue.md
@@ -0,0 +1,26 @@
+Please make sure that this is a bug, and also refer to the [troubleshooting](https://google.github.io/mediapipe/getting_started/troubleshooting.html) and FAQ documentation before raising any issues.
+
+**System information** (Please provide as much relevant information as possible)
+
+- Have I written custom code (as opposed to using a stock example script provided in MediaPipe):
+- OS Platform and Distribution (e.g., Linux Ubuntu 16.04, Android 11, iOS 14.4):
+- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on a mobile device:
+- Browser and version (e.g. Google Chrome, Safari) if the issue happens in a browser:
+- Programming language and version (e.g. C++, Python, Java):
+- [MediaPipe version](https://github.com/google/mediapipe/releases):
+- Bazel version (if compiling from source):
+- Solution (e.g. FaceMesh, Pose, Holistic):
+- Android Studio, NDK, SDK versions (if the issue is related to building in an Android environment):
+- Xcode & Tulsi version (if the issue is related to building for iOS):
+
+**Describe the current behavior:**
+
+**Describe the expected behavior:**
+
+**Standalone code to reproduce the issue:**
+Provide a reproducible test case that is the bare minimum necessary to replicate the problem. If possible, please share a link to Colab/repo link /any notebook:
+
+**Other info / Complete Logs:**
+Include any logs or source code that would be helpful to
+diagnose the problem. If including tracebacks, please include the full
+traceback. Large logs and files should be attached.
diff --git a/mediapipe/opensource_only/ISSUE_TEMPLATE/40-feature-request.md b/mediapipe/opensource_only/ISSUE_TEMPLATE/40-feature-request.md
new file mode 100644
index 000000000..2da72f3b1
--- /dev/null
+++ b/mediapipe/opensource_only/ISSUE_TEMPLATE/40-feature-request.md
@@ -0,0 +1,18 @@
+Please make sure that this is a feature request.
+
+**System information** (Please provide as much relevant information as possible)
+
+- MediaPipe solution (you are using):
+- Programming language: C++/TypeScript/Python/Objective-C/Android Java
+- Are you willing to contribute it (Yes/No):
+
+**Describe the feature and the current behavior/state:**
+
+**Will this change the current API? How?**
+
+**Who will benefit from this feature?**
+
+**Please specify the use cases for this feature:**
+
+**Any other info:**
diff --git a/mediapipe/opensource_only/ISSUE_TEMPLATE/50-other-issues.md b/mediapipe/opensource_only/ISSUE_TEMPLATE/50-other-issues.md
new file mode 100644
index 000000000..9e094dd9c
--- /dev/null
+++ b/mediapipe/opensource_only/ISSUE_TEMPLATE/50-other-issues.md
@@ -0,0 +1,8 @@
+This template is for miscellaneous issues not covered by the other issue categories.
+
+For questions on how to work with MediaPipe, or support for problems that are not verified bugs in MediaPipe, please go to the [StackOverflow](https://stackoverflow.com/questions/tagged/mediapipe) and [Slack](https://mediapipe.page.link/joinslack) communities.
+
+If you are reporting a vulnerability, please use the [dedicated reporting process](https://github.com/google/mediapipe/security).
+
+For high-level discussions about MediaPipe, please post to discuss@mediapipe.org. For questions about the development or internal workings of MediaPipe, or if you would like to know how to contribute to MediaPipe, please post to developers@mediapipe.org.
+ diff --git a/mediapipe/python/BUILD b/mediapipe/python/BUILD index 08a299589..11fe45835 100644 --- a/mediapipe/python/BUILD +++ b/mediapipe/python/BUILD @@ -72,5 +72,6 @@ cc_library( "//mediapipe/modules/pose_detection:pose_detection_cpu", "//mediapipe/modules/pose_landmark:pose_landmark_by_roi_cpu", "//mediapipe/modules/pose_landmark:pose_landmark_cpu", + "//mediapipe/modules/selfie_segmentation:selfie_segmentation_cpu", ], ) diff --git a/mediapipe/python/solutions/__init__.py b/mediapipe/python/solutions/__init__.py index 8cd9af327..fc0686d22 100644 --- a/mediapipe/python/solutions/__init__.py +++ b/mediapipe/python/solutions/__init__.py @@ -21,3 +21,4 @@ import mediapipe.python.solutions.hands import mediapipe.python.solutions.holistic import mediapipe.python.solutions.objectron import mediapipe.python.solutions.pose +import mediapipe.python.solutions.selfie_segmentation diff --git a/mediapipe/python/solutions/drawing_utils.py b/mediapipe/python/solutions/drawing_utils.py index 40259153e..2e0fd9971 100644 --- a/mediapipe/python/solutions/drawing_utils.py +++ b/mediapipe/python/solutions/drawing_utils.py @@ -15,7 +15,7 @@ """MediaPipe solution drawing utils.""" import math -from typing import List, Tuple, Union +from typing import List, Optional, Tuple, Union import cv2 import dataclasses @@ -116,7 +116,7 @@ def draw_detection( def draw_landmarks( image: np.ndarray, landmark_list: landmark_pb2.NormalizedLandmarkList, - connections: List[Tuple[int, int]] = None, + connections: Optional[List[Tuple[int, int]]] = None, landmark_drawing_spec: DrawingSpec = DrawingSpec(color=RED_COLOR), connection_drawing_spec: DrawingSpec = DrawingSpec()): """Draws the landmarks and the connections on the image. 
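The `drawing_utils.py` hunk above tightens the signature of `draw_landmarks`: under PEP 484, a parameter whose default is `None` should be annotated `Optional[...]`, not a bare `List[...]` (implicit `Optional` is deprecated). A minimal sketch of the pattern — `count_connections` is a hypothetical illustration, not part of MediaPipe's API:

```python
from typing import List, Optional, Tuple


def count_connections(
    connections: Optional[List[Tuple[int, int]]] = None) -> int:
  """Returns the number of landmark connections, treating None as empty."""
  # Explicit Optional[...] documents that None is a legal argument; a bare
  # List[...] annotation with a None default relies on implicit Optional.
  return 0 if connections is None else len(connections)


print(count_connections())                  # 0
print(count_connections([(0, 1), (1, 2)]))  # 2
```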
diff --git a/mediapipe/python/solutions/face_detection_test.py b/mediapipe/python/solutions/face_detection_test.py index 25f5b33fd..f4185ea46 100644 --- a/mediapipe/python/solutions/face_detection_test.py +++ b/mediapipe/python/solutions/face_detection_test.py @@ -56,7 +56,8 @@ class FaceDetectionTest(absltest.TestCase): self.assertIsNone(results.detections) def test_face(self): - image_path = os.path.join(os.path.dirname(__file__), 'testdata/face.jpg') + image_path = os.path.join(os.path.dirname(__file__), + 'testdata/portrait.jpg') image = cv2.imread(image_path) with mp_faces.FaceDetection(min_detection_confidence=0.5) as faces: for idx in range(5): diff --git a/mediapipe/python/solutions/face_mesh_test.py b/mediapipe/python/solutions/face_mesh_test.py index cf112044d..2d8503872 100644 --- a/mediapipe/python/solutions/face_mesh_test.py +++ b/mediapipe/python/solutions/face_mesh_test.py @@ -96,7 +96,8 @@ class FaceMeshTest(parameterized.TestCase): @parameterized.named_parameters(('static_image_mode', True, 1), ('video_mode', False, 5)) def test_face(self, static_image_mode: bool, num_frames: int): - image_path = os.path.join(os.path.dirname(__file__), 'testdata/face.jpg') + image_path = os.path.join(os.path.dirname(__file__), + 'testdata/portrait.jpg') image = cv2.imread(image_path) with mp_faces.FaceMesh( static_image_mode=static_image_mode, diff --git a/mediapipe/python/solutions/pose_test.py b/mediapipe/python/solutions/pose_test.py index 5022514ea..2fb199919 100644 --- a/mediapipe/python/solutions/pose_test.py +++ b/mediapipe/python/solutions/pose_test.py @@ -30,18 +30,18 @@ from mediapipe.python.solutions import drawing_utils as mp_drawing from mediapipe.python.solutions import pose as mp_pose TEST_IMAGE_PATH = 'mediapipe/python/solutions/testdata' -DIFF_THRESHOLD = 30 # pixels -EXPECTED_POSE_LANDMARKS = np.array([[460, 287], [469, 277], [472, 276], - [475, 276], [464, 277], [463, 277], - [463, 276], [492, 277], [472, 277], - [471, 295], [465, 295], [542, 
323], - [448, 318], [619, 319], [372, 313], - [695, 316], [296, 308], [717, 313], - [273, 304], [718, 304], [280, 298], - [709, 307], [289, 303], [521, 470], - [459, 466], [626, 533], [364, 500], - [704, 616], [347, 614], [710, 631], - [357, 633], [737, 625], [306, 639]]) +DIFF_THRESHOLD = 15 # pixels +EXPECTED_POSE_LANDMARKS = np.array([[460, 283], [467, 273], [471, 273], + [474, 273], [465, 273], [465, 273], + [466, 273], [491, 277], [480, 277], + [470, 294], [465, 294], [545, 319], + [453, 329], [622, 323], [375, 316], + [696, 316], [299, 307], [719, 316], + [278, 306], [721, 311], [274, 304], + [713, 313], [283, 306], [520, 476], + [467, 471], [612, 550], [358, 490], + [701, 613], [349, 611], [709, 624], + [363, 630], [730, 633], [303, 628]]) class PoseTest(parameterized.TestCase): diff --git a/mediapipe/python/solutions/selfie_segmentation.py b/mediapipe/python/solutions/selfie_segmentation.py new file mode 100644 index 000000000..8aa07569c --- /dev/null +++ b/mediapipe/python/solutions/selfie_segmentation.py @@ -0,0 +1,76 @@ +# Copyright 2021 The MediaPipe Authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""MediaPipe Selfie Segmentation.""" + +from typing import NamedTuple + +import numpy as np +# The following imports are needed because python pb2 silently discards +# unknown protobuf fields. 
+# pylint: disable=unused-import
+from mediapipe.calculators.core import constant_side_packet_calculator_pb2
+from mediapipe.calculators.tensor import image_to_tensor_calculator_pb2
+from mediapipe.calculators.tensor import inference_calculator_pb2
+from mediapipe.calculators.tensor import tensors_to_segmentation_calculator_pb2
+from mediapipe.calculators.util import local_file_contents_calculator_pb2
+from mediapipe.framework.tool import switch_container_pb2
+# pylint: enable=unused-import
+
+from mediapipe.python.solution_base import SolutionBase
+
+BINARYPB_FILE_PATH = 'mediapipe/modules/selfie_segmentation/selfie_segmentation_cpu.binarypb'
+
+
+class SelfieSegmentation(SolutionBase):
+  """MediaPipe Selfie Segmentation.
+
+  MediaPipe Selfie Segmentation processes an RGB image and returns a
+  segmentation mask.
+
+  Please refer to
+  https://solutions.mediapipe.dev/selfie_segmentation#python-solution-api for
+  usage examples.
+  """
+
+  def __init__(self, model_selection=0):
+    """Initializes a MediaPipe Selfie Segmentation object.
+
+    Args:
+      model_selection: 0 or 1. 0 to select a general-purpose model, and 1 to
+        select a model more optimized for landscape images. See details in
+        https://solutions.mediapipe.dev/selfie_segmentation#model_selection.
+    """
+    super().__init__(
+        binary_graph_path=BINARYPB_FILE_PATH,
+        side_inputs={
+            'model_selection': model_selection,
+        },
+        outputs=['segmentation_mask'])
+
+  def process(self, image: np.ndarray) -> NamedTuple:
+    """Processes an RGB image and returns a segmentation mask.
+
+    Args:
+      image: An RGB image represented as a numpy ndarray.
+
+    Raises:
+      RuntimeError: If the underlying graph throws any error.
+      ValueError: If the input image is not three channel RGB.
+
+    Returns:
+      A NamedTuple object with a "segmentation_mask" field that contains a
+      float type 2d np array representing the mask.
+    """
+
+    return super().process(input_data={'image': image})
diff --git a/mediapipe/python/solutions/selfie_segmentation_test.py b/mediapipe/python/solutions/selfie_segmentation_test.py
new file mode 100644
index 000000000..3dba08876
--- /dev/null
+++ b/mediapipe/python/solutions/selfie_segmentation_test.py
@@ -0,0 +1,68 @@
+# Copyright 2021 The MediaPipe Authors.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Tests for mediapipe.python.solutions.selfie_segmentation."""
+
+import os
+import tempfile
+
+from absl.testing import absltest
+from absl.testing import parameterized
+import cv2
+import numpy as np
+
+# resources dependency
+# undeclared dependency
+from mediapipe.python.solutions import selfie_segmentation as mp_selfie_segmentation
+
+TEST_IMAGE_PATH = 'mediapipe/python/solutions/testdata'
+
+
+class SelfieSegmentationTest(parameterized.TestCase):
+
+  def _draw(self, frame: np.ndarray, mask: np.ndarray):
+    frame = np.minimum(frame, np.stack((mask,) * 3, axis=-1))
+    path = os.path.join(tempfile.gettempdir(),
+                        self.id().split('.')[-1] + '.png')
+    cv2.imwrite(path, frame)
+
+  def test_invalid_image_shape(self):
+    with mp_selfie_segmentation.SelfieSegmentation() as selfie_segmentation:
+      with self.assertRaisesRegex(
+          ValueError, 'Input image must contain three channel rgb data.'):
+        selfie_segmentation.process(
+            np.arange(36, dtype=np.uint8).reshape(3, 3, 4))
+
+  def test_blank_image(self):
+    with mp_selfie_segmentation.SelfieSegmentation() as selfie_segmentation:
+      image = np.zeros([100, 100, 3], dtype=np.uint8)
+      image.fill(255)
+      results = selfie_segmentation.process(image)
+      normalized_segmentation_mask = (results.segmentation_mask *
+                                      255).astype(int)
+      self.assertLess(np.amax(normalized_segmentation_mask), 1)
+
+  @parameterized.named_parameters(('general', 0), ('landscape', 1))
+  def test_segmentation(self, model_selection):
+    image_path = os.path.join(os.path.dirname(__file__),
+                              'testdata/portrait.jpg')
+    image = cv2.imread(image_path)
+    with mp_selfie_segmentation.SelfieSegmentation(
+        model_selection=model_selection) as selfie_segmentation:
+      results = selfie_segmentation.process(
+          cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
+      normalized_segmentation_mask = (results.segmentation_mask *
+                                      255).astype(int)
+      self._draw(image.copy(), normalized_segmentation_mask)
+
+
+if __name__ == '__main__':
+  absltest.main()
diff --git a/mediapipe/util/resource_util.cc b/mediapipe/util/resource_util.cc
index 042d1e810..8f40154a0 100644
--- a/mediapipe/util/resource_util.cc
+++ b/mediapipe/util/resource_util.cc
@@ -31,10 +31,10 @@ ResourceProviderFn resource_provider_ = nullptr;
 
 absl::Status GetResourceContents(const std::string& path, std::string* output,
                                  bool read_as_binary) {
-  if (resource_provider_ == nullptr || !resource_provider_(path, output).ok()) {
-    return internal::DefaultGetResourceContents(path, output, read_as_binary);
+  if (resource_provider_) {
+    return resource_provider_(path, output);
   }
-  return absl::OkStatus();
+  return internal::DefaultGetResourceContents(path, output, read_as_binary);
 }
 
 void SetCustomGlobalResourceProvider(ResourceProviderFn fn) {
diff --git a/mediapipe/util/resource_util_android.cc b/mediapipe/util/resource_util_android.cc
index b7589d8fe..b18354d5f 100644
--- a/mediapipe/util/resource_util_android.cc
+++ b/mediapipe/util/resource_util_android.cc
@@ -51,7 +51,9 @@ absl::Status DefaultGetResourceContents(const std::string& path,
   // Try the test environment.
absl::string_view workspace = "mediapipe"; - auto test_path = file::JoinPath(std::getenv("TEST_SRCDIR"), workspace, path); + const char* test_srcdir = std::getenv("TEST_SRCDIR"); + auto test_path = + file::JoinPath(test_srcdir ? test_srcdir : "", workspace, path); if (file::Exists(test_path).ok()) { return file::GetContents(path, output, file::Defaults()); } diff --git a/mediapipe/util/resource_util_apple.cc b/mediapipe/util/resource_util_apple.cc index 428018ee4..5fb66c24c 100644 --- a/mediapipe/util/resource_util_apple.cc +++ b/mediapipe/util/resource_util_apple.cc @@ -88,8 +88,9 @@ absl::StatusOr PathToResourceAsFile(const std::string& path) { // Try the test environment. { absl::string_view workspace = "mediapipe"; + const char* test_srcdir = std::getenv("TEST_SRCDIR"); auto test_path = - file::JoinPath(std::getenv("TEST_SRCDIR"), workspace, path); + file::JoinPath(test_srcdir ? test_srcdir : "", workspace, path); if ([[NSFileManager defaultManager] fileExistsAtPath:[NSString stringWithUTF8String:test_path.c_str()]]) { diff --git a/mediapipe/util/tflite/BUILD b/mediapipe/util/tflite/BUILD index 4d66bbe21..8c9687723 100644 --- a/mediapipe/util/tflite/BUILD +++ b/mediapipe/util/tflite/BUILD @@ -81,7 +81,7 @@ cc_library( "@org_tensorflow//tensorflow/lite:framework", "@org_tensorflow//tensorflow/lite/delegates/gpu:api", "@org_tensorflow//tensorflow/lite/delegates/gpu/common:model", - "@org_tensorflow//tensorflow/lite/delegates/gpu/common/testing:tflite_model_reader", + "@org_tensorflow//tensorflow/lite/delegates/gpu/common:model_builder", "@org_tensorflow//tensorflow/lite/delegates/gpu/gl:api2", ], "//mediapipe:android": [ @@ -93,7 +93,7 @@ cc_library( "@org_tensorflow//tensorflow/lite/delegates/gpu:api", "@org_tensorflow//tensorflow/lite/delegates/gpu/cl:api", "@org_tensorflow//tensorflow/lite/delegates/gpu/common:model", - "@org_tensorflow//tensorflow/lite/delegates/gpu/common/testing:tflite_model_reader", + 
"@org_tensorflow//tensorflow/lite/delegates/gpu/common:model_builder", "@org_tensorflow//tensorflow/lite/delegates/gpu/gl:api2", ], }) + ["@org_tensorflow//tensorflow/lite/core/api"], diff --git a/mediapipe/util/tflite/operations/BUILD b/mediapipe/util/tflite/operations/BUILD index a12bbf4a1..1d4a76d12 100644 --- a/mediapipe/util/tflite/operations/BUILD +++ b/mediapipe/util/tflite/operations/BUILD @@ -17,6 +17,8 @@ licenses(["notice"]) package(default_visibility = [ "//mediapipe:__subpackages__", + # For automated benchmarking of Camera models by TFLite team. + "//learning/brain/models/app_benchmarks/camera_models:__subpackages__", ]) cc_library( diff --git a/mediapipe/util/tflite/tflite_gpu_runner.cc b/mediapipe/util/tflite/tflite_gpu_runner.cc index c77c09524..ef236bf93 100644 --- a/mediapipe/util/tflite/tflite_gpu_runner.cc +++ b/mediapipe/util/tflite/tflite_gpu_runner.cc @@ -27,6 +27,7 @@ #include "tensorflow/lite/core/api/op_resolver.h" #include "tensorflow/lite/delegates/gpu/api.h" #include "tensorflow/lite/delegates/gpu/common/model.h" +#include "tensorflow/lite/delegates/gpu/common/model_builder.h" #include "tensorflow/lite/delegates/gpu/gl/api2.h" #include "tensorflow/lite/model.h" @@ -35,7 +36,6 @@ #ifdef __ANDROID__ #include "tensorflow/lite/delegates/gpu/cl/api.h" #endif -#include "tensorflow/lite/delegates/gpu/common/testing/tflite_model_reader.h" namespace tflite { namespace gpu { diff --git a/mediapipe/util/tflite/tflite_model_loader.cc b/mediapipe/util/tflite/tflite_model_loader.cc index a87d94bd6..abd0e7257 100644 --- a/mediapipe/util/tflite/tflite_model_loader.cc +++ b/mediapipe/util/tflite/tflite_model_loader.cc @@ -23,11 +23,31 @@ absl::StatusOr> TfLiteModelLoader::LoadFromPath( const std::string& path) { std::string model_path = path; - ASSIGN_OR_RETURN(model_path, mediapipe::PathToResourceAsFile(model_path)); - auto model = tflite::FlatBufferModel::BuildFromFile(model_path.c_str()); + std::string model_blob; + auto status_or_content = + 
mediapipe::GetResourceContents(model_path, &model_blob); + // TODO: get rid of manual resolving with PathToResourceAsFile + // as soon as it's incorporated into GetResourceContents. + if (!status_or_content.ok()) { + LOG(WARNING) + << "Trying to resolve path manually as GetResourceContents failed: " + << status_or_content.message(); + ASSIGN_OR_RETURN(auto resolved_path, + mediapipe::PathToResourceAsFile(model_path)); + MP_RETURN_IF_ERROR( + mediapipe::GetResourceContents(resolved_path, &model_blob)); + } + + auto model = tflite::FlatBufferModel::VerifyAndBuildFromBuffer( + model_blob.data(), model_blob.size()); RET_CHECK(model) << "Failed to load model from path " << model_path; return api2::MakePacket( - model.release(), [](tflite::FlatBufferModel* model) { delete model; }); + model.release(), + [model_blob = std::move(model_blob)](tflite::FlatBufferModel* model) { + // It's required that model_blob is deleted only after + // model is deleted, hence capturing model_blob. + delete model; + }); } } // namespace mediapipe diff --git a/setup.py b/setup.py index 81569b34d..31e5195bb 100644 --- a/setup.py +++ b/setup.py @@ -226,7 +226,8 @@ class BuildBinaryGraphs(build.build): 'face_landmark/face_landmark_front_cpu', 'hand_landmark/hand_landmark_tracking_cpu', 'holistic_landmark/holistic_landmark_cpu', 'objectron/objectron_cpu', - 'pose_landmark/pose_landmark_cpu' + 'pose_landmark/pose_landmark_cpu', + 'selfie_segmentation/selfie_segmentation_cpu' ] for binary_graph in binary_graphs: sys.stderr.write('generating binarypb: %s\n' % @@ -379,12 +380,20 @@ class RemoveGenerated(clean.clean): def run(self): for pattern in [ - 'mediapipe/framework/**/*pb2.py', 'mediapipe/calculators/**/*pb2.py', - 'mediapipe/gpu/**/*pb2.py', 'mediapipe/util/**/*pb2.py' + 'mediapipe/calculators/**/*pb2.py', + 'mediapipe/framework/**/*pb2.py', + 'mediapipe/gpu/**/*pb2.py', + 'mediapipe/modules/**/*pb2.py', + 'mediapipe/util/**/*pb2.py', ]: for py_file in glob.glob(pattern, recursive=True): 
sys.stderr.write('removing generated files: %s\n' % py_file) os.remove(py_file) + init_py = os.path.join( + os.path.dirname(os.path.abspath(py_file)), '__init__.py') + if os.path.exists(init_py): + sys.stderr.write('removing __init__ file: %s\n' % init_py) + os.remove(init_py) for binarypb_file in glob.glob( 'mediapipe/modules/**/*.binarypb', recursive=True): sys.stderr.write('removing generated binary graphs: %s\n' % binarypb_file) diff --git a/third_party/org_tensorflow_compatibility_fixes.diff b/third_party/org_tensorflow_compatibility_fixes.diff index 2846fcc80..46770e640 100644 --- a/third_party/org_tensorflow_compatibility_fixes.diff +++ b/third_party/org_tensorflow_compatibility_fixes.diff @@ -35,19 +35,6 @@ index ba50783765..5de5ea01f0 100644 #include #include #include -diff --git a/tensorflow/lite/delegates/gpu/cl/serialization.fbs b/tensorflow/lite/delegates/gpu/cl/serialization.fbs -index 67bd587162e..2a3c6bd30dc 100644 ---- a/tensorflow/lite/delegates/gpu/cl/serialization.fbs -+++ b/tensorflow/lite/delegates/gpu/cl/serialization.fbs -@@ -12,7 +12,7 @@ - // See the License for the specific language governing permissions and - // limitations under the License. - --include "tensorflow/lite/delegates/gpu/common/task/serialization_base.fbs"; -+include "../common/task/serialization_base.fbs"; - - namespace tflite.gpu.cl.data; - diff --git a/third_party/eigen3/eigen_archive.BUILD b/third_party/eigen3/eigen_archive.BUILD index dad592bec48..670017c2c0f 100644 --- a/third_party/eigen3/eigen_archive.BUILD
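The new test's `_draw` helper composites the segmentation mask onto the frame with `np.minimum`, which blacks out background pixels while leaving person pixels intact. A numpy-only sketch of that compositing step, with a synthetic frame and mask standing in for real model output:

```python
import numpy as np

# Synthetic stand-ins: a uniform gray BGR frame and a float mask in [0, 1],
# where 1.0 marks the person region and 0.0 the background.
frame = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.float32)
mask[1:3, 1:3] = 1.0  # "person" pixels

# Scale the mask to 0..255 as the test does, then clamp the frame against it
# channel-wise: background pixels (mask 0) become black, person pixels survive.
mask_u8 = (mask * 255).astype(np.uint8)
composited = np.minimum(frame, np.stack((mask_u8,) * 3, axis=-1))

print(composited[2, 2].tolist())  # [200, 200, 200] -- person pixel kept
print(composited[0, 0].tolist())  # [0, 0, 0] -- background zeroed
```

In the real solution, `results.segmentation_mask` is a float 2-D array the same size as the input image; a soft threshold or blend is often used instead of a hard clamp.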