Project import generated by Copybara.
GitOrigin-RevId: 5b4c149782c086ebf9ef390195fb260ad0103217
This commit is contained in:

parent 350fbb2100
commit a92cff7a60

docs/solutions/pose.md
@@ -2,6 +2,8 @@
 layout: default
 title: Pose
 parent: Solutions
+has_children: true
+has_toc: false
 nav_order: 5
 ---

@@ -21,10 +23,9 @@ nav_order: 5
 ## Overview

 Human pose estimation from video plays a critical role in various applications
-such as
-[quantifying physical exercises](#pose-classification-and-repetition-counting),
-sign language recognition, and full-body gesture control. For example, it can
-form the basis for yoga, dance, and fitness applications. It can also enable the
+such as [quantifying physical exercises](./pose_classification.md), sign
+language recognition, and full-body gesture control. For example, it can form
+the basis for yoga, dance, and fitness applications. It can also enable the
 overlay of digital content and information on top of the physical world in
 augmented reality.

@@ -387,121 +388,6 @@ on how to build MediaPipe examples.
     *   Target:
         [`mediapipe/examples/desktop/upper_body_pose_tracking:upper_body_pose_tracking_gpu`](https://github.com/google/mediapipe/tree/master/mediapipe/examples/desktop/upper_body_pose_tracking/BUILD)

-## Pose Classification and Repetition Counting
-
-One of the applications
-[BlazePose](https://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html)
-can enable is fitness. More specifically - pose classification and repetition
-counting. In this section we'll provide basic guidance on building a custom pose
-classifier with the help of a
-[Colab](https://drive.google.com/file/d/19txHpN8exWhstO6WVkfmYYVC6uug_oVR/view?usp=sharing)
-and wrap it in a simple
-[fitness app](https://mediapipe.page.link/mlkit-pose-classification-demo-app)
-powered by [ML Kit](https://developers.google.com/ml-kit). Push-ups and squats
-are used for demonstration purposes as the most common exercises.
-
- |
-:--------------------------------------------------------------------------------------------------------: |
-*Fig 4. Pose classification and repetition counting with MediaPipe Pose.*                                  |
-
-We picked the
-[k-nearest neighbors algorithm](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm)
-(k-NN) as the classifier. It's simple and easy to start with. The algorithm
-determines the object's class based on the closest samples in the training set.
-To build it, one needs to:
-
-*   Collect image samples of the target exercises and run pose prediction on
-    them,
-*   Convert obtained pose landmarks to a representation suitable for the k-NN
-    classifier and form a training set,
-*   Perform the classification itself followed by repetition counting.
-
-### Training Set
-
-To build a good classifier appropriate samples should be collected for the
-training set: about a few hundred samples for each terminal state of each
-exercise (e.g., "up" and "down" positions for push-ups). It's important that
-collected samples cover different camera angles, environment conditions, body
-shapes, and exercise variations.
-
- |
-:--------------------------------------------------------------------------------------------------------------------------: |
-*Fig 5. Two terminal states of push-ups.*                                                                                    |
-
-To transform samples into a k-NN classifier training set, either
-[basic](https://drive.google.com/file/d/1z4IM8kG6ipHN6keadjD-F6vMiIIgViKK/view?usp=sharing)
-or
-[extended](https://drive.google.com/file/d/19txHpN8exWhstO6WVkfmYYVC6uug_oVR/view?usp=sharing)
-Colab could be used. They both use the
-[Python Solution API](#python-solution-api) to run the BlazePose models on given
-images and dump predicted pose landmarks to a CSV file. Additionally, the
-extended Colab provides useful tools to find outliers (e.g., wrongly predicted
-poses) and underrepresented classes (e.g., not covering all camera angles) by
-classifying each sample against the entire training set. After that, you'll be
-able to test the classifier on an arbitrary video right in the Colab.
-
-### Classification
-
-Code of the classifier is available both in the
-[extended](https://drive.google.com/file/d/19txHpN8exWhstO6WVkfmYYVC6uug_oVR/view?usp=sharing)
-Colab and in the
-[ML Kit demo app](https://mediapipe.page.link/mlkit-pose-classification-demo-app).
-Please refer to them for details of the approach described below.
-
-The k-NN algorithm used for pose classification requires a feature vector
-representation of each sample and a metric to compute the distance between two
-such vectors to find the nearest pose samples to a target one.
-
-To convert pose landmarks to a feature vector, we use pairwise distances between
-predefined lists of pose joints, such as distances between wrist and shoulder,
-ankle and hip, and two wrists. Since the algorithm relies on distances, all
-poses are normalized to have the same torso size and vertical torso orientation
-before the conversion.
-
- |
-:--------------------------------------------------------------------------------------------------------: |
-*Fig 6. Main pairwise distances used for the pose feature vector.*                                         |
-
-To get a better classification result, k-NN search is invoked twice with
-different distance metrics:
-
-*   First, to filter out samples that are almost the same as the target one but
-    have only a few different values in the feature vector (which means
-    differently bent joints and thus other pose class), minimum per-coordinate
-    distance is used as distance metric,
-*   Then average per-coordinate distance is used to find the nearest pose
-    cluster among those from the first search.
-
-Finally, we apply
-[exponential moving average](https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average)
-(EMA) smoothing to level any noise from pose prediction or classification. To do
-that, we search not only for the nearest pose cluster, but we calculate a
-probability for each of them and use it for smoothing over time.
-
-### Repetition Counter
-
-To count the repetitions, the algorithm monitors the probability of a target
-pose class. Let's take push-ups with its "up" and "down" terminal states:
-
-*   When the probability of the "down" pose class passes a certain threshold for
-    the first time, the algorithm marks that the "down" pose class is entered.
-*   Once the probability drops below the threshold, the algorithm marks that the
-    "down" pose class has been exited and increases the counter.
-
-To avoid cases when the probability fluctuates around the threshold (e.g., when
-the user pauses between "up" and "down" states) causing phantom counts, the
-threshold used to detect when the state is exited is actually slightly lower
-than the one used to detect when the state is entered. It creates an interval
-where the pose class and the counter can't be changed.
-
-### Future Work
-
-We are actively working on improving BlazePose GHUM 3D's Z prediction. It will
-allow us to use joint angles in the feature vectors, which are more natural and
-easier to configure (although distances can still be useful to detect touches
-between body parts) and to perform rotation normalization of poses and reduce
-the number of camera angles required for accurate k-NN classification.
-
 ## Resources

 *   Google AI Blog:
@@ -512,5 +398,3 @@ the number of camera angles required for accurate k-NN classification.
 *   [Models and model cards](./models.md#pose)
 *   [Web demo](https://code.mediapipe.dev/codepen/pose)
 *   [Python Colab](https://mediapipe.page.link/pose_py_colab)
-*   [Pose Classification Colab (Basic)](https://mediapipe.page.link/pose_classification_basic)
-*   [Pose Classification Colab (Extended)](https://mediapipe.page.link/pose_classification_extended)

docs/solutions/pose_classification.md (new file, 142 lines)
@@ -0,0 +1,142 @@
---
layout: default
title: Pose Classification
parent: Pose
grand_parent: Solutions
nav_order: 1
---

# Pose Classification
{: .no_toc }

<details close markdown="block">
  <summary>
    Table of contents
  </summary>
  {: .text-delta }
1. TOC
{:toc}
</details>
---

## Overview

One of the applications
[BlazePose](https://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html)
can enable is fitness, more specifically pose classification and repetition
counting. On this page we'll provide basic guidance on building a custom pose
classifier with the help of [Colabs](#colabs) and wrap it in a simple
[fitness app](https://mediapipe.page.link/mlkit-pose-classification-demo-app)
powered by [ML Kit](https://developers.google.com/ml-kit). Push-ups and squats
are used for demonstration purposes as the most common exercises.

 |
:--------------------------------------------------------------------------------------------------------: |
*Fig 1. Pose classification and repetition counting with MediaPipe Pose.*                                  |

We picked the
[k-nearest neighbors algorithm](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm)
(k-NN) as the classifier. It's simple and easy to start with. The algorithm
determines the object's class based on the closest samples in the training set
(a minimal sketch follows the list below).

**To build it, one needs to:**

1.  Collect image samples of the target exercises and run pose prediction on
    them,
2.  Convert the obtained pose landmarks to a representation suitable for the
    k-NN classifier and form a training set using these [Colabs](#colabs),
3.  Perform the classification itself, followed by repetition counting (e.g., in
    the
    [ML Kit demo app](https://mediapipe.page.link/mlkit-pose-classification-demo-app)).

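For intuition, here is a minimal sketch of a k-NN pose classifier. The names
(`knn_classify`, `embeddings`, `labels`, `k`) are illustrative rather than the
Colabs' actual API, and the real implementation uses the two-pass search
described under [Classification](#classification) below:

```python
import numpy as np

def knn_classify(target, embeddings, labels, k=10):
    """Minimal k-NN sketch: majority vote among the k training samples
    closest to the target pose embedding. `embeddings` is an (N, D) array
    of feature vectors; `labels` holds N class names like 'pushups_up'."""
    # Distance from the target to every training sample.
    dists = np.linalg.norm(embeddings - target, axis=1)
    # Labels of the k nearest samples decide the class by majority vote.
    nearest = [labels[i] for i in np.argsort(dists)[:k]]
    return max(set(nearest), key=nearest.count)
```
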
## Training Set

To build a good classifier, appropriate samples should be collected for the
training set: a few hundred samples for each terminal state of each exercise
(e.g., "up" and "down" positions for push-ups). It's important that the
collected samples cover different camera angles, environment conditions, body
shapes, and exercise variations.

 |
:--------------------------------------------------------------------------------------------------------------------------: |
*Fig 2. Two terminal states of push-ups.*                                                                                    |

To transform samples into a k-NN classifier training set, either
[`Pose Classification Colab (Basic)`] or
[`Pose Classification Colab (Extended)`] can be used. Both use the
[Python Solution API](./pose.md#python-solution-api) to run the BlazePose models
on given images and dump predicted pose landmarks to a CSV file. Additionally,
the [`Pose Classification Colab (Extended)`] provides useful tools to find
outliers (e.g., wrongly predicted poses) and underrepresented classes (e.g., not
covering all camera angles) by classifying each sample against the entire
training set. After that, you'll be able to test the classifier on an arbitrary
video right in the Colab.

					## Classification
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Code of the classifier is available both in the
 | 
				
			||||||
 | 
					[`Pose Classification Colab (Extended)`] and in the
 | 
				
			||||||
 | 
					[ML Kit demo app](https://mediapipe.page.link/mlkit-pose-classification-demo-app).
 | 
				
			||||||
 | 
					Please refer to them for details of the approach described below.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The k-NN algorithm used for pose classification requires a feature vector
 | 
				
			||||||
 | 
					representation of each sample and a metric to compute the distance between two
 | 
				
			||||||
 | 
					such vectors to find the nearest pose samples to a target one.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					To convert pose landmarks to a feature vector, we use pairwise distances between
 | 
				
			||||||
 | 
					predefined lists of pose joints, such as distances between wrist and shoulder,
 | 
				
			||||||
 | 
					ankle and hip, and two wrists. Since the algorithm relies on distances, all
 | 
				
			||||||
 | 
					poses are normalized to have the same torso size and vertical torso orientation
 | 
				
			||||||
 | 
					before the conversion.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					 |
 | 
				
			||||||
 | 
					:--------------------------------------------------------------------------------------------------------: |
 | 
				
			||||||
 | 
					*Fig 3. Main pairwise distances used for the pose feature vector.*                                         |
 | 
				
			||||||
 | 
					
 | 
				
			||||||
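To make the conversion concrete, here is a rough sketch of such an embedding.
The landmark indices follow the 33-landmark BlazePose topology, but the
particular joint pairs are illustrative, and vertical-orientation normalization
is omitted for brevity:

```python
import numpy as np

# BlazePose landmark indices (subset of the 33-landmark topology).
LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12
LEFT_WRIST, RIGHT_WRIST = 15, 16
LEFT_HIP, RIGHT_HIP = 23, 24
LEFT_ANKLE, RIGHT_ANKLE = 27, 28

def pose_embedding(landmarks):
    """Converts a (33, 3) landmark array into a distance-based feature vector.

    Sketch only: normalizes for torso size (rotation alignment omitted),
    then takes distances between a predefined list of joint pairs.
    """
    landmarks = np.asarray(landmarks, dtype=np.float32)
    hips = (landmarks[LEFT_HIP] + landmarks[RIGHT_HIP]) / 2
    shoulders = (landmarks[LEFT_SHOULDER] + landmarks[RIGHT_SHOULDER]) / 2
    # Translate to the hips center and scale by torso size.
    landmarks = (landmarks - hips) / np.linalg.norm(shoulders - hips)
    # Pairwise distances: wrist-shoulder, ankle-hip, wrist-wrist, ankle-ankle.
    pairs = [(LEFT_WRIST, LEFT_SHOULDER), (RIGHT_WRIST, RIGHT_SHOULDER),
             (LEFT_ANKLE, LEFT_HIP), (RIGHT_ANKLE, RIGHT_HIP),
             (LEFT_WRIST, RIGHT_WRIST), (LEFT_ANKLE, RIGHT_ANKLE)]
    return np.array([np.linalg.norm(landmarks[a] - landmarks[b])
                     for a, b in pairs])
```
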
To get a better classification result, k-NN search is invoked twice with
different distance metrics (a sketch follows the list):

*   First, to filter out samples that are almost the same as the target one but
    have a few sharply different values in the feature vector (which means
    differently bent joints and thus another pose class), maximum per-coordinate
    distance is used as the distance metric,
*   Then average per-coordinate distance is used to find the nearest pose
    cluster among those from the first search.

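A sketch of that two-pass search, reusing the hypothetical `pose_embedding`
vectors from above; the parameter names and values are made up:

```python
import numpy as np

def classify_pose(target, embeddings, labels,
                  max_dist_top_n=30, mean_dist_top_n=10):
    """Two-pass k-NN sketch over (N, D) pose embeddings.

    Pass 1 ranks samples by worst-case (max) per-coordinate distance,
    discarding poses that differ sharply in even a few joints. Pass 2
    re-ranks the survivors by average per-coordinate distance and returns
    per-class vote shares, usable as probabilities for smoothing.
    """
    diffs = np.abs(embeddings - target)
    # Pass 1: keep samples whose largest coordinate difference is smallest.
    keep = np.argsort(diffs.max(axis=1))[:max_dist_top_n]
    # Pass 2: among those, take the nearest by mean per-coordinate distance.
    nearest = keep[np.argsort(diffs[keep].mean(axis=1))[:mean_dist_top_n]]
    votes = {}
    for i in nearest:
        votes[labels[i]] = votes.get(labels[i], 0) + 1
    return {label: n / len(nearest) for label, n in votes.items()}
```
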
Finally, we apply
[exponential moving average](https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average)
(EMA) smoothing to level any noise from pose prediction or classification. To do
that, we not only search for the nearest pose cluster, but also calculate a
probability for each of them and use it for smoothing over time.

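A minimal sketch of that smoothing step; the smoothing factor `alpha` is an
assumed value, not a documented constant:

```python
class EMASmoother:
    """Exponential moving average over per-class probabilities."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha      # weight of the newest observation
        self.smoothed = {}      # class name -> smoothed probability

    def update(self, probs):
        # Blend the new per-class probabilities into the running average;
        # classes absent from this frame decay toward zero.
        for label in set(self.smoothed) | set(probs):
            prev = self.smoothed.get(label, 0.0)
            new = probs.get(label, 0.0)
            self.smoothed[label] = self.alpha * new + (1 - self.alpha) * prev
        return dict(self.smoothed)
```
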
## Repetition Counting

To count the repetitions, the algorithm monitors the probability of a target
pose class. Let's take push-ups, with their "up" and "down" terminal states:

*   When the probability of the "down" pose class passes a certain threshold for
    the first time, the algorithm marks that the "down" pose class is entered.
*   Once the probability drops below the threshold, the algorithm marks that the
    "down" pose class has been exited and increases the counter.

To avoid cases when the probability fluctuates around the threshold (e.g., when
the user pauses between "up" and "down" states), causing phantom counts, the
threshold used to detect when the state is exited is actually slightly lower
than the one used to detect when the state is entered. This creates an interval
where the pose class and the counter can't be changed.

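A sketch of that counter; the 0.6/0.4 thresholds are illustrative, and
`smoothed_probs` is the per-class probability dict produced by the smoothing
step above:

```python
class RepetitionCounter:
    """Counts repetitions of one pose class via hysteresis thresholding."""

    def __init__(self, class_name='pushups_down',
                 enter_threshold=0.6, exit_threshold=0.4):
        self.class_name = class_name
        self.enter_threshold = enter_threshold  # probability to enter the state
        self.exit_threshold = exit_threshold    # lower bound to exit the state
        self.in_pose = False
        self.count = 0

    def update(self, smoothed_probs):
        prob = smoothed_probs.get(self.class_name, 0.0)
        if not self.in_pose and prob > self.enter_threshold:
            self.in_pose = True      # "down" state entered
        elif self.in_pose and prob < self.exit_threshold:
            self.in_pose = False     # state exited: count one full repetition
            self.count += 1
        return self.count
```

Calling `update()` with the smoothed probabilities on every frame yields the
running repetition count; the gap between the two thresholds is exactly the
"can't change" interval described above.
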
## Future Work

We are actively working on improving BlazePose GHUM 3D's Z prediction. It will
allow us to use joint angles in the feature vectors, which are more natural and
easier to configure (although distances can still be useful to detect touches
between body parts), and to perform rotation normalization of poses, reducing
the number of camera angles required for accurate k-NN classification.

## Colabs

*   [`Pose Classification Colab (Basic)`]
*   [`Pose Classification Colab (Extended)`]

[`Pose Classification Colab (Basic)`]: https://mediapipe.page.link/pose_classification_basic
[`Pose Classification Colab (Extended)`]: https://mediapipe.page.link/pose_classification_extended