## Hand Tracking on Desktop

This is an example of using MediaPipe to run hand tracking models (TensorFlow
Lite) and render bounding boxes on the detected hand (one hand only). To know
more about the hand tracking models, please refer to the model [`README file`].
Moreover, if you are interested in running the same TensorFlow Lite model on
Android/iOS, please see
[Hand Tracking on GPU on Android/iOS](hand_tracking_mobile_gpu.md).

We show the hand tracking demos with a TensorFlow Lite model using the webcam:

- [TensorFlow Lite Hand Tracking Demo with Webcam (CPU)](#tensorflow-lite-hand-tracking-demo-with-webcam-cpu)

- [TensorFlow Lite Hand Tracking Demo with Webcam (GPU)](#tensorflow-lite-hand-tracking-demo-with-webcam-gpu)

Note: Desktop GPU works only on Linux. Mesa drivers need to be installed. Please
see [step 4 of "Installing on Debian and Ubuntu" in the installation guide](./install.md).

Note: If MediaPipe depends on OpenCV 2, please see the [known issues with OpenCV 2](#known-issues-with-opencv-2) section.

### TensorFlow Lite Hand Tracking Demo with Webcam (CPU)

To build and run the TensorFlow Lite example on desktop (CPU) with Webcam, run:

```bash
# Video from webcam running on desktop CPU
$ bazel build -c opt --define MEDIAPIPE_DISABLE_GPU=1 \
    mediapipe/examples/desktop/hand_tracking:hand_tracking_cpu

# It should print:
# Target //mediapipe/examples/desktop/hand_tracking:hand_tracking_cpu up-to-date:
#   bazel-bin/mediapipe/examples/desktop/hand_tracking/hand_tracking_cpu
# INFO: Elapsed time: 22.645s, Forge stats: 13356/13463 actions cached, 1.5m CPU used, 0.0s queue time, 819.8 MB ObjFS output (novel bytes: 85.6 MB), 0.0 MB local output, Critical Path: 14.43s, Remote (87.25% of the time): [queue: 0.00%, network: 14.88%, setup: 4.80%, process: 39.80%, fetch: 18.15%]
# INFO: Streaming build results to: http://sponge2/360196b9-33ab-44b1-84a7-1022b5043307
# INFO: Build completed successfully, 12517 total actions

$ export GLOG_logtostderr=1
# This will open up your webcam as long as it is connected and on
# Any errors are likely due to your webcam not being accessible
$ bazel-bin/mediapipe/examples/desktop/hand_tracking/hand_tracking_cpu \
    --calculator_graph_config_file=mediapipe/graphs/hand_tracking/hand_tracking_desktop_live.pbtxt
```
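The CPU demo above reads frames from the webcam. If you would rather process a
recorded clip, the desktop demo driver that these examples are built on can also
take a video file; a minimal sketch, assuming your checkout's driver defines the
`--input_video_path` and `--output_video_path` flags (check
`mediapipe/examples/desktop/demo_run_graph_main.cc` if they differ):

```bash
# Hypothetical input/output paths; --input_video_path and --output_video_path
# are assumed to be defined by the desktop demo driver in your checkout.
$ GLOG_logtostderr=1 bazel-bin/mediapipe/examples/desktop/hand_tracking/hand_tracking_cpu \
    --calculator_graph_config_file=mediapipe/graphs/hand_tracking/hand_tracking_desktop_live.pbtxt \
    --input_video_path=/path/to/input.mp4 \
    --output_video_path=/path/to/output.mp4
```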
### TensorFlow Lite Hand Tracking Demo with Webcam (GPU)

To build and run the TensorFlow Lite example on desktop (GPU) with Webcam, run:

```bash
# Video from webcam running on desktop GPU
# This works only for Linux currently
$ bazel build -c opt --copt -DMESA_EGL_NO_X11_HEADERS \
    mediapipe/examples/desktop/hand_tracking:hand_tracking_gpu

# It should print:
# Target //mediapipe/examples/desktop/hand_tracking:hand_tracking_gpu up-to-date:
#   bazel-bin/mediapipe/examples/desktop/hand_tracking/hand_tracking_gpu
# INFO: Elapsed time: 84.055s, Forge stats: 6858/19343 actions cached, 1.6h CPU used, 0.9s queue time, 1.68 GB ObjFS output (novel bytes: 485.1 MB), 0.0 MB local output, Critical Path: 48.14s, Remote (99.40% of the time): [queue: 0.00%, setup: 5.59%, process: 74.44%]
# INFO: Streaming build results to: http://sponge2/00c7f95f-6fbc-432d-8978-f5d361efca3b
# INFO: Build completed successfully, 22455 total actions

$ export GLOG_logtostderr=1
# This will open up your webcam as long as it is connected and on
# Any errors are likely due to your webcam not being accessible,
# or GPU drivers not being set up properly.
$ bazel-bin/mediapipe/examples/desktop/hand_tracking/hand_tracking_gpu \
    --calculator_graph_config_file=mediapipe/graphs/hand_tracking/hand_tracking_mobile.pbtxt
```
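If the GPU demo builds but aborts at startup, the cause is usually missing Mesa
EGL/GLES development libraries rather than the webcam. As a rough sketch, the
Debian/Ubuntu packages below are commonly needed; treat the names as assumptions
and defer to the installation guide linked in the note above:

```bash
# Assumed package names for a Mesa-based EGL/GLES setup on Debian/Ubuntu;
# the installation guide is the authoritative list.
$ sudo apt-get install mesa-common-dev libegl1-mesa-dev libgles2-mesa-dev
```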
#### Graph

![graph visualization](images/hand_tracking_desktop.png)

To visualize the graph as shown above, copy the text specification of the graph
below and paste it into
[MediaPipe Visualizer](https://viz.mediapipe.dev).
```bash
# MediaPipe graph that performs hand tracking on desktop with TensorFlow Lite
# on CPU & GPU.
# Used in the example in
# mediapipe/examples/desktop/hand_tracking:hand_tracking_cpu.

# Images coming into and out of the graph.
input_stream: "input_video"
output_stream: "output_video"

# Caches a hand-presence decision fed back from HandLandmarkSubgraph, and upon
# the arrival of the next input image sends out the cached decision with the
# timestamp replaced by that of the input image, essentially generating a packet
# that carries the previous hand-presence decision. Note that upon the arrival
# of the very first input image, an empty packet is sent out to jump start the
# feedback loop.
node {
  calculator: "PreviousLoopbackCalculator"
  input_stream: "MAIN:input_video"
  input_stream: "LOOP:hand_presence"
  input_stream_info: {
    tag_index: "LOOP"
    back_edge: true
  }
  output_stream: "PREV_LOOP:prev_hand_presence"
}

# Drops the incoming image if HandLandmarkSubgraph was able to identify hand
# presence in the previous image. Otherwise, passes the incoming image through
# to trigger a new round of hand detection in HandDetectionSubgraph.
node {
  calculator: "GateCalculator"
  input_stream: "input_video"
  input_stream: "DISALLOW:prev_hand_presence"
  output_stream: "hand_detection_input_video"
  node_options: {
    [type.googleapis.com/mediapipe.GateCalculatorOptions] {
      empty_packets_as_allow: true
    }
  }
}

# Subgraph that detects hands (see hand_detection_cpu.pbtxt).
node {
  calculator: "HandDetectionSubgraph"
  input_stream: "hand_detection_input_video"
  output_stream: "DETECTIONS:palm_detections"
  output_stream: "NORM_RECT:hand_rect_from_palm_detections"
}

# Subgraph that localizes hand landmarks (see hand_landmark_cpu.pbtxt).
node {
  calculator: "HandLandmarkSubgraph"
  input_stream: "IMAGE:input_video"
  input_stream: "NORM_RECT:hand_rect"
  output_stream: "LANDMARKS:hand_landmarks"
  output_stream: "NORM_RECT:hand_rect_from_landmarks"
  output_stream: "PRESENCE:hand_presence"
}

# Caches a hand rectangle fed back from HandLandmarkSubgraph, and upon the
# arrival of the next input image sends out the cached rectangle with the
# timestamp replaced by that of the input image, essentially generating a packet
# that carries the previous hand rectangle. Note that upon the arrival of the
# very first input image, an empty packet is sent out to jump start the
# feedback loop.
node {
  calculator: "PreviousLoopbackCalculator"
  input_stream: "MAIN:input_video"
  input_stream: "LOOP:hand_rect_from_landmarks"
  input_stream_info: {
    tag_index: "LOOP"
    back_edge: true
  }
  output_stream: "PREV_LOOP:prev_hand_rect_from_landmarks"
}

# Merges a stream of hand rectangles generated by HandDetectionSubgraph and that
# generated by HandLandmarkSubgraph into a single output stream by selecting
# between one of the two streams. The former is selected if the incoming packet
# is not empty, i.e., hand detection is performed on the current image by
# HandDetectionSubgraph (because HandLandmarkSubgraph could not identify hand
# presence in the previous image). Otherwise, the latter is selected, which is
# never empty because HandLandmarkSubgraph processes all images (that went
# through FlowLimiterCalculator).
node {
  calculator: "MergeCalculator"
  input_stream: "hand_rect_from_palm_detections"
  input_stream: "prev_hand_rect_from_landmarks"
  output_stream: "hand_rect"
}

# Subgraph that renders annotations and overlays them on top of the input
# images (see renderer_cpu.pbtxt).
node {
  calculator: "RendererSubgraph"
  input_stream: "IMAGE:input_video"
  input_stream: "LANDMARKS:hand_landmarks"
  input_stream: "NORM_RECT:hand_rect"
  input_stream: "DETECTIONS:palm_detections"
  output_stream: "IMAGE:output_video"
}
```
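The graph above is the text specification that the CPU demo loads at startup
through the `--calculator_graph_config_file` flag, so small edits (for example,
calculator options) can be tried without rebuilding, as long as the edited graph
only uses calculators and subgraphs already compiled into the binary. A minimal
sketch; the copied file path is illustrative:

```bash
# Illustrative file name; edit the copy (e.g. the GateCalculator options) and
# rerun the already-built binary against it.
$ cp mediapipe/graphs/hand_tracking/hand_tracking_desktop_live.pbtxt /tmp/hand_tracking_edited.pbtxt
$ GLOG_logtostderr=1 bazel-bin/mediapipe/examples/desktop/hand_tracking/hand_tracking_cpu \
    --calculator_graph_config_file=/tmp/hand_tracking_edited.pbtxt
```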
[`README file`]:https://github.com/google/mediapipe/tree/master/mediapipe/README.md