Classification for hand pose gestures
This is a brief post covering data capture and classification with KMeans. It serves as a supplement to the main article.
Preparation for data collection
First of all, to perform any classification we need data. In our case, we will reuse the skeleton from a previous post, which already implements hand posture detection. You don’t need to follow the full post, just the first part where the detection is implemented. Here is the source code.
Then we need to make several small changes to the code. First, let's add a function for data collection:
const captureData = async () => {
  const hands = await detector.estimateHands(videoRef.current!);
  if (hands.length > 0) {
    // Turn the detected keypoints into the distance features used for classification
    // (createKeyMap and getHandPoseEstimationsDistances come from the previous post).
    const data = createKeyMap(hands[0].keypoints);
    const distances = getHandPoseEstimationsDistances(data);

    // Append a new labelled sample ('palm' in this example) to the rows stored in localStorage.
    const stored = localStorage.getItem('handPoseData');
    const rows = stored ? stored.split('\n') : [];
    rows.push(['palm', ...distances].join(','));

    localStorage.setItem('handPoseData', rows.join('\n'));
  }
}
Next, let's add a trigger to capture the data by adding a button element in the return statement:
<button onClick={captureData}>Capture</button>
We’re all set up to start collecting the data! Here you can find the full code of the example.
Data collection
Once we're all set up, we can start collecting the data. The idea is quite simple: we capture N samples of one hand posture, then N samples of another, and repeat this process for as many postures as we want.
To capture a posture, you first need to specify its label inside the captureData function; in our example we’re using palm. Then show this posture in front of the camera and click the Capture button. Repeat this process for N iterations with different angles and distances of your hand. The more variation you have, the more robust the classification will be. I would recommend having at least 100 samples of each posture.
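Since switching postures means editing that hardcoded label every time, one small variation is to pass the label into captureData instead. This is only a sketch of that idea; the 'fist' label and the second button are illustrative and not part of the original example:
// Sketch: accept the posture label as an argument instead of hardcoding it.
const captureData = async (label: string) => {
  const hands = await detector.estimateHands(videoRef.current!);
  if (hands.length === 0) return;

  const data = createKeyMap(hands[0].keypoints);
  const distances = getHandPoseEstimationsDistances(data);

  const stored = localStorage.getItem('handPoseData');
  const rows = stored ? stored.split('\n') : [];
  rows.push([label, ...distances].join(','));
  localStorage.setItem('handPoseData', rows.join('\n'));
};

// In the return statement: one Capture button per posture.
<button onClick={() => captureData('palm')}>Capture palm</button>
<button onClick={() => captureData('fist')}>Capture fist</button>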
Once we have all the samples we need, the next step is to create a CSV file. For that, we need to open developer tools in the browser, print the stored data, and copy it into the clipboard:
To do that, type localStorage.getItem('handPoseData') in the console, then right-click the printed string and copy it using the 'Copy string contents' option.
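Alternatively, most browser dev tools (Chrome and Firefox, for example) provide a copy() console utility, so the whole string can be placed on the clipboard in a single step:
// In the browser console: copy all captured samples to the clipboard at once.
copy(localStorage.getItem('handPoseData'));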
Once you have the data in the clipboard, you can open a spreadsheet and paste it:
Select 'Split text to columns' so the values land in separate columns. Last but not least, download the file as a CSV (File → Download → Comma Separated Values). And that's all: our dataset is ready to be used.
Classification
For the classification, we will use Google Colab. The very first step is to create a Colab project. Once it is created, we need to import several dependencies:
from google.colab import drive
import pandas as pd
from sklearn.cluster import KMeans
To access our data, we will use Google Drive. The idea is to mount the drive and load the dataset we created from it:
# Define a path to the data
data_path = '/content/drive/MyDrive/{path_to_your_data}/data.csv'
# Connect to the Google Drive
drive.mount('/content/drive', force_remount=True)
# Load the data (header=None because the captured CSV has no header row)
csv_data = pd.read_csv(data_path, header=None)
It is recommended to visualize your data before going any further. This is an important step for getting familiar with the dataset, making sure it looks the way we expect, and spotting the patterns we aim to extract. To keep this post brief, we will skip this step here; if you want to explore further, a good starting point is matplotlib.pyplot.
Let's move to the most exciting part: training the model!
# For classification we can only use numeric values,
# so we drop the first column of the dataset,
# which contains the name of the pose.
csv_data.drop(columns=csv_data.columns[0], inplace=True)

# Number of clusters.
# It corresponds to the number of different hand gestures:
# if we want to classify two hand poses,
# we set the number of clusters to 2.
n_clusters = 2

# Create the KMeans model
kmeans = KMeans(n_clusters=n_clusters)

# Feed the data into KMeans and get a cluster label for each sample
clusters = kmeans.fit_predict(csv_data)
The last step is to print the cluster centroids, so we can copy them and use them for classification on the client:
# Get the cluster centroids
centroids = kmeans.cluster_centers_
# Convert centroids to a suitable format (like a list of lists)
centroids_list = centroids.tolist()
centroids_list
Here you can find the full Colab project with each step described!
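To give an idea of what "using the centroids for classification" could look like back in the client, here is a minimal TypeScript sketch. It is an assumption on my part rather than the approach from the main article: the CENTROIDS values are placeholders for the copied Colab output, the label order is something you have to verify against your own clusters, and classifyPose simply picks the nearest centroid by Euclidean distance.
// Illustrative only: paste the centroids printed by Colab here.
// The label order is an assumption — check which cluster matched which posture in your data.
const CENTROIDS: number[][] = [
  [/* centroid of cluster 0 */],
  [/* centroid of cluster 1 */],
];
const LABELS = ['palm', 'fist'];

const euclideanDistance = (a: number[], b: number[]) =>
  Math.sqrt(a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0));

// Assign a freshly computed distances vector to the nearest centroid.
const classifyPose = (distances: number[]): string => {
  let bestIndex = 0;
  let bestDistance = Infinity;
  CENTROIDS.forEach((centroid, i) => {
    const d = euclideanDistance(distances, centroid);
    if (d < bestDistance) {
      bestDistance = d;
      bestIndex = i;
    }
  });
  return LABELS[bestIndex];
};
In the detection loop, the same getHandPoseEstimationsDistances output that was captured for training would be fed into classifyPose to get the name of the current posture.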
Summary
In this short post, we explored how to collect data in a client application and use that data for classification. It covers the fundamentals needed to perform classification with the K-means method.
If you're interested in learning how this classification can be applied in a client application, or if you want to become more familiar with TensorFlow.js, I invite you to check out my post, Controlling web app through the camera and hand gestures.
Thank you for reading. Any feedback or suggestions are welcome.