Real-time face mesh point cloud with Three.js, TensorFlow.js and TypeScript

TECHTEE
10 min read · Jun 13, 2022


Introduction:

This article focuses on the steps needed to implement a real-time face mesh point cloud with Three.js and TensorFlow.js. It assumes prior knowledge of asynchronous JavaScript and Three.js basics, which will not be covered here.

The source code for the project can be found in this Git repo. It will be helpful to have a look at that code while reading this article as some of the basic implementation steps will be skipped.

This article also implements the project in an object-oriented way with lots of abstractions, so a basic understanding of classes in TypeScript is an advantage.

Implementation Steps:

  1. Get Three.js set up
  2. Generate video data from webcam
  3. Create a face mesh detector
  4. Create empty point cloud
  5. Feed the tracking information to the point cloud

1. Get Three.js set up:

Since our goal in this tutorial is to render a face point cloud, we will start by setting up our Three.js scene.

The data and methods needed for setting up the scene are encapsulated in a factory class called ThreeSetUp in sceneSetUp.ts. This class is responsible for creating all the necessary scene objects, such as the renderer, the camera and the scene. It also registers the resize handler for the canvas element. This class has the following public methods:

a. getSetUp: this function returns an object containing the camera, the scene, the renderer and the sizes information for the canvas.

getSetUp(){
  return {
    camera: this.camera,
    scene: this.scene,
    renderer: this.renderer,
    sizes: this.sizes,
  }
}

b. applyOrbitControls: this method takes care of adding orbit controls to our setup and returns the function we need to call to update them.

applyOrbitControls(){
  const controls = new OrbitControls(
    this.camera, this.renderer.domElement!
  )
  controls.enableDamping = true
  return () => controls.update();
}
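The constructor and resize handling of ThreeSetUp are not reproduced in this article; a minimal sketch of what they can look like (the camera settings and how the canvas is attached are assumptions, so the repo may differ):

import * as THREE from 'three'

// Sketch of ThreeSetUp's construction side. Field values (fov, near/far planes,
// appending the canvas to the body) are assumptions; the class also carries the
// getSetUp() and applyOrbitControls() methods shown above.
export default class ThreeSetUp {
  sizes = { width: window.innerWidth, height: window.innerHeight }
  scene = new THREE.Scene()
  camera = new THREE.PerspectiveCamera(75, this.sizes.width / this.sizes.height, 0.1, 100)
  renderer = new THREE.WebGLRenderer({ antialias: true })

  constructor() {
    this.renderer.setSize(this.sizes.width, this.sizes.height)
    document.body.appendChild(this.renderer.domElement)

    // resize handler: keep the camera and renderer in sync with the window size
    window.addEventListener('resize', () => {
      this.sizes.width = window.innerWidth
      this.sizes.height = window.innerHeight
      this.camera.aspect = this.sizes.width / this.sizes.height
      this.camera.updateProjectionMatrix()
      this.renderer.setSize(this.sizes.width, this.sizes.height)
    })
  }
}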

Our main implementation class, FacePointCloud, will instantiate the ThreeSetUp class and call these two methods to get the setup elements and apply orbit controls.

2. Generate video data from webcam:

To get face mesh tracking information, we need a pixel input to supply to the face mesh tracker. In this case, we will use the device's webcam to generate that input. We will use an HTML video element (without adding it to the DOM) to read the media stream from the webcam and load it in a form our code can interface with. We will then set up an HTML canvas element (also without adding it to the DOM) and render the video output to it. This also gives us the option to generate a Three.js texture from the canvas and use it as a material (we will not be implementing this in this tutorial). The canvas element is what we will use as the input to the face mesh tracker.

To handle reading the media stream from the webcam and loading it into the HTML video element, we will create a class called WebcamVideo. This class handles creating the HTML video element and calling the navigator API to request the user's permission and load the stream from the device's webcam.
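A rough skeleton of this class, based on the behaviour described in this section (the constraint values and the default callback are assumptions), could look like this:

// Skeleton sketch of WebcamVideo; init() and onLoadMetadata() are walked through below.
export default class WebcamVideo {
  videoTarget = document.createElement('video')
  private videoConstraints: MediaStreamConstraints = { video: true, audio: false }

  // onReceivingData is the optional callback mentioned below, fired once metadata loads
  constructor(private onReceivingData: () => void = () => {}) {
    this.init()
  }

  private init() { /* requests the webcam stream, covered below */ }
  private onLoadMetadata() { /* plays the video and fires onReceivingData, covered below */ }
}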

On initiating this class, the private init method will be called which has the following code:

private init(){
  navigator.mediaDevices.getUserMedia(this.videoConstraints)
    .then((mediaStream) => {
      this.videoTarget.srcObject = mediaStream
      this.videoTarget.onloadedmetadata = () => this.onLoadMetadata()
    })
    .catch(function (err) {
      alert(err.name + ': ' + err.message)
    })
}

This method calls the getUserMedia method on the mediaDevices property of the navigator object. getUserMedia takes video constraints (aka video settings) as a parameter and returns a promise. This promise resolves to a MediaStream object which contains the video data from the webcam. In the resolve callback of the promise, we set the source of our video element to the returned media stream.
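As an illustration, a constraints object could look something like this (these exact values are an assumption and not necessarily what the repo uses):

// Hypothetical video constraints: request the front camera at a preferred resolution.
private videoConstraints: MediaStreamConstraints = {
  audio: false,
  video: {
    facingMode: 'user',
    width: { ideal: 640 },
    height: { ideal: 480 },
  },
}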

In the promise resolve callback, we also add a loadedmetadata event listener on the video element. The callback of this listener triggers the onLoadMetadata method of the object, which has the following side effects:

a. Autoplaying the video

b. Ensuring that the video plays inline

c. Calling an optional callback passed into the object, to run when the event fires

private onLoadMetadata(){
  this.videoTarget.setAttribute('autoplay', 'true')
  this.videoTarget.setAttribute('playsinline', 'true')
  this.videoTarget.play()
  this.onReceivingData()
}

At this point we have a WebcamVideo object that creates a video element containing our live webcam data. The next step is to paint the video output onto a canvas.

For this, we will create a dedicated WebcamCanvas class which consumes the WebcamVideo class. It creates an instance of WebcamVideo and uses it to paint the video's output to a canvas using the drawImage() canvas context method. This is implemented in an updateFromWebCam method.

updateFromWebCam(){
  this.canvasCtx.drawImage(
    this.webcamVideo.videoTarget,
    0,
    0,
    this.canvas.width,
    this.canvas.height
  )
}

We will have to continuously call this function in a render loop to keep updating the canvas with the current frame of the video.
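For completeness, here is a minimal sketch of how the rest of the WebcamCanvas class can be wired up (the canvas size is an assumption; the receivingStreem flag is the one we check later in the render loop):

import WebcamVideo from './webcamVideo' // import path is an assumption

// Sketch of WebcamCanvas; dimensions and the callback wiring are assumptions.
export default class WebcamCanvas {
  canvas = document.createElement('canvas')
  canvasCtx: CanvasRenderingContext2D
  webcamVideo: WebcamVideo
  receivingStreem = false

  constructor() {
    // flip receivingStreem once the webcam video has loaded its metadata
    this.webcamVideo = new WebcamVideo(() => { this.receivingStreem = true })
    this.canvas.width = 500
    this.canvas.height = 500
    this.canvasCtx = this.canvas.getContext('2d')!
  }

  // updateFromWebCam() is the method shown above
}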

At this point, we have our pixel input ready as a canvas element which displays our webcam feed.

3. Create a face mesh detector with TensorFlow.js:

Creating the face mesh detector and generating detection data is the main part of this tutorial. We will use the TensorFlow.js face landmark detection model. First, install the relevant packages:

npm add @tensorflow/tfjs-core @tensorflow/tfjs-converter
npm add @tensorflow/tfjs-backend-webgl
npm add @tensorflow-models/face-detection
npm add @tensorflow-models/face-landmarks-detection

After installing all the relevant packages, we will create a class which handles the following:

a. Loading the model,

b. Getting the detector object,

c. Adding the detector to the class,

d. Implementing a public detect function to be used by other objects.

We have created a file called faceLandmark.ts which implements the class. The imports at the top of the file are:

import '@mediapipe/face_mesh'
import '@tensorflow/tfjs-core'
import '@tensorflow/tfjs-backend-webgl'
import * as faceLandmarksDetection from '@tensorflow-models/face-landmarks-detection'

These modules are needed to create and run the detector object.

We create the FaceMeshDetector class in this file.
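The full class lives in faceLandmark.ts in the repo; a minimal sketch of its structure, using the imports above and based on the methods described in this section (anything beyond the method names and the config shown below is an assumption), looks like this:

// Sketch of the detector class; constructor fields and method names follow the article,
// typing details are assumptions.
export default class FaceMeshDetector {
  private model = faceLandmarksDetection.SupportedModels.MediaPipeFaceMesh
  private detectorConfig = {
    runtime: 'mediapipe' as const,
    refineLandmarks: true,
    solutionPath: 'https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh',
  }
  detector: Awaited<ReturnType<typeof faceLandmarksDetection.createDetector>> | null = null

  // returns a promise that resolves to the detector object
  private getDetector() {
    return faceLandmarksDetection.createDetector(this.model, this.detectorConfig)
  }

  // awaits the detector and stores it on the class
  async loadDetector() {
    this.detector = await this.getDetector()
  }

  // detectFace(source) is shown further below
}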

The main method in this class is getDetector, which calls the createDetector method on faceLandmarksDetection (imported from TensorFlow.js above). createDetector takes the model we assigned in the constructor:

this.model = faceLandmarksDetection.SupportedModels.MediaPipeFaceMesh;

and the detection config object which specifies parameters for the detector:

this.detectorConfig = {
  runtime: 'mediapipe',
  refineLandmarks: true,
  solutionPath: 'https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh',
}

The getDetector function returns a promise which resolves to the detector object. This private method is then used in the public async loadDetector method, which sets the this.detector property on the class to the resolved detector.

The FaceMeshDetector class also implements a public detectFace method:

async detectFace(source){
  const data = await this.detector!.estimateFaces(source)
  const keypoints = (data as FaceLandmark[])[0]?.keypoints
  if (keypoints) return keypoints;
  return [];
}

This method takes a source parameter, which is the pixel input. This is where we use the canvas element we set up above as the source for the tracking. The function will be called as follows:

this.faceMeshDetector.detectFace(this.webcamCanvas.canvas)

This method calls the estimateFaces method on the detector. If faces are detected in the webcam output, it returns an array with an object containing the detection data. This object has a property called keypoints: an array containing an object for each of the 478 points the model detects on the face. Each of these objects has x, y and z properties holding the coordinates of the point on the canvas. Example:

[
  {
    box: {
      xMin: 304.6476503248806,
      xMax: 502.5079975897382,
      yMin: 102.16298762367356,
      yMax: 349.035215984403,
      width: 197.86034726485758,
      height: 246.87222836072945
    },
    keypoints: [
      {x: 406.53152857172876, y: 256.8054528661723, z: 10.2, name: "lips"},
      {x: 406.544237446397, y: 230.06933367750395, z: 8},
      ...
    ],
  }
]

It is important to note that those points are returned as coordinates in canvas space. This means that the reference point (x: 0, y: 0) is at the top left of the canvas. This will be relevant later when we have to convert the coordinates into Three.js scene space, which has its point of reference at the center of the scene.
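As a tiny illustration of the kind of conversion this implies (the numbers and target range here are made up):

// A canvas point at (250, 125) on a 500x500 canvas sits at 50% of the width and 25% of
// the height. Mapped into a Three.js range of -2..2 (origin in the middle, y pointing
// up), it becomes roughly x = 0, y = 1. The utilities in step 5 generalise this idea.
const canvasPoint = { x: 250, y: 125 }
const canvasSize = 500
const sceneMin = -2, sceneMax = 2
const x = (canvasPoint.x / canvasSize) * (sceneMax - sceneMin) + sceneMin       // 0
const y = (1 - canvasPoint.y / canvasSize) * (sceneMax - sceneMin) + sceneMin   // 1
console.log(x, y)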

At this point, we have our pixel input source, as well as the face mesh detector which will give us the detected points. Now, we can move to the Three.js part!

4. Create empty point cloud:

In order to generate the face mesh in Three.js, we will have to load the face mesh points from the detector and use them as the position attribute of a Three.js Points object. To make the Three.js face mesh reflect the movements in the video (react in real time), we will have to update this position attribute whenever the detector reports a change.

To implement that, we will create another factory class called PointCloud which creates an empty Points object and exposes a public method we can use to update the attributes of this Points object, such as the position attribute. This class looks like this:

export default class PointCloud {
  bufferGeometry: THREE.BufferGeometry;
  material: THREE.PointsMaterial;
  cloud: THREE.Points<THREE.BufferGeometry, THREE.PointsMaterial>;

  constructor() {
    this.bufferGeometry = new THREE.BufferGeometry();
    this.material = new THREE.PointsMaterial({
      color: 0x888888,
      size: 0.0151,
      sizeAttenuation: true,
    });
    this.cloud = new THREE.Points(this.bufferGeometry, this.material);
  }

  updateProperty(attribute: THREE.BufferAttribute, name: string){
    this.bufferGeometry.setAttribute(
      name,
      attribute
    );
    this.bufferGeometry.attributes[name].needsUpdate = true;
  }
}

This class instantiates an empty BufferGeometry, a material for the points, and the Points object which consumes both. Adding this Points object to the scene will not change anything yet, as the geometry does not have a position attribute; in other words, it has no vertices.

The PointCloud class also exposes the updateProperty method, which takes in a buffer attribute and the name of the attribute. It calls the bufferGeometry's setAttribute method and sets the needsUpdate property to true. This allows Three.js to reflect the changes to the bufferAttribute on the next requestAnimationFrame iteration.

This updateProperty method is what we will use to change the shape of the point cloud based on the points received from the TensorFlow.js detector.
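As a quick standalone illustration of updateProperty (hypothetical values and import path):

import * as THREE from 'three'
import PointCloud from './pointCloud' // import path is an assumption

// Hypothetical example: push three vertices into the cloud's position attribute.
const pointCloud = new PointCloud()
const positions = new Float32Array([
  0, 0, 0,       // vertex 1
  0.5, 0.5, 0,   // vertex 2
  -0.5, 0.5, 0,  // vertex 3
])
pointCloud.updateProperty(new THREE.BufferAttribute(positions, 3), 'position')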

Now our point cloud is also ready to take in new position data. So, it's time to tie everything together!

5. Feed the tracking information to the PointCloud:

To tie everything together, we will create an implementation class which calls the classes and methods we need to get everything working. This class is called FacePointCloud. In its constructor, it instantiates the following classes:

a. The ThreeSetUp class to get the scene setup objects

b. The WebcamCanvas class to get a canvas object which displays the webcam content

c. The faceLandMark class to load the tracking models and get the detector

d. The PointCloud class to set up an empty point cloud and update it later with the detection data

constructor() {
  this.threeSetUp = new ThreeSetUp()
  this.setUpElements = this.threeSetUp.getSetUp()
  this.webcamCanvas = new WebcamCanvas();
  this.faceMeshDetector = new faceLandMark()
  this.pointCloud = new PointCloud()
}

This class will also have a method called bindFaceDataToPointCloud, which performs the main part of our logic: taking the data provided by the detector, converting it to a form Three.js can understand, creating a Three.js buffer attribute from it and using it to update the point cloud.

async bindFaceDataToPointCloud(){
  const keypoints = await this.faceMeshDetector.detectFace(this.webcamCanvas.canvas)
  const flatData = flattenFacialLandMarkArray(keypoints)
  const facePositions = createBufferAttribute(flatData)
  this.pointCloud.updateProperty(facePositions, 'position')
}

So we are passing our canvas pixel source to the detectFace method and then manipulating the returned data in the utility function flattenFacialLandMarkArray. This is important because of two issues:

a. As we mentioned above, the points from the face detection model will be returned in the following shape:

keypoints: [
  {x: 0.542, y: 0.967, z: 0.037},
  ...
]

while the buffer attribute expects the data/numbers in the following shape:

number[] or [0.542, 0.967, 0.037, .....]

b. The difference in coordinate systems between the source of the data, the canvas, whose coordinate system looks like this:

[Figure: the HTML canvas coordinate system, with the origin at the top-left corner]

and the Three.js scene coordinate system, which looks like this:

[Figure: the 3D scene coordinate system, with x, y and z axes and the origin at the center]

Considering these two issues, we implemented the flattenFacialLandMarkArray function, which takes care of both. The code for this function looks as follows:

function flattenFacialLandMarkArray(data: vector[]){
  let array: number[] = [];
  data.forEach((el) => {
    el.x = mapRangetoRange(500 / videoAspectRatio, el.x, screenRange.height) - 1
    el.y = mapRangetoRange(500 / videoAspectRatio, el.y, screenRange.height, true) + 1
    el.z = (el.z / 100 * -1) + 0.5;

    array = [
      ...array,
      ...Object.values(el),
    ]
  })
  return array.filter((el) => typeof el === 'number');
}

The flattenFacialLandMarkArray function takes the keypoints we receive from the face detector and spreads them into an array, so they end up in number[] form instead of object[] form. Before pushing the numbers into the output array, it maps them from the canvas coordinate system to the Three.js coordinate system through the mapRangetoRange function. This function looks as follows:

function mapRangetoRange(from: number, point: number, range: range, invert: boolean = false): number {
  let pointMagnitude: number = point / from;
  if (invert) pointMagnitude = 1 - pointMagnitude;
  const targetMagnitude = range.to - range.from;
  const pointInRange = targetMagnitude * pointMagnitude + range.from;

  return pointInRange
}
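The createBufferAttribute utility used in bindFaceDataToPointCloud is not shown in this article; a minimal sketch, assuming the flattened number[] holds x, y, z triplets, would be:

import * as THREE from 'three'

// Assumed implementation: wrap the flat number[] in a Float32Array and expose it as a
// BufferAttribute with 3 components (x, y, z) per vertex.
function createBufferAttribute(data: number[]): THREE.BufferAttribute {
  const positions = new Float32Array(data)
  return new THREE.BufferAttribute(positions, 3)
}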

We can now create our init function and our animation loop. This is implemented in the initWork method of the FacePointCloud class as follows:

async initWork() {
  const { camera, scene, renderer } = this.setUpElements
  camera.position.z = 3
  camera.position.y = 1
  camera.lookAt(0, 0, 0)
  const orbitControlsUpdate = this.threeSetUp.applyOrbitControls()
  const gridHelper = new THREE.GridHelper(10, 10)
  scene.add(gridHelper)
  scene.add(this.pointCloud.cloud)

  await this.faceMeshDetector.loadDetector()

  const animate = () => {
    requestAnimationFrame(animate)
    if (this.webcamCanvas.receivingStreem) {
      this.bindFaceDataToPointCloud()
    }
    this.webcamCanvas.updateFromWebCam()
    orbitControlsUpdate()
    renderer.render(scene, camera)
  }

  animate()
}

We can see how this init function ties everything together: it gets the Three.js setup elements, sets up the camera, and adds a gridHelper and our pointCloud to the scene.

It then loads the detector on the faceLandMark class and moves on to setting up our animate function. Inside this animate function, we first check whether our WebcamCanvas element is receiving the stream from the webcam, and then call the bindFaceDataToPointCloud method, which internally calls the detectFace function, converts the data to a bufferAttribute and updates the point cloud's position attribute.
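A hypothetical entry point that kicks everything off could then be as simple as:

// Hypothetical bootstrap; the file layout and import path are assumptions.
import FacePointCloud from './facePointCloud'

const facePointCloud = new FacePointCloud()
facePointCloud.initWork().catch(console.error)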

Now, if you run the code, you should see the real-time face mesh point cloud in the browser!

For more information about TECHTEE, and how we can help you build your software product or solution, visit us here.

