This assignment explores 3D shape matching, with possible applications to a 3D shape search engine. That is, given a query shape and a database of known shapes, which shapes in the database are the most similar to the query? This assignment explores different "3D shape descriptors," which are objective functions computed over each shape and used as similarity metrics in an attempt to classify the shapes. In other words, a descriptor is an array of numbers for each shape that captures some of the shape's features, but in a highly compressed (and consequently not usually invertible) form that makes batch comparisons quick.
The database that is used as a benchmark for the different descriptors can be found here. It consists of 200 different shapes (20 different types of shapes, 10 of each type). To test the descriptors for success, each shape is taken out in turn and compared to the remaining shapes, and the program tests to see if the most similar shapes are of the same type (more detail about this in the "results" section).
I wrote the majority of the code for this project in C++, with the help of a few outside libraries to make matrix and vector operations easier.
The main library I used was the "R3" library, from COS 426: Computer Graphics, last semester. The library can be found here. I used R3Point.h to store the point locations in the point cloud, and R3Vector.h to store the normals. I also used the R3Matrix.h class briefly for the EGI descriptor, to apply a transformation to all of the point and normal sets.
I also used the TNT library to do singular value decomposition.
The models in the database are supplied as triangle meshes. But to calculate most of the descriptors, it is more convenient to go back a step and work from a 3D point cloud with normals at each point. A program, pointsample.exe, is able to convert from the triangle mesh format to a point cloud. I call this program from within my C++ program using the system command.
As each model is sampled, I read in the set of points and normals for each model into a PointCloud object (a class that I invented). In the constructor for this object, I compute the center of mass of all of the points, and subtract it from every point. This is to help normalize for translation, to prevent the objective functions from being different for two very similar shapes that are simply translational offsets of each other.
Also in the constructor, I normalize for scale by scaling each point down by the root mean square distance of all of the points from the origin. This makes the new root mean square distance of all of the points from the origin 1.
NOTE: I also made my own program with OpenGL to view the pointclouds and the normals, because I didn't like the one in Matlab. I simply modified the Mesh Viewer from the same Assignment 2 in 426 last semester that I got the R3 library from. I made it so that the mouse rotates the mesh around its center of mass. Left click to rotate the object, right click to translate the object, and center click to zoom in/zoom out. Type "v" to toggle displaying the points, and type "n" to toggle displaying the normals. Here are a couple of screenshots:
|Looking at the point samples of a ship||Looking at the point samples of a sword, along with their surface normals (drawn in red)|
The simplest shape descriptor is the "D2 shape descriptor." This consists of taking pairs of random points, computing the distances between them, and placing those distances into a histogram. The histogram can then be looked at as a probability distribution for how far apart points are from each other on the object. Note that because I normalized for scale by dividing by the RMS distance of the points, this distance could be unbounded in the case of outliers (i.e. having tons of points really close to the center and one point really far away). This case is rare, and after experimentation it seems that most points are within a distance of 3 from the origin after the first step. So I simply chose to put everything farther than 3 into the last bin. I then have 100 bins total, spaced from a distance of 0 to a distance of 3, and I can take as many random samples as I want to construct the histogram over this interval.
NOTE: In order for the comparisons to make sense between shapes, the same number of random samples should be taken in each shape. If we wanted a different number of samples for some reason, but we still wanted to be as correct as possible, we should scale each bin down by the number of samples (thus, turning the histogram into an actual probability density function which sums to 1). I didn't do this in my program since I'm always comparing objects with the same number of random samples.
The shell histogram is similar to the D2 shape descriptor, except it is deterministic. The bins now represent the distance of each point to the center of mass of the object. I chose to use 10 bins for the shells, with the radii of the shells spaced evenly from 0 to 3.0 (please refer to this analysis for why I settled on 10 bins, as opposed to the 200 that I was originally using).
NOTE: Similarly to the D2 descriptor, comparing shell descriptors only makes sense if the number of sampled points in the point cloud is the same between objects. Otherwise, each bin should be scaled down by the number of samples to turn the histogram into a PDF. Once again, I don't do this since I always sample each object with the same number of points.
This is similar to the shell descriptor, but with one added level of complexity. Intuitively, for similar shapes, all of the points falling into a shell at a particular distance from the origin should have a similar distribution across that shell. But even if the shapes are similar, they could be rotational offsets of each other around the center of mass. This will change the exact distribution of points within the shells, but it won't change the number of points in each shell or their locations relative to each other. Thus, this descriptor takes advantage of that fact and tries to find the principal directions of the points in a particular shell, storing the strengths of those directions (instead of the directions themselves).
This is done by putting all of the points in a shell as row vectors into an nx3 matrix A (where n is the number of points in that shell), and taking the singular value decomposition of that matrix. The SVD finds an orthonormal basis of input vectors that remains orthogonal after being multiplied by the matrix (singular values are similar to eigenvalues, except that the basis can rotate or flip as it passes through the matrix, in addition to being scaled). The "singular values" are the amounts by which each vector in the input orthonormal basis is scaled as it is put through the matrix (equivalently, the squares of the singular values are the eigenvalues of the 3x3 matrix A^T*A). The singular values are listed in decreasing order of magnitude.
Now let's imagine taking a point P, putting it into a 3x1 column vector, and multiplying it on the left by the nx3 matrix of all points in a shell. Each row of the result can be looked at as the dot product of P and the point in that row. Intuitively, then, the result is largest when P is closest to a principal direction (i.e. a direction along which a bunch of the points lie). This principal axis corresponds to the input basis direction with the largest singular value, since that is the direction the matrix scales the most (P has a large dot product with many of the points).
Thus, the singular value decomposition will give back a set of 3 orthonormal vectors, where each vector points in a direction of strong variation, with the direction of strongest variation listed first (since the singular values are listed in decreasing order of magnitude).
|An example of using SVD to find the principal, orthogonal components on a 2D point set|
(this process is also known as "Principal Components Analysis," or PCA for short)
Another idea to try to store more information about the shells of points around the origin is to try to partition them into sectors in each shell. First, come up with a set of equally spaced points about the unit sphere (this is the point set I used to define the sectors spaced equally around the sphere). For all of the points within the shell, find the closest point in that set, and place the point there. This way, there is a 2D descriptor, where the first dimension is the radial distance from the center of mass, and the second dimension is the "sector," or some point on the surface of a sphere that points can cluster to.
There is a slight problem if two models are similar but aligned to different rotational axes, however, since this will shift all of the sectors (and the descriptors will be totally different). To try to make this rotationally invariant, a bit of spatial information is sacrificed. There are two options:
A natural question to ask after the shells and sectors descriptor is: is it possible to somehow rotate the shapes so that they will have the same orientation before calculating a descriptor (so that awkward steps like sorting bins can be avoided)? One solution is to calculate the 3 principal components of all of the points in the point cloud to estimate the principal axes of the shape, and then to go into the coordinate frame of those axes (effectively rotating the shape so that those axes align with the unit axes x, y, and z). This is the approach that the Extended Gaussian Image descriptor takes. First it rotates the 3D shape in this way, and then it creates a histogram of normals by assigning each normal to the closest uniformly-sampled point on the sphere. Note that, unlike the shells and sectors case, there is no longer any information stored about the locations of the points. But also note that on a convex shape, no two normals point in the same direction. So this descriptor captures some important information (and it may even be invertible in certain cases...assuming there isn't too much information thrown away when the normals are quantized to bins).
Now let's formalize the first step of calculating the principal directions, by noticing the following facts:
|See how two orthonormal principal direction bases can plausibly describe the same object here|
If you go into the "Debug" directory produced by Visual Studio and type "Descriptors3D" (the main executable for calculating the descriptors), my program prints the following:
Note how the first three parameters are mandatory, and the rest are optional. Here's what this all means:
To test the descriptors for success, some retrieval task should be performed. A common way to analyze classifications against a database is to create a "precision recall graph." What happens is, we start with a model, and we want to find what type it is by comparing it to models already in the database. So we compute the Euclidean distance between the descriptor of this model and the descriptors of all of the models in the database, and we sort the models in ascending order of their Euclidean distance, so that the model with the smallest Euclidean distance is deemed the "most similar" to this model under this descriptor (NOTE: there are probably better ways to compare probability distributions, but Euclidean distance is a good place to start for the purposes of this assignment).
Now go through all of the models in the database from closest to farthest Euclidean distance under the descriptor, taking a certain number of models from the front of that list. The "recall" is defined as the percentage of the models of the correct type that have appeared in the list up to that point. For example, let's say that we gave the database a cat. If there are 20 cats total in the database, and 5 cats have been seen in the list so far, then the recall is (5/20) = 25%. The "precision" is defined as the ratio of the number of correct models retrieved so far to the total number of models retrieved so far. So if the 5 cats appeared within the first 10 items (but the other 5 of the 10 most similar were not cats), the precision would be (5/10) = 50%. The recall is plotted on the horizontal axis, while the precision is plotted on the vertical axis.
NOTE: In my program for each descriptor, I go through and take out each model individually, and then query the database as described above with the models that remain. There are 10 models of each type, so I'm looking to recall a total of 9 objects for each model that I take out. I then compute the precision recall graph like this for every item in the database, and average them together for the overall precision recall graph for each descriptor.
Descriptors3D models models/types.txt 10 -resample 200 -d2 100000 -shell -shellPCA -shellSectors -shellSectorsOverall -egi
Descriptors3D models models/types.txt 10 -resample 500 -d2 100000 -shell -shellPCA -shellSectors -shellSectorsOverall -egi
Descriptors3D models models/types.txt 10 -resample 1000 -d2 100000 -shell -shellPCA -shellSectors -shellSectorsOverall -egi
Descriptors3D models models/types.txt 10 -resample 3000 -d2 100000 -shell -shellPCA -shellSectors -shellSectorsOverall -egi
|PointCloud.cpp, PointCloud.h||This is the main class that I created to store point cloud objects sampled from triangle meshes. I store parallel STL vectors of points and normals, and I have member functions that compute all of the different descriptors for each object|
|main.cpp||This is the main program that constructs tons of PointCloud objects and compares their descriptors to output precision recall graphs|
|ViewCloud.cpp||My own version of a program that allows the user to view and interact with a 3D point cloud (adapted from COS 426 last semester)|
|PlotEverything.m||A MATLAB script that automatically plots the six descriptors against each other and labels them|
|Assignment3.vcproj||The Visual C++ Project file that has all of the proper paths and links everything together|