Enabling generalizable grasping for embodied AI

GenDexGrasp is a novel, hand-agnostic grasping algorithm that addresses the challenges of generalizable robotic grasping. Trained on the large-scale MultiDex dataset, it efficiently generates diverse and successful grasps for various multi-fingered robotic hands, outperforming previous methods in success rate, inference speed, and diversity.

Paper Link: GenDexGrasp: Generalizable Dexterous Grasping

Figure 1: General Grasping

Although current grasping algorithms for robotic hands can grasp objects fairly stably, they still fall far short of human grasping in generality and in the diversity of objects they can handle. Humans can use all their fingers for a full grasp, and can also grasp effectively with two or three fingers when the others are unavailable. Moreover, when we imagine having hands of different shapes, such as tentacle-like or talon-like hands, we can quickly envision how to grasp objects stably with them. To achieve generalizable and diverse dexterous grasping that approaches human-level performance, this paper proposes GenDexGrasp, a novel grasping algorithm designed for arbitrary hands. Compared to previous general grasping algorithms, GenDexGrasp balances success rate, inference speed, and generation diversity.

In this paper, we define general-purpose dexterous grasping as the problem of generating grasp poses for unseen robotic hands and observed objects. We evaluate it on three aspects: speed, diversity, and generalization capability. Existing methods achieve acceptable results in at most two of these aspects.

Figure 2: Pipeline of GenDexGrasp

To balance these three aspects, we designed GenDexGrasp, a dexterous grasping algorithm that is applicable to arbitrary hands. First, we use a conditional variational autoencoder (cVAE) to generate contact surfaces for a given object that are compatible with arbitrary hands.

Next, we optimize the hand poses to match the generated contact surfaces. Finally, we refine the grasp poses through physical simulation to ensure that the contacts are physically feasible. GenDexGrasp generalizes by making fewer assumptions about hand structure, and it achieves fast inference through improved contact surface computation and efficient optimization strategies. Diversity in grasp generation comes from the variational generative model combined with random initialization.
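The "optimize the hand pose to match the contact surface" step can be illustrated with a deliberately tiny toy, which is a sketch under stated assumptions and not the method's actual optimizer. Here a single fingertip slides along one axis at a fixed height above a flat row of object points, the contact value is a Gaussian falloff of distance, and plain gradient descent recovers the fingertip position that reproduces a target contact map. The names `contact_map` and `fit_finger`, the constants `SIGMA` and `HEIGHT`, and the Gaussian falloff are all hypothetical choices made for this illustration.

```python
import math

SIGMA = 0.2    # hypothetical width of the contact falloff
HEIGHT = 0.05  # hypothetical fixed fingertip height above the surface


def contact_map(finger_x, object_xs):
    """Soft contact value in (0, 1] for each object point (higher = closer)."""
    values = []
    for x in object_xs:
        squared_dist = (x - finger_x) ** 2 + HEIGHT ** 2
        values.append(math.exp(-squared_dist / (2 * SIGMA ** 2)))
    return values


def fit_finger(target, object_xs, start_x, lr=0.01, steps=300):
    """Gradient descent on the squared error between predicted and target maps."""
    finger_x = start_x
    for _ in range(steps):
        predicted = contact_map(finger_x, object_xs)
        grad = 0.0
        for x, c, t in zip(object_xs, predicted, target):
            # Chain rule: d(c)/d(finger_x) = c * (x - finger_x) / SIGMA**2
            grad += 2.0 * (c - t) * c * (x - finger_x) / SIGMA ** 2
        finger_x -= lr * grad
    return finger_x


# Object points on a flat surface, sampled every 0.1 units from -1 to 2.
object_xs = [i * 0.1 for i in range(-10, 21)]
# Stand-in for a generated contact map: the map a fingertip at x = 0.7 would produce.
target = contact_map(0.7, object_xs)
fitted_x = fit_finger(target, object_xs, start_x=0.2)
print(round(fitted_x, 3))
```

A real hand has many joints and the object is a 3D surface, so the actual optimization runs over a full pose vector. The toy only shows the shape of the idea: the contact map is the target, and the pose is the free variable.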

Figure 3: Schematic diagram of the aligned distance

For the grasp shown in (a), (b) and (d) show the Euclidean distance and the resulting contact surface, while (c) and (e) show the alignment distance and the resulting contact surface.

To address the ambiguity of contact surfaces during grasp optimization, especially for thin-shell objects, we have designed a novel metric called “alignment distance” to measure the distance between the object surface points and the hand. It helps to accurately represent the contact surfaces generated by the grasping algorithm. Specifically, the traditional Euclidean distance tends to erroneously label both sides of a thin-shell object as contact points when contacting only one side, while the alignment distance considers the direction of contact points and the normals of the object surface, correcting these errors.
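The thin-shell problem can be made concrete with a small sketch. The weighting used below, an exponential penalty on the angle between the object's outward normal and the direction toward the hand, is an assumed illustrative form and not necessarily the exact formula from the paper. The function names, the `gamma` parameter, and the plate geometry are likewise hypothetical.

```python
import math


def euclidean_distance(obj_point, hand_points):
    """Plain distance from an object point to the nearest hand point."""
    return min(math.dist(obj_point, h) for h in hand_points)


def direction_aware_distance(obj_point, obj_normal, hand_points, gamma=1.0):
    """Distance inflated when the hand is not on the side the normal faces.

    obj_normal is assumed to be a unit-length outward normal. The factor
    exp(gamma * (1 - cosine)) equals 1 when the hand lies straight along the
    normal and grows to exp(2 * gamma) when it lies directly behind the surface.
    """
    best = math.inf
    for h in hand_points:
        dist = math.dist(obj_point, h)
        if dist == 0.0:
            return 0.0
        offset = [hc - oc for hc, oc in zip(h, obj_point)]
        cosine = sum(o * n for o, n in zip(offset, obj_normal)) / dist
        best = min(best, math.exp(gamma * (1.0 - cosine)) * dist)
    return best


# A thin plate: the top face looks up, the bottom face looks down.
top_point, top_normal = (0.0, 0.0, 0.1), (0.0, 0.0, 1.0)
bottom_point, bottom_normal = (0.0, 0.0, -0.1), (0.0, 0.0, -1.0)
hand_points = [(0.0, 0.0, 0.6)]  # one fingertip hovering above the plate

top_euclid = euclidean_distance(top_point, hand_points)
bottom_euclid = euclidean_distance(bottom_point, hand_points)
top_aligned = direction_aware_distance(top_point, top_normal, hand_points)
bottom_aligned = direction_aware_distance(bottom_point, bottom_normal, hand_points)

print(top_euclid, bottom_euclid)    # the two faces look similarly close
print(top_aligned, bottom_aligned)  # the far face is pushed much further away
```

Under the Euclidean metric the two faces differ only by the plate thickness, so a distance threshold would mark both as contact. With the direction-aware weighting, the face pointing away from the fingertip receives a far larger distance and drops out of the contact surface.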

To learn contact surfaces for arbitrary hands, we used force-closure optimization [1] to collect a large-scale multi-hand dataset called MultiDex. MultiDex consists of 436,000 diverse grasp poses for 58 household objects with five different hands.

Experiments show that our method can generate diverse grasp poses for three-fingered, four-fingered, and five-fingered robotic hands, even when it has had no prior exposure to the hand in question. Table 1 presents quantitative results showing that our approach balances quality, speed, and diversity.

Figure 4: Results generated by GenDexGrasp for a three-fingered robotic hand (Barrett, first row), a four-fingered robotic hand (Allegro, second row), and a five-fingered robotic hand (Shadowhand, third row).

For each row, the training data contained no grasp data for the corresponding robotic hand.

Table 1: Quantitative experimental results showing that our method is the first to balance success rate, diversity, and inference speed.

This article introduces GenDexGrasp, a general-purpose dexterous grasping method that can be applied to any robotic hand. By combining contact surfaces as an intermediate representation, a novel alignment distance metric for measuring the distance from the hand to object surface points, and a new grasp algorithm, GenDexGrasp generates diverse, high-quality grasp poses within a reasonable inference time.

Quantitative experiments demonstrate that our method achieves a reasonable balance among quality, diversity, and speed for the first time. Additionally, we have collected a large-scale synthetic dataset called MultiDex for dexterous grasping. MultiDex includes five robotic hands with different kinematic structures, common household objects, and diverse grasp poses.
