How to Pick Up an Object Autonomously Using the xArm5 Robotic Arm


In my previous post, “How to Set Up the Development Environment for the xArm5 Robotic Arm,” I explained how to set up a development environment on the Ubuntu 20.04 LTS operating system for writing robot-control scripts. Today, I will describe how to use the xArm5 (manufactured by UFactory) to pick up an object autonomously, without manual control.


I used ROS (Robot Operating System), an open-source robotics framework. First, the camera detects the object in the frame and calculates its coordinates. Those coordinates then let the robotic arm move the gripper to that location and pick up the object.

The object coordinates must be translated from the camera frame to the gripper frame in real time. Hence, the camera needs to be calibrated, for which I used two packages: aruco_ros and easy_handeye. I placed an ArUco marker in the camera’s view for it to detect and identify, and then saved the calibration results, which are used later on to transform object coordinates from the camera frame to the gripper frame.

The robotic arm is equipped with an Intel RealSense depth camera (D415) that lets me detect and track the object in real time. I used the open-source package find_object_2d, which provides a simple Qt interface to OpenCV implementations of popular feature detectors and descriptors such as SIFT and SURF. The camera detects the object’s features in the view and draws a frame around the object (visible on the screen) in real time.
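For each detection, find_object_2d publishes a flat array on its objects topic: 12 floats per object (the object ID, the trained image’s width and height, and a 3×3 homography). A small pure-Python sketch of parsing that array and recovering the object’s center in image pixels (the homography entry order follows the package’s QTransform convention; verify against your version):

```python
# Sketch: parse the flat detection array published by find_object_2d.
# Each detection is 12 floats: [id, width, height, then a 3x3 homography
# in QTransform order (m11, m12, m13, m21, m22, m23, m31, m32, m33)].

def parse_objects(data):
    """Return a list of detections with each object's center in image pixels."""
    detections = []
    for i in range(0, len(data), 12):
        obj_id, width, height = data[i], data[i + 1], data[i + 2]
        m11, m12, m13, m21, m22, m23, m31, m32, m33 = data[i + 3:i + 12]
        cx, cy = width / 2.0, height / 2.0   # center of the trained image
        # Map the center through the projective homography into the scene.
        w = m13 * cx + m23 * cy + m33
        u = (m11 * cx + m21 * cy + m31) / w
        v = (m12 * cx + m22 * cy + m32) / w
        detections.append({"id": int(obj_id), "center_px": (u, v)})
    return detections
```

In a ROS node this would sit in the callback of a subscriber to the `/objects` topic (a `std_msgs/Float32MultiArray`); the pixel center is then combined with the aligned depth image to obtain (x, y, z) in the camera frame.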

A white matte background gives the best lighting conditions for object detection. Dark colors absorb the light, while shiny surfaces reflect too much of it, producing many noisy, spurious features.

Also, if the object is too close to the camera, the camera cannot determine depth. Hence, the object should be placed at least 20 inches from the camera; feel free to tune this distance by trial and error.
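That rule of thumb can be enforced in code with a simple range gate on the measured depth. A minimal sketch, where the 0.5 m (~20 in) minimum matches the guideline above and the 2.0 m maximum is an assumption worth tuning for your workspace:

```python
# Sketch of a range gate on the measured depth before attempting a pick.
MIN_DEPTH_M = 0.5   # below this the depth camera cannot resolve depth reliably
MAX_DEPTH_M = 2.0   # beyond this the detection becomes too coarse (assumed)

def depth_in_range(depth_m):
    """Return True if the object is at a usable distance from the camera."""
    return MIN_DEPTH_M <= depth_m <= MAX_DEPTH_M

def check_object_distance(depth_m):
    """Return a user-facing message describing how to reposition the object."""
    if depth_m < MIN_DEPTH_M:
        return "Object too close: move it farther from the camera."
    if depth_m > MAX_DEPTH_M:
        return "Object too far: move it closer to the camera."
    return "OK"
```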

The coordinates calculated by the object detection software give the object’s position in the image (x, y) and its distance from the camera (depth, z). All three coordinates (x, y, z) are expressed with reference to the camera. However, I need the object’s coordinates with respect to the gripper, since the gripper has to move to that location and pick up the object.

The object (a Keurig coffee capsule, in this experiment) is detected by the camera, and a blue square frame is drawn around it in the left window. The right window displays a message confirming successful detection.

The object’s coordinates are converted from the camera frame to the gripper frame in real time using the tf package in ROS.
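Under the hood, tf applies the calibrated rigid-body transform (a rotation quaternion plus a translation) to the point. A pure-Python sketch of that math — in a live system you would call tf’s lookup/transform functions instead, and the quaternion and translation here stand in for the values saved by easy_handeye:

```python
# Sketch of the rigid-body transform tf applies when converting a point
# from the camera frame into the gripper frame.

def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q = (x, y, z, w)."""
    qx, qy, qz, qw = q
    vx, vy, vz = v
    # t = 2 * cross(q_vec, v)
    tx = 2.0 * (qy * vz - qz * vy)
    ty = 2.0 * (qz * vx - qx * vz)
    tz = 2.0 * (qx * vy - qy * vx)
    # v' = v + w * t + cross(q_vec, t)
    return (
        vx + qw * tx + (qy * tz - qz * ty),
        vy + qw * ty + (qz * tx - qx * tz),
        vz + qw * tz + (qx * ty - qy * tx),
    )

def camera_to_gripper(point_cam, q_cam_to_grip, t_cam_to_grip):
    """Transform a point from the camera frame into the gripper frame."""
    rotated = quat_rotate(q_cam_to_grip, point_cam)
    return tuple(r + t for r, t in zip(rotated, t_cam_to_grip))
```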

Now the gripper opens fully and moves into the overhead position (directly above the object) using the coordinates from the previous step. It then moves down along the z-axis, picks up the object, and carries it to the desired new location.
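The sequence above can be sketched as an ordered plan of gripper actions and Cartesian waypoints. The hover offset is an assumption to tune; with the UFactory Python SDK, each move step would map to a call like `arm.set_position(...)` and the gripper steps to `arm.set_gripper_position(...)`:

```python
# Sketch of the top-down pick-and-place sequence as a list of
# (action, target) steps, in gripper-frame meters.

HOVER_OFFSET_M = 0.10  # pre-grasp height above the object (assumed)

def plan_pick_and_place(obj_xyz, place_xyz):
    """Return the ordered steps for a top-down pick of an object."""
    x, y, z = obj_xyz
    hover = (x, y, z + HOVER_OFFSET_M)
    return [
        ("open_gripper", None),
        ("move", hover),          # overhead position, directly above the object
        ("move", (x, y, z)),      # descend along the z-axis
        ("close_gripper", None),  # grasp the object
        ("move", hover),          # lift straight back up
        ("move", place_xyz),      # carry to the drop-off location
        ("open_gripper", None),   # release
    ]
```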

Refinements and Further Work

I am testing the code on edge cases and resolving the resulting problems. A few issues I am currently working on are:

  • Replacing feature-based object detection with deep-learning-based object detection models. The traditional OpenCV implementations of SURF and SIFT are not robust or accurate enough: for example, the object is not recognized if the lighting conditions or the object’s pose change.
  • Handling hard cases for object gripping, for example, the gripper should also be able to pick up the object when approaching from the side.
  • Setting minimum and maximum thresholds for the object’s distance from the camera, and adding a check in the code that prompts the user to place the object at a valid distance.

Perfecting a simple pick-and-place task will allow us to learn the different components involved in robotic arm manipulation and navigation. This will help us use the robotic arm to perform complex tasks for our retail customers.

Please stay tuned for the next updates :)



Vivek Sahukar

I’m a data scientist and my interests are deep learning and computer vision. Currently, I am working with a robotic arm to pick and place objects autonomously.