AI Robot Vision | Abdul Mukit

Introduced AI Vision System for Robotic Depalletization @Mujin

Trained, optimized, and deployed an instance segmentation model, increasing the production rate by 50%. The model improved on the mAP of Mujin’s existing vision system by 60% and on its detection speed by 800%.
Collected, distilled, and annotated a large warehouse dataset of 20,000 images, using CVAT, Datumaro, SAM, and Voxel51 to curate the data and refine the annotations.
Developed auto-annotation software that automatically generates ground-truth segmentation datasets at 94% mAP.
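
For readers unfamiliar with how segmentation quality is scored: mAP is built on the intersection-over-union (IoU) between a predicted mask and a reference mask. The snippet below is only an illustrative sketch of that building block, not the evaluation code used at Mujin; the function name `mask_iou` and the toy masks are made up for the example.

```python
import numpy as np


def mask_iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection-over-union of two binary masks of the same shape."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        # Both masks are empty: treat this as perfect agreement (a convention).
        return 1.0
    return float(np.logical_and(pred, truth).sum() / union)


# Toy example: a 6x6 reference box and an auto-annotated box shifted by one pixel.
truth_mask = np.zeros((10, 10), dtype=bool)
truth_mask[2:8, 2:8] = True
auto_mask = np.zeros((10, 10), dtype=bool)
auto_mask[2:8, 3:9] = True

iou = mask_iou(auto_mask, truth_mask)
print(f"IoU between auto and manual annotation: {iou:.3f}")
```

A detection typically counts as correct when its IoU with a reference mask exceeds a chosen threshold, and mAP averages precision over such thresholds and classes.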

Package Condition Sensitive Point Cloud Filtering

Implemented a package-condition (damaged or tilted) estimation algorithm for cardboard boxes and packs of cans.
Developed 3D point cloud filtering algorithms for damaged and tilted items using OpenCV and PCL.
Integrated the vision algorithms into robotic systems at Walmart warehouses, enabling the handling of damaged and tilted items.
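
To give a flavor of what tilt estimation on a point cloud can look like, here is a minimal NumPy-only sketch. It is not the production algorithm (which used OpenCV and PCL) and rests on assumptions of my own for the example: the points belong to the top face of a single box, the z axis points up, and tilt is measured as the angle between the fitted plane normal and vertical. All function names are hypothetical.

```python
import numpy as np


def fit_plane_normal(points: np.ndarray) -> np.ndarray:
    """Least-squares plane fit to an (N, 3) array; returns a unit normal with z >= 0."""
    centered = points - points.mean(axis=0)
    # The right singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    return normal if normal[2] >= 0 else -normal


def remove_plane_outliers(points: np.ndarray, k: float = 2.0) -> np.ndarray:
    """Drop points farther than k standard deviations from the fitted plane."""
    normal = fit_plane_normal(points)
    dist = (points - points.mean(axis=0)) @ normal
    keep = np.abs(dist) <= k * dist.std() + 1e-9
    return points[keep]


def tilt_angle_deg(points: np.ndarray) -> float:
    """Angle in degrees between the top-face normal and the vertical (z) axis."""
    normal = fit_plane_normal(remove_plane_outliers(points))
    cos_theta = np.clip(normal[2], -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))


# Synthetic top face (0.4 m x 0.3 m) tilted 10 degrees, with 1 mm sensor noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 0.4, 500)
y = rng.uniform(0.0, 0.3, 500)
z = np.tan(np.radians(10.0)) * y + rng.normal(0.0, 0.001, 500)
face = np.column_stack([x, y, z])

# A few spurious points floating 10 cm above the surface, as a noisy sensor might produce.
noise_pts = face[:5] + np.array([0.0, 0.0, 0.10])
cloud = np.vstack([face, noise_pts])

angle = tilt_angle_deg(cloud)
print(f"Estimated tilt: {angle:.2f} degrees")
```

A real pipeline would first segment each item and would likely use a robust estimator such as RANSAC, since damaged boxes produce far messier point clouds than this synthetic example.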

Skills/Tools: PyTorch, OpenCV, PCL, Open3D, ONNX, CVAT, Python, Linux, Git, Voxel51, Datumaro.

A robot has to detect and pick non-stop, every day. A depalletizing robot has to pick endless variations of boxes, pallets, anti-slip sheets, and packs of cans. The vision system has to update the digital twin in real time for the robot to operate.

I joined Mujin-US in 2023 as its first Computer Vision Engineer. For the decade before that, Mujin had been a 100% geometric computer vision company: no machine learning, no deep learning. Previous attempts at using deep learning for vision had been unsuccessful. However, it had become abundantly clear to Mujin that geometric computer vision alone was struggling to keep up with the endless complexity and chaotic realities of warehouses.

Warehouse automation is highly challenging. Left: Boxes often get damaged during transportation. Middle: The point cloud of damaged boxes is often missing portions or extremely noisy. Right: Boxes or items can become tilted for any number of reasons (including an operator casually throwing some boxes on top so that the robot picks them too. Yes, that happens!). Guess what: the robot can't stop. Vision has to work.

I am an avid follower of Dr. Andrew Ng, one of the pioneers of deep learning. Equipped with his teachings from the Deep Learning Specialization and Machine Learning in Production courses, I was perfectly positioned at Mujin. Within a few months, I introduced the first AI vision system to outperform Mujin’s decade-old geometric vision system in both speed and detection accuracy. The new system was nearly 99% accurate and nearly 10x faster than our existing vision system. I am currently leading the R&D of a hybrid vision system that takes advantage of the unique strengths of both AI and geometric vision.

Real-time instance segmentation and 6-DoF pose estimation enabled by deep learning.