FRC Team 2530's Vision for 2017
This year I was in charge of developing a vision system for Steamworks — of course with help from fellow students and mentors! So here are some of the details of how it works. (Note: I will not cover autonomous navigation here.)
Let me know if you have any questions or comments!
The way we set it up is that the RoboRIO streams webcam footage to the driver station, where it is picked up by a Java program based on code generated from GRIP, which sends numbers (in inches) back (over NetworkTables) to the robot for navigation. This setup is a little esoteric, and there are certainly other solutions which work too! (But I’m fond of this way.)
Some pros (versus various alternatives):
- Simple to set up in hardware (no coprocessor)
- Frees the RoboRIO from (probably expensive) image computations; the robot only needs to pick up a few numbers from the NetworkTables
- In fact, these numbers are already going to be useful measurements in inches
- Also a little more flexible — use any OpenCV function to calculate dimensions, not just the GRIP ones!
- People other than you have put in most of the work you need to make this approach work already!
Some cons:
- It will be affected by network latency, although that shouldn’t be too bad
- A little esoteric setup with many different parts
- Still requires knowledge of how the different pieces work to set up; in particular, understanding OpenCV contour calculations (though I have examples of the basic ones)
We used a LifeCam surrounded by a green LED ring. The webcam was mounted on a servo (which we basically never used …), attached via a 3D-printed mount. The LED ring had its wires re-soldered to be sturdier and was connected to a Spike relay on relay port 0. The webcam’s USB cable was plugged directly into the RoboRIO; we used no coprocessor. (It should not matter which USB port.)
Hardware–software interface (C++)
We’ve been using C++ for our robot code. The Relay interface is straightforward: just use it in forward-only mode. The webcam uses the automatic capture feature, which streams the footage to the driver station, where GRIP and the vision program can pick it up.
The second parameter to StartAutomaticCapture (probably just 0 if you have only one camera) corresponds to the 0 in “cam0” as seen on the RoboRIO Webdashboard.
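Our actual robot code is C++, but to keep all the examples in this write-up in one language, here is a sketch of those two pieces using WPILib's parallel Java API. The class and member names are mine; the ports match the wiring described above.

```java
import edu.wpi.first.wpilibj.CameraServer;
import edu.wpi.first.wpilibj.Relay;

public class VisionHardware {
    // Spike relay for the LED ring on relay port 0, forward-only
    // (we only ever switch the LED fully on or off)
    private final Relay ledRing = new Relay(0, Relay.Direction.kForward);

    public VisionHardware() {
        // Stream USB device 0 ("cam0" on the RoboRIO Webdashboard)
        // to the driver station automatically
        CameraServer.getInstance().startAutomaticCapture(0);
    }

    public void setLed(boolean on) {
        ledRing.set(on ? Relay.Value.kOn : Relay.Value.kOff);
    }
}
```

The C++ calls are analogous; this snippet is a wiring sketch rather than something you can run off-robot.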
GRIP vision processing
GRIP is the way to program vision processing for FRC. Use it!
(I also looked at NI Vision but could not see how to interface it with everything else.)
Some of our GRIP programs are stored in this unorganized Git repository, which also contains the vision sample images released for 2017: Team2530/Vision2017 on GitHub.
Webcam setup: Direct USB to computer
For testing it is convenient to plug the webcam into a computer running GRIP. Luckily this is easy to set up — just select it as a source.
Webcam setup: RoboRIO
The primary way to test the webcam will be to plug it into the RoboRIO, connect to that over the network, and run GRIP on the driver station. You can see the results live!
Note: since we are using a separate program for vision processing, I would recommend leaving GRIP closed for competition, so it doesn’t use extra resources, but this should not be a big deal.
I don’t know how well this is documented elsewhere … so here’s how I did it.
The camera is streamed across the network as an IP Camera — so use that source. The URL is a bit tricky to find — it’s buried in the Network Tables, which you can see in the SmartDashboard.
Specifically, if you look under the “CameraPublisher” table, you should find a subtable (“USB Camera 0”, for example) containing the stream URL.
Sometimes the source stops working. If it does, try replacing the “roboRIO-TEAM-FRC” hostname in the URL (which requires DNS to resolve) with the static IP address (“10.TE.AM.2”).
Architecture: Filtering, Detecting, Filtering
The first step is to resize the image (we used 320x240). This keeps the incoming data consistent, which the vision calculations (and, to some extent, the size filtering) depend on.
Then we filter based on pixels. One of our mentors recommended HSL over HSV. Hue should be around the center for a green LED — this can be fairly narrow. Saturation will be very high. Luminance will depend on the exposure and brightness settings of the camera, but should range from some value all the way up to max.
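To make the thresholding concrete, here is a standalone sketch of the per-pixel test GRIP performs. The ranges below are illustrative guesses in OpenCV's scales (hue 0-180, saturation and luminance 0-255), not our tuned values.

```java
// Sketch of an HSL threshold check. Ranges are hypothetical examples,
// in OpenCV's scales: hue 0-180, saturation/luminance 0-255.
public class HslThreshold {
    // Hue: a fairly narrow band around green (~60 on OpenCV's 0-180 scale)
    static final int HUE_MIN = 50, HUE_MAX = 90;
    // Saturation: very high for an LED-lit retroreflective target
    static final int SAT_MIN = 200, SAT_MAX = 255;
    // Luminance: depends on camera exposure/brightness; some floor up to max
    static final int LUM_MIN = 60, LUM_MAX = 255;

    static boolean isTargetPixel(int h, int s, int l) {
        return h >= HUE_MIN && h <= HUE_MAX
            && s >= SAT_MIN && s <= SAT_MAX
            && l >= LUM_MIN && l <= LUM_MAX;
    }
}
```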
Next we find contours — only external ones, and run convex hulls, to simplify all of the curves. We also filter to remove extraneous noise:
- Minimum area started at 150, for the resolution of our image. Also depends on how far away you want to track from. This is the primary number that will filter out small noise.
- Minimum ratio is actually useful in this situation, since the targets are supposed to have an aspect ratio of 2in/5in = 0.4 as viewed from straight on. 0.2 was my conservative guess, since targets will get slimmer if viewed from an angle.
- Maximum ratio I set at 1, since even when the gear peg interferes with the target, a contour should never be wider than it is tall.
- Filter out impossibly weird shapes with max vertices … I put 20.
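The filtering above boils down to a few comparisons per contour. Here is a standalone sketch using those thresholds, on plain numbers rather than OpenCV types (GRIP's generated filterContours step does the real work; the class and method names are mine):

```java
// Sketch of our contour-filtering rules applied to plain numbers.
// Thresholds are the ones discussed above, tuned for 320x240 images.
public class ContourFilter {
    static final double MIN_AREA = 150.0;   // filters out small noise
    static final double MIN_RATIO = 0.2;    // conservative; targets are 0.4 head-on
    static final double MAX_RATIO = 1.0;    // never wider than tall
    static final int MAX_VERTICES = 20;     // reject impossibly weird shapes

    static boolean keep(double area, double width, double height, int vertices) {
        double ratio = width / height;
        return area >= MIN_AREA
            && ratio >= MIN_RATIO && ratio <= MAX_RATIO
            && vertices <= MAX_VERTICES;
    }
}
```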
Another team recommended matching erosion/dilation steps to filter out small particles, but I think this was accomplished effectively enough by filtering. YMMV.
You can set up network publish operations, but they are not relevant to the generated code; they only run while the GRIP program itself is running. Network Table publishing does allow you to view the results in the driver station (if GRIP is running)…
Tune the threshold values so they just barely capture everything you need, then widen them a little to be conservative. Lighting and the quality of the retroreflective tape may affect the results, but large changes should not be needed.
Fun fact: at the Seven Rivers regional, their practice field had a whole airship set up with functioning gear lifts, but the retroreflective tape was actually too weak to be picked up by our camera’s default filter program, even though our homemade target worked OK in the same lighting!
Again, some of the HSL Threshold values will depend on exposure and brightness, so find good values for those!
In GRIP go to Tools > Generate Code, or use Ctrl+G. Enter these settings:
- Export as a Java program.
- Ensure that “Implement WPILib VisionPipeline” is UNCHECKED. The Java sample program does not seem to include this class, so it causes an error.
- Pick names for the pipeline and package. We chose to call it “GripGearPipeline” in “edu.team2530”.
- Save it into the folder of the sample vision source — more on this in the next section. It should end like VisionBuildSamples/Java/src/main/java. (Yes, I use Unix path conventions. The right way.)
Vision program itself (Java)
WPILib has published a sample program/framework to process vision on a driver station. It’s really nice — it handles setting up OpenCV and the network connection, even refreshing the network to connect as soon as the robot is available.
Why Java for a C++ team, since the example includes both? Basically I was too lazy to find a C++ compiler. Yup.
The version I created for my team for 2017 resides here: Team2530/VisionBuildSamples on GitHub.
Things you will need to change:
- GRIP Pipeline class import
- GRIP Pipeline class name (used here)
- Team number (5… 10… 15… 20… TWENTY-FIVE THIRTY!!1!1 Oh, not yours? sorry …)
- NetworkTable name (for returning calculations to your robot)
- USB Camera name (probably ok as is)
Open the directory in a command prompt (cmd on windoze) and run “gradlew build” (or “gradlew.bat” on Windows). Add “--offline” to compile without an internet connection. (Gradle is a little fussy about downloading dependencies.) It pretty much works like magic, though; I think Gradle even downloads itself on the first run.
Once you have compiled, you should see a directory called output whose most important member is runCameraVision(.bat). This starts the program, which will check for a connection to the robot’s webcam stream and publish data to the Network Tables as soon as it can.
Protip 1: make a shortcut for this and place it right next to the driver station so your drive team can’t forget it! I made a shortcut for compiling, too.
Protip 2: Make a pit checklist with this on it and make sure everyone on your drive team knows how to start vision or ensure it is running prior to competition!
I was interested in obtaining two measurements from the camera picture: distance from the targets, and displacement left/right from the center of the peg. Note that this is relative to the camera, so we had to account for the distance (about 6 inches) from the center of the camera to the center of where the gear sits.
It should be possible to calculate the angle, but we did not do this as we used a gyro (as part of a NavX … a little overkill tbh) to ensure our robot would be aligned with the appropriate gear peg.
Here is what the vision targets looked like: two vertical strips of retroreflective tape, each 2in wide by 5in tall, flanking the gear peg.
The steps to run the pipeline result in a list of contours which we must iterate through.
Unfortunately, GRIP does not seem to generate the proper OpenCV commands to calculate the metrics that it publishes when it is running. But that’s okay, we can recreate them and even do better!
We’ll need to calculate the area of each of the two tape rectangles, using Imgproc.contourArea on each contour.
We also need the average area of the two. I find that this works pretty well even when the angle is off — as one gets larger, the other gets smaller and the average will stay roughly constant. But, generally, it will be more accurate to ensure the robot is aligned to the right angle.
Note: this area could depend a little on how high the camera is relative to the tape. It is best to have it near the same level, but a little off should be okay.
We also need the average x-position of the squares. Get the coordinates of each bounding box using Imgproc.boundingRect. Take x and add half of the width.
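The bookkeeping for those two averages is simple arithmetic once Imgproc.contourArea and Imgproc.boundingRect have been called. Here is a standalone sketch on already-extracted numbers (the class and method names are mine):

```java
// Averages over the detected targets, given per-contour areas and
// bounding boxes (x, width) already extracted with OpenCV.
public class TargetMetrics {
    static double averageArea(double[] areas) {
        double sum = 0;
        for (double a : areas) sum += a;
        return sum / areas.length;
    }

    // Center x of one bounding box: x plus half its width
    static double centerX(double x, double width) {
        return x + width / 2.0;
    }

    static double averageCenterX(double[] xs, double[] widths) {
        double sum = 0;
        for (int i = 0; i < xs.length; i++) sum += centerX(xs[i], widths[i]);
        return sum / xs.length;
    }
}
```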
Calculating left/right displacement is quite straightforward, and involves no trigonometry or anything.
The first trick is calculating a conversion from pixels to inches, and it’s easier to start with square pixels to square inches. The average area should correspond to approximately 10in² (= width × height = 2in × 5in according to the diagram) per piece of tape. Take the square root of this ratio and we get a conversion in px/in.
Note: pixels on the camera are technically a unit of angle, but due to the small-angle approximation for the small perspective seen by the camera, it approximates a unit of length.
Use this ratio to convert the average displacement of the targets and send that over the Network Tables.
Also, if there is just one strip of tape visible and your camera is mounted off to the side, you can add or subtract a constant (4.125in, the distance from the center of one strip to the peg) based on which tape you are more likely to see, left or right. We did this in the robot code.
Note: I accounted for the camera’s offset from center in the autonomous code. This is because we changed that more often, and in general the robot code is more attached to the hardware aspects of the robot (whereas the vision program is just tied to the camera resolution and the nature of the targets to calculate). Thus this displacement number seen in the network tables is relative to the camera’s position.
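Putting the displacement math together, here is a standalone sketch. The names are mine; the 10in² and 4.125in constants are from the target geometry above, and the sign of the single-strip correction depends on your coordinate conventions.

```java
// Sketch of the displacement math. avgArea is in px^2; each tape strip
// is 2in x 5in = 10 in^2, so sqrt(avgArea / 10) gives pixels per inch.
public class Displacement {
    static final double TAPE_AREA_SQIN = 10.0;
    static final double STRIP_TO_PEG_IN = 4.125; // center of one strip to the peg

    static double pixelsPerInch(double avgAreaPx) {
        return Math.sqrt(avgAreaPx / TAPE_AREA_SQIN);
    }

    // Left/right displacement of the target center from the image center,
    // in inches (relative to the camera; our robot code added the
    // camera-to-gear offset separately)
    static double displacementInches(double avgCenterXPx, double imageWidthPx,
                                     double avgAreaPx) {
        double offsetPx = avgCenterXPx - imageWidthPx / 2.0;
        return offsetPx / pixelsPerInch(avgAreaPx);
    }

    // If only one strip is visible, shift toward the peg by the
    // strip-to-peg distance (sign depends on which strip you assume)
    static double singleStripCorrection(boolean leftStrip) {
        return leftStrip ? STRIP_TO_PEG_IN : -STRIP_TO_PEG_IN;
    }
}
```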
This is the essential measurement and is a little trickier. The distance is a function of the area seen. Ready for it? It’s proportional to the inverse of the square root of the average area, i.e. 1/sqrt(avgArea), or pow(avgArea, -0.5).
(If the target were high up, like on the boiler, it could be effective to calculate the angle to it, with perspective, and use its height (above the camera, not the floor) to calculate the distance to it. Bleh, trigonometry. Requires a very steady camera angle too.)
What’s the constant of proportionality? Unfortunately that depends on … a lot of things: the view angle of the camera, the resolution of the image received, and the real-life area of the targets. The easiest way is to use GRIP to test a few distances, build up a graph of distance versus area, and fit it with a graphing program. Or calculate distance × sqrt(avgArea) at several distances and take a good average of that. For us, using the LifeCam at a resolution of 320x240 searching for 10in² targets, at a height roughly level with the targets, the constant came to 1023.0.
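The distance formula itself is a one-liner. Here is a sketch using our calibration constant (1023.0, which is only valid for our LifeCam at 320x240 looking for 10in² targets; you will need to calibrate your own):

```java
// Distance falls off as 1/sqrt(area). K = 1023.0 is the constant we
// calibrated for a LifeCam at 320x240 and 10 in^2 targets.
public class DistanceEstimate {
    static final double K = 1023.0;

    static double distanceInches(double avgAreaPx) {
        return K / Math.sqrt(avgAreaPx);
    }
}
```

For example, an average contour area of 400 px² corresponds to about 51 inches away with our calibration.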
Measurements versus calculations seemed to be within inches, so pretty accurate. It did not seem to slow down the laptop and network usage was acceptable. Good enough, I say.
It should be possible to have the computations adjust to whatever resolution the video feed is at, but I have not implemented that.
It would be really cool to calculate the expected y-height of the targets and filter on that in the program. Then we wouldn’t have to worry about detecting ceiling lights or the bright green shirts that show our team spirit! (True story: fluorescent ceiling lights were detected by my program at one point, but I never had to worry about it in competition; or maybe that bug was fixed.)
It also would be super cool if we could detect two contours that have been split up by the peg (which happens when viewing from close enough at an angle…) and stitch them back together.
I also tried to create a processed image with targeting information that we could display on the dashboard, but that never really worked … https://github.com/Team2530/VisionBuildSamples/blob/ba80e02e9db104d402f2a5fc7d657bba9f7ec734/Java/src/main/java/Main.java#L128
The WPILib sample programs (accessible from Eclipse) do show some examples that involve creating another image, but that’s code that runs on the RoboRIO, which is a different beast.
Also, we thought about resetting brightness and exposure to default settings whenever the LED was off, so the driver could see through the camera normally. But my implementation couldn’t reliably restore the correct exposure and brightness when the LED came back on, so we had to disable it.