A Rusting Rubyist III
Rust HTTP Requests in a Ruby Module
This is part three of a series where I try to stumble my way through creating a Rust scraping library that will be embeddable in a Ruby module. If you are interested in starting from the beginning you can check out all my posts here: https://medium.com/@mfpiccolo
Follow along with the part-3 branch of the scrape repo.
Last time in ARRII, we set up an embeddable Rust lib and opened an API for a couple of string manipulation functions that could be called from a Ruby module. That is some good progress towards our goal of web scraping, but at the moment it really isn’t very “web” for a web scraper. (It also isn’t very “scrape”, but we will get to that part later.)
Thank goodness the Rust core team understood the importance of a well-built package management tool and created Cargo. It is going to allow us to pull in some awesome code so we won’t have to write the hard stuff, like the HTTP request library, from scratch (although that would be an awesome project). Looking around the web, I came across a library called http-rust. That sounds great, let’s… Whoa.
YOU SHOULD NOT USE THIS LIBRARY AT ALL
Straight from the readme. Okay, at least there is a link to another repo, hyper.
Hyper is a fast, modern HTTP implementation written in and for Rust.
840 commits — last updated a day ago. That sounds about right.
To start, let’s add hyper as a dependency in our Cargo.toml file.
Practicing a little Rust TDD, we can set up an empty function that takes a string and returns a string, plus a simple test to make sure that the returned string equals the first two tag openings of the rust-lang.org body.
Let’s give her a try.
$ cargo test
src/lib.rs:8:14: 8:71 error: mismatched types:
Looks like slice_chars returns a &str, so let’s change the type for url to &str and try again.
$ cargo test
src/lib.rs:1:1: 3:2 error: not all control paths return a value
There, that is better. This is the error we are looking for: our function doesn’t do anything or return anything, so this test should be failing. Red.
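As a side note on `slice_chars`: it is not available on current stable Rust, so here is a minimal sketch of a stand-in that slices by character position using only the standard library. The name `char_slice` and its clamping behaviour are my own assumptions for illustration, not the code used in this post.

```rust
/// Returns the part of `s` from character `begin` up to (not including)
/// character `end`, counting Unicode scalar values rather than bytes.
/// Out-of-range positions are clamped to the end of the string, and an
/// inverted range yields an empty string.
/// NOTE: `char_slice` is a hypothetical helper, not a std API.
fn char_slice(s: &str, begin: usize, end: usize) -> &str {
    // Byte offset at which the `n`th char starts, or the string's length
    // if `s` has `n` chars or fewer.
    let byte_offset = |n: usize| s.char_indices().nth(n).map_or(s.len(), |(i, _)| i);
    let (start, stop) = (byte_offset(begin), byte_offset(end));
    // `get` returns `None` when start > stop instead of panicking.
    s.get(start..stop).unwrap_or("")
}
```

Because it returns a borrowed `&str`, a function that promises an owned `String` would still need a `.to_string()` on the result.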
If we look at the hyper readme, it is pretty straightforward to use. We need to add the `hyper` crate and use two structs: Client and Connection.
I will go ahead and put those to good use and set a response to a mutable variable binding `resp`.
I am not totally sure what resp is at this point, so I am going to use the old fake-method trick to see what type it is. Just call any method that you know is not implemented for any type (e.g. this_is_totally_not_a_method() or something shorter like sldkfj()) on the variable and run the test. The compiler error will name the type.
$ cargo test
no method named `sldkfj` found for type `hyper::client::response::Response` in the current scope
Awesome. That tells us that it is a Response type. Continuing to follow along with the readme, we see that we will need to use the standard lib Read trait.
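If you would rather not break the build to learn a type, a different technique from the fake-method trick is to ask the standard library for the type's name at runtime. This is a small sketch using `std::any::type_name`; the wrapper `type_name_of` is a hypothetical helper of mine, and the exact text it returns is meant for diagnostics only.

```rust
use std::any::type_name;

/// Returns the compiler's name for the type of `_value`.
/// The wording of the name is not guaranteed to stay the same between
/// compiler versions, so use it for debugging, not for logic.
/// NOTE: `type_name_of` is a hypothetical helper, not a std function.
fn type_name_of<T>(_value: &T) -> &'static str {
    type_name::<T>()
}
```

Dropping something like `println!("{}", type_name_of(&resp));` into the function would then print the type of `resp` instead of failing the compile.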
Finally, let’s get our web on!
A few things to point out here. First of all, to call read_to_string we need to add line 7, which pulls in the Read trait from the standard library. The other thing to look at is at the top of the file: the compiler was yelling at me to add the slice_chars feature because it is apparently an unstable feature.
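To see why the `Read` import matters without touching the network, here is a minimal standard-library sketch. `read_body` is a name I made up for illustration; it accepts anything that implements `Read`, and a byte slice stands in for the HTTP response here.

```rust
use std::io::{self, Read};

/// Drains any `Read` source into an owned `String`.
/// Fails if the source errors or the bytes are not valid UTF-8.
/// NOTE: `read_body` is a hypothetical helper for this sketch.
fn read_body<R: Read>(mut source: R) -> io::Result<String> {
    let mut body = String::new();
    // `read_to_string` is a method of the `Read` trait, so it only
    // resolves when `std::io::Read` is in scope.
    source.read_to_string(&mut body)?;
    Ok(body)
}
```

Removing the `Read` import from this sketch makes the `read_to_string` call stop compiling, which is the same complaint the compiler raises in the real function.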
$ cargo test
running 1 test
test it_works ... ok
And there it is. We are requesting and returning HTML through an HTTP request in Rust! All that is left to do now is to open up the API to the Ruby module and we are good to go. Green.
First off, we need to do our basic string conversion. Ruby is passing in a C representation of a string and we need Rust strings (and references to Rust strings). Remember way back in ARRII when we abstracted out some Ruby-to-Rust and vice-versa functions and I said something like ‘…these may come in handy’? Well, I guess I was right, because we need them again.
Most of the new stuff added above was explained in the previous post so I won’t go into great detail about what is going on. Enable a few features to make the compiler happy, use a few structs and some of our conversion functions so we can work with Ruby strings, and voilà. You have yourself an embeddable get request function.
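For readers who skipped the previous post, the conversion step can be sketched with today's standard library roughly like this. The function names here are hypothetical and are not the helpers from the scrape repo; the null-pointer handling is also my own assumption about what a caller would want.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

/// Copies a NUL-terminated C string into an owned Rust `String`,
/// replacing invalid UTF-8 with U+FFFD. A null pointer yields "".
///
/// Safety: `raw` must be null or point to a valid NUL-terminated string.
unsafe fn c_to_rust_string(raw: *const c_char) -> String {
    if raw.is_null() {
        return String::new();
    }
    unsafe { CStr::from_ptr(raw) }.to_string_lossy().into_owned()
}

/// Hands a Rust string to C as a heap-allocated, NUL-terminated string.
/// Returns null if `s` contains an interior NUL byte.
fn rust_to_c_string(s: &str) -> *mut c_char {
    CString::new(s).map_or(ptr::null_mut(), CString::into_raw)
}

/// Reclaims a string previously returned by `rust_to_c_string`.
///
/// Safety: `raw` must be null or come from `rust_to_c_string`,
/// and must be freed only once.
unsafe fn free_c_string(raw: *mut c_char) {
    if !raw.is_null() {
        drop(unsafe { CString::from_raw(raw) });
    }
}
```

One design point worth noting: a string handed out with `into_raw` stays allocated until it is passed back to Rust, so a sketch like this needs some way, such as `free_c_string`, for the calling side to release it.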
Just like in the previous post, we are going to use FFI to open up a Ruby interface to the Rust embeddable functions.
Let’s try out our janky unit test.
So Web Right Now
$ ruby scrape.rb
Gone Dun It!
We did it. Congrats for getting through that with me. We took one big step towards our goal of creating our Ruby callable web scraper in Rust!
Progress makes me happy :) But we cannot get complacent if we are going to reach our goals! We must push on. The next one is going to be a big one. Returning strings is great, but we are going to need to return more advanced structures to be able to pull this whole thing off. ARRIV is going to be on Structs and Arrays. See you next time.
The Rust community, for the most part, is pretty nice to newbs, so don’t be afraid to ask a Stack Overflow question or get on the Rust IRC channel. Special thanks to Stack Overflow users Adrian, shepmaster, Chris Morgan and DK. Also Steve Klabnik for doing a great job on the docs.
And of course, don’t hesitate to hit me up in the comments or on Twitter @mfpiccolo.