Use VisualML Studio To Predict Taxi Fares In New York Part 2
This article is the second in a series on predicting taxi fares in New York with VisualML Studio, a visual workflow designer for Microsoft’s ML.NET library that simplifies ML training and processing.
Please refer to the first article for details about the taxi fare prediction example and for VisualML Studio installation instructions.
INSTALLATION
After you install VisualML Studio (make sure you have the latest release), navigate to the Templates repository page. Download and extract the TaxiFarePredictionCS.zip template from the latest release. This is a prepared visual script, so you don’t have to build it from scratch.
Navigate to the ZenEngine\project\HelloWorld folder under the root of VisualML Studio. Copy assets and DB from the extracted TaxiFarePredictionCS.zip into this folder (overwrite the files from the previous example).
Open the Template and set the paths according to your environment (you already did this in the first article, but we have now downloaded a new Template):
- Click on the ML Text Loader element. Select the path to the taxi-fare-train.csv file on your computer (\visual_ml_studio_win\ZenEngine\project\HelloWorld\assets\taxi-fare-train.csv)
- Click on Text Loader Test. Select the path to the taxi-fare-test.csv file in the same location.
- Click on ML Model Evaluation. Enter the location where the trained model will be saved.
(Click on ML Model Evaluation again so that it is colored red.)
Now click on the Save Template button.
LIMITATIONS FROM THE FIRST ARTICLE
In the first article, basic taxi fare predictions were demonstrated. But there were a few limitations compared with the original ML.NET GitHub example:
- A Visual Element that filters noise from data doesn’t exist yet at the time of writing this article
- We couldn’t test a single prediction, because this part requires two custom C# objects, TaxiTrip and TaxiTripFarePrediction:
var predEngine = mlContext.Model.CreatePredictionEngine<TaxiTrip, TaxiTripFarePrediction>(trainedModel);
Those two objects will never be part of VisualML Studio in the form of Visual Elements, because they are too specific to the taxi fare prediction case.
But how cool would it be if we could somehow include them, and inject this snippet of code to test a single prediction? We could then use all the visual goodies that are already implemented in the platform and just write those two tiny pieces of code. And we wouldn’t be locked into the Visual Elements that already exist.
Here I’m going to show you how to overcome these kinds of limitations.
VisualML Studio allows you to write custom C# scripts right inside a Visual Element. This is VisualML Studio’s strongest feature: it gives you great flexibility in how your custom code interacts with existing Visual Elements.
In your custom scripts, you can:
- get results from Visual Elements
- set Visual Elements results
- get property values from Visual Elements
- set properties of Visual Elements
- execute Visual Elements without being physically connected to them with virtual wires. That means that script elements can decide at runtime which path of the workflow to take, based on some conditions
… and much more. All that without even leaving VisualML Studio. So fasten your seatbelts and read on :)
IMPROVEMENTS
If you read the first article, this template should look familiar to you:
But you will notice two additional Elements, TextLoader Filtered and SinglePredictionTest.
Those two Elements contain C# code and show how you can insert custom scripts into an existing workflow. Let’s start with SinglePredictionTest.
SCRIPT SECTIONS
If you click on this Element, you’ll see an Add code button. Clicking this button opens a dialog where you can enter custom code:
Let’s examine individual sections:
<header>
This is the place where you include references and using directives. References are paths to your own or 3rd-party DLLs, relative to your project root. If you search the Implementations folder, you’ll find all the Microsoft.ML DLLs and much more.
After the references come the using directives. They are a standard C# feature, so use them as you do in ordinary C# applications.
<code run="true">
This section is the entry point of the CsScript Visual Element’s code. It’s like the Main method in a .NET console application. When the workflow reaches a CsScript element, this code is executed.
<code type="function">
This section contains functions. The entry-point code would get messy if it weren’t broken down into smaller functions.
<code>
This is the place reserved for custom objects.
SYSTEM FUNCTIONS
System functions are part of VisualML Studio. They take care of all communication between CsScript and the other Visual Elements that exist in the workflow. They are also the bridging layer between managed elements and the unmanaged C/C++ orchestration engine. But fear not, they are very simple to use.
Now that the individual sections are explained, I’ll walk through the code so you can see how those system functions interact with existing Visual Elements in the workflow.
Now scroll up to the beginning of the code. We’ll start by setting some ids:
string MLContextElementId = "fc9ca34f-655a-406a-8e57-3839defc3927";
string modelEvaluationElementId = "a554b611-b859-4ddb-a125-b482261837aa";
You will need to interact with two Visual Elements, so you’ll need their ids:
- The first one is the ML Context Element, which contains all operations needed for testing the prediction
- The second one is the Model Evaluation Element, which contains the property that you set at the beginning of this article (the path of the saved model on your file system)
You don’t see element ids in the visual script (just element names), but you can easily get one by selecting the element from the combo box and clicking the Get Id button. This copies the element id to the clipboard.
var mlContext = ((MLContext)get_result_raw(MLContextElementId));
This line shows the usage of the get_result_raw system function.
Almost every element in a Template stores some kind of result. You can get those results with this function by passing the element id as the argument. Elements can store results such as data schemas, data from CSV files, etc.
The ML Context element is a visual one that creates and stores an object of type MLContext, which you’ll need for the prediction operation later.
var mlModelEvaluation = Path.Combine(get_element_property(modelEvaluationElementId,"MODEL_SAVE_PATH"),"Model.zip");
This line shows the usage of the get_element_property function, which gets the value of an element property. It accepts the element id as the first parameter and the property name as the second.
You can get the list of a Visual Element’s properties by clicking the Get Properties button.
Those are the properties that you can find on the left properties toolbar when you click on the element. We need MODEL_SAVE_PATH: this is the property that you set when you defined the path where the model is going to be saved. We need to append “Model.zip” to the path, because this is the default model file name that the element uses automatically.
TestSinglePrediction(mlContext,mlModelEvaluation);
This line is just a call to a function that is defined in the next section.
The rest of the code is just standard .NET Core code taken from the official Taxi Fare Prediction GitHub example.
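For readers who don’t have the sample at hand, the single-prediction part can be sketched roughly as follows. This is a minimal sketch modeled on the ML.NET taxi fare sample, not necessarily the exact code shipped in the Template: the class name SinglePredictionSketch, the CreateSampleTrip helper and the sample trip values are illustrative assumptions, and the Main method only stands in for the script’s entry section, where mlContext and the model path come from get_result_raw and get_element_property instead.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

// Input row: one taxi trip, with fields mapped to CSV columns by position.
public class TaxiTrip
{
    [LoadColumn(0)] public string VendorId;
    [LoadColumn(1)] public string RateCode;
    [LoadColumn(2)] public float PassengerCount;
    [LoadColumn(3)] public float TripTime;
    [LoadColumn(4)] public float TripDistance;
    [LoadColumn(5)] public string PaymentType;
    [LoadColumn(6)] public float FareAmount;
}

// Output row: regression models write their prediction to the "Score" column.
public class TaxiTripFarePrediction
{
    [ColumnName("Score")] public float FareAmount;
}

public static class SinglePredictionSketch
{
    // Illustrative trip; FareAmount is left at 0 because it is the value to predict.
    public static TaxiTrip CreateSampleTrip()
    {
        return new TaxiTrip
        {
            VendorId = "VTS",
            RateCode = "1",
            PassengerCount = 1,
            TripTime = 1140,
            TripDistance = 3.75f,
            PaymentType = "CRD",
            FareAmount = 0
        };
    }

    public static void TestSinglePrediction(MLContext mlContext, string modelPath)
    {
        // Load the trained model that was saved as Model.zip.
        ITransformer trainedModel = mlContext.Model.Load(modelPath, out DataViewSchema inputSchema);

        // The prediction engine maps one TaxiTrip to one TaxiTripFarePrediction.
        var predEngine = mlContext.Model.CreatePredictionEngine<TaxiTrip, TaxiTripFarePrediction>(trainedModel);
        TaxiTripFarePrediction prediction = predEngine.Predict(CreateSampleTrip());

        Console.WriteLine($"Predicted fare: {prediction.FareAmount:0.####}");
    }

    public static void Main(string[] args)
    {
        if (args.Length < 1)
        {
            Console.WriteLine("Usage: pass the path to Model.zip as the first argument.");
            return;
        }
        TestSinglePrediction(new MLContext(seed: 0), args[0]);
    }
}
```

In a script element, the two classes would go into the section reserved for custom objects and TestSinglePrediction into the function section.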
Now let’s move to the second script element, TextLoader Filtered:
This code removes extreme values (“outliers”): FareAmount values higher than $150 or lower than $1, which are likely to be erroneous data. A system function that we didn’t use in the first script element is
set_result(currentElement,filteredView)
This function sets the result of the element, so that it can be used by other Visual Elements in the workflow.
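The filtering step itself can be sketched with ML.NET’s FilterRowsByColumn, which the official sample also uses. This is a stand-alone sketch under assumptions: the FareRow class, the FareFilterSketch class and the in-memory test rows are made up for illustration, whereas the real script element works on the data loaded from taxi-fare-train.csv and hands the filtered view to set_result.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

// Hypothetical single-column row type, used only for this illustration.
public class FareRow
{
    public float FareAmount { get; set; }
}

public static class FareFilterSketch
{
    // Keeps only rows whose FareAmount falls between the two bounds.
    // ML.NET documents the kept interval as [lowerBound, upperBound).
    public static IDataView FilterFares(MLContext mlContext, IDataView rawView)
    {
        return mlContext.Data.FilterRowsByColumn(
            rawView, nameof(FareRow.FareAmount), lowerBound: 1, upperBound: 150);
    }

    // Helper for the demo: loads fares into a data view, filters, reads them back.
    public static float[] KeptFares(float[] fares)
    {
        var mlContext = new MLContext(seed: 0);
        IDataView rawView = mlContext.Data.LoadFromEnumerable(
            fares.Select(f => new FareRow { FareAmount = f }));

        IDataView filteredView = FilterFares(mlContext, rawView);
        // In the script element this is where the result would be published:
        // set_result(currentElement, filteredView);

        return mlContext.Data
            .CreateEnumerable<FareRow>(filteredView, reuseRowObject: false)
            .Select(r => r.FareAmount)
            .ToArray();
    }

    public static void Main()
    {
        // 0.5 and 200 are outside the bounds and should be dropped.
        float[] kept = KeptFares(new[] { 0.5f, 2f, 15.5f, 149f, 200f });
        Console.WriteLine("Rows kept: " + kept.Length);
    }
}
```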
Remember, ML Training Pipeline now gets filtered data from this script element instead of raw data from ML Text Loader (as it did in the first taxi fare prediction article).
If you click on the ML Training Pipeline Element, you’ll see that the TextLoader Filtered Element is selected as Training data instead of ML Text Loader.
As you might have noticed, there is a Clear cache checkbox in the header of the main window.
C# code from script elements is compiled, and the resulting DLLs are stored in a cache folder. Every time the engine is started, it looks for the DLL inside this cache. If the DLL is found, the engine references it. If not, it compiles the code and saves the result into the cache. This optimizes engine boot time, which is perfect for production environments (when we don’t change the code anymore).
But while a workflow is in the development phase, we need to see our code changes immediately. Clear cache clears the DLL cache, so the code is compiled every time and our changes are reflected immediately.
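The caching behavior described above can be illustrated with a small sketch. It is not the engine’s actual implementation: the ScriptCacheSketch class, the choice to name the cached file after the element id, and the compile delegate (which fakes a compiler here) are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class ScriptCacheSketch
{
    // Returns the path of the cached artifact for a script element,
    // compiling only when nothing is cached or when clearCache is set.
    public static string GetOrCompile(string cacheDir, string elementId, string source,
                                      bool clearCache, Func<string, byte[]> compile)
    {
        Directory.CreateDirectory(cacheDir);
        // Assumption: one cached file per element, named after its id.
        string dllPath = Path.Combine(cacheDir, elementId + ".dll");

        if (clearCache && File.Exists(dllPath))
        {
            File.Delete(dllPath); // development mode: force a rebuild
        }
        if (!File.Exists(dllPath))
        {
            File.WriteAllBytes(dllPath, compile(source)); // cache miss: compile and store
        }
        return dllPath; // cache hit: reuse whatever is stored
    }

    public static void Main()
    {
        string cacheDir = Path.Combine(Path.GetTempPath(), "script-cache-" + Guid.NewGuid().ToString("N"));
        int compilations = 0;
        // Stand-in for a real compiler: just counts calls and returns the source bytes.
        Func<string, byte[]> fakeCompile = src => { compilations++; return Encoding.UTF8.GetBytes(src); };

        GetOrCompile(cacheDir, "element-1", "version A", false, fakeCompile); // miss: compiles
        GetOrCompile(cacheDir, "element-1", "version B", false, fakeCompile); // hit: edit is NOT picked up
        GetOrCompile(cacheDir, "element-1", "version B", true, fakeCompile);  // Clear cache: recompiles

        Console.WriteLine("Compilations: " + compilations);
        Directory.Delete(cacheDir, true);
    }
}
```

The second call shows why the checkbox matters during development: with a warm cache, the edited source is ignored until the cache is cleared.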
Now you can run the Template to see prediction results:
As you can see, the predicted fare is very close to the actual one, thanks to the refinement with the script element demonstrated in this article.
CONCLUSION
In this article I demonstrated some of the advanced VisualML Studio topics. You can see that mixing existing Visual Elements with your own code opens up many possibilities.
The orchestration engine that executes those “visual microservices” is written in unmanaged C, so it’s lightning fast. But apart from that there is another great advantage: many different programming languages can be seamlessly included.
C/C++ and C# are currently supported, but it doesn’t have to stop there. There is room for Python, R, Node.js…
Imagine that you have one tool where you could use 3rd-party or your own existing ML libraries written in different languages and combine them easily with Visual Elements.
Sounds fun, doesn’t it? :)
Comment or send feedback about your needs. This would help us shape the feature request plan that will be included in the roadmap.
Please contact me if you find this project interesting and want to contribute with bug reporting or development.