Data Science in F#/.NET with VsCode
When thinking of data science or machine learning, Python immediately comes to mind. Few other production-ready programming languages can match its extensive set of libraries (pandas, NumPy, scikit-learn, etc.) paired with proven experimentation tools (Jupyter, Plotly Dash, etc.).
Other ecosystems are trying to catch up in terms of libraries but, when it comes to producing an analysis and insight in a short timeframe, almost none have the tooling that still makes Python the champion of productivity.
On the other hand, there is one field where Python lags behind other languages: performance. Even though most of the data science and machine learning libraries used from Python are written in native languages, the interpreted Python code that glues them together keeps it from matching the performance of pure native code.
In a world where milliseconds (or even microseconds in some fields) matter when delivering information, some projects have to go through the following steps:
- Data scientists write a proof of concept in Python/Jupyter
- When going to production:
* -either- software engineers translate the logic to another language
* -or- data scientists have to expose the logic as a micro-service (and therefore need knowledge of API authoring)
Thankfully, .NET now brings the best of both worlds:
- Safety: by default, the code runs in a managed, memory-safe runtime
- Productive languages: the .NET ecosystem supports dozens of languages (including Python!), and they can all talk with each other
- High performance: thanks to the recent changes in .NET Core, C# often matches or beats Java in benchmarks and comes close to the performance of native C++
- An extensive set of libraries including dataframes, bindings to NumPy and TensorFlow, and charting libraries
- A Jupyter-like interactive notebook for F# with support for charts and custom formatters
In this article I will show you how to install and use the interactive console notebook for F#. In future articles, I will write about technical implementation details, performance and libraries.
Installing the F# notebook for VsCode
In order to play with a notebook in Visual Studio Code:
- Install the ionide-fsharp extension
- Install the .NET 5 SDK: the extension uses F# 5 syntax and is therefore only compatible with .NET 5
- Install the F# notebook extension for Visual Studio Code itself
- Edit the VS Code settings.json as specified in the extension documentation
- Open the notebook panel with the command Ctrl+Alt+P > “F# Notebook+DataScience: Open Panel”
- Open an *.fsx file and start coding!
Tip: Alt+Enter will execute the current line
Simple examples
The extension works exactly like the interactive F# interpreter (FSI) but with an additional panel that displays formatted data.
When one of the Notebook.* helpers is called, a cell is added to the panel. The extension has multiple built-in formatters.
Primitives and markdown
// Ctrl+Alt+P : F# Notebook: Open Panel
Notebook.Text (1+1)
Notebook.Text "Hello world"

Notebook.Markdown """
# Hello, Markdown!
"""
Charts
open XPlot.Plotly
// Ctrl+Alt+P : F# Notebook: Open Panel
let chart =
    Chart.Line
        [ 1, 1
          2, 2 ]
    |> Chart.WithWidth 400
    |> Chart.WithHeight 300
    |> Chart.WithLayout(Layout(title = "my title"))

Notebook.Plotly chart
Maps
// Ctrl+Alt+P : F# Notebook: Open Panel
open XPlot.Plotly
open FSharp.Data

let marginWidth = 50.0
let margin = Margin(l = marginWidth, r = marginWidth, t = marginWidth, b = marginWidth)

type AlcoholConsumption = CsvProvider<"https://raw.githubusercontent.com/plotly/datasets/master/2010_alcohol_consumption_by_country.csv">

let consumption = AlcoholConsumption.Load("https://raw.githubusercontent.com/plotly/datasets/master/2010_alcohol_consumption_by_country.csv")
let locations = consumption.Rows |> Seq.map (fun r -> r.Location)
let z = consumption.Rows |> Seq.map (fun r -> r.Alcohol)

let map =
    Chart.Plot([ Choropleth(locations = locations, locationmode = "country names", z = z, autocolorscale = true) ])
    |> Chart.WithLayout(Layout(title = "Alcohol consumption", width = 700.0, margin = margin, geo = Geo(projection = Projection(``type`` = "mercator"))))

// display chart
Notebook.Plotly map
Dataframes
// Ctrl+Alt+P : F# Notebook: Open Panel
#r "nuget: Microsoft.Data.Analysis"
open Microsoft.Data.Analysis

// Reuses the `consumption` value loaded in the Maps example
let locations, alcohol =
    consumption.Rows
    |> Seq.map (fun row -> row.Location, row.Alcohol)
    |> List.ofSeq
    |> List.unzip

let df = new DataFrame(
    new StringDataFrameColumn("location", locations),
    new PrimitiveDataFrameColumn<decimal>("consumption", alcohol)
)

Notebook.DataFrame df
Latex expressions
Notebook.Markdown @"This is cool $$x = {-b \pm \sqrt{b^2-4ac} \over 2a}.$$ isn't it"
Custom printers
You can also add your own printers that will display the data using a customized format.
open Notebook
fsi.AddPrinter(fun (data : YourType) ->
    ... // Format to string
    |> HTML // or SVG or Markdown or Text
    |> printerNotebook
)

let x = new YourType() // this will automatically print x in the notebook panel
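To try the registration mechanism on its own, here is a minimal sketch that uses only the standard F# Interactive API: fsi.AddPrinter, which takes a function from a value to a string. The Temperature type and the formatTemperature function are hypothetical names introduced for illustration, and the result is printed in the FSI console rather than in the notebook panel, since the extension's own helpers are not used here.

```fsharp
// Hypothetical record type, used only for this illustration.
type Temperature = { City: string; Celsius: float }

// Plain formatting function ('T -> string), kept separate from the
// registration so it can be called and checked on its own.
let formatTemperature (t: Temperature) =
    sprintf "%s: %.1f C" t.City t.Celsius

// Register the formatter with F# Interactive.
// Note: the `fsi` object only exists when the script runs inside FSI.
fsi.AddPrinter(formatTemperature)

// Values of type Temperature evaluated in FSI are now displayed
// through formatTemperature.
let paris = { City = "Paris"; Celsius = 21.5 }
```

Keeping the formatter as a named function, instead of an inline lambda, makes it easy to reuse the same formatting logic with the notebook helpers shown above.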
Conclusion
This extension is not the only one that offers an interactive environment for F#. Other projects offer similar functionality, notably the Jupyter kernel for C#/F#.
But none of them offers the same ease of installation and level of integration with Visual Studio Code (code completion, code lenses, integration with other F# extensions for formatting or code quality, etc.).
That’s all for this article. In the next article, I will do a quick tour of machine learning with F#. If you have any questions or just want to chat with me, feel free to leave a comment below or contact me on social media.
Note: at the time of writing, F# 5 is still in preview and has a very nasty bug that freezes autocompletion.
Note (Oct 2020): it seems that Microsoft has finally decided to improve the F# notebook experience! Use Microsoft’s .NET Interactive Preview 3 instead; it’s better.