Building a Machine Learning Pipeline
Artificial intelligence (AI) is a wide-ranging branch of computer science concerned with building smart machines capable of performing tasks that typically require human intelligence. These tools can collect data from edge to core to cloud, analyze the data in near real time, learn from previous results, and infer actions to predict future outcomes.
Today, organizations implementing AI models need to digest exponentially growing volumes of data, support increasingly complex models, and run those models in a shorter amount of time to make real-time predictions.
The diagram below depicts a simple AI pipeline.
- We need to perform many different operations to prepare a model.
- A pipeline stages the data transformation steps and helps make the process more standardized.
- The Pipeline class in sklearn simplifies chaining the transformation steps and the model.
- When working in a team, a Pipeline keeps the project in order by enforcing consistency in how the model is built.
Build a simple ML pipeline using sklearn.pipeline
1. The first step is to import the Pipeline class from the sklearn.pipeline module.
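Here is what the imports can look like; the scaler and classifier are the ones used later in this walkthrough:

```python
# Import the Pipeline class plus the transformer and estimator chained below
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
```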
2. Before building the pipeline, I split the data into a train and a test set so that I can validate the performance of the model.
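For example (the original dataset is not shown here, so the iris dataset stands in purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Iris is only a stand-in; substitute your own features X and target y
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```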
3. Now we have to define the pipeline, which takes the list of transformation steps as its parameter.
I have created a numeric transformer that applies a StandardScaler, followed by a machine learning algorithm that performs Logistic Regression, as shown below.
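A sketch of that definition could look like this (the step names are my own illustrative choices):

```python
# Each step is a (name, object) pair: "scaler" standardizes the numeric
# features, and "classifier" is the Logistic Regression estimator at the end
pipe = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression()),
])
```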
If you look above, we are providing a name for each step that is passed in with the function call.
Alternatively, we can make use of the make_pipeline() function, which will automatically name each step.
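For example, this builds an equivalent pipeline with automatically generated step names (the lowercased class names):

```python
from sklearn.pipeline import make_pipeline

# make_pipeline names each step after its class, in lowercase,
# e.g. "standardscaler" and "logisticregression"
pipe = make_pipeline(StandardScaler(), LogisticRegression())
```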
4. The next step is to call fit() on the pipeline object to train it on our data.
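Assuming the train/test split from step 2:

```python
# fit() runs each transformer's fit/transform on the training data in order,
# then fits the final estimator on the transformed features
pipe.fit(X_train, y_train)
```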
5. Once the model is built, you can make predictions by passing in the test data.
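A minimal sketch, again assuming the split from step 2:

```python
# predict() pushes the test data through the fitted transformers and then
# calls predict() on the final estimator
predictions = pipe.predict(X_test)
print(pipe.score(X_test, y_test))  # classifier accuracy on the held-out test set
```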
The Pipeline object requires every stage except the last to implement a transform() function; the last stage is an estimator.
Each transform step transforms the input data and passes its output to the next stage of the pipeline.
Let's digest the above steps in a simpler way :)
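Here is the whole flow in one place, a minimal sketch assuming the iris stand-in dataset from above:

```python
# End-to-end sketch of the steps above (iris dataset assumed for illustration)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipe = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression()),
])

pipe.fit(X_train, y_train)          # scale, then train the classifier
print(pipe.predict(X_test)[:5])     # predictions for the first few test rows
print(pipe.score(X_test, y_test))   # accuracy on the held-out test set
```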
Thank you!!!