Is PMML Useful for Model Deployment?
At LatentView, we develop a variety of predictive analytics solutions for our clients. Our deliverables typically include a PowerPoint presentation that highlights the key findings and their consequences, Excel sheets to validate business hypotheses, and model equations in some form for ongoing scoring. The last of these is the topic of this post.
There are a variety of approaches to delivering models. For simple techniques such as logistic regression, we typically deliver the model in Excel or as a VBA-based decision simulator. For more complex techniques, we deliver a set of SAS programs that the client can use to score new data on an ongoing basis. These programs contain the logic for data preparation, model scoring and validation.
This is where PMML (Predictive Model Markup Language) plays a key role. LatentView is standardizing on a PMML-based approach for delivering predictive models. PMML promotes model portability across different platforms, easier model maintenance, and better lifecycle management. Scoring and validation are faster and easier with PMML models. Because the model is described in a standard XML format, it is also easy to build visualization tools on top of it.
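To make the portability point concrete, here is a minimal sketch of what a PMML file for a logistic regression might look like, and how a consumer could score a record from it. The field names (age, income, response) and the coefficients are invented for illustration, and the small scorer only understands an intercept plus numeric predictors; it is not a full PMML engine and the document has not been validated against the DMG schema.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML document (illustrative names and numbers).
PMML_DOC = """
<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <Header description="Illustrative logistic regression"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="response" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression" normalizationMethod="logit">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="response" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="-1.5">
      <NumericPredictor name="age" coefficient="0.02"/>
      <NumericPredictor name="income" coefficient="0.00001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def load_model(pmml_text):
    """Read the intercept, coefficients and normalization from the PMML text."""
    root = ET.fromstring(pmml_text.strip())
    model = root.find("pmml:RegressionModel", NS)
    table = model.find("pmml:RegressionTable", NS)
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return float(table.get("intercept")), coefficients, model.get("normalizationMethod", "none")


def score(pmml_text, record):
    """Score one record: linear predictor, then the logit link if requested."""
    intercept, coefficients, normalization = load_model(pmml_text)
    linear = intercept + sum(c * record[name] for name, c in coefficients.items())
    if normalization == "logit":
        return 1.0 / (1.0 + math.exp(-linear))
    return linear


if __name__ == "__main__":
    print(score(PMML_DOC, {"age": 40, "income": 50000}))
```

The same file could in principle be handed to any PMML-aware scoring engine, which is the portability argument: the model travels as data rather than as SAS or VBA code.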
However, PMML does not yet appear to be widely used.
Some of the drawbacks of PMML include a potential loss of accuracy, a perceived lack of support for a variety of models, a lack of support for complex transformations, and a shortage of third-party scoring engines that can read PMML and score models. However, vendors such as Zementis have helped overcome some of these drawbacks, and I believe there are more such offerings in the pipeline. There is also a need for open source scoring engines that use PMML.
Today we use R to generate the PMML files. However, there is a need for an easier way to create PMML files from SAS, SPSS or other standard packages without having to buy their most expensive licenses.
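For readers who have not seen what such an export produces, the sketch below writes a PMML regression model from a set of fitted coefficients. It is in Python rather than R purely for illustration and is not the export path described above; the function name coefficients_to_pmml, the target name and the sample coefficients are all hypothetical, and the output has not been validated against the DMG schema.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_4"


def coefficients_to_pmml(intercept, coefficients, target="response"):
    """Serialize a logistic regression (intercept + numeric predictors) as PMML text."""
    ET.register_namespace("", PMML_NS)  # emit PMML as the default namespace

    def tag(name):
        return "{%s}%s" % (PMML_NS, name)

    root = ET.Element(tag("PMML"), {"version": "4.4"})
    ET.SubElement(root, tag("Header"), {"description": "Illustrative export"})

    # Data dictionary: every predictor plus the target, all treated as continuous.
    fields = list(coefficients) + [target]
    dictionary = ET.SubElement(
        root, tag("DataDictionary"), {"numberOfFields": str(len(fields))}
    )
    for name in fields:
        ET.SubElement(
            dictionary,
            tag("DataField"),
            {"name": name, "optype": "continuous", "dataType": "double"},
        )

    model = ET.SubElement(
        root,
        tag("RegressionModel"),
        {"functionName": "regression", "normalizationMethod": "logit"},
    )
    schema = ET.SubElement(model, tag("MiningSchema"))
    for name in coefficients:
        ET.SubElement(schema, tag("MiningField"), {"name": name})
    ET.SubElement(schema, tag("MiningField"), {"name": target, "usageType": "target"})

    # The regression table carries the actual model equation.
    table = ET.SubElement(
        model, tag("RegressionTable"), {"intercept": repr(float(intercept))}
    )
    for name, value in coefficients.items():
        ET.SubElement(
            table,
            tag("NumericPredictor"),
            {"name": name, "coefficient": repr(float(value))},
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(coefficients_to_pmml(-1.5, {"age": 0.02, "income": 0.00001}))
```

In practice a statistical package's own exporter would also record transformations, missing-value handling and model statistics, which is where most of the real effort lies.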
What do you think of PMML? Do you deploy models in PMML? Does it meet your model deployment needs? Why or why not? Please post your comments here.
February 12th, 2009 - 17:26
As pointed out in your blog, PMML is not yet widely used by the predictive analytics community at large, but it is certainly gaining ground quickly. In a recent poll on the KDNuggets website, more than 30% of the respondents said that they are using PMML. This suggests that many scientists are using different tools to build, visualize, and deploy their models. PMML allows models to be moved easily from one application to another by overcoming compatibility barriers. Today, many leading statistical packages export PMML, including open-source environments such as R and KNIME.
Zementis is now offering the first PMML-based scoring engine as a service. This means that anyone, anywhere can build their models (say, in R or KNIME) and deploy them using PMML in a matter of minutes (not days or months). Imagine being able to move a model from the scientist’s desk to a production environment just by moving PMML files between applications. That’s what Zementis’ offer is about. On top of that, there is no installation. Since Zementis offers its scoring engine on the Amazon Elastic Compute Cloud, the engine is already installed and ready to be used, and you pay only for what you use: less than $1/hour. We believe this paradigm will revolutionize the world of predictive analytics, since it changes the whole perception of the usability and agility of predictive models.
I have recently created a PMML discussion forum on the AnalyticBridge community (http://www.analyticbridge.com/group/pmml). Feel free to join if you are interested in discussing PMML. Many of the members are part of the DMG (Data Mining Group), which is responsible for shaping the standard itself.
So, yes, I couldn’t agree more: PMML is the way to go (see http://www.analyticbridge.com/group/pmml).