GSoC 2017 — Apache VXQuery RESTful API

Erandi Ganepola
Aug 24, 2017

This post is about my GSoC 2017 project: implementing the Apache VXQuery RESTful API (VXQUERY-180). It introduces the project and explains how I arrived at the design, how I implemented it, the problems I faced, and how I finally met the objectives. Possible future improvements are described at the end of this post. I hope you enjoy it!

Apache VXQUERY-180 (RESTful API)

Contributor — Erandi Ganepola

  • BSc undergraduate in MIT (IT Special), University of Kelaniya, Sri Lanka
  • Programmer, Open Source Contributor and a Basketball player :-D

Mentors

Ian Maxon

  • Ian is a Development Engineer at UC Irvine, California
  • He is a PMC member and a Committer for Apache AsterixDB, which is the sister project of Apache VXQuery.

Preston Carman

  • Preston is a Graduate Research Assistant at the University of California, Riverside. He is also a Lead PHP Developer at Gofobo.com.
  • He is a Developer and a Committer for the Apache VXQuery project.

Overview of Apache VXQuery

The wide use of XML for document management and data exchange has created the need to query large repositories of XML data. Apache VXQuery can efficiently query such large data collections and take advantage of parallelism. More specifically, Apache VXQuery can execute XQueries on large amounts of XML data and on large collections of relatively small XML documents. The system builds upon two other open-source frameworks: Hyracks, a parallel execution engine, and Algebricks, a language-agnostic compiler toolbox. Apache VXQuery extends these two frameworks and provides an implementation of the XQuery specifics (data model, data-model dependent functions and optimizations, and a parser). The queries are executed on a Hyracks cluster (on a local single-node cluster if no actual cluster is available or configured). More information about VXQuery and how it works can be found in this blog post.

Introduction to My Project

Until now, VXQuery only had a CLI tool for executing XQueries, and using it requires some familiarity with the terminal. To lower this barrier, the VXQuery community proposed a REST API (its specification had already been defined as a Swagger configuration) and offered its implementation as a GSoC 2017 project. That is how I got the chance to contribute to this project. With a REST API, more users will be able to communicate with VXQuery, and it will become even more user friendly once a web interface exposing the REST API's features is introduced.

Both Apache VXQuery and AsterixDB run on top of Hyracks: AQL and XQuery queries are first compiled with Algebricks and then submitted to Hyracks. The stack is sketched in the image below.

The component diagrams below show how the components were organized before and after the introduction of the REST API.

Old version of VXQuery implementation
Current version of VXQuery implementation

Objectives

The main objectives of this project were to implement a RESTful API capable of compiling and executing queries submitted through HTTP requests, and to reimplement the CLI module on top of that REST API.

Initially, the plan also included building a simple web interface on top of the REST API. Later, this was replaced by the task of migrating the VXQuery XTest module to use the REST API.

Deliverables

The following were the deliverables to be completed by the end of the project period (see my project proposal); some of the deliverables listed in the proposal were changed according to requirements. I'm happy to say that I was able to complete all of them as expected.

Implementation

The REST API was implemented according to the definition written as a Swagger configuration. To implement the REST server, I used the “hyracks-http” package. A new module named “vxquery-rest” was added to VXQuery to hold the REST API sources.

When the CCDriver class runs, the VXQuery server and the Cluster Controller are started. At this point, VXQueryApplication starts as the Cluster Controller application, and it is responsible for starting the REST server.
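The start-up chain described above can be pictured with the minimal Java sketch below. Only the names CCDriver and VXQueryApplication come from this post; the RestServer and ClusterController types, every method name, and the port number are hypothetical stand-ins, so this is not the actual VXQuery or Hyracks code. It only illustrates the ordering: the driver starts the Cluster Controller, whose application in turn creates and starts the REST server.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the start-up order. CCDriver and VXQueryApplication
// are names from the post; all types and methods here are illustrative
// stand-ins, NOT the real VXQuery or Hyracks APIs.
class StartupSketch {

    /** Records the order in which components start. */
    static final List<String> events = new ArrayList<>();

    /** Stand-in for the hyracks-http based REST server. */
    static class RestServer {
        private final int port;
        RestServer(int port) { this.port = port; }
        void start() { events.add("REST server listening on port " + port); }
    }

    /** Stand-in for VXQueryApplication, the Cluster Controller application. */
    static class VXQueryApplication {
        void start(int restPort) {
            // The application, not the driver, owns and starts the REST server.
            new RestServer(restPort).start();
            events.add("VXQueryApplication started");
        }
    }

    /** Stand-in for the Cluster Controller that runs the application. */
    static class ClusterController {
        private final VXQueryApplication app;
        ClusterController(VXQueryApplication app) { this.app = app; }
        void start(int restPort) {
            events.add("Cluster Controller started");
            app.start(restPort);
        }
    }

    /** Stand-in for what CCDriver does when it runs. */
    static void runDriver(int restPort) {
        events.add("VXQuery Server starting");
        new ClusterController(new VXQueryApplication()).start(restPort);
    }

    public static void main(String[] args) {
        runDriver(8080); // hypothetical port
        events.forEach(System.out::println);
    }
}
```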

As defined in the REST API specification, there are two endpoints: /query and /query/result/{resultId}. They are handled by two separate servlets implemented on top of hyracks-http. The VXQuery REST server lets users submit queries and receive results either synchronously or asynchronously through the exposed REST API. The CLI has been reimplemented to use the REST API, and it runs in synchronous mode. Through the CLI, users can call either a remote VXQuery REST API or a locally started REST server.
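To make the two endpoints and the synchronous/asynchronous distinction concrete, here is a self-contained Java sketch. It is not built on hyracks-http and does not use the real servlet classes: the query engine is a plain function, results live in an in-memory map, and the `handle` method and response strings are hypothetical. In synchronous mode the result comes back in the response itself; in asynchronous mode the client receives a result id and later fetches /query/result/{resultId}.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the two-endpoint design; NOT the hyracks-http servlets.
class QueryEndpointsSketch {
    private static final Pattern RESULT_PATH = Pattern.compile("^/query/result/(\\d{1,18})$");

    private final Function<String, String> engine; // stands in for compile + execute on Hyracks
    private final Map<Long, String> results = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    QueryEndpointsSketch(Function<String, String> engine) { this.engine = engine; }

    /** Dispatches a request by path, mimicking the two separate servlets. */
    String handle(String path, String statement, boolean async) {
        if (path.equals("/query")) {
            return submit(statement, async);
        }
        Matcher m = RESULT_PATH.matcher(path);
        if (m.matches()) {
            return fetchResult(Long.parseLong(m.group(1)));
        }
        return "404 Not Found";
    }

    private String submit(String statement, boolean async) {
        // For brevity the query runs before returning, even in async mode.
        String result = engine.apply(statement);
        if (!async) {
            return result; // synchronous: result is in the response
        }
        long id = nextId.getAndIncrement();
        results.put(id, result);
        return "resultId=" + id; // asynchronous: fetch /query/result/{id} later
    }

    private String fetchResult(long id) {
        String r = results.get(id);
        return r == null ? "404 Not Found" : r;
    }

    public static void main(String[] args) {
        QueryEndpointsSketch server = new QueryEndpointsSketch(q -> "evaluated: " + q);
        System.out.println(server.handle("/query", "1 + 1", false));
        String ticket = server.handle("/query", "2 + 2", true);
        System.out.println(ticket);
        System.out.println(server.handle("/query/result/1", null, false));
    }
}
```

A real server would execute asynchronous queries in the background, so a request to the result endpoint could arrive before the result is ready; the sketch skips that to stay short.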

The VXQuery XTest module runs in local mode: it creates a local Hyracks cluster and uses the locally started REST API to execute the test queries. The XTest framework has 222 test cases that verify that VXQuery functions correctly.
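Both the CLI and the XTest module act as REST clients: the CLI can target either a remote server or a locally started one, while XTest uses a local one. The Java sketch below shows how such a client might build request URIs for the two endpoints. The query parameter names `statement` and `mode`, the port numbers, and the host name are hypothetical placeholders, not taken from the VXQuery Swagger definition; consult that definition for the actual parameters.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side URI builder. The parameter names "statement" and
// "mode" are placeholders, not the real VXQuery REST API parameters.
class RestClientUrls {
    private final String baseUrl;

    private RestClientUrls(String baseUrl) { this.baseUrl = baseUrl; }

    /** Targets a VXQuery REST server running elsewhere. */
    static RestClientUrls remote(String host, int port) {
        return new RestClientUrls("http://" + host + ":" + port);
    }

    /** Targets a REST server started on this machine (as XTest does). */
    static RestClientUrls local(int port) {
        return new RestClientUrls("http://localhost:" + port);
    }

    /** URI for submitting a query to the /query endpoint. */
    URI queryUri(String xquery, boolean async) {
        String query = "statement=" + URLEncoder.encode(xquery, StandardCharsets.UTF_8)
                + "&mode=" + (async ? "async" : "sync");
        return URI.create(baseUrl + "/query?" + query);
    }

    /** URI for fetching an asynchronous result from /query/result/{resultId}. */
    URI resultUri(long resultId) {
        return URI.create(baseUrl + "/query/result/" + resultId);
    }

    public static void main(String[] args) {
        RestClientUrls client = RestClientUrls.local(8080);
        System.out.println(client.queryUri("count(collection(\"/books\"))", false));
        System.out.println(client.resultUri(1));
    }
}
```

The resulting URI could then be sent with any HTTP client, for example java.net.http.HttpClient on Java 11 or later.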

Merged Pull Request

Challenges faced

  • First of all, I needed a good understanding of the Hyracks, VXQuery, Algebricks and AsterixDB projects before I could implement anything. My mentors and the VXQuery community were very supportive in helping me overcome this.
  • Choosing a library for the REST server implementation was a challenge. I first suggested Jersey. However, Till (from the VXQuery community) pointed out that using Jersey would cause a license issue. After discussing this with my mentors, and based on their suggestion, the REST API was implemented using hyracks-http.
  • As requirements evolved, some parts had to be redesigned and conflicts with earlier implementations had to be fixed.
  • While migrating the XTest module, issues with missing result files and indexing caused many tests to fail. These were solved with the help of my mentors.

Further Enhancements

  • A simple web interface on top of the REST API will be implemented in the near future. It will allow non-technical users to submit queries to the VXQuery REST server through a browser, helping VXQuery reach many new users.

Pillars of Success — Ian and Preston!

This was the first time I participated in GSoC. Even though I had some previous experience working with open source communities, I had never had this much exposure.

Ian was a great mentor who never pushed me. Instead, he let me follow the timeline from my proposal and corrected me wherever necessary. His support and responsiveness made it easy to achieve milestones on time. He met with me every week to discuss my progress and the problems I was facing, which helped me a lot to keep things moving. He was a flexible and understanding mentor, and I loved working with him.

Preston was a good teacher and was always willing to share his knowledge. He was very helpful and provided quick reviews of my pull requests. His thorough understanding of the project made it easy to solve issues that came up during implementation. He was a nice and interesting person to get to know.

Finally, I would like to thank the VXQuery team and the ASF as a whole for every piece of advice and support given through the mailing list. Without these people, my mentors and the VXQuery team, I would not have been able to complete this project, and I hope I was able to give something back. I plan to keep contributing to VXQuery in the near future, and implementing a simple web interface is one of the major contributions I have already planned.

Thank you Google!

Making a big open source contribution was something I had always dreamed of. Without Google, I would not have had the opportunity to meet these amazing people and gain this experience by contributing to Apache, the world's largest open source software foundation. Thank you Google for giving me this opportunity! Keep it up!


Erandi Ganepola

Written by

Software Engineer@WSO2 | Open-source Contributor | Basketball Enthusiast
