DataScience has released a patent-pending query optimization, translation and federation framework.
The new Grunion solution, built atop Apache Calcite and integrated into Apache Spark, is designed to help data science and engineering teams eliminate the need to manually translate code from one language to another.
Grunion reduces the need for slow, expensive ETL processes by providing a unified query language and APIs that push down complex query operators, joins, functions, and aggregations into SQL and NoSQL databases. Grunion's most compelling feature, however, is its ability to integrate with Spark SQL's Catalyst optimizer, extending that optimizer's capabilities.
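To make the idea of pushdown concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is not Grunion's API, which this announcement does not describe. The `orders` table, its data, and both function names are hypothetical and exist only for illustration. The sketch contrasts pulling every row out of a database and computing in the client with pushing the filter and aggregation into the database itself.

```python
import sqlite3

# Hypothetical in-memory table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("west", 10.0), ("east", 5.0), ("west", 7.5), ("east", 2.5)],
)


def total_without_pushdown(conn, region):
    # Fetch every row, then filter and sum on the client side.
    rows = conn.execute("SELECT region, amount FROM orders").fetchall()
    return sum(amount for r, amount in rows if r == region)


def total_with_pushdown(conn, region):
    # Push the filter and the aggregation into the database,
    # so only a single row travels back to the client.
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0.0) FROM orders WHERE region = ?",
        (region,),
    ).fetchone()
    return row[0]
```

Both functions return the same total, but the second moves far less data out of the database, which is the general reason pushdown can reduce the need for separate extract-and-load steps.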
Grunion enhances DataScience's enterprise platform, the DataScience Cloud, where users can deploy models built in their language of choice without rewriting code into a production stack language (PMML). The platform also allows notebooks, models, and other files to be grouped together in the same repository or project, regardless of the language they were written in. Grunion supports these capabilities with four main components: languages, compilers, interpreters and translators.
"The idea behind Grunion -- and behind the DataScience Cloud as a whole -- is that data scientists need a way to make the work they do valuable to their whole organization, without relying heavily on outside resources like engineering," said DataScience CSO William Merchan. "By releasing Grunion, we're sharing some of those important capabilities with the larger data science community."