Ruxue Zeng in Greenplum Data Clinics · Mar 20
Greenplum for Data Science Blog Series Part 2: Advanced Text Analytics & Text Search on Greenplum with GPText.
This blog is a follow-up (Part 2) to the Greenplum for Data Science Blog Series Part 1: Big Data Analysis with SQL and Python post. This article covers how to use Greenplum Text Analytics functions to tackle data science projects from experimentation to massive deployment. Context: Greenplum reduces data silos by…
Greenplum · 9 min read

Ahmed Rachid Hazourli in Greenplum Data Clinics · Mar 16
Greenplum for Data Science Blog Series Part 1: Big Data Analysis with SQL and Python
This article is the first part of the “Greenplum for End-to-End Data Science & ML” blog series, which covers how to use Greenplum’s Integrated In-Database Analytics functions to tackle data science projects from experimentation to massive deployment. The Growth, Challenge, and Opportunity of Data: We are in the digital age, and everything we do with our smartphones…
Greenplum · 9 min read

Ahmed Rachid Hazourli in Greenplum Data Clinics · Mar 6
Accelerate Analytics with Greenplum Data-Warehouse & dbt
A hands-on tutorial to help you build & deploy your first dbt project on the Greenplum data warehouse.
dbt (data build tool) is a transformation tool in the ELT pipeline that lets teams transform data following software engineering best practices. Through native support for connectivity to many data warehouses, dbt runs SQL queries against a data warehouse platform or query engine to materialise data as tables and views.
Greenplum · 7 min read

Ruxue Zeng in Greenplum Data Clinics · Mar 6
How to implement TPC-H queries with GreenplumPython
A quick demonstration and examples
TPC-H benchmark: TPC-H is a benchmark developed to evaluate the performance of large-scale SQL and relational databases by the execution of sets of queries. It has 22 queries against a standard database under controlled conditions. These queries give answers to real-world business questions and are far more complex…
Greenplum · 6 min read

Ruxue Zeng in Greenplum Data Clinics · Mar 2
Introduction to GreenplumPython
In-database processing of billions of rows with Python
GreenplumPython is a Python library that scales the Python data experience by building an API. It allows users to process and manipulate tables of billions of rows in Greenplum, using Python, without exporting the data to their local machines. GreenplumPython enables Data Scientists to code in their familiar Pythonic way…
Greenplum · 6 min read

Gregoryg · Jan 16
How to Combine Fast Data and Advanced Data Analytics
I have been involved in developing several applications with strict low latency and high throughput requirements. I will refer to them as fast data workload applications. Their response times must be in the range of nanoseconds to single-digit milliseconds for hundreds of thousands of transactions per second. An excellent example…
Greenplum · 9 min read

Dmitry Kirilovskiy · Oct 16, 2022
Looking for DBMS to implement a MarTech project? Consider GreenPlum
A story of using non-traditional RDBMS for traditional tasks
Despite the growing trend, Massively Parallel Processing (MPP) DBMSs are still rarely used for MarTech projects. It is often more convenient and safer to use a classical RDBMS. On the other hand, every organization eventually comes to a point when a decision must be made on how to scale the system…
Greenplum · 13 min read

praveen kadipikonda · May 18, 2021
Data Retrieval Connecting ECS Objects with Greenplum
In this blog, I’ll show how we can create a Greenplum object to connect to Dell ECS (Elastic Cloud Storage). Dell ECS is the leading object-storage platform engineered to support both traditional and next-generation workloads. It can provide direct HTTP access to data through S3 Storage and is optimized for…
Greenplum · 2 min read

Denis Matveev in FAUN Publication · May 4, 2021
PostGIS for Greenplum under GNU/Linux Debian
Greenplum overview: Greenplum (hereinafter GP) is a relational database management system designed for massively parallel data processing, usually used for big data. The system is becoming increasingly popular among data scientists and AI developers, and in the machine learning field. …
Greenplum · 6 min read

Constantinos Antzakas in Greenplum Data Clinics · Jun 22, 2020
Managing table objects in Greenplum Database — Part 2: Row- vs. Column-Oriented Storage
Columnar databases are increasingly popular in the Online Analytical Processing (OLAP) systems space compared with traditional row-oriented storage databases.
Many databases for data warehousing and analytics are following this storage model and are columnar-only stores. Rather than following the trends of the time, Greenplum Database provides a choice of storage orientation models: row, column, or a combination of both.
Greenplum · 4 min read