Denys Golotiukin · DataDenys

Using partitions in Clickhouse
Clickhouse supports table partitioning which can be useful in cases we deal with serial data and need to work only with a moving window of…
7 min read · Nov 29, 2022

Estimating duplicates and deduplicating data in Clickhouse
Clickhouse has a way to quickly estimate number of duplicates in a table as well as remove duplicates so only unique rows remain. Popular…
3 min read · Nov 21, 2022

How to use window functions in Mysql
Using windows in Mysql is a way to query data based on a set of rows instead of a single row. This helps in comparative analysis, when we…
6 min read · Nov 11, 2022

Big data exploratory data analysis with Clickhouse
Any analysis starts with exploratory stage — when we try to understand the big picture of the data itself. We usually do things like…
5 min read · Nov 4, 2022

Data cleansing and preparation for analysis with Python and Pandas
Any data usually (always?) contain errors. In order to do accurate analysis and build efficient ML models, data needs to be cleansed prior…
5 min read · Nov 2, 2022

Optimizing star-schema queries with IN queries and denormalization
Most data environments usually have 2 main groups of data objects. First — event based objects, usually organized into timeseries tables…
6 min read · Oct 31, 2022

Using projections to speedup queries in Clickhouse
Clickhouse is efficient enough so most analytical queries will execute fast in many cases without extra optimization activities. But, if…
7 min read · Oct 28, 2022

Working with JSON in Clickhouse
There’s plenty of cases when we can’t define data structure in advance due to its dynamic nature. Data objects can have different set of…
5 min read · Oct 26, 2022

Solving systems of linear equations using matrices and Python
Matrices stay at the very basis of all math used for ML. Let’s understand why it is so and how matrices can be used to solve systems of…
6 min read · Oct 19, 2022

Using EXPLAIN in Mysql to analyze and improve query performance
Mysql EXPLAIN statement lets us understand efficiency of our queries and grasp ideas on possible ways to optimize it. Just creating indexes…
7 min read · Oct 17, 2022