Turn Your Laptop Into a Personal Analytics Engine with DuckDB and MotherDuck – KDnuggets

Place your AD here

Bring the powerful tools to your laptop.

Turn Your Laptop Into a Personal Analytics Engine with DuckDB and MotherDuck
Image Generated with DALL-E

 
In a time where data analytic processing is the critical difference between a successful business and not, we need a tool stack that could support the needs. The advancement of technology has helped advance all these data tools that we need, namely DuckDB and MotherDuck.
DuckDB is an open-source, in-process SQL Online Analytical Processing (OLAP) database management system. The database system is designed for swiftly handling data analytical queries, regardless of the data size. The system implements in-memory processing and OLAP systems that effectively improve our data analytical process.
DuckDB is perfect for storing and processing tabular data involving data analysis (table join, data aggregation, etc.) and when our workflow usually involves significant changes in the table. On the other hand, DuckDB isn’t suitable for high-volume data activity and multiple concurrent processes in one database.
MotherDuck is a managed DuckDB-in-the-cloud service. It’s free to use and open-source while maintained by the DuckDB Community. It’s a service built by partnering with DuckDB Lab to create a cloud service platform that the public can use.
With a combination of DuckDB and Motherduck, we can create an analytics engine that is readily useable in every scenario. How do we do that? Let’s get into it.
 
 
We would use the native MotherDuck UI to give you an example of how the service works and why DuckDB is a powerful tool for data analytics. Please register on the website and acquire the MotherDuck account if you haven’t already.
Once you successfully register for the MotherDuck account, we will be taken to the MotherDuck UI. Try to familiarize yourself with the UI, and you will realize that the UI is similar to the Jupyter Notebook if you ever use one.
We will experiment with the DBduck power in the MotherDuck UI with the DS Salary data from Kaggle. Upload the data using the Add Files button, and a new cell will be shown with the query to execute. The query should look like this.
 
Once you create the table, try to query the data with the following code.
 
As you can see, MotherDuck is pretty much like doing data analysis in Notebook, but with SQL queries. Let’s try out the query to do data analysis in the MotherDuck.
 
Turn Your Laptop Into a Personal Analytics Engine with DuckDB and MotherDuck

 
You can execute the query in the cell; the table result is shown similarly to the image below.
 
Turn Your Laptop Into a Personal Analytics Engine with DuckDB and MotherDuck

 
You can filter out the data, pivot the table, or download the result with the selection button available in the UI.
 
 
MotherDuck also allows the user to access the database via Python on your Notebook. We need to install the DuckDB package using the following code.
 
The current version that MotherDuck supports is DuckDB 0.9.2; that’s why we installed that version. 
When the installation is successful, we need to connect the DuckDB with the Motherduck. There are a few ways to authenticate the connection, but we would use the service token. This token is acquired in your MotherDuck settings.
 
If we didn’t set any database name, MotherDuck would access using the default database, which is my_db. Next, let’s use the same query we did previously in the Notebook.
 
You will see the output similar to the table below.
 
With the query above, you can use the following code to process them into the Pandas DataFrame.
 
Lastly, you can load another dataset to the database using the following query.
 
The above query assumes your data is a CSV file. Other options include S3 or the local DuckDB to the MotherDuck database. 
 
 
DuckDB is an open-source database system that was developed specifically for data analysis. The system is designed to handle data processing swiftly and efficiently. MotherDuck is an open-source managed cloud-based service for DuckDB. 
By combining DuckDB and MotherDuck, we can turn our laptops into a personal analytics engine by having our data in the cloud and quickly processing them with DuckDB.
 
 
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and Data tips via social media and writing media.
Get the FREE ebook ‘The Great Big Natural Language Processing Primer’ and ‘The Complete Collection of Data Science Cheat Sheets’ along with the leading newsletter on Data Science, Machine Learning, AI & Analytics straight to your inbox.

By subscribing you accept KDnuggets Privacy Policy
Get the FREE ebook ‘The Great Big Natural Language Processing Primer’ and ‘The Complete Collection of Data Science Cheat Sheets’ along with the leading newsletter on Data Science, Machine Learning, AI & Analytics straight to your inbox.


By subscribing you accept KDnuggets Privacy Policy
Subscribe To Our Newsletter
(Get The Complete Collection of Data Science Cheat Sheets & Great Big NLP Primer ebook)
Get the FREE ebook ‘The Great Big Natural Language Processing Primer’ and ‘The Complete Collection of Data Science Cheat Sheets’ along with the leading newsletter on Data Science, Machine Learning, AI & Analytics straight to your inbox.
By subscribing you accept KDnuggets Privacy Policy
Get the FREE ebook ‘The Great Big Natural Language Processing Primer’ and ‘The Complete Collection of Data Science Cheat Sheets’ along with the leading newsletter on Data Science, Machine Learning, AI & Analytics straight to your inbox.
By subscribing you accept KDnuggets Privacy Policy

source

Share this article

Facebook
Twitter
LinkedIn
WhatsApp
Email

More from this Author