
If you’re looking to build a data and analytics solution for your eCommerce brand, you have three options:
1. Create an ELT+BI stack by connecting a series of data tools:
2. Build the entire ELT pipeline from the ground up and connect with a data warehouse & BI solution
3. Leverage an eCommerce data platform that includes ELT+BI & Reverse ETL/CDP
We’ve covered the first build option in a previous article. But in this piece, we’re going to cover option 2: what you must understand and consider if you’re looking to build a ground-up ELT pipeline for a hyper-custom data solution.
As APIs will be where your ELT work starts, that’s where we’ll start.
Although we won’t compare the technical differences among API architectures (that’s a separate article altogether), we’ll call attention to different types that you’ll encounter.
API architectures vary among platforms: you’ll encounter REST (i.e., RESTful) APIs, GraphQL APIs, and others that can’t be as neatly categorized, such as some of Klaviyo’s endpoints.
For our API discussion going forward, we’ll introduce and stick to using Shopify’s APIs as an example.
Shopify offers REST and GraphQL Admin APIs for app and integration building.
Shopify APIs use the classic “leaky bucket” algorithm to handle data requests, and they give specific figures for how often and how much you can hit the APIs.
As the metaphor goes, each tool/app has a “bucket” that can hold 40 API calls, and the bucket must not overflow. Every second, 2 API calls “leak” out of the bucket to avoid overflow, and each of your API calls goes toward filling the bucket. As long as your app doesn’t cause the bucket to overflow, it won’t trip a 429 error.
The leaky bucket lets a platform or app make unlimited calls (as long as they’re in infrequent bursts) and each API has a different way of limiting calls.
For Shopify’s REST API, you can make 40 requests per app, per store, per minute. The API replenishes (i.e., “empties the bucket”) at a rate of 2 requests per second.
On the other hand, Shopify’s GraphQL API is structured based on query costs (i.e., what fields you are calling) rather than number of queries. You can “spend” 1000 cost points per app, per store, per minute. The API replenishes at a rate of 50 cost points per second.
Given the variability of data available via API among platforms, your initial (and ongoing) challenge will be to identify the most efficient ways to call APIs to get the data you need.
Each platform has its own quirks, roadblocks, or challenges for a data extraction process, and you’ll need to thoroughly investigate documentation while always asking yourself, “Is there a better way to GET this data?”
What follows is a moderately deep dive on one of Shopify’s endpoints to illustrate the type of exploration that will be required for each extraction target.
Shopify currently has 18 endpoints, but from an analytics point of view, most of these will not be valuable to you. The most important will be:

You might ask: “Why wouldn’t I want to call the Analytics endpoint if I’m building a custom analytics app? Or, why wouldn’t I hit the Billing, Customers, or Discounts endpoint to get relevant first-party data?”
Answer: because they’re not relevant as they seem, and/or they’ll create significantly more work for you, which you certainly don’t want or need in a long-term project.
For example:
In short, you want to hit the fewest APIs that give you the most information. The Orders API will give you the most (the next most important ones are Inventory and Products, but we won’t cover those here).
The Orders endpoint provides a trove of data. With it, you will be able to build numerous custom reports as well as some analyses.

Specifically, the Order resource within the Orders endpoint is going to be the most bang-for-your-API-buck. And more specifically, the single GET “Retrieve a list of orders” is going to be your best starting point: it will enable some custom reports.
For instance, although there is a Refund resource within the Orders endpoint (see above image), there is also a “refunds” object within “Retrieve a list of orders,” where you can GET refund data:

This “Retrieve a list of orders” GET is the most data-packed from Shopify’s APIs. Pictured below is an expanded view of the API call (sample data), which shows data ranging from particular order info, to discount information, to customer information, to currency and tax data—this is still only a portion of a full call.

From this single GET, you can summarize the data and build a large number of reports on area such as:
As Amazon Redshift does not easily support semi-structured data except via time-consuming manual work, we highly recommend selecting either Snowflake or BigQuery for your ELT pipeline—both natively support JSON.
In terms of the Snowflake vs. BigQuery debate, that’s an internal discussion to have and is more about preferences around querying and other tools you may use. E.g.:
Although there will be substantial research required to understand the most efficient ways to GET data, this will be a relatively small portion of this project.
Ultimately, the most time-consuming elements will be dealing with the following complicated projects:
When it comes to denesting, you have two options (the first of which will not be feasible, so in reality, you have one option):
On the error handling front, you will need to build a separate program that can diagnose issues with your daily ELT workflow. Getting even a crude build up and running is complicated. At absolute minimum, your error handling program will have to:
The extra layer (and most important part) of the challenge here is having a program that not only indicates that something in your ELT malfunctioned, but what specifically malfunctioned, and where it malfunctioned.
A crude error handling program may indicate a general error, but that won’t necessarily help you triage the data issue. You could spend days (or even weeks) figuring out what really went wrong.
We led off this piece by highlighting DIY-with-tools Pipeline (option 1) and the DIY everything (option 2), which is the subject of this piece.
We’d like to offer a compelling third option: don’t build the ELT pipeline yourself. Avoid the multi-year build, and avoid the endless headaches of dealing with this process.
Shopify is the basis for your business, but you’ll likely have 10+ integrations at some point, both inside and outside the Shopify ecosystem. Building, updating, and maintaining those ELT pipelines, is, to be technical, a nightmare. And if you throw in a custom Amazon ELT build, forget it. It’s not going to happen.
On top of the challenges of building the APIs themselves, denesting the JSON, and building error handling systems, you’re going to have to successfully orchestrate the workflows (and that is a whole other article) based on daily data refreshes.
We know, because we have built all this. We = Daasity. (We’re Option #3.)
Daasity is a complete data platform, purpose-built for eCommerce brands on Shopify, Amazon, and Magento, including the largest brands who are looking for the flexibility, customizability, and extensibility of a truly custom build.
We support dozens of integrations, are quickly adding new ones, and we keep them updated as soon as APIs depreciate.
Interested? We’d love to show you more about how we work. You can talk to an eCommerce data and technical expert here.