In today's interconnected business environment, effective B2B data sharing is not just a convenience but a necessity, especially in the realm of analytics. The use cases are plentiful:
A logistics data provider (selling data) needs to deliver hourly refreshed data to customers so that the customers can assess and adjust their supply chain
A CRM SaaS vendor needs to make the underlying customer data available to clients so that clients can build an internal Customer360 dashboard
An online marketplace needs to provide daily performance reports to its suppliers so that suppliers can evaluate product and promotion performance
In all cases, the seamless exchange and utilization of data between businesses can enhance decision-making, drive innovation, and foster stronger collaborative relationships. This post looks at the tradeoffs with different B2B data sharing methods and suggests that it is now possible to have the best of all worlds with Data Delivery Platforms.
The trusty CSV file
The CSV file is the unsung hero in the world of B2B data sharing. It is simple enough to be read by a human, sent as an email attachment, and opened on virtually any platform. It’s no surprise that 90% of SaaS platforms have an “Export to CSV” button.
Strengths:
Compatibility. Non-technical users can quickly share small amounts of data given the universal compatibility with common spreadsheet tools.
Low cost. CSV files require negligible cost to create and maintain.
Tradeoffs:
Manual Integration. CSV files need to be manually exported, shared, and then imported into analytical environments. Varying encodings and delimiters also require custom integration (see the sketch after this list).
Security. While easy to create, CSV files are fundamentally insecure. Everyone with access to the file has access to all data contained within it.
Size Limits. The tools typically used to work with CSV files (e.g. Excel) limit the number of rows that can be stored and processed in a single file.
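To make the integration tradeoff concrete, here is a minimal Python sketch (using pandas) of the kind of custom loading logic a consumer often ends up writing. The file name and the candidate delimiters and encodings are purely illustrative assumptions.

```python
import pandas as pd

# Illustrative file name; a real export might arrive by email or on a shared drive.
SOURCE_FILE = "supplier_performance_export.csv"

# Exports from different systems vary in delimiter and encoding,
# so the loader tries a few combinations before one parses cleanly.
CANDIDATES = [
    {"sep": ",", "encoding": "utf-8"},
    {"sep": ";", "encoding": "utf-8"},
    {"sep": ",", "encoding": "latin-1"},
]

def load_csv(path: str) -> pd.DataFrame:
    last_error = None
    for options in CANDIDATES:
        try:
            df = pd.read_csv(path, **options)
        except (UnicodeDecodeError, pd.errors.ParserError) as exc:
            last_error = exc
            continue
        # A wrong delimiter usually collapses everything into a single column.
        if df.shape[1] > 1:
            return df
    raise ValueError(f"Could not parse {path}") from last_error

df = load_csv(SOURCE_FILE)
print(df.head())
```

Even this small amount of glue code has to be maintained for every provider that exports slightly differently.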
File-based data delivery (SFTP, S3 buckets, etc.)
The only thing better than one CSV file is a folder of CSV files. File-based data delivery methods lean into the simplicity of data files by introducing a staging area where humans and machines from providers and consumers can collaborate.
Strengths:
Machine-readability. Large data volumes can be accommodated given the simple, automatable nature of file management. Newer file formats like Parquet and Avro also solve many of the issues with CSVs.
Standardized. Ubiquitous file formats and file-transfer protocols allow for easy cross-entity collaboration at scale.
Improved Security. These methods offer improved security given file- and folder-level access control.
Tradeoffs:
Extra-step integration. While highly scalable, integration into analytical environments still requires an extra step - namely moving the data between the file store and the data warehouse (see the sketch after this list).
Technical complexity. Setting up and maintaining the infrastructure requires technical expertise. Additional complexity around cost allocation arises given that the file stores are typically shared between parties.
Refresh. Refresh rates are limited to batch intervals given the need to write and read whole files. Restating historical data also requires rewriting old files.
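As a rough illustration of that extra step, below is a hedged sketch of a consumer-side loader in Python. The bucket, prefix, target table, and warehouse connection string are all assumptions, and it relies on boto3, pandas (with pyarrow), and SQLAlchemy.

```python
import io

import boto3
import pandas as pd
from sqlalchemy import create_engine

# Illustrative names - the provider and consumer would agree on these.
BUCKET = "provider-data-drops"
PREFIX = "daily_performance/"
WAREHOUSE_URL = "postgresql://analytics_user:password@warehouse-host:5432/analytics"

s3 = boto3.client("s3")
engine = create_engine(WAREHOUSE_URL)

# Step 1: discover the files the provider dropped into the shared staging area.
listing = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
keys = [obj["Key"] for obj in listing.get("Contents", []) if obj["Key"].endswith(".parquet")]

# Step 2: move each file from the file store into the warehouse - the "extra step".
for key in keys:
    body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
    df = pd.read_parquet(io.BytesIO(body))
    df.to_sql("supplier_performance", engine, if_exists="append", index=False)
```

A production version would also need to track which files were already loaded and handle restated history, which is where much of the hidden maintenance cost sits.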
API-based data sharing
APIs are ubiquitous and for most SaaS providers, they represent the only way of sharing data with their customers. While APIs are useful for real-time or transaction-level integration, in the analytical world customers prefer to offload API scraping to integration tools like Fivetran and Informatica (check out our dedicated blog post on this topic here).
Strengths:
Real-time. Data sharing between machines is easily facilitated with standardized data structures (e.g. JSON).
Secure. APIs offer highly tailored and secure data access given that all responses are processed at the application layer.
Tradeoffs:
Extra-step integration. APIs require an extra step to integrate into analytical environments. First, the API needs to be called (usually page by page), and then the response needs to be parsed and loaded into the target data warehouse (see the sketch after this list).
Technical complexity. APIs come with a heavy technical lift for providers to build and maintain them and for consumers to integrate with them.
Scale limits. APIs have payload limits, so using them to share large amounts of data adds complexity and hurts performance.
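Here is a minimal sketch of that consumer-side workflow in Python with requests and pandas; the endpoint, auth header, and pagination scheme are illustrative assumptions rather than any particular vendor's API.

```python
import pandas as pd
import requests

# Illustrative endpoint and parameters; real APIs differ in auth and pagination style.
BASE_URL = "https://api.example-crm.com/v1/customers"
HEADERS = {"Authorization": "Bearer <api-token>"}
PAGE_SIZE = 100  # payload limits force the consumer to page through results

records = []
page = 1
while True:
    # Step 1: call the API one page at a time.
    response = requests.get(
        BASE_URL,
        headers=HEADERS,
        params={"page": page, "per_page": PAGE_SIZE},
        timeout=30,
    )
    response.raise_for_status()
    batch = response.json()  # assumes the endpoint returns a JSON list per page
    if not batch:
        break
    records.extend(batch)
    page += 1

# Step 2: parse the JSON into flat rows and load them into the analytical
# environment (here just a DataFrame; a warehouse load would normally follow).
df = pd.json_normalize(records)
print(f"Fetched {len(df)} customer rows")
```

This is exactly the loop that consumers typically hand off to integration tools like Fivetran or Informatica rather than maintaining it themselves.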
Data warehouse sharing
Today every major data warehouse provider (Snowflake, AWS Redshift, Google BigQuery, Azure Synapse, and Databricks) has its own mode of warehouse-to-warehouse data sharing. The great part about this approach is that a company can easily and securely share a dataset with another company without the need to actually move the data. The receiver simply sees the shared dataset as available in their environment, with no integration work needed.
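To make this concrete, below is a hedged sketch of the provider side of a Snowflake secure data share, driven from Python via the Snowflake connector. The account, database, share, and view names are assumptions, and the shared view is expected to be a secure view; the other warehouses listed above offer analogous mechanisms.

```python
import snowflake.connector

# Connection details are illustrative; a provider-side admin role is required.
conn = snowflake.connector.connect(
    account="provider_account",
    user="data_admin",
    password="<password>",
    role="ACCOUNTADMIN",
)
cur = conn.cursor()

# The provider creates a share, grants access to a filtered secure view,
# and adds the consumer's account - the data itself is never copied or moved.
for statement in [
    "CREATE SHARE IF NOT EXISTS logistics_share",
    "GRANT USAGE ON DATABASE logistics_db TO SHARE logistics_share",
    "GRANT USAGE ON SCHEMA logistics_db.public TO SHARE logistics_share",
    "GRANT SELECT ON VIEW logistics_db.public.customer_shipments_v TO SHARE logistics_share",
    "ALTER SHARE logistics_share ADD ACCOUNTS = consumer_account",
]:
    cur.execute(statement)

# On the consumer side, the shared data simply appears as a database, e.g.:
#   CREATE DATABASE shipments_shared FROM SHARE provider_account.logistics_share;
```

Note that this only works when the consumer is also on the same warehouse platform - the vendor lock-in tradeoff discussed below.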
Strengths:
Direct integration. Integration load is negligible since the data never leaves the warehouse environment.
Real-time. Real-time data sharing is possible between entities since the data is never copied.
Scalability. Large analytical data sets are easily supported, since analytical data warehouses were built for scale.
Secure. A high level of security is available, with permissions that can be granted on custom filtered views.
Tradeoffs:
Technical complexity. Data warehouses place a heavy dependency on technical teams. Setting up and maintaining a data warehouse environment often requires multiple full-time resources.
Vendor lock-in. This approach only works if both the source and the destination have the same data warehouse set up. Since setting up a data warehouse is a heavy lift, this often becomes a blocker.
Data sharing approaches at a glance
Data Delivery Platforms
As we’ve seen, each approach has its strengths and its tradeoffs. Still, wouldn’t it be great to have a single method of data sharing that combines the strengths of all the approaches discussed? Fortunately, there is an emerging class of data delivery platforms that allow data providers to share data with their clients with all the benefits of cloud data sharing, but without the need to manage infrastructure and without vendor lock-in. With a data delivery platform, data providers can publish platform-agnostic “Data Products”, while consumers receive data directly in their destination of choice - no ETL, APIs, or pipelines needed. Some data delivery platforms, like Amplify, even remove the technical dependencies through intuitive UI workflows and self-service consumption capabilities. When it comes to data sharing, thanks to data delivery platforms, we can finally have our cake and eat it too!
Book a call with us to learn more about the data sharing landscape and if a data delivery platform is right for you!