5 min read

B2B data sharing in the 21st century: a steady march forward

B2B data sharing for analytical work has evolved tremendously in the last decade, and we can now finally have the best of all worlds
Written by Alec Whitten
Published on 17 January 2022

In today's interconnected business environment, effective B2B data sharing is not just a convenience but a necessity, especially in the realm of analytics. The use cases are plentiful:

  • A logistics data provider (selling data) needs to deliver hourly refreshed data so that its customers can assess and adjust their supply chains
  • A CRM SaaS vendor needs to make the underlying customer data available so that clients can build an internal Customer360 dashboard
  • An online marketplace needs to provide daily performance reports so that its suppliers can evaluate product and promotion performance

In all cases, the seamless exchange and utilization of data between businesses can enhance decision-making, drive innovation, and foster stronger collaborative relationships. This post looks at the tradeoffs of different B2B data sharing methods and suggests that it is now possible to have the best of all worlds with Data Delivery Platforms.

The trusty CSV file

The CSV file is the unsung hero of B2B data sharing. It is simple enough to be human-readable, sent as an email attachment, and compatible with any platform. It’s no surprise that 90% of SaaS platforms have an “Export to CSV” button.

Strengths:

  • Compatibility. Non-technical users can quickly share small amounts of data given universal compatibility with common spreadsheet tools.
  • Low cost. CSV files cost next to nothing to create and maintain.

Tradeoffs:

  • Manual integration. CSV files need to be manually exported, shared, and then imported into analytical environments. Varying encodings and delimiters also require custom integration (sketched below).
  • Security. While easy to create, CSV files are fundamentally insecure: everyone with access to the file has access to all the data within it.
  • Size limits. Many tools associated with CSV files (e.g. Excel) cap the number of rows that can be stored and processed in a single file.
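
To make the integration pain concrete, here is a minimal Python sketch of defensively loading a partner's CSV when the encoding and delimiter are not known in advance (the file name and encoding list are illustrative assumptions, not a universal recipe):

```python
import csv

import pandas as pd


def load_partner_csv(path: str) -> pd.DataFrame:
    """Load a CSV whose encoding and delimiter vary by partner."""
    for encoding in ("utf-8-sig", "utf-16", "latin-1"):  # common encodings, in order
        try:
            with open(path, encoding=encoding, newline="") as f:
                sample = f.read(4096)
        except UnicodeError:
            continue  # wrong guess, try the next encoding
        try:
            # Let the stdlib sniffer guess the delimiter from a sample.
            delimiter = csv.Sniffer().sniff(sample, delimiters=",;\t|").delimiter
        except csv.Error:
            delimiter = ","  # fall back to the most common case
        return pd.read_csv(path, encoding=encoding, sep=delimiter)
    raise ValueError(f"could not decode {path} with any known encoding")


df = load_partner_csv("supplier_report.csv")  # hypothetical partner export
```

Every one of these branches is a workaround a data team ends up maintaining by hand, per partner, per feed.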

File-based data delivery (SFTP, S3 buckets, etc.)

The only thing better than one CSV file is a folder of CSV files. File-based data delivery methods lean into the simplicity of data files by introducing a staging area where humans and machines from providers and consumers can collaborate.

Strengths:

  • Machine-readability. Simple file management scales to large volumes, and newer file formats like Parquet and Avro solve many of the issues with CSVs.
  • Standardized. The common nature of files and file-transfer protocols allows for easy cross-entity collaboration at scale.
  • Improved security. File- and folder-level access control offers meaningfully better security than passing files around by hand.

Tradeoffs:

  • Extra-step integration. While highly scalable, integration into analytical environments still requires an extra step: moving the data from the file store into the data warehouse (see the sketch after this list).
  • Technical complexity. Setting up and maintaining the infrastructure requires technical expertise, and additional complexity around cost allocation arises because file stores are typically shared between parties.
  • Refresh. Refresh rates are limited to batch cadence, since data must be written to and read from files. Restating historical data also requires rewriting old files.
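
As an illustration of that extra step, here is a minimal boto3 sketch that pulls a provider's daily drop out of a shared S3 bucket before anything can be loaded into a warehouse (the bucket name, prefix, and folder layout are hypothetical):

```python
import boto3

s3 = boto3.client("s3")  # assumes AWS credentials are already configured

BUCKET = "provider-data-drops"  # hypothetical shared bucket
PREFIX = "daily/2022-01-17/"    # one folder per daily batch

# List the files in today's drop and copy them down locally.
response = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
for obj in response.get("Contents", []):
    key = obj["Key"]
    if key.endswith("/"):
        continue  # skip zero-byte "folder" placeholder objects
    local_path = key.rsplit("/", 1)[-1]
    s3.download_file(BUCKET, key, local_path)
    print(f"downloaded {key} -> {local_path}")

# Only now can a separate load job (e.g. a warehouse COPY command or a
# pyarrow/pandas read) move these files into the analytical environment.
```

The download itself is trivial; the ongoing cost is in scheduling it, monitoring it, and keeping it in sync with the provider's refresh cadence.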

API based data sharing

APIs are ubiquitous, and for most SaaS providers they represent the only way of sharing data with customers. While APIs are useful for real-time or transaction-level integration, in the analytical world customers prefer to offload API scraping to integration tools like Fivetran and Informatica (check out our blog post here specifically on this topic).

Strengths:

  • Real-time. Data sharing between machines is easily facilitated with standardized data structures (e.g. JSON).
  • Secure. APIs offer highly tailored and secure data access, since every response is processed at the application layer.

Tradeoffs:

  • Extra-step integration. APIs require an extra step to integrate into analytical environments: the API must first be queried, and the response then parsed and loaded into the target data warehouse (see the sketch after this list).
  • Technical complexity. APIs come with a heavy technical lift, for providers to build and maintain them and for consumers to integrate with them.
  • Scale limits. APIs have payload limits, so using them to share large amounts of data adds complexity and degrades performance.
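
To illustrate the consumer side, here is a minimal sketch of paginating through a REST endpoint and collecting the parsed records; the URL, auth scheme, and cursor field are assumptions for illustration, not a real API:

```python
import requests

BASE_URL = "https://api.example-crm.com/v1/contacts"  # hypothetical endpoint
HEADERS = {"Authorization": "Bearer <token>"}


def fetch_all_contacts() -> list[dict]:
    records: list[dict] = []
    cursor = None
    while True:
        params = {"limit": 100}  # payload limits force page-by-page retrieval
        if cursor:
            params["cursor"] = cursor
        resp = requests.get(BASE_URL, headers=HEADERS, params=params, timeout=30)
        resp.raise_for_status()
        body = resp.json()
        records.extend(body["results"])
        cursor = body.get("next_cursor")
        if not cursor:  # no more pages
            return records


contacts = fetch_all_contacts()
# These parsed records still need a separate load step into the warehouse.
```

Notice that at 100 records per page, a multi-million-row dataset means tens of thousands of round trips, which is exactly why analytical consumers hand this work to integration tools.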

Cloud sharing protocols (Snowflake Share, Databricks Delta Sharing, etc.)

Today every major data warehouse provider (Snowflake, AWS Redshift, Google BigQuery, Azure Synapse, and Databricks) has its own mode of warehouse-to-warehouse data sharing. The great part about this approach is that a company can easily and securely share a dataset with another company without actually moving the data: the receiver simply sees the shared dataset in their environment, with no integration work needed.
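
For a flavor of what this looks like in practice, here is a minimal sketch of the provider side of a Snowflake share, issued through snowflake-connector-python; the account, database, and share names are hypothetical, and the other vendors' protocols have analogous flows:

```python
import snowflake.connector

# Connect as the data provider (credentials are placeholders).
conn = snowflake.connector.connect(
    user="PROVIDER_USER",
    password="...",
    account="provider_account",
)
cur = conn.cursor()

# Create a share, grant it read access, and add the consumer's account.
cur.execute("CREATE SHARE IF NOT EXISTS logistics_share")
cur.execute("GRANT USAGE ON DATABASE analytics TO SHARE logistics_share")
cur.execute("GRANT USAGE ON SCHEMA analytics.public TO SHARE logistics_share")
# Granting on a table here; a secure view could be shared instead to
# expose only a filtered slice of the data.
cur.execute("GRANT SELECT ON TABLE analytics.public.shipments TO SHARE logistics_share")
cur.execute("ALTER SHARE logistics_share ADD ACCOUNTS = consumer_account")

# On the consumer side, no copy or pipeline is needed:
#   CREATE DATABASE shipments FROM SHARE provider_account.logistics_share;
```

No files move and no pipeline runs; the grant is the integration. The catch, as the tradeoffs below show, is that this only works when both sides live in the same warehouse ecosystem.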

Strengths:

  • Direct integration. Integration load is negligible since the data never leaves the warehouse environment.
  • Real-time. Real-time data sharing is possible between entities since the data is never copied.
  • Scalability. Large analytical data sets are easily supported; analytical data warehouses were built for scale.
  • Secure. A high level of security is available, with permissions that can be granted on custom filtered views.

Tradeoffs:

  • Technical complexity. Data warehouses place a heavy dependency on technical teams; setting up and maintaining a warehouse environment often requires multiple full-time resources.
  • Vendor lock-in. This approach only works if the source and the destination run the same data warehouse. Since setting up a data warehouse is a heavy lift, this often becomes a blocker.

Data sharing approaches at a glance

In short: CSV files are universal but manual and insecure; file-based delivery scales but adds an integration step; APIs are real-time but heavy to build and limited at scale; and cloud sharing protocols are seamless but demand matching warehouses on both sides.

Data Delivery Platforms

As we’ve seen, each approach has its strengths and its tradeoffs. Still, wouldn’t it be great to have a single method of data sharing that combines the strengths of all the approaches discussed? Fortunately, there is an emerging class of data delivery platforms that lets data providers share data with their clients with all the benefits of cloud data sharing, but without managing any infrastructure or accepting vendor lock-in.

With a data delivery platform, data providers publish platform-agnostic “Data Products”, while consumers receive data directly in their destination of choice - no ETL, APIs, or pipelines needed. Some data delivery platforms, like Amplify, even remove the technical dependencies through intuitive UI workflows and self-service consumption capabilities. When it comes to data sharing, thanks to data delivery platforms, we can finally have our cake and eat it too!

Book a call with us to learn more about the data sharing landscape and if a data delivery platform is right for you!

Other "Perspectives" posts you might like:
Perspectives
5
 min read

Alternative Data: The Fastest Growing Industry You’ve Never Heard Of

Alternative Data is one of the hottest emerging industries - let’s explore how it works and what you need to know about it
Read post
Perspectives
5
 min read

The future is DaaS… and it is personal

As DaaS providers become more ubiquitous, they need to start personalizing their offerings to stand out from the pack
Read post
Perspectives
4
 min read

The future of SaaS analytics lies beyond in-app reporting

Customers are demanding data access from their SaaS providers, creating an opportunity to go beyond traditional in-app analytics
Read post