SharePoint Cross-Farm Document Sharing – Pull vs Push vs Mixed Approaches

In organizations that use SharePoint for both extranet and intranet, the extranet or partner hub is commonly used to collaborate on documents with customers or partners. It is also fairly common that internal documents stored in the corporate intranet need to be made available on, or published to, the customer extranet portal.

I was recently involved in a series of architectural discussions about ways to make internal facing documents stored in the corporate farm available to the customer facing extranet farm. This article provides a high-level overview of the architectural options and the pros and cons of each. There are basically three approaches to sharing documents or data between two systems: Pull Methodology, Push Methodology, or a Mixed Approach with a Document Warehouse.

Security and Architectural Assumptions

  • No direct authorization of Customer or Partner Credentials in Internal systems
  • No Pass-through of Customer or Partner User Security using Kerberos or Claims; use Metadata instead
  • Typically there are Three Major Types of Documents
    • Non-User Specific documents – e.g. Corporate Communication, Public Facing Documents
    • Semi-User specific documents – e.g. Industry Specific Documents
    • User specific documents – e.g. Customer Specific Documents
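
To make the metadata-only assumption concrete, here is a minimal sketch of how the external farm could decide visibility for the three document types without ever passing customer credentials to the internal systems. The class names, field names (industry, customer_id), and sample data are hypothetical and only illustrate the idea; a real implementation would read these values from SharePoint columns or managed metadata.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Audience(Enum):
    PUBLIC = "non-user-specific"      # e.g. corporate communication
    INDUSTRY = "semi-user-specific"   # e.g. industry-specific documents
    CUSTOMER = "user-specific"        # e.g. customer-specific documents


@dataclass(frozen=True)
class Document:
    title: str
    audience: Audience
    industry: Optional[str] = None      # expected when audience is INDUSTRY
    customer_id: Optional[str] = None   # expected when audience is CUSTOMER


@dataclass(frozen=True)
class ExternalUser:
    customer_id: str
    industry: str


def is_visible(doc: Document, user: ExternalUser) -> bool:
    """Decide visibility from metadata alone; deny when metadata is missing."""
    if doc.audience is Audience.PUBLIC:
        return True
    if doc.audience is Audience.INDUSTRY:
        return doc.industry is not None and doc.industry == user.industry
    if doc.audience is Audience.CUSTOMER:
        return doc.customer_id is not None and doc.customer_id == user.customer_id
    return False  # unknown audience: deny by default


# Hypothetical sample data
docs = [
    Document("Press release", Audience.PUBLIC),
    Document("Retail outlook", Audience.INDUSTRY, industry="Retail"),
    Document("Banking outlook", Audience.INDUSTRY, industry="Banking"),
    Document("Contract C-100", Audience.CUSTOMER, customer_id="C-100"),
    Document("Contract C-200", Audience.CUSTOMER, customer_id="C-200"),
]
user = ExternalUser(customer_id="C-100", industry="Retail")
visible = [d.title for d in docs if is_visible(d, user)]
print(visible)
```

Denying by default when the metadata is missing is a deliberate choice in this sketch: a document that was tagged incorrectly stays hidden instead of leaking to the wrong customer.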

Pull Methodology – Pull documents directly from Source System

  • How?
    • Most of the work happens on the destination System
    • The destination system searches and downloads documents from the source system using the Client Object Model and WCF services. The WCF services act as a middle tier between the two farms; they can be hosted in the ISAPI folder of either the destination or the source SharePoint farm (running in the SharePoint process) or on a standalone IIS server. Internal document URLs also need to be masked in the external farm.
    • External systems access internal documents using a combination of service accounts and metadata. A metadata sync process is also needed so that internal and external systems access data with the same metadata; a common metadata management system may help.
    • Requires source system documents to be flagged as internal vs. external (or as published to external), unless all documents are available to external systems.
  • Pros
    • Single Version and Single Source of Documents
    • No additional storage considerations
  • Cons
    • Requires the source system to be available all the time. An outage of the source system for maintenance makes documents unavailable in the destination farm.
    • Data security is a huge concern because the source system may contain both external and internal facing documents. Pulling documents directly from the source system raises security hardening concerns and requires proper metadata, service accounts, and IT governance.
    • Latency and performance issues when pulling documents or data from the source system in real time.
    • Searching source system documents requires search connectors, BCS, or cross-farm search configuration.
    • Requires a proper asynchronous download process to download documents from the source system.
  • Final Thoughts
    • Having a single source of documents may be a great idea, but accessing internal documents in real time with service accounts and metadata may not be the most secure approach. Internal systems may contain both internal and external facing documents, and poor service account and metadata management, along with improper code logic, may expose non-external-facing documents to external systems. Additionally, an outage of the internal systems directly affects document availability on the external farm.
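
Two pieces of the pull approach, filtering on an external flag and masking internal URLs, can be sketched roughly as follows. This is only an illustration under assumptions: the PublishToExternal column name, the UrlMasker class, and the example URLs are all hypothetical, and in practice this logic would sit in the WCF middle tier and not in a standalone script.

```python
import hashlib
import hmac
from typing import Dict, List, Optional


def externally_publishable(items: List[dict]) -> List[dict]:
    """Keep only items explicitly flagged for external use (hypothetical column)."""
    return [item for item in items if item.get("PublishToExternal") is True]


class UrlMasker:
    """Maps internal URLs to opaque tokens so the extranet never shows them."""

    def __init__(self, secret: bytes, public_base: str) -> None:
        self._secret = secret
        self._public_base = public_base.rstrip("/")
        self._by_token: Dict[str, str] = {}

    def mask(self, internal_url: str) -> str:
        # Keyed hash: the token is stable for a given URL but cannot be
        # reversed or forged without the secret.
        digest = hmac.new(self._secret, internal_url.encode("utf-8"), hashlib.sha256)
        token = digest.hexdigest()[:32]
        self._by_token[token] = internal_url
        return f"{self._public_base}/{token}"

    def resolve(self, token: str) -> Optional[str]:
        """Return the internal URL for a token, or None if it was never issued."""
        return self._by_token.get(token)


# Hypothetical sample items as they might come back from a search call
items = [
    {"Url": "http://intranet/sites/hr/policy.docx", "PublishToExternal": True},
    {"Url": "http://intranet/sites/hr/salaries.xlsx", "PublishToExternal": False},
    {"Url": "http://intranet/sites/hr/untagged.docx"},
]
masker = UrlMasker(b"replace-with-a-real-secret", "https://extranet.example.com/docs")
masked_links = [masker.mask(i["Url"]) for i in externally_publishable(items)]
print(masked_links)
```

Note that an item without the flag is treated as internal. The in-memory token table is only for the sketch; a real middle tier would need a persistent or shared lookup.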

Push Methodology – Push documents directly to destination System

  • How?
    • Most of the work happens on the Source System
    • The destination farm provides a document center or document libraries where the source system publishes the documents it shares with the destination farm.
    • Requires a workflow or document publishing process to push documents from the source farm to the destination farm. Sync, versioning, archiving, and deletion of documents also need to be considered.
  • Pros
    • No security hardening or data security concerns on the source system, since documents are pushed to the destination system. Data security is handled by the destination SharePoint sites.
    • Pushed documents can be searched directly in the destination farm. No need for search connectors, BCS, or cross-farm search.
    • Best performance and lowest latency, because documents are accessed directly from the destination system.
    • Doesn’t require the source system to be available all the time. An outage of the source system for maintenance doesn’t affect document availability in the destination farm.
  • Cons
    • Additional Storage Consideration
    • Multiple copies of the same documents exist in the source and in multiple destination systems. Requires sync or versioning, or a proper workflow or business rules, to publish to the destination systems.
    • Multiple destination systems require pushing documents to each of them. Adding a destination in the future requires updating the source system to push documents to it.
  • Final Thoughts
    • This approach may work best when you have a 1-to-1 relationship between internal and external environments. With multiple external facing environments, internal documents get copied to multiple places, and a lot of duplicate documents and content may accumulate over time.
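
The sync, versioning, and deletion considerations of the push approach can be illustrated with a small planning step that compares the source against each destination. This is a sketch under assumptions: documents are identified by name and carry a simple integer version, and the function names (plan_push, plan_fan_out) and farm names are hypothetical. The actual copy would be done by a workflow or publishing job.

```python
from typing import Dict, List, NamedTuple


class SyncPlan(NamedTuple):
    to_copy: List[str]     # in source, missing at destination
    to_update: List[str]   # destination holds an older version
    to_delete: List[str]   # at destination, no longer in source


def plan_push(source: Dict[str, int], destination: Dict[str, int]) -> SyncPlan:
    """Compare name -> version maps and decide what the push job must do."""
    to_copy = sorted(name for name in source if name not in destination)
    to_update = sorted(
        name
        for name, version in source.items()
        if name in destination and destination[name] < version
    )
    to_delete = sorted(name for name in destination if name not in source)
    return SyncPlan(to_copy, to_update, to_delete)


def plan_fan_out(
    source: Dict[str, int], destinations: Dict[str, Dict[str, int]]
) -> Dict[str, SyncPlan]:
    """One plan per destination farm; every new farm adds another push target."""
    return {farm: plan_push(source, state) for farm, state in destinations.items()}


# Hypothetical state of the source library and two destination farms
source_docs = {"a.docx": 2, "b.docx": 1, "c.docx": 1}
destination_farms = {
    "partner-portal": {"a.docx": 1, "b.docx": 1, "old.docx": 3},
    "customer-portal": {},
}
plans = plan_fan_out(source_docs, destination_farms)
for farm, plan in plans.items():
    print(farm, plan)
```

The fan-out function makes the main drawback visible: the work, and the number of copies, grows with every destination that is added.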

Mix of Pull vs Push Methodology – Document Warehouse or Publishing Hub

  • How?
    • This is a combination of the Pull and Push approaches.
    • A Publishing Hub or Document Warehouse needs to be created to host the external facing documents. A Content Type Hub may help to publish documents from the internal farm to the Publishing Hub document center. This is also an opportunity to flatten the hierarchical structure of internal document libraries in the publishing hub.
    • Requires a workflow or document publishing process to push documents from the source farm to the publishing hub. Sync, versioning, archiving, and deletion of documents also need to be considered.
    • The destination system searches and downloads documents from the Publishing Hub using the Client Object Model and WCF services. Internal document URLs also need to be masked in the external farm.
    • External systems access Publishing Hub documents using a combination of service accounts and metadata. A metadata sync process is also needed so that internal and external systems access data with the same metadata; a common metadata management system may help.
  • Pros
    • Better data security than the Pull Methodology. Managing data security in the publishing hub is less critical than in the source systems, since the publishing hub contains mostly external facing, read-only documents. Proper metadata, service accounts, and IT governance are still required to harden security for customer-specific documents.
    • Doesn’t require the source system to be available all the time. An outage of the source system for maintenance doesn’t affect document availability in the destination farm.
  • Cons
    • Additional Storage Consideration
    • Latency and performance issues when pulling documents or data from the publishing hub in real time.
    • Multiple copies of the same documents exist in the source and publishing hub systems. Requires sync or versioning, or a proper workflow or business rules, to publish to the destination systems.
    • Searching publishing hub documents from the destination farm requires search connectors, BCS, or cross-farm search configuration.
    • Requires a proper asynchronous download process to download documents from the publishing hub.
  • Final Thoughts
    • This approach provides the best of both worlds: adequate data security, control over which documents are available to external facing systems, and isolation of the internal and external farms.
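
As an illustration of flattening the internal library hierarchy when publishing to the hub, the sketch below turns a folder path into metadata plus a single flat file name. The function name flatten_for_hub, the HubFileName and SourceFolders fields, and the example paths are hypothetical, and joining folders with underscores is just one possible naming scheme.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Union


def flatten_for_hub(
    server_relative_path: str, library_root: str
) -> Dict[str, Union[str, List[str]]]:
    """Turn a nested library path into a flat hub entry with folder metadata.

    Raises ValueError if the path is not inside library_root.
    """
    path = PurePosixPath(server_relative_path)
    folders = list(path.relative_to(library_root).parent.parts)
    # Prefix the folder names so files with the same name in different
    # folders do not overwrite each other in the flat hub library. Folder
    # names that already contain underscores could still collide, so a real
    # scheme might add a short hash or a document ID.
    hub_name = "_".join(folders + [path.name])
    return {"HubFileName": hub_name, "SourceFolders": folders}


# Hypothetical internal paths
entry = flatten_for_hub(
    "/sites/legal/Docs/Contracts/2013/msa.docx", "/sites/legal/Docs"
)
root_entry = flatten_for_hub("/sites/legal/Docs/readme.docx", "/sites/legal/Docs")
print(entry)
print(root_entry)
```

Keeping the folder names as metadata means the external farm can still filter or group by them, while the hub itself stays a flat library that does not reveal the full internal path.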

On a final note, although this article discusses an internal vs. external farm scenario, the same approaches can easily be applied to internal and external web applications within the same farm.
