OK - so I'm a bit irritated, but this is a great case of Microsoft's dev teams doing something which contradicts documentation and expected behavior. Here is the story, including the business case which this bug stomps all over.
Doug Ware (formerly of Magenic fame and now an independent SharePoint consultant) and I have been engaged by a client of Intellinet's to provide them with a custom web publishing solution. The business scenario is fairly simple. The client has independent authors creating valuable content which the company wants to sell to a variety of customers. The nature of these customers is such that each one needs its own private branded web site, so the custom publishing solution is responsible for taking the centrally created content and pushing it to any number of structurally identical portals. Sounds like an ideal case for Microsoft Office SharePoint Server 2007 WCM, doesn't it? We certainly thought so.
So here is the problem. Create a pair of web applications. Call one Publishing and call the other Destination. On the Publishing site, create some subsites and some content in those subsites. Leave Destination alone for now (a blank template is fine). Now, in the Content Publishing Paths and Jobs section of SharePoint Central Administration, create a path to push content from the Publishing site collection to the Destination site collection. Create a job from that path which pushes the entire Publishing site collection to the Destination site collection. For this first run, it is important to deploy the entire site collection in one shot to guarantee that all the site features and structural elements are synchronized. This last statement was confirmed by Microsoft Support (getting to that bit next). Run the job and check that the Destination site looks EXACTLY like the Publishing site. Notice that even site images and titles are changed!
Phase two of the publishing SHOULD be to brand the client's Destination site with custom themes, site images, layouts, etc. The top level site is really just a landing page directing users to the subsites, and is not expected to be re-published. So go try that - give the Destination site collection's top level site a different theme and title. Now go back to the Central Admin's paths and jobs section. Create a new job which excludes the root site and only includes one of the subsites for publication. Notice before you click OK on the job that the default setting is only to synchronize changes in content. Click OK and run the job. At this point, you would expect nothing to change, right? After all, you didn't make any changes to the Publishing site's selected subsite so there are no deltas. Ready for a surprise? Go check the Destination site's top level site. Remember that new theme and title? They're gone! Ready for another surprise? Go back to the definition of that job you just created, the one which specified a single subsite for replication instead of the entire site collection. Notice anything different? Now the job definition has changed and includes the root site! WTF!?! Doug discovered this when doing some basic POC testing with content publishing at the beginning of the client engagement.
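For anyone who wants to reason about this without a farm handy, here is a toy Python model of what we observed. It is not SharePoint's API: every name in it (`DeploymentJob`, `effective_scope`, `run_job`, the dictionaries standing in for site collections) is made up for illustration, and the rule that the root site is always pulled into the job is an assumption drawn purely from the behavior described above.

```python
from dataclasses import dataclass, field

ROOT = "/"  # hypothetical key for the top level site of a site collection


@dataclass
class DeploymentJob:
    """Hypothetical stand-in for a job definition; not a SharePoint class."""
    name: str
    selected_sites: set = field(default_factory=set)


def effective_scope(job):
    """Sites that actually get deployed. Assumption: the root is always added,
    matching what we saw, regardless of what the job definition selected."""
    return set(job.selected_sites) | {ROOT}


def run_job(job, source, destination):
    """Overwrite every site in the effective scope, then rewrite the job
    definition to that scope (the second surprise described above)."""
    scope = effective_scope(job)
    for url in scope:
        destination[url] = dict(source[url])
    job.selected_sites = scope
    return scope


# Two site collections modeled as {site url: settings}.
source = {
    "/": {"title": "Publishing", "theme": "Default"},
    "/news": {"title": "News", "theme": "Default"},
}
# Initial full deployment, followed by branding the Destination root.
destination = {url: dict(settings) for url, settings in source.items()}
destination["/"] = {"title": "Client A", "theme": "Custom"}

# A job that selects only the subsite and excludes the root.
job = DeploymentJob("News only", {"/news"})
run_job(job, source, destination)

print(destination["/"])            # the branding on the root has been overwritten
print(sorted(job.selected_sites))  # the job definition now includes the root
```

What we expected instead corresponds to `effective_scope` returning only `job.selected_sites`, which would leave the Destination root and its branding untouched.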
So Doug and I called Microsoft Premier Support Services. We spoke with a support representative named Lokanath who took a few days to understand the actual issue we were reporting. Once he understood it, he referred the issue to the internal development teams. Here is what I got back a week and a half after opening the issue:
"In any case, the behavior customer is seeing - Content deployment will always deploy root site collection even if the content deployment job only defines subsites to be deployed and explicitly excluded the root site. This behavior is by-design and that is the expected behavior. The reason is that Content Deployment requires the site collection features to be synchronized in the source and target site collections. And instead of assuming that they are synchronized, Content Deployment will always re-synch each time it is ran. To re-synch, Content Deployment needs to re-deploy the root site every time. The final outcome of the content deployment is to get the synchronized source and the destination site collections."
Bogus answer guys! This totally contradicts your documentation and even the tool's UI! The following links take you to Microsoft documentation stating that it is not necessary to replicate the entire site collection when using Web Content Publishing:
- http://office.microsoft.com/search/redir.aspx?AssetID=AM101639351033&CTT=5&Origin=HA101639821033 – a Visio diagram outlining frequent content updates on a news publishing site. This document is linked to on this page: http://technet.microsoft.com/en-us/office/bb310619.aspx
- http://office.microsoft.com/search/redir.aspx?AssetID=XT102253341033&CTT=5&Origin=HA101680161033. This document is linked to from this page: http://office.microsoft.com/en-us/sharepointserver/HA101680161033.aspx
- http://blogs.msdn.com/sharepoint/archive/2006/05/02/588140.aspx, Mr. Butler says “A job is associated with a path, and it determines exactly which sites in the source site collection will be deployed and on what schedule. You can have many different jobs for a given path, each running on different schedules and deploying specific sections of your site. That’s right – a job has a schedule and can deploy content updates regularly without the need to manually kick it off every time. For example, let’s say you have a Press Releases site that needs to be updated every hour, and an Employee Bios site that only needs to be updated every month. You would create two different jobs, one that runs every hour and deploys the Press Releases site, and another that runs monthly and deploys the Employee Bios site.”
- This MSDN article, http://msdn2.microsoft.com/en-us/library/ms549024.aspx, says “A job is associated with a path, and it determines exactly which sites in the source site collection are deployed and on what schedule. You can associate many jobs with a single path, and each can run on a different schedule and deploy specific sections of your site”
- The application’s user interface under Site Options specifically says to pick “specific sites” to publish and gives you an interface to pick the desired sites and exclude undesired sites.
All of these sources point to the ability to publish a single site independently of other sites, rather than only an entire site collection at a time.
I'm going to lobby Microsoft to consider this a bug and not a side effect of good design and bad documentation. If the only use of publishing is to create exactly identical sites, it is limited in functionality to supporting only a single staging/production environment type of scenario. News publishing and general CMS strategies immediately go out the window. I'll post a follow-up to this entry if and when I hear back from PSS. IMHO this is a pretty big bug. Doug Ware was even more colorful in his general description of the problem.
[edit] - Before I even posted this, but after I'd typed it up in draft form, I received a call from a manager in the SharePoint product team's support staff. He heard about my dissatisfaction with the proposed resolution and has offered to put one of his tech leads on the case and to call me back by 2pm Friday. If nothing else, at least they're trying to leave me with a satisfied feeling as I work through this issue.