SharePoint “15” App Model – Introduction



Microsoft SharePoint 2013 - Search Planning
Verified Against Build #15.0.4128.1014
Prepared by Sriram Bala
SharePoint Practice

Table of Contents

Search Overview
    What is the basic structure for Search in SharePoint 2013?
    Logical Architecture
        Crawl and Content Processing Components
        Analytics Processing Component
        Index and Query Processing
        Search Administration Component
    Physical architecture
        Small search farm
        Medium search farm
        Large search farm
    Server Roles
    Hardware requirement and scaling Considerations
Search Engine Optimization (SEO)
    SEO Properties for publishing pages
        How you can use SEO Properties for publishing pages
    Canonical URL
        How you can use the canonical URL
    Site ownership verification
        How you can use the site ownership verification feature
    XML sitemap
        How can you use XML Sitemap
    Robots.txt file
        How you can use Robots.txt file
    Friendly URLs
        How you can use friendly URLs

Search Overview

The search architecture contains search components and databases. How you structure the search architecture depends on where you intend to use search: for the enterprise or for Internet sites.
When building the search architecture, take into account high availability and fault tolerance, the volume of your content, and the estimated number of page views and queries per second.

What is the basic structure for Search in SharePoint 2013?

We've broken it down here:
• MS Search transfers responsibility to the Host Controller Service
• The MS Search process remains the core process for the crawl component
• Independent Node Runner processes run for each component

Logical Architecture

The search components can be categorized into five groups or processes.

Crawl and Content Processing Components

In SharePoint Server 2013, the crawl and content processing architecture is responsible for crawling content from supported content sources, delivering crawled items and their metadata to the content processing component, and processing the content. It breaks down into the following components:

• Crawl component: crawls configured content sources using the connectors and protocol handlers associated with the target content source. The actual content and its metadata are then passed to the content processing component.
• Crawl database: used by the crawl component to store information about crawled items and to track the history of crawls that have taken place.

  Responsibility:
  - Retrieves content that needs to be indexed
  - Brings the actual content and the metadata
  - Invokes the protocol handlers
  - Uses the crawl database to maintain the list of items to be crawled

• Content processing component: receives items, processes and parses them using format handlers and iFilters, and transforms them into artifacts that can be added to the search index. This includes mapping extracted properties to the properties defined through the search administration component.
• Link database: stores information about the links and URLs found during content processing.

  Responsibility:
  - Processes content from the crawler and feeds it to the index
  - New parser handler introduced: the format handler
  - Writes links to the link database
  - Generates phonetic name variations
  - Content Submission Service (CSS)

The SharePoint Server 2013 crawl and content processing architecture is flexible in that it enables you to scale out the crawl and content processing operations.

Analytics Processing Component

In SharePoint Server 2013, the analytics processing component is directly integrated into the search architecture and is no longer an individual service application. It breaks down into the following components:

• Analytics processing component: responsible for processing search analytics and usage analytics. It performs search analytics by analyzing crawled items, and usage analytics by analyzing how users interact with those items. For example, user interaction information is retrieved from the event store, which has been aggregated from the usage files on each of the web front ends in your server farm, and is then analyzed by the analytics processing component. This enables a wide range of sub-analyses to be performed.
• Content processing component: receives the search and usage analytics results, which in turn are used to update the index.

  Responsibility:
  - Uses search analytics to analyze crawled items, executed queries and clicked search results
  - Generates usage reports of what has been viewed, which sites have been visited, and how many times an item has been viewed
  - Can be scaled by adding more analytics processing component (APC) roles
  - Stores its data in the analytics reporting database and the link database
  - Event store

• Link database: stores information extracted by the content processing component. The analytics processing component updates the link database with additional analytical information, for example, the number of times an item was clicked.
• Analytics reporting database: stores the results of usage analysis. The SharePoint Server 2013 analytics processing architecture is flexible in that it enables you to scale out analytics processing by adding analytics component instances to your search topology. This enables analytics processing to complete faster.
• Event store: holds usage events that are captured on the front end, such as the number of times an item is viewed. These usage events are stored as log files on the application server that hosts the analytics processing component.

The analytics processing component performs the following analysis and reporting.

Search Analytics

• Link and anchor text analysis: analyzes links and anchor texts from the document corpus to improve the recall and precision of search results.
• Click distance: calculates the distance (in clicks) between a set of authoritative sites and specific items, which helps determine the relevance of a site to a particular item.
• Search clicks: uses information about what people actually click (and do not click) in search results to improve relevancy.
• Deep links: uses information about what people actually click in the search results to dynamically calculate which pages under a given site are considered important sub-pages. These pages are shown in the search result as "important" shortcuts for the site, almost like a dynamically generated site map.
• Social tags: affect ranking and recall when a query expression matches a social tag.
• Social distance: calculates first- and second-level colleagues. This information is used in people search to boost people who are closer to you in the org chart.
• Search reports: analyzes and aggregates search click data to provide search reports such as top queries and queries with no results.

Usage Analytics

• Recommendations: creates recommendations between items based on how items are used.
  This supports the scenario "people who viewed item A also viewed items X, Y and Z".
• Usage counts: counts events for items, for example how many times an item is opened overall, not just from the search results page. The counts also carry a time element, distinguishing what has happened recently from what has happened over the life of the item. They are kept for items and also at the site collection, site and tenant levels, for example the number of views and the number of unique visitors.
• Activity ranking: ranking based on activity in SharePoint content that happens outside of search. For example, when people locate documents directly and click links to navigate, this traffic is captured and used as input to the ranking. Items with a lot of usage activity (clicks, views, buys) typically get a higher activity rank score than less popular items.

Index and Query Processing

In SharePoint Server 2013, the indexing and query processing architecture is responsible for writing processed items received from the content processing component, handling queries and returning result sets for the query processing component, and moving indexed content based on changes to the search topology.

SharePoint Server 2013 search maintains an index of all processed content (including analytical information). The indexing component can scale using the following features:

• Index partition: a logical portion of the entire search index. Index partitions enable horizontal scaling in that they let you spread the index over multiple servers in your server farm.
• Index components: an individual index partition can be supported by one or more index components, each hosting a copy (replica) of the index partition. A primary index component is responsible for updating the index partition, whereas passive index components are used for fault tolerance and increased query throughput.
  This, in essence, supports vertical scaling of your search topology.

  Responsibility:
  - The place where crawled data is stored
  - Allows for partial data updates
  - All index partitions are kept in sync

• Query processing component: responsible for receiving queries from web front ends, analyzing and processing each query, and submitting the processed query to the index component.

  Responsibility:
  - Invoked when a query needs to be executed
  - Analyzes and processes the query obtained from the web front end (WFE)
  - Sends the processed query to the index component
  - Does initial linguistic processing (spell check, thesaurus, etc.)
  - Transforms the query if a query rule matches

Search Administration Component

In SharePoint Server 2013, the search administration component is responsible for running the system processes based on the configuration and search topology. It breaks down into the following components:

• Search administration component: executes the system processes required for search, performs changes to the search topology, and coordinates the activities of the various search components in your search topology.
• Search administration database: stores search configuration information. This includes topology, crawl rules, query rules, managed property mappings, content sources, and crawl schedules.

  Responsibility:
  - Handles topology changes and provisions search
  - Makes sure all components are up and running
  - Uses the search administration database
  - Sends schedules to the crawl component for its content sources

Physical architecture

Content and Size of Search Architecture

Volume of content        Sample search architecture
0-10 million items       Small search farm
10-40 million items      Medium search farm
40-100 million items     Large search farm

Small search farm

If you have up to 10 million items, the small search farm will probably be the most suitable farm for you.
Microsoft tested this search architecture and measured that it can crawl 50 documents per second and serve 10 queries per second. At 50 documents per second, the first full crawl of 10 million items takes about 55 hours (10,000,000 / 50 = 200,000 seconds).

Medium search farm

If you have between 10 and 40 million items, the medium search farm will probably be the most suitable farm for you. Microsoft tested this search architecture and measured that it can crawl 100 documents per second and serve 10 queries per second. At 100 documents per second, the first full crawl of 40 million items takes about 110 hours (40,000,000 / 100 = 400,000 seconds).

Large search farm

If you have between 40 and 100 million items, the large search farm will probably be the most suitable farm for you. Microsoft tested this search architecture and measured that it can crawl 200 documents per second and serve 10 queries per second. At 200 documents per second, the first full crawl of 100 million items takes about 140 hours (100,000,000 / 200 = 500,000 seconds).

Server Roles

• Web Server
• Application Server with Search Components
• Database Server

Hardware requirement and scaling Considerations

Scaling Guidelines

Search Engine Optimization (SEO)

Overview

When building public-facing websites you could assume that your site will eventually get indexed, but it is better to make sure that your content appears high in the organic search results. Search Engine Optimization (SEO) is the craft of optimizing public-facing websites for indexing by search engines.

From an SEO perspective there are a number of things you can do to improve the ranking of your content in organic search results. Some of them have to do with the content itself, but there are also a number of technology-related measures, such as XML sitemaps and meta tags, that you can use. Among the many web content management improvements in SharePoint Server 2013 are its SEO features.
This section gives an overview of the search engine optimization features provided with SharePoint Server 2013, and uses examples to show how you can apply them.

Site Collection SEO settings can be configured through Site Settings > Site Collection Administration > Search engine optimization settings. On the SEO Settings page you can specify custom meta tags that will be added to all pages, and you can control which query string parameters are filtered out of canonical URLs.

Rendering SEO Properties

After SEO properties have been configured, some of them are rendered in the HTML so that search engines can process them while crawling the content. The rendering is done by a number of delegate controls that are activated by the SearchEngineOptimization Feature. All of those controls are associated with the AdditionalPageHead Delegate Control, so make sure it is present on your Master Page if you want to use the standard SharePoint 2013 SEO capabilities on your website.

The following Delegate Controls are responsible for rendering SEO properties on the page. All of them are located in the Microsoft.SharePoint.Publishing assembly, in the Microsoft.SharePoint.Publishing.WebControls namespace.

• Browser Title: SeoBrowserTitle
• Meta Description: SeoMetaDescription
• Keywords: SeoKeywords
• Exclude From Internet Search Engines: SeoNoIndex
• Canonical URL: SeoCanonicalLink
• Custom Meta Tags (defined at Site Collection level): SeoCustomMeta

SEO Properties for publishing pages

On all publishing pages, you can specify the following:
• a browser title
• a meta description that can be displayed on a search results page
• keywords that describe the context of the page
• whether Internet search engines should exclude the page from search results

How you can use SEO Properties for publishing pages

Browser Title

The page title has a big influence on the ranking of web pages in search results.
From the search engine's perspective, each page has two titles. First there is the title defined by the title tag in the head section of your page. This title is displayed in the browser's title bar and is used as the title in search results. Then there is the page title defined by the h1 tag in the body of your page. Both titles are very important from a search engine optimization perspective, and you should use them both to optimize the ranking of your web pages.

By default, the Title property that you specify for your page is used for both the content area (h1) and the browser's title bar (title). SharePoint Server 2013 allows you to differentiate between the two. To specify a different browser title, click to edit your page. In the Ribbon, activate the PAGE tab. Next, in the Manage group, click the drop-down arrow on the Edit Properties button and choose the Edit SEO Properties option. On the SEO Properties page, in the Browser Title field, specify the title that you want to display in the browser's title bar.

A word about hierarchical browser titles

A common practice for building browser titles is to make them hierarchical, for example:

    AX 100 – Tablets – Contoso Electronics

In this example, Contoso Electronics is the name of the website, Tablets is the name of the current product category, and AX 100 is the name of the current product. On complex websites, hierarchical titles can help search engines understand the structure of your website. Hierarchical titles also make it easier for your visitors to understand where in your website hierarchy they are currently browsing.

If you enable the Search Engine Optimization Site Collection Feature, the browser title will be rendered by the SeoBrowserTitle control. SeoBrowserTitle is a Delegate Control, which is registered with the activation of the Search Engine Optimization Site Collection Feature.
It replaces the contents of the PlaceHolderPageTitle placeholder with the browser title. The browser title is either the same as the page title, or what you specified on the SEO Properties page.

Important: The Search Engine Optimization Site Collection Feature is hidden and cannot be activated through the SharePoint UI. Instead, you can activate it through PowerShell by using the Enable-SPFeature cmdlet.

From SharePoint's perspective, a typical hierarchical title would look like:

    Page – Category – Site

When using Managed Navigation, there is no distinction between pages and categories: they are both terms in the navigation hierarchy. Additionally, there is no standard control available that would allow you to render the title of the parent navigation node. An exception to this situation is when using cross-site publishing, which is discussed later in this article.

If omitting the category name from the hierarchical title is a sufficient workaround, you can render the hierarchical title as follows:

    <!--MS:<SharePoint:PageTitle runat="server">-->
        <!--MS:<asp:ContentPlaceHolder id="PlaceHolderPageTitle" runat="server">-->
            <!--MS:<SharePoint:ProjectProperty Property="Title" runat="server">-->
            <!--ME:</SharePoint:ProjectProperty>-->
        <!--ME:</asp:ContentPlaceHolder>-->
        <!--MS:<asp:Literal Text=" - " runat="server">-->
        <!--ME:</asp:Literal>-->
        <!--MS:<SharePoint:ProjectProperty Property="Title" runat="server">-->
        <!--ME:</SharePoint:ProjectProperty>-->
    <!--ME:</SharePoint:PageTitle>-->

Meta Description

Even though it is not always used by Internet search engines, it is important that you provide a meta description for your web pages. Internet search engines decide for themselves what description they want to show for the search result.
By providing a meta description, you increase your chances of having the Internet search engines use "your" meta description.

As with Browser Title, Meta Description is one of the SEO properties that you can manage on the SEO Properties page. After you have set a meta description, SharePoint Server 2013 will render it using the SeoMetaDescription control. This is a Delegate Control that is registered with the activation of the Search Engine Optimization Site Collection Feature. Since it is a Delegate Control, the only additional step needed to have the meta description rendered in the HTML is to ensure that your Master Page contains the AdditionalPageHead Delegate Control. This Delegate Control is used as a container for all SharePoint Server 2013 SEO controls.

Exclude from Internet Search Engines

On your website, there are pages that you don't want Internet search engines to index. Examples of such pages are archive pages or an error page. To prevent such pages from being indexed, you can select the Exclude from Internet Search Engines checkbox on the SEO Properties page. When it is selected, SharePoint Server 2013 adds a robots meta tag with a noindex directive to the web page's HTML. This tag is rendered by the SeoNoIndex Delegate Control, which is activated with the Search Engine Optimization Site Collection Feature.

Canonical URL

When indexing web pages, Internet search engines register them with a specific URL. If a web page can be accessed from different URLs, the page will be indexed with multiple URLs. To improve the ranking of your web pages in search results, you need to ensure that every page is indexed under one URL only.
Having the same page indexed under multiple URLs not only divides the search result ranking of that particular page amongst the different URLs, but also introduces a risk of being penalized for content duplication.

Using canonical URLs is one way to control the URL under which a web page is indexed. SharePoint Server 2013 can automatically generate a canonical URL for web pages.

How you can use the canonical URL

Upon the activation of the Search Engine Optimization Site Collection Feature, SharePoint Server 2013 will automatically generate a canonical URL for you. On the Search Engine Optimization Settings page, you can configure which query string parameters should be included in the canonical URL. As a result, you will find a link tag with rel="canonical" rendered in the HTML of your page.

Configuring canonical URL parameters for dynamic web pages

When building dynamic web pages, the content of the pages may vary based on parameters passed in the URL, for example to display articles published in a certain month or by a certain author. If the variations are minor, such as sorting, you wouldn't want Internet search engines to index the same page twice. On the other hand, if the parameter causes the page to display a different set of products, you would want Internet search engines to index the page twice.

By default, SharePoint Server 2013 includes all query string parameters in the canonical URL. However, you can change this by navigating to the Site settings of your website and opening the Search Engine Optimization Settings page. In the Consolidate link popularity with canonical URLs section, choose the Filter link parameters option, and provide a list of the query string parameters that should be included in canonical URLs on your website. Separate the parameters with semicolons.

Site ownership verification

Optimizing your website for Internet search engines is an ongoing process.
Because usage patterns and search engine algorithms change, you have to continuously monitor how your web pages are performing in Internet search engines.

The most popular Internet search engines offer tools that can help you analyze how your website is ranked in that particular search engine. However, before you can start to use these tools, you have to submit your website and confirm that you are the owner.

Although the process of verifying a website's ownership may differ per webmaster tool, these tools often allow you to verify site ownership by including a generated snippet of HTML in your website. After the search engine has scanned your website and discovered the snippet, you are allowed to use the web analytics tool to monitor the performance of your website.

By using SharePoint Server 2013's site ownership verification feature, you can easily include the verification code on your website.

How you can use the site ownership verification feature

SharePoint Server 2013 allows you to include a snippet without having to modify any of your Master Pages or Page Layouts. To include the snippet, navigate to the Site settings of your Site Collection and, in the Site Collection Administration group, click the Search engine optimization settings link. By using the Verify ownership of this site with search engines option, you can include the snippet in your pages to complete the verification process.

XML sitemap

If Internet search engines can't find your website, then your visitors can't find it by using a search engine. Over the years, Internet search engines have improved their mechanisms for discovering web pages. In most cases, all of your web pages will eventually be indexed. However, you can help Internet search engines discover the content of your website by creating an XML sitemap. An XML sitemap is an XML file that contains the URLs of all your pages.
It can also include additional information, such as when the page was last changed, how frequently it changes, and how important it is compared to other pages on your website.

Manually creating and maintaining an XML sitemap is very tedious. Luckily, SharePoint Server 2013 is capable of creating an XML sitemap automatically for you.

How can you use XML Sitemap

To have SharePoint Server 2013 create an XML sitemap for you, all you have to do is activate the Search Engine Sitemap Site Collection Feature. The Search Engine Sitemap job, a timer job that by default runs once every day, ensures that your XML sitemap is kept up to date.

Important: Although SharePoint Server 2013 creates an XML sitemap daily, you can adjust the schedule to meet your requirements. Please note that creating an XML sitemap frequently might have an impact on the performance of your website or environment.

After the XML sitemap has been created, you can submit it to the Internet search engines of your choice for processing. Most Internet search engines offer a tool for submitting an XML sitemap in their webmaster toolkit.

When generating an XML sitemap for your website, SharePoint Server 2013 uses the URL associated with the Internet Zone (SPSiteUrl for Host-named Site Collections and Alternate Access Mapping for path-based Site Collections). If no URL is associated with the Internet Zone, SharePoint Server 2013 uses the default URL.

The XML sitemap generated by SharePoint Server 2013 includes all Publishing Pages and content that is published using catalogs, except for the pages that have been excluded through the Exclude from Internet Search Engines option described earlier. The contents of the XML sitemap are retrieved using search technology, so if you don't see all of the content in the XML sitemap, check whether the content has been crawled and indexed. The generated XML sitemap includes not only publishing pages, but also catalog content.
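
As a rough illustration of the file format described above, and not of how SharePoint generates it, the following Python sketch builds a small XML sitemap using only the standard library. The page URLs, the `exclude_from_search` key and the `build_sitemap` helper are hypothetical names chosen for the example; the element names and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as a string) for a list of page dictionaries.

    Each page needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional. Pages flagged with the hypothetical key
    "exclude_from_search" are left out, mirroring the exclusion option.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if page.get("exclude_from_search"):
            continue
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages: two are listed, the error page is excluded.
pages = [
    {"loc": "http://www.contoso.com/computers/tablets",
     "lastmod": "2014-03-30", "changefreq": "weekly", "priority": 0.8},
    {"loc": "http://www.contoso.com/computers/tablets/ax-100"},
    {"loc": "http://www.contoso.com/error", "exclude_from_search": True},
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Only `loc` is required for each entry; the other three elements are the optional hints about change date, change frequency and relative importance mentioned above.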
Because it covers both publishing pages and catalog content, the XML sitemap is a very powerful tool for helping Internet search engines discover your content.

Robots.txt file

Internet search engines have a limited amount of time to spend on your website. To optimize the crawl process, you can create a file named robots.txt for your website. In this file, you can specify web pages that search engine crawlers should ignore.

How you can use Robots.txt file

SharePoint Server 2013 simplifies creating and managing robots.txt files. From Site settings, navigate to the Search Engine Sitemap Settings page. On this page, you can specify web pages that should not be crawled by Internet search engines.

Note: To see the Search Engine Sitemap Settings page, you have to activate the Search Engine Sitemap Feature.

Important: The robots.txt file is created together with the XML sitemap. Depending on the timer job schedule, it may take some time for the robots.txt file to reflect the latest changes.

The robots.txt file does not need to be submitted to Internet search engines. It is automatically processed each time an Internet search engine crawler scans your website for new content. SharePoint Server 2013 automatically appends the URL of the XML sitemap to the contents of the robots.txt file.

Friendly URLs

URLs have a big influence on the ranking of your web pages in search results. SharePoint Server 2013 allows you to control your URLs so that you can optimize them for Internet search engines.

How you can use friendly URLs

By using the Managed Navigation method, you can use the SharePoint Managed Metadata Service to model the navigation hierarchy of your website through taxonomy terms. For each term you can specify a search engine-optimized title and URL.

Additionally, SharePoint Server 2013 allows you to associate navigation terms with specific pages. When you use Managed Navigation, you can use extensionless URLs, for example contoso/computers/tablets.
Extensionless URLs are more future-proof and require less migration effort when moving from other content management systems.
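
To make the shape of such an extensionless URL concrete, here is a minimal Python sketch that joins a site name and a path of navigation term labels into a friendly URL. It is only an illustration: the `to_segment` and `friendly_url` helpers and the normalization rule (lower-casing and hyphenating) are assumptions made for this example, not the way SharePoint derives a term's URL segment.

```python
import re


def to_segment(label):
    """Turn a term label into a URL segment (assumed rule for this sketch):
    lower-case it and replace runs of non-alphanumeric characters with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def friendly_url(site, term_path):
    """Join the site name and the term segments into an extensionless URL."""
    return "/".join([site.rstrip("/")] + [to_segment(term) for term in term_path])


# Hypothetical navigation path: Computers > Tablets
url = friendly_url("contoso", ["Computers", "Tablets"])
print(url)  # contoso/computers/tablets
```

Because the URL is built from the navigation terms rather than from a file name, it contains no extension such as .aspx and does not expose where the page is physically stored.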