A technical overview of the Citrix App Layering 4.x Elastic Layering Architecture and Configuration details.
What is Elastic Layering
Elastic Layering in Citrix App Layering is a method to dynamically deploy applications to a virtual machine at the time the user logs on to the VM. The Citrix Layering Services running in the target VM are configured to use an SMB network share as the Citrix App Layer Repository. The repository contains both the layers and the configuration files required for deployment. The Layering Services read the configuration files and then mount the layers assigned to the user. This process is designed to work entirely within the guest operating system using native Windows VHD file mounts over the network.
This creates a scalable environment that is also easy to manage, one where IT administrators can provide recovery or disaster recovery simply by replicating the file share within the same site, to a secondary site, or to the cloud. Because App Layering leverages software native to the Windows OS, it eliminates dependencies on the hypervisor and removes the need to replicate databases and management servers in multi-site environments.
Guest Layering Service
The Citrix Layering Service shown below will be added to a layered image if enabled within the layered image template.
It is this service that is responsible for enumerating and mounting elastic layers right before user logon during the “userinit” process.
Configuration files in the Layer Repository
Three files stored in the Elastic Layering Share define the mapping between users, groups, machines, and layers: ElasticLayerAssignments.json, Layers.json, and MachineAssignments.json, as seen below.
Layers.json defines the layers in the repository and the metadata about those layers that is used by the Citrix Elastic Layering Filter Driver.
ElasticLayerAssignments.json contains the user and group mappings to individual application layers. It contains an entry for each group or user ID that has assigned applications. The picture below shows that the group “ITGuys” has five layers assigned (4489220, 448922, 448224, 7208963, and 8028162).
The Citrix Layering Service can then use the Layers.json file to obtain the path to the VHD to be mounted. We can also see a Script Path, which defines the Run Once Script that will run when the layer is mounted. The Priority number determines the order in which the layers are applied by the filter driver.
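The lookup described above can be sketched in Python. The field names used here (`Assignments`, `Id`, `LayerIds`, `VhdPath`, `ScriptPath`, `Priority`), the user name, and the VHD and script file names are hypothetical stand-ins, because the actual schema of these files is not reproduced in this paper. Applying lower Priority numbers first is also an assumption.

```python
import json

# Hypothetical, simplified file contents; the real schema may differ.
ASSIGNMENTS = json.loads("""
{"Assignments": [
  {"Id": "ITGuys", "LayerIds": [4489220, 7208963]},
  {"Id": "jsmith", "LayerIds": [8028162]}
]}
""")
LAYERS = json.loads("""
{"Layers": [
  {"Id": 4489220, "VhdPath": "app_a.vhd", "ScriptPath": null, "Priority": 30},
  {"Id": 7208963, "VhdPath": "app_b.vhd", "ScriptPath": "run_once.cmd", "Priority": 10},
  {"Id": 8028162, "VhdPath": "app_c.vhd", "ScriptPath": null, "Priority": 20}
]}
""")


def layers_for(identities, assignments=ASSIGNMENTS, layers=LAYERS):
    """Return the layer records assigned to a user and their groups.

    identities: set containing the user ID and the names of the user's groups.
    The result is ordered by Priority (assumed: lower number applied first).
    """
    by_id = {layer["Id"]: layer for layer in layers["Layers"]}
    wanted = set()
    for entry in assignments["Assignments"]:
        if entry["Id"] in identities:
            wanted.update(entry["LayerIds"])
    found = [by_id[layer_id] for layer_id in wanted if layer_id in by_id]
    return sorted(found, key=lambda layer: layer["Priority"])


for layer in layers_for({"jsmith", "ITGuys"}):
    print(layer["Id"], layer["VhdPath"], layer["ScriptPath"])
```

A layer assigned through several groups is collected only once, which mirrors the idea that a VHD is mounted a single time regardless of how many assignments point to it.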
MachineAssignments.json defines machine associations. You can use a computer name pattern containing wildcards to associate a set of computers with any AD group. Any user who logs on to a machine that matches the pattern will receive the Elastic Layers assigned to the group. Note: the only wildcard character that can be used is “*”.
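As an illustration of how a name pattern with “*” as its only wildcard can be evaluated, here is a minimal Python sketch. It is not the Layering Service's implementation; the function name and the machine names are hypothetical, and the case-insensitive comparison is an assumption based on how Windows treats computer names.

```python
import re


def machine_matches(pattern: str, machine_name: str) -> bool:
    """Match a computer name against a pattern where '*' is the only wildcard."""
    # Escape every literal piece, then join the pieces with "any run of characters".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Assumption: names are compared case-insensitively.
    return re.fullmatch(regex, machine_name, flags=re.IGNORECASE) is not None


print(machine_matches("VDI-W10-*", "vdi-w10-007"))  # True
print(machine_matches("VDI-W10-*", "RDS-01"))       # False
print(machine_matches("RDS?01", "RDS-01"))          # False: '?' is treated literally
```

Escaping each piece keeps characters such as “?” or “[” from being interpreted as wildcards, which matches the stated restriction.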
Layer VHD files stored in the Elastic Layer Repository
When an Elastic Layer is first assigned, the VHD file that encapsulates that layer is copied to the Elastic Layer Repository in the Unidesk\Layers\App directory. Each VHD maps to a specific application layer and version (revision) of the application layer. It is this directory, and the contents of the root of the share that should be replicated for load balancing and Disaster Recovery. If a Layer is deleted from the Repository, it will be copied back to the repository the next time that Layer is assigned to a user.
It is possible to map the layer disk file back to the layer using the Layer ID from the UMC. This information is also in the Layers.json file.
The entire Elastic Layering initialization process takes only a few seconds right before a user logs in to a desktop. During this process, ElasticLayerAssignments.json is read to determine the layer assignments and Run Once Scripts, and the assigned layers are mounted by the service.
Each layer that is mounted includes its own registry hives, which App Layering calls RSD hives and which are mounted in the registry under HKLM. Note that the hive names always start with “RSD_VIRTP” as shown below.
The App Layering filter driver used for Elastic Layering will dynamically enumerate these hives to find the desired keys and return their path. Therefore, when a registry value is accessed it is redirected to the appropriate RSD_VIRTP key instead of where that application would normally live in the registry. The virtual disk is then made available as part of the App Layering Composite File system.
Below you can see that the VHD for each of the assigned layers shows up in Disk Management, while the end user is presented with only one ULayeredImage C: drive.
Session Based Elastic Layers
Session based elastic layers work just like normal elastic layers with one exception: they are restricted to a particular user session. No other user logged in to that session host can see that application unless they have also been authorized by the App Layering Administrator. A local system administrator can log in to the system and see all the layers that are assigned to a specific session host.
Since multiple users can be logged into a session host the first thing that layering services will do is check to see if the requested layers are already mounted on the host. If the layer is already mounted, the user is simply “authorized” to see the registry and file system data. Once the user is logged in, they will see their applications, just as other authorized users do. When a layer is not already mounted on a session host, it is added during the logon process the same way it would be during a desktop logon.
When a user logs off from a session host, the applications associated with them are left on that host. The assumption is that there could be other logged on users who are accessing that data. If for some reason a layer must be removed from a session host, the administrator will have to wait until all users are logged off and the host will have to be rebooted.
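The mount-once, authorize-per-user behavior described in this section can be modeled with a small Python sketch. It is a toy model under stated assumptions, not the service's code: the class and method names are hypothetical, and "mounting" is represented only by adding a dictionary entry.

```python
class SessionHost:
    """Toy model of layer handling on a multi-user session host."""

    def __init__(self):
        # layer ID -> set of users currently authorized to see that layer
        self.mounted = {}

    def logon(self, user, layer_ids):
        """Mount layers that are not mounted yet and authorize the user for all."""
        newly_mounted = []
        for layer_id in layer_ids:
            if layer_id not in self.mounted:
                self.mounted[layer_id] = set()  # stands in for mounting the VHD
                newly_mounted.append(layer_id)
            self.mounted[layer_id].add(user)    # user may now see files and registry
        return newly_mounted

    def logoff(self, user):
        """End the user's authorization; layers stay mounted until reboot."""
        for users in self.mounted.values():
            users.discard(user)

    def visible_layers(self, user):
        return {lid for lid, users in self.mounted.items() if user in users}


host = SessionHost()
print(host.logon("alice", [4489220, 7208963]))  # both layers are mounted
print(host.logon("bob", [7208963]))             # nothing new to mount
host.logoff("alice")
print(sorted(host.mounted))                     # both layers are still mounted
```

The second logon returns an empty list because the layer is already on the host, so only the authorization step is needed.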
Machine Based Elastic Layers
Machine based elastic layers work just like normal elastic layers with one exception: they are restricted to a particular set of machines, and all users that log in to one of those machines will be able to use its associated layers. This is accomplished by adding a machine to an Active Directory group and adding an application assignment to that group.
Machine based assignment can also be limited to a specific machine name pattern by editing the group's Machine Association within the Enterprise Layer Manager. This will cause the Elastic Layer to be assigned regardless of which user logs on to the VM. Any layers assigned in this way are added to the system the first time that a user logs in to that machine after boot. The machine assignments are checked at the same time as the user assignments.
As part of the finalization of an Application Layer, the Elastic Fit Analysis is run on the layer contents to identify whether the captured application is likely to work when elastically assigned. The analysis looks for specific Windows changes or components that are known not to work correctly when elastically deployed.
Good Elastic Fit: This layer should work when deployed elastically.
Poor Elastic Fit: This layer will probably not work when deployed elastically, or may behave differently than when it is deployed in a Layered Image.
The results of the analysis are given as a simple Green Check for a Good Elastic Fit or a Red X for a Poor Elastic Fit. This is shown on the details view of the layer by clicking the “i” on the layer’s icon in the Layers tab of the ELM.
Any time an application has a Poor Elastic Fit, you will be warned each time you attempt to add an Elastic Assignment, as a reminder that it has been identified as not likely to fully function. The Poor Elastic Fit will not prevent the admin from adding an Elastic Assignment; the admin simply needs to click the Assign Elastically button each time to ignore the warnings.
Note: a Poor Elastic Fit could mean that only a portion of the application will not function completely, not that the application will not work at all. Testing and validation are always important when determining whether an application layer will work correctly.
The Elastic Fit Analysis is a series of rules defined in the LayeringRules.json file on the ELM. A rule is a check for the existence of a specific registry key, for a registry key value, or for the modification of a file, and the rules run against the finalized layer .vhd very quickly. There are dozens of rules contained in the file, and they are improved and added to with every App Layering version.
"Comment": "Scheduled tasks in your layer might not work",
"Comment": "service start values changed in the layer will not be honored.",
"Comment": "Changes made to how this system boots will not work from an elastic layer",
Each rule has a Name and Comment that are displayed in the Elastic Fit Details of the Layer in the ELM. The Severity is Low, Medium, or High, indicating whether that item is unlikely to cause an issue, may cause an issue, or is likely to cause an issue, respectively. The RulePath is where the check looks in the file system or the registry, and the RuleValue is what the check looks for, either comparing the value or checking whether the file or key exists at all.
Low: This is unlikely to cause any change in behavior or functionality for most applications.
Medium: This may cause minor changes in behavior or functionality for some applications.
High: This is likely to cause significant changes in behavior or functionality for many applications.
During the Elastic Fit Analysis, any rule check that comes back with a result is added as a warning in the Elastic Fit Details, including the Severity, Name, and Description. The highest severity among the triggered rules is used when determining the Elastic Fit. An application layer may have several Low severity warnings and still be given a Good rating for Elastic Fit, as the warnings are meant to show potential problems.
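The "highest severity wins" logic can be sketched as follows in Python. The paper states only that several Low warnings can still yield a Good rating; the severity at which the rating flips to Poor is not stated, so the `poor_at` parameter (defaulting to Medium here) is an assumption, as are the function and constant names.

```python
SEVERITY_ORDER = {"Low": 0, "Medium": 1, "High": 2}


def elastic_fit(triggered_severities, poor_at="Medium"):
    """Rate a layer from the severities of the rules that were triggered.

    poor_at is an assumed threshold: the lowest severity that makes the fit Poor.
    """
    if not triggered_severities:
        return "Good"
    # Only the single highest severity matters, not the number of warnings.
    highest = max(triggered_severities, key=SEVERITY_ORDER.__getitem__)
    if SEVERITY_ORDER[highest] >= SEVERITY_ORDER[poor_at]:
        return "Poor"
    return "Good"


print(elastic_fit(["Low", "Low", "Low"]))  # Good
print(elastic_fit(["Low", "High"]))        # Poor
```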
What Types of Applications Don’t Elastically Layer
Generally, any application that modifies system-wide settings or contains services that need to be running when the user is not logged in is not a good candidate for Elastic Layering and should only be deployed as part of a layered image. Typical examples are anti-virus or security software, single-sign-on extensions, remote access or administration agents, and even device or printer drivers. Because driver registration is stored in the Windows driver store, which is scanned during computer startup, drivers are often not recognized or utilized when they are part of an elastic layer.
Services are supported as part of applications, even when elastically assigned, as long as the services don’t insert themselves as a prerequisite to an already running service or a system service. A good example of such a prerequisite would be security software placing its firewall component service as a prerequisite of the Windows TCP/IP service, as that service will have already started on system startup. Services included in an application layer are registered and automatically started during the login process, so any application with a long service startup time may not be a good candidate for Elastic Assignment, as the service start will hold up the entire Windows login process.
We measured the overall performance/impact to logon in two ways. The first was to measure the time it takes for layers to attach and be presented to the user (measuring the direct impact to logon time). The second was to measure IO and Bytes sent over the network during the login process (to allow you to estimate IO load of layer attachment).
As described previously in this paper, Elastic Layers are stored and run from an SMB file share. They are not streamed or copied into the virtual machine’s local disks. This means that all read IO is done at the file share. Writes are still handled on the managed machine, but read IO is shifted from the managed machine to the file share.
While no testing scenario can duplicate your environment's Windows and application usage, here we set out to determine how a single Session Host virtual machine may use the Elastic Layers from an IO perspective.
The test was run on a Windows Server 2012R2 Session Host virtual machine hosted on vSphere 6. The server was configured with 4 vCPUs and 8 GB of memory. In addition, no other VMs were run on the host to ensure no contention for resources during testing.
The file share hosting the elastic layers was a physical Windows 2012 R2 file server. A 10 Gb network connection was used between vSphere and the file share to ensure that network bandwidth would not be a limiting factor during the testing and skew the results. SSD drives were used as the underlying physical storage for the Windows file share.
Applications used in this testing scenario were chosen to show a mixture of size and layer complexity. The applications tested were Microsoft Office 2016, Firefox 48.0, and WinRAR x64 5.3.1. It should be noted that the Office configuration in the test environment was deployed as Office Standard without OneNote. This was done because OneNote contains a print driver that should not be elastically delivered. To maintain the integrity of the test, the team decided to remove OneNote from the installation.
Impact to Login Times
To determine the impact to login times, we use the log file called ulayersvc located in c:\programdata\Unidesk\Logs. This file logs and timestamps each step in the layer attachment and mounting process, which allows us to determine the amount of time it takes to complete the entire process.
During the testing, each layer was individually assigned and tested. Timings are determined by viewing the ulayersvc log from the “Received logon event” entry to the “Layering Complete” entry. Between these two entries the system reads the JSON based rules files, mounts the appropriate VHDs, and presents the virtual file system and registry into the session.
In the example above we can see the logon event starts at 10:25:50.097 and the process of mounting all the assigned layers completes at 10:25:54.019, for a total time of 3.922 seconds. These roughly 4 seconds are the impact (addition) to logon time compared to a session without elastic layers.
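The elapsed time between two log timestamps can be checked with a few lines of Python. This sketch works only on the two timestamps quoted above; it does not parse the ulayersvc log itself, whose line format is not reproduced here, and it assumes both entries fall on the same day. The function name is hypothetical.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two HH:MM:SS.mmm timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# "Received logon event" and "Layering Complete" times from the example
print(elapsed_seconds("10:25:50.097", "10:25:54.019"))  # 3.922
```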
As you can see from the snippet of the ulayersvc log below, the total time involved for the layer to become functional was 4 seconds. There is a service that gets created (ose) but because the service is set to manual start, App Layering skips even trying to start the service. Five logins were run during different portions of the day and 4 seconds was the average.
Firefox & Chrome
In this example, you can see that there are two services that are created (gupdate and gupdatem) and started. This had minimal impact on the login times as you can see. The total time involved for the layer to become functional was 2 seconds. Five logins were run during different portions of the day and 2 seconds was the average.
In this example, you can see that there are no services that are created. The total time involved for the layer to become functional was 4 seconds. Five logins were run during different portions of the day and 4 seconds was the average.
Impact of Multiple Layers Attached at Logon
Now these timings all show individual layers and how they individually affect login times. If you have multiple layers, it is not additive. For example, if you have two apps that individually take 4 seconds to mount, it does not take 8 seconds to mount both layers.
The reason for this is that there are several tasks of the logon which are done simultaneously. For example, layers are mounted simultaneously, the JSON files are read once at user logon, services are started simultaneously, and so on.
Disk IO and Network Impact
The next set of data to review was taken from the server hosting the Elastic Layer share. Windows perfmon was used to show both the disk IO and the bytes sent over the network.
Per the ulayersvc log, the attach and composition process takes a total of 4 seconds. The layer is read from for an extra 12 seconds beyond that. The extra time can be seen in the perfmon graphs (Figures 1 – 3) below. The interesting point to note is that the extra time taken does not affect login times. For example, a user login does not take an extra 12 seconds due to the data being read in from the layer. Once the layer is attached, the rest is done in parallel with the login process.
The initial spike in the graphs corresponds to the layering service on the host mounting the disk and compositing the file system and registry (4 seconds). The disk IO and information sent over the network is extremely low and has a minimal impact on the disk and network itself.
Tests were also done to see the impact during usage (Figure 4). Specifically, Word was used to measure the impact on IO. The test was a simple typing test, but it is interesting to note that there is constant, if minimal, activity during this time. This is due to the spell check, grammar check, and similar features that run in the background during typing. As an interesting counterpoint, PowerPoint was run (not shown) while displaying a presentation, and it generated almost no activity at all. It had an initial spike to read the required files into memory and then didn’t touch the disk afterwards.
Firefox & Chrome
Looking at Firefox, the figures below (Figures 5 – 7) show the same spike in the mount process and data read in after. It is all minimal activity during this time though.
In fact, usage tests were done with Firefox (not shown) as well. A small spike as the Elastic Layer was read for required files and then nothing after that during usage. Once the executable and required files were in memory, there was no need to go back to the disk.
Looking at WinRAR, the figures below (Figures 8 – 10) show the same spike in the mount process and data read in after. It is all minimal activity during this time though.
In fact, usage tests were done with WinRAR (not shown) as well. A small spike as the Elastic Layer was read for required files and then nothing after that during usage. Once the executable and required files were in memory, there was no need to go back to the disk.
In conclusion, Elastic Layering does not change the IO pattern of applications. It simply shifts the read IO from the VM to the file share. As you configure your environment, you will want to take that into account when deciding what share or shares to use for elastic layering.
IO & Timings
The following charts help to define the load on storage and network for several elastic applications. Note the black line in each chart shows the defined metric.
Elastic Share Architectural Considerations
Probably the most important aspect of the App Layering Architecture is how the elastic share will be provided and the network path from the VDI Desktops and Session Hosts to the share. The design for the technology used to provide the share should be determined based on how elastic layering will be used.
Elastic layers, as discussed earlier, are VHD files attached by desktops and session hosts within Windows over the VM Network. Application layers are mounted as read-only, with many machines mounting the exact same VHD. Therefore, all of the IO for that VHD will be read IO. The file server/share used for Application Layers should therefore be optimized for read IO.
On the other hand, the user writable elastic layer is mounted read/write and only by one desktop or session. The file server/share used for writable elastic layers should be optimized for write IO. Another important architectural consideration when implementing user layers is that all writes for the user will go into the user writable layer. System writes will still go into the caching mechanism for the provisioning system, for example the write cache configured by PVS server, but since all user writes go into the User Layer, the storage used for the share for that layer must be able to handle the user based write IO. This is the most important IO in VDI for the feeling of performance for the user. Therefore, flash based storage will work best for the user writable layer storage.
The App Layering Architecture allows architects to split up the App Layers from User Writable Layers so they can be optimized differently. It also allows for creating copies or replicas of the share to spread the load.
This architecture assumes you will use at least 10 Gb networking for the VM Network that the VDI Desktops/Session Hosts are attached to, and that the file servers sit on this same network or on a network with the same high bandwidth connectivity to the VM network. This network will in effect be acting like a SAN for the layers delivered elastically.
Elastic share availability is crucial for any company's application service. If the elastic shares are unavailable, the elastic layers will of course not be available.
We are asked quite often whether DFS shares will work with elastic layering. In our informal testing, Microsoft DFS has worked fine for elastic layers. However, once an elastic layer is mounted on a desktop or session host, if a DFS link target fails, all the machines that have mounted layers on that target must be rebooted to mount layers on another DFS target.
On the other hand, a clustered file share will fail over to another node without the Desktops or Session Hosts being rebooted.
Clustered shares are available from many NAS vendors as well as by using Microsoft Scale Out File Server for Application Data on Server 2012R2 or Server 2016. SMB3 would be the preferred protocol for clients that support it (Win10, Server 2012R2, Server 2016).
The architecture for Elastic Layers is highly scalable. The App Layer Elastic share repository path is defined in the VM HKLM registry for both VDI Desktops and Session Hosts. This makes it possible to have an unlimited number of replicas to spread the load.
The location used for user writable Elastic Layers is assigned by Active Directory group and therefore is also highly scalable because as many shares as desired can be used.
App Layers are attached many to one from the elastic app layer share using a Windows mount which makes a block level connection to the VHD disk. The number of desktops/session hosts that can access a particular layer will depend on how much the disk is read from by the application. The more chatty an application is to its “bits” the more utilized the disk will be. As the number of machines using the disk increases at some point disk utilization will become too high and another “replica” will be required.
The way to create new replicas is to make a new share, copy the contents of the existing share, and then change the following registry key (Reg_SZ) on the clients, normally through a GPP or GPO:
Value = \\unideskfs1\unidesk
Using this setting it is possible to have an unlimited number of replicas as needed. It would be preferable to use an application or to develop a scripted method to ensure the replicas are always synched from the master share defined in the Management Console.
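A scripted synchronization could look like the following Python sketch. It is a minimal one-way copy under stated assumptions, not a Citrix-provided tool: the function name is hypothetical, a file counts as unchanged when its size matches and the replica copy is at least as new, and files that were removed from the master are not deleted from the replica.

```python
import shutil
from pathlib import Path


def sync_replica(master: Path, replica: Path) -> list:
    """Copy new or changed files from the master share to a replica.

    Returns the list of destination paths that were written.
    """
    copied = []
    for src in master.rglob("*"):
        if not src.is_file():
            continue
        dst = replica / src.relative_to(master)
        src_stat = src.stat()
        if dst.exists():
            dst_stat = dst.stat()
            same_size = dst_stat.st_size == src_stat.st_size
            not_older = int(dst_stat.st_mtime) >= int(src_stat.st_mtime)
            if same_size and not_older:
                continue  # replica copy is considered current
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(dst)
    return copied
```

Run on a schedule with the master and replica share paths as arguments; because `copy2` preserves timestamps, a second run over an unchanged master copies nothing.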
User Writable Layers
User layers are assigned one to one. One user can have only one User Writable Layer per OS layer per Domain. The user can therefore only log on to one delivery group/pool with a desktop using the same OS layer with the User Layer enabled.
As many user shares as desired can be created. These shares must then be defined in the management console under SYSTEM>Storage Locations. These storage locations have a name and a path as seen below.
Each storage location must have one or more active directory groups associated with it. The best practice would be to create a group specifically for this assignment, then add users into that group to assign the layer.
Storage locations are then prioritized so that if a user is in groups under more than one storage location they will use the one with the highest priority.
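The selection rule can be sketched in Python as follows. The data layout, the share paths, and the group names are hypothetical, and treating the smallest number as the highest priority is an assumption; what happens when no location matches is not covered by this sketch.

```python
def pick_storage_location(user_groups, locations):
    """Return the path of the highest-priority location the user qualifies for.

    user_groups: set of AD group names the user belongs to.
    locations: list of dicts with 'path', 'groups', and 'priority' keys.
    Assumption: priority 1 is the highest priority.
    """
    candidates = [loc for loc in locations if user_groups & set(loc["groups"])]
    if not candidates:
        return None
    return min(candidates, key=lambda loc: loc["priority"])["path"]


LOCATIONS = [  # hypothetical storage locations
    {"path": r"\\fs1\UserLayers", "groups": ["UL-Engineering"], "priority": 2},
    {"path": r"\\fs2\UserLayers", "groups": ["UL-Executives"], "priority": 1},
]

print(pick_storage_location({"UL-Engineering", "UL-Executives"}, LOCATIONS))
```

A user who belongs to both groups is placed on the `fs2` share, because that location has the higher priority.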
If you look on the share, you first see a “Users” folder, then Domain_UserName, then the OS layer. In that folder, there is the actual user layer VHD as well as two diagnostic files.
Diag.txt shows the last logon info and logoff.txt shows the last logoff info.
When to create Replicas
We would love to be able to say that every elastic share can handle up to 10,000 users and each user elastic share can handle up to 5000 users. However, it doesn’t really work that way because all users use these infrastructure components differently as do all applications.
Therefore, a standard engineering approach should be taken with the deployment of elastic shares for Apps and Users. First, create a standard design that assumes you will require multiple shares for both Apps and Users. Then create an initial share for each. If you know you are going to scale fairly quickly, create more than one of each. For app layers, define the mechanism that will be used to replicate the elastic application shares and implement it. Create your GPO or GPP settings and define how to separate desktops into different OUs.
Now you are ready for initial deployment. Start small with a pilot. Analyze both the read and the write load on the shares and storage. Add users to the deployment in some definable way, for example by department, group, organization, or alphabetically. As you add users, monitor network and storage IO, and when IO becomes too high or performance is actually impacted, move some users or machines to a new share and grow there. As you grow, you will have a much better idea of your requirements.
User Layer Size
By default, a user layer is created by first checking the user layer share to see if there is a quota defined for the user. If a quota is defined, the user layer will be configured with a maximum of that size. If no quota is defined and no override is defined, the user layer will be set to a maximum size of 10 GB.
For a quota to affect User Layer sizing, it must be a hard quota. We have tested Microsoft's quota tools: File Server Resource Manager (FSRM) and Quota Manager.
The quota must be set on the User Layer directory meaning the one named “Users”.
Note: Changing the quota (increasing or decreasing) only impacts new User Layers. The maximum size of existing User Layers was previously set and will remain unchanged when the quota is updated.
It is possible to override the default User Layer maximum size using the registry on Managed Machines. The following keys are optional and do not need to be configured for normal operation. If needed, they must be added manually using a layer or a GPO/GPP.
Registry Root: HKLM\Software\Unidesk\Ulayer
| Key | Type | Value | Description |
| | | | True to enable discovery and use of quotas. False to disable. |
| DefaultUserLayerSizeInGb | DWord | User defined | The size of the user layer in GB (e.g. 5, 10, 23). When not specified, the default is 10. |
| QuotaQuerySleepMS | DWord | User defined | The number of milliseconds to wait after creating the directory for the user layer before checking to see if it has a quota. This is necessary to give some quota systems time to apply the quota to the new directory (FSRM requires this). When not specified, the default is 1000. |
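The sizing decision described in this section can be summarized in a short Python sketch. The function and parameter names are hypothetical: `override_gb` stands in for DefaultUserLayerSizeInGb and `use_quota` for the True/False quota setting in the table. The precedence (quota first, then override, then the 10 GB default) is a reading of the text above, not a confirmed implementation detail.

```python
DEFAULT_SIZE_GB = 10  # default stated in this paper


def user_layer_max_size_gb(quota_gb=None, override_gb=None, use_quota=True):
    """Pick the maximum size for a new user layer.

    quota_gb: hard quota found on the user layer directory, if any.
    override_gb: value of the optional registry override, if set.
    use_quota: whether quota discovery is enabled.
    """
    if use_quota and quota_gb is not None:
        return quota_gb
    if override_gb is not None:
        return override_gb
    return DEFAULT_SIZE_GB


print(user_layer_max_size_gb())                               # 10
print(user_layer_max_size_gb(override_gb=23))                 # 23
print(user_layer_max_size_gb(quota_gb=5, override_gb=23))     # 5
```

As noted earlier in this section, the result applies only when a user layer is first created; existing user layers keep their original maximum size.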
User Layer Backup
If you plan to back up user writable disks, determine the method that will be used. These disks will be locked for write much of the time. You could create a script that checks overnight and copies the VHD if it is not mounted. You can also use something like SAN replication at the storage level. These disks can be very large, so it may be challenging to copy them using a script. SAN replication would be much more efficient.
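A scripted check-and-copy could be sketched in Python as below. It rests on an assumption that should be verified in your environment: that a mounted user layer VHD is held open with a lock, so an attempt to open it for update fails. The function name is hypothetical, and the sketch does not guard against a user logging on while the copy is in progress.

```python
import shutil
from pathlib import Path


def backup_if_unlocked(vhd: Path, backup_dir: Path) -> bool:
    """Copy a user layer VHD to backup_dir unless it appears to be in use.

    Returns True when the copy was made, False when the file was skipped.
    """
    try:
        # Assumption: this open fails with an OSError while the VHD is mounted.
        with open(vhd, "r+b"):
            pass
    except OSError:
        return False
    backup_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(vhd, backup_dir / vhd.name)
    return True
```

A nightly job could walk the “Users” folder, call this function for each VHD, and report the files that were skipped so they can be retried later.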