Allocating Memory with AWE on 64-bit Platforms

We have already talked about the Windows AWE mechanism on 32-bit platforms and how SQL Server uses it. Today I would like to go over AWE and the related mechanisms on 64-bit platforms.
 
It comes as a surprise to some people that the AWE mechanism is still present, and can actually be useful, on 64-bit platforms. As you remember, the mechanism consists of two parts: allocating physical memory and mapping it into a given process's VAS. The advantage of the allocation mechanism is that once physical memory is allocated, the operating system can't reclaim it until either the process terminates or the process frees the memory back to the OS. This feature allows an application to control paging and even avoid it altogether. The advantage of the mapping/unmapping mechanism is that the same physical page can be mapped into different regions of the VAS. As you can imagine, unmapping is not necessary on 64-bit platforms, since we have enough VAS to accommodate all of the existing physical memory.
 

From operating system theory, the OS implements a page table entry, PTE, to describe the mapping of a page in the VAS to a physical page. Internally, a physical page is described by a page frame number, PFN. Given a PFN, one can derive complete information about the physical page it represents. For example, the PFN shows which NUMA node a particular page belongs to. The OS maintains a database, a collection of PFNs that it manages. If a page in the VAS is committed, it has a PTE, which might or might not point to a PFN. Conceptually, the page that a PTE represents can be either in memory or not, for example when it has been swapped out to disk. In the former case the PTE is bound to a PFN, and in the latter it is not. In turn, once a physical page is bound to a page in the VAS, its PFN points back to the PTE.
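To make the two-way binding concrete, here is a minimal toy model in Python. It is only a sketch of the relationship described above, not how Windows lays out these structures; the class names, fields, and the `bind` and `page_out` helpers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class PFN:
    """Toy page frame entry describing one physical page (hypothetical layout)."""
    number: int
    numa_node: int                 # node the physical page belongs to
    pte: Optional["PTE"] = None    # back-pointer, set only while the frame is bound


@dataclass(eq=False)
class PTE:
    """Toy page table entry for one committed virtual page (hypothetical layout)."""
    virtual_page: int
    pfn: Optional[PFN] = None      # None means the page is not in memory

    @property
    def resident(self) -> bool:
        return self.pfn is not None


def bind(pte: PTE, pfn: PFN) -> None:
    """Bind a virtual page to a physical page; the PFN points back to the PTE."""
    pte.pfn = pfn
    pfn.pte = pte


def page_out(pte: PTE) -> None:
    """Break the binding, as if the page had been swapped out to disk."""
    if pte.pfn is not None:
        pte.pfn.pte = None
        pte.pfn = None
```

A committed page starts with a PTE and no PFN; after `bind` the NUMA node can be read through `pte.pfn.numa_node`, and after `page_out` the PTE still exists but no longer points at a frame.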
 

When the OS commits, frees, or pages out/in a given PTE, or needs to derive some information about it such as its NUMA residency, it has to acquire the process's working set lock to guarantee the stability of the PTE-to-PFN binding. This lock is rather expensive and might hurt the scalability of the process. Later versions of Windows made this lock as light as possible, but avoiding it will still benefit an application's scalability.
 

When allocating physical pages through the AWE mechanism, we are given a set of PFN entries directly from the PFN database. Remember that you should not manipulate or modify the set of entries you get back, nor can you rely on the values in it. The OS is required to take the PFN database lock when allocating PFN entries. Using the AWE map mechanism, you can map the allocated PFN entries into the process's VAS. When the mapping occurs, PTEs are allocated, bound to the PFNs, and marked as locked. In this case the OS needs to acquire the process's working set lock only once. When mapping regular pages, the OS does it on demand and hence has to acquire both the working set lock and the PFN database lock for every page. Since the pages are locked in memory, the OS will ignore these PTEs during paging.
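The difference in lock traffic can be sketched with a small counting model in Python. This is a simplification of the description above, not a measurement of Windows behavior: `LockStats`, `commit_on_demand`, and `allocate_and_map_locked` are hypothetical names, and the model assumes exactly one acquisition per lock per step. On Windows the corresponding user-mode calls are `AllocateUserPhysicalPages` and `MapUserPhysicalPages`; they are not invoked here.

```python
from collections import Counter


class LockStats:
    """Counts how many times each (simulated) kernel lock is acquired."""

    def __init__(self):
        self.acquired = Counter()

    def take(self, name: str) -> None:
        self.acquired[name] += 1


def commit_on_demand(page_count: int, stats: LockStats) -> None:
    """Regular pages: every first touch takes both locks for that one page."""
    for _ in range(page_count):
        stats.take("working_set")
        stats.take("pfn_database")


def allocate_and_map_locked(page_count: int, stats: LockStats) -> list:
    """Locked pages: one PFN database lock to allocate, one working set lock to map."""
    stats.take("pfn_database")          # allocate the whole batch of frames
    frames = list(range(page_count))    # stand-in for the opaque PFN array
    stats.take("working_set")           # map the whole batch into the VAS
    return frames


if __name__ == "__main__":
    regular, locked = LockStats(), LockStats()
    commit_on_demand(1000, regular)
    allocate_and_map_locked(1000, locked)
    print(dict(regular.acquired), dict(locked.acquired))
```

In this model the number of acquisitions for regular pages grows with the page count, while for locked pages it stays constant, which is the property the ramp-up argument relies on.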
 

On 64-bit platforms it is better to refer to such pages as locked pages. Please don't confuse them with pages locked through the VirtualLock API. As described above, locked pages have two major properties: the OS does not consider them for paging, and during allocation both the working set lock and the PFN database lock are acquired only once.
 

The first property has implications for high-end hardware such as NUMA: it provides explicit memory residency. Remember that the OS commits a page on demand. To allocate the physical memory, it uses the node on which the thread touching the memory is running. Later on, the page can be swapped out by the OS. The next time the page is brought into memory, the OS will again allocate a physical page from the node on which the touching thread is running, and that node could be completely different from the original one. Such behavior makes it hard for applications to rely on a page's NUMA residency. Locked pages solve this problem by removing themselves from paging altogether. Moreover, Windows 2003 SP1 introduced a new API, QueryWorkingSetEx, which lets you query extended information about a PTE's PFN. To find out the real residency of a page, you should use this API. When pages are locked, you need to do it only once. Otherwise you will have to do it periodically, since the residency of the page can actually change.
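The residency drift is easy to see in a toy model. The Python sketch below assumes a single page and node numbers passed in by hand; `Page`, `touch`, and `page_out` are hypothetical names, and nothing here calls QueryWorkingSetEx or any other Windows API.

```python
class Page:
    """Toy virtual page whose physical backing lives on some NUMA node."""

    def __init__(self, locked: bool = False, node=None):
        self.locked = locked
        self.node = node   # None means the page is not resident

    def touch(self, thread_node: int) -> int:
        """A non-resident page is backed from the node of the touching thread."""
        if self.node is None:
            self.node = thread_node
        return self.node

    def page_out(self) -> None:
        """The OS may evict a regular page; locked pages are ignored by paging."""
        if not self.locked:
            self.node = None


if __name__ == "__main__":
    regular = Page()
    regular.touch(thread_node=0)      # first touch from a thread on node 0
    regular.page_out()
    regular.touch(thread_node=1)      # faulted back in by a thread on node 1

    locked = Page(locked=True, node=0)
    locked.page_out()                 # no effect on a locked page
    locked.touch(thread_node=1)

    print(regular.node, locked.node)
```

The regular page ends up on whichever node touched it last after a page-out, so a residency value read earlier can go stale. The locked page keeps the node it was allocated on, so reading its residency once is enough.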
 

The second property, taking both the working set lock and the PFN database lock only once, enables applications to ramp up faster and to scale better while doing so.
 

On NUMA, SQL Server's Buffer Pool marks each allocated page with its node residency. It leverages QueryWorkingSetEx to accomplish this: once a page is allocated, it calls the API to find out the page's residency, and it does so only once. Therefore, enabling locked pages for SQL Server on a 64-bit platform will improve SQL Server's ramp-up time and will improve performance and scalability over longer periods of time. When running SQL Server with locked pages enabled, you shouldn't be worried about overall system performance suffering from memory starvation. SQL Server participates in the OS's paging mechanism by listening to the OS's memory notification APIs and shrinking its working set accordingly.
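Because the OS cannot page locked memory out, the application has to give it back voluntarily. The Python sketch below shows one way such a response could look. It is not SQL Server's actual policy: `LockedBufferPool`, the 25% shrink step, and the floor size are assumptions made for illustration. On Windows the low-memory signal could come from `CreateMemoryResourceNotification` and `QueryMemoryResourceNotification`; here it is just a boolean argument.

```python
class LockedBufferPool:
    """Toy pool of locked pages that shrinks when the OS reports low memory."""

    def __init__(self, pages: int, min_pages: int):
        self.pages = pages            # locked pages currently held
        self.min_pages = min_pages    # hypothetical floor the pool never goes below

    def on_memory_notification(self, low_memory: bool,
                               shrink_fraction: float = 0.25) -> int:
        """Free part of the pool on a low-memory signal; return the pages freed."""
        if not low_memory:
            return 0
        target = max(self.min_pages, int(self.pages * (1 - shrink_fraction)))
        freed = self.pages - target
        self.pages = target           # a real pool would unmap and free these pages
        return freed


if __name__ == "__main__":
    pool = LockedBufferPool(pages=1000, min_pages=100)
    print(pool.on_memory_notification(low_memory=True), pool.pages)
```

Shrinking in steps rather than all at once lets the pool keep most of its cache if the pressure turns out to be brief, while repeated notifications still drive it down to the floor.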
 

Let us summarize. On 64-bit platforms, locked pages (the AWE mechanism) give an application better scalability and performance, both during ramp-up and over long periods of time. However, keep in mind that the application is still required to implement a way of responding to memory pressure, to avoid starving the whole system of memory.
