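(Special note: I write all my articles from memory and cannot guarantee that they are accurate or correct. Readers, please judge for yourselves!)
My earlier article "A First Look at the KVM API" introduced the KVM API. So what can you do with this API once you have it? There is a bit of history here. As far as I remember, the KVM API appeared in 2009 or earlier, but everyone confined themselves to using it to build virtual machines. In September 2015, Josh Triplett published an article that introduced the KVM API and then devoted a section to its applications. I consider it a classic and quote it below (//lwn.net/Articles/658511/):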
Applications of the KVM API
Other than learning, debugging a virtual machine implementation, or as a party trick, why use /dev/kvm directly?
Virtual machines like qemu-kvm or kvmtool typically emulate the standard hardware of the target architecture; for instance, a standard x86 PC. While they can support other devices and virtio hardware, if you want to emulate a completely different type of system that shares little more than the instruction set architecture, you might want to implement a new VM instead. And even within an existing virtual machine implementation, authors of a new class of virtio hardware device will want a clear understanding of the KVM API.
Efforts like novm and kvmtool use the KVM API to construct a lightweight VM, dedicated to running Linux rather than an arbitrary OS. More recently, the Clear Containers project uses kvmtool to run containers using hardware virtualization.
Alternatively, a VM need not run an OS at all. A KVM-based VM could instead implement a hardware-assisted sandbox with no virtual hardware devices and no OS, providing arbitrary virtual "hardware" devices as the API between the sandbox and the sandboxing VM.
While running a full virtual machine remains the primary use case for hardware virtualization, we've seen many innovative uses of the KVM API recently, and we can certainly expect more in the future.
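The key point is in the second-to-last paragraph: what do you need an OS for?
In "A First Look at the KVM API" we saw that starting a virtual machine and running a piece of code in it is intuitive and simple. Many scenarios fit this model, function computing (FaaS) for example: compile the user's code into something KVM can run, spin up a KVM virtual machine, and run it. This addresses several pain points of function computing: 1) cold starts; 2) cost. Which raises the question: how do you write and compile code that KVM can launch and run directly?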
Bare metal programming!
That's right: writing code that runs directly on KVM means writing an operating system. And when it comes to implementing an operating system, one well-known blog has to be mentioned: //os.phil-opp.com/ . This series explains clearly how to build an operating system from scratch. Work through it carefully and the task stops being intimidating.
There are three important concepts: 1) freestanding; 2) booting; 3) paging. If we program directly against KVM, and the goal is only to run a piece of code rather than to implement a complete operating system, all three can be implemented far more simply.
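One: freestanding. You can use Rust, C, or C++, but Rust is the best choice. Rust separates the core library (and the alloc library) from std, so freestanding Rust code still has plenty of high-level types available, such as String and VecDeque, and even async/await. The rust-vmm crates also wrap the KVM API very nicely, so don't pass them by.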
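As a small illustration of that point, here is a sketch that uses only `core` and `alloc`. The function `drain_to_log` and the idea of a queue of request IDs are made up for this example. I am assuming that the guest kernel has already installed a global allocator, which `String` and `VecDeque` need in a real `#![no_std]` build.

```rust
// Only `core` and `alloc` are used, so the same code should also build in a
// `#![no_std]` crate once a global allocator exists (assumption: the guest
// kernel sets one up before this runs).
extern crate alloc;

use alloc::collections::VecDeque;
use alloc::string::String;
use core::fmt::Write;

/// Hypothetical helper: drain a queue of pending request IDs into one log line.
pub fn drain_to_log(queue: &mut VecDeque<u32>) -> String {
    let mut log = String::new();
    while let Some(id) = queue.pop_front() {
        // `write!` targets the String through core::fmt::Write;
        // writing into a String does not fail, so the result is ignored.
        let _ = write!(log, "req{} ", id);
    }
    log
}
```

Nothing here touches `std`, which is what makes the code usable without an operating system underneath.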
Two: booting. The first edition of Phil's blog describes the boot process in detail. The second edition, presumably because booting is tedious and largely boilerplate, wraps it up in a tool. So how do you boot when programming directly against KVM? It turns out to be very simple. As the code in my previous post shows, the KVM API can do the booting itself: set the registers, copy the kernel into memory, and start the CPU. Booting a virtual or physical machine normally also involves stepping the CPU up through its execution modes, usually from real mode to protected mode and then to long mode. With the KVM API you can get there in a single step.
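To make "a single step" concrete, here is a rough sketch of the register values a 64-bit guest needs before its first instruction runs. The struct `LongModeState` and the function `long_mode_state` are hypothetical names for this example, not part of any KVM binding. In a real program, as far as I remember, these values are copied into `kvm_sregs` and `kvm_regs` and applied with the `KVM_SET_SREGS` and `KVM_SET_REGS` ioctls, and the code segment must also have its long-mode (L) bit set.

```rust
// Architectural bit positions on x86-64.
const CR0_PE: u64 = 1 << 0; // protected mode enable
const CR0_PG: u64 = 1 << 31; // paging enable
const CR4_PAE: u64 = 1 << 5; // physical address extension, required for long mode
const EFER_LME: u64 = 1 << 8; // long mode enable
const EFER_LMA: u64 = 1 << 10; // long mode active

/// Hypothetical container for the values that would be written into
/// kvm_sregs (cr0, cr3, cr4, efer) and kvm_regs (rip, rflags).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LongModeState {
    pub cr0: u64,
    pub cr3: u64,
    pub cr4: u64,
    pub efer: u64,
    pub rip: u64,
    pub rflags: u64,
}

/// Compute the state for a vCPU that starts directly in long mode.
/// `pml4_addr` is the guest-physical address of the top-level page table,
/// `entry` is the guest address of the first instruction to execute.
pub fn long_mode_state(pml4_addr: u64, entry: u64) -> LongModeState {
    LongModeState {
        cr0: CR0_PE | CR0_PG,
        cr3: pml4_addr,
        cr4: CR4_PAE,
        efer: EFER_LME | EFER_LMA,
        rip: entry,
        rflags: 0x2, // bit 1 of RFLAGS is reserved and always reads as 1
    }
}
```

Because the host sets these values before the vCPU ever runs, the guest needs no real-mode or protected-mode trampoline code at all.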
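Three: paging, which is just too complicated. Apart from process management, paging is where an operating system's complexity lives, and the most complicated part of process management is itself the memory-related code. Study Phil's blog carefully and you will see that, when programming directly against KVM, the simplest option, identity mapping, is all you need. Better still, the mapping can be set up before the guest starts.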
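Here is a minimal sketch of building such an identity mapping from the host side, before the guest starts. The table addresses (0x1000, 0x2000, 0x3000) and the choice of 2 MiB pages covering the first 1 GiB are my own assumptions for the example, and `mem` stands in for the buffer that backs guest memory.

```rust
// Guest-physical addresses of the three tables (assumed layout).
const PML4_ADDR: usize = 0x1000;
const PDPT_ADDR: usize = 0x2000;
const PD_ADDR: usize = 0x3000;

const PRESENT: u64 = 1 << 0;
const WRITABLE: u64 = 1 << 1;
const HUGE_PAGE: u64 = 1 << 7; // PS bit: this PD entry maps a 2 MiB page

/// Store one 8-byte page-table entry, little-endian as x86 expects.
fn write_entry(mem: &mut [u8], table: usize, index: usize, value: u64) {
    let off = table + index * 8;
    mem[off..off + 8].copy_from_slice(&value.to_le_bytes());
}

/// Read an entry back (handy for checking the result).
pub fn read_entry(mem: &[u8], table: usize, index: usize) -> u64 {
    let off = table + index * 8;
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&mem[off..off + 8]);
    u64::from_le_bytes(bytes)
}

/// Identity-map the first 1 GiB of guest-physical memory with 512 pages of
/// 2 MiB each. Returns the value to load into CR3.
pub fn build_identity_map(mem: &mut [u8]) -> u64 {
    assert!(mem.len() >= PD_ADDR + 4096, "guest memory too small for page tables");
    mem[PML4_ADDR..PD_ADDR + 4096].fill(0);

    // One PML4 entry pointing at the PDPT, one PDPT entry pointing at the PD.
    write_entry(mem, PML4_ADDR, 0, PDPT_ADDR as u64 | PRESENT | WRITABLE);
    write_entry(mem, PDPT_ADDR, 0, PD_ADDR as u64 | PRESENT | WRITABLE);

    // Each PD entry maps virtual address i * 2 MiB to the same physical address.
    for i in 0..512 {
        let phys = (i as u64) << 21;
        write_entry(mem, PD_ADDR, i, phys | PRESENT | WRITABLE | HUGE_PAGE);
    }
    PML4_ADDR as u64
}
```

The tables describe 1 GiB of address space even if the guest has less RAM. Addresses beyond the actual guest memory are mapped but have nothing behind them, so the guest code should stay within what the host provided.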
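The article at //rstforums.com/forum/topic/109893-note-learning-kvm-implement-your-own-linux-kernel/ explains in detail how to create a KVM virtual machine directly through the KVM API, enter long mode, and set up the page tables.
Once you have worked through all of the material above, you can start putting KVM to work.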