Flushing the cache by virtual address on ARMv8, part 2

One of the reference links from part 1 pointed to the Android kernel source, which contains assembly code that flushes the entire D-cache. Let's analyze that code first.

Reference: https://android.googlesource.com/kernel/msm.git/+/android-msm-anthias-3.10-lollipop-wear-release/arch/arm64/mm/cache.S

1.1. flush_dcache_all

ENTRY(__flush_dcache_all)
	dsb	sy				// ensure ordering with previous memory accesses
	mrs	x0, clidr_el1			// read clidr
	and	x3, x0, #0x7000000		// extract loc from clidr
	lsr	x3, x3, #23			// left align loc bit field
	cbz	x3, finished			// if loc is 0, then no need to clean
	mov	x10, #0				// start clean at cache level 0

Let's start with the entry of the function __flush_dcache_all shown above.

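Instructions: dsb stands for data synchronization barrier. In short, it waits until every memory operation issued before it, which may still be in flight inside the processor because of multi-fetch, pipelining and so on, has completed. mrs reads a system register into a general-purpose register (msr is the opposite direction and writes a general-purpose register into a system register). The rest are and (bitwise AND), lsr (logical shift right), cbz (compare and branch on zero), and b (branch).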

Terms:

clidr: the cache level ID register (in AArch64 it is named clidr_el1, but it does the same job).

CLIDR describes the caches the processor implements: the inner cache boundary (ICB), the cache levels that matter for unification and coherency (LoUU, LoC, LoUIS), and how each cache level is organized, for example separate or unified (Ctype3, Ctype2, Ctype1). See [3].

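[Summary]

To summarize, the code above:

1. Issues a memory barrier.

2. Extracts the LoC value from CLIDR (2 on the A53). Because the shift is 23 rather than 24, x3 ends up holding LoC * 2, the same scale as the level field in CSSELR bits [3:1].

3. Branches to finished if LoC is 0; otherwise it sets x10 to 0 and falls through to loop1 (the flush code).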
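The LoC extraction can be checked on a host machine with a small C model. This is only a sketch of the bit arithmetic, not kernel code: the function names are made up for illustration, and the value 0x0A200023 used in the usage note is an assumed sample CLIDR value (LoC = 2, L1 separate I/D, L2 unified), not one read from real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* CLIDR_EL1.LoC lives in bits [26:24]. */
static inline uint32_t clidr_loc(uint64_t clidr)
{
    return (uint32_t)((clidr >> 24) & 0x7u);
}

/*
 * Mirrors `and x3, x0, #0x7000000` followed by `lsr x3, x3, #23`.
 * Shifting by 23 instead of 24 leaves LoC * 2 in the result, which is
 * the scale used by x10 (cache level in bits [3:1]).
 */
static inline uint32_t clidr_loc_scaled(uint64_t clidr)
{
    uint32_t scaled = (uint32_t)((clidr & 0x7000000u) >> 23);

    /* Sanity check of the claim above: the result is exactly 2 * LoC. */
    assert(scaled == 2u * clidr_loc(clidr));
    return scaled;
}
```

With the assumed sample value, clidr_loc(0x0A200023) gives 2 and clidr_loc_scaled(0x0A200023) gives 4, so the later `cmp x3, x10` compares values on the same scale.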

1.2. loop1

loop1:
	add	x2, x10, x10, lsr #1		// work out 3x current cache level
	lsr	x1, x0, x2			// extract cache type bits from clidr
	and	x1, x1, #7			// mask of the bits for current cache only
	cmp	x1, #2				// see what cache we have at this level
	b.lt	skip				// skip if no cache, or just i-cache
	save_and_disable_irqs x9		// make CSSELR and CCSIDR access atomic
	msr	csselr_el1, x10			// select current cache level in csselr
	isb					// isb to sych the new cssr&csidr
	mrs	x1, ccsidr_el1			// read the new ccsidr
	restore_irqs x9
	and	x2, x1, #7			// extract the length of the cache lines
	add	x2, x2, #4			// add 4 (line length offset)
	mov	x4, #0x3ff
	and	x4, x4, x1, lsr #3		// find maximum number on the way size
	clz	w5, w4				// find bit position of way size increment
	mov	x7, #0x7fff
	and	x7, x7, x1, lsr #13		// extract max number of the index size

loop1:
	add	x2, x10, x10, lsr #1		// x2 = x10 + (x10 >> 1); x10 = 2n, so x2 = 3n
	lsr	x1, x0, x2			// x1 = x0 >> x2
	and	x1, x1, #7			// x1 = x1 & 0b111 (Ctype of this level)
	cmp	x1, #2				// compare x1 with 2
	b.lt	skip				// 0: no cache, 1: i-cache only, so branch to skip
	save_and_disable_irqs x9		// disable IRQs on this processor (for atomicity)
	msr	csselr_el1, x10			// select the cache in csselr_el1 [4]: bits [3:1] are the cache level
	isb					// isb to sync the new csselr & ccsidr
	mrs	x1, ccsidr_el1			// read ccsidr_el1: sets, ways, line size of the selected cache [5]
	restore_irqs x9				// restore the IRQ state
	and	x2, x1, #7			// extract the LineSize field
	add	x2, x2, #4			// add 4: x2 = log2(line size in bytes)
	mov	x4, #0x3ff
	and	x4, x4, x1, lsr #3		// x4 = 10-bit mask & (x1 >> 3): associativity - 1
	clz	w5, w4				// leading zeros of (ways - 1): shift that puts the way index at the top of 32 bits
	mov	x7, #0x7fff			// 15-bit mask for the NumSets field
	and	x7, x7, x1, lsr #13		// x7 = number of sets - 1

CCSIDR: [27:13] number of sets - 1, [12:3] associativity - 1, [2:0] log2(line size in bytes) - 4
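To make the field layout concrete, here is a rough C sketch that decodes a CCSIDR value the same way loop1 does. It assumes the 32-bit layout listed above; the struct and function names are hypothetical, and 0x01FFE01A in the usage note is a value constructed by hand for a 4-way cache with 64-byte lines and 4096 sets, not a register dump.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded geometry of one cache level (names are illustrative only). */
struct cache_geometry {
    uint32_t line_bytes; /* bytes per cache line */
    uint32_t ways;       /* associativity */
    uint32_t sets;       /* number of sets */
};

static inline struct cache_geometry ccsidr_decode(uint32_t ccsidr)
{
    struct cache_geometry g;

    /* [2:0] holds log2(line bytes) - 4, hence the `add x2, x2, #4`. */
    g.line_bytes = 1u << ((ccsidr & 0x7u) + 4u);
    /* [12:3] holds ways - 1 (10-bit mask 0x3ff). */
    g.ways = ((ccsidr >> 3) & 0x3ffu) + 1u;
    /* [27:13] holds sets - 1 (15-bit mask 0x7fff). */
    g.sets = ((ccsidr >> 13) & 0x7fffu) + 1u;

    /* The smallest encodable line is 16 bytes (field value 0). */
    assert(g.line_bytes >= 16u);
    return g;
}
```

Decoding the hand-built value 0x01FFE01A yields 64-byte lines, 4 ways and 4096 sets, which multiplies out to 1 MB.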

Instructions: add, lsr, and, cmp (compare), clz (count leading zeros: returns the number of zero bits above the highest set bit)

Lines 1 to 4 pull the Ctype field for the current cache level out of CLIDR and compare it with 2.

1.3. loop2, loop3

loop2:
	mov	x9, x4				// create working copy of max way size
loop3:
	lsl	x6, x9, x5
	orr	x11, x10, x6			// factor way and cache number into x11
	lsl	x6, x7, x2
	orr	x11, x11, x6			// factor index number into x11
	dc	cisw, x11			// clean & invalidate by set/way
	subs	x9, x9, #1			// decrement the way
	b.ge	loop3
	subs	x7, x7, #1			// decrement the index
	b.ge	loop2

loop2:
	mov	x9, x4				// copy (number of ways - 1)
loop3:
	lsl	x6, x9, x5			// shift the way index up to the top bits of the 32-bit operand
	orr	x11, x10, x6			// x10 = [3:1] cache level, x6 = [31:30] way (4-way case)
	lsl	x6, x7, x2			// shift the set index by the line offset; x6 = [20:6] set field at most
	orr	x11, x11, x6			// x11 = [31:30] way, [20:6] set, [3:1] cache level
	dc	cisw, x11			// clean & invalidate by set/way
	subs	x9, x9, #1			// decrement the way
	b.ge	loop3				// loop while x9 >= 0
	subs	x7, x7, #1			// decrement the index
	b.ge	loop2				// loop while x7 >= 0

Instructions: lsl (logical shift left), orr (bitwise OR), dc (data cache maintenance instruction), b.ge (branch if greater than or equal)

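About DC CISW (data cache line clean and invalidate by set/way), the operand is laid out as follows:

[31:32-A]: way, [B-1:L]: set, [3:1]: cache level - 1

A: log2(associativity), L: log2(line length in bytes), S: log2(number of sets), B: L + S

Example, for an L2 cache with 4 ways, 64-byte lines and 1 MB capacity, which gives 1 MB / (4 * 64 B) = 4096 sets (the Cortex-A53 TRM [1] describes the actual A53 L2 as 16-way, which would give A = 4 and 1024 sets instead):

A: 2, L: 6, S: 12, B: 18

[31:30]: way, [17:6]: set, [3:1]: 1 (that is, x10 = 2)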
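The operand packing from the example can be modeled in C as below. This is a sketch under the same assumptions as the example (4 ways, 64-byte lines, level 2); clz32 and dc_cisw_operand are names invented here, and nothing in this block executes a real DC CISW.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for `clz w5, w4`; returns 32 when v is 0. */
static inline uint32_t clz32(uint32_t v)
{
    uint32_t n = 0;

    if (v == 0)
        return 32;
    while ((v & 0x80000000u) == 0) {
        v <<= 1;
        n++;
    }
    return n;
}

/*
 * Build a set/way operand the way loop3 does.
 * level      : 1-based cache level (encoded as level - 1 in bits [3:1])
 * max_way    : number of ways - 1, as read from CCSIDR
 * line_shift : log2(line size in bytes), i.e. L in the text
 */
static inline uint64_t dc_cisw_operand(uint32_t level, uint32_t way,
                                       uint32_t set, uint32_t max_way,
                                       uint32_t line_shift)
{
    uint32_t way_shift = clz32(max_way); /* equals 32 - A for power-of-two ways */

    assert(level >= 1 && level <= 7);
    assert(way <= max_way);
    return ((uint64_t)(level - 1) << 1)
         | ((uint64_t)way << way_shift)
         | ((uint64_t)set << line_shift);
}
```

For the highest way and set of the example, dc_cisw_operand(2, 3, 4095, 3, 6) comes out as 0xC003FFC2: way 3 in [31:30], set 4095 in [17:6], and 1 in [3:1].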

[Summary]

To summarize:

1. Take the maximum set and way indexes obtained from ccsidr.

2. Starting from the maximum set and way indexes, count down one at a time and flush the cache line at each step.
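The countdown order of loop2 and loop3 can be reproduced with two nested loops in C. The sketch below only records the operands that would be passed to `dc cisw`; the function name and the out/cap recording buffer are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for `clz w5, w4`; returns 32 when v is 0. */
static inline uint32_t clz32(uint32_t v)
{
    uint32_t n = 0;

    if (v == 0)
        return 32;
    while ((v & 0x80000000u) == 0) {
        v <<= 1;
        n++;
    }
    return n;
}

/*
 * Walk every set/way of one cache level in the same order as the
 * assembly: outer loop over sets (x7), inner loop over ways (x9),
 * both counting down to 0. Up to `cap` operands are stored in `out`.
 * Returns how many operations would be issued in total.
 */
static inline uint64_t walk_set_way(uint32_t level_field, uint32_t max_way,
                                    uint32_t max_set, uint32_t line_shift,
                                    uint64_t *out, size_t cap)
{
    uint32_t way_shift = clz32(max_way);
    uint64_t issued = 0;

    assert(out != NULL || cap == 0);
    for (int64_t set = max_set; set >= 0; set--) {        /* loop2 */
        for (int64_t way = max_way; way >= 0; way--) {    /* loop3 */
            uint64_t operand = (uint64_t)level_field
                             | ((uint64_t)way << way_shift)
                             | ((uint64_t)set << line_shift);
            if (issued < cap)
                out[issued] = operand; /* real code: dc cisw, x11 */
            issued++;
        }
    }
    return issued;
}
```

For the 4-way, 4096-set example (level_field = 2, max_way = 3, max_set = 4095, line_shift = 6) the walk issues 4 * 4096 = 16384 operations, starting with 0xC003FFC2 and then stepping down through the ways before moving to set 4094.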

1.4. skip, finished

skip:
	add	x10, x10, #2			// increment cache number
	cmp	x3, x10
	b.gt	loop1
finished:
	mov	x10, #0				// swith back to cache level 0
	msr	csselr_el1, x10			// select current cache level in csselr
	dsb	sy
	isb
	ret
ENDPROC(__flush_dcache_all)

skip:
	add	x10, x10, #2			// increment the cache level held in bits [3:1]
	cmp	x3, x10				// compare with LoC (x3 = LoC * 2)
	b.gt	loop1
finished:
	mov	x10, #0				// switch back to cache level 0
	msr	csselr_el1, x10			// select current cache level in csselr
	dsb	sy
	isb
	ret
ENDPROC(__flush_dcache_all)

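[Summary]

skip:

1. Increment the cache level.

2. If the level has reached LoC (level of coherency), fall through to finished; if it is still below LoC, go back to loop1 and flush the next level.

finished:

1. Reset x10 and csselr_el1 to 0.

2. Issue dsb to complete outstanding data accesses and isb to resynchronize the instruction stream.

3. Return.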
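Putting loop1's type check and the skip logic together, the level walk looks roughly like this in C. It is a host-side model only: the function name is hypothetical, and the two CLIDR values in the usage note are assumed samples rather than values read from a real core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * List the 1-based cache levels that __flush_dcache_all would flush:
 * every level below LoC whose Ctype is 2 (D only), 3 (separate I/D)
 * or 4 (unified). Up to `cap` levels are stored in `levels`.
 * Returns the number of such levels.
 */
static inline uint32_t dcache_levels_up_to_loc(uint64_t clidr,
                                               uint32_t *levels, size_t cap)
{
    uint32_t loc_x2 = (uint32_t)((clidr & 0x7000000u) >> 23); /* x3 */
    uint32_t found = 0;

    assert(levels != NULL || cap == 0);
    /* x10 steps by 2 because the level sits in bits [3:1] of CSSELR. */
    for (uint32_t x10 = 0; loc_x2 > x10; x10 += 2) {
        uint32_t shift = x10 + (x10 >> 1);  /* 3 * (level - 1) */
        uint32_t ctype = (uint32_t)(clidr >> shift) & 0x7u;

        if (ctype < 2)
            continue;                       /* skip: no cache or i-cache only */
        if (found < cap)
            levels[found] = (x10 >> 1) + 1;
        found++;
    }
    return found;
}
```

With the assumed value 0x0A200023 (LoC = 2, L1 separate, L2 unified) both levels 1 and 2 are reported; with 0x02000021 (L1 i-cache only, L2 unified) only level 2 is.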

References:

[1]http://infocenter.arm.com/help/topic/com.arm.doc.ddi0500j/DDI0500J_cortex_a53_trm.pdf

[2]http://jake.dothome.co.kr/cache2/

[3]http://jake.dothome.co.kr/registers64/

[4]http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0500j/BABGHJEA.html (csselr_el1)

[5]http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0500j/BABBAFJH.html (ccsidr_el1)
