diff --git a/patches/README.txt b/patches/README.txt new file mode 100644 index 0000000..f02d9e6 --- /dev/null +++ b/patches/README.txt @@ -0,0 +1,549 @@ +EMACS NS VOICEOVER ACCESSIBILITY PATCH +======================================== +patch: 0001-ns-implement-AXBoundsForRange-for-macOS-Zoom-cursor-.patch +author: Martin Sukany +files: src/nsterm.h (+108 lines) + src/nsterm.m (+2560 ins, -140 del, +2420 net) + + +OVERVIEW +-------- + + This patch adds comprehensive macOS VoiceOver accessibility support + to the Emacs NS (Cocoa) port. Before this patch, Emacs exposed only + a minimal, largely broken accessibility interface to macOS assistive + technology (AT) clients: EmacsView identified itself as a generic + NSAccessibilityGroup with no text content, no cursor tracking, and + no notifications. VoiceOver users could activate the application + but received no meaningful speech feedback when editing text. + + The patch introduces a layered virtual element tree above EmacsView. + Each visible Emacs window is represented by an EmacsAccessibilityBuffer + element (AXTextArea / AXTextField for minibuffer) with a full text + cache, a visible-run mapping table that bridges buffer character + positions to UTF-16 accessibility string indices, and an interactive + span child array for Tab navigation. A companion + EmacsAccessibilityModeLine element (AXStaticText) represents the mode + line of each window. These virtual elements are wired into the macOS + Accessibility API through EmacsView acting as the AXGroup root. + + Two additional integration points are provided: (1) macOS Zoom is + informed of the cursor position after every physical cursor redraw via + UAZoomChangeFocus(), using the correct CoreGraphics (top-left-origin) + coordinate space; (2) EmacsView implements accessibilityBoundsForRange: + and its legacy parameterized-attribute equivalent so that both Zoom + and third-party AT tools can locate the insertion point. 
The patch + also covers completion announcements for the *Completions* buffer and + Tab-navigable interactive spans for buttons, links, Org-mode links, + completion candidates, and keymap overlays. (A checkbox span type, + EmacsAXSpanTypeCheckBox, is reserved for future use but is not + currently scanned.) + + +ARCHITECTURE +------------ + + Class hierarchy (Cocoa only): + + NSAccessibilityElement + | + +-- EmacsAccessibilityElement (base: owns emacsView + lispWindow) + | + +-- EmacsAccessibilityBuffer (AXTextArea; one per leaf window) + | [category InteractiveSpans] (Tab nav children) + | + +-- EmacsAccessibilityModeLine (AXStaticText; one per non-minibuffer window) + | + +-- EmacsAccessibilityInteractiveSpan (AXButton/Link/etc.) + + EmacsView (NSView subclass, existing) + | + +-- owns NSMutableArray *accessibilityElements + contains EmacsAccessibilityBuffer + EmacsAccessibilityModeLine + instances for every visible leaf window and minibuffer. + EmacsAccessibilityInteractiveSpan instances are children of + their parent EmacsAccessibilityBuffer, NOT of this array. + + EmacsAccessibilityElement (base class) + - Stores a non-owning (unsafe_unretained) pointer to EmacsView and a + Lisp_Object lispWindow (GC-safe window reference). + - Provides -validWindow which verifies WINDOW_LIVE_P before + returning the raw struct window *. All subclasses use this to + avoid dangling pointers after delete-window or kill-buffer. + - Provides -screenRectFromEmacsX:y:width:height: which converts + EmacsView pixel coordinates (flipped AppKit space) to screen + coordinates via the NSWindow coordinate chain. + + EmacsAccessibilityBuffer + - Implements the full NSAccessibility text protocol: value, selected + text range, line/index/range conversions, frame-for-range, + range-for-position, and insertion-point-line-number. + - Maintains a text cache (cachedText / visibleRuns) keyed on + BUF_MODIFF. The cache is the single source of truth for all + index-to-charpos and charpos-to-index mappings. 
+ - Detects buffer edits (modiff change), cursor movement (point + change), and mark changes, and posts the appropriate + NSAccessibility notifications after each redisplay cycle. + - Stores cached values for the previous cycle (cachedModiff, + cachedPoint, cachedMarkActive) to enable change detection. + + EmacsAccessibilityModeLine + - Reads mode line text directly from the window's current glyph + matrix (CHAR_GLYPH rows with mode_line_p set). + - Stateless: no cache; text is read fresh on every AX query. + + EmacsAccessibilityInteractiveSpan + - Lightweight child element representing one contiguous interactive + region (button, link, completion item, etc.). + - Reports isAccessibilityFocused by comparing cachedPoint of the + parent EmacsAccessibilityBuffer against its charpos range. + - On setAccessibilityFocused: dispatches to the main queue via + GCD to move Emacs point, using block_input around SET_PT_BOTH. + + EmacsView (extensions) + - accessibilityElements array: rebuilt by -rebuildAccessibilityTree + when the window tree changes (split, delete, new buffer). + - -postAccessibilityUpdates: called from ns_update_end() after + every redisplay cycle; drives the notification dispatch loop. + - lastAccessibilityCursorRect: updated by ns_draw_phys_cursor + (C function) for Zoom integration. + - Implements accessibilityBoundsForRange: / + accessibilityFrameForRange: and the legacy + accessibilityAttributeValue:forParameter: API. + + +THREADING MODEL +--------------- + + Emacs runs all Lisp evaluation and buffer mutation on the main thread + (the Cocoa/AppKit main thread). The macOS Accessibility server + (axserver / AT daemon) calls AX getters from a private background + thread. 
+ + Rules enforced by this patch: + + Main thread only: + - ns_update_end -> postAccessibilityUpdates + - rebuildAccessibilityTree / invalidateAccessibilityTree + - ensureTextCache / ns_ax_buffer_text (Lisp calls: + Fget_char_property, Fnext_single_char_property_change, + Fbuffer_substring_no_properties) + - postAccessibilityNotificationsForFrame: (full notify logic) + - setAccessibilitySelectedTextRange: (SET_PT_BOTH, marker moves) + - setAccessibilityFocused: on EmacsAccessibilityInteractiveSpan + (dispatches to main queue via dispatch_async) + - ns_draw_phys_cursor partial update (lastAccessibilityCursorRect, + UAZoomChangeFocus) + + Safe from any thread (no Lisp calls, no mutable Emacs state): + - accessibilityIndexForCharpos: reads visibleRuns + cachedText + - charposForAccessibilityIndex: same + - isAccessibilityFocused on EmacsAccessibilityInteractiveSpan + (reads cachedPoint, a plain ptrdiff_t) + + Dispatch-gated (marshalled to main thread when called off-thread): + - accessibilityValue (EmacsAccessibilityBuffer) + - accessibilitySelectedTextRange + - accessibilityInsertionPointLineNumber + - accessibilityFrameForRange: + - accessibilityRangeForPosition: + - accessibilityChildrenInNavigationOrder + + The marshalling pattern used throughout: + + if (![NSThread isMainThread]) { + __block T result; + dispatch_sync(dispatch_get_main_queue(), ^{ result = ...; }); + return result; + } + + Cached data written on main thread and read from any thread: + - cachedText (NSString *): written by ensureTextCache on main. + - visibleRuns (ns_ax_visible_run *): written by ensureTextCache. + - cachedPoint (ptrdiff_t): plain scalar; aligned loads and stores + do not tear on 64-bit ARM/x86, although the C language itself + does not guarantee this. + No explicit lock is used; the design relies on the fact that index + mapping methods make no Lisp calls and read only the cached data + listed above (the NSString object itself is immutable). 
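The index-mapping methods listed under "Safe from any thread" can avoid Lisp calls because they only walk cached data. The following is a minimal sketch in plain C of a charpos-to-index lookup over a visible-run array; it is not the patch's code. The function name sketch_ax_index_for_charpos is hypothetical, size_t stands in for NSUInteger, a uint16_t array stands in for cachedText, a high-surrogate test stands in for rangeOfComposedCharacterSequenceAtIndex:, and clamping positions that fall in an invisible gap to the start of the next run is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as the patch's struct; size_t stands in for NSUInteger.  */
typedef struct ns_ax_visible_run
{
  ptrdiff_t charpos;   /* Buffer charpos of run start.  */
  ptrdiff_t length;    /* Emacs characters in this run.  */
  size_t ax_start;     /* UTF-16 index in accessibility string.  */
  size_t ax_length;    /* UTF-16 units for this run.  */
} ns_ax_visible_run;

/* Hypothetical helper: map buffer position CHARPOS to a UTF-16 index
   in TEXT using a linear scan of RUNS.  Reads only its arguments.  */
static size_t
sketch_ax_index_for_charpos (const ns_ax_visible_run *runs, size_t nruns,
                             const uint16_t *text, ptrdiff_t charpos)
{
  assert (nruns == 0 || (runs != NULL && text != NULL));

  for (size_t i = 0; i < nruns; i++)
    {
      const ns_ax_visible_run *r = &runs[i];

      /* Assumption: a position before this run (for example inside an
         invisible gap) is clamped to the start of the run.  */
      if (charpos < r->charpos)
        return r->ax_start;

      if (charpos < r->charpos + r->length)
        {
          size_t idx = r->ax_start;
          size_t end = r->ax_start + r->ax_length;

          /* Advance one Emacs character at a time; a high surrogate
             means the character occupies two UTF-16 units.  */
          for (ptrdiff_t c = r->charpos; c < charpos && idx < end; c++)
            {
              int high = text[idx] >= 0xD800 && text[idx] <= 0xDBFF;
              idx += high ? 2 : 1;
            }
          return idx;
        }
    }

  /* Past the last run: clamp to the end of the accessibility string.  */
  return nruns ? runs[nruns - 1].ax_start + runs[nruns - 1].ax_length : 0;
}
```

With two runs, {charpos 1, length 2, ax_start 0, ax_length 2} and {charpos 6, length 2, ax_start 2, ax_length 3}, over the UTF-16 text 'a', 'b', 0xD83D, 0xDE00, 'c', position 4 lies in the invisible gap and maps to index 2, while position 7 maps to index 4 because the character at position 6 occupies two units.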
+ + +NOTIFICATION STRATEGY +--------------------- + + Notifications are posted from -postAccessibilityNotificationsForFrame: + which runs on the main thread after every redisplay cycle. The + method detects three mutually exclusive events: + + 1. TEXT CHANGED (modiff != cachedModiff) + Posts NSAccessibilityValueChangedNotification with AXTextEditType + = Typing and, when exactly one character was inserted, provides + AXTextChangeValue for echo feedback. cachedPoint is updated here + to suppress a spurious selection-move event in the same cycle + (WebKit/Chromium convention: edit and selection-move are mutually + exclusive per runloop iteration). + + 2. CURSOR MOVED OR MARK CHANGED (point != cachedPoint OR mark change) + Granularity is computed by comparing oldIdx and newIdx in + cachedText: + - different line range -> LINE granularity + - same line, distance > 1 UTF-16 unit -> WORD granularity + - same line, distance == 1 UTF-16 unit -> CHARACTER granularity + C-n / C-p / Tab / backtab force LINE granularity + (detected by ns_ax_event_is_line_nav_key which inspects + last_command_event) regardless. + + For FOCUSED elements the hybrid strategy applies: + + CHARACTER moves: + SelectedTextChanged is posted WITHOUT AXTextSelectionGranularity + in userInfo. Omitting the key prevents VoiceOver from deriving + its own speech (it would read the character BEFORE point, + which is wrong for evil block-cursor mode where the cursor + sits ON the character). Then AnnouncementRequested is posted + separately with the character AT point as the announcement. + Newline is skipped (VoiceOver handles end-of-line internally). + + WORD and LINE moves: + SelectedTextChanged is posted WITH AXTextSelectionGranularity. + VoiceOver reads the word/line correctly from the element text + using the granularity hint. 
For LINE moves an additional + AnnouncementRequested is also posted with the line text (or + the completion--string at point if in a completion buffer) to + handle C-n/C-p -- VoiceOver processes these keystrokes + differently from arrow keys internally. + + SELECTION changes (mark becomes active or extends): + SelectedTextChanged with LINE or WORD granularity. VoiceOver + reads the newly selected or deselected text. + + For NON-FOCUSED elements (e.g. *Completions* while minibuffer has + focus): AnnouncementRequested only. See COMPLETION ANNOUNCEMENTS. + + 3. NO CHANGE + Nothing is posted. Completion cache is cleared for focused buffer. + + +TEXT CACHE AND VISIBLE RUNS +---------------------------- + + ns_ax_buffer_text(w, out_start, out_runs, out_nruns) builds the + accessibility string for window W. It operates on the current + buffer with set_buffer_internal_1, scanning from BUF_BEGV to BUF_ZV. + + Invisible text detection uses TEXT_PROP_MEANS_INVISIBLE(invis) where + invis = Fget_char_property(pos, Qinvisible, Qnil). This respects + buffer-invisibility-spec, correctly handling org-mode folding, + outline mode, and hideshow -- not just `invisible t' text properties. + When an invisible region is found, the scanner jumps ahead using + Fnext_single_char_property_change to skip the entire region in O(1) + iterations rather than character by character. + + Text extraction uses Fbuffer_substring_no_properties (not raw + BUF_BYTE_ADDRESS) to handle the buffer gap correctly. Raw byte + access across the gap position yields garbage bytes. + + The ns_ax_visible_run structure: + + typedef struct ns_ax_visible_run { + ptrdiff_t charpos; /* Buffer charpos of run start. */ + ptrdiff_t length; /* Emacs characters in this run. */ + NSUInteger ax_start; /* UTF-16 index in accessibility string. */ + NSUInteger ax_length; /* UTF-16 units for this run. 
*/ + } ns_ax_visible_run; + + Multiple runs are produced when invisible text splits the buffer into + non-contiguous visible segments. The mapping array is stored in the + EmacsAccessibilityBuffer ivar `visibleRuns' (C array, xmalloc'd). + + Index mapping (charpos <-> ax_index) does a linear scan of the run + array. Within a run, UTF-16 unit counting uses + rangeOfComposedCharacterSequenceAtIndex: to handle surrogate pairs + (emoji, rare CJK) correctly -- one Emacs character may occupy 2 + UTF-16 units. + + Cache invalidation is triggered whenever BUF_MODIFF changes + (ensureTextCache compares cachedTextModiff). The cache is also + invalidated when the window tree is rebuilt. NS_AX_TEXT_CAP = 100,000 + UTF-16 units (~200 KB) caps total exposure; buffers whose + accessibility text exceeds this cap are truncated for accessibility + purposes. VoiceOver performance degrades noticeably beyond this + threshold. + + +COMPLETION ANNOUNCEMENTS +------------------------ + + When point moves in a non-focused buffer (the common case: + *Completions* window while the minibuffer retains keyboard focus), + VoiceOver does not automatically read the change because it is + tracking the focused element. The patch posts AnnouncementRequested + with a 4-step fallback chain to find the best text to announce: + + Step 1 -- completion--string property at point. + The `completion--string' text property (set by minibuffer.el + since Emacs 29) carries the canonical completion candidate string. + It can be a plain Lisp string or a list (CANDIDATE ANNOTATION) where both + are strings. + ns_ax_completion_string_from_prop handles both: plain string -> + use directly; cons -> use car (the candidate without annotation). + This is the preferred source: precisely the candidate text with + no surrounding whitespace. + + Step 2 -- mouse-face span at point. + completion-list-mode marks the active candidate with mouse-face. 
+ The code walks backward and forward from point to find the span + boundaries, then reads the corresponding slice of cachedText. + Used when completion--string is absent (older Emacs or non- + standard completion modes). + + Step 3 -- completions-highlight overlay at point. + Emacs 29+ highlights the selected completion with the + `completions-highlight' face applied via an overlay. The overlay + text is extracted via ns_ax_completion_text_for_span which itself + tries completion--string first, then the `completion' property, + then falls back to the ax string slice. + + Step 4 -- nearest completions-highlight overlay. + ns_ax_find_completion_overlay_range scans the buffer for the + closest completions-highlight overlay to point. Uses fast probes + at {point, point+1, point-1} before falling back to a full O(n) + scan. + + Final fallback -- current line text. + Read the line containing point from cachedText. + + Deduplication: the announcement is posted only when announceText, + overlay bounds, or point have changed since the last cycle + (cachedCompletionAnnouncement, cachedCompletionOverlayStart/End, + cachedCompletionPoint). + + +INTERACTIVE SPANS +----------------- + + ns_ax_scan_interactive_spans(w, parent_buf) scans the visible range + of window W looking for text properties that indicate interactive + content. Properties are checked in priority order: + + widget -> EmacsAXSpanTypeWidget (AXButton, via default) + button -> EmacsAXSpanTypeButton (AXButton, via default) + follow-link -> EmacsAXSpanTypeLink (AXLink) + org-link -> EmacsAXSpanTypeLink (AXLink) + mouse-face -> EmacsAXSpanTypeCompletionItem + (AXButton; completion-list-mode only) + keymap overlay-> EmacsAXSpanTypeButton (AXButton) + + For completion buffers (major-mode == completion-list-mode), the span + boundary for mouse-face regions uses completion--string as the property + key when present, rather than mouse-face itself. 
This prevents two + column-adjacent completion candidates from being merged into one span + when their mouse-face regions share padding whitespace. + + All property symbols (Qwidget, Qbutton, Qfollow_link, Qorg_link, + Qcompletion__string, Qcompletion, Qcompletions_highlight, Qbacktab, + Qcompletion_list_mode) are registered with DEFSYM in syms_of_nsterm + and referenced directly -- no repeated intern() calls. + + Each span is allocated, configured, added to the spans array, then + released (the array retains it). Label priority: completion--string + > buffer substring > help-echo. + + Tab navigation: -accessibilityChildrenInNavigationOrder returns the + cached span array, rebuilt lazily when interactiveSpansDirty is set. + Calls from off-thread are marshalled with dispatch_sync. + + Focus movement: -setAccessibilityFocused: on a span dispatches + Fselect_window + SET_PT_BOTH to the main queue via dispatch_async, + wrapped in block_input/unblock_input. + + +ZOOM INTEGRATION +---------------- + + macOS Zoom (accessibility zoom) tracks a "focus element" to keep the + zoomed viewport centered on the relevant screen area. Two mechanisms + are provided: + + 1. ns_draw_phys_cursor (C function, main thread, called during + redisplay). After clipping the cursor rect to the text area, + stores the rect in view->lastAccessibilityCursorRect. If + UAZoomEnabled(), converts the rect to screen coordinates and calls + UAZoomChangeFocus(kUAZoomFocusTypeInsertionPoint). + + Coordinate conversion chain: + EmacsView pixels (AppKit, flipped, origin at top-left of view) + -[convertRect:toView:nil]-> NSWindow coordinates + -[convertRectToScreen:]-> NSScreen coordinates + NSRectToCGRect -> CGRect (same values, no transform) + CG y-flip: cgRect.origin.y = primaryH - y - height + The flip is required because CoreGraphics uses top-left origin + (primary screen) while AppKit screen rects use bottom-left. + primaryH = [[NSScreen screens] firstObject].frame.size.height. + + 2. 
EmacsView -accessibilityBoundsForRange: / + -accessibilityFrameForRange: + AT tools (including Zoom) call these with the selectedTextRange + to locate the insertion point. The implementation returns the + screen rect stored in lastAccessibilityCursorRect, with a minimum + size of 1x8 pixels. The legacy parameterized-attribute API + (NSAccessibilityBoundsForRangeParameterizedAttribute) is supported + via -accessibilityAttributeValue:forParameter: for older AT + clients. + + +KEY DESIGN DECISIONS +-------------------- + + 1. DEFSYM instead of intern for property symbols. + DEFSYM registers symbols at startup (syms_of_nsterm) and stores + them in C globals (e.g. Qcompletion__string). Using intern() at + every AX scan would perform an obarray lookup on each redisplay + cycle. DEFSYM symbols are also always reachable by the GC via + staticpro, eliminating any risk of premature collection. + + 2. AnnouncementRequested for character moves, not SelectedTextChanged. + VoiceOver derives the speech character from SelectedTextChanged by + looking at the character BEFORE the new cursor position (the char + "passed over"). In evil-mode with a block cursor, the cursor sits + ON the character, not between characters. AnnouncementRequested + with the character AT point produces correct speech in both insert + and normal (block-cursor) modes. SelectedTextChanged is still + posted without granularity to interrupt ongoing VoiceOver reading + and update braille display tracking. + + 3. completion--string, not mouse-face, as span boundary. + mouse-face regions in completion-list-mode sometimes include + leading or trailing whitespace shared between column-adjacent + candidates, which could merge two candidates into one span. + completion--string changes precisely at candidate boundaries. + + 4. Probe order {point, point+1, point-1} for overlay search. + After Tab advances to a new completion candidate, point is at the + START of the new entry. 
The previous entry's overlay covers the + position before the new start, so point-1 is inside the OLD + overlay. Trying point+1 before point-1 finds the new (correct) + entry first. + + 5. Notifications posted BEFORE rebuilding the tree. + postAccessibilityUpdates uses existing elements which carry cached + state from the previous cycle. Rebuilding first would create + fresh elements with current values, making change detection + impossible. Tree rebuild is deferred to cycles where + accessibilityTreeValid is false; no notifications are posted in + that cycle. + + 6. Re-entrance guard (accessibilityUpdating flag). + VoiceOver callbacks triggered by notification posting can cause + Cocoa to re-enter the run loop, which may trigger redisplay, which + calls ns_update_end -> postAccessibilityUpdates. The BOOL flag + breaks this recursion. + + 7. lispWindow (Lisp_Object) instead of raw struct window *. + struct window pointers can become dangling after delete-window. + Storing the Lisp_Object and using WINDOW_LIVE_P + XWINDOW at the + call site is the standard safe pattern in Emacs C code. + + 8. accessibilityVisibleCharacterRange returns full buffer range. + VoiceOver treats the visible range boundary as end-of-text. If + this returned only the on-screen portion, VoiceOver would announce + "end of text" prematurely when the cursor reaches the visible + bottom, even though more buffer content exists below. + + +KNOWN LIMITATIONS +----------------- + + - BUF_OVERLAY_MODIFF is not tracked. Overlay changes (e.g. moving + the completions-highlight overlay via Tab without changing buffer + text) do not bump BUF_MODIFF, so the text cache is not invalidated. + The notification logic detects point changes (cachedPoint) which + covers the common case, but overlay-only changes with a stationary + point would be missed. A future fix would compare overlay_modiff. + + - Interactive span scan is O(n) in the visible buffer range. 
Every + character position is visited to find property boundaries. For + large visible buffers this scan runs on every redisplay cycle + whenever interactiveSpansDirty is set. An optimization would use + next_single_property_change to skip non-interactive regions in bulk. + + - Mode line text is extracted from CHAR_GLYPH rows only. Image + glyphs, stretch glyphs, and composed glyphs are silently skipped. + Mode lines with icon fonts (e.g. doom-modeline with nerd-font) + produce incomplete or garbled accessibility text. + + - Buffers larger than NS_AX_TEXT_CAP (100,000 UTF-16 units) are + truncated. The truncation is silent; AT tools navigating past the + truncation boundary may behave unexpectedly. + + - No multi-frame coordination. EmacsView.accessibilityElements is + per-view; there is no cross-frame notification ordering. + + - GNUstep is explicitly excluded (#ifdef NS_IMPL_COCOA). GNUstep + has a different accessibility model and requires separate work. + + - UAZoomChangeFocus always uses kUAZoomFocusTypeInsertionPoint + regardless of cursor style (box, bar, hbar). This is cosmetically + imprecise but functionally correct. + + +TESTING CHECKLIST +----------------- + + Prerequisites: + - macOS with VoiceOver (Cmd-F5 to toggle). + - Emacs built from source with this patch applied. + - Evil-mode recommended for block-cursor tests. + + Basic text reading: + 1. Open Emacs. Press Cmd-F5 to start VoiceOver. + 2. Switch to Emacs (Cmd-Tab). VoiceOver should announce + "Emacs, editor" and read the current line. + 3. Move cursor with arrow keys. VoiceOver should read each + character (left/right) or line (up/down) as you move. + 4. Verify: right/left arrow reads the character AT the cursor + position, not the character left behind. (evil block-cursor) + + Word and line navigation: + 5. Press M-f / M-b (forward/backward word). VoiceOver should + announce the word landed on. + 6. Press C-n / C-p. VoiceOver should read the full new line. + 7. 
Hold Shift and press arrow keys to extend selection. VoiceOver + should announce the selected text. + + Completion navigation: + 8. Type M-x to open the minibuffer. + 9. Type a partial command name. Press Tab to open *Completions*. + 10. Press Tab / S-Tab to cycle through completions. VoiceOver + should announce each candidate name as you move. + 11. Verify no double-speech (each candidate read exactly once). + + Interactive span Tab navigation: + 12. Open a buffer with buttons (e.g. M-x describe-key). + 13. Use VoiceOver Item Chooser (VO-I) or Tab with VoiceOver + interaction mode to navigate interactive elements. + 14. Verify each button/link is reachable and its label is read. + 15. In an org-mode file with links, verify links appear as + separate navigable AXLink elements. + + Mode line: + 16. Use the VoiceOver cursor to navigate to the mode line below a + buffer. VoiceOver should read the mode line text. + + Zoom integration: + 17. Enable macOS Zoom (System Settings -> Accessibility -> Zoom). + 18. Set Zoom to "Follow keyboard focus". + 19. Move cursor in Emacs. Zoom viewport should track the cursor. + 20. Verify Zoom follows the cursor across split windows. + + Window operations: + 21. Split window with C-x 2. VoiceOver should announce a layout + change. Switch with C-x o; VoiceOver should read the new + window content. + 22. Delete a window with C-x 0. No crash should occur. + 23. Switch buffers with C-x b. VoiceOver should read new buffer. + + Stress test: + 24. Open a large file (>5000 lines). Navigate with C-v / M-v. + Verify no significant lag in VoiceOver speech response. + 25. Open an org-mode file with many folded sections. Verify that + folded (invisible) text is not announced during navigation. + +-- end of README --