EMACS NS VOICEOVER ACCESSIBILITY PATCH
========================================

patch:   0001-ns-implement-AXBoundsForRange-for-macOS-Zoom-cursor-.patch
author:  Martin Sukany
files:   src/nsterm.h  (+108 lines)
         src/nsterm.m  (+2560 ins, -140 del, +2420 net)


OVERVIEW
--------

This patch adds comprehensive macOS VoiceOver accessibility support to
the Emacs NS (Cocoa) port.  Before this patch, Emacs exposed only a
minimal, largely broken accessibility interface to macOS assistive
technology (AT) clients: EmacsView identified itself as a generic
NSAccessibilityGroup with no text content, no cursor tracking, and no
notifications.  VoiceOver users could activate the application but
received no meaningful speech feedback when editing text.

The patch introduces a layered virtual element tree above EmacsView.
Each visible Emacs window is represented by an
EmacsAccessibilityBuffer element (AXTextArea / AXTextField for
minibuffer) with a full text cache, a visible-run mapping table that
bridges buffer character positions to UTF-16 accessibility string
indices, and an interactive span child array for Tab navigation.  A
companion EmacsAccessibilityModeLine element (AXStaticText) represents
the mode line of each window.  These virtual elements are wired into
the macOS Accessibility API through EmacsView acting as the AXGroup
root.

Two additional integration points are provided:

  (1) macOS Zoom is informed of the cursor position after every
      physical cursor redraw via UAZoomChangeFocus(), using the
      correct CoreGraphics (top-left-origin) coordinate space;

  (2) EmacsView implements accessibilityBoundsForRange: and its legacy
      parameterized-attribute equivalent so that both Zoom and
      third-party AT tools can locate the insertion point.

The patch also covers completion announcements for the *Completions*
buffer and Tab-navigable interactive spans for buttons, links,
checkboxes, Org-mode links, completion candidates, and keymap overlays.
(EmacsAXSpanTypeCheckBox is reserved for future use but not currently
scanned.)


ARCHITECTURE
------------

Class hierarchy (Cocoa only):

  NSAccessibilityElement
    |
    +-- EmacsAccessibilityElement  (base: owns emacsView + lispWindow)
          |
          +-- EmacsAccessibilityBuffer  (AXTextArea; one per leaf window)
          |     [category InteractiveSpans]  (Tab nav children)
          |
          +-- EmacsAccessibilityModeLine  (AXStaticText; one per non-mini)
          |
          +-- EmacsAccessibilityInteractiveSpan  (AXButton/Link/etc.)

  EmacsView  (NSView subclass, existing)
    |
    +-- owns NSMutableArray *accessibilityElements
          contains EmacsAccessibilityBuffer + EmacsAccessibilityModeLine
          instances for every visible leaf window and minibuffer.
          EmacsAccessibilityInteractiveSpan instances are children of
          their parent EmacsAccessibilityBuffer, NOT of this array.

EmacsAccessibilityElement (base class)
  - Stores a non-owning (unsafe_unretained) pointer to EmacsView and a
    Lisp_Object lispWindow (GC-safe window reference).
  - Provides -validWindow which verifies WINDOW_LIVE_P before
    returning the raw struct window *.  All subclasses use this to
    avoid dangling pointers after delete-window or kill-buffer.
  - Provides -screenRectFromEmacsX:y:width:height: which converts
    EmacsView pixel coordinates (flipped AppKit space) to screen
    coordinates via the NSWindow coordinate chain.

EmacsAccessibilityBuffer
  - Implements the full NSAccessibility text protocol: value, selected
    text range, line/index/range conversions, frame-for-range,
    range-for-position, and insertion-point-line-number.
  - Maintains a text cache (cachedText / visibleRuns) keyed on
    BUF_MODIFF.  The cache is the single source of truth for all
    index-to-charpos and charpos-to-index mappings.
  - Detects buffer edits (modiff change), cursor movement (point
    change), and mark changes, and posts the appropriate
    NSAccessibility notifications after each redisplay cycle.
  - Stores cached values for the previous cycle (cachedModiff,
    cachedPoint, cachedMarkActive) to enable change detection.
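
The change detection described for EmacsAccessibilityBuffer can be
sketched in plain C as follows.  This is a minimal illustration, not
code from the patch: the struct, enum, and function names are
hypothetical stand-ins for the cachedModiff / cachedPoint /
cachedMarkActive ivars, and the sketch assumes that an edit takes
precedence over a cursor move detected in the same cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the cached ivars of the buffer element.  */
typedef struct
{
  long long cached_modiff;   /* Buffer modification count last seen.  */
  ptrdiff_t cached_point;    /* Point last seen.  */
  bool cached_mark_active;   /* Mark state last seen.  */
} ax_cached_state;

/* Hypothetical event classes for one redisplay cycle.  */
typedef enum
{
  AX_NO_CHANGE,
  AX_TEXT_CHANGED,
  AX_SELECTION_CHANGED
} ax_event;

/* Compare the current values against the cache, classify the change,
   then store the current values for the next cycle.  An edit wins
   over a cursor move; because the cached point is refreshed in the
   same call, no selection event follows on the next cycle.  */
static ax_event
ax_detect_change (ax_cached_state *c, long long modiff,
                  ptrdiff_t point, bool mark_active)
{
  assert (c != NULL);

  ax_event ev = AX_NO_CHANGE;
  if (modiff != c->cached_modiff)
    ev = AX_TEXT_CHANGED;
  else if (point != c->cached_point
           || mark_active != c->cached_mark_active)
    ev = AX_SELECTION_CHANGED;

  c->cached_modiff = modiff;
  c->cached_point = point;
  c->cached_mark_active = mark_active;
  return ev;
}
```

Calling ax_detect_change once per redisplay cycle with the buffer's
current modification count, point, and mark state yields at most one
event class per cycle.
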

EmacsAccessibilityModeLine
  - Reads mode line text directly from the window's current glyph
    matrix (CHAR_GLYPH rows with mode_line_p set).
  - Stateless: no cache; text is read fresh on every AX query.

EmacsAccessibilityInteractiveSpan
  - Lightweight child element representing one contiguous interactive
    region (button, link, completion item, etc.).
  - Reports isAccessibilityFocused by comparing cachedPoint of the
    parent EmacsAccessibilityBuffer against its charpos range.
  - On setAccessibilityFocused: dispatches to the main queue via GCD
    to move Emacs point, using block_input around SET_PT_BOTH.

EmacsView (extensions)
  - accessibilityElements array: rebuilt by -rebuildAccessibilityTree
    when the window tree changes (split, delete, new buffer).
  - -postAccessibilityUpdates: called from ns_update_end() after every
    redisplay cycle; drives the notification dispatch loop.
  - lastAccessibilityCursorRect: updated by ns_draw_phys_cursor
    (C function) for Zoom integration.
  - Implements accessibilityBoundsForRange: /
    accessibilityFrameForRange: and the legacy
    accessibilityAttributeValue:forParameter: API.


THREADING MODEL
---------------

Emacs runs all Lisp evaluation and buffer mutation on the main thread
(the Cocoa/AppKit main thread).  The macOS Accessibility server
(axserver / AT daemon) calls AX getters from a private background
thread.

Rules enforced by this patch:

  Main thread only:
    - ns_update_end -> postAccessibilityUpdates
    - rebuildAccessibilityTree / invalidateAccessibilityTree
    - ensureTextCache / ns_ax_buffer_text (Lisp calls:
      Fget_char_property, Fnext_single_char_property_change,
      Fbuffer_substring_no_properties)
    - postAccessibilityNotificationsForFrame: (full notify logic)
    - setAccessibilitySelectedTextRange: (SET_PT_BOTH, marker moves)
    - setAccessibilityFocused: on EmacsAccessibilityInteractiveSpan
      (dispatches to main queue via dispatch_async; uses specpdl
      unwind protection so block_input is always matched by
      unblock_input even if Fselect_window signals an error)
    - ns_draw_phys_cursor partial update
      (lastAccessibilityCursorRect, UAZoomChangeFocus)

  Safe from any thread (no Lisp calls, no mutable Emacs state):
    - accessibilityIndexForCharpos: reads visibleRuns + cachedText
    - charposForAccessibilityIndex: same
    - isAccessibilityFocused on EmacsAccessibilityInteractiveSpan
      (reads cachedPoint, a plain ptrdiff_t)

  Dispatch-gated (marshalled to main thread when called off-thread):
    - accessibilityValue (EmacsAccessibilityBuffer)
    - accessibilitySelectedTextRange
    - accessibilityInsertionPointLineNumber
    - accessibilityFrameForRange:
    - accessibilityRangeForPosition:
    - accessibilityChildrenInNavigationOrder

The marshalling pattern used throughout:

    if (![NSThread isMainThread])
      {
        __block T result;
        dispatch_sync(dispatch_get_main_queue(), ^{ result = ...; });
        return result;
      }

Cached data written on main thread and read from any thread:
  - cachedText (NSString *): written by ensureTextCache on main.
  - visibleRuns (ns_ax_visible_run *): written by ensureTextCache.
  - cachedPoint (ptrdiff_t): plain scalar; atomic on 64-bit ARM/x86.

No explicit lock is used; the design relies on the fact that index
mapping methods make no Lisp calls and read only the above scalars
and the immutable NSString object.


NOTIFICATION STRATEGY
---------------------

Notifications are posted from
-postAccessibilityNotificationsForFrame: which runs on the main
thread after every redisplay cycle.  The method detects three mutually
exclusive events:

1. TEXT CHANGED (modiff != cachedModiff)

   Posts NSAccessibilityValueChangedNotification with
   AXTextEditType = Typing and, when exactly one character was
   inserted, provides AXTextChangeValue for echo feedback.
   cachedPoint is updated here to suppress a spurious selection-move
   event in the same cycle (WebKit/Chromium convention: edit and
   selection-move are mutually exclusive per runloop iteration).

2. CURSOR MOVED OR MARK CHANGED (point != cachedPoint OR mark change)

   Granularity is computed by comparing oldIdx and newIdx in
   cachedText:
     - different line range                   -> LINE granularity
     - same line, distance > 1 UTF-16 unit    -> WORD granularity
     - same line, distance == 1 UTF-16 unit   -> CHARACTER granularity

   C-n / C-p / Tab / backtab force LINE granularity (detected by
   ns_ax_event_is_line_nav_key which inspects last_command_event)
   regardless.

   For FOCUSED elements the hybrid strategy applies:

     CHARACTER moves:
       SelectedTextChanged is posted WITHOUT
       AXTextSelectionGranularity in userInfo.  Omitting the key
       prevents VoiceOver from deriving its own speech (it would read
       the character BEFORE point, which is wrong for evil
       block-cursor mode where the cursor sits ON the character).
       Then AnnouncementRequested is posted separately with the
       character AT point as the announcement.  Newline is skipped
       (VoiceOver handles end-of-line internally).

     WORD and LINE moves:
       SelectedTextChanged is posted WITH
       AXTextSelectionGranularity.  VoiceOver reads the word/line
       correctly from the element text using the granularity hint.
       For LINE moves an additional AnnouncementRequested is also
       posted with the line text (or the completion--string at point
       if in a completion buffer) to handle C-n/C-p -- VoiceOver
       processes these keystrokes differently from arrow keys
       internally.

     SELECTION changes (mark becomes active or extends):
       SelectedTextChanged with LINE or WORD granularity.  VoiceOver
       reads the newly selected or deselected text.

   For NON-FOCUSED elements (e.g. *Completions* while minibuffer has
   focus):
     AnnouncementRequested only.  See COMPLETION ANNOUNCEMENTS.

3. NO CHANGE

   Nothing is posted.  Completion cache is cleared for focused buffer.


TEXT CACHE AND VISIBLE RUNS
----------------------------

ns_ax_buffer_text(w, out_start, out_runs, out_nruns) builds the
accessibility string for window W.  It operates on the current buffer
with set_buffer_internal_1, scanning from BUF_BEGV to BUF_ZV.

Invisible text detection uses TEXT_PROP_MEANS_INVISIBLE(invis) where
invis = Fget_char_property(pos, Qinvisible, Qnil).  This respects
buffer-invisibility-spec, correctly handling org-mode folding, outline
mode, and hideshow -- not just `invisible t' text properties.

When an invisible region is found, the scanner jumps ahead using
Fnext_single_char_property_change to skip the entire region in O(1)
iterations rather than character by character.

Text extraction uses Fbuffer_substring_no_properties (not raw
BUF_BYTE_ADDRESS) to handle the buffer gap correctly.  Raw byte access
across the gap position yields garbage bytes.

The ns_ax_visible_run structure:

    typedef struct ns_ax_visible_run
    {
      ptrdiff_t  charpos;    /* Buffer charpos of run start.  */
      ptrdiff_t  length;     /* Emacs characters in this run.  */
      NSUInteger ax_start;   /* UTF-16 index in accessibility string.  */
      NSUInteger ax_length;  /* UTF-16 units for this run.  */
    } ns_ax_visible_run;

Multiple runs are produced when invisible text splits the buffer into
non-contiguous visible segments.  The mapping array is stored in the
EmacsAccessibilityBuffer ivar `visibleRuns' (C array, xmalloc'd).

Index mapping (charpos <-> ax_index) does a linear scan of the run
array.
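
The linear scan over the run array can be sketched in portable C as
follows.  This is an illustration under stated assumptions, not the
patch's code: the function names are hypothetical, size_t stands in
for NSUInteger, the text is a plain UTF-16 array rather than an
NSString, only surrogate pairs are treated as multi-unit characters,
and positions that fall in an invisible gap are clamped to the start
of the next visible run (the patch may choose differently).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as the structure above, with size_t for NSUInteger.  */
typedef struct ns_ax_visible_run
{
  ptrdiff_t charpos;   /* Buffer charpos of run start.  */
  ptrdiff_t length;    /* Emacs characters in this run.  */
  size_t ax_start;     /* UTF-16 index in accessibility string.  */
  size_t ax_length;    /* UTF-16 units for this run.  */
} ns_ax_visible_run;

/* Number of UTF-16 units used by the character starting at text[i]:
   2 for a high surrogate (start of a pair), otherwise 1.  */
static size_t
utf16_units_at (const uint16_t *text, size_t i)
{
  return (text[i] >= 0xD800 && text[i] <= 0xDBFF) ? 2 : 1;
}

/* Map buffer position CHARPOS to a UTF-16 index (hypothetical name).  */
static size_t
ax_index_for_charpos (const uint16_t *text, const ns_ax_visible_run *runs,
                      size_t nruns, ptrdiff_t charpos)
{
  assert (text != NULL && runs != NULL && nruns > 0);
  for (size_t i = 0; i < nruns; i++)
    {
      const ns_ax_visible_run *r = &runs[i];
      if (charpos < r->charpos)
        return r->ax_start;          /* In an invisible gap: clamp.  */
      if (charpos < r->charpos + r->length)
        {
          size_t idx = r->ax_start;
          for (ptrdiff_t c = r->charpos; c < charpos; c++)
            idx += utf16_units_at (text, idx);
          return idx;
        }
    }
  /* Past the last run: end of the accessibility string.  */
  return runs[nruns - 1].ax_start + runs[nruns - 1].ax_length;
}

/* Map UTF-16 index AX_IDX back to a buffer position (hypothetical
   name).  An index inside a surrogate pair maps to the character
   that owns the pair.  */
static ptrdiff_t
charpos_for_ax_index (const uint16_t *text, const ns_ax_visible_run *runs,
                      size_t nruns, size_t ax_idx)
{
  assert (text != NULL && runs != NULL && nruns > 0);
  for (size_t i = 0; i < nruns; i++)
    {
      const ns_ax_visible_run *r = &runs[i];
      if (ax_idx >= r->ax_start && ax_idx < r->ax_start + r->ax_length)
        {
          ptrdiff_t cp = r->charpos;
          size_t idx = r->ax_start;
          while (idx < ax_idx)
            {
              size_t step = utf16_units_at (text, idx);
              if (idx + step > ax_idx)
                break;               /* AX_IDX is mid-pair.  */
              idx += step;
              cp++;
            }
          return cp;
        }
    }
  /* Past the last run: position just after the last visible char.  */
  return runs[nruns - 1].charpos + runs[nruns - 1].length;
}
```

Both functions only read the run array and the text, which matches
the requirement that index mapping make no Lisp calls.
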
Within a run, UTF-16 unit counting uses
rangeOfComposedCharacterSequenceAtIndex: to handle surrogate pairs
(emoji, rare CJK) correctly -- one Emacs character may occupy 2 UTF-16
units.

Cache invalidation is triggered whenever BUF_MODIFF changes
(ensureTextCache compares cachedTextModiff).  The cache is also
invalidated when the window tree is rebuilt.  NS_AX_TEXT_CAP = 100,000
UTF-16 units (~200 KB) caps total exposure; buffers larger than
~50,000 lines are truncated for accessibility purposes.  VoiceOver
performance degrades noticeably beyond this threshold.


COMPLETION ANNOUNCEMENTS
------------------------

When point moves in a non-focused buffer (the common case:
*Completions* window while the minibuffer retains keyboard focus),
VoiceOver does not automatically read the change because it is
tracking the focused element.  The patch posts AnnouncementRequested
with a 4-step fallback chain to find the best text to announce:

  Step 1 -- completion--string property at point.
    The `completion--string' text property (set by minibuffer.el
    since Emacs 29) carries the canonical completion candidate
    string.  It can be a plain Lisp string or a list
    (CANDIDATE ANNOTATION) where both are strings.
    ns_ax_completion_string_from_prop handles both: plain string ->
    use directly; cons -> use car (the candidate without annotation).
    This is the preferred source: precisely the candidate text with
    no surrounding whitespace.

  Step 2 -- mouse-face span at point.
    completion-list-mode marks the active candidate with mouse-face.
    The code walks backward and forward from point to find the span
    boundaries, then reads the corresponding slice of cachedText.
    Used when completion--string is absent (older Emacs or
    non-standard completion modes).

  Step 3 -- completions-highlight overlay at point.
    Emacs 29+ highlights the selected completion with the
    `completions-highlight' face applied via an overlay.
    The overlay text is extracted via ns_ax_completion_text_for_span
    which itself tries completion--string first, then the
    `completion' property, then falls back to the ax string slice.

  Step 4 -- nearest completions-highlight overlay.
    ns_ax_find_completion_overlay_range scans the buffer for the
    closest completions-highlight overlay to point.  Uses fast probes
    at {point, point+1, point-1} before falling back to a full O(n)
    scan.

  Final fallback -- current line text.
    Read the line containing point from cachedText.

Deduplication: the announcement is posted only when announceText,
overlay bounds, or point have changed since the last cycle
(cachedCompletionAnnouncement, cachedCompletionOverlayStart/End,
cachedCompletionPoint).


INTERACTIVE SPANS
-----------------

ns_ax_scan_interactive_spans(w, parent_buf) scans the visible range
of window W looking for text properties that indicate interactive
content.  Properties are checked in priority order:

  widget         -> EmacsAXSpanTypeWidget          (AXButton, via default)
  button         -> EmacsAXSpanTypeButton          (AXButton, via default)
  follow-link    -> EmacsAXSpanTypeLink            (AXLink)
  org-link       -> EmacsAXSpanTypeLink            (AXLink)
  mouse-face     -> EmacsAXSpanTypeCompletionItem  (AXButton;
                                                    completion-list-mode only)
  keymap overlay -> EmacsAXSpanTypeButton          (AXButton)

For completion buffers (major-mode == completion-list-mode), the span
boundary for mouse-face regions uses completion--string as the
property key when present, rather than mouse-face itself.  This
prevents two column-adjacent completion candidates from being merged
into one span when their mouse-face regions share padding whitespace.

All property symbols (Qwidget, Qbutton, Qfollow_link, Qorg_link,
Qcompletion__string, Qcompletion, Qcompletions_highlight, Qbacktab,
Qcompletion_list_mode) are registered with DEFSYM in syms_of_nsterm
and referenced directly -- no repeated intern() calls.

Each span is allocated, configured, added to the spans array, then
released (the array retains it).
The function returns an autoreleased immutable copy of the spans
array.

Label priority: completion--string > buffer substring > help-echo.

Tab navigation: -accessibilityChildrenInNavigationOrder returns the
cached span array, rebuilt lazily when interactiveSpansDirty is set.
Calls from off-thread are marshalled with dispatch_sync.

Focus movement: -setAccessibilityFocused: on a span dispatches
Fselect_window + SET_PT_BOTH to the main queue via dispatch_async,
wrapped in block_input/unblock_input.


ZOOM INTEGRATION
----------------

macOS Zoom (accessibility zoom) tracks a "focus element" to keep the
zoomed viewport centered on the relevant screen area.  Two mechanisms
are provided:

1. ns_draw_phys_cursor (C function, main thread, called during
   redisplay).  After clipping the cursor rect to the text area,
   stores the rect in view->lastAccessibilityCursorRect.  If
   UAZoomEnabled(), converts the rect to screen coordinates and calls
   UAZoomChangeFocus(kUAZoomFocusTypeInsertionPoint).

   Coordinate conversion chain:

     EmacsView pixels (AppKit, flipped, origin at top-left of view)
       -[convertRect:toView:nil]->  NSWindow coordinates
       -[convertRectToScreen:]->    NSScreen coordinates
       NSRectToCGRect ->            CGRect (same values, no transform)
       CG y-flip: cgRect.origin.y = primaryH - y - height

   The flip is required because CoreGraphics uses top-left origin
   (primary screen) while AppKit screen rects use bottom-left.
   primaryH = [[NSScreen screens] firstObject].frame.size.height.

2. EmacsView -accessibilityBoundsForRange: /
   -accessibilityFrameForRange:

   AT tools (including Zoom) call these with the selectedTextRange to
   locate the insertion point.  The implementation first delegates to
   the focused EmacsAccessibilityBuffer element for accurate per-range
   geometry via its accessibilityFrameForRange: method.  If the buffer
   element returns an empty rect (no valid window or glyph data), the
   fallback uses the cached cursor rect stored in
   lastAccessibilityCursorRect (minimum size 1x8 pixels).
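
The last step of the coordinate chain (the CG y-flip) and the
minimum-size rule for the fallback cursor rect can be sketched in
plain C as follows.  This is only an illustration: ax_rect is a
hypothetical stand-in for CGRect / NSRect, the function names are
not from the patch, and the AppKit conversion calls that precede the
flip are not modelled.

```c
#include <assert.h>

/* Hypothetical stand-in for CGRect / NSRect.  */
typedef struct
{
  double x, y, width, height;
} ax_rect;

/* Convert an AppKit screen rect (bottom-left origin) to CoreGraphics
   screen coordinates (top-left origin of the primary screen).
   PRIMARY_H is the height of the primary screen.  */
static ax_rect
ax_flip_to_cg (ax_rect screen_rect, double primary_h)
{
  assert (primary_h > 0);
  ax_rect cg = screen_rect;
  cg.y = primary_h - screen_rect.y - screen_rect.height;
  return cg;
}

/* Enforce the 1x8 pixel minimum on a cached cursor rect.  */
static ax_rect
ax_min_cursor_rect (ax_rect r)
{
  if (r.width < 1)
    r.width = 1;
  if (r.height < 8)
    r.height = 8;
  return r;
}
```

For example, on a primary screen 1080 pixels high, a rect at y = 200
with height 20 in AppKit screen coordinates flips to y = 860.
Applying the same formula twice returns the original rect, which is a
quick sanity check for the conversion.
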

   The legacy parameterized-attribute API
   (NSAccessibilityBoundsForRangeParameterizedAttribute) is supported
   via -accessibilityAttributeValue:forParameter: for older AT
   clients.


KEY DESIGN DECISIONS
--------------------

1. DEFSYM instead of intern for property symbols.
   DEFSYM registers symbols at startup (syms_of_nsterm) and stores
   them in C globals (e.g. Qcompletion__string).  Using intern() at
   every AX scan would perform an obarray lookup on each redisplay
   cycle.  DEFSYM symbols are also always reachable by the GC via
   staticpro, eliminating any risk of premature collection.

2. AnnouncementRequested for character moves, not
   SelectedTextChanged.
   VoiceOver derives the speech character from SelectedTextChanged by
   looking at the character BEFORE the new cursor position (the char
   "passed over").  In evil-mode with a block cursor, the cursor sits
   ON the character, not between characters.  AnnouncementRequested
   with the character AT point produces correct speech in both insert
   and normal (block-cursor) modes.  SelectedTextChanged is still
   posted without granularity to interrupt ongoing VoiceOver reading
   and update braille display tracking.

3. completion--string, not mouse-face, as span boundary.
   mouse-face regions in completion-list-mode sometimes include
   leading or trailing whitespace shared between column-adjacent
   candidates, which could merge two candidates into one span.
   completion--string changes precisely at candidate boundaries.

4. Probe order {point, point+1, point-1} for overlay search.
   After Tab advances to a new completion candidate, point is at the
   START of the new entry.  The previous entry's overlay covers the
   position before the new start, so point-1 is inside the OLD
   overlay.  Trying point+1 before point-1 finds the new (correct)
   entry first.

5. Notifications posted BEFORE rebuilding the tree.
   postAccessibilityUpdates uses existing elements which carry cached
   state from the previous cycle.
   Rebuilding first would create fresh elements with current values,
   making change detection impossible.  Tree rebuild is deferred to
   cycles where accessibilityTreeValid is false; no notifications are
   posted in that cycle.

6. Re-entrance guard (accessibilityUpdating flag).
   VoiceOver callbacks triggered by notification posting can cause
   Cocoa to re-enter the run loop, which may trigger redisplay, which
   calls ns_update_end -> postAccessibilityUpdates.  The BOOL flag
   breaks this recursion.

7. lispWindow (Lisp_Object) instead of raw struct window *.
   struct window pointers can become dangling after delete-window.
   Storing the Lisp_Object and using WINDOW_LIVE_P + XWINDOW at the
   call site is the standard safe pattern in Emacs C code.

8. accessibilityVisibleCharacterRange returns full buffer range.
   VoiceOver treats the visible range boundary as end-of-text.  If
   this returned only the on-screen portion, VoiceOver would announce
   "end of text" prematurely when the cursor reaches the visible
   bottom, even though more buffer content exists below.


KNOWN LIMITATIONS
-----------------

- BUF_OVERLAY_MODIFF is not tracked.  Overlay changes (e.g. moving
  the completions-highlight overlay via Tab without changing buffer
  text) do not bump BUF_MODIFF, so the text cache is not invalidated.
  The notification logic detects point changes (cachedPoint) which
  covers the common case, but overlay-only changes with a stationary
  point would be missed.  A future fix would compare overlay_modiff.

- Interactive span scan is O(n) in the visible buffer range.  Every
  character position is visited to find property boundaries.  For
  large visible buffers this scan runs on every redisplay cycle
  whenever interactiveSpansDirty is set.  An optimization would use
  next_single_property_change to skip non-interactive regions in
  bulk.

- Mode line text is extracted from CHAR_GLYPH rows only.  Image
  glyphs, stretch glyphs, and composed glyphs are silently skipped.
  Mode lines with icon fonts (e.g. doom-modeline with nerd-font)
  produce incomplete or garbled accessibility text.

- Buffers larger than NS_AX_TEXT_CAP (100,000 UTF-16 units) are
  truncated.  The truncation is silent; AT tools navigating past the
  truncation boundary may behave unexpectedly.

- No multi-frame coordination.  EmacsView.accessibilityElements is
  per-view; there is no cross-frame notification ordering.

- GNUstep is explicitly excluded (#ifdef NS_IMPL_COCOA).  GNUstep has
  a different accessibility model and requires separate work.

- Line navigation detection (ns_ax_event_is_line_nav_key) checks raw
  key codes (C-n = 14, C-p = 16, Tab = 9, backtab symbol).  Users who
  remap keys to navigation commands (e.g. C-j -> next-line) will not
  get forced line-granularity announcements for those bindings.  A
  future improvement would inspect Vthis_command against known
  navigation command symbols instead.

- UAZoomChangeFocus always uses kUAZoomFocusTypeInsertionPoint
  regardless of cursor style (box, bar, hbar).  This is cosmetically
  imprecise but functionally correct.


TESTING CHECKLIST
-----------------

Prerequisites:
  - macOS with VoiceOver (Cmd-F5 to toggle).
  - Emacs built from source with this patch applied.
  - Evil-mode recommended for block-cursor tests.

Basic text reading:
  1.  Open Emacs.  Press Cmd-F5 to start VoiceOver.
  2.  Switch to Emacs (Cmd-Tab).  VoiceOver should announce
      "Emacs, editor" and read the current line.
  3.  Move cursor with arrow keys.  VoiceOver should read each
      character (left/right) or line (up/down) as you move.
  4.  Verify: right/left arrow reads the character AT the cursor
      position, not the character left behind.  (evil block-cursor)

Word and line navigation:
  5.  Press M-f / M-b (forward/backward word).  VoiceOver should
      announce the word landed on.
  6.  Press C-n / C-p.  VoiceOver should read the full new line.
  7.  Hold Shift and press arrow keys to extend selection.
      VoiceOver should announce the selected text.

Completion navigation:
  8.  Type M-x to open the minibuffer.
  9.  Type a partial command name.  Press Tab to open *Completions*.
  10. Press Tab / S-Tab to cycle through completions.  VoiceOver
      should announce each candidate name as you move.
  11. Verify no double-speech (each candidate read exactly once).

Interactive span Tab navigation:
  12. Open a buffer with buttons (e.g. M-x describe-key).
  13. Use VoiceOver Item Chooser (VO-I) or Tab with VoiceOver
      interaction mode to navigate interactive elements.
  14. Verify each button/link is reachable and its label is read.
  15. In an org-mode file with links, verify links appear as
      separate navigable AXLink elements.

Mode line:
  16. Use the VoiceOver cursor to navigate to the mode line below a
      buffer.  VoiceOver should read the mode line text.

Zoom integration:
  17. Enable macOS Zoom (System Settings -> Accessibility -> Zoom).
  18. Set Zoom to "Follow keyboard focus".
  19. Move cursor in Emacs.  Zoom viewport should track the cursor.
  20. Verify Zoom follows the cursor across split windows.

Window operations:
  21. Split window with C-x 2.  VoiceOver should announce a layout
      change.  Switch with C-x o; VoiceOver should read the new
      window content.
  22. Delete a window with C-x 0.  No crash should occur.
  23. Switch buffers with C-x b.  VoiceOver should read new buffer.

Stress test:
  24. Open a large file (>5000 lines).  Navigate with C-v / M-v.
      Verify no significant lag in VoiceOver speech response.
  25. Open an org-mode file with many folded sections.  Verify that
      folded (invisible) text is not announced during navigation.

-- end of README --