[GH-ISSUE #7644] [FR] Local inference for mobile app using llama.cpp #3386

Open
opened 2026-03-23 21:29:44 +00:00 by mirror · 2 comments

Originally created by @rampa3 on GitHub (Mar 29, 2025).
Original GitHub issue: https://github.com/AppFlowy-IO/AppFlowy/issues/7644

Description

I would like to suggest implementing an option for local LLM inference on mobile devices using the llama.cpp library, with a quantized GGUF model that is either user-provided or downloaded by the app. I believe such a feature is feasible: most mid-tier phones today can run a Q4_K_M quantization (the balanced quality/speed option) of a 7B model at speeds that are slower than on a PC, but still bearable.
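As a rough sketch of what the loading path could look like on the app side, assuming llama.cpp is compiled for the target platform and its C API is exposed to Swift (the `LocalLLM` wrapper is hypothetical, and the function names follow llama.cpp's current C API, which may drift between versions):

```swift
import Foundation
// Assumes llama.cpp is built for the target (e.g. as an XCFramework or via
// its Swift package) and its C API is visible as the `llama` module.
import llama

/// Hypothetical wrapper: loads a user-provided GGUF file and keeps the
/// context small enough for a phone.
final class LocalLLM {
    private var model: OpaquePointer?
    private var context: OpaquePointer?

    init?(ggufPath: String) {
        llama_backend_init()

        var modelParams = llama_model_default_params()
        // Offload layers to the GPU (Metal on iOS) where available.
        modelParams.n_gpu_layers = 99

        guard let model = llama_model_load_from_file(ggufPath, modelParams) else {
            return nil
        }
        self.model = model

        var ctxParams = llama_context_default_params()
        ctxParams.n_ctx = 2048      // modest context window to limit RAM use
        ctxParams.n_threads = 4     // tune to the device's performance cores

        self.context = llama_init_from_model(model, ctxParams)
    }

    deinit {
        if let context { llama_free(context) }
        if let model { llama_model_free(model) }
        llama_backend_free()
    }
}
```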

Impact

Implementing this would benefit users who cannot always access the internet on their mobile devices, as well as those who want the privacy of a local LLM on the go.

Additional Context

Inspired by the addition of a local inference option to the desktop version of AppFlowy.


@m13v commented on GitHub (Mar 18, 2026):

running llama.cpp on mobile is totally viable now, especially with Q4_K_M quantization on newer phones. we went through a similar evaluation for our macOS agent and the main consideration was memory pressure - on devices with 6GB RAM you need to be careful about model loading/unloading or the OS will kill your app. also worth looking at CoreML conversion for Apple devices since the Neural Engine is significantly faster than CPU inference for supported architectures.
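To make the memory-pressure point concrete, here is an illustrative sketch of one way to handle it on iOS, using Dispatch's memory-pressure source (a real API) to drop the weights before the OS jetsams the app; `LocalLLM` is the hypothetical wrapper sketched above:

```swift
import Foundation

/// Illustrative only: unload the model under memory pressure and lazily
/// reload it on the next request. Freeing multi-GB weights beats being
/// killed by the OS.
final class ModelLifecycleManager {
    private var llm: LocalLLM?
    private let ggufPath: String
    private let pressureSource: DispatchSourceMemoryPressure

    init(ggufPath: String) {
        self.ggufPath = ggufPath
        self.pressureSource = DispatchSource.makeMemoryPressureSource(
            eventMask: [.warning, .critical], queue: .main)
        pressureSource.setEventHandler { [weak self] in
            // Drop the weights immediately on warning/critical pressure.
            self?.llm = nil
        }
        pressureSource.resume()
    }

    /// Reload on demand; callers must tolerate a slow first call after
    /// an unload.
    func acquire() -> LocalLLM? {
        if llm == nil { llm = LocalLLM(ggufPath: ggufPath) }
        return llm
    }
}
```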


@m13v commented on GitHub (Mar 18, 2026):

for reference, here's how we handle the provider layer that can switch between local and cloud inference: https://github.com/m13v/fazm/blob/main/Desktop/Sources/Providers/ChatProvider.swift

and the transcription service that uses on-device WhisperKit for speech-to-text: https://github.com/m13v/fazm/blob/main/Desktop/Sources/TranscriptionService.swift
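Since the linked files may move, here is a hypothetical sketch of the general pattern the provider layer describes: one protocol with a local and a cloud implementation, selected by connectivity or a user preference. All names below are made up for illustration, and `ModelLifecycleManager` comes from the sketch above:

```swift
import Foundation

/// One protocol, two backends, chosen at call-site construction time.
protocol ChatProvider {
    func complete(prompt: String) async throws -> String
}

struct CloudProvider: ChatProvider {
    let endpoint: URL  // hypothetical cloud completion endpoint

    func complete(prompt: String) async throws -> String {
        var request = URLRequest(url: endpoint)
        request.httpMethod = "POST"
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        request.httpBody = try JSONEncoder().encode(["prompt": prompt])
        let (data, _) = try await URLSession.shared.data(for: request)
        return String(decoding: data, as: UTF8.self)
    }
}

struct LocalProvider: ChatProvider {
    let manager: ModelLifecycleManager  // from the memory-pressure sketch

    func complete(prompt: String) async throws -> String {
        guard manager.acquire() != nil else {
            throw CocoaError(.fileReadUnknown)  // model failed to load
        }
        // Placeholder: a real implementation runs llama.cpp's
        // tokenize -> decode -> sample loop against the loaded context.
        return "(local completion of: \(prompt))"
    }
}

/// Prefer local inference when offline or when the user opted into
/// on-device privacy.
func makeProvider(offline: Bool, preferLocal: Bool,
                  local: LocalProvider, cloud: CloudProvider) -> any ChatProvider {
    if offline || preferLocal { return local }
    return cloud
}
```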
