[GH-ISSUE #7644] [FR] Local inference for mobile app using llama.cpp #3386
Originally created by @rampa3 on GitHub (Mar 29, 2025).
Original GitHub issue: https://github.com/AppFlowy-IO/AppFlowy/issues/7644
Description
I would like to suggest adding an option for local LLM inference on mobile devices using the llama.cpp library, with a quantized GGUF model that is either provided by the user or downloaded by the app. I believe such a feature is feasible: most mid-tier phones today can run a Q4_K_M quantization (the medium quality/speed trade-off) of the 7B variants of many models at a slower-than-PC but bearable speed.
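To make the request concrete, here is a minimal sketch of how the app could resolve a GGUF file from either source before handing its path to llama.cpp. `ModelSource` and `resolveModelFile` are hypothetical names, not AppFlowy APIs; only the Foundation download and caching calls are real:

```swift
import Foundation

// Hypothetical sketch; ModelSource and resolveModelFile are
// illustrative names, not AppFlowy APIs.
enum ModelSource {
    case userProvided(URL)           // GGUF file picked by the user
    case appDownloaded(remote: URL)  // quantized GGUF fetched by the app
}

func resolveModelFile(_ source: ModelSource) async throws -> URL {
    switch source {
    case .userProvided(let url):
        return url
    case .appDownloaded(let remote):
        let caches = FileManager.default.urls(for: .cachesDirectory,
                                              in: .userDomainMask)[0]
        let local = caches.appendingPathComponent(remote.lastPathComponent)
        // Reuse a previously downloaded model instead of re-fetching it.
        if FileManager.default.fileExists(atPath: local.path) {
            return local
        }
        let (tmp, _) = try await URLSession.shared.download(from: remote)
        try FileManager.default.moveItem(at: tmp, to: local)
        return local
    }
}
```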
Impact
Implementing this would benefit users who do not always have internet access on their mobile devices, as well as those who want the privacy of a local LLM on the go.
Additional Context
Inspired by the addition of a local inference option to the desktop version of AppFlowy.
@m13v commented on GitHub (Mar 18, 2026):
Running llama.cpp on mobile is totally viable now, especially with Q4_K_M quantization on newer phones. We went through a similar evaluation for our macOS agent, and the main consideration was memory pressure: on devices with 6 GB of RAM you need to be careful about model loading/unloading or the OS will kill your app. It's also worth looking at CoreML conversion for Apple devices, since the Neural Engine is significantly faster than CPU inference for supported architectures.
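A minimal sketch of the unload-under-pressure approach described above, assuming a hypothetical `LlamaRunner` wrapper around llama.cpp; `DispatchSource.makeMemoryPressureSource` is a real Apple API, everything else here is illustrative:

```swift
import Dispatch

// Hypothetical stand-in for a llama.cpp wrapper that can free its weights.
protocol LlamaRunner: AnyObject {
    func unloadModel()
}

final class InferenceLifecycle {
    private let runner: LlamaRunner
    private let pressure = DispatchSource.makeMemoryPressureSource(
        eventMask: [.warning, .critical], queue: .main)

    init(runner: LlamaRunner) {
        self.runner = runner
        pressure.setEventHandler { [weak self] in
            guard let self = self else { return }
            // On critical pressure, drop the weights before the OS
            // kills the app; reload lazily on the next request.
            if self.pressure.data.contains(.critical) {
                self.runner.unloadModel()
            }
        }
        pressure.resume()
    }

    deinit { pressure.cancel() }
}
```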
@m13v commented on GitHub (Mar 18, 2026):
For reference, here's how we handle the provider layer that can switch between local and cloud inference: https://github.com/m13v/fazm/blob/main/Desktop/Sources/Providers/ChatProvider.swift
And here is the transcription service that uses on-device WhisperKit for speech-to-text: https://github.com/m13v/fazm/blob/main/Desktop/Sources/TranscriptionService.swift
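For readers who don't want to dig through the linked files, the general shape of a local-with-cloud-fallback provider looks roughly like this; the names are hypothetical, not the actual API from ChatProvider.swift:

```swift
// Illustrative sketch of local-first routing with a cloud fallback.
protocol CompletionProvider {
    func complete(prompt: String) async throws -> String
}

struct FallbackProvider: CompletionProvider {
    let local: CompletionProvider   // e.g. llama.cpp on device
    let cloud: CompletionProvider   // e.g. a hosted endpoint
    var preferLocal = true

    func complete(prompt: String) async throws -> String {
        if preferLocal {
            // Fall back to the cloud if the local model is unloaded,
            // missing, or the device is too memory-constrained.
            if let answer = try? await local.complete(prompt: prompt) {
                return answer
            }
        }
        return try await cloud.complete(prompt: prompt)
    }
}
```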