You should sell this to Lawyers and other professionals who bill per hour to reconstruct their billables for the day without missing anything. They would pay big money for something that recovered forgotten(unbilled) work throughout the day.
I would imagine this could be one of the inputs along with a STT system as context to an LLM. Because in general we can speak faster than we can write/type and for me, specifically, after a point in the day typing creates a higher cognitive load than speaking.
1. "Create a reminder for reading this email at 5:00 pm" and this could infer what to do from the screen shot's description(plus a local MCP tool for calendar)
2. "Can you fetch that file form that project in that workspace and implement the pattern in the code on my vscode terminal?" It can lower cognitive fatigue of typing and clicking a bunch of place.
3. Take notes as I describe something on the screen. It could be for prompt composition e.g. get the link from my browser and the file on vscode and write code that does XYZ.
On one hand I'm super enthusiastic about your project.
This could help battle procrastination, organize your time in a long run, bill your clients more efficiently, etc. 20 years younger, hyper productive me would kill for such product.
But then I recall when I accidently suggested TimeRescue to my boss at one time, and suddenly he was skimming though everyones daily logs to see if they're spending 100% of their times in business facing apps.
When I first heard about "covid mouse mover devices" that faked activity for remote workers I thought it was a joke. Seriously.
But I'm afraid this is the dystopian future. Employers constantly looking at your screen and getting spreadsheets with your daily effort.
Yea, honestly I would hate if people used this to track _other_ people, especially bosses. I wanted to build something that gave people more agency to do more with their precious time, but there definitely is a fine line here.
Really nice! I currently use ActivityWatch for tracking tasks on PC.
Some things I would like to be able to do with software like this:
- Identify the 'spark' of a distraction. For example, opening my email inbox to read a specific email also shows me many unrelated emails. These can easily be the cause of a 5-15 minute distraction. This information is often actionable. I installed browser plugins to hide my youtube suggested videos and my distractions went down. I made sure to close all unused windows to avoid catching a glimpse of unrelated work.
- Identify repeated tasks, and the cadence of those tasks. Do I manually make an invoice once a week for a particular edge case? Is the process basically identical every time. Could this be automated?
- How was I feeling before, during and after a task. (This is a very broad and intentionally not well-defined question, but I think it has the most promise for improving procrastination and task initiation).
Yep, helping people understand their distraction patterns would be an amazing feature. I find myself doing the same thing, funnily enough I also have that same Youtube extension.
Couldn't we get a low-res version of this info by tracking the active window using a cli tool? For linux, there are several options. Not sure about Mac.
Another approach is to run OCR on 1FPS screenshots. Everything runs locally without draining the battery like an LLM would.
Thanks! Between my friends and I, it's about a 50/50 split between local and cloud. I think it's great to be able to pick the tradeoff between quality/privacy based on your own privacy preferences.
Cute serendipity, rule of three. Neat project too; conceptually it sounds like an amazing ability to be able to better watch ourselves. Doing it via screenshots & AI feels like a fun sense-making adventure that actually makes a lot of sense, that can maybe try to pick through & discern what the screen is doing in a lot of different scenarios.
Recall (and Rewind) are similar in the sense that they both use screen data, but it's designed for retrieving specific things you saw, not semantically summarizing your time. My opinion is that they're completely different feature sets.
Which are wildly different when comparing a third-party hosted product (i.e., Microsoft), and a self-hosted OSS application that can use a self-hosted Ollama model.
Yep! Have tested it out on Qwen 2.5VL 3B and it works reasonably well on my 16GB Macbook Air. The only thing I will say is that I don't think it's a great idea to run local models on laptop battery, since it's quite compute intensive and drains kinda quickly. Have tested with Ollama and LMStudio, but you should be able to use any OpenAI compatible local server.
Would it be possible to check for the power adapter and run processing then? These are the types of things I've been thinking about for my own app: https://stardateapp.com
That would be really cool, but for the foreseeable future there's still a lot of room to improve how screen data is used so I'll mostly be focused on that.
You should sell this to Lawyers and other professionals who bill per hour to reconstruct their billables for the day without missing anything. They would pay big money for something that recovered forgotten(unbilled) work throughout the day.
I would imagine this could be one of the inputs along with a STT system as context to an LLM. Because in general we can speak faster than we can write/type and for me, specifically, after a point in the day typing creates a higher cognitive load than speaking.
1. "Create a reminder for reading this email at 5:00 pm" and this could infer what to do from the screen shot's description(plus a local MCP tool for calendar)
2. "Can you fetch that file form that project in that workspace and implement the pattern in the code on my vscode terminal?" It can lower cognitive fatigue of typing and clicking a bunch of place.
3. Take notes as I describe something on the screen. It could be for prompt composition e.g. get the link from my browser and the file on vscode and write code that does XYZ.
On one hand I'm super enthusiastic about your project.
This could help battle procrastination, organize your time in a long run, bill your clients more efficiently, etc. 20 years younger, hyper productive me would kill for such product.
But then I recall when I accidently suggested TimeRescue to my boss at one time, and suddenly he was skimming though everyones daily logs to see if they're spending 100% of their times in business facing apps.
When I first heard about "covid mouse mover devices" that faked activity for remote workers I thought it was a joke. Seriously.
But I'm afraid this is the dystopian future. Employers constantly looking at your screen and getting spreadsheets with your daily effort.
Overall, very disturbing product.
Yea, honestly I would hate if people used this to track _other_ people, especially bosses. I wanted to build something that gave people more agency to do more with their precious time, but there definitely is a fine line here.
This is super rad. Love it being Open Source, and with the option to choose local models. You’re awesome, thanks!
Really nice! I currently use ActivityWatch for tracking tasks on PC.
Some things I would like to be able to do with software like this:
- Identify the 'spark' of a distraction. For example, opening my email inbox to read a specific email also shows me many unrelated emails. These can easily be the cause of a 5-15 minute distraction. This information is often actionable. I installed browser plugins to hide my youtube suggested videos and my distractions went down. I made sure to close all unused windows to avoid catching a glimpse of unrelated work.
- Identify repeated tasks, and the cadence of those tasks. Do I manually make an invoice once a week for a particular edge case? Is the process basically identical every time. Could this be automated?
- How was I feeling before, during and after a task. (This is a very broad and intentionally not well-defined question, but I think it has the most promise for improving procrastination and task initiation).
Yep, helping people understand their distraction patterns would be an amazing feature. I find myself doing the same thing, funnily enough I also have that same Youtube extension.
Couldn't we get a low-res version of this info by tracking the active window using a cli tool? For linux, there are several options. Not sure about Mac.
Another approach is to run OCR on 1FPS screenshots. Everything runs locally without draining the battery like an LLM would.
You definitely could! I think it would just be harder to get good semantic understanding of what you did during a segment of time without LLMs.
For those expecting something more along the lines of a `git log` sort of thing, like a command line tool, there's `doing`.
https://brettterpstra.com/projects/doing/
I'd only ever consider doing it with a local model, but this looks really cool!
Thanks! Between my friends and I, it's about a 50/50 split between local and cloud. I think it's great to be able to pick the tradeoff between quality/privacy based on your own privacy preferences.
It's somewhat related two other recent submissions,
Replace PostgreSQL with Git for your next project for git data storing. https://news.ycombinator.com/item?id=4535144 https://devcenter.upsun.com/posts/why-you-should-replace-pos...
Consumer.today day-logging single user microsite. https://consumed.today/ https://news.ycombinator.com/item?id=45351446
Cute serendipity, rule of three. Neat project too; conceptually it sounds like an amazing ability to be able to better watch ourselves. Doing it via screenshots & AI feels like a fun sense-making adventure that actually makes a lot of sense, that can maybe try to pick through & discern what the screen is doing in a lot of different scenarios.
wait... isnt this pretty much what Microsoft was doing with Recall?
Recall (and Rewind) are similar in the sense that they both use screen data, but it's designed for retrieving specific things you saw, not semantically summarizing your time. My opinion is that they're completely different feature sets.
The backlash for Recall was not based on the feature set, it was because of the massive privacy and security concerns.
Which are wildly different when comparing a third-party hosted product (i.e., Microsoft), and a self-hosted OSS application that can use a self-hosted Ollama model.
The feature isn't the problem.
Nice work, does this work with local (100% offline) models assuming you have decent hardware and are serving them up with llama.cpp or similar?
Yep! Have tested it out on Qwen 2.5VL 3B and it works reasonably well on my 16GB Macbook Air. The only thing I will say is that I don't think it's a great idea to run local models on laptop battery, since it's quite compute intensive and drains kinda quickly. Have tested with Ollama and LMStudio, but you should be able to use any OpenAI compatible local server.
Would it be possible to check for the power adapter and run processing then? These are the types of things I've been thinking about for my own app: https://stardateapp.com
Wow, yeah that's clever I hadn't thought of that. Will add as an advanced setting.
Nice that's great. I have a 96GB M2 Max that's plugged in 99.9% of the time so that's not an issue. Cheers for the response!
Congrats. This is very well executed.
Is it possible to include wearables as a data sources?
i.e. apple watch for sleep, running, activity levels? it could really give a 360 view of your life
That would be really cool, but for the foreseeable future there's still a lot of room to improve how screen data is used so I'll mostly be focused on that.