Voice Assistant for Wizardry, Dubious Efficiency, and Arguably Reduced Screen Time

This is a simple voice assistant. There are many like it, but this one is mine. I could use Mycroft or any number of other existing open-source voice assistants, but I've decided to roll my own.

I am developing this project for a few reasons:

  • I don't personally like the privacy/convenience ratio of Alexa and other home assistants
  • I want skills that other voice assistants don't have, and I don't feel like developing skills for those devices. Perhaps someday I will do that if the skills seem useful enough for enough people, but for now I just want to build things exactly for my own needs
  • To keep in Python practice. Right now I work exclusively in JavaScript, with a heavy emphasis on frontend, and I want to keep my hand in other technologies and paradigms
  • To keep in writing practice. Publish or perish, as they say, and it's been a long time since I've written much of anything
  • I enjoy modular projects like this, where it is really easy to add incremental improvements over time
  • To get some Raspberry Pi experience. It's been years since I've touched any hardware, and I miss it. Not that this project needs much hardware tinkering for now, but who knows what future improvements may bring.
  • Reduce screen time? In theory if I get all the planned skills up and running I'll be looking at my phone and navigating through different tabs less. The time saving there is probably nominal, though; this is a programming exercise with a side effect of some small practical benefits.
  • Eventual language model integration experiments

I loosely used this GeeksforGeeks tutorial as a really basic POC to start off with. If you've ever built a project with Python... or any language... then you'll be familiar with the headaches and how fast things move. Even following a simple tutorial runs into some roadblocks, usually around different libraries. For example, at the time of this writing you have to brew install portaudio before you can pip install pyaudio, and pyttsx3 is now py3-tts. There's also a bunch of extraneous ALSA errors to silence. But the roadblocks were easily overcome, and in one evening I got a simple script up and running that recognizes commands and reacts to them.

I removed almost all of the commands, as I don't really need to be asking it what day it is and the like, and just kept the base logic. I also refactored the TakeCommand function to use a match/case statement instead of a bunch of elifs. I just prefer reading it that way.

At first, I kept development on my MacBook. I was able to get the speech recognition up, with a very simple command to turn my Wyze lightbulb on and off, and then moved to a Raspberry Pi. I'm keeping the Pi headless, so I plan to mostly develop on my MacBook and then pull the code down to the Pi. We'll see how much of a headache that is.

The goal of this project is to build something that I can use to control the lights in my house, proctor the daily survey that I use to track my habits, and let me add to my various Notion pages and Google spreadsheets and workout trackers and nutrition trackers and such by talking, without involving Zapier or too many third-party services.

As of this blog post, I can turn my bedside lamp's smart bulb on or off. Not bad for one day, though - Python really is great for rapid iteration.

© 2024 A Minor Studio