There has never been a better time to start your own data science homelab for analyzing data useful to you, storing essential information, or developing your personal tech skills.
There’s an expression I’ve seen on Reddit a few times now in various tech-focused subreddits, along the lines of “Paying for cloud services is just renting someone else’s computer.” While I do think cloud computing and storage can be extremely useful, this article will focus on some of the reasons why I’ve moved my analyses, data stores, and tools away from online providers and into my home office. A link to the tools and hardware I used to do this is available as well.
The best way to start explaining the method to my madness is by sharing a business problem I ran into. While I’m a fairly traditional investor with a low risk tolerance, there is a small hope within me that maybe, just maybe, I can be one of the <1% who beat the S&P 500. Note that I used the word “hope”, and as such, don’t put too much on the line over this hope. A few times a year I’ll give my Robinhood account $100 and treat it with as much regard as I treat a lottery ticket, hoping to break it big. I’ll put the adults in the room at ease, though, by sharing that this account is separate from my larger accounts, which are mostly based on index funds with regular, modest returns, plus a few value stocks I sell covered calls on a rolling basis. My Robinhood account, however, is borderline degenerate gambling, and anything goes. I do have a few rules for myself, though:
- I never take out any margin.
- I never sell uncovered, only buy to open.
- I don’t throw money at chasing losing trades.
You may wonder where I’m going with this, so I’ll pull back from my tangent by sharing that my “lottery tickets” have, alas, not earned me a Jeff-Bezos-worthy yacht yet, but they have taught me a good bit about risk and loss. These lessons have also inspired the data enthusiast in me to try to improve the way I quantify risk and anticipate market trends and events. Even models that are only directionally correct in the short term can provide tremendous value to investors, retail and hedge alike.
The first step I saw toward improving my decision-making was having data available to make data-driven decisions. Removing emotion from investing is a well-known success tip. While historical data for stocks and ETFs is widely available and open-sourced through resources such as yfinance (a minimal example is below), historical derivatives datasets are far more expensive and harder to come by. Some initial glances at the available APIs hinted that regular, routine access to the data I’d need to backtest strategies for my portfolio could cost me hundreds of dollars annually, and possibly even monthly, depending on the granularity I was looking for.
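A minimal sketch of that kind of pull is below; the ticker and columns are purely illustrative. It also shows the catch that motivated everything that follows: yfinance exposes option chains only as a current snapshot, not as history.

```python
import yfinance as yf

# Daily price history is the easy, free part:
spy = yf.Ticker("SPY")
history = spy.history(period="2y", interval="1d")
print(history.tail())

# Option chains are also available, but only as a *current* snapshot.
# There is no free historical options endpoint, which is exactly the
# gap this whole project exists to fill.
expirations = spy.options                  # tuple of expiry date strings
chain = spy.option_chain(expirations[0])   # .calls / .puts DataFrames
print(chain.calls[["strike", "lastPrice", "impliedVolatility"]].head())
```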
I decided I’d rather invest in myself through this process, and spend $100’s of dollars on my own terms instead. *audience groans*
My first thoughts on data scraping and warehousing led me to the same tools I use every day at work. I created a personal AWS account and wrote Python scripts deployed on Lambda to scrape free, live options datasets at predetermined intervals and write the data on my behalf. This was a fully automated system, and near-infinitely scalable, because a different scraper could be dynamically spun up for every ticker in my portfolio (a sketch is below). Writing the data was more challenging, and I was torn between two routes. I could either write the data to S3, crawl it with Glue, and analyze it with serverless querying in Athena, or I could use the Relational Database Service and write my data directly from Lambda to RDS.
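Here’s a rough sketch of what one of those Lambda handlers could look like, going the S3 route; the event shape, bucket environment variable, and key layout are my illustrative assumptions rather than exact production code:

```python
import datetime as dt
import os

import boto3
import pandas as pd
import yfinance as yf

s3 = boto3.client("s3")

def handler(event, context):
    """Snapshot the full option chain for one ticker and land it in S3.

    The ticker arrives in the event payload so a scheduler rule can spin
    up one invocation per symbol -- the near-infinitely scalable part.
    """
    symbol = event["ticker"]                    # e.g. {"ticker": "SPY"}
    tkr = yf.Ticker(symbol)

    frames = []
    for expiry in tkr.options:                  # every listed expiration
        chain = tkr.option_chain(expiry)
        for side, frame in (("call", chain.calls), ("put", chain.puts)):
            frames.append(frame.assign(side=side, expiry=expiry))
    snapshot = pd.concat(frames, ignore_index=True)

    # Partition-style key layout keeps a later Glue/Athena setup happy.
    stamp = dt.datetime.utcnow().strftime("%Y-%m-%dT%H-%M")
    key = f"options/ticker={symbol}/dt={stamp}.json"
    s3.put_object(
        Bucket=os.environ["BUCKET"],            # assumed env var
        Key=key,
        Body=snapshot.to_json(orient="records"),
    )
    return {"rows": len(snapshot), "key": key}
```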
A quick breakdown of the AWS tools mentioned:
- Lambda is serverless compute that lets users execute scripts without much overhead, and it has a very generous free tier.
- S3, aka Simple Storage Service, is an object storage system with a large free tier and very cost-effective storage at $0.02 per GB per month.
- Glue is an AWS data prep, integration, and ETL tool with crawlers available for reading and interpreting tabular data.
- Athena is a serverless query architecture.
I ended up leaning toward RDS simply to have the data easily queryable and monitorable, if for no other reason. It also had a free tier available: 750 free hours as well as 20 GB of storage, giving me a nice sandbox to get my hands dirty in.
Little did I realize, however, how large stock options data is. I began writing about 100 MB of data per ticker per month at 15-minute intervals, which may not sound like much, but considering I have a portfolio of 20 tickers, before the end of the year I would have used up the entirety of the free tier (the quick math is sketched below). On top of that, the small compute capacity within the free tier was quickly eaten up, and my server burned through all 750 hours before I knew it (considering I wanted to track options trades for roughly 8 hours a day, 5 days a week). I also frequently read and analyzed data after work at my day job, which drove usage up as well. After about two months I exhausted the free tier allotment and received my first AWS bill: about $60 a month. Keep in mind, once the free tier ends, you’re paying for every server hour of processing, a per-GB rate for data leaving the AWS ecosystem for my local dev machine, and a storage cost in GB/month. I anticipated that within a month or two my cost of ownership could increase by at least 50%, if not more, and keep climbing from there.
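A back-of-the-envelope sketch of that storage math, using my rounded numbers from above:

```python
# Rough storage math for the RDS free tier, numbers rounded:
MB_PER_TICKER_PER_MONTH = 100   # ~15-minute option-chain snapshots
TICKERS = 20
FREE_TIER_GB = 20               # RDS free-tier storage allotment

gb_per_month = MB_PER_TICKER_PER_MONTH * TICKERS / 1000   # ~2 GB/month
months_until_full = FREE_TIER_GB / gb_per_month           # ~10 months
print(f"{gb_per_month:.1f} GB/month -> free storage gone in "
      f"~{months_until_full:.0f} months, ignoring indexes and overhead")
```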
At this point, I realized I’d rather take that $60 a month I was spending to rent equipment from Amazon and put it toward my electric bill, throwing whatever was left over into my Robinhood account, back where we started. As much as I love using AWS tools, when my employer isn’t footing the bill (and to my coworkers reading this, I promise I’m frugal at work too), I really don’t have much interest in investing in them. AWS just isn’t priced for hobbyists. It offers plenty of great free resources for newbies to learn with, and great bang for your buck professionally, but not at this in-between level.
I had an old Lenovo Y50–70 laptop from before college with a broken screen that I figured I’d repurpose as a home web-scraping bot and SQL server. While these still fetch a fair price new or certified refurbished (likely due to the i7 processor and dedicated graphics card), my broken screen pretty much totaled the value of the computer, so hooking it up as a server breathed fresh life into it, and about three years of dust out of it. I set it up in the corner of my living room on top of a speaker (next to a gnome), across from my PlayStation, and set it to “always on” to fulfill its new purpose. My girlfriend even said the obnoxious red backlight of the keyboard “pulled the room together”, for what it’s worth.
Conveniently, my 65″ Call-of-Duty-playable-certified TV was within HDMI cable distance of the laptop, so I could actually see the code I was writing too.
I migrated my server from the cloud to my janky laptop and was off to the races! I could now perform all the analysis I wanted for just the cost of electricity, around $0.14/kWh, or roughly $0.20–0.30 a day. For another month or two, I tinkered and tooled around locally. Typically this looked like a few hours a week after work of opening up my MacBook, fooling around with ML models using data from my gnome-speaker-server, visualizing data on local Plotly dashboards, and then directing my Robinhood investments accordingly.
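For the curious, that daily figure falls out of a quick calculation like the one below; the laptop’s wattage is my rough assumption, not a measured number.

```python
# Rough daily cost of the always-on laptop server.
WATTS = 70                  # assumed draw for an old i7 laptop, light load
RATE_PER_KWH = 0.14         # $/kWh, from my utility bill

kwh_per_day = WATTS / 1000 * 24                   # ~1.7 kWh/day
print(f"~${kwh_per_day * RATE_PER_KWH:.2f}/day")  # ~$0.24, in the $0.20-0.30 range
```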
I experienced some limited success. I’ll save the details for another Medium post once I have more data and performance metrics to share, but I decided I wanted to expand from a broken laptop to my very own micro cloud. This time, not rented, but owned.
“Home Lab” is a name that sounds really complicated and cool *pushes up glasses*, but it’s actually relatively straightforward when deconstructed. Basically, there were a few challenges with my broken-laptop setup that provided motivation, as well as new goals and nice-to-haves that provided inspiration.
Broken laptop problems:
- The hard disk drive was old, at least 5 or 6 years old, which posed a risk of future data loss. It also slowed down significantly under duress with larger queries, a known problem with the model.
- Having to use my TV and a Bluetooth keyboard to operate the laptop (running Windows 10 Home) was very inconvenient and not ergonomically friendly.
- The laptop was not upgradeable in the event I wanted to add more RAM beyond what I had already installed.
- The hardware was limited in its ability to parallelize tasks.
- The laptop alone was not strong enough to host my SQL server as well as dashboards and number-crunching for my ML models. Nor would I feel comfortable with those services sharing resources on the same computer, shooting each other in the foot.
Any system I put into place had to solve each of these problems, but there were also new features I wanted to gain.
Planned New Features:
- A new home office setup to make working from home on occasion more comfortable.
- Ethernet wiring throughout my entire apartment (if I’m paying for the whole gigabit, I’m going to use the whole gigabit, AT&T).
- Distributed computing* with microservers where appropriate.
- Servers capable of being upgraded and swapped out.
- Various programs and software deployable to achieve different subgoals independently, without impeding current or parallel programs.

*Distributed computing with the computers I selected is a debated topic that will be explained later in the article.
I spent a good amount of time researching appropriate hardware configurations. One of my favorite resources was “Project TinyMiniMicro”, which compared the Lenovo ThinkCentre Tiny platform, the HP ProDesk/EliteDesk Mini platform, and the Dell OptiPlex Micro platform. Like the authors of Project TMM, I’ve used single-board computers before, and have two Raspberry Pis and an Odroid XU4.
What I liked about my Pis:
They’re small, draw little power, and the new models have 8GB of RAM.
What I liked about my Odroid XU4:
It’s small, has 8 cores, and is a great emulation platform.
While I’m sure my SBCs will still find a home in my homelab, remember, I need equipment that can handle the services I want to host. I also ended up placing the most expensive Amazon order of my entire life and completely redid my entire office. My shopping cart included:
- Multiple Cat6 Ethernet Cables
- RJ45 Crimp Tool
- Zip ties
- 2 EliteDesk 800 G1 i5 Minis (but was sent G2 #Win)
- 1 EliteDesk 800 G4 i7 Mini (and was sent an even better i7 processor #Win)
- 2 ProDesk 600 G3 i5 Minis (and was sent a slightly worse i5 #Karma)
- Extra RAM
- Multiple SSDs
- A new office desk to replace my credenza/runner
- New office lighting
- Hard drive cloning equipment
- Two 8-Port Network Switches
- An Uninterruptible Power Supply
- A Printer
- A Mechanical Keyboard (related: I now have five keyboard-and-mouse combos from these computers if anyone wants one)
- Two new monitors
If you’d like to see my entire parts list, with links to each item so you can check it out or make a purchase yourself, feel free to head over to my website for the complete list.
Once my Christmas-in-the-summer arrived with a whole slew of boxes on my doorstep, the real fun could begin. The first step was finishing the ethernet wiring throughout my home. The installers had not connected any ethernet cables to the cable box by default, so I had to cut the ends and install the jacks myself. Fortunately, the AWESOME toolkit I purchased (link on my site) included the crimp tool, the RJ45 ends, and testing equipment to make sure I wired the ends correctly and to identify which port around my apartment corresponded to which wire. Of course, with my luck, the very last of the 8 wires turned out to be the one I needed for my office, but the future tenants of my place will benefit from my good deed for the day, I guess. The entire process took around 2–3 hours of wiring the gigabit connections, but fortunately my girlfriend enjoyed helping, and a glass of wine made it go by faster.
Following the wired networking, I began setting up my office by building the furniture, installing the lighting, and unpacking the hardware. My desk setup turned out pretty clean, and I’m happy with how my office looks now.