Making the Leap from Hardware to Machine Learning, Part 1

Matt Chang
7 min read · Jan 17, 2024


After earning a PhD in optical chip design and working 7 years in the hardware industry, I made a career pivot to machine learning software. Today, I’m a software engineer at Codeium, an AI coding assistant company.

I’m writing this blog series to share my experiences with anyone else who is considering making a similar career transition. With the benefit of hindsight, I’ll lay out a roadmap of how I prepared for the job search and interview process as an outsider. I’ll also reflect on my experiences actually working in software and how they compare with my expectations. My hope is that you can learn from my journey, either to make your own transition smoother or to gain a high-quality data point for deciding whether this is the right move for you.

This is Post #1: why I made the decision to leave hardware and switch to machine learning.

Other posts in this series:

Part #2: Learning about the machine learning industry before diving in

Why make the change?

In May 2023, my previous employer, Luminous Computing, laid off half the company, including everything involving photonics, which meant my team. I was employee #5 at Luminous, counting the 3 founders, and had spent the previous 4 years building everything (the infrastructure, the lab, the team, the vendor relationships) from scratch. Despite the recent financial troubles, technical progress was going well. We had just received our latest chip design from the foundry, and to all of our astonishment, the bring-up and test was going better than we could have hoped. I also really enjoyed working with my team. They were great people, personally and intellectually, from whom I learned a lot. The layoffs “pushed me into the pool”, in that they forced me out of a job that I would not have left willingly because there was so much inertia. This freed me to contemplate my options, one of which was a switch from hardware to software.

After the layoffs, I took 3 months off to unclench my mind from the day-to-day grind of startup life. In August 2023, I resolved to go all-in on a career change into ML Software. Admittedly, it wasn’t easy. Sunk cost (12 years, including the PhD), potential new employers reaching out, a mortgage, and the feeling that I was at the peak of my abilities in silicon photonics were all reasons I could have stayed.

But I felt that I owed it to myself to try. In most major personal decisions, I’ve found that a switch from the status quo requires a pull and a push, where I am being pulled towards the new thing and pushed away from the old thing. I found that to be the case here as well.

The pull towards ML Software

To anyone paying attention to tech, this might not need much explanation. I believe that machine learning has reached the point of maturity and accessibility where it will penetrate the everyday lives of everyday people, whether they know it or not. Historically, only a handful of technological advances have truly changed the game: the transistor, the laser, the internet, the smartphone. I believe that the productization of machine learning is the next revolution, and there’s so much low-hanging fruit to pick that I can’t resist the urge to be a part of it myself.

There are more specific reasons why I’m attracted to machine learning. I find machine learning much more expressive and capable than the programming I was taught in school. The very spirit of machine learning is to have an algorithm learn what to do, a tiny bit at a time, through an enormous number of examples, not because a human coded exact instructions line by line. The job of the engineer is no longer to identify every possible corner case and write code to handle it, but rather to guide the algorithm by curating these examples, build an architecture that lets the algorithm construct a good internal representation of those examples, optimize the execution of the algorithm on datacenter infrastructure, and make sure that the algorithm generalizes well to unseen data. Babies don’t learn to walk because we told them the exact angle to bend the knee and raise a foot. They learn because they see countless examples of us doing it. Then they make many mistakes trying to replicate us until finally, perhaps by luck, perhaps by some form of optimization, they take their first, clumsy steps. This change in software design was elaborated on in great detail in the “Software 2.0” blog post by Andrej Karpathy (who I think is just an incredible teacher and communicator).
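To make that concrete, here’s a toy sketch in Python (purely illustrative, using scikit-learn; the “hidden rule” below just stands in for whatever pattern lives in real data). Instead of hand-coding the classification rule, we curate labeled examples, let a small neural network infer the rule, and check that it generalizes to data it has never seen:

```python
# Purely illustrative sketch: the engineer curates examples and lets
# the algorithm discover the rule, rather than coding it by hand.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))                   # curated "examples"
y = (X[:, 0] ** 2 + X[:, 1] > 0.5).astype(int)   # hidden rule to be learned

# Hold out unseen data to check that the learned rule generalizes.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
model.fit(X_tr, y_tr)                            # learn from examples, bit by bit
print(f"accuracy on unseen data: {model.score(X_te, y_te):.2f}")
```

Notice that no line of that code spells out the rule itself; the network has to discover it from the examples, and the held-out score tells us whether it generalized.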

Another reason why I find machine learning so compelling is that it is relatively open and democratized compared to hardware. You don’t need expensive equipment or access to state-of-the-art manufacturing facilities to build and test machine learning models. Datasets are publicly available, the internet is teeming with tutorials and open-source code, and the only equipment you need to run an experiment is the laptop many of us already spend hours on every day. Heck, if you want to spin up a more powerful GPU, it’s only a few clicks away. Of course, training large models and tapping into massive datasets are still only accessible to corporations and governments, but the barrier to entry to the base technology is orders of magnitude lower than in most hardware fields. Perhaps because of where machine learning is in its development (it hasn’t reached the maturity or productization levels of hardware), or because of the spirit of the field itself, machine learning feels like a much more open world than the safeguarded and secretive world of hardware. In hardware, when was the last time a company the size of Meta open-sourced something as technologically relevant as Llama?

The final reason why I find software attractive is how quickly one can experiment. Trying different model architectures, changing input feature representations, and optimizing hyperparameters are bounded only by my time and how many GPUs I’m able to spin up. As a scientist and engineer (and perhaps a millennial), being able to try different approaches, view the results, and iterate quickly is incredibly satisfying. The field in general is advancing rapidly. The importance of data is obvious now, but just over a decade ago, Fei-Fei Li had to go against the grain to justify the development of ImageNet. That is a testament both to how young the field is and to how swiftly it has developed.
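As an illustration of that iteration speed, here’s a hypothetical hyperparameter sweep (again a toy sketch with scikit-learn, not code from my job), trying a few network widths and learning rates in one short loop:

```python
# Purely illustrative: a tiny hyperparameter sweep over network width
# and learning rate, scored with cross-validation.
from itertools import product

from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

results = {}
for width, lr in product([32, 64], [1e-3, 1e-2]):
    model = MLPClassifier(hidden_layer_sizes=(width,), learning_rate_init=lr,
                          max_iter=300, random_state=0)
    score = cross_val_score(model, X, y, cv=3).mean()
    results[(width, lr)] = score
    print(f"width={width:3d}  lr={lr:.0e}  accuracy={score:.3f}")

print("best config (width, lr):", max(results, key=results.get))
```

On a laptop, that whole loop finishes in minutes. The photonics equivalent of trying four design variants would be four tapeouts.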

Now, of course, I know the picture I’ve painted is far rosier than what happens in the day-to-day trenches of most machine learning engineers; I’ve worked in big tech before. But at the end of the day, I believe that software and machine learning will be here for the long run and will define the next tech revolution, so making the switch now is, to me, a long-term investment in skills and knowledge that will be important for whatever comes next.

The push away from hardware

The factors pushing me away from hardware, specifically silicon photonics, fall into two main categories:

Long cycle times. For silicon photonics in 2023, each prototype chip cycle took about 1 year (3 months tapeout, 4-6 months fab-out, 3 months test). I emphasize prototype, because this is not the cycle time for a product, which would require many more months of comprehensive chip-level testing, integration with an OSAT (outsourced semiconductor assembly and test) and contract manufacturer, integration with digital chips and firmware, system-level testing, regulatory signoff, and any respins that might be needed. For just an experiment, it could take a year before you see the results. In addition, silicon photonics in 2023 was not at the level of electronic IC design, where material stackups are standardized, proven packaging flows and wafer fabs exist under one roof, IP blocks can be purchased from multiple vendors (and are guaranteed to work!), and standard verification tools are readily available. The expectation in silicon photonics is that you will need multiple tapeouts before you yield a working design. What all this means is that designers are naturally extremely risk-averse and conservative in their approach. A paranoia pervades silicon photonic chip design, and for good reason: no one wants to cost their team a year of progress and millions of dollars of funding. Progress takes place on the time scale of years.

Of course, the lack of an established flow also means there’s enormous space for opportunity and invention. The foundries that we worked with and the leaders in silicon photonics were moving heaven and earth to close the gap between silicon photonics and electronic ICs. I have enormous respect for the people who work in this industry, because I know the incredible attention to detail, persistence, and willingness to grind that it takes to make progress. It simply wasn’t going to happen on a time scale that I was willing to accept.

The limited market of silicon photonics. Silicon photonics has been around for decades, but despite all the hype and many startup companies, there’s only one industry silicon photonics has managed to penetrate: optical interconnects (moving data optically between servers and now, hopefully, between chips). The lack of a market, in my opinion, is mostly due to the high combined cost of silicon photonic wafers (the volumes just aren’t large enough), optical packaging (a high-precision, low-throughput process), and laser power (lasers are inefficient, and loss of light in optical packaging and chips exacerbates this). It’s a chicken-and-egg problem: cost will only drop when volumes increase, and volumes will only increase when costs are low enough to create a market. At the end of the day, every optical company has to ask the question: what does my product do for my customer that existing (photonic and non-photonic) products cannot? And they have to be honest and incorporate the true cost of lasers and packaging. On top of that, if your product requires your customer to redesign their datacenter or retrain their models, you’re fooling yourself. I tried to think of any other market that silicon photonics could enter, and I cannot come up with any that don’t require an order-of-magnitude improvement in yield and cost.

For what it’s worth, I hope I’m horribly wrong here and just not bright enough to conjure the killer application. Silicon photonics, and integrated photonics in general, embody such cool physics and unique properties that I pray it’s just a matter of time before the next big market is discovered. But right now, tech is going through a revolution akin to the invention of the internet, and burying my head in the sand for the sake of comfort, familiarity, and sunk cost just doesn’t make sense.

In the next post, I’m going to switch gears and talk about some of the non-technical things that I found helpful, as well as some things I wished I had done first, before diving into the technical content.
