Ask HN: Resources on how Google/Facebook etc. approach software design?

notacoward · on May 30, 2019

The design process at Facebook is, to put it charitably, a bit minimal. "Move fast" is taken to mean start writing code immediately, then iterate on that to approach the desired outcome. Developers are rewarded for landing code in production each review period, even if that code provides little benefit, will need to be rewritten, and might even be buggy. In the rush, careful design and testing (which might delay landing in production and result in a bad review) get pretty short shrift. Some might say that there's risk or waste either way, and that velocity rules all. I'm not going to say they're wrong, but it makes "software design at Facebook" a bit of an oxymoron.

a13n · on May 30, 2019

In my experience working at Facebook, not a single thing you said was true.

People plan and think before they code.

People are rewarded for impact, not for landing useless code.

Code is carefully designed and tested, or it doesn't land.

Impact rules all, not velocity.

notacoward · on May 30, 2019

> In my experience working at Facebook, not a single thing you said was true.

Well good for you. Either one of us has had an anomalous experience, or...

> People plan and think before they code.

"Planning and thinking" != design process. Of course people plan and think. I plan and think before I go to the grocery or hardware store. Of course it's only for a moment, and the typical process I've seen at FB is a lot closer to that than to the design processes I've seen elsewhere in nearly thirty years producing software.

> People are rewarded for impact, not for landing useless code.

Again, you're playing extremes against the middle to paint a picture more favorable than accurate. Very little code is totally useless. However, I've seen plenty of "impact" awarded for code that made a very tiny increment in functionality, far outweighed by the missing/misleading metrics or spurious alarms or outright bugs in something even the author knows will be replaced next half. Every day I encounter stuff that's broken as a result. This morning it was a distcc script that nuked a build, because someone's desperate to get their impact in before the end of the half. I can't count the number of times I've followed the trail back from an incident to a diff with "test in prod LOL" or some such for its test plan. So "carefully designed and tested" just isn't true in general. The common case is very far from that.

a13n · on May 30, 2019

Yeah. I worked in infra, so it could also be the difference between working on infra (where stability is valued) and working on product (where shipping fast is valued).

When it comes to software design though, the design of your infra is what matters most, since everything else is built on top of it.

notacoward · on May 30, 2019

I work in infra too (storage) so that's the majority of what I see. If all that PHP/JS stuff on the front end is worse I kind of don't even want to know about it. ;)

omarchowdhury · on May 30, 2019

What are some critical parts of Facebook written in PHP, if you know?

dymk · on May 30, 2019

The entire web frontend and most site “business logic” are implemented in Hack (sort of PHP, statically typed).

julienreszka · on May 30, 2019

"Hack" is the correct name.

talonx · on May 30, 2019

If these principles are followed throughout the engineering org, can you explain how the recent reports of password and other information being stored in plaintext happened? That is Security 101.

Genuinely curious.

jeremyjh · on May 30, 2019

I don't work at Facebook but my understanding is passwords were not stored in a database in plaintext, but there was verbose logging of web requests that were not sanitized. Sanitizing logs is definitely common sense and a basic security practice, but it can be difficult to be sure you are doing it adequately without good tooling to monitor it.
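As a rough illustration of the log sanitizing described above, here is a minimal Python sketch. It is not how Facebook does it: the list of sensitive field names, the `redact` helper, and the `RedactingFilter` class are all assumptions made up for this example, and it only handles simple `key=value` text.

```python
import logging
import re

# Assumption: these field names are treated as sensitive. A real list
# would be longer and specific to the application.
SENSITIVE = re.compile(r"(?i)\b(password|passwd|token|secret)=([^&\s]+)")


def redact(text: str) -> str:
    """Replace the value of any sensitive key=value pair with a placeholder."""
    return SENSITIVE.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


class RedactingFilter(logging.Filter):
    """Hypothetical logging filter that redacts the fully formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first, so values passed as arguments are redacted too.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # keep the record, just with sanitized content


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("web")
    log.addFilter(RedactingFilter())
    log.info("POST /login user=bob&password=%s", "hunter2")
```

A pattern-based filter like this only catches the shapes it knows about, which is the "difficult to be sure you are doing it adequately" part.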

a13n · on May 30, 2019

No idea, I haven't worked at Facebook for over four years.

Communitivity · on May 30, 2019

There are a number of blog posts by people such as Tim Bray, Steve Yegge, and others - as well as internal engineering blogs (at least there is one at Facebook and several at Google). I read these on a regular basis (Yegge's hasn't been updated in a while), but only to take away ideas and learn about new technologies.

Their approach is almost certainly not the right approach for you, because you are not Google ( https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb ). That said, if you are looking for a position at these places, then the blogs might help you, in a number of ways. If you want to work there though, focus on the sound engineering techniques applicable anywhere (maps, hashing, trees, graphs, algorithms) and build a reputation through Open Source contributions to the projects the companies are involved with.

* Tim Bray / Amazon / https://www.tbray.org/ongoing/

* Steve Yegge / Ex-Googler / https://steve-yegge.blogspot.com/ and https://medium.com/@steve.yegge

* Facebook Engineering Blog / https://code.fb.com/

* Google Developers Blog / https://developers.googleblog.com/

SwSwinger · on May 30, 2019

A group of early ex-Facebook engineers/directors have recently collaborated with Software Engineering Daily to produce a podcast series about the engineering philosophy behind the core product. I think these talks are a little more raw and direct than you would find from official corporate talking points. https://softwareengineeringdaily.com/category/all-episodes/

aashu_dwivedi · on May 30, 2019

I've recently moved all my podcast subscriptions to Google Podcasts and now I am sad that I can't find this one there.

kkcorps · on May 30, 2019

Yes, I definitely gained a lot of insight from these 5 podcasts into the whole development and management process inside Facebook.

ex_amazon_fc · on May 30, 2019

Ex Amazon here. Amazon has some solid principles that often go against popular belief.

- No waterfall-ish process where design is handed down from architects to senior engineers to juniors. The same people do design, implementation, ops and so on.

- Measure everything and always. People are encouraged to define metrics and goals and create dashboards before writing code

- Simplify: decreasing complexity is taken more seriously than in other companies. Do not use a database when you can use a file, or a message passing library when you can use a socket, or 200 lines of code when you can fork out to "grep | sort". This can be surprising to new hires.
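
To make the "grep | sort" point concrete, here is a small Python sketch of forking out to the standard tools instead of writing the filtering and sorting by hand. It is only an illustration, not anything taken from Amazon: the `grep_sort` helper is hypothetical, it assumes `grep` and `sort` are on the PATH, and it runs the two commands one after the other rather than through a real shell pipe.

```python
import os
import subprocess


def grep_sort(pattern: str, text: str) -> list[str]:
    """Return the lines of `text` containing `pattern`, sorted, using grep and sort."""
    # LC_ALL=C gives a byte-wise sort order that does not depend on the locale.
    env = {**os.environ, "LC_ALL": "C"}
    grep = subprocess.run(
        ["grep", "-F", "--", pattern],  # -F: treat the pattern as a fixed string
        input=text, capture_output=True, text=True, env=env,
    )
    # grep exits with 1 when nothing matches; only codes above 1 are real errors.
    if grep.returncode > 1:
        raise RuntimeError(grep.stderr)
    sort = subprocess.run(
        ["sort"], input=grep.stdout, capture_output=True, text=True,
        env=env, check=True,
    )
    return sort.stdout.splitlines()


if __name__ == "__main__":
    log = "b ERROR 2\na INFO 1\na ERROR 1\n"
    print(grep_sort("ERROR", log))
```

The trade-off is the one raised in the replies: this stays short only while the data is line-oriented text that the tools already understand.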

srndh · on May 30, 2019

> message passing library when you can use a socket

kindly explain that logic

Too · on May 30, 2019

I'm also curious about this. Reinventing your own serialization and retransmission every time, instead of using proven technology does not sound like "decreasing" complexity. Same goes for using files and grep instead of databases in many cases.

gsempe · on May 30, 2019

It probably means that if you only need to ping one process from another (no payload needed, or only a handful of cases expected), you don't have to deploy your latest gRPC knowledge. Cf. how daemons are managed on Linux.
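
A minimal Python sketch of that idea, under stated assumptions: the one-byte command codes and the helper names are invented for illustration, and `socket.socketpair()` keeps both ends in one process so the example is self-contained. In real use the two ends would sit in different processes, for example on a Unix domain socket.

```python
import socket

# Hypothetical one-byte commands: with only a handful of cases there is
# nothing to serialize, so no message passing library is needed.
COMMANDS = {b"R": "reload", b"S": "stop"}


def send_command(sock: socket.socket, code: bytes) -> None:
    """Send a single command byte to the peer."""
    sock.sendall(code)


def receive_command(sock: socket.socket) -> str:
    """Read one command byte and map it to its name."""
    code = sock.recv(1)
    return COMMANDS.get(code, "unknown")


def demo() -> list[str]:
    # A connected pair of sockets stands in for controller and worker processes.
    controller, worker = socket.socketpair()
    with controller, worker:
        send_command(controller, b"R")
        send_command(controller, b"S")
        return [receive_command(worker), receive_command(worker)]


if __name__ == "__main__":
    print(demo())
```

Once the messages need payloads, versioning, or retries, the objection about reinventing serialization and retransmission starts to apply.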

brian_spiering · on May 30, 2019

John Ousterhout's "A Philosophy of Software Design" is a good book on the subject. He has worked with Google.

lfx · on May 31, 2019

It's a very well written book! Short and sweet. Not as novel as I expected, but still a very good read for a weekend.

munchbunny · on May 30, 2019

Are you looking for wisdom to apply to your day to day work? In that case, keep in mind that Facebook/Google practices are tailored for Facebook/Google problems, so their practices might not fit your situation.

LawnboyMax · on May 30, 2019

Really good overview: https://arxiv.org/abs/1702.01715

repolfx · on May 30, 2019

I used to be at Google, years ago. I don't know if any of this is still relevant.

The design process at Google started with a design document. There was a template that I think was available online a long time ago but (ironically) I can no longer find it. The template was relatively lightweight and had some headings like so: Introduction, Goals, Non-goals, Overview, Detailed design, Security, Privacy, Testing. Compared to other design doc templates I've seen it wasn't heavy on software engineering theory. Of course the bulk of the writing would be in the "detailed design" section and subsections.

Good design docs were very detailed. One I wrote ended up being, I think, about 40 pages by the end when printed out, and that was not an especially large or unusual document. Design docs for critical systems could be larger still, or more frequently, split into many other docs. The quality of writing was generally high and they were maintained in version control. There was a mailing list where design docs were posted for company-wide review, though by the time I was there, this process had degraded quite a bit and a lot of stuff was done in team-specific design docs in Google Docs, with relatively minimal or no peer review. I felt that it was common for the less "serious" teams to do this, e.g. teams working on the latest chat product or on Google Apps itself. Those docs tended to be shorter, only partly filled out, or non-existent. The closer to the metal, older-school stuff was all hand-written HTML.

Good design docs would be kept up to date as the design evolved, although I'd say that was the minority. Most designs were the work of a small number of people or just one person. There were not many design review meetings that I recall, though probably that varied a lot by team.

Diagrams were minimal, possibly because there weren't any good diagramming tools available internally (well, there was graphviz and TeX).

Data structures would often be designed up front as long as they were either protocol buffers (i.e. inter-server comms or long term data storage), or fundamental to what the system did as with BigTable, indexing, index serving, ad serving etc. Systems where the data structures weren't fundamental to the design didn't necessarily plan out every structure in advance of course, by no means. For many products the user interface or network protocols were more important, so that's where the design docs would dwell.

The most senior engineers were very familiar with the performance costs of things, e.g. the cost of an L2 cache miss vs a disk seek, and that deeply informed the design of many systems.
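
To give a feel for the kind of back-of-envelope arithmetic involved, here is a short Python sketch. The two latencies are rough, commonly quoted order-of-magnitude figures and are assumptions for this example, not numbers from Google: about 100 ns for a cache miss that goes to main memory, and about 10 ms for a disk seek on a spinning disk.

```python
# Assumed order-of-magnitude latencies, in nanoseconds.
L2_MISS_NS = 100            # cache miss served from main memory (assumption)
DISK_SEEK_NS = 10_000_000   # one seek on a spinning disk, ~10 ms (assumption)

NS_PER_SECOND = 1_000_000_000


def sequential_ops_per_second(latency_ns: int) -> int:
    """How many strictly sequential operations of this latency fit in one second."""
    return NS_PER_SECOND // latency_ns


def slowdown(slow_ns: int, fast_ns: int) -> float:
    """How many times slower the slow operation is than the fast one."""
    return slow_ns / fast_ns


if __name__ == "__main__":
    print(sequential_ops_per_second(L2_MISS_NS))    # memory-bound lookups per second
    print(sequential_ops_per_second(DISK_SEEK_NS))  # seek-bound lookups per second
    print(slowdown(DISK_SEEK_NS, L2_MISS_NS))
```

Under these assumptions a design that needs one disk seek per request tops out around a hundred sequential requests per second per disk, while the memory-bound equivalent has several orders of magnitude more headroom.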

That's about it.

pm · on May 30, 2019

Data structures are the basis of what you start off with for any program. The key is to understand the domain you want to model. It's hardly unique to FAANG.

cpeterso · on May 30, 2019

Fred "Mythical Man-month" Brooks wrote:

"Show me your flowcharts and conceal your tables [i.e. data structures], and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious."

maxxxxx · on May 30, 2019

My theory is that they have more people who understand and respect software design in higher management levels compared to other companies. So it’s easier to do good work without constant pressure from the “business” people to take shortcuts.

talonx · on May 30, 2019

This is probably true for Google, but not Facebook, given how many times they have broken things, especially security related.

repolfx · on May 30, 2019

Facebook's culture is basically a younger brother of Google's, or at least used to be. There's a reason their tech stacks are so similar - FB went through a phase where it poached a lot of Googlers.

A tech firm is most easily defined as a company founded and run by a software or hardware engineer. The only exception I can think of is Apple, but Jobs was immersed in engineering culture from the time he was a teenager - he was sort of a sphinx in that regard. Even so, it's fair to say Apple under Jobs is maybe more of a high-end design house than a traditional tech firm: their in-house talent struggled with online services and anything involving complex computer science R&D, e.g. their AI / mapping efforts were weak sauce compared to Google.

julienreszka · on May 30, 2019

"their tech stacks are so similar" in what ways? not similar at all

repolfx · on May 30, 2019

Thrift RPC is basically a clone of Stubby. Lots of sharded MySQL (not so much at Google anymore but the whole time I was there, the ads db was a giant sharded MySQL DB). Small number of large monorepos. Etc.

fma · on May 30, 2019

I was thinking about this the other day, because I'm kicking off a new project. How do big companies start their code base? Do they have one person write a platform that everyone builds on top of, and then iterate? Or do several people contribute to the platform? The example in Java being that someone needs to lay out some skeleton package structure (a hello world controller, DAO)...?

My previous projects were me coming onto an existing project, or were small enough that I had 50% of the code done before the next person helped out :)

talonx · on May 30, 2019

All big companies start out as small companies, which means 1 or 2 or a few people writing code.

fma · on May 30, 2019

As a follow up I decided to implement a "Sprint 0".

dunkelheit · on May 30, 2019

Take any information (blog posts, articles) that comes from the company itself with a grain of salt because it is essentially marketing and so will downplay the downsides of their approach and paint a rosier picture of the process than what really happens there.

jision · on May 30, 2019

I am not sure how Facebook, Google, or any of the big companies measure whether their design practices are better. What we do at pickmysolar is try out a few design patterns in a PoC, check their flexibility, and then decide on the framework or the design.

purpleidea · on May 30, 2019

I didn't know that they spent much time on good design :/

amelius · on May 30, 2019

And certainly not user-centric design ...

D3m0lish · on May 30, 2019

wow there are places like that for real?

I'd say it comes down to the maturity of the individual engineer or group and how they'd approach a problem. Some problems are worth spending time on, most aren't, and not every problem is at global scale, trust me.