Career Talk: Data Analytics at Facebook

Hi everybody, welcome, and thank you for coming. We have a packed audience and this is fantastic. We’re really happy to host data analytics from Facebook today. We have Jake Peterson here, who’s a data science and analytics engineering manager at Facebook. He has been working in data science for more than 10 years, longer than data science has been a term, so he’ll have tons of good knowledge for us. At Facebook, Jake has led data science for four different Facebook product teams, most recently for the graph search product. Prior to Facebook, Jake led analytic functions at several tech startups and spent six years in the direct marketing industry as an analyst consultant. He holds a BS in computer science and a BA in philosophy from Santa Clara University; we’ll try not to hold that against you here at Berkeley. Before we start, just so you know the format today: Jake
will give his talk for around 30 minutes or so, and then we’ll have some time for Q&A. We will be passing around the mic for Q&A. It’s important that you use it; it’s not for amplification but so we can capture it on our recording. After that we’ll have a student panel talking about their internship experience at Facebook, and we’ll kind of just have an open conversation, I think, at that point. So without further ado I’m going to turn it over to Jake and we’ll get started. Thanks.

Cool, thank you very much. Can everyone hear me okay? Yep, coming through loud and clear with full power and velocity. All right, good. So I think Rebecca just gave me a fantastic introduction, so I won’t
dig into it too much. A big part of what the Facebook data science team does is understanding how products work quantitatively, how to drive those products towards success, and growing them. There’s been this huge growth focus at Facebook in general, which is very quantitatively driven: what are users doing, what’s the first-order derivative on user growth, and how do we play in that space to make sure that the site is growing and useful for people? In general, what we really expect out of data scientists is impact. You can do any analysis that you want. You can present any number, you can throw it up on a dashboard, you can make the most amazing visualization or the most comprehensive report, but if nothing happens because of that work, if we don’t change a product or we don’t change someone’s mind or we don’t change a process, it’s useless. We actually would have saved money by you not doing all this wonderful insightful learning. So that’s what we expect at Facebook. It’s how we manage our data scientists, how we set goals for everyone, what we aim for and coach for and teach to: everything that we do has to be aimed at moving the product forward. That’s kind of what your responsibility is in a day in the life of a data scientist, and that’s pretty much how we move product forward. Now, this came about because a
few years ago we started noticing that teams with data scientists were having a lot more success. Our growth team was being very successful at driving user growth, our newsfeed team was starting to rank newsfeed better, and the product was improving dramatically. Meanwhile we were having lots of product launches (I won’t name those products) that were not being quantitatively driven and didn’t have access to really powerful and rigorous statistical insight into how users were interacting with a product. So a few years ago Javier Olivan, who’s the vice president of the growth function at Facebook, said, hey, we need to rethink how we even make products. If you think of Facebook as this giant army, at the ground level the lowest piece of the Army is a squad. Anyone here in the military who wants to agree or disagree with me? That’s okay. So an Army squad is like eight people: one grenade launcher, one rocket launcher, one flamethrower, five riflemen, one squad leader. At Facebook it’s one engineering manager, five to ten engineers, one product manager, one product designer, and a data scientist. So over the past three years we’ve tried to completely rebuild from the ground up, so that everything that every product goes through has to have a product hypothesis. What are we trying to prove in the world? What’s the product hypothesis that we’re trying to prove? How do we prove it? How do we do it quantitatively? And what are the skills we need in order to do that and to build up that functional muscle, so that everything has a reason for existence and is rigorous? Over the past couple of years it’s been going pretty well, but we’ve got a long way to go, so that’s kind of why I’m here, to talk to all of you guys. And those
problems in those products span this insane union of space in which you can work. Facebook is so big that if you take a Venn diagram of sociology and computer science and just take all of it, we pretty much have all of those problems. How do people communicate with each other? How do networks grow? How do you build distributed hash tables so that you can scan indexes efficiently? Give me an efficient online algorithm for computing distinct counts of users across a billion people in real time. All of those problems that need quantitative help are ones that we have. And in terms of where we align all those separate things, we do them across what Facebook calls, very descriptively, the formula. The formula for Facebook’s impact out on the world is the sum of these four product groups.

Engagement: how useful are our products in your daily life, by how much time you are spending with them and how much you are interacting with them?

Growth: how many people are on Facebook? You multiply time spent by a billion, you know, 20 minutes times a billion people, and you’re talking about this incredible interaction and power that the company has, and just plain old what you’d consider impact across the world.

Utility: how am I extracting knowledge from one billion voices all talking at the same time? That’s actually one of the big challenges for search. I want to know about Malaysian airline flight 211. The head of the pilots of Malaysian airline is on Facebook, and he’s posting things about this. Why can’t I be informed in that conversation? Why can’t I go see it, and see as it happens how the world is changing and unfolding? That’s part of what utility is. It’s kind of this rough approximation for what kind of power I can extract from all of this conversation, engagement and discussion.

And core business: it costs money to serve a billion news feeds a day. There’s a lot of machines, there’s a lot of infrastructure, so we have to pay for that, and we have to make more money than we lose on keeping things alive. So that’s the core business, like ads optimization, in which there’s a ton of very interesting price optimization, auction studies, all those kinds of problems in that fundamental area. So that’s kind of where we’ve gotten so far. Okay.
Data scientist impact: we expect you to use data to change these four product areas. Because effectively, and this is true of any market, the market has priced in staying where you’re at. If you don’t do anything, that’s where the market is, so you have to get better no matter what, and that’s what we expect of our data scientists with our products. So that’s basic impact, across these four areas: growing users, growing the engagement level of our products, growing the power of them, or growing the business.

So in general, how do you do that? How does the data scientist fit in this team of five engineers, one product manager and one product designer? It’s pretty close to what you learn in college. It’s kind of a mix of all these different things. It’s using statistics to understand causality. There’s nothing more influential to a product manager, or someone who’s trying to launch something, than saying, hey, when you broke these news feed features, you lost 20% of all likes across the world, and we know that because we ran an experiment and it’s causal and it proved it. It’s hard to refute that or ignore it or just throw it off into the wind, so it’s super powerful. So: statistics to find causality, or statistics to describe what is going on in a particular product.
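
As a rough illustration of the kind of experiment analysis described above, here is a minimal Python sketch that compares a "like" rate between a control group and a test group with a two-proportion z-test. The talk does not say which statistical test Facebook uses; the function name `two_proportion_z_test` and all of the counts are assumptions made up for this example.

```python
import math


def two_proportion_z_test(control_hits, control_n, test_hits, test_n):
    """Compare a rate between control and test groups.

    Returns (relative_change, z_score, two_sided_p_value).
    All inputs are hypothetical counts, e.g. users who liked something.
    """
    if control_n <= 0 or test_n <= 0:
        raise ValueError("group sizes must be positive")
    p_control = control_hits / control_n
    p_test = test_hits / test_n
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (control_hits + test_hits) / (control_n + test_n)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / test_n))
    if std_err == 0 or p_control == 0:
        raise ValueError("rates are degenerate; the test is undefined")
    z_score = (p_test - p_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z_score) / math.sqrt(2))
    relative_change = (p_test - p_control) / p_control
    return relative_change, z_score, p_value


# Made-up numbers: 50% of control users liked something, 40% of test users did.
change, z, p = two_proportion_z_test(5000, 10000, 4000, 10000)
print(f"relative change: {change:.1%}, z = {z:.2f}, p = {p:.3g}")
```

A result like this only supports a causal reading if users were randomly assigned to control and test, which is the property of an experiment that the speaker is pointing to.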
These are pretty common tasks: design an experiment, analyze the experiment, be rigorous about it, and describe trends in an aggregate set of measurements of what’s going on in your product. You can imagine that for news feed it’d be very important for all the executives in charge of Facebook to see: okay, how many people are seeing stories this week? How many stories? What’s going on? Are they getting more, are they getting less? How are we growing the system, how are we shrinking it, and what’s causing that? What does the underlying human behavior look like, the behavior that’s kind of behind the veil and that we think is resulting in these changes?

And you have to be scrappy about how you find those things. Part of Facebook is we have this real hacker, self-serve culture. So one thing you may do in a day is: hey, I found out that, hypothetically, we’re not tracking when people change their profile photo, or we’re not paying attention to it. That means I go to the code base, I instrument that interaction, I write some code to measure it, and then I get that data back and present it in a presentation. Now you’re talking. So it’s database programming at the middle level, it’s PHP to write the measurement, it’s some Excel, some R, some PowerPoint, and then some sentences and paragraphs to communicate and get people to change their mind. Those are kind of the typical technologies that we use. Sentences and paragraphs are often very overlooked as an important tool of the data scientist.

A big chunk of it as well is just being proactive about finding what’s the most important thing and posing questions about projects to tackle. We expect pretty much everyone at Facebook to be self-driven. We’re not babysitters and we don’t want babies. We want people to say, these are the problems I want to work on, because I think they’re important and they’re impactful; I think this is the biggest thing that will take the product forward.

So here’s kind of a gist of what’s pretty typical for a data scientist to take on. What the heck is going to happen to our web advertising business when everyone is only using smartphones? That was the big question for Facebook in 2010. You’ve got to remember, at our IPO we just got totally busted when they were like, you’re not making any money on mobile. So this was a question that a lot of data scientists were focused on: what’s going to happen to a product when people switch to mobile, and how fast is it going to happen? Is this something where we should start finding advertisers right now to go create mobile ads and create mobile ad products, or can we wait two years or three years or four years or five years? That market timing is pretty important, and predicting that transition is also super important. Difficult as well.
What should we build next? Are there any behaviors going on right now that we can turn into a product? One of the ones that’s probably eponymous (I think I misused that, whatever): the photos product on Facebook didn’t exist until someone took a look and said, hey, everyone is posting links to photos on other sites on their Facebook profile, like a lot, really a lot. We should go make it so you can upload photos to Facebook. Pretty classic one.

And then others: forecasting feature usage, and attempts to disambiguate trends or find causality in products to fix them. So, deep dives on launching a feature. This is something that Facebook data scientists are looking at all the time: what is the change between control and test for a metric that we care about on a specific product? And then being really rigorous about the results. If this feature change was, you know, a whole new UI to create posts, and it’s showing no effect, should we build it? Should we keep working on it? How are we going to make that decision? In general this is how you work on your team. You’ll sit with the product designer and be like, hey, look, this new thing you did that you think is so awesome doesn’t do anything; it’s worth squat. Should we keep doing it? What do you think? Is the new design that much better? Is the new direction you’re going going to be that much more fruitful, where we think we can actually change the world? Because part of the responsibility of keeping the service alive for a billion people is that we’re going to change the world, and we have to make sure that it works well and that we’re making the right decisions.

Similarly, another one we often tackle is: can we infer behavior from some proxy measurements? One we did a long time ago (it’s not pictured here): we weren’t able to measure whether game developers were making games in a new medium called Unity. Are any of you guys familiar with the Unity game development environment? Okay, so they have a web player, and it’s very difficult to measure whether games are in the Unity Web Player, but you are able to measure whether people’s browsers have Unity installed. So we backed into it from the people who are definitely playing a game. If you played a game once, you might have got there by mistake. If you played it twice, you know, shame on you. If you played it three times, yeah, you’re playing that game; there’s no question about it. So people who are playing a game three times and have Unity installed are probably playing a Unity game.
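
The proxy rule described above (played a game at least three times and has Unity installed) can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the function name `likely_unity_players`, the event format, and the sample data are all hypothetical, not Facebook's actual logging or tooling.

```python
from collections import Counter


def likely_unity_players(play_events, unity_installed, min_plays=3):
    """Return (user_id, game_id) pairs that pass both proxy checks.

    play_events: iterable of (user_id, game_id) tuples, one per play session.
    unity_installed: set of user_ids whose browser reports Unity installed.
    min_plays: how many sessions count as "definitely playing" (3 in the talk).
    """
    play_counts = Counter(play_events)
    return {
        (user_id, game_id)
        for (user_id, game_id), plays in play_counts.items()
        if plays >= min_plays and user_id in unity_installed
    }


# Hypothetical data: ann and bob each played game g1 three times, cat once.
events = [("ann", "g1")] * 3 + [("bob", "g1")] * 3 + [("cat", "g1")]
installed = {"ann", "cat"}  # only ann and cat have Unity installed
players = likely_unity_players(events, installed)
print(players)
```

Counting how many qualifying users each game has would then give a rough, indirect signal of which games are likely built on Unity, which is the inference the speaker describes.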
Similarly, feature creation in newsfeed kind of happens in a similar way. About a year ago we were seeing just tons of meme photos and meme content and social video players that we didn’t think were necessarily a great addition to newsfeed. So we were asking ourselves, hey, is there a way we can identify photos from content producers that are of low quality? The first hypothesis we had on this was: okay, let’s take a look at dislike rate. So if I like something and I immediately unlike it, what is the kind of content where that ratio is very, very high, where of the people who are liking it, the vast majority are quickly unliking it within a few seconds? What does that look like? And you can actually see, even with just a simple measurement like that, and small numbers of people against the number of pages and people producing content, some very cruddy content. You know, like this black hole photo: “like it and then you’ll see something amazing,” which is pure click baiting. It’s getting on the order of 50 percent unliking. Or pure marketing photos from pages that are not actually promoting something useful. Realistically, if you want to go look at these kinds of meme photos, there should be just another place for you to go to them, but they shouldn’t make it into your newsfeed.

So this is kind of the thing that we try and tackle. We see a problem, we have a hypothesis about it. Okay, can we try and find a proxy measurement for it? How effective is that proxy measurement? And then find something to do with it. In this one we eventually started adding features about unliking to the photos and to the pages that are producing photos and page stories, and then started ranking on them in the newsfeed model, and they’ve been decreasing in distribution over time, which is increasing newsfeed quality. So that’s the gist. That’s a typical day, a typical project, a typical set of expectations.

Why do an internship at
Facebook? One, it’s rad. It’s freaking rad. I think when we have the other interns up here they’ll agree; otherwise I don’t think we would have brought them. And the problems you get to work on are pretty incredible. I think the thing that motivates the people who work at Facebook right now isn’t necessarily future riches, even though, you know, you do okay. It’s definitely the data. The data that we have and the interaction that people have is incredible. Once you sit back and start to frame problems, you’ll realize that there’s just this incredible low-hanging fruit and opportunity to go dig in and learn new and amazing things.

It’s the scale. Doing projects for a billion people can be pretty amazing when you launch them. When I started at Facebook I was working on platform, and in particular we were looking at groups and bookmark usage. Bookmarks live on the left side of the home page next to newsfeed and take you to groups or events or apps that you play. And we started to see these groups get this insane traction, like millions of people, and most groups don’t have millions of people in them. And they were all in Arabic. We were like, this feels like some kind of spam attack, like someone is taking down the site. So we started digging into it and translating the groups and what was going on in them, names like green something or other and all that kind of stuff. We started digging in and looking at the news, and the groups were Arab Spring. So there were two of us sitting back looking at these group bookmarks exploding in behavior and activity, and we were actually watching the world change. We were thinking about shutting these things down because they were spam, and it was literally changes in government across the globe. Incredible. It was amazing. You work on projects like that as an intern, maybe not every day, but on very similar products.

The people are definitely the best, and then the tools are cutting-edge. Seven or eight slides back or so you saw there’s a list of databases that we use, and three of them were all developed in-house; the full DBMS system was built by Facebook engineers. A large part of that is that the scale of our data is so big that most commercial solutions just puke. They just die. We buy them and they die and then we get rid of them, because our engineers have already worked on the piping and the guts of other systems to make sure that they live. So Presto and Scuba and Hive are all data warehousing type solutions (Scuba and Presto really aren’t) that were developed in-house to meet our analytical needs. So if you want to go sit on the cutting edge and know what that’s like, it happens here.

We invest a lot in your learning and skills
and yourself. Interns in particular don’t go through data camp and boot camp, but full-time hires do. Boot camp is between two and six weeks of full-on, intense software engineering training. Part of the move-fast hacker culture is that you show up to Facebook and you fix Facebook. Your first week you get a bug, you go fix it, and you ship it that week, live to a billion people. Everyone who’s working there has done that, and we give you the training to do it, which is pretty awesome. You roll in your first day and you’re like, all right, there’s Facebook, go fix it, go work on it, go change it. Data scientists do the first two weeks of boot camp and then they do two weeks of data camp, which is about Presto, Hive, Scuba, some of our R integrations, and the code base that we use to manage data and how to add things to it or subtract from it. It’s pretty intensive. We’ve come a long way, I think, in the past two years with this, to where it’s gotten really pretty darn good, so most people are really able to come in and hit the ground running and learn a lot.

And then we’re starting to get global. There are openings right now in London. We just added openings in London in the past, what, when did we open London? Yeah, like two weeks ago, something like that. So if you want to go overseas you can be one of the first data scientists over in London. Seattle, Menlo Park, New York. Internships: pretty rad. This is Scott. He worked on news feed and groups. Here’s his, you know, propaganda about why it ruled. We’ve done both undergrad and grad opportunities for internships. Here’s another, so there’s more propaganda at the postdoc level, and more locations where we can go. So that’s it. Let’s get some questions. And how do we want to do questions? Actually, are we going to do the intern roundtable first or questions first? Okay, sweet. Yeah, okay, that’s fine.

We’re taking an applied
behavioral economics class, and one of the questions we were asking is what you do when you roll out products to minimize backlash.

Okay, for us, how do we do that specifically? I think it’s an art we haven’t mastered. Products roll out by going out to a small set of people that we think are either representative of who the product is going to be exposed to later on, or most likely to use it early on, or a random sample. One of those three. We monitor those small launch percentages for changes in user sentiment, so we have actually started to build that muscle around tracking how people feel about the Facebook experience and about products specifically. Then as soon as we’re comfortable with that, subjectively and objectively (subjectively we think this is awesome, and objectively it drives some goal measurement or, you know, processes we need to happen), we start ramping it up and pushing it out. And it’s been working okay. I mean, it’s not like we nail 100% of them all the time, but most of the product rollouts are either quietly used or lauded. How’s that? Was that fair? Okay, any others?

Thanks. So my colleague
and I actually came over here from the Haas business school, and so I was wondering if you could talk a little bit about the role of the PM. Do you have stories of how a great PM integrates with your data team, and how could a PM be closed-minded, or what’s something PMs do wrong?

What do PMs do wrong? PM is a hard job. A PM is trying to balance: is this feasible to do, technically feasible, like can we build the infrastructure to launch graph search? Is the experience for the user good? Is it a problem that we think human beings have and that needs to be solved, and how important a problem is it to solve? Does our solution look good and feel right? And then is it going to help solve our business objectives for whatever the product is? Tying all those five things together is insanely difficult. I think that for the most part PMs can struggle when they get too far into the details of everything for everyone’s job. That can be a little bit tough. And specifically when working with data scientists, I think the best interactions work when they’re like, go find the right questions for me to ask about my product, and also, where is the biggest space to play? If I’m the photos PM, for example, where are most photos happening that we’re not building products for? Something like that. Those are usually the interactions that work best. I think asking respectful and good questions is a pretty core skill for a PM, and just helping be clear about accountability. Sound good? Cool.

Hi, my name is Xavier and I’m at the I School, and last semester I took,
like many of us, a data visualization class here, and I’ve also spent a lot of time doing, you know, everything from the extraction, transformation and loading. I guess my question is, for data scientists at Facebook, how much time are you spending visualizing the data to come up with good questions to ask, or are the data scientists passing it off to be visualized somewhere else?

No, as data scientists you need to do the full thing. You need to be comfortable writing code onto the site, and then you need to be comfortable presenting your information and reshaping it to present it. If you were to ask for a split on time, I definitely think the data visualization part is probably in the low 20% or less of how much time you spend working on that kind of stuff. But it really depends on the project. Someone who’s working on helping to design the insights product, like for platform or for ads, is going to spend a lot of time working on data visualization. That’s the core of the project: what’s the right way to look at your set of ad campaigns? But someone working on what is the right scaling coefficient for our optimized CPM model for install tracking is not spending a lot of time working on visualization.

Okay, thanks.

Hi, kind of piggybacking on that question, I’m wondering how
much machine learning are data scientists actually doing, and is there a distinction between, sort of, a data analyst level of work, a data scientist level of work, and also a machine learning researcher level of work?

That is a good question. The difference between a data analyst and a data scientist, I’m not sure it means much anymore. Machine learning: I don’t know if we have titles for machine learning researchers, or if we have people doing that job, other than maybe in McCune’s team in New York City. Machine learning oriented engineers are fully spending their time writing code to support ranking systems. Whether that’s news feed, ads, or any of the other ranked systems at Facebook, of which there are a lot, they’re generally focused on a single dependent variable. There’s no changing it. It’s this number, and your job as a team of 10 engineers is to work on the ranking systems that drive that thing, which is a pretty intense environment. It’s very programming heavy, very feature extraction heavy, working on systems versus doing analysis. Data scientists and data analysts do much more analysis, focused on answering strategic questions. Should we even rank on this dependent variable? Once we’ve decided on a dependent variable, and it’s important and we need to work on it, then we usually staff a team of engineers: okay, now go build a single system to go work on that thing.

So I have a question that’s more on the project
assignment. In the summer I worked for a consulting company as an intern, and there your assignment is based on the availability of projects. Sometimes there are highs when you’re really busy, and sometimes there are lows when you don’t have projects to work on. So how does that work at Facebook? Are there times that you will sit on the bench?

No, no, there’s no bench time. For all the products that we’re working on, the companies we consider peers that are working in the same space as us are all 10 times as big as us. So pretty much everyone has got a long list of projects they want to do and probably should do but are never going to get to. It’s partially why the company is still growing 40 percent a year. In terms of projects and how you find them, we do our best for full time. I’m not sure how we’re doing on interns; I don’t think we’ve switched that process yet. But for full-time data scientists, you don’t get a team before you join Facebook. You go to boot camp, you learn about engineering, you learn about all the different product teams and who runs them and what their goals are, and then you go to data camp and you learn some more and you take on some work from a few different product teams. Then you pick the team that you want to join, one that you’ve worked with a little bit. You go, yeah, I like these guys, I’d actually like to work with them, and then you go join that team. Pretty much all the teams have spots. Yeah, no sweat.

Hi, as far as the projects that you work
on, I’m assuming there are no standards on the time that you spend on a given project; it kind of depends on the size of the project and the problems you find along the way.

Yep.

But on average, just to give me a sense of the range, do you spend a whole year on a single project, or do you kind of bounce through a couple, two or three? What’s the average?

It’s kind of a few weeks. Usually weeks. It kind of depends. At Facebook, review cycles and goaling happen every six months, so if you have a multi-year project, it kind of needs to be a really big project, kind of a Zuck-level decision of “this is the thing we need to do.” But other than that, usually your products always have low-hanging fruit and always have things to improve on and new questions to ask. Even on search we’re still finding new things to figure out as our projects are evolving. We complete one, but the output of one project is often a more complex but more interesting question that we haven’t solved yet.

I had a follow-up question, unless anybody has another one. As far as the team that you were assigned to, you kind of showed us the structure. Is that representative? Is a single person the sole representative of data science within a team, which means you’re kind of a one-man shop?

Yeah, usually, but it kind of depends, because it depends on how you split them up. Search, for example: on my team there are nine of us now, and each member of the team has a different subject area and expertise that they’re developing. You can imagine the same is true for ads as well, and newsfeed, and all the other big products. On one of the really big products it’s not really possible for one person to own that whole thing, so they definitely own some part of it. And then there are other products where there’s only one person on the whole frickin’ thing. So it just depends. It depends on what you want to do and where you fit when you get in: okay, do I want to be, you know, the man on the island, or do I want to be part of a bigger team but with much more laser focus?

That’s actually, well, we’ll take more questions, but that’s a good point to switch to the intern roundtable as well, and then we’ll continue the discussion. So
I’ll ask the two former interns from Facebook, who are I School students, to come up. In addition to that I also want to introduce Allie Kramer, who’s here as a recruiter from Facebook. She has two iPads, I think, floating around. Does somebody have one of those? Can you raise your hand or the iPad or something so we know where it is? One there, one there, okay, great. So if you came in late and you want to sign in to get plugged into the recruiting efforts at Facebook, that would be great. We’ll also end the recording now, so thank you to the people who are watching on video.

About Ralph Robinson
