Dataclysm: Book Notes

Dataclysm is a book by Christian Rudder, a co-founder of OkCupid, a dating site. The book is about what the site’s data can tell us about relationships and human behaviour. If you are interested in learning more about dating habits in society and, more broadly, what people say they like versus what they actually like, I would recommend you read this book.

You can read all my book notes on my blog.

  • This book was also particularly relevant for me because my startup, Atila, is a very data-driven site, and I was interested to see how we could apply his data crunching + storytelling technique to our company as well
  • We are currently running a massive series of A/B tests on our site, partially inspired by this book. Subscribe to my blog to get notified when the post is ready.
  • I believe there are two types of non-fiction: data driven and narrative driven (see Shiller’s Narrative Economics for more on this).
  • In some books, the author starts with a narrative and then occasionally sprinkles data in to emphasize the story
  • In other books, the author lets the data drive the story
  • This is a spectrum, as both categories use a bit of each other, but narrative-driven books are primarily based on a priori logic and the author’s observations, while data-driven books are primarily based on empirical data
  • Neither is necessarily better or worse, and almost all of my favorite books are narrative driven (Antifragile, How to Fail at Everything and Still Win Big etc.). If you want raw data, read a census report
  • But recently I’ve grown increasingly fond of books that just take the data and say: “I won’t try to convince you of anything, here is the data, you form your own opinions”

Don’t Listen to What They Say, Watch What They Do

  • One of my favorite meta takeaways from this book is that directly observing people’s behaviour is much more effective than asking them, because of social desirability bias (closely related to the Hawthorne effect, where people change their behaviour when they know they are being observed)
  • Essentially, if you ask someone on the phone, “Would you date someone of a different race than you?” some people may lie and say yes because it’s the socially desirable thing to say, even though they would only date people of the same race.
  • However, if you just observed their behaviour on a dating site and tracked how often they messaged people of other races, that would be a lot more instructive
  • This can be applied to a wide range of things. For example, it’s currently election season in the US, and the polls were notoriously wrong about Trump’s election and Brexit. One reason is that voting for Trump or Brexit went against social norms, so some people were afraid to say they supported either when asked, even though privately they knew how they were going to vote. If people had access to social media analytics (e.g. what % of people in parts of the US or UK were Liking anti-immigration posts on FB), I wonder if that would have been predictive of Trump or Brexit
  • I predict that for the 2020 election, search and social media data analytics will be more predictive of who wins the primaries and the general election than traditional polls
  • Mining the Social Web and Mining Social Media: Finding Stories in Internet Data both seem very timely in an election year. Mining Social Media came out just over a month ago, just a year before the 2020 election. Great timing IMO.
  • Update: There are various names for this: the Bradley Effect, Shy Tories, Spiral of Silence
  • I was skimming another book at the library, Everybody Lies by Seth Stephens-Davidowitz, and one of the author’s observations was that the number of searches for “Obama Romney” vs “Romney Obama” in a given state (or “Clinton Trump” vs “Trump Clinton” in the 2016 election) was predictive of who would win that state
  • (tk tell Christian Rudder that his book reminds me of Stephens-Davidowitz’s books)
  • Btw, I THINK that’s what the author said, I can’t remember
  • If you like this book, I think that book would also be really good. Because it’s not limited to relationships, it may support broader conclusions, though the author may have less access to primary data
  • A good book related to this phenomenon is Private Truths, Public Lies by Timur Kuran
  • Interesting idea from graph theory: the best predictor of a successful relationship is how interconnected the couple’s relationships are [88]
  • Imagine you created two social graphs, one with your siblings and one with your partner
  • You and your siblings probably have a very embedded network of family members whereas
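
The social graph idea can be made concrete with a small sketch. This assumes that "interconnected" is measured as the number of mutual contacts two people share (sometimes called embeddedness), which is my own simplification and may not be the exact measure used in the book. The graph, the names and the helper functions `mutual_contacts` and `embeddedness` are all hypothetical, made up purely for illustration.

```python
# Toy social graph: every name and connection here is invented for
# illustration; none of it is data from the book.
friends = {
    "you": {"sibling", "mum", "dad", "cousin", "partner", "coworker"},
    "sibling": {"you", "mum", "dad", "cousin"},
    "mum": {"you", "sibling", "dad", "cousin"},
    "dad": {"you", "sibling", "mum", "cousin"},
    "cousin": {"you", "sibling", "mum", "dad"},
    "partner": {"you", "coworker", "partner_friend"},
    "coworker": {"you", "partner"},
    "partner_friend": {"partner"},
}


def mutual_contacts(graph, a, b):
    """Return the set of people connected to both a and b."""
    return (graph[a] & graph[b]) - {a, b}


def embeddedness(graph, a, b):
    """Count the mutual contacts of a and b (an assumed, simplified measure)."""
    return len(mutual_contacts(graph, a, b))


if __name__ == "__main__":
    for other in ("sibling", "partner"):
        shared = sorted(mutual_contacts(friends, "you", other))
        print(other, embeddedness(friends, "you", other), shared)
```

In this toy graph the tie with the sibling has three mutual contacts and the tie with the partner has one. Which measure actually predicts a successful relationship is for the book to say [88]; the sketch only shows how such a number could be computed from a graph.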

The Data of Love

  • A very fascinating graph plots a man’s age vs the age of the woman he finds most attractive, and vice versa for women

  • These are things that I sort of always intuitively suspected, but seeing them in raw data is still a bit surprising [42][43]
  • The importance of displaying the data in the right way: pages 41 and 42 show the same information, but page 42 displays it in a very intuitive way
  • 80/20 rule in the NBA, similar to the idea I talked about in Soul of Basketball [22]
  • How many men message women on OkCupid, segmented by different age brackets [49]
  • “men and women experience beauty unequally” [128]
  • attractive people get more job interviews [129]
  • It’s also interesting to observe the shift in the OkCupid blog. When they posted the first blog post about dating and race, it was extremely candid and thought provoking, but potentially politically incorrect. I checked their new blog recently and it seems they have shifted to a very safe, politically correct tone. This is understandable, as they are trying to manage their PR, but it means a lot of the uncomfortable but useful insights are no longer being shared.
  • He has a remarkable ability to display the data in a creative and intuitive way. For example, the way he graphed the age of men vs. the age of women they find attractive was extremely thought provoking. Many people could have access to that data, but how many would think to display it in such a clever, intuitive way?
  • Another very important thing I learned from this book is that it’s not enough to have data. The ability to show the data in a useful way is just as important, and Rudder did an excellent job of that
  • At first I found the book really interesting because I am super fascinated by relationships, but after a while I wanted to see what this could tell us about other parts of society
  • People stop using dating sites like OkCupid after they have found a partner, so there were not a lot of insights into what happens once people are in a relationship, which is arguably just as interesting, if not more so
  • This is not a criticism of the book because:
  1. I didn’t finish it so maybe he got to that part after the point where I stopped reading
  2. He did mention some data on other parts of society, but he doesn’t have primary data access to those sources, so the analysis wasn’t as deep as it was with the OkCupid data

I didn’t finish reading the book, as I got distracted by other books (I was reading What If We’re Wrong). But I did enjoy the parts I read. I think Rudder’s access to primary data is an extremely strong competitive advantage and his ability to visually tell a story with data is impressive. If you are interested in learning more about dating habits in society and, more broadly, what people say they like versus what they actually like, I would recommend checking out Christian Rudder’s Dataclysm.