So, I Met the Chief Data Architect at MySpace…

Originally published on MySpace on June 17, 2009

I’m currently in California (Bay Area) attending what I like to refer to as SQL Server bootcamp, because it’s actually two week-long classes crammed into a single week. The thought of learning that much in a week was enough to kick my geek gene into high gear.

Then I got the email from my instructor…
He asked if I’d be interested in attending a SQL Users Group meeting while I was in town, and oh by the way, the guest speaker would be Christa Stelzmuller, Chief Data Architect at MySpace.
Now the average woman swoons at the thought of meeting Johnny Depp or Brad Pitt. Me? Meeting Christa Stelzmuller would be like my daughter meeting Hannah Montana and the Jonas Brothers at the same time. Yes, I was flying my geek flag high.
So, after class was over yesterday, the instructor, one other student from the class, and I headed over to Microsoft in Mountain View to hear Christa speak.
We arrived at Building 1, signed in and headed to the Neptune conference room, where pizza and soft drinks awaited us. We picked seats right in the middle, and watched the room fill up…with men. Out of about 50 people there were probably four women, including the speaker. If I had been looking for a date, the odds were good, but the goods were odd.
I must say I had a million questions in my head before I got there, but as soon as Christa started her presentation I was just in awe of what I was seeing.
Many of us have been on MySpace for years. I have been blogging on MySpace for almost four years. We all felt the growing pains when the user base went from 12 million, to 20 million, to 80 million and beyond.
MySpace is a behemoth problem child when it comes to the instant gratification of its 100+ million users. It is a mind-boggling challenge to manage the constantly changing data.
Christa joined MySpace two and a half years ago when the site was growing so fast that the two (yes, TWO) DBAs couldn’t keep up and weren’t getting any sleep, and the rate of user errors was two million a day.
What it takes to keep MySpace online:
669 Servers
1512 Databases
15120 Disks
6 DBAs
Each database holds one million users, grouped by user id. So, if you and your friends all signed up around the same time you’re probably on the same database.
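For the curious, here is a minimal sketch of what grouping by user id could look like. The one-million figure is from the presentation; the function name, the zero-based database index, and the assumption that user ids start at 1 are mine, not MySpace’s actual scheme.

```python
USERS_PER_DATABASE = 1_000_000  # figure quoted in the presentation


def database_for_user(user_id: int) -> int:
    """Return a zero-based database index for a user id.

    Hypothetical mapping: ids 1..1,000,000 land on database 0,
    ids 1,000,001..2,000,000 on database 1, and so on.
    """
    if user_id < 1:
        raise ValueError("user ids are assumed to start at 1")
    return (user_id - 1) // USERS_PER_DATABASE
```

Under this sketch, two people who signed up back to back get neighboring ids and almost always land on the same database, which matches what Christa described.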
MySpace features are split up all over the place. Videos, photos, photo comments, profile comments, blogs, etc. are all kept on separate drives in separate databases. Your profile is managed this way so that if one server is offline your profile stays up, minus whatever feature is sitting on that server.
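Here is a rough sketch of that "profile stays up, minus one feature" idea. Everything in it is hypothetical: `render_profile`, the loader functions, and the use of `ConnectionError` to signal an offline server are my own stand-ins, not anything shown in the presentation.

```python
def render_profile(user_id, feature_loaders):
    """Load each feature independently and skip any whose server is down.

    feature_loaders maps a feature name (e.g. "photos") to a function
    that fetches that feature's data for the given user id.
    """
    profile = {}
    unavailable = []
    for name, loader in feature_loaders.items():
        try:
            profile[name] = loader(user_id)
        except ConnectionError:
            # One offline server costs only this feature, not the page.
            unavailable.append(name)
    return profile, unavailable
```

Because each feature is fetched on its own, a failure in one loader never prevents the others from being shown.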
After the presentation was over, and the Q&A was done, there was a raffle for five prizes: books, software and t-shirts. After all of the software was gone, and the books had been scooped up, Christa drew one more ticket, and it was mine. So, this Oracle geek is now the proud owner of a SQL Server t-shirt.
The meeting host thanked Christa, people applauded, and then most started heading for the door. But I didn’t leave. Did you think I would walk away without asking the question that’s on all of your minds? Hell no!
I went up and joined the throng of database geeks waiting to ask Christa just one more question. I had to meet this woman who took this disaster of an architecture and somehow made sense of it. And I had a question for her, “Christa, what happened to blog indexing!? I have 800 blogs and I can’t find anything.”
Her answer was this: the blogging feature has been ignored for a very long time. But, it’s an issue she has been championing, and she promised some changes were in the works, and things would be fixed soon. I said, “Define ‘soon,’ because every time Tom says ‘sit tight’ it means nothing gets fixed for a year.” She said there was a meeting happening in two weeks and things were going to change very soon. Sigh. Let’s hope so.
If you are at all interested in the database architecture behind MySpace, I highly recommend you take a look at the presentation at the link below. If you have any questions feel free to post a comment and I will try to answer to the best of my recollection.
The link to the presentation –> MySpace Data Architecture
And for the real database geeks, I present the following:
* MySpace runs on Microsoft SQL Server for the most part, and SQL Server Standard Edition at that. There are a few databases that are the Enterprise version. Why? The cost of licensing for the Enterprise version was prohibitive.
* They don’t use foreign keys at all. Why? Imagine that you want to delete a picture from your profile, and that picture has 100 comments, and those comments are from 100 different users scattered across 1000 databases. If the database had to stop and check for dependencies on delete, it would slow to a crawl, and you’d be waiting a very long time to get your cursor back. So, you delete a picture, and some crawlers go through and clean up the orphaned data later.
* They don’t use SQL Server replication, because of data integrity issues. It can’t keep up with the traffic. They use their own homegrown replication solution.
* They use a mix of SQL Server, open source products and homegrown solutions.
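
For anyone wondering what "delete now, clean up later" looks like in practice, here is a toy sketch of the no-foreign-keys point above. It uses plain Python structures in place of real databases, and every name in it (`photos`, `comments`, `delete_photo`, `sweep_orphans`) is made up for illustration; I have no idea how MySpace’s actual crawlers are written.

```python
# Stand-ins for two separate stores with no foreign key between them.
photos = {1: "beach.jpg", 2: "cat.jpg"}
comments = [
    {"id": 10, "photo_id": 1, "text": "Nice!"},
    {"id": 11, "photo_id": 1, "text": "Where is this?"},
    {"id": 12, "photo_id": 2, "text": "Cute."},
]


def delete_photo(photo_store, photo_id):
    """Remove the photo immediately, without checking for dependent comments."""
    photo_store.pop(photo_id, None)


def sweep_orphans(photo_store, comment_store):
    """Background pass: drop comments whose photo no longer exists.

    Returns the number of orphaned comments removed.
    """
    kept = [c for c in comment_store if c["photo_id"] in photo_store]
    removed = len(comment_store) - len(kept)
    comment_store[:] = kept  # update the store in place
    return removed
```

Calling `delete_photo(photos, 1)` returns right away and leaves two comments pointing at nothing; a later `sweep_orphans(photos, comments)` removes them. The user never waits on the cleanup, at the cost of stale rows hanging around for a while.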

Caveat: I did not take notes during this presentation, so I am writing from memory. The numbers listed, however, are straight off the presentation.

Two Database goddesses who seem to have the same taste in clothing. That’s Christa on the right.
