Recap: Southern Data Science Conference 2019

Last weekend, I attended the 2019 Southern Data Science Conference (SDSC) in Atlanta. I learned a lot and met a lot of interesting people. Here are some of my takeaways.

sdsc2019

I have to start by saying that Khalifeh Al Jadda and the other conference organizers did an outstanding job with the conference. This was the first single-track conference I’ve ever attended, and I appreciated the lack of anxiety involved with not having to choose between competing interests and bouncing between sessions. There were around 500 attendees, and a great mix of practitioners, researchers, and students. The whole thing was well executed.
The speaker lineup was impressive. Maya Gupta from Google AI, Edo Liberty, and other heads of data science/AI from Microsoft, Netflix, Pandora, Uber, LinkedIn, and many others.
Diversity: I would estimate over half of the keynote/panel sessions were led by women, and there was a diverse racial representation among speakers and attendees. Awesome! I also appreciated that the conference program highlighted that there would be no alcohol at the social events, which is an issue I have seen repeatedly brought up on Twitter over the years.
Streaming data is the future, but there are still a lot of open questions. It’s easy to think about ETL in a streaming context: recieve n new records, process them, and append them to the production cache. But there is a lot of work to be done when it comes to updating machine learning models based on a few updated observations. For now, the nightly batch job still seems in order. I am eager to work in the world of the distributed log as it seems A/B testing of feature and model changes will be much more straightforward than the way this is often done now.
Based on the conversations I had, most companies are still solving machine learning problems in production using one of two basic architectures: (a) Some pairing of Spark and in-memory use of scikit-learn (on either AWS or GCP) for basic regression/classification/clustering-based models, or (b) a deep learning approach using Tensorflow on GCP for NLP or computer vision. Also lots of XGBoost.
Serverless: Most of the people I talked to are deploying all new work on Google Cloud Platform and are migrating their legacy pipelines to GCP as quickly as they are able.
The poster sessions were very good. Lots of interesting ideas coming out of academia that are relevant to production machine learning work. I learned a lot from these conversations and walked away with several practical ideas, way more than I expected.
One of the people sitting near me had a picture of my data science friend Tim Hopper open on their computer for a few minutes (I didn’t get a close enough look to see what it was). Fun to know famous people.
The swag was strong. I happen to be wearing some new socks today:

mailchimp-socks

All in all, it was a great several days. Will be there next year for sure.

You can read more about me, follow me on Twitter, subscribe to this blog by RSS or email, and find many more posts in the archives.