Last weekend, I attended the 2019 Southern Data Science Conference (SDSC) in Atlanta. I learned a lot and met a lot of interesting people. Here are some of my takeaways.
- I have to start by saying that Khalifeh Al Jadda and the other organizers did an outstanding job. This was the first single-track conference I’ve ever attended, and I appreciated not having the anxiety of choosing between competing sessions and bouncing from room to room. There were around 500 attendees, with a great mix of practitioners, researchers, and students. The whole thing was well executed.
- The speaker lineup was impressive: Maya Gupta from Google AI, Edo Liberty, and other heads of data science/AI from Microsoft, Netflix, Pandora, Uber, LinkedIn, and many others.
- Diversity: I would estimate that over half of the keynote and panel sessions were led by women, and there was diverse racial representation among both speakers and attendees. Awesome! I also appreciated that the conference program made clear there would be no alcohol at the social events, an issue I have seen raised repeatedly on Twitter over the years.
- Streaming data is the future, but there are still a lot of open questions. It’s easy to think about ETL in a streaming context: receive n new records, process them, and append them to the production cache. Updating machine learning models based on a handful of new observations is much harder, and there is a lot of work left to do there. For now, the nightly batch job still seems the sensible choice. I am eager to work in a world built around the distributed log, since A/B testing feature and model changes seems like it will be much more straightforward there than the way it is often done now.
- Based on the conversations I had, most companies are still solving machine learning problems in production with one of two basic architectures: (a) some pairing of Spark and in-memory scikit-learn (on either AWS or GCP) for basic regression, classification, or clustering models, or (b) a deep learning approach using TensorFlow on GCP for NLP or computer vision. Also lots of XGBoost.
- Serverless: Most of the people I talked to are deploying all new work on Google Cloud Platform and are migrating their legacy pipelines to GCP as quickly as they can.
- The poster sessions were very good, with lots of interesting ideas from academia that are relevant to production machine learning work. I learned a lot from these conversations and walked away with far more practical ideas than I expected.
- One of the people sitting near me had a picture of my data science friend Tim Hopper open on their computer for a few minutes (I didn’t get a close enough look to see the context). Fun to know famous people.
- The swag was strong. I happen to be wearing some of the new socks today.
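To make the streaming-data point a little more concrete, here is a toy Python sketch of the loop described there: receive a batch of n records, process them, append them to a cache, and then nudge a model using only those new observations. Everything in it is a hypothetical illustration, not something presented at the conference: the record format, the `process` step, the in-memory list standing in for a production cache, and the one-feature linear model updated by plain stochastic gradient descent (chosen only because it is the simplest incremental learner to show).

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical record format: one feature "x" and a label "y".
Record = Dict[str, float]


def process(record: Record) -> Record:
    """Hypothetical transform step; here it only casts fields to float."""
    return {"x": float(record["x"]), "y": float(record["y"])}


@dataclass
class OnlineLinearModel:
    """Toy model y ~ w * x + b, updated one observation at a time with SGD."""
    w: float = 0.0
    b: float = 0.0
    lr: float = 0.01  # assumed learning rate; a real system would have to tune this

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def partial_fit(self, records: List[Record]) -> None:
        # One gradient step on squared error per new observation.
        for r in records:
            err = self.predict(r["x"]) - r["y"]
            self.w -= self.lr * err * r["x"]
            self.b -= self.lr * err


def handle_batch(batch: List[Record], cache: List[Record],
                 model: OnlineLinearModel) -> int:
    """Process n new records, append them to the cache, then update the model."""
    processed = [process(r) for r in batch]
    cache.extend(processed)       # the easy ETL part: append to the cache
    model.partial_fit(processed)  # the hard part, reduced here to naive SGD
    return len(processed)


if __name__ == "__main__":
    cache: List[Record] = []
    model = OnlineLinearModel()
    # Simulated stream: 2,000 batches of 5 noise-free points on y = 2x + 1.
    for _ in range(2000):
        batch = [{"x": float(x), "y": 2.0 * x + 1.0} for x in range(5)]
        handle_batch(batch, cache, model)
    print(len(cache), round(model.w, 2), round(model.b, 2))  # 10000 2.0 1.0
```

The append really is one line; the model update is where the open questions live (choosing a learning rate, handling drift and delayed labels, rolling back a bad update), which is part of why the nightly batch job still looks sensible.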
All in all, it was a great few days. I will be there next year for sure.