Lead OCRUG Book Club Session on Modeling in Spark R


This evening, I led the session on modeling in Spark with R for the advanced-track book club of the Orange County R Users Group (OCRUG). The session covered chapter 4 (modeling) and chapter 5 (pipelines) of the book Mastering Spark with R. All of my teaching slides and exercise materials can be found here.

I first gave a 20-minute presentation covering Spark's Machine Learning Library (MLlib) as accessed through sparklyr, along with the main commands for exploratory data analysis (EDA), feature engineering, supervised learning, unsupervised learning, pipeline models, and model deployment through an API. I then switched to a one-hour exercise session, guiding people step by step through building a supervised learning pipeline model.

I am really happy to be a member of the Southern California R users family and to read tech books together with other R lovers. Especially during this pandemic, reading tech books with other people makes me feel supported and motivated to keep improving my data science skills. I have taught programming online before through Coursera and Neuromatch Academy, but this is the first time I have taught in a book club. I am really glad that other people learned from my presentation and the exercises I created.

There is a kind of magic in education. It enriches not only the mind of the person receiving it, but also the mind of the person giving it. There is an old Chinese saying, “Gain new knowledge by reviewing the old” (wēn gù ér zhī xīn), which perfectly describes a teacher’s role. This is also supported by psychologist Prof. Henry Roediger’s finding that learning can be improved by reviewing through testing, even without feedback. For me personally, teaching is an adjustable weight that helps me balance the stress coming from other aspects of life.

image of Orange County R Users Group