…Dataset and Dataloader abstractions and experimented…

I read 3.4.2: The Lasso from The Elements of Statistical Learning and read about proximal gradient methods. This technique sits in the domain of convex optimization, which I have yet to take down systematically. I added some notes to an optional module that I may well pursue on this subject.
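As a note-to-self, the proximal step for the lasso penalty is just element-wise soft-thresholding; a toy ISTA sketch in numpy (my own notation and step-size choice, not something from the book) looks roughly like:

```python
import numpy as np

def soft_threshold(z, kappa):
    # Proximal operator of kappa * ||.||_1: shrink each coordinate towards zero.
    return np.sign(z) * np.maximum(np.abs(z) - kappa, 0.0)

def lasso_ista(X, y, lam, n_iters=500):
    """Proximal gradient (ISTA) for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    # Step size from the Lipschitz constant of the smooth part's gradient.
    t = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()
    b = np.zeros(p)
    for _ in range(n_iters):
        grad = X.T @ (X @ b - y) / n               # gradient of the smooth term
        b = soft_threshold(b - t * grad, t * lam)  # gradient step, then prox
    return b
```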
I read 4.5: Separating Hyperplanes and 12.2: Support Vector Classifier from The Elements of Statistical Learning with the aspiration of writing an SVM notebook. I was thwarted by some unfamiliar mathematics - which was frustrating, but I didn't feel too bad about it as the section is associated with an Edvard Munch "The Scream" icon in the book. I'll need to grok Lagrangian/Wolfe duality before I return to it.

Back to the books: chapters 4 and 5 of An Introduction to Statistical Learning.
Back from holiday! I spent the day playing around with some applied project ideas that I hope to share once I've articulated them better. I intend to spend one day a week on the applied track from now on.
A half day: I finished the applied exercises from chapter 3 of An Introduction to Statistical Learning. Now I'm off to Bruges for the rest of the week for my birthday, then I'm away next week in Greece for a holiday.
I continued the applied exercises from chapter 3 of An Introduction to Statistical Learning, which led me to read this page from the statsmodels docs about diagnostic regression plots. I also played around with patsy, which I discovered via the statsmodels formula API, and skimmed the scipy.stats documentation.
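For instance, patsy's formula syntax is what powers the statsmodels formula API; a tiny sketch with a made-up data frame:

```python
import pandas as pd
from patsy import dmatrices

# A small hypothetical frame, just to show the formula syntax.
df = pd.DataFrame({
    "mpg": [21.0, 22.8, 18.7, 24.4],
    "hp": [110, 93, 175, 62],
    "wt": [2.62, 2.32, 3.44, 3.19],
})

# "hp * wt" expands to the main effects plus the interaction term.
y, X = dmatrices("mpg ~ hp * wt", data=df, return_type="dataframe")
print(X.columns.tolist())  # ['Intercept', 'hp', 'wt', 'hp:wt']
```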
I started the applied exercises from chapter 3 of An Introduction to Statistical Learning, which I am porting from R to Python. I reviewed some of the material from the chapter and reached out to Wikipedia and The Elements of Statistical Learning for more detail. This article was useful in reproducing the diagnostic regression plots that come for free with R.
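The residuals-vs-fitted panel, for example, only takes a few lines once the model is fit. A sketch on synthetic stand-in data (the column names and data here are made up, not from the exercises):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# Synthetic stand-in data (hypothetical), just to make the sketch runnable.
rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.uniform(0, 10, 100)})
df["y"] = 2.0 + 0.5 * df["x"] + rng.normal(scale=1.0, size=100)

model = smf.ols("y ~ x", data=df).fit()

# Residuals vs fitted values - the first of R's plot.lm() diagnostic panels.
plt.scatter(model.fittedvalues, model.resid, alpha=0.6)
plt.axhline(0, color="grey", linewidth=1)
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Residuals vs fitted")
plt.show()
```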
I completed reading chapter 3 from An Introduction to Statistical Learning and did the conceptual exercises. I reviewed some of the material from the chapter and reached out to Wikipedia and The Elements of Statistical Learning for more detail.
I completed the Ch3 lectures and questions from Hastie and Tibshirani's Statistical Learning course. I started reading the associated chapter from An Introduction to Statistical Learning.
I completed the Ch1 and Ch2 lectures and questions from Hastie and Tibshirani's Statistical Learning course. I read the associated chapters from An Introduction to Statistical Learning and did the exercises.
I met with my study buddy to discuss our experience of the House Prices: Advanced Regression Techniques competition, work through some sticking points and discuss our next steps. We resolved to complete Hastie and Tibshirani's Statistical Learning course.
I completed modules 5.1 and 5.2 (cross-validation) from Hastie and Tibshirani's Statistical Learning course. I cross referenced with the corresponding sections of The Elements of Statistical Learning.
I completed modules 6.6, 6.7 and 6.8 (shrinkage methods, ridge and lasso regression, finding lambda) from Hastie and Tibshirani's Statistical Learning course. I cross referenced with the corresponding sections of The Elements of Statistical Learning.
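For my own reference, the two penalised least-squares objectives (standard notation, not copied from the book; lambda is the tuning parameter chosen by cross-validation):

```latex
\hat{\beta}^{\text{ridge}}
  = \arg\min_{\beta} \sum_{i=1}^{N}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2}
    + \lambda \sum_{j=1}^{p} \beta_j^{2}
\qquad
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta} \sum_{i=1}^{N}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2}
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```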
I checked out the scikit-learn cross validation, model evaluation and pipeline docs and used them in my House Prices: Advanced Regression Techniques notebook. Using lasso regression yielded a competition score in the middle of the table.
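Roughly the shape of the pipeline, simplified and with the feature preparation omitted (X and y stand in for the prepared training data; this is a sketch, not the actual competition notebook):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# X, y are assumed to be the prepared (numeric, imputed) features and target.
pipeline = Pipeline([
    ("scale", StandardScaler()),      # lasso is sensitive to feature scale
    ("model", Lasso(max_iter=10000)),
])

# Choose the regularisation strength by cross-validated grid search.
search = GridSearchCV(
    pipeline,
    param_grid={"model__alpha": np.logspace(-4, 1, 20)},
    cv=5,
    scoring="neg_mean_squared_error",
)
# search.fit(X, y)
# print(search.best_params_, np.sqrt(-search.best_score_))
```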
I worked on the Kaggle House Prices: Advanced Regression Techniques competition.
I read Chapter 10 (Predicting Continuous Target Variables with Regression Analysis) of Python Machine Learning.
I met up with my study buddy, as we'd agreed last week.
We reviewed some data sets and decided to spend one week working on Kaggle's House Prices: Advanced Regression Techniques competition.
We spent more time comparing notes from Mathematics for Machine Learning and helped each other through a few sticking points.
We pontificated on various topics including: death, software and statistics.
I read Andrej Karpathy's deep reinforcement learning blog post. I made a mental note to spend a couple of weekends sometime getting an agent running in an OpenAI Gym environment.
I had a look online for a dataset that would be suitable for the week. I looked at:
I took a closer look at Jupyter notebooks, which I've been using a fair amount:
I signed up for and poked around on Kaggle:
I played with some Python libraries:
I met up with my study buddy.
I finished my work on the Statistical Learning Theory section of the Bloomberg Concept Check 1. I fared well on the subjects I've seen recently, like fitting linear and quadratic functions to data using the normal equation. I fared less well, though not catastrophically, on the probability material, which I've yet to review methodically as part of this sabbatical - I suspect I may need to; one to discuss with my study buddy next time we meet.
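The normal-equation fits in question are only a couple of lines in numpy; a sketch on made-up data:

```python
import numpy as np

# Hypothetical noisy quadratic data.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.3, size=x.shape)

# Design matrices for a linear and a quadratic fit.
X_lin = np.column_stack([np.ones_like(x), x])
X_quad = np.column_stack([np.ones_like(x), x, x**2])

# Normal equation: theta = (X^T X)^{-1} X^T y (solved, rather than inverted).
theta_lin = np.linalg.solve(X_lin.T @ X_lin, X_lin.T @ y)
theta_quad = np.linalg.solve(X_quad.T @ X_quad, X_quad.T @ y)
print(theta_lin)
print(theta_quad)
```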
I did a unit of work in the practice track - proving this. I watched a couple of short Pavel Grinfeld lectures on the null space, which enhanced my intuition.
I started working on the Statistical Learning Theory section of the Bloomberg Concept Check 1.
I did some monumental admin which included booking: dentist, hygienist, haircut and car MOT. I went through the links, miscellaneous notes and TODOs I had left over from the Mathematics for Machine Learning course I finished yesterday and pruned/consolidated them. I modified the curriculum to include a "Practice track" - a "little and often" track of small programming exercises and mathematical problems, intended to maintain and enhance practical skills. I created the track to try to retain and enhance what I've learned so far, whilst still making headway in the foundational track.
I ordered another book - Hands-On Machine Learning with Scikit-Learn and TensorFlow, which is a reference text for the Bloomberg course.
I attended Lecture 2 from Bloomberg's Foundations of Machine Learning, which was a case study: we were asked to frame the problem of customer churn for a mobile network operator as a machine learning problem - predict when a user will churn. Again, the students made this an entertaining and informative session; some suggestions included a probability distribution over the days in the future that a user may churn, a binary classification of churn/no-churn in some specified window in the future, and a 'number of days until churn' prediction. The objective of this activity was to demonstrate that choosing an outcome measure when approaching a business problem is often non-trivial.
I attended Lecture 3 from Bloomberg's Foundations of Machine Learning, which was an introduction to Statistical Learning Theory, topics included:
A fair amount of this language was new to me, so I took the opportunity to read the introduction to The Elements of Statistical Learning and the Statistical Learning Theory Wikipedia page before writing up my notes from today's lectures.
I'll take down the concept check and homework problems tomorrow.
I worked through the Principal Components Analysis proof on pages 392-3 of Murphy's MLPP. Feeling confident following this proof was a satisfying capstone to Mathematics for Machine Learning as it required the application of much of the knowledge I've acquired over the past few weeks. The proof begins by constructing an expression for the projection error and shows that it is minimized when the projection onto the subspace is orthonormal, before demonstrating that minimising the projection error is equivalent to maximising the variance of the projected data. This allows one to write an expression for the variance of the projected data in terms of the covariance matrix of the high dimensional data, which we can then maximize in a constrained optimization - making use of a Lagrange multiplier. This maximization yields an expression for the variance of the projected data that can be recognised as an eigenproblem - we arrive at an expression identifying the vector in the direction of maximal variance as the eigenvector of the covariance matrix with the largest eigenvalue.
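In symbols, the final step runs roughly as follows (my own notation, not Murphy's: S is the data covariance matrix and b the unit vector spanning the one-dimensional subspace):

```latex
\begin{aligned}
&\text{maximise } \operatorname{Var}(b^{\top}x) = b^{\top} S\, b
   \quad\text{subject to}\quad b^{\top} b = 1,\\
&\mathcal{L}(b,\lambda) = b^{\top} S\, b - \lambda\,(b^{\top} b - 1),\\
&\frac{\partial \mathcal{L}}{\partial b} = 2 S b - 2\lambda b = 0
   \;\Longrightarrow\; S b = \lambda b,\\
&\operatorname{Var}(b^{\top}x) = b^{\top} S b = \lambda\, b^{\top} b = \lambda .
\end{aligned}
```

So the projected variance equals the eigenvalue, and it is maximised by choosing the eigenvector with the largest eigenvalue.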
I wrote up my notes from the last module of Mathematics for Machine Learning: PCA, which I finished on Saturday morning, and incorporated the proof from Murphy's MLPP.
Though I still have some loose ends to tie up with Mathematics for Machine Learning, I permitted myself to watch the first lecture from Bloomberg's Foundations of Machine Learning, which is the next item in the curriculum. The lecture was a gentle introduction to machine learning, though we are assured the learning curve is due to steepen. The content was a survey of the basics and the material was familiar from Andrew Ng's Machine Learning course: classification and regression, bias and overfitting, training/validation/test sets etc. The figures on polynomial curve fitting with a power series were familiar from Mathematics for Machine Learning and the introduction to Bishop's PRML. Most noteworthy were the good questions from the students in the lecture; I suspect some were software engineers, as they shared my discomfort with the nature of deploying the non-deterministic artefacts of ML to production - the Q and A on this subject was interesting. The instructor mentioned the upcoming homeworks; I looked ahead and they look like they are pitched at the right level - I'm looking forward to getting to them and writing some code.
I completed week 4 of Mathematics for Machine Learning: PCA, topics included:
This unit was the most detailed so far. I didn't have time to write up my notes and I ended up finishing the programming exercise on Saturday morning. I'll write up my notes on Monday morning and that'll conclude the Mathematics for Machine Learning series.
I completed week 3 of Mathematics for Machine Learning: PCA, topics included:
I completed week 2 of Mathematics for Machine Learning: PCA, topics included:
During the week's programming exercise I took a detour to review broadcasting with numpy and read the relevant chapter from the Python Data Science Handbook. I had an understanding of broadcasting that served me well in performing binary operations between scalars and arrays, and between pairs of arrays. My intuition was, however, not robust enough to generalise well to broadcasting pairs of matrices, which requires an intuition fit for three dimensions. After some playing around I grokked it.
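The case that needed the three-dimensional picture was along these lines (a small made-up example):

```python
import numpy as np

X = np.arange(12.0).reshape(4, 3)    # 4 points in 3 dimensions (made up)

# The familiar cases: scalar-array and array-array.
X_scaled = X / 10.0                  # (4, 3) with a scalar
X_centred = X - X.mean(axis=0)       # (4, 3) - (3,) -> (4, 3)

# The case that needs a 3-D picture: pairwise differences between rows.
# X[:, None, :] has shape (4, 1, 3) and X[None, :, :] has shape (1, 4, 3);
# both stretch to (4, 4, 3), giving the difference between every pair of rows.
diffs = X[:, None, :] - X[None, :, :]
pairwise_sq_dists = (diffs ** 2).sum(axis=-1)   # (4, 4)
print(pairwise_sq_dists)
```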
I completed week 1 of Mathematics for Machine Learning: PCA which covered some elementary statistics material, topics included:
I started week 2 of Mathematics for Machine Learning: PCA, which began with a refresher of the dot product before moving on to the more general definition of an inner product.
I completed week 6 of Mathematics for Machine Learning: Multivariate Calculus, topics included:
This concluded Mathematics for Machine Learning: Multivariate Calculus. I did a quick review of the course.
I completed week five of Mathematics for Machine Learning: Multivariate Calculus. The week's focus was numerical optimisation, topics covered:
I completed the second half of week four of Mathematics for Machine Learning: Multivariate Calculus and wrote up my notes for the week, new topics in the second half of the week were:
I registered for a meetup next month that my study buddy discovered; from the description: "There is NO speaker at Journal Club. We split into small groups of 6 people and discuss the papers. For the first hour the groups are random to make sure everyone is on the same page. Afterwards we split into blog/paper/code groups to go deeper". Some swotting up required to avoid blushes here.
I met up with my study buddy. We discussed how we were getting on with the curriculum: we are both having too much of a good time in the foundational track and have been neglecting the applied and interview training tracks. Fair enough - the interview training track is pretty dull and we agreed the applied track can wait until the Mathematics for Machine Learning unit is wrapped up, which it should be within the next couple of weeks.
I completed the first half of week four of Mathematics for Machine Learning: Multivariate Calculus, topics covered:
I worked through weeks two and three of Mathematics for Machine Learning: Multivariate Calculus, topics covered:
I reviewed and consolidated my notes from Mathematics for Machine Learning: Linear Algebra before moving onto the next course in the specialisation. I completed week one of Mathematics for Machine Learning: Multivariate Calculus, which was a univariate differential calculus review covering:
I completed the assignments for week 5 of Mathematics for Machine Learning: Linear Algebra, which included a quiz on diagonalization and an implementation of power iteration. This concluded the course; I had a flick back through it. I will do a review of the material on Monday before moving onto the next course in the series.
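A minimal power-iteration sketch in numpy (my own toy version, not the course's starter code; it assumes the dominant eigenvalue is positive so the sign doesn't flip between iterations):

```python
import numpy as np

def power_iteration(A, num_iters=1000, tol=1e-10):
    """Estimate the dominant eigenvalue/eigenvector of a square matrix A."""
    v = np.ones(A.shape[0]) / np.sqrt(A.shape[0])
    for _ in range(num_iters):
        w = A @ v
        v_next = w / np.linalg.norm(w)
        if np.linalg.norm(v_next - v) < tol:   # convergence check (no sign flips)
            v = v_next
            break
        v = v_next
    eigenvalue = v @ A @ v    # Rayleigh quotient
    return eigenvalue, v

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, vec = power_iteration(A)
print(lam, vec)   # should approach A's largest eigenvalue and its eigenvector
```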
I completed the assignments for week 4 and worked through week 5 of Mathematics for Machine Learning: Linear Algebra. Week 5's topic was eigenvectors/eigenvalues.
I completed the timetabling exercise on InterviewCake.
I worked through week 4 of Mathematics for Machine Learning: Linear Algebra, which continued yesterday’s linear algebra review. Topics included:
I started my systematic transit through InterviewCake. I did the readings in the first two sections: “Algorithmic Thinking” and “Array and string manipulation”, which were a back-to-basics CS101-style intro to the rest of the material on the site.
I added some more thoughts to the applied track doc.
I completed the first three weeks of Mathematics for Machine Learning: Linear Algebra. This was a nice back-to-basics linear algebra review, which I didn't mind as it felt good to stake out some ground - topics included:
I signed up for and had a poke around on InterviewCake to get a feel for it. I'll start a more systematic transit through the material tomorrow.
I created a document to track project ideas for the applied track.
I met up with my study buddy, we compared notes and constructed our curriculum.
I reviewed:
This concluded my review of my notes from Andrew Ng's ML course.
I reviewed the curricula from some masters courses and made notes here.
Review of Andrew Ng's Machine Learning topics:
In the afternoon I continued working through chapter 3 of "Python Machine Learning".
Review of Andrew Ng's Machine Learning topics:
In the afternoon I started working through chapter 3 of "Python Machine Learning".
I took a look at the UCL machine learning masters syllabus and made notes here.
Revisited my notes from Andrew Ng's course and cross referenced some topics with some more advanced resources. I found the book I was a little afraid of - Bishop's PRML - challenging but well written and accessible.
I reviewed:
Morning working on my curriculum. Activities have included:
I hope to get a first draft out today and solicit some feedback.
I met up with a colleague in the afternoon. He's interested in pursuing a machine learning sabbatical of his own, which is fantastic news. After talking for a few hours I'm convinced that I should take a slightly more measured approach to planning my curriculum. I'm going to take a step back and explore the space of possible curricula a little further before seeking wider feedback. I plan to systematically review curricula offered by masters programmes and look at the requirements on ML job listings. I'll also continue to thumb through more resources and feel out what looks promising. I'm going to meet up with said colleague early next week; we intend to compare notes on what to include in what we plan on learning. We've agreed it makes sense to share a set of 'core modules' but to have the freedom to also pursue 'optional modules', so that we're not shackled to each other and can still pursue our own interests.
I'm going to spend the rest of this week engaging in the following activities:
A day off at Hampstead Heath. Felt out the Talking Machines Podcast, listening to the first 3 episodes.
In the first episode we were privy to a chat with Yann LeCun, Yoshua Bengio and Geoff Hinton. I'm aware of LeCun having downloaded the MNIST dataset from his website a while ago to do clj_mnist. I was also aware of Bengio as a co-author of the deep learning bible, and I've seen Hinton before in an interview with Andrew Ng in a Coursera course - these guys are the Deep Learning Mafia. We also met Kevin Murphy, who's a head honcho at Google and the author of Machine Learning: A Probabilistic Perspective, which is fighting it out with Pattern Recognition and Machine Learning to be the canonical advanced-level machine learning reference - I hope to graduate onto these books in the not too distant future (I bought PRML last week for the odd flick through, to gauge how deep the pool is).
In the second episode we met Ilya Sutskever, a deep learning fanboy working at Google. Amongst other things, he said he felt it was not well understood why it should be that gradient descent empirically appears to be an appropriate algorithm for finding good parameters for deep NNs, given the high-dimensional non-convex surface of the function they are tasked with optimizing. He linked this to the AI winter, saying that in the 80s/90s people had failed to train deep neural networks for other reasons (badly initialized weights in particular) and incorrectly concluded that the optimisation problem posed by deep NNs was intractable. In the third episode the host dug out a relevant paper from Yoshua Bengio's lab entitled "Identifying and attacking the saddle point problem in high-dimensional non-convex optimization". The paper contains empirical and theoretical evidence that saddle points are more frequently the cause of slow training in large NNs than local minima, and it also proposes some approaches for tackling this problem.
Summary
I spun up this website and began feeling out some resources that I may decide to include in the curriculum.