I wanted to finally really learn OCaml and couldn’t think of a project to start. So, I’ve begun implementing the code in Programming Collective Intelligence, originally in Python, in OCaml. I’ll save my initial complaints about OCaml till later, since they may go away once I become more accustomed to it, but there are definitely things I miss from Erlang and Haskell.
Below is the first code for making recommendations, Chapter 2. Its very simple but will be built off of in my next posts.
let sim_distance critics critic_1 critic_2 =
let movies_1 = Hashtbl.find critics critic_1 in
let movies_2 = Hashtbl.find critics critic_2 in
let add_squares movie rank_1 sum =
if Hashtbl.mem movies_2 movie then
let rank_2 = Hashtbl.find movies_2 movie in
sum +. ((rank_1 -. rank_2) ** 2.)
else
sum
in (1. /. (1. +. Hashtbl.fold add_squares movies_1 0.)) ;;
let create_hash list =
let hash = Hashtbl.create 1 in
let rec create_hash_rec list = match list with
[] -> hash
| (key, value)::xs -> Hashtbl.add hash key value ; create_hash_rec xs
in create_hash_rec list;;
let test =
let critics =
create_hash [("Lisa Rose", (create_hash [("Lady in the Water", 2.5) ;
("Snakes on a Plane", 3.5)])) ;
("Gene Seymour", (create_hash [("Lady in the Water", 3.0)]))]
in sim_distance critics "Lisa Rose" "Gene Seymour" ;;
This code creates a hash with keys being the name of a movie critic and the value is another Hash containing names of movies and the score the critic gave the movie. The sim_distance function finds the similarity between two reviewers, when the test is run it return 0.8 for the similarity between Lisa and Gene. I’ll go into how similarities work later.
My main problem with the code is its not purely functional. I originally implemented it with lists but wanted the running time to be the same as the Python code, requiring a hash for O(1) lookups and, sadly, destructive data structures. Luckily, this gives me a reason to play around more with Purely Functional Data Structures. Meaning, I’ll probably get caught up implementing data structures for this very basic first part of Collective Intelligence…
As I said, I don’t know OCaml, so this is probably not the best code but it works. Suggestions would be great.
Well my OCaml is a little rusty, but the first thing I would change is all the Hashtable usage. This makes sense in Python/Perl/etc. but not in OCaml. Instead I would use algebraic types, like so:
type movie_ratings = Movie of string * float
type critics = Critic of string * movie_ratings list
Then constructing it would look something like this:
let critics = [
Critic("Lisa Rose", [ Movie("Lady in the Water", 2.5); Movie("Snakes on a Plane", 3.5) ]);
Critic(”Gene Seymour”, [ Movie("Lady in the Water", 3.0) ])
] in …
After that you just need to adjust the code to not search hashtables, but to decounstruct the types instead. Its a little more Ocaml-ish this way, and most of the Hashtbl functions can also be found in the List module as well (although they may require a little more work to implement).
Yeah, using algebraic types helps readability and use, but its really the same way I originally implemented it. I changed it to Hashtbls so the code would have the same growth at the Python code. Of course, this doesn’t matter for so many reasons, its only two critics, its never actually going to be used… But this way Python coders can’t use it as an attack :).
I plan to reimplement it with a purely functional data structure, but not lists. Still won’t be able to get O(1), like hashes, but closer than O(n).
This is just for learning and having fun, and maybe others will enjoy it too. So, no (I know someone is thinking it), its not a waste of time to over complicate everything.