Markov chains are a simple way to build a machine learning model that can generate text.
I had to adjust my workflow at this point, because loading the whole dataset into memory started causing performance issues. Filtering submissions while loading them from disk was the way to go.
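Here is a minimal sketch of filtering while reading. It assumes the submissions sit in a JSON-lines file, one HN item per line, with `title`, `score` and `descendants` (comment count) fields as in the HN API's item format. The file layout, the `iter_successful_titles` name and both thresholds are hypothetical, chosen for illustration rather than taken from my actual setup.

```python
import json
from typing import Iterator


def iter_successful_titles(path: str, min_score: int = 5,
                           min_comments: int = 3) -> Iterator[str]:
    """Yield titles of 'successful' submissions, reading one line at a time.

    Assumes a hypothetical JSON-lines file where each line is one HN item
    with 'title', 'score' and 'descendants' (comment count) fields.
    The default thresholds are illustrative only.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:  # iterating the file object never loads it whole
            line = line.strip()
            if not line:
                continue
            item = json.loads(line)
            title = item.get("title")
            if not title:
                continue  # comments and deleted items carry no title
            score = item.get("score") or 0
            comments = item.get("descendants") or 0
            # Keep the item if it got enough upvotes OR enough comments
            if score >= min_score or comments >= min_comments:
                yield title
```

With that, `all_titles = list(iter_successful_titles("hn_items.jsonl"))` (a made-up file name) keeps only the selected title strings in memory instead of every full item.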
With all HN submission titles at hand, it was pretty quick to keep only the successful submissions (ones that received at least a few comments or upvotes) and feed them to a ready-made Markov chain library, markovify.
Here’s what the juicy parts of the code look like:
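```python
import markovify

# all_titles is just a list of strings, one title each;
# NewlineText treats every line of its input as one sentence
text_model = markovify.NewlineText("\n".join(all_titles))

for _ in range(5):
    # make_sentence() may return None if it can't build an acceptable sentence
    print(text_model.make_sentence())
```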
Awesome, right? It’s so easy!
For the sake of amusement, here is an output sample:
```
Google, Facebook legislation to legalize marijuana
Explicit Trusted Proxy in Go
Can we sell a tiny subset of Python code, won't make upgrade deadline
Explaining to my well-being
Twitter Is About to Change Someone's Mind
```
Some of those sound HNy, some sound like satire, and others are pretty far-fetched. That’s because a Markov chain does not care whether any of this makes sense! It only cares about probabilities: which word tends to follow the previous few words in the training titles.
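To make the "only cares about probabilities" point concrete, here is a toy word-level Markov chain written from scratch. It is not markovify's implementation (markovify also rejects outputs that overlap too much with the input, among other checks); it only shows the core idea: record which words followed each state (the previous `state_size` words, two by default, like markovify) in the training titles, then walk the chain by sampling from those followers. The `build_chain` and `generate` names are my own.

```python
import random
from typing import Dict, List, Optional, Tuple

BEGIN, END = "__BEGIN__", "__END__"
State = Tuple[str, ...]


def build_chain(titles: List[str], state_size: int = 2) -> Dict[State, List[str]]:
    """Map each state (tuple of words) to every word seen right after it.

    Duplicates are kept on purpose: a word that followed a state three
    times is three times as likely to be picked by random.choice later.
    """
    chain: Dict[State, List[str]] = {}
    for title in titles:
        words = [BEGIN] * state_size + title.split() + [END]
        for i in range(len(words) - state_size):
            state = tuple(words[i:i + state_size])
            chain.setdefault(state, []).append(words[i + state_size])
    return chain


def generate(chain: Dict[State, List[str]], state_size: int = 2,
             rng: Optional[random.Random] = None, max_words: int = 30) -> str:
    """Walk the chain from the start state until END or max_words.

    state_size must match the value used in build_chain.
    """
    rng = rng or random.Random()
    state: State = (BEGIN,) * state_size
    out: List[str] = []
    while len(out) < max_words:
        nxt = rng.choice(chain[state])  # frequency-weighted pick
        if nxt == END:
            break
        out.append(nxt)
        state = state[1:] + (nxt,)  # slide the window one word forward
    return " ".join(out)
```

Nothing in `generate` checks grammar or meaning; every step just picks a follower seen in the data, which is exactly why the titles above drift from plausible to absurd.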
Another way to generate output is to use make_short_sentence. Here’s the output limited to 280 characters:
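In markovify this is a single call, `text_model.make_short_sentence(280)`. If I read markovify's source correctly, it doesn't build shorter sentences in a special way: it keeps calling `make_sentence()` and returns the first result that fits the length bounds, or `None` after a limited number of tries. The stdlib sketch below mirrors that retry idea with any generator function; `make_short` and the default of 10 tries are my own choices, not markovify's API.

```python
from typing import Callable, Optional


def make_short(generate: Callable[[], Optional[str]], max_chars: int,
               min_chars: int = 0, tries: int = 10) -> Optional[str]:
    """Return the first generated sentence whose length is within bounds.

    `generate` is any zero-argument callable that may return None on
    failure (like markovify's make_sentence). Gives up after `tries` calls.
    """
    for _ in range(tries):
        sentence = generate()
        if sentence is not None and min_chars <= len(sentence) <= max_chars:
            return sentence
    return None
```

For example, `make_short(text_model.make_sentence, 280)` should behave roughly like the built-in call above.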
If you’re interested in one particular word, there’s a brute-force way to filter for examples containing it:
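```python
must_contain = "Bitcoin"
desired_number = 5
results = []

while len(results) < desired_number:
    s = text_model.make_sentence()
    # make_sentence() returns None when it fails, so check before searching
    if s is not None and must_contain in s:
        results.append(s)

for i in results:
    print(i)
```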
It terminates eventually, as long as the word actually occurs in the training titles. The results in this case were a bit underwhelming:
```
Show HN: Days Away From Bitcoin. It's A Mirage
How Microsoft made it an app that makes Bitcoin stronger
Investors Bet Big on Bitcoin
Bitcoin Core switching from the browser
Bitcoin at an earlier decision by someone who wants it?
```
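The brute-force loop only stops once it has enough matches, so a word that never shows up in the training titles keeps it spinning forever, and a repeated sentence counts twice. A sketch of a bounded variant: it caps the number of generation attempts, skips duplicates, and takes a predicate so one helper covers different filters. `find_matching` and `max_attempts` are hypothetical names of mine.

```python
from typing import Callable, List, Optional


def find_matching(generate: Callable[[], Optional[str]],
                  predicate: Callable[[str], bool],
                  desired_number: int = 5,
                  max_attempts: int = 10_000) -> List[str]:
    """Collect up to `desired_number` distinct generated sentences that
    satisfy `predicate`, giving up after `max_attempts` calls to `generate`.
    May return fewer results (even none) if the budget runs out.
    """
    results: List[str] = []
    seen = set()
    for _ in range(max_attempts):
        if len(results) >= desired_number:
            break
        s = generate()
        if s is None or s in seen:
            continue  # generation failed, or we already have this one
        if predicate(s):
            seen.add(s)
            results.append(s)
    return results
```

With a markovify model this could be called as `find_matching(text_model.make_sentence, lambda s: "Bitcoin" in s)`, or with `lambda s: s.startswith("Ask HN:")` to filter on a prefix instead.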
How about looking for sentences generated with “Ask HN:”?
```
Ask HN: What is the skillset required to sign executive order banning transactions with the iPod?
Ask HN: Took a Pay Cut to Infrastructure-as-Declarative-Code?
Ask HN: How to Survive Downturn
Ask HN: Sys Admin Career Advice – WhoIsHiring – According to a mass casualty incident
Ask HN: Do you get Android wrong in the Ruby gem
```
Utter nonsense. But great fun! You can read more about my tinkering with HN data over here.