We trained a language model on LinkedIn posts...

Why, you might ask? Well, because it was fun. For a while, I had noticed that many LinkedIn posts read almost like clichés: people trying to convey "deep" messages, giving life advice (which they are often not qualified to give) and glorifying themselves in ways you rarely see on other social networks.

Being experts on large language models here at Neuralfinity, we thought it would be a fun challenge to train a small one on LinkedIn posts. The aim was to create a generative language model somewhat similar to GPT-3, feed it some prompts and have it generate a few posts of our own, with some funny results.

A collection of generated LinkedIn posts:

These posts were generated by our "LinkedIn Posts" model. Each one was created by feeding the model a short text snippet of a few words, often referred to as a "prompt", and letting it write the rest.

Prompt: Today I quit my job

Today I quit my job to understand the life of a “famous person” I threw out all of the things that have been disrupted by the world we live in. I was impressed by that. Common people, read the link in the comments below and be sure to share more in the comments section.

Prompt: Today I decided

Today I decided to build a unicorn company in the midst of a positive, losing conversation. Surprisingly, substance abuse is important. Who could have predicted? I demonstrated the evolution of their lives better - in many ways, all employees are complaining that they're not included. I hope there are some people who have not commented today, but I have a few linkedin newsletters on how to use your natural work style to work simply AND live fully.

Prompt: I want to share this insight with you

I want to share this insight with you:
“We are all in the business of being able schedule time with each other. More than that, get it done with your intended behavior. Being clear on what you want. Part of the questions is to ask yourself about the four-letter word leaders love to use. #planning #jobsearchtips #protests2020

Prompt: Success

"Success to me was just to be independent, and I don’t mean financially independent! I actually didn’t think I would have met my mentors. Why? Because my #research is the new norm, I have made new friends throughout the days, I learnt all of the skills that made leaders. They provide technologies that can come to their definition of influence.

Methodology

We trained a GPT-like transformer model on almost 100,000 lines of LinkedIn posts. Sadly, we only had a limited dataset that didn't contain many of the very long posts we were hoping to create, so we focused on shorter posts instead, which turned out to be just as funny and clichéd. After training the model for 6.5 hours on an Nvidia RTX 3090 (in a desktop system; this is for fun, after all), we fed it different prompts taken from our LinkedIn timelines to see what would happen.
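To illustrate what "feeding the model a prompt" means, here is a minimal sketch of prompt continuation. It is not our model: a tiny word-level bigram counter stands in for the transformer, and the example posts, the function names and the `seed` parameter are all made up for illustration. The loop is the same idea, though: start from the prompt, repeatedly pick a next word based on what came before, and append it.

```python
import random
from collections import defaultdict

# Made-up example lines, NOT taken from the real training data.
TOY_POSTS = [
    "Today I quit my job to follow my passion",
    "Today I decided to build a company",
    "I quit my job and I learnt a lot",
]

def train_bigram_model(lines):
    """Record which words follow each word (a toy stand-in for training)."""
    followers = defaultdict(list)
    for line in lines:
        words = line.split()
        for current, nxt in zip(words, words[1:]):
            followers[current].append(nxt)
    return followers

def generate(followers, prompt, max_new_words=20, seed=0):
    """Continue the prompt one word at a time, sampling from seen followers."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    words = prompt.split()
    if not words:
        return ""
    for _ in range(max_new_words):
        candidates = followers.get(words[-1])
        if not candidates:
            break  # the last word was never followed by anything
        words.append(rng.choice(candidates))
    return " ".join(words)

if __name__ == "__main__":
    model = train_bigram_model(TOY_POSTS)
    print(generate(model, "Today I quit", max_new_words=8))
```

A real GPT-like model conditions on the whole preceding text rather than just the last word, and works on sub-word tokens, but the generate-append-repeat loop has the same shape.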

What we learned

LinkedIn data is hard to find

There are only a few public datasets, and scraping the site is against LinkedIn's terms and conditions. We used public datasets from LinkedIn influencers, cleaned the data and trained on that, as it was the only option that didn't turn this "weekend" project into a month-long one.
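As a rough idea of what such a cleaning step can look like, here is a small sketch. The specific rules (collapsing whitespace, dropping very short lines, removing case-insensitive duplicates) and the `min_words` threshold are assumptions chosen for illustration, not a description of our actual pipeline.

```python
import re

def clean_posts(raw_lines, min_words=3):
    """Normalise whitespace, drop very short lines and remove duplicates."""
    seen = set()
    cleaned = []
    for line in raw_lines:
        # Collapse runs of whitespace (including newlines) into single spaces.
        text = re.sub(r"\s+", " ", line).strip()
        if len(text.split()) < min_words:
            continue  # too short to be a useful training example
        key = text.lower()
        if key in seen:
            continue  # already kept the same post, ignoring case
        seen.add(key)
        cleaned.append(text)
    return cleaned

if __name__ == "__main__":
    sample = ["  Hello   world  again ", "hello world again", "Hi", ""]
    print(clean_posts(sample))
```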

Bias can be fun...

...but only if you use it intentionally. A lot of the development work for our commercial models went into ensuring that unwanted bias is avoided and that our models don't have any "dark" side effects like those we have seen from some AI models trained for research in the past.

In this case, we wanted to achieve the opposite and have as much "LinkedIn self-indulgent-motivational-post" bias as possible.

AI becomes more and more accessible

While large models, like the one underpinning our Magic-Summary API, still take significant amounts of data and compute to create, smaller models can now viably be trained on affordable desktop computers, thanks to Nvidia's current generation of high-end desktop cards. As we noticed, they also do an excellent job of heating a room on a cold and rainy day in northern Germany.

And while the large models are still the domain of specialised companies, at least you can use ours with just a few lines of code 😉.