Week Eight: The Strippers Guide to Bad Book Data

Might have to rename this blog, as quite a few of these thoughts aren’t coming out on a Friday. Monday lunchtime thoughts?

Was out quite a lot last week, doing some family things, so didn’t manage to launch FutureTales on ProductHunt, but I am relaxed about this 😬.

One of the days out was at BrightonRuby - the UK’s only rails/ruby conference, which happens to be a 10 minute bus ride from my front door. Was my first ever tech conference, despite having been in the industry nearly 4 years now (something to do with some virus called Covid-19?) Anyway, it was a lovely day, and I plan to keep coming back as long as they keep doing it.

Want your bad data

One of the things blocking a launch last week was the realisation that I had quite a lot of bad data in the FutureTales database. One book that I kept seeing suggested for kids was “The Strippers Guide to looking great naked” (link for the curious…), which probably isn’t what most parents are looking for.

This is because ChatGPT fairly frequently makes up ISBN numbers (very naughty), and I wasn’t handling these cases at all well. In order to clear the bad data out of the DB I had to create an admin backend. This is actually super easy in rails using the active_admin gem, but took a bit of time. Then, once I’d built it, I had to manually check every book in the DB (about 1000 titles) to see that the title and the cover matched up. This was not a fun time.

Thankfully, I’ve now got a much better system which stops this bad data ever getting into the DB.

Behind the scenes Behind the scenes

User researching with beers

Had a really nice chat with an old friend (who lives far away) on Monday evening (thanks Mike!). He shared his screen and he narrated his experience of using FutureTales for the first time. Mike is both a parent and also works in web design, so he had some very useful thoughts.

I am going to make these a weekly habit, as they are both a great way of connecting with people who I haven’t seen in a while, and provide a lot of user research gold.

Of course, they only really work if the person you are talking is a potential user of the product. As my current user base is parents, so there are potential users everywhere. My next product is going to be a super-niche B2B thing, so let’s enjoy my widely relevant consumer products while they last.

As a result of the chat on Monday, I rejigged the onboarding process a fair bit. It’s now looking fairly slick I think.

Ready to launch on ProductHunt tomorrow? Yes, I think so!

Slick AF Slick AF

We’re going to need a bigger bookshelf

A big problem that I continue to have is that I need more books in the FutureTales database. Currently there are about 1,000 books. If you think that each book is probably only suitable for an age range of around 2 years (eg 2-4 or 10-12), that means that for each child there are maybe only 100 relevant books in the database. As I have over 100 “interests” in the database, for each combination of age and interest there may only be one relevant book. This is pretty lame. I probably need to have at least 10,000 books for it to feel like a cool way of discovering new titles.

My current approach is just to ask GPT-4 for book recommendations that match each combination of age and “interest” (eg “Egyptians” or “Climate change”). This is expensive (because GPT-4 is expensive) and will leave out a lot of potentially relevant books. My plan for fixing this is to make more use of GPT-3 (which is faster and cheaper) and the GoogleBooks API. The GoogleBooks API allows you to search for books by “Category”. So my plan over the next week is to setup tasks to pull out a whole bunch of kids books out of the GoogleBooks API (using this category search approach) and then get GPT-3 to “tag” each of these books with relevant interests and age range based on the description of the book. This task of tagging text is much simpler than recommending suitable books, so I think GPT-3 will be up to the job.

See you next week!

Please don't sign up to my newsletter