The challenges of implementing AI in data systems

About this episode:

In this episode of Behind the Data, host Matthew Stibbe speaks with Andy Coulson, a cloud architect at Epicor, about the emerging trends in data engineering, particularly the integration of AI and machine learning. They discuss the challenges of implementing AI in data systems, the transformation of Epicor's auto catalog, and the architectural decisions behind it. Andy shares insights on data management, the importance of understanding data volumes, and lessons learned from data integration projects, including the need for foresight in planning.

AI-generated transcript

Matthew Stibbe (00:02)
Hello and welcome to Behind the Data with CloverDX. I'm Matthew Stibbe, your host, and I'm here today with Andy Coulson, who is a cloud architect in the Auto Catalog team at Epicor. Great to have you on the show, Andy.

Andy Coulson (00:15)
Great to be here.

Matthew Stibbe (00:18)
Before we dive into your world in detail, I'd like to start with a really sort of practical or broad question. Are there emerging trends or technologies in data engineering that you're particularly excited about at the moment?

Andy Coulson (00:32)
Yeah, I think everyone is aware of the trend towards AI, with machine learning as a kind of subset of that. AI is really machine learning and inference on steroids. And, you know, there are some very exciting and somewhat scary things about it. But our company is very interested in it and actively investing in incorporating it in different ways and in different products, my team included. So I'm having to learn something new pretty quickly. But it's challenging, which I like, and that makes it fun.

Matthew Stibbe (01:25)
Can you give me an example of something that you've had to sort of dive into recently or learn and perhaps how you went about it? What's the best way of learning about this new, well, relatively new AI stuff?

Andy Coulson (01:25)
Yeah, well, the most straightforward example is having a chatbot interface to your product or data, right? Implementing a chatbot is relatively easy, but getting the chatbot to respond and interact the way you want has proven to be more challenging, and it requires a different way of thinking about things.

So instead of writing procedural code that goes X then Y then Z, you're giving it instructions in a prompt ahead of time, telling it what you want it to do. And sometimes it sticks to those instructions and sometimes it doesn't. You have to become a bit of an expert in prompt engineering to get it to behave the way you want. And that's writing things in natural language instead of writing code.
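
As an illustration of that shift, here is what such an up-front instruction might look like, sketched as a Java text block. The wording and rules are invented for illustration, not Epicor's actual prompt:

```java
// A hypothetical system prompt: the desired behavior is specified in
// natural language up front, instead of as procedural branches in code.
class PromptSketch {
    static final String SYSTEM_PROMPT = """
        You are an auto-parts catalog assistant.
        Answer only questions about parts, labor, and specifications.
        Always confirm the vehicle's year, make, and model before
        recommending a part. If you are unsure, say so; never invent
        part numbers.
        """;
}
```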

Matthew Stibbe (02:36)
That's quite a leap in a way if you're used to operating in that procedural language way and being very specific about the instructions, isn't it?

Andy Coulson (02:42)
Yeah. You could try to get it to behave the way you want by being more procedural, but in some cases it's better to just explain what you want it to do. Well, in every case, I think.

Matthew Stibbe (02:59)
Sometimes it feels like that with ChatGPT.

Andy Coulson (03:03)
So we're actually building some demos and prototypes doing this stuff. And one of the things we have to do, in order for it to understand our data, is use RAG, Retrieval Augmented Generation. It basically says: here's a tool you can call to look stuff up in our catalog. You literally tell it that, you give it a procedure it can call, and it does it. So that gives it some very specific data, but the way you interact with it is still in a natural language way.
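
To make the pattern concrete, here is a minimal, vendor-neutral sketch of that tool-calling idea. The tool name, its fields, and the stubbed lookup are hypothetical; a real implementation would go through an LLM provider's function-calling API:

```java
import java.util.List;
import java.util.Map;

// A minimal sketch of the RAG tool-calling pattern: the model is told in
// its instructions that a catalog-lookup tool exists; when it decides to
// use it, we execute the real query and feed the result back, so answers
// are grounded in catalog data rather than the model's training data.
public class CatalogRagSketch {

    // Hypothetical tool description handed to the model alongside the prompt.
    static final Map<String, Object> LOOKUP_TOOL = Map.of(
            "name", "lookup_parts",
            "description", "Look up parts in the auto catalog by vehicle and keyword",
            "parameters", List.of("year", "make", "model", "keyword"));

    // The procedure the model is allowed to call. In a real system this
    // would query the catalog, not return a canned result.
    static String lookupParts(String year, String make, String model, String keyword) {
        return "[{\"partNo\":\"BRK-1042\",\"desc\":\"Front brake rotor\"}]";
    }

    public static void main(String[] args) {
        // Pseudo-flow: send prompt + LOOKUP_TOOL to the model; if it replies
        // with a tool call, run it and return the output for the final answer.
        System.out.println(lookupParts("2016", "Honda", "Civic", "brake rotor"));
    }
}
```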

Matthew Stibbe (03:51)
So, and forgive my ignorance here, I'm sure everyone listening will be appalled by it, but this RAG thing is like giving the language model an API, or API-like access, into a pool of data.

Andy Coulson (04:05)
Exactly.

Matthew Stibbe (04:08)
Okay. And what are the biggest challenges you see with this integration of AI into data? For me as a marketer, to go off on a tangent, I was at Big Data London last week, and the biggest challenge, I think, is around the language, because every stand at Big Data London had, you know, "we're bringing AI to data", AI is the new thing. So there's a marketing piece to it. But from a technology perspective, what do you think the biggest challenges are?

Andy Coulson (04:39)
Well, there's the different paradigm of how to prompt it to do what you want. But there are also practical considerations like cost, which I think is falling fast, but it's still rather expensive compared to running other kinds of services. Cost is typically based on how many tokens you're passing back and forth, which can be thought of as words and whitespace and whatnot. So to be cost-effective you need to limit your token count, and to limit your token count you need to think about: okay, if I use RAG and let it call an API, do I want to return a megabyte of data, or do I want to distill that down to the stuff it really needs, reduce my token count, and make it more cost-effective?

And then the other big consideration is that it's still not lightning fast. It spends some time analyzing what you're giving it, and it spends time assembling its response. So it's not as fast as typing a Google search; it takes a little time, and you have to think about that as you're implementing, from a UX standpoint: what is it like for an end user to have to wait a moment for a response when they're used to a sub-two-second kind of response time?
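
A minimal sketch of that distillation step, assuming a hypothetical record shape; the idea is simply to drop fields the model doesn't need before they become tokens:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// A sketch of the distillation Andy mentions: rather than handing the
// model the full record an API returns, keep only the fields the answer
// actually needs, which directly reduces token count (and therefore cost).
public class TokenBudgetSketch {

    // Fields worth spending tokens on -- illustrative, not a real schema.
    private static final Set<String> KEEP =
            Set.of("partNo", "description", "price", "fitment");

    static Map<String, Object> distill(Map<String, Object> fullRecord) {
        Map<String, Object> slim = new LinkedHashMap<>();
        for (var e : fullRecord.entrySet()) {
            if (KEEP.contains(e.getKey())) {
                slim.put(e.getKey(), e.getValue());
            }
        }
        return slim; // a megabyte of detail becomes a few hundred tokens
    }
}
```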

Matthew Stibbe (06:17)
Hmm, that's interesting. So there are all kinds of dimensions to this: around user experience, around data quality, and around cost control. It sounds an awful lot like every other thing in IT. Yeah.

Andy Coulson (06:29)
It's just a new one, don't worry about it.

Matthew Stibbe (06:35)
So let's talk a little bit about you and Epicor. So tell me first of all about the auto catalog bit of Epicor and how it fits into the overall company.

Andy Coulson (06:43)
Okay, so Epicor is really an umbrella over many acquisitions across many different product lines and verticals. That includes ERP, and you've had a guest on who uses our product, Prophet 21, but we have other ERP products as well. We have point of sale, we have logistics, all these things. Within the company there are different market verticals. I'm in the automotive vertical, and specifically within that I'm helping redevelop our automotive parts catalog.

And I say parts, but it encompasses a little more. If you're a shop-floor mechanic or the guy at the desk at an auto shop, and I come in and say I need my brakes repaired, you want to know what parts you need, but you also want to know how much labor is involved in performing that. You might want to look up other things that are due around the mileage the car is at and, you know, upsell the customer on those. You might want to look up specifications, like how much of the rotor in your brakes should be left before it's really time to replace the rotors. So there are specifications that are also part of the catalog.

So yeah, it's a kind of greenfield project to redevelop our auto parts catalog.

Matthew Stibbe (08:31)
And the current product is...?

Andy Coulson (08:32)
So the primary current product is a 20-plus-year-old Windows application that runs standalone on Windows. You get your data. Now, we actually do have internet download of data refreshes, but we still send out probably hundreds of DVDs to some customers, which they just load either on their local machine or on a local server. So that technology is quite antiquated. We do have some new web-based APIs on top of that data, but it's still relying on the same underlying proprietary kind of database.

So the new product will be entirely re-engineered, cloud native, with a lot of new technology and a lot of new opportunity, of course, like AI, to do new things.

Matthew Stibbe (09:44)
Can you talk me through the architecture of the new product? What tool sets are you using, and how are you going to manage the data in it?

Andy Coulson (09:56)
Yeah, so early on, actually slightly before I joined, the decision was made to go with a graph database, which is kind of unusual. The thinking being that we have all these different kinds of things that are all related, parts, labor, specifications, service intervals, and then a lot of other things in the automotive space that aren't even part of our current catalog that we may want to sort of tie together.

So it's a graph database on the back end, and kind of traditional in terms of container-based services running on top of that. We do do a lot of search, so we have a search engine, an Elasticsearch-based search engine, that's an integral part of it and accelerates looking things up. Then we go to the graph database to get the detail, but for performance the search engine is much better. It turns out graph databases aren't great at, you know, sort of wildcard queries. And especially if you're going out and sort of spidering through the graph to get a lot of different bits of information, it can become pretty non-performant if you go overboard with that.

Matthew Stibbe (11:31)
So you're directing the searches to the search engine rather than going straight to the database.

Andy Coulson (11:35)
Yeah, we narrow it down to primary keys first, then go after the data with the primary keys. For keyword searches, you really can't beat a search engine. In fact, that's an area where ML will probably come in. We're currently doing some things there, because there's different terminology in the industry that isn't necessarily what parts are called in the database.

So how do you create relevant search results when a customer is using some kind of slang term for a part, or a colloquial term, which gets even more interesting when you get into other languages, Spanish and French? We're already using some synonym support, but we want to get to where we actually start using ML in there to make inferences about what the better results are, based on past behavior that we analyze, and start training a model to improve them. We're very early in that process, but it's something we want to work on.
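
A rough sketch of that two-step lookup, using the Apache TinkerPop Java driver for the graph leg; the search call is a hypothetical stand-in for the Elasticsearch query (with synonym expansion), and the host, port, and IDs are placeholders:

```java
import org.apache.tinkerpop.gremlin.driver.remote.DriverRemoteConnection;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import static org.apache.tinkerpop.gremlin.process.traversal.AnonymousTraversalSource.traversal;

import java.util.List;
import java.util.Map;

// A sketch of the two-step lookup Andy describes: resolve keywords to
// primary keys in the search engine first, then fetch detail from the
// graph by key, avoiding expensive wildcard traversals in the graph.
public class SearchThenGraphSketch {

    // Hypothetical stand-in for the Elasticsearch keyword query, which
    // would also handle synonyms; it returns only primary keys.
    static List<Object> searchPartIds(String userTerm) {
        return List.of("part:1042", "part:2210");
    }

    public static void main(String[] args) {
        try (GraphTraversalSource g = traversal()
                .withRemote(DriverRemoteConnection.using("localhost", 8182, "g"))) {
            List<Object> ids = searchPartIds("brake rotor");
            // Direct lookup by ID is cheap; spidering out from a wildcard
            // match is where graph performance falls over.
            List<Map<Object, Object>> parts = g.V(ids.toArray())
                    .elementMap()
                    .toList();
            parts.forEach(System.out::println);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```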

Matthew Stibbe (12:53)
When you say cloud native, how are you going to architect the system? Is it going to be in containers and things like that?

Andy Coulson (13:00)
Yeah, we're basically mostly AWS, so it's containers running in AWS's container service. There's the graph database, and we have a relational database for some of the more relational data, and a lot of other bits and pieces in there. We have Lambdas doing things, and the search engine. It's all, you know, using SaaS services in the cloud. We try not to host too much on our own instances, CloverDX being an exception, but yeah.

Matthew Stibbe (13:41)
And yeah, how does CloverDX fit into all of this?

Andy Coulson (13:44)
Yeah, so we have all this data in the catalog, billions of entries that essentially come from parts suppliers, of which there are thousands, or OEM manufacturers, of which there are dozens or hundreds. So it's a lot of data coming in from a lot of different data sources, and we need to process it, normalize it, try to extract attributes that are common, fit a common taxonomy sort of on the fly, and get that all into mainly our graph database and a couple of supporting relational tables.

So that's where Clover's been really helpful. You can imagine, with all these different data sources, which can be a nice JSON data feed, but might be a CSV file or an Excel file or something else, a Microsoft Access database, we need a lot of flexibility. We also need something that we can rapidly ramp up on a new dataset. Ideally it's not too brittle, or if it does break, it can be fixed fast. And this could be a really tremendous level of effort if you did it all in procedural code, or Java, or whatever.

We found that CloverDX really accelerates our ability to onboard data, and it kind of reduces the skill set required. I think the better skill set with Clover is really a sort of experienced data analyst who knows some scripting, versus somebody who's used to coding in Java. Java coders have a tendency not to understand the slight difference in how data flows through a pipeline, we call them pipelines. Whereas we've got a guy with a business analyst background who did scripting on our legacy system to import data, and he just cranks through stuff really quickly.

Matthew Stibbe (16:14)
Building the pipelines for the different suppliers and manufacturers to bring their data into your database.

Andy Coulson
Yeah. There are occasions where we have to dip down into code, and a nice thing about Clover is it lets you do that with custom components. A great example is the graph database. Clover has a lot of connectors, but it doesn't have a connector for a Gremlin-based graph database. So we wrote our own readers and writers. Well, primarily I did that, and then I had some help from other Java guys. And occasionally something's a little too tricky, or it just isn't a good use case for doing it in a low-code environment, and it's easier to just knock it out in Java.

We have a couple custom transform components that do that kind of thing, but mostly we don't do any coding in Clover anymore. It's all low code, no code.
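
For flavor, here is the kind of Gremlin write call such a custom writer component might wrap, using the Apache TinkerPop driver. The CloverDX component plumbing around it is omitted, and the vertex label, properties, and endpoint are invented:

```java
import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;

import java.util.Map;

// A sketch of the core a custom Gremlin "writer" might wrap: each
// incoming record becomes a parameterized vertex write.
public class GremlinWriterSketch {

    public static void main(String[] args) {
        Cluster cluster = Cluster.build("localhost").port(8182).create();
        Client client = cluster.connect();
        try {
            // Parameterized script: bindings keep the write cacheable
            // server-side and avoid string-concatenation pitfalls.
            client.submit(
                "g.addV('part').property('partNo', pn).property('desc', d)",
                Map.of("pn", "BRK-1042", "d", "Front brake rotor")
            ).all().join();
        } finally {
            client.close();
            cluster.close();
        }
    }
}
```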

Matthew Stibbe (17:19)
And you mentioned that it's the exception, or one exception, to the SaaS model. Are you hosting it in the cloud as well, but in its own sort of environment, or are you running it on-prem?

Andy Coulson (17:20)
We host it in the cloud. Everybody's got their own local environment for development, but our productionized instance is in the cloud. Multi-node, you know, because we do have a high volume of data, which again is another thing that's challenging, and would be no matter what technology you were using: tracking the ingestion of a file, and did it work, and how do you replay it if you need to? How do you handle it? We process data nearly 24/7, we have so much coming in. So we have multiple nodes in Clover, and it gives you the capability to partition the data and run processes on it in parallel, which is probably the only way we could keep up with it at this point.

So there's some nice functionality around job management and partitioning of data in there that we use. We still ended up building a fair amount of, I guess you would call it, framework around running our jobs. Clover provides tracking of a job, but we really want to track the source, we want to know what file was processed, we want to trap the root cause if there's a failure, and things like that. So we built a sort of generic framework within Clover, around each specific ingestion job we run, that tracks all of that.
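
A sketch of the kind of per-file record such a tracking framework might keep, enough to answer "did it work?" and to replay a failed load. The field names are illustrative, not Clover's or Epicor's actual schema:

```java
import java.time.Instant;

// One row per ingestion run: which pipeline ran, against which file,
// with what outcome -- the basis for replaying failures.
public record IngestionRun(
        String jobId,          // which pipeline ran
        String sourceFile,     // exactly what file was processed
        Instant startedAt,
        Instant finishedAt,
        long recordsRead,
        long recordsWritten,
        String status,         // e.g. SUCCEEDED, FAILED, REPLAYED
        String rootCause       // first failure captured for diagnosis
) {}
```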

Matthew Stibbe (19:19)
So when you've built that error trapping framework and that pipeline as a sort of, this is probably not the official Clover language, like as a template, you can then replicate it so that each new ingestion pipeline has the same...

Andy Coulson (19:32)
Yeah, and reducing the development effort was a big part of the idea there. So we have this scaffolding that is common to almost all our runs, and then we have quite a bit of metadata that says what the data coming in looks like and where it needs to go. We don't have to recreate that scaffolding part of it every time; we just recreate the core, which might have a different transform for different data. Sometimes something doesn't fit real well within our metadata-driven mechanism, and so we build a kind of specific one-off transform for that, but it's still plugged into the same scaffolding. So it's pretty nice.
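
As a sketch of that metadata-driven idea, each new feed might be described declaratively along these lines, with the shared scaffolding reading the descriptor; all names here are invented for illustration:

```java
import java.util.List;
import java.util.Map;

// A declarative description of one data source: the shared scaffolding
// reads this and runs; only genuinely odd feeds need a one-off transform.
public class DatasetDescriptor {
    String sourceName = "acme-parts-feed";
    String format = "csv";                       // csv, json, xlsx, ...
    Map<String, String> fieldMap = Map.of(       // source column -> catalog attribute
            "PART_NUMBER", "partNo",
            "DESC_EN", "description");
    List<String> targets = List.of("graph", "relational");
}
```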

Matthew Stibbe (20:19)
As you're moving from a 90s-style desktop app to a containerized, cloud-based system, what impact is that transformation having on data flows, and on how you plan out what you can offer and what you need?

Andy Coulson (20:26) 
Yeah, well, that's a challenge, right? Because we're moving from something we've done for 20 years, where we just burn DVDs, or originally CDs, and we have a process that runs once at the end of the month to produce them and ship them out, to an environment that's continually processing data. You know, what do you do if you encounter a problem during processing? How do you fix it? If it's a really heavy load and you're having trouble keeping up, how do you deal with that? You spend a lot of time optimizing and parallelizing what you're doing, and looking for ways to not repeat some of the processing if possible, you know, only processing changes, deltas rather than everything, for example.

So there's a lot of thought that goes into any new data set. The first thing is: okay, how are we going to deal with this? Getting a sense of the volume, and what it's going to be like to have to process that.

I think it makes it more challenging than an environment where you can make mistakes and fix them and backtrack and rerun before the end of the month and you're good, right?
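
One common way to implement that delta idea, sketched under the assumption that each record has a stable key, is to hash record contents and skip anything unchanged since the last run:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;

// A sketch of delta detection: hash each incoming record and reprocess
// it only when the hash differs from the previous run, so a full feed
// doesn't get re-ingested end to end every time.
public class DeltaFilterSketch {

    private final Map<String, String> lastSeen = new HashMap<>(); // key -> content hash

    boolean isChanged(String recordKey, String recordBody) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        String hash = HexFormat.of().formatHex(
                md.digest(recordBody.getBytes(StandardCharsets.UTF_8)));
        String previous = lastSeen.put(recordKey, hash);
        return !hash.equals(previous); // true -> process; false -> skip
    }
}
```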

Matthew Stibbe (22:23)
Yeah. And starting with a greenfield site, how you engineer something and plan it out, is intellectually challenging, I think, but also, you know, fraught with risk, because you might take a wrong turn. I'm not saying you, but I see a lot of software projects fail when they go from version one to version two, rather than when they go from version one to version 1.1, because everybody wants to change everything.

Andy Coulson (22:25)
Yeah, yeah, you know, we're all still learning too. Because there is quite a bit of new technology in here. So I wouldn't say we're even completely there yet, but we're learning and we're getting there.

Matthew Stibbe (23:06)
What's been the biggest, the number one lesson so far?

Andy Coulson (23:12)
Well, I would say it's that process: when you have a new data set that you're starting with, understanding as much as you can about it, and as much as you can about how it maps into your target schema, what your target schema will look like. I think putting a lot of thought into that upfront is important. Otherwise you're in a situation where you're refactoring things to account for a higher volume than you thought you would have, or the schema doesn't really suit the data well and doesn't perform well. And we've certainly done some of that, and we continue to refine what we've done.

But the more you can do upfront, I think the less you have to do down the road in a situation where you're trying to do it while running the application instead of doing it before something goes live.

Matthew Stibbe (24:18)
There's a definite trend, I think, in software development in the last 10 years, perhaps longer, of iteratively trying things, you know, sprinting and failing faster and all that. My instinct is there's a really big place for that. But when you're doing something big and complex with a lot of dependencies, there's also some value in rigorous thought and analysis up front, right?

Andy Coulson (24:36)
Yeah, so there's the whole DevOps thought of how to proceed. And we really don't have the latitude to take a full DevOps approach, because you have to have a lot of duplication in terms of environments, and somebody to run that, and people who know what's going on and where your state is in each environment. I think that's a worthy goal, but it's also another very challenging thing to do and do well.

Especially if your tolerance for something being down, or a piece of something being down, is low. You know, with Facebook, if I don't see my post immediately, I don't really care that much. But if a customer complains about some data that's not there, it needs to get fixed fast.

Matthew Stibbe (25:42)
So let's just shift time perspectives a bit. Take a really long view. OK, so I was really intrigued when I read on your LinkedIn profile that you support an archaeological project in Italy where they're recording archaeological information. So I'm a historian from long ago. I'm fascinated by this. So tell me a little bit about data for archaeology. Tell me about that project.

Andy Coulson (25:55)
Yeah, my moonlighting job, in case my superiors see this. That one was pretty challenging too. In that case, I'm the only developer. I'm building this little app for an archaeological team that runs out of the University of Texas and is excavating a couple of different sites near Pompeii, but not part of Pompeii. It turns out that, you know, academics and archaeologists are not real computer literate. There were a lot of challenges that evolved over time, but initially what they wanted was to be entering data on their laptops in the dig, where there was no internet, and have it all go into one database. The data they're recording is kind of like triangulation, in terms of where they found something within the dig they've done. But then there's some taxonomy to it, where they're organizing things by objects, or by paintings, or floor tiling, that type of thing. And asking them what their schema should be doesn't work. You have to come up with a schema, hope it kind of works, and work with them to figure it out.

So that was really challenging, and having something that worked just out in the wild wasn't very easy, because we're used to being able to connect to the internet and hit a centralized database. So this thing started out as, I'm somewhat embarrassed to admit, a Microsoft Access application. But Access does have built-in replication, so that helped there, in terms of merging data coming from multiple users.

But it's evolved now into a typical kind of web application, with a single-page app and a relational database on the back end. They got internet eventually, and it turns out they really prefer to just write stuff in notebooks or put it in a spreadsheet and then enter it later anyway. So the being-online part of it kind of went away.

It continues to be a little bit challenging understanding what they want their use case to really be. How they really want to use this data has been the biggest challenge right now, because the data is in there, but they don't even know how they want to present it. So we're trying to work with them to rethink: okay, how do you really want to look things up? It's not by a catalog ID, you know, that means nothing. In a way it's similar to our auto parts catalog challenge, in that they want to be able to say, well, in this trench, what did we find when we dug it? And it might have been objects, or it might have been some part of a water system, or whatever, but it's interesting to relate all those things together. So we may move towards the trench as the sort of primary key to all this stuff.

Matthew Stibbe (29:44)
It sounds like a really fascinating case of unusual requirements gathering, right? I mean, they're just...

Andy Coulson (29:45)
Yeah, it's really, I mean, it's kind of fun because it's completely different from the business environment, you know, where requirements are kind of well known and your business users are somewhat savvy about using technology and they know where they want to end up. This is a little bit more like, you know, exploring in a different way from the archaeological exploring.

Matthew Stibbe (30:18)
That reminds me of when I was at university. I was very geeky and I had a computer, and not many people did in those days. And the Dean of the college, for one reason or another, I was sat next to him at a dinner. He was a real old duffer, right? He was a classicist. "Stibbe, you're that computer chappy, aren't you?" That was basically as much as he had to do with technology, I think, in his entire life. Well, that's probably not fair. He had been a code breaker, actually, in the war. Anyway.

Andy Coulson (30:48)
The director of our archaeological project probably isn't too different.

Matthew Stibbe (30:56)
And bless them, we need people in ivory towers to be thinking deep things. And on that bombshell, I think we're almost out of time. But before we wrap up this fascinating conversation, I want to ask you one question. If you could go back in time, four or five years, not 2,000 years, to the start of your career at Epicor, what advice would you give yourself about data integration? What do you wish you had known back then that you know now?

Andy Coulson (31:33)
I think it's the part about understanding where your data is going to wind up, and understanding more about the volumes and just the quantity of data, how many of a particular, let's call it, object you're going to wind up with, because that's where we ran into the scaling issues and performance issues.

And, you know, it's hard to do that, because you're starting from scratch. You have a team that doesn't know the data, and in this case most of our team didn't know the domain, the automotive domain. We were fortunate to have one very experienced Epicor developer who, though new to Java and graph databases and all these things, understood the data, even if he didn't have numbers around all of it. He was a huge asset. So I think I would have wanted to put more effort into knowing where we were going before we started down the path. It would have saved us some time, but there are practical considerations there, in terms of showing feasibility to the business and actually getting something to market quickly, which is important in almost every case in business.

Matthew Stibbe (33:08)
Well, like the residents of Pompeii, a bit of foresight is a very valuable thing. If you can see the volcano before it erupts, good for you. And on that, I think that brings the episode to a close. Andy, thank you very much for being with us. It was a really interesting conversation. Listeners, if you'd like to get more practical data insights or learn more about CloverDX, please visit cloverdx.com/behind-the-data.

Matthew Stibbe (33:36)
Thank you very much for joining us all today, and goodbye.

Andy Coulson (33:41)
Thanks, bye.

Download and listen on other platforms

Subscribe on your favorite podcast platform and follow us on social to keep up with the latest episodes.

Our podcast takes you inside the world of data management through engaging, commute-length interviews with some of the field’s most inspiring figures. Each episode explores the stories and challenges behind innovative data solutions, featuring insights and lessons from industry pioneers and thought leaders.