How rapid prototyping can accelerate complex data projects

About this episode:

In this episode of Behind the Data, Matthew Stibbe interviews Michael Lintonen, director of technology at VasoHealthcare. They discuss the exciting growth of open source in data technology, the unique data landscape at VasoHealthcare, and the challenges of communicating complex data insights to non-technical stakeholders. Michael shares insights on a recent project involving GE ultrasound sales, the importance of rapid prototyping with tools like Excel, and the role of CloverDX in managing data workflows. The conversation also touches on the evolving challenges in data integration and offers valuable advice for future data professionals.

AI-generated transcript

Matthew Stibbe (00:01)
Hello, welcome to Behind the Data with CloverDX. I'm Matthew Stibbe, your host, and today I'm here with Michael Lintonen, who is director of technology and a 12-year veteran at VasoHealthcare. So great to have you on the show, Michael. Welcome.

Michael Lintonen (00:17)
Thanks for having me.

Matthew Stibbe (00:19)
Before we dive into your world, let's start with a really practical question. What in the world of data and data technology, data engineering, are you particularly excited about at the moment? What should people be looking at?

Michael Lintonen (00:36)
I think it's really the growth of open source. We're seeing this transition away from the de facto solution providers we've had for so long. Like, if you want a medium-size data warehouse, you're looking at Microsoft SQL; if you want massive scale, you might be looking at Oracle. That started to transition a little bit when we got to Cosmos DB and things began to diversify, but now we're finally starting to see Postgres, or PostgreSQL, however you want to pronounce it, really come to the forefront. And I remember dealing with that 15 years ago, and it was always this thing you just used for little side caches and other things. It was never actually the core of your data structure. Now it's becoming that. It's fascinating, all the open source products now that are really supporting organizations. Microsoft is no longer the biggest player for some of this.

Matthew Stibbe (01:41)
That's fascinating. Are there any particular pieces of open source software that you find yourself using on a regular basis?

Michael Lintonen (01:52)
I have so many pieces of open source software it's comical. We have some Postgres we're messing around with, and honestly, it's ridiculous how much we have. I can't even list it all. But we don't use it necessarily to support the business as much as we use it to support our internal operations and to allow us to operate at speed.

That's really the key to a lot of the open source solutions: operating at speed. Even if it's something as simple as, okay, we need to do a lot more with the terminal now than we ever used to have to. Microsoft is very determined not to have GUI interfaces; they want everything run through the CLIs. And that's fine, but sometimes you need something a little more robust, and things like Oh My Posh or Oh My Zsh, which let you tweak and tune your terminals, make them more capable and give them more features so you can operate at speed, are really useful, and they're free. You just go to GitHub, start downloading, and start building these things.

Matthew Stibbe (03:09)
Do you think Microsoft is sort of slightly regretting letting go of MS-DOS and the command line interface there? And it's sort of like, well, that Windows stuff was a bit of a mistake. Let's get back to command lines. I don't know. It's been a long time since I typed anything into a command line, to be honest. So let's explore your world a little bit. Tell me, first of all, what VasoHealthcare does.

Michael Lintonen (03:35)
Yeah, so we're referred to as an alternate sales channel. Essentially we are a contract sales team, so 90-95% of our workforce is salespeople or something in the sales realm, whether that be a regional manager for the sales team or the actual salesperson on the ground. We are contracted by other companies to sell on their behalf. Right now we have two contracts with GE Healthcare - one contract is to sell what they refer to as the diagnostic imaging portfolio, and that covers everything from your x-rays up to your big iron, CTs and MRs, things like that. And we have another contract with them to sell their ultrasound. The reason they contract us is simply because we are specialists in the rural markets and the smaller mom-and-pop shops, the harder sells. They might be able to walk into Mayo or a university hospital and sell them on something with a three-to-four-month lifespan to get that deal going. Some of our deals, we've had deals take up to eight years to push through. It's incredible how long it can take to get some of this stuff in process.

Matthew Stibbe (04:44)
Wow!

Michael Lintonen (04:56)
So that's why we're around. And we make up around a quarter of their total revenue per year. So we definitely aren't a small piece of their income.

Matthew Stibbe (04:59)
You encourage me, in a way. Selling websites and marketing services has a long sales cycle, but nothing like eight years. Good grief. And Michael, what's your role within that organization? What do you do?

Michael Lintonen (05:16)
A little bit of everything. Once we get outside of the sales realm, we're rather fond of giving everybody as generic a title as possible, because we all have to be jacks of all trades. So my role really is all things technology, end to end, whether that be the data infrastructure, the Azure cloud or on-prem infrastructure, the iPads we deploy out, the MDMs for the apps, data administration, website coding. Whatever needs work that my team doesn't have the bandwidth to do is what I'm working on at the end of the day.

Matthew Stibbe (06:02)
And in your team, how big's the team? What's the composition of that?

Michael Lintonen (06:06)
We are a very small team, relatively, though there are quite a few companies our size running with about the same. As a bit of a preface: the size of our company is about 125 employees, and my team is currently me plus three. We're probably going to have to add at least one headcount to keep this running, but that's where we're sitting right now. It's just me plus three.

Matthew Stibbe (06:31)
Still quite a lot of brain power there. And you mentioned to me before the interview that part of your job is to derive a complete picture of the business to assist with informed decision making, which is great. So apart from the data that is coming in from or going out to GE, what's the data landscape that you're dealing with? What kinds of data is in your matrix?

Michael Lintonen (07:00)
We are a little unique in that we generate very little of our own data. Only a single-digit percentage of the data we store is actually generated by us. Everything else comes in from outside, whether that be something as simplistic as postal codes just to feed the territory mapping, or marketing lists, or things like that.

But a lot of it is coming from whoever's contracting us. So we are pulling in whatever we can pull in from wherever we can pull it in. Maybe we're lucky enough to negotiate a connection to an API where we can pull in something with a little more consistency and predictability. But a lot of times when we're dealing with this stuff, we don't have any access to whoever is running whatever they're storing their data on. All we have access to is a data operator of varying skill set who is doing some sort of manual operation and shipping us a flat file in some random format with, again, interesting variations in quality.

Matthew Stibbe (08:10)
What kinds of data is this? Customer data or prospect data or product information?

Michael Lintonen (08:17)
Yeah, we are lucky in that the data coming in right now is all cleansed, so we don't have to worry about any sort of compliance there, PII or anything else. It makes our lives a lot easier. But this data could be anything. It really is all around the sales lifecycle and the customers. So we are tracking as much as we can track about the customers, what is going on out there, what changes are affecting purchasing. A good example: go back five years, and PET cameras, the nuclear medicine cameras, nobody's buying those things. They buy a new one about every quarter of a century, because there was no real particular need for them; there isn't a lot of healthcare that requires the scans provided by that piece of equipment. Well, now we have an Alzheimer's drug, and that drug comes with the stipulation that you have to get images done with a PET camera at certain intervals to measure its effectiveness. So suddenly the PET market went from nothing to, my God, it's the biggest thing in our business. So there are things we're bringing in to try to measure and find that stuff, but a lot of it is around the customers and what is affecting their purchasing.

Otherwise, once we have that account data, that's kind of the center of everything. From there we're looking at the sales lifecycle, essentially cradle to grave, opportunity all the way through to revenue. In our world, a deal is not final until that piece of equipment is bolted to the floor. So we'll get some commission when we close a deal, but that commission could get pulled back if they cancel the deal later.

And that can happen because, if you're talking about MR, you're not going to knock a hospital wall out and create something in a week. That's going to take a year and a half plus. In that timeframe it could be canceled and the commission could be clawed back. So we are always very closely monitoring the backlog of installs, things like that.

Matthew Stibbe (10:30)
Right. How easy is it these days, with all this complexity and all these systems and the open source stuff and the data coming in and out, to explain what you do to non-techy decision makers, to boards and executives?

Michael Lintonen (10:50)
Yeah, that is our challenge. It is becoming increasingly difficult to explain this in layman's terms in a way that's impactful, where you can actually communicate the true impact of a change that's being made and why you are making it. And it is also very difficult to translate something in a way where you don't lose the amount of effort that was required to actually execute on it.

There are things we do that, at the end of the day, look simplistic. It's like, it's a web app, how hard was that? It was very hard. You don't want to know. And trying to actually translate that is difficult. You're also always trying to avoid using any jargon, which is increasingly difficult. I mean, if you have a CFO or someone walking into your office when your app is down and they're yelling at you, you can't just say, well, you know, the WARP client was having some issues trying to get through the Cloudflare tunnel into the droplet, over Guacamole, to hop into a Dockerized container to run the WAR file or something. Yeah, you'd better be writing a resume if you tried to do that.

Matthew Stibbe (11:56)
Yeah, they don't want to hear that. So let's dive into a specific data project and explore that a bit. You mentioned that you had to spin up some new processes and data systems to support the GE ultrasound sales. Could you talk a little bit more about that project and perhaps start with the goal of the project?

Michael Lintonen (12:37)
Yeah, that was a high stress one. We literally had maybe 90 days total to ramp up the entire business - everything: hiring, structures, the entire thing, all the way from leadership down. We had about 90 days. So first of all we had to identify what the core components actually are to even support this business, and what we could recycle from existing solutions.

And what are their metrics? Their KPIs are obviously different: different lifecycle, different selling cycles, all different things. So there was a lot of rapid development, needing to build and prototype things out fast. And I know a lot of people like to pan Excel and say you shouldn't be using Excel, but Excel, at the end of the day, is still one of the best prototyping solutions. A lot of people forget about Power Query and the M scripting language under the hood of that thing. It's great for building out the initial prototypes: how are you actually going to connect your data together? Where is the data going to come from? How can you connect it together? What can you build out of that in terms of cubes or other things? What kind of tabular models can you build?

From that, we were able to build some initial prototypes relatively quickly before the actual business had to be online. And then, because it's in Excel and it is in that M language, we were able to quickly transition it over to Power BI. For the most part the engines are relatively close. Strangely, there are some deltas in the parity between the engines, because Power BI is the golden child at Microsoft and Excel just gets features shoved back down to it when they feel like it. But we were able to move most of that over and get initial Power BI dashboards up, and those became the initial proof of concept.

So, okay, we have these dashboards: do these metrics work? Are these metrics telling us what we need to see? And also, what are the core metrics? You don't want two dozen KPIs you're trying to monitor; you're going to fail. You need the KPIs that are the core drivers of your actual success, and you need to keep that down under a half dozen. They're the big hit items. And we used the Power BI dashboards to help us identify what those were.

And now we are working through: okay, we've built prototypes, we've built these proofs of concept, now we need to build the final solutions. We need to build the kind of systems we have in place for the DI business. That is where the more robust ETL pipelines come in, where the final database architectures come in, where the final web applications come in. That's where the big-lift operations start occurring. But at the end of the day it's always: don't try to get it perfect to begin with. Don't let perfect be the enemy of good. Just get it functional, get the proofs of concept out there, start playing with them, see what works, and learn and grow and continue on. Then worry about trying to build something better. But don't try to build perfect first.

Matthew Stibbe (15:59)
I'm interested in this idea of Excel as a prototype. So this is what I'm imagining is happening: you've got tables and tabs in Excel that are effectively your initial data tables and the relationships between them, so it's almost like a data structure inside Excel with some data in it. And you're using that to go, if I graph this and I report that and I sum that up, that's going to give me the basic reporting. Am I imagining that right, or am I just stuck in the 90s?

Michael Lintonen (16:30)
There's a whole different side of Excel that most people don't know about. It kind of had its moment several years ago, and then it faded from the consciousness of all the data engineers, but it still is there, and it still is excellent. It is this whole different side of Excel that is Power Query and Power Pivot. Power Query is an entire separate component, and unfortunately it's now only accessible if you have the enterprise licensing for Excel, but it is a full scripting engine. This isn't the old-school VBA or anything else; this is a whole separate programming language, the M language, where you are able to write calls to Salesforce APIs, where you are able to blend and wrangle data through the back end, through this scripting system, and essentially create your Lego blocks. And then with those Lego blocks, you can use Power Pivot to build the data model.

And comically enough, Power Pivot has actually been there for eons and nobody really noticed it until the last eight or ten years. But essentially, it's as if you've ever modeled out a database and stuck your little tables up on a screen and connected your primary keys and foreign keys.

That is Power Pivot. It has that interface behind the scenes to allow you to build these models. And once you've built the models, you can do things you can't normally do with Excel. You can run pivots on multiple tables instead of a single table. You can build the bane of everybody's existence, DAX measures. Those things will drive you batty, but they are powerful. Instead of just putting a value from the source into a cell in a pivot table, a DAX measure lets you write a formula that merges the two data sets, does some math on them, and sticks that result into the cell.
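
For readers who don't live in Excel, here is a rough sketch of the pattern Michael describes, approximated in Python with pandas rather than Power Query, Power Pivot and DAX; the file, table and column names are invented for illustration and are not from the episode.

```python
# Illustrative only: the "query -> data model -> measure" pattern,
# approximated in pandas. In Michael's setup this happens in Excel
# (Power Query + Power Pivot + DAX), not Python.
import pandas as pd

# "Power Query" step: pull in two sources and tidy them up
orders = pd.read_csv("orders.csv")       # one row per deal (hypothetical file)
accounts = pd.read_csv("accounts.csv")   # one row per customer account

orders["postal_code"] = orders["postal_code"].str.strip().str.upper()

# "Power Pivot" step: relate the two tables on a key, like drawing the
# primary key / foreign key line in the data model
model = orders.merge(accounts, on="account_id", how="left")

# "DAX measure" step: a calculation that blends both tables, e.g. revenue
# by territory, which needs the account's territory and the order amount
revenue_by_territory = (
    model.groupby("territory")["order_amount"]
         .sum()
         .sort_values(ascending=False)
)
print(revenue_by_territory.head())
```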

Matthew Stibbe (18:36)
How do you then translate that actually very sophisticated and robust-sounding prototype into a larger-scale production environment? I mean, what are the tools and the transition you need to go through to scale that up?

Michael Lintonen (18:55)
Unfortunately, a lot of it is not one-for-one compatible. M is its own unique creature, but it does have a lot of similarities where you can translate things over if you have knowledge of the two systems. The main thing is it gives you the main transformations you need and the main queries you need. So once you've built and prototyped this, you know what your filter criteria have to be on that source. You know what your cleaning routines have to be on that source. You know what your merge routines have to be on that source. You know what is missing that you have to fill in from that source. It allows you to figure that out and figure out the logic to do it.

Once you've figured out the logic in that scripting engine, even though it's not one-for-one compatible, you're like, okay, maybe they're using if logic here or case statements there. It can be translated fairly easily; it's not a complete rewrite. And that's what we are doing. We just take those if statements and maybe move them into CASE statements if we're going to SQL, or into if-else statements if we're going to the Java side. That's kind of the transition there.
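
To make that porting concrete: as a purely illustrative sketch (the rule, thresholds and names below are invented, not from the episode), a conditional worked out in a Power Query prototype keeps the same shape when it moves into production code.

```python
# Hypothetical example: a deal-size banding rule of the kind you might
# discover while prototyping in M (if ... then ... else if ...), re-expressed
# in the production codebase. In SQL this becomes a CASE expression; in Java,
# an if/else chain. The logic ports over; only the syntax changes.
def deal_size_band(order_amount: float) -> str:
    if order_amount >= 1_000_000:
        return "big iron"
    elif order_amount >= 100_000:
        return "mid"
    else:
        return "small"
```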

Matthew Stibbe (20:27)
Where does CloverDX fit into this picture?

Michael Lintonen (20:32)
Yeah, so CloverDX is our... we use it instead of SSIS, because the problem in our world, and the problem for a lot of people who have to deal with externally provided data, is that you don't want these black-box solutions like SSIS. Okay, it's not a black box when you're building it, but once you compile it into a package and are using it, it's a black box.

Matthew Stibbe (20:57)
That's the Microsoft ETL thing, right?

Michael Lintonen (21:01)
Yeah, SSIS, sorry, that is SQL Server Integration Services, their ETL solution. I really do not like that thing; I'm just not a fan of it. Because, once again, it's great when it works, but when it dies, it's a black box and you don't know why. And that's where Clover is so critical. Clover is not quite as simple as, one of the Microsoft engineers back in the day coined the term for these Excel Power Query systems, a 'clicky-clicky, draggy-droppy' type of interface where you just click, drag and drop things. Clover's not quite that, but it's close enough. And when it fails, you can watch the entire flow of the data, and you can build it in a very modular fashion.

So if an input changes, you can build your solutions in a way that that input's ingestion is a separate component. If it fails, it's isolated to that component and I can fix it there. I don't have to rewrite code downstream; I can fix it in one place. Similarly, I can recycle code, because that's a lot of the issue with some of this stuff. The number one rule in coding: don't write the same code twice. Write it once, reuse it. And that's where Clover is key. It allows us to write libraries, so we can write libraries of cleansing routines for postal codes, phone numbers, really annoying whitespace and non-ASCII stuff that nobody knows is there but, my god, will just completely destroy your database. We can write cleansing routines for all of that.

And if we find some new little gem in there that the cleansing routine misses,  we just change it one time in that one piece of code. And now all of the other components using that cleansing code have that new function. We don't have to repeat work. So it's been absolutely critical. And it's also unbelievably powerful. The ability of it to process vast quantities of data rapidly is quite astounding.
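
To make the "write it once, reuse it" idea concrete, here is a minimal sketch of what a shared cleansing library can look like, written in Python purely for illustration; in the setup Michael describes, these routines live as reusable CloverDX library components, and the specific rules below are assumptions, not his actual logic.

```python
# Minimal sketch of a shared cleansing library (illustrative only; in
# Michael's environment these are CloverDX components, not Python).
import re
import unicodedata

def strip_invisible(text: str) -> str:
    """Normalise odd Unicode (non-breaking spaces and similar) and strip
    zero-width characters that silently corrupt downstream loads."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"[\u200b\u200c\u200d\ufeff]", "", text).strip()

def clean_us_postal_code(raw: str) -> str | None:
    """Keep 5-digit or ZIP+4 codes; return None for anything unusable."""
    digits = re.sub(r"[^0-9]", "", strip_invisible(raw))
    if len(digits) == 5:
        return digits
    if len(digits) == 9:
        return f"{digits[:5]}-{digits[5:]}"
    return None

def clean_phone(raw: str) -> str | None:
    """Normalise North American phone numbers to a bare 10 digits."""
    digits = re.sub(r"[^0-9]", "", strip_invisible(raw))
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None
```

Fix clean_us_postal_code once, and every pipeline that imports it picks up the change, which is the same economy Michael describes getting from Clover libraries.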

And one of the things I have not been able to explain is that, strangely, SQL Server is more fond of Clover making queries than it is of you directly querying it. This boggles my mind a little bit. For our web apps we have to use dynamic SQL, and if a web app makes a call against a dynamic SQL stored procedure, the plan that gets created for that direct call will be worse than the plan that's created when we go through the Clover API and do the same thing. So if we put in the extra step of using Clover to talk to it, we will get a faster response than if we just go directly. I don't know how that works.

Matthew Stibbe (24:07)
There's probably some genius coding somewhere in Czechia. Yeah, the clever guys there. Moving on, one of the things we discussed before was how increasingly difficult it is to maintain these complex spider webs of applications and systems and data flows, and to keep them secure. And you've been at this now for more than a decade. I'm interested in how the landscape has changed over time and what the challenges are today.

Michael Lintonen (24:44)
Yeah, the problem is the landscape is... I'm not sure how else to refer to it other than it's essentially rewilding. It's becoming more feral as time goes on. They are just changing for the sake of change at times. It doesn't accomplish anything other than creating more work for us. And it's like, why did you do that? Because we could.

But it didn't actually improve anything.

Matthew Stibbe (25:14)
You're talking about vendors changing?

Michael Lintonen
It's a lot of stuff like, you know, Microsoft is pretty infamous for this. They just decide, yeah, that way you license people in bulk in your environment, we're going to move it over here. And by the way, we're not going to give you any tools to do the migration; you're going to do it manually. And by the way, some licenses still don't work, so those people, yeah, you're just stuck. They gave us 30 days' notice on that. And it's like, well, that's nice of you.

Michael Lintonen (25:43)
It didn't improve anything. In no way was it a benefit to us; it just created more headaches. So we are constantly trying to keep up with what the vendors are changing on us and trying to adapt to those changes. We are constantly trying to figure out what the correct tooling is for what we're doing. Are we using the right stuff? Are we following best practices? That's also quite challenging to verify, whether we're using stuff as it should be used, because when you're dealing with some of these companies like Microsoft, they're shedding headcount. This started maybe 10 or 15 years ago, when they decided they don't need quality assurance anymore; the engineers are good enough, they can QA-check their own code. Eh, not really. But now it's, we don't need tech writers either; the engineers can write their own technical documents.

Matthew Stibbe (26:40)
Hmm.

Michael Lintonen
Yeah. Like a lot of large companies, Microsoft is kind of siloed, so now you have documents over here that don't agree with documents over there. So it's like, well, which one of you is more right and which one should we follow? There's a lot of detective work just to figure out where we're supposed to go with this and how we're supposed to build it out, and also to figure out what we can even maintain as a small team.

Matthew Stibbe (27:02) 
How do you triage that? How do you keep abreast of it?

Michael Lintonen (27:14)
It's ADHD, honestly. I don't think you can work in this field if you don't have that. You are just constantly consuming large quantities of information. You're watching Reddit. You're watching, you know, BleepingComputer and what they're posting out there. You're monitoring Twitter feeds. You're trying to look at the message center in Microsoft, and good luck with that at times. But yeah, you're just constantly listening to things, watching things.

Matthew Stibbe (27:47)
Just drinking from the fire hose constantly.

Michael Lintonen
Yes. Live on YouTube. Just watch every video come through.

Matthew Stibbe
Well, Michael, we're almost out of time, but as we bring this episode to a close, I'd like to ask you one final question, reflecting back on everything we've discussed and your experience. If you were going to go back in time and give yourself a piece of advice about data integration, what would it be?

Michael Lintonen (28:17)
There are actually a couple of things. One would be: build the sandcastle first. Don't worry about the mountain. It can be really overwhelming in this world. You're like, my God, look at this massive, huge Azure thing. Don't worry about it. Worry about where you can effect change first and build that thing. Get good at that thing, then worry about some of the others. Don't worry about the rest of it. And when it comes to the tools you're using to do this, keep it simple.

For me as a manager especially, when it comes to my team, they could really be referred to as makers. They're tech workers, but really they're artists. They are creating some wonderful things. So they are makers and you have to give those makers tools that they're going to use. So those tools have to be simple. They have to be low friction. If you're trying to bend a tool to your will, they're not going to use it. You have to make sure you're keeping it simple, keeping it low friction and that will allow your makers to succeed and allow them to grow.

It's kind of like one of the things I'm always telling people: fail fast, grow fast. Don't be afraid to fail. The faster you can fail, the faster you can grow. Try different things, learn and grow. And if you know more than me, then I hired the right people. Most of my people know stuff that astounds me every day.

Matthew Stibbe (29:28)
That's a great place to be. And may everyone have people under their management who know more than they do. And with that revelation, I think that brings this episode to a close. If you're listening and you'd like to learn more about CloverDX, or you want to get more practical data insights, please visit cloverdx.com/behind-the-data. Thank you for joining us, thank you for listening, and Michael, thank you very much for being a very interesting guest today.

Michael Lintonen (30:16)
All right, thank you very much.
