
Data Science pipelines and monolithic model blobs

Submitted by: @import:stackexchange-devops

Problem

One important topic in DevOps is how we automate the creation and delivery of software artefacts.

With the rise of data science there is a new type of artefact: monolithic binary blobs representing, for example, a trained neural net or another machine learning model. Such a blob can be many GB in size, and its creation is not yet standardized AFAIK, which brings organizations back to the pre-CI age. Nevertheless, such blobs have versions and associated collections of training data (corpora), which tend to grow rapidly as well.

What are the best practices for addressing this new challenge with DevOps methods, if that is even possible?

Solution

Personally I don't see any reason why an artefact repository - the recommended DevOps tool for managing artefacts - wouldn't be applicable to trained neural nets or other such artefacts.

A particular artefact repository might impose an upper limit on artefact size, but that would be a technical or policy limitation, not a fundamental one.
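To make the artefact repository idea concrete, here is a minimal sketch that treats a directory as a stand-in repository. The function names, the `<name>/<version>/` layout and the manifest fields are all assumptions made for illustration, not the API of any real repository product. The blob is hashed in chunks, so its size affects only runtime, not memory use.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so multi-GB blobs never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def publish_model(blob: Path, repo_root: Path, name: str, version: str,
                  corpus_version: str) -> Path:
    """Copy a model blob into <repo_root>/<name>/<version>/ next to a manifest.

    The layout and the manifest fields are hypothetical; they only mirror
    the kind of metadata an artefact repository usually records.
    """
    target_dir = repo_root / name / version
    # Refuse to overwrite: a published version is treated as immutable.
    target_dir.mkdir(parents=True, exist_ok=False)
    shutil.copy2(blob, target_dir / blob.name)
    manifest = {
        "name": name,
        "version": version,
        "file": blob.name,
        "size_bytes": blob.stat().st_size,
        "sha256": sha256_of_file(blob),
        # Ties the blob to the version of the training data it was built from.
        "corpus_version": corpus_version,
    }
    manifest_path = target_dir / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest_path
```

Publishing the same name and version twice raises `FileExistsError`, which is the behaviour you would normally want from a release repository as well.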

As for applying DevOps methodologies to the process that produces these artefacts, I think most if not all of them apply equally well, as long as the artefacts:

  • are produced from some sort of specification which supports change versioning (equivalent to software source code)
  • are built via a repeatable and automatable process
  • are validated using some sort of repeatable and automatable verification (similar to QA), possibly using some supporting data (training data in this case, equivalent to DB snapshots, for example)
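The three conditions above can be sketched as a toy pipeline. Everything here is an assumption made for illustration: the spec fields (`seed`, `sample_size`), the function names and the one-parameter "model" stand in for a real training job. The point is only that the spec is versionable, the build is deterministic for a given spec and data set, and the check is automated.

```python
import hashlib
import json
import random


def spec_version(spec: dict) -> str:
    """Derive a short version id from the canonical JSON form of the spec."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def build_model(spec: dict, training_data: list) -> dict:
    """Toy 'training': fit y = w * x by least squares on a seeded sample.

    Seeding the sampler from the spec makes the build repeatable: the same
    spec and the same data always produce the same model.
    """
    rng = random.Random(spec["seed"])
    sample = rng.sample(training_data, spec["sample_size"])
    numerator = sum(x * y for x, y in sample)
    denominator = sum(x * x for x, _ in sample)
    return {"weight": numerator / denominator,
            "spec_version": spec_version(spec)}


def validate_model(model: dict, holdout: list, max_error: float) -> bool:
    """Automated check: mean absolute error on held-out data within a limit."""
    errors = [abs(model["weight"] * x - y) for x, y in holdout]
    return sum(errors) / len(errors) <= max_error


if __name__ == "__main__":
    spec = {"seed": 7, "sample_size": 5}            # lives in version control
    data = [(float(x), 2.0 * x) for x in range(1, 11)]
    model = build_model(spec, data)
    assert model == build_model(spec, data)          # repeatable build
    assert validate_model(model, data, max_error=1e-6)
    print(model)
```

In a real pipeline the CI job would run the build and validation steps on every change to the spec and publish the result only if validation passes.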



Side note: delivering monolithic software is still a big deal and is perfectly manageable with DevOps methodologies (with a bit of care); not everything can be split into microservices. Size alone does not make DevOps inapplicable.

Context

StackExchange DevOps Q#1999, answer score: 9
