
Would elasticsearch or RavenDB be better for fueling a statistics engine/random forest?


Problem

(Note: this question exists on StackOverflow as well, but I thought it might get a better reception here. If this proves to be the better place, I'll close the other one, ask to migrate it, or link to this one. Also, if it doesn't really belong here, I'd be happy to delete it.)

I've been looking at the following NoSQL databases for the next phase of my project:

  • elasticsearch
  • RavenDB

elasticsearch positions itself primarily as serving advanced search scenarios, while RavenDB positions itself as a document-oriented database.

Primarily, the documents will be about videos. Each video has a natural ID, which will serve as the key of its document.

Around that key, I'll add other content in fields that won't necessarily be scalar or flat, because the information will come from a number of different sources with different structures.

For example, there will be content from the video provider's Atom feeds, blog posts that have the video embedded in them, and other pieces of data from a data warehouse project.

There is no set structure across all of the items (each of them will actually be very domain-specific); the only thing that will relate them is the natural key of the video mentioned above.
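To make the intended document shape concrete, here is a minimal sketch, assuming a plain in-memory dictionary stands in for the document store. The helper `merge_source`, the key `vid-123`, and the source names and fields (`atom_feed`, `blog_posts`, `warehouse`, `daily_views`) are all hypothetical; the only shared element is the natural video ID used as the key, with each source keeping its own nested structure.

```python
import json


def merge_source(store: dict, video_id: str, source: str, payload: dict) -> dict:
    """Attach one source's payload to the document keyed by the video's natural ID.

    Hypothetical helper: `store` is an in-memory stand-in for a document store.
    """
    doc = store.setdefault(video_id, {"id": video_id})
    # Each source keeps its own domain-specific, possibly nested structure;
    # a list allows several payloads from the same source (e.g. many blog posts).
    doc.setdefault(source, []).append(payload)
    return doc


store: dict = {}
merge_source(store, "vid-123", "atom_feed", {"title": "Intro", "tags": ["a", "b"]})
merge_source(store, "vid-123", "blog_posts", {"url": "https://example.com/post", "comments": 4})
merge_source(store, "vid-123", "warehouse", {"daily_views": [10, 25, 40]})
print(json.dumps(store["vid-123"], indent=2))
```

Keeping each source under its own key avoids forcing a common schema on feeds, blog posts, and warehouse data that share nothing but the video ID.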

That said, once I have this information in one of the above solutions, I'll want to do a number of things with it:

  • Cull it to help populate variables in a random forest in order to make classifications about the videos.
  • Provide general search on the videos (general free-text search, not based on the results of the random forest) through a web-based front end (ASP.NET MVC, if you must know).

There are some requirements:

  • I will more than likely be in an ASP.NET shared web hosting environment. This means I'll have one machine and won't be able to set up a service. Something embeddable will be very helpful.
  • The ASP.NET environment will be hosted in IIS, so the embeddable component will have to survive app-domain recycling.
  • I'll want to create new indexes based on the results of the statistical analysis whic…

Solution

RavenDB embeds quite nicely into a .NET application and also lets you create full-text (embedded) Lucene.NET indexes. Given your hosting constraints, Elasticsearch won't be a viable option, since you'd need to run it as a service alongside your MVC application.
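For readers unfamiliar with what a full-text index does, here is a toy inverted index as a conceptual illustration only. It is not how Lucene.NET or RavenDB store their indexes, which are far more elaborate. The tokenizer, the AND-only query semantics, and the sample titles are all assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Very rough analysis step: lowercase and split on non-alphanumeric runs.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "vid-1": "Cats playing piano",
    "vid-2": "Piano lesson for beginners",
    "vid-3": "Cats and dogs",
}
index = build_index(docs)
print(sorted(search(index, "cats piano")))  # ['vid-1']
```

The point of the structure is that a query only touches the posting sets for its terms instead of scanning every document.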

Lucene.NET does not support facets out of the box, but RavenDB comes to the rescue here too:

http://ravendb.net/documentation/faceted-search
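Faceting means counting, for the documents matching a query, how many carry each value of some field. This lets a front end show refinements such as "Provider: YouTube (1)". The sketch below shows the idea in plain Python. It is not RavenDB's faceting API, which is configured as described in the linked documentation. The sample documents, the `provider` and `tags` fields, and the `facet_counts` helper are hypothetical.

```python
from collections import Counter


def facet_counts(docs: list[dict], hits: set[str], field: str) -> Counter:
    """Count how many of the matching documents carry each value of `field`."""
    counts: Counter = Counter()
    for doc in docs:
        if doc["id"] in hits:
            value = doc.get(field)
            # Multi-valued fields (lists) contribute one count per value.
            values = value if isinstance(value, list) else [value]
            counts.update(v for v in values if v is not None)
    return counts


# Hypothetical sample data.
docs = [
    {"id": "vid-1", "provider": "YouTube", "tags": ["music", "cats"]},
    {"id": "vid-2", "provider": "Vimeo", "tags": ["music"]},
    {"id": "vid-3", "provider": "YouTube", "tags": ["cats"]},
]
hits = {"vid-1", "vid-2"}  # e.g. the IDs returned by a full-text query
print(facet_counts(docs, hits, "provider"))
print(facet_counts(docs, hits, "tags"))
```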

RavenDB also gives you good control over your Lucene.NET analyzers:
http://ravendb.net/documentation/how-indexes-work
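Controlling analyzers matters because different fields need different tokenization. An ID should match only as a whole value, while free-text titles should be lowercased and stripped of stop words. The Python sketch below loosely imitates that per-field choice, in the spirit of Lucene's `KeywordAnalyzer` versus a standard text analyzer. The function names, the stop-word subset, and the field-to-analyzer mapping are illustrative assumptions, not RavenDB configuration.

```python
import re
from typing import Callable

STOP_WORDS = {"a", "an", "and", "the", "of", "for"}  # illustrative subset only


def keyword_analyzer(text: str) -> list[str]:
    # Treat the whole value as a single token, e.g. for IDs or exact-match fields.
    return [text]


def simple_text_analyzer(text: str) -> list[str]:
    # Lowercase, split on non-alphanumerics, and drop stop words: suited to free text.
    tokens = re.split(r"[^a-z0-9]+", text.lower())
    return [t for t in tokens if t and t not in STOP_WORDS]


# Hypothetical per-field mapping, analogous to choosing an analyzer per index field.
FIELD_ANALYZERS: dict[str, Callable[[str], list[str]]] = {
    "video_id": keyword_analyzer,
    "title": simple_text_analyzer,
}


def analyze(field: str, text: str) -> list[str]:
    """Run the analyzer configured for `field`, falling back to the text analyzer."""
    return FIELD_ANALYZERS.get(field, simple_text_analyzer)(text)


print(analyze("video_id", "VID-123"))                # ['VID-123']
print(analyze("title", "The Art of the Cat Video"))  # ['art', 'cat', 'video']
```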

Disclosure: I'm the author of NEST, the Elasticsearch .NET client, so if anyone were going to try to sell you Elasticsearch, it would be me :)

Context

StackExchange Database Administrators Q#8101, answer score: 5
