HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Teradata to drive responsive web application

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
applicationresponsivewebteradatadrive

Problem

Does anyone have experience using Teradata to drive a typical, responsive client-side web application? What is your evaluation? Specifically, how well does it perform in single-read and single-write?

Single-read: The total time-to-first-byte between a request being sent and a response being received to read a single document or row that contains meaningful information for a webapp. For example, consider a table called profiles with 100,000 records in which various attributes are stored on each row. How much time does it take to get a single profile up to a client-side webapp by querying on a primary key? How much time does it take to do the same operation, querying on a non-indexed field?

Single-write: How much time does it take to insert a new row or update an existing row. Assume that all indexes that are needed to perform the single-read quickly need to be written during a single-write operation.

If you have built a client-side web application with a Teradata backend, what sort of challenges did you face? I have seen Teradata as primarily an analytical and data warehousing tool rather than an operational database, but I am being asked to investigate Teradata as the backend for an operational web application.

I would like to find out if anyone has tried this, and whether it could keep up with more commonly used databases for webapps. I'm not looking for opinions, but for honest evaluations of how the technology works in this unusual use-case by people who have tested it, and for people's experiences using Teradata to drive a responsive webapp.

I understand there are many variables here, but I'm looking for real-world examples of Teradata being used as a backend for a client-side webapp, and how it performs in practice. Thus I'm more interested in TTFB than I am in DBS time, though a thorough breakdown of how long each piece of a transaction takes would be appreciated.

My previous question, Teradata REST API performance metrics, was specifically about Tera

Solution

as I said in my answer to this question, Teradata wouldn't be my first choice for a web backend:
it really shines on moving/elaborating large quantities of data, but the typical workload of an application (frequent small queries) is not the best scenario for it.

Obviously there are other voices (see here) but there is quite an agreement about some critical points:


Teradata locking behavior is the primary challenge for an OLTP style
application.

and:


use as much cache as you can: the Teradata connection is a precious resource
you'd better not use for information that doesn't need to be realtime

It is clearly not impossible to implement such a solution on Teradata, but it requires your application to adapt fully to Teradata peculiarities (e.g. you could have to create a non-trivial application data model, to strive for single-AMP queries)

An example of the challenges you're going to face:

in an application I had to update a flag field on a set of rows in a table, chosen with a condition that was not the Teradata "primary index". However, there was an index on the columns used in the condition.

Usually, I would have used a simple "update": however, in Teradata, this would have created a "table-level lock".
At the end it turned out it was "better" to select all the rows in the application and do all the updates (in batch) one by one by primary index, to use the "hash-level" locks.

=========================================================

Edited: after the additional info from OP

First the good news:

Mostly read-only
since it is a mostly "read only" application, you can setup an efficient caching system. You could also "pre-cache" some common content so that the hit of first querying it won't be apparent to the user.

Why that's important in this case:
connections in Teradata can't be created over a specific number for appliance. That's a pool wìthat will be shared between users in the application, the ETL and the normal OLAP users.
In addition, queries that are not single-AMP won't have a very good latency, and they'll be more subject to the instan load of the system

Fine-grained queries

It might be a good thing if in this way you're able to do mostly single-AMP queries.

Why that's important in this case:
single-AMP queries (the ones querying/insert/deleting using the primary key as criteria) have lower lock impact and can be managed without paying the cost of coordination between the AMPs. On the other side, having such a "chatty" protocol wil occupy more the connections.

Materialized tables

It's good that you are already thinking about materializing some data. You might even find out that it will be useful have the same data materialized in two or more tables with different primary indexes, to satisfy your user interface (e.g. if you have different selection criteria on the same entity). However this will have a cost in terms of ETL time and disk space.

Now, the bad news:

More than 1000 users
As the application is not "completely" read-only, this number of users will probably create access conflict in the cases when they have to write.

Why that's a problem:
Forget multi-versioning: here readers block writers and viceversa. Your select might stop the update of another user from completing. If you are not able to use primary index for every update you do, you'll be locking the full table, and even if you use primary index, you are not going to have a row-level lock (you have an "hash-level".. normally it works ok, but it can have some issues with batch inserts). There's a workaround this conflict betwen readers and writers (LOCK ROW FOR ACCESS) but it opens the door to dirty reads, and you should carefully think if you can accept this in your application

Web application and OLAP on the same machine

Independently from the underlying technology, it's always a challenge to accomodate these two different workload on the same platform.

Why that's a problem:
Teradata has some workload management rules that can help in this field, however they need a careful tuning from DBAs and, however, extraordinary load (e.g. end of month batches) will likely have an impact on the user experience of the application

Hourly ETL

The hourly ETL will be a stress point for the application, since you'll be loading the same tables that you are querying. Since you cannot afford to lock the readers, you'll probably resort to "dirty reads", but you'll have to find some policy to manage the fact that the application could have temporarily inconsistent data.

Why that's a problem:
Already said: table locks, dirty reads and the presence of two different workloads

Number of queries

I know I said that "finer grained queries" can be a good thing, but here we're talking of 100 queries per user for a page, and a minimum of 20 queries per user action. Now, if you execute them serially, you're increasing latency (and you're looking for sub-second response time); but if you parallelize them, you're occupyi

Context

StackExchange Database Administrators Q#153026, answer score: 4

Revisions (0)

No revisions yet.