HiveBrain v1.2.0
Get Started
← Back to all entries
snippetMajor

Why don't databases create their own indexes automatically?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whydatabasescreateautomaticallyindexesowndontheir

Problem

I would have thought that databases would know enough about what they encounter often and be able to respond to the demands they're placed under that they could decide to add indexes to highly requested data.

Solution

Update

This is now implemented in SQL Server Azure. It generates recommendations

and index management can be configured to be automatic.


Enable automatic index management


You can set the SQL Database Advisor to implement recommendations
automatically. As recommendations become available they will
automatically be applied. As with all index operations managed by the
service if the performance impact is negative the recommendation will
be reverted.

Original Answer

Some databases do already (kind of) create indexes automatically.

In SQL Server the execution plan can sometimes include an Index Spool operator where the RDBMS dynamically creates an indexed copy of the data. However this spool is not a persistent part of the database kept in synch with the source data and it cannot be shared between query executions, meaning execution of such plans may end up repeatedly creating and dropping temporary indexes on the same data.

Perhaps in the future RDBMSs will have the capacity to dynamically drop and create persistent indexes according to workload.

The process of index optimisation is in the end just a cost benefit analysis. Whilst it is true that humans may have more information about relative importance of queries in a workload in principle there is no reason why this information could not be made available to the optimiser. SQL Server already has a resource governor that allows sessions to be classified into different workload groups with different resource allocations according to priority.

The missing index DMVs mentioned by Kenneth are not intended to be implemented blindly as they only consider the benefits to a specific query and make no attempt to take account of the cost of the potential index to other queries. Nor does it consolidate similar missing indexes. e.g. the output of this DMV may report missing indexes on A,B,C and A,B INCLUDE(C)

Some current issues with the idea are

  • The quality of any automated analysis that does not actually create the index will be highly dependant upon the accuracy of the costing model.



  • Even within the field of automated analysis an offline solution will be able to be more thorough than an online solution as it is imperative that an online solution should not add large book keeping overhead to the live server and interfere with its primary purpose of executing queries.



  • The indexes created automatically in response to workload will necessarily be created in response to queries that would have found them useful so will lag behind solutions that create the indexes in advance.



It is probably reasonable to expect the accuracy of costing models to improve over time but point 2 looks trickier to solve and point 3 is inherently insoluble.

Nevertheless probably the vast majority of installs are not in this idealised situation with skilled staff who continuously monitor, diagnose, and anticipate (or at least react to) changes in workloads.

The AutoAdmin project at Microsoft Research has been running since 1996


The goal of this project is to make databases self-tuning and
self-administering by exploiting knowledge of the workload

The project home page lists several intriguing projects. One is particularly relevant to the question here


Another interesting problem arises when there is no DBA available
(e.g. an embedded database or a small business). In such scenarios, a
low touch continuous index tuning approach may become important. We
have explored solutions ...[in] “An Online Approach to Physical Design Tuning” in ICDE
2007.

The authors state


With increasingly common DBMS features like online indexes , it is
appealing to explore more automatic solutions to the physical design
problem that advance the state of the art.

The paper introduces an algorithm


Its main characteristics are:



  • As queries are optimized, we identify a relevant set of candidate indexes that would improve performance. This feature allows query


processing to continue in parallel with indexes that are built in the
background.

  • At execution time, we track the potential benefits that we


lose by not having such candidate indexes and also the utility of
existing indexes in the presence of queries, updates, and space
constraints.

  • After we gather enough “evidence” that a physical design change is beneficial, we automatically trigger index creations or deletions.



  • The online nature of our problem implies that we will generally lag behind optimal solutions that know the future. However, by carefully


measuring evidence, we ensure that we do not suffer from “late”
decisions significantly, thus bounding the amount of incurred loss


The implementation of the algorithm allows for throttling in response to changes in server load and also can abort index creation if during creation the workload changes and expected benefit falls below the point that it is deemed worthwhile.

The conclusion of the authors on the

Context

StackExchange Database Administrators Q#43772, answer score: 27

Revisions (0)

No revisions yet.