Row-Level Security: A Cautionary Tale

7 minute read

Row-level security is a common requirement for people trying to control access to data. Some systems provide this natively, but when it's not provided, people often roll their own using the tools they have—with mixed results

In this post we'll explore a common way to implement row-level security on top of a relational database and see why it may not be as secure as it looks.

A Pop Quiz

But before we get to the crux of the issue, here's a quick quiz. I promise it's relevant.

What will each of the following languages do when a is equal to 0​?

  1. C, C++, C#, Java, and most other C-family languages:
    if (a != 0 && 1/a > 0) { /* Do something */ }
  2. Pascal:
    IF a <> 0 AND 1/a > 0 THEN (* Do something *)
  3. SQL:
    SELECT *
    FROM T
    WHERE a <> 0 AND 1/a > 0

Obviously, I'm asking about short circuiting behavior. I'll let you ponder and reveal the answers in a moment. But first, back to row-level security.

The Setup

Imagine that we have a table of sensitive customer information:

customers:

id  ssn        balance
--- ---------- --------
1   123456789  150.00
2   234567890  250.00
3   345678901  350.00
4   456789012  450.00
5   567890123  550.00
6   678901234  650.00
7   789012345  750.00
8   890123456  850.00
9   901234567  950.00

(Apologies if I've exposed your SSN...)

We want to provide our employees access, but only to their customers, not the entire set.

A common way to do this on a system that doesn't have built-in row-level security is to (a) add a security table that expresses which rows each user is allowed to see, (b) build a view that uses this security table to restrict the rows that each user sees, and (c) force everyone to access the data through the view.

So, I first create a security table that maps users to the customers they can see. E.g.,:

access:

uid    cid
------ -----
alice  1
alice  2
alice  3
isaac  4
isaac  5
isaac  6
bob    7
bob    8
bob    9

This means, e.g., that isaac should only be able to see customers numbered 4, 5, and 6. To enforce this we create a view:

CREATE VIEW sec_customers
AS
SELECT *
FROM CUSTOMERS
WHERE id IN 
   (SELECT cid FROM access WHERE uid = USER_NAME())

I'm showing this with SQL Server, so I'm using the built-in function USER_NAME() to dynamically modify the query based on the user who accesses the view. The specifics here will vary system-to-system, but you should be able to accomplish something similar.

We'll restrict access to the base table, and let users only come in through the view.  Now when isaac selects everything from sec_customers, all he sees is:

id  ssn        balance
--- ---------- --------
4   456789012  450.00
5   567890123  550.00
6   678901234  650.00

The Punchline

Pretty good! But before we celebrate, let's look at the answers to our quiz:

  1. In most C-family languages, compound predicates like this short circuit: the system will check whether a != 0 and only execute the 1/a > 0 bit if a isn't zero. The body of our conditional won't be executed, but life will go on as usual.
  2. In Pascal there is no short circuiting: the system will always execute all of the parts of the compound predicate. So if a is zero, the system will throw a divide-by-zero error.
  3. In a particularly awesome twist of semantics, SQL short circuits but doesn't guarantee order of operations. So this code may execute fine if it tests a <> 0 first, or it may throw an exception if it tries the division first—you're at the whim of the optimizer.

What does this have to do with row-level security? As I mentioned when discussing extract types in Tableau, when you write a query against a (virtual) view in SQL, your query is composed with the view query, and this whole thing is then optimized. But SQL doesn't generally respect the order of the operations you've written down, and this disregard runs deep. There is no "query boundary" when you compose queries: your operations can get shuffled around anywhere in the plan.

And that's a problem when it comes to security.

To illustrate, let's try another query against my "secured" customer table. I'm going to guess the Social Security number of a customer that I shouldn't have access to, and see what I can find with a little SQL.

If I guess incorrectly, everything works as we'd expect:

SELECT *
FROM sec_customers
WHERE 1/(ssn - 789012346) = 0

id  ssn        balance
--- ---------- --------
4   456789012   450.00
5   567890123   550.00
6   678901234   650.00

But if I guess correctly:

SELECT *
FROM sec_customers
WHERE id = 7 AND 1/(ssn - 789012345) = 0

id  ssn        balance
--- ---------- --------
4   456789012  450.00
5   567890123  550.00
6   678901234  650.00
Msg 8134, Level 16, State 1, Line 16
Divide by zero error encountered.

And now I know that a customer has an SSN of 789012345: I've leaked information that I shouldn't have leaked. And with a little work, I may be able to narrow this down to a particular customer.

What happened? Looking at the query plan for this query it becomes more clear:

pushdown

The syntax implies that customers I don't have access to will be filtered out before they hit my query. But the optimizer has reordered the operations: the security filter is enforced by the join, and my predicate has been "pushed down" and folded into the table scan. The result is that the predicate sees the entire customer table, which results in an information-leaking exception.

Coda

I've shown this with SQL Server, but this isn't a criticism of that system. Aside from the specifics around identifying the user, the mechanism I've shown is likely to apply to any database with an optimizer that can rearrange operations—which is to say any database worth it's salt.

That said, while SQL Server did add first-class row-level security in SQL Server 2016, it's likely that it's using a similar mechanism under the hood. As Microsoft's notes:

Carefully crafted queries: It is possible to cause information leakage through the use of carefully crafted queries. For example, SELECT 1/(SALARY-100000) FROM PAYROLL WHERE NAME='John Doe' would let a malicious user know that John Doe's salary is $100,000. Even though there is a security predicate in place to prevent a malicious user from directly querying other people's salary, the user can determine when the query returns a divide-by-zero exception.

Is this a worry? Well, I suppose it depends. But I'd be reluctant to rely on this mechanism if I had a real concern about exposing data.