Protecting Location Data at The Ezrat Nashim Database
My misadventures with coordinate jittering and rate limiting
Since the early stages of my work on The Ezrat Nashim Database (https://www.ezratnashim.com/), I had a nagging thought in the back of my head that I wasn’t quite sure how to address. I was building what I hoped would become a large, public database of synagogues, with exact geographical coordinates for each one - a potential gold mine for someone with more nefarious “side-projects” in mind. Sure, shul addresses are already easily available online. They’re not state secrets. But if the project succeeded, I would be making it easier to pull that data from my site in bulk. It was an issue I could ignore for the moment, but I knew I’d have to deal with it later.
A few months ago, I started pushing the project out to a wider group of test users and “later” finally arrived. One friend told me that she had “a concern, not about the software but about how it could be used in the wrong hands. Let's say an antisemitic person comes across it, it's basically an open map of Orthodox/observant shuls that could be targeted.” Though it’s worth noting that the project is not technically limited to Orthodox or observant shuls, she was right. It was time for me to figure out what to do about this.
I did not have experience with this sort of problem from my day-to-day work, so I started brainstorming, with some help from ChatGPT. My goal was to find a way to appropriately address the security gap without sacrificing the site’s usability too much. I did not need to completely obscure the shuls’ coordinates. That would be overkill. I just had to come up with a way to rearrange things so that it wouldn’t be any easier to pull those coordinates from my site than it would be to get them from somewhere else.
Out of that brainstorming session came two core ideas, which I’ll explain in more detail below, along with my misadventures in attempting to implement them: (1) coordinate jittering and (2) rate limiting. Essentially, I would redesign the site so that the map on the homepage would only show approximate (or “jittered”) locations. Exact shul coordinates would only be accessible one at a time, from specific URLs. Once I had that set up, I could build a system to track requests made to those URLs and block suspicious-looking activity. Location data would still be available, but it would be better protected from someone trying to retrieve it in bulk.
Coordinate Jittering
Conceptually, the idea of blurring shul coordinates when they are shown in bulk and only exposing precise locations one at a time made perfect sense. On the main map I would still display all of the shuls in the database, but each pin would only show up within a few blocks of the shul’s actual location. I could even add a link to open the exact location in Google Maps so that users can see where the building is and how to get there. Whatever I did, I just had to make sure I stuck to my new rule of one shul per request. Practically, however, this “solution” led to a number of different problems, some of which I anticipated, and others I did not.
The first issue that I foresaw was that randomly jittering coordinates would inevitably create confusion in more synagogue-dense areas. One shul’s pin might be placed on the map closer to the actual location of a different shul, or multiple pins could all be jittered to be very close to each other, making it difficult to tell which pin is for which shul. To help reduce the confusion, I decided to update my jittering algorithm. If there were multiple shuls located near each other, I would group all of them into a single pin. Clicking on that pin would display information for the entire cluster of shuls. Of course, the cluster pin would still end up closer to one of the shuls than the others, but it would have a number on it to indicate that it represents multiple nearby shuls in order to make things a bit clearer.
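In rough terms, the jitter-and-cluster step can be sketched in a few lines of plain Python. The function names, maximum offset, and grid cell size below are illustrative choices of mine, not necessarily what the site actually uses:

```python
import math
import random

def jitter(lat, lon, max_offset_m=300, rng=random):
    """Shift a coordinate up to ~max_offset_m meters in a random direction."""
    # One degree of latitude is ~111 km; scale longitude by cos(latitude).
    distance = rng.uniform(0, max_offset_m)
    bearing = rng.uniform(0, 2 * math.pi)
    dlat = (distance * math.cos(bearing)) / 111_000
    dlon = (distance * math.sin(bearing)) / (111_000 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon

def cluster_pins(shuls, cell_deg=0.005):
    """Group shuls that fall into the same ~500 m grid cell into one pin."""
    cells = {}
    for shul in shuls:
        key = (round(shul["lat"] / cell_deg), round(shul["lon"] / cell_deg))
        cells.setdefault(key, []).append(shul)
    pins = []
    for group in cells.values():
        # The pin is jittered from one member's location, so it will still
        # land closer to that shul than to the others in the cluster.
        lat, lon = jitter(group[0]["lat"], group[0]["lon"])
        pins.append({"lat": lat, "lon": lon, "count": len(group), "shuls": group})
    return pins
```

The `count` field is what drives the number shown on a cluster pin.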
A Shul Lost at Sea: Adventures in User Testing
With my new jittering algorithm in place, I shared the site with a few more friends and asked them to try it out. The response was clear: this feature still needed more work. One person tried saving a shul in Netanya, a city on Israel’s Mediterranean coast. After setting the correct location in the form, she was redirected to the homepage with the map focused on the new shul’s pin. However, my jittering “feature” had sent that pin about half a kilometer into the Mediterranean.
Technically, this was not a bug: the algorithm was working as I had intended. But the interface felt broken, so it had to be fixed. After some more brainstorming, I landed on a solution that would address the pain point without violating the new one-synagogue-per-request rule. I would update the add/edit shul form so that when a user saves the form and gets redirected back to the map, the saved shul’s pin would show up at the exact location on a one-time basis. On page reload, the pin would return to its jittered location. Problem solved. Or alleviated, at least.
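The one-time exact pin can be modeled as a session flag that is consumed on first read. Here is a minimal sketch, with a plain dict standing in for the framework’s session and names that are mine, not the project’s:

```python
def after_save(session, shul_id, exact_lat, exact_lon):
    """After a successful save, flag this shul for one-time exact display."""
    session["exact_pin"] = {"shul_id": shul_id, "lat": exact_lat, "lon": exact_lon}

def pin_location(session, shul_id, jittered_lat, jittered_lon):
    """Return the exact location once, then fall back to the jittered one."""
    exact = session.pop("exact_pin", None)  # pop: cleared after the first read
    if exact and exact["shul_id"] == shul_id:
        return exact["lat"], exact["lon"]
    return jittered_lat, jittered_lon
```

Because `pop` removes the flag, a page reload naturally falls back to the jittered location.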
Rate Limiting
While I was sure there was still more I could do to reduce the usability cost of jittering shul coordinates, I decided I could leave that for another day. It was now time to move on to part 2 of my data security plan: rate limiting. The one-synagogue-per-request rule meant that going forward, if someone wanted to pull precise location data for a large number of shuls from my site, they would have to make a large number of requests: one per shul. I could track those requests, set up rules to automatically block them when they started looking suspicious, and escalate my response for repeat offenders.
From my initial research, it seemed that the django-ratelimit package was the right tool for the job (https://github.com/jsocol/django-ratelimit). In order to set that up, I first had to decide how I was going to track requests: by IP address, by user account, or through some other custom configuration.
At the time, I was giving users the option to either sign in with Google or to set up an account with just an email and password. In order to reduce onboarding friction, I did not require non-Google users to verify their email. That meant it was easy to create new user accounts, so if I tracked activity by account it would be easy to bypass my rules. I also wasn’t looking to overcomplicate things with a custom setup. So I decided I would track by IP address instead.
I finished configuring django-ratelimit, and then built a response escalation system on top of it. The first instance of suspicious activity from a given IP address would require users to solve CAPTCHAs in order to keep using pages that exposed shul coordinates. Continued abuse would land them with increasing timeout periods until eventually that IP address would be handed a permanent ban.
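The ladder itself boils down to a mapping from an IP’s violation count to a response. A sketch of the idea, with durations that are illustrative rather than the site’s actual values:

```python
def escalation_response(violation_count):
    """Map a violation count for an IP address to an escalating response.
    (The levels and durations here are illustrative.)"""
    if violation_count <= 1:
        return ("captcha", None)           # first offense: solve a CAPTCHA
    if violation_count == 2:
        return ("cooldown", 60 * 15)       # 15-minute timeout
    if violation_count == 3:
        return ("cooldown", 60 * 60 * 24)  # 24-hour timeout
    return ("ban", None)                   # permanent ban
```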
Was all of this going a bit overboard given the actual threat level? I wasn’t sure. But I figured it was better to err on the side of caution.
Man Plans, the Machine Laughs
This wasn’t the sort of feature that I could easily ask someone else to try out, so I put on my test user hat and started playing around to see if I could break it.
I did… sort of.
I went to the edit page of one of the shuls, where I had set up rate limiting because the form exposes the shul’s exact location, and started repeatedly refreshing the page until I hit the rate limit. As expected, the system detected a violation and responded accordingly. But it didn’t show me a CAPTCHA form, as I had intended for first-time offenders. Instead, it told me that I was in a cooldown period and that I would have to wait before accessing the edit page again. According to my plan, that was supposed to happen after two instances of suspicious activity, not one. I checked the database. Somehow two instances of abuse were in fact recorded. From a security standpoint, the end result was better: more abuse detected meant more restrictions for sensitive URLs. But it didn’t make sense, and I didn’t want to start creating more friction for legitimate users than I had intended for reasons that I didn’t understand.
After some investigating, I discovered that I had created a multi-layered problem. The bug was partly due to faulty logic that I had set up around CAPTCHA checking, and partly the result of a more fundamental misunderstanding of how rate limiting works, or at least of how the django-ratelimit package implements it. The rest of this post will focus on a more in-depth explanation of these two issues, how I fixed them, and one additional security improvement that I made along the way.
Faulty CAPTCHA checking
As I’ve mentioned, my intent had been that one-time rate limit offenders should have to solve CAPTCHAs in order to gain continued access to sensitive pages. That meant I now had to run two different checks every time a request for one of those pages hit my server:
1. Rate limit check: I had to check if the request constituted a rate limit violation for its IP address, and if it did, increase the violation count for that IP.
2. CAPTCHA check: I had to check if there was already a recorded violation for that request’s IP address. If there was and a CAPTCHA was not already solved, then I should block the request and redirect the user to the CAPTCHA form.
I had both of these checks set up, but they were in the wrong order. The rate-limit check was happening before the CAPTCHA check, when really the CAPTCHA check needed to be first so it could intercept requests that already had a violation recorded and prevent an extra one from getting logged.
While I was busy repeatedly refreshing the edit page, one of my requests eventually exceeded the rate limit, and with it my first violation was recorded. That request then hit the CAPTCHA check and failed, since I had not completed a CAPTCHA yet, so my server tried to redirect me to the CAPTCHA form. But I was refreshing so quickly that I had already pressed the refresh button again, with my browser still on the edit page, before the redirect completed; I never saw the CAPTCHA form. When the new request hit my server, the rate limit check ran a second time. This new request was also in excess of my rate limit, just as the one before it had been, so a second violation was recorded and I was immediately bumped to the second level of my escalation ladder: a cooldown period. Once I finally gave the refresh button a rest, I landed on the timeout page as if I had committed two violations because, as far as my database was concerned, that’s exactly what had happened.
If I had ordered the two checks correctly, my second request would have hit the CAPTCHA check first and failed, just as the first one had, and then my server would have redirected me to the CAPTCHA form. The request would have never reached the rate limit check, so a second violation would not have been recorded.
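In plain Python, the corrected ordering looks roughly like this. The real implementation lives in Django views, and the state dict and names here are illustrative:

```python
def handle_request(ip, state, rate_limited):
    """Run the CAPTCHA check *before* the rate-limit check, so a request from
    an IP with an existing violation gets redirected to the CAPTCHA form
    instead of logging another violation. `state` holds per-IP flags;
    `rate_limited` says whether this request exceeds the rate limit."""
    ipstate = state.setdefault(ip, {"violations": 0, "captcha_solved": False})
    # 1. CAPTCHA check first: intercept known offenders before they can
    #    re-trip the rate limit.
    if ipstate["violations"] > 0 and not ipstate["captcha_solved"]:
        return "redirect_to_captcha"
    # 2. Rate-limit check second: only now may a new violation be recorded.
    if rate_limited:
        ipstate["violations"] += 1
        return "redirect_to_captcha"
    return "ok"
```

With this ordering, a rapid second over-limit refresh is caught by the first check and never reaches the violation counter.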
Had that been the only problem, the fix would have been simple: just reverse the ordering of the two checks. But there was another, more fundamental issue at play that also needed to be addressed.
The limits of rate-limiting
The second part of my problem stemmed from a mismatch between the tool I was using, django-ratelimit, and the goal I was trying to achieve with it. By default, the django-ratelimit package works by asking a simple question every time a request hits the server: were at least X requests to this URL already made in the last Y minutes? If yes, the request gets blocked. If no, the request is allowed to pass through. My problem was that I was piggybacking on that same logic to build a related but different system, my response escalation ladder, and assumed it would just work. Well, it didn’t.
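That default check can be sketched in a few lines. This version uses a sliding window for clarity; as I understand it, django-ratelimit itself buckets requests into fixed cache windows, but the question being asked is the same:

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Block a request if at least `limit` requests from the same key
    arrived within the last `window` seconds. A conceptual sketch, not
    django-ratelimit's actual implementation."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # key -> timestamps of recent requests

    def is_limited(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[key]
        while q and now - q[0] > self.window:  # drop hits outside the window
            q.popleft()
        q.append(now)
        return len(q) > self.limit
```

Note that once a key is over the limit, every further request within the window is also over the limit, which is exactly the behavior that tripped up my escalation ladder.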
The core of the issue was that I hadn’t properly defined what a “violation” should mean in the context of my escalation ladder. Lacking that definition, I had conflated my idea of a “violation” with any request that fails django-ratelimit's default check. So a first “violation” would be recorded the first time a request exceeded the rate limit. But if a second request came in right after from the same IP address, the new request would also be in excess of the rate limit. A second violation would be immediately recorded, sending that IP straight to level two on my escalation ladder.
Even if I had fixed the ordering of the CAPTCHA and rate limit checks, I still would have been left with this problem. My second request in excess of the rate limit would have been properly redirected to the CAPTCHA form without having an extra violation recorded. But once I solved the CAPTCHA, I would be redirected back to the edit page and the two checks would then run again on the redirected request. Since I would have just solved a CAPTCHA, that check would pass. But the redirected request would still be in excess of my rate limit, so a second violation would still be recorded.
Though I hadn't fully sketched out the details, I had intended for “violations” to somehow signify separate waves of requests to sensitive URLs, which I could then use as markers for repeated abuse. But django-ratelimit's check wasn’t going to give me that on its own. I needed a new plan.
A better solution
Now that I had a better understanding of my bug, I returned for another heart-to-motherboard with ChatGPT. Two conclusions came out of that conversation: I decided to switch from IP-based to user-based tracking, and I landed on a definition of “violation” that would better fit my idea of a wave of suspicious activity.
User-based tracking
My decision to switch from IP-based to user-based tracking was not, strictly speaking, a fix for my bug. I had mentioned my IP-based tracking only as background information, and ChatGPT recommended that I consider tracking by user instead. It alerted me to the fact that someone trying to pull data from my site could easily rotate IP addresses to get around my current rate limiting rules. That made enough sense to draw me in, but there were a few caveats.
First, as I explained above, I was not requiring non-Google users to verify their emails. That meant creating new accounts would be even easier than the IP-rotating that ChatGPT was warning me about, so switching to user-based tracking would in fact be less, not more, secure. Maybe it was time to start requiring email verification in order to close that gap, but I was hesitant. The project depended on volunteer contributions, and email verification would add an extra onboarding step for legitimate users. The more friction I created around setting up new accounts, the fewer people would bother doing it, and the fewer contributions I would get.
Aside from my hesitation around the usability cost of email verification, it also wasn’t clear to me that it would adequately address the security problem. Why couldn’t an attacker just find an email provider that would allow them to create enough accounts to bypass my user-based tracking? That concern was met with an assurance that “most large email providers” have security measures in place to prevent people from creating accounts in bulk. But something didn’t sit right: an attacker wouldn’t need “most large providers” to allow them to create lots of accounts. One small provider that I had never heard of would be enough. I was left with a potentially gaping hole in my rate-limiting plan. Perhaps it could be addressed with more sophisticated, ongoing updates, but I knew I would not have the bandwidth for that.
Google-only sign-in
Concerned that I may have hit a wall, I stepped back for a moment. I already had a small number of users who had created accounts on my site since I started requiring login. All of them had been given the option to either sign in with Google or set up an account with an email and password. All but one had signed in with Google. The one outlier had created their account using a Gmail address. What if I just restricted logins to Google-only, and shifted the responsibility of preventing bulk account creation to Google? It wouldn’t be perfect, but no system is. It would tighten the security gap better than anything I would have set up on my own, and given the data I already had there would be little to no usability cost. It seemed like a fair trade-off. Going forward, I would switch to user-based tracking and users would only be able to sign up using a legitimate Google account.
Defining a “Violation”
With the decision to switch to user-based tracking squared away, I moved on to my remaining issue: figuring out a more appropriate definition for “violation,” and a better response pattern for when one occurs. I mapped out a new system that would still rely on django-ratelimit, but the star of the show would now be a separate per-user record that I’d call an “abuse state,” which would store an “abuse score.” The first time a user exceeded a rate limit, their “abuse score” would be set to 1, and their “abuse state” would become “active.” It would stay active until a certain amount of time of inactivity had passed. While an abuse state was active, I could apply additional restrictions to prevent the abuse from getting out of hand, but I would not increase the score. I would only bump the score to 2 once the abuse state became inactive and then a new batch of requests came in that exceeded the rate limit a second time. This way, the abuse score would better align with my original idea of a “wave” of abuse. I would also set up rules to automatically decrease the abuse score over time so that legitimate users who accidentally exceeded the rate limit would not get penalized indefinitely.
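Stripped down, the abuse-state logic looks something like this. The field names and timings are illustrative, not the values the site actually uses:

```python
class AbuseState:
    """Per-user abuse record. The score only increases once per 'wave' of
    over-limit requests: a wave stays active until `inactive_after` seconds
    pass with no over-limit activity. Old scores decay after `decay_after`
    seconds so legitimate users aren't penalized indefinitely."""

    def __init__(self, inactive_after=600, decay_after=86_400):
        self.score = 0
        self.last_violation = None
        self.inactive_after = inactive_after
        self.decay_after = decay_after

    def record_over_limit(self, now):
        """Called whenever one of the user's requests exceeds the rate limit."""
        if self.last_violation is None or now - self.last_violation > self.inactive_after:
            self.score += 1  # a new wave of abuse: bump the score
        # Within an active wave: restrictions apply, but the score stays put.
        self.last_violation = now

    def maybe_decay(self, now):
        """Forgive old scores over time."""
        if self.score > 0 and self.last_violation is not None \
                and now - self.last_violation > self.decay_after:
            self.score -= 1
            self.last_violation = now  # restart the decay clock
```

The escalation ladder then keys off `score` rather than off raw rate-limit failures.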
And they all lived securely ever after…
Satisfied with this new plan, I got to work on implementation. I switched the project over to Google-only sign-in, configured the new abuse prevention rules, tested it out and… everything seemed to work.
Were there still security holes that I wasn’t aware of? Maybe. But again, my goal was never to make it impossible to pull data from my site, only to make it more difficult. The new setup accomplished that. There may have been better solutions out there, and I’m sure I’ll make more changes in the future, but this would do the trick for now.
And with that, it was time for a new blog post.
Have other ideas to better balance security with usability? Email me at ari.abramowitz1@gmail.com
Want to see how I set all of this up? Check out the code here: https://github.com/EzNashDB/eznashdb

