As always, frankly. Although now that we're one of their biggest customers, I think we get the best they have to offer.

On Aug 26, 2019, at 2:43 PM, Michael Guerin  wrote:

Hey Joe,

Angela forwarded your email over to me - and I've been taking a look to see if I can clearly identify the issue. I've noticed that the physical machine your Droplet is on has been a bit noisier than normal for the last several hours, which I think might be causing at least some of the impact you're seeing.

I'd like to perform a live migration of your Droplet to a new physical machine to see if that helps clear things up - judging by the amount of disk you're using, this should not take too long, and during the course of it, your Droplet will stay online but might see some additional performance decrease. 

I have a feeling the EXPLAIN and index issues aren't directly related to the performance problems you've been seeing, since nothing on the DigitalOcean side would've caused an impact or any changes there.

Another possible improvement, though perhaps not one for right this moment, would be to consider moving this Droplet to NYC1 or NYC3, and utilizing one of our performance plans. They have dedicated CPUs and are generally a bit better for sensitive workloads like databases - unfortunately we don't currently support them in NYC2. This would require you to transfer a snapshot and create a new Droplet with a new IP address, but it might be worthwhile down the line, since NYC1 and NYC3, unlike NYC2, receive support for our newest products.

Best,
	
Michael Guerin
Manager, Customer Success
michael@digitalocean.com



On Aug 26, 2019, at 2:51 PM, Joe Puccio  wrote:

Hi Michael, 

What sort of performance degradation would we see exactly during the move to a new physical machine? 

Thanks so much for your help. We actually upgraded our server to the current tier last week specifically because we were expecting increased load. 

Best,

Joe Puccio
Co-founder, Coursicle
joe@coursicle.com




On Mon, Aug 26, 2019 at 5:56 PM Joe Puccio  wrote:
I'm mainly wondering if there's someone at DigitalOcean who is an expert with databases (like Cassio, who I had talked to before) who would be able to advise us on this matter. 

My guess is that this is more on our side than DigitalOcean's. We're barely taxing the plan we're on RAM-wise, and it seems like this has more to do with our MySQL setup and queries than anything else. 

Best,

Joe Puccio
Co-founder, Coursicle
joe@coursicle.com




On Mon, Aug 26, 2019 at 6:07 PM Michael Guerin  wrote:
Hey Joe,

I expect that while there may be some improvements to be had on the MySQL front, your Droplet was directly running into steal: 



You can see it in the blue line that spiked much higher than previously, and would've prevented your Droplet from getting all the resources it was trying to use. I've taken some steps to minimize the steal on this physical machine, but I think it would also help if you powered off and back on the Droplet, as that will help our systems balance the resources and should resolve an issue that occasionally happens after a resize.

Regarding the index issues - I'm happy to help you troubleshoot those (or pull in someone who can, if it gets beyond me), but solving the CPU contention is a critical first step here.

Best,
Michael



On Aug 26, 2019, at 3:17 PM, Michael Guerin  wrote:

Hey Joe,

It actually looks like we've managed to remove the contention that your Droplet was seeing, and I'm curious whether you're seeing any performance improvements. I understand the usage is pretty spiky; is there a better time to tell whether things are stabilizing?

Best,


Hi Michael, 

After your email at 6:07pm I restarted all of our scripts to see how the server is managing them. It appears as though things may be magically better now. I'm going to continue to monitor over the next couple of minutes and keep you updated. 

Screenshot of "top" is attached, now that all scripts are running. Previously, this would shoot up to a load average of 70; now it's hovering in the teens. 



Best,

Joe Puccio
Co-founder, Coursicle
joe@coursicle.com


Hey Joe,

I expect the steal was causing the load to spike as your Droplet struggled to get the resources it needed. 

Contention of this sort is relatively rare in NYC2, but it looks like a few neighboring Droplets were heavily using resources.

I've managed to move those to other physical machines and thus free up resources for you. In the longer term, it might be worth experimenting with performance Droplets as they have dedicated cores and aren't impacted by neighboring Droplets, but feel free to reach out to me if you do see any recurring issues and we can usually mitigate the problem.

Best,




Hi Michael, 

That makes sense, and explains why this happened all of a sudden even though our database hadn't grown significantly nor had we made any code changes. 

It appears as though everything is fully back to normal now, and is running smoothly. Thank you so much for your prompt assistance. This is precisely why we love DigitalOcean. 

I'd really like to do a transfer to a performance Droplet; I didn't realize our neighbors could impact us so heavily, because we've never had this happen before in the 6 years we've used DigitalOcean. The main reason we didn't transfer to a performance Droplet one week ago, when we resized from 8 GB, 4 vCPUs to 32 GB, 8 vCPUs, was that NYC2 doesn't currently allow CPU-optimized Droplets: "Due to high demand and capacity restrictions we have temporarily disabled this size in this region." is what it says for all of the "CPU optimized Droplets" and "General Purpose" Droplets. 

What plan and datacenter would you suggest we move to? We have a lot of other Droplets in NYC1 under two other accounts (joseph.puccio@gmail.com and joe@coursicle.com), which have been fine. I think Cassio mentioned something about the hardware in NYC2 being a bit old? The biggest concern is that, because this is our web server, we really want to avoid changing our IP. Wondering if you have any solution here, like if we could make our current IP a floating IP and then redirect traffic to the new "General Purpose" Droplet we set up in NYC1 or something.

Also, could you explain exactly what the blue, yellow, and orange lines were in the graph that you sent? Is one our CPU usage and the other the CPU usage of other Droplets on the same physical machine? 

Best,

Joe Puccio
Co-founder, Coursicle
joe@coursicle.com




Hey Joe,

Unfortunately, if maintaining the IP is vital, then there's not an easy solution. Sometime down the line, we may be able to separate IPs from datacenters, but due to some architectural decisions we've made, it's not a trivial thing and it's likely going to be some time before we can really support moving an IP. Floating IPs are similarly restricted. 

NYC2's hardware is a bit old, and the region is disabled for new users - which is why generally speaking you won't see too much contention. Most issues of this nature are driven by abusive usage, or fraud, when users spin up cloud resources to do something like mine bitcoin. That doesn't happen in a region like NYC2 because it's locked down to long-time users, so you might honestly be okay barring one-off edge cases like occurred here.

On the graph I showed, that was your Droplet's CPU usage broken down into "cpu", "sys", and "steal". The blue line was steal, which was the main source of the issue.
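The "steal" series described above is the share of time the hypervisor was running something else while this Droplet's vCPUs were ready to run. On a Linux guest, the same counter is the eighth value on the aggregate `cpu` line of `/proc/stat` (it also appears as `st` in `top`), so it can be checked from inside the Droplet. The following is a minimal sketch, assuming a Linux guest whose kernel reports at least eight counters on that line; `parse_cpu_line`, `steal_percent`, and `read_steal_percent` are hypothetical helper names, not a DigitalOcean tool.

```python
import time


def parse_cpu_line(stat_text: str) -> list[int]:
    """Return the counters from the aggregate "cpu" line of /proc/stat.

    Linux field order: user nice system idle iowait irq softirq steal [guest guest_nice].
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Keep only the first eight counters: guest and guest_nice are
            # already included in user and nice, so summing them would double count.
            return [int(v) for v in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before: str, after: str) -> float:
    """Percentage of CPU time stolen by the hypervisor between two snapshots."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    deltas = [new - old for new, old in zip(a, b)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total  # index 7 is the "steal" counter


def read_steal_percent(interval_s: float = 1.0) -> float:
    """Sample /proc/stat twice, interval_s seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        first = f.read()
    time.sleep(interval_s)
    with open("/proc/stat") as f:
        second = f.read()
    return steal_percent(first, second)
```

Calling `read_steal_percent(5.0)` in a loop during a busy period would give a rough in-guest view of the same contention the host-side graph showed; a sustained double-digit result would point at the physical machine rather than at the MySQL queries.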

Best,
	
Michael Guerin
Manager, Customer Success
michael@digitalocean.com


Hey Michael, 

Got it. It's not vital, but it's preferred, because otherwise we'll have to deal with DNS propagation, and there's no great time to do that. 

That makes sense. Did you track down whether this specific issue was due to abuse? Ultimately, we'll want to end up with a Droplet that has dedicated CPU that we won't have to contend for, but since there's no real way to do that in NYC2, I think we'll have to stick it out in NYC2 with our current plan until we redesign our web server/database architecture to be more flexible. 

Can you assure us that this won't happen again, or is very unlikely to, on our current physical hardware? Today was the first day of class for hundreds of colleges and we experienced 4 hours of downtime, so today was actually one of the worst days this could have happened. The next couple weeks will be similarly busy.

Best,

Joe Puccio
Co-founder, Coursicle
joe@coursicle.com




Hey Joe,

In this case it looks like it was legitimate usage that was recently moved onto this physical machine. I would consider it unlikely that you'll see a recurrence, as in general NYC2 tends to not see a lot of shuffling of workloads or deployments. If you do see any issues, feel free to reach out and we should be able to identify and resolve it pretty quickly now that we've seen the root cause. Again though, I don't think it's very likely that you'll see the issue return in the near-term.

Best,
	
Michael Guerin


On Tue, Aug 27, 2019 at 12:37 PM Joe Puccio  wrote:
Hi Michael, 

Can you please check for contention again right now? We just saw a large spike in load average. 

Best, 

Joe Puccio
Co-founder, Coursicle



On Tue, Aug 27, 2019 at 9:40 AM -0700, "Michael Guerin"  wrote:

Hey Joe,

Taking a look.

Best,



On Tue, Aug 27, 2019 at 9:51 AM -0700, "Michael Guerin"  wrote:

Hi Joe,

Things should be alleviating - I think it would be beneficial for you to perform a full shutdown of the Droplet and power it back on, as soon as possible.

This will help rebalance how your Droplet is allocated on the physical host, and should help with things moving forward. I'm not seeing that there should've been any contention, but occasionally after a resize or migration, the balancing isn't done in the most effective way until the Droplet is powered off and on one more time.

If we continue to see issues after a reboot, I will escalate this internally to see if we can find any other causes for the contention, since usage-wise the overall physical host is not anywhere close to hitting full utilization.

Best,
	
Michael Guerin



On Tue, Aug 27, 2019 at 12:56 PM Joe Puccio  wrote:
Hey Michael,

Things may be alleviating because I just killed the majority of our scripts which tax the database/CPU of the web server. Did you see a brief point where things were maxed out?

I’ll have to perform a shutdown tonight, since we can’t go down on the second day of school.

It would be helpful if you could let me know what the fastest way (minimizing DNS propagation, etc.) would be to migrate to a data center that supports dedicated CPUs so that we don't have to worry about this anymore. We never had these issues until we resized to the current tier, which was an attempt to prepare for exactly something like this.

Joe Puccio
Co-founder, Coursicle
joe@coursicle.com




On Aug 27, 2019, at 10:20 AM, Michael Guerin  wrote:

Hey Joe,

The following would be the easiest way to move to a new Datacenter:

1) Update the TTL on all of your relevant DNS records to the minimum that DigitalOcean supports, which is 30 seconds. This will decrease the amount of time that records remain cached for users, though they will only get the new TTL when their cache expires, so you should likely wait at least 30 minutes before proceeding. I can actually do this from the admin side for you, and it might be simpler that way.

2) Take a snapshot of your Droplet. If possible, it would be ideal to take your site down for maintenance starting at this time, and perform this snapshot offline. You should not destroy the original Droplet until you are sure everything is working properly on the new Droplet.

3) Transfer that snapshot to NYC1.  (https://www.digitalocean.com/docs/images/snapshots/how-to/change-regions/)

4) Create a new Droplet, with the desired plan, in NYC1 from that snapshot. Since your Droplet has 80GB of disk, you can use either the 8 vCPU, 16GB RAM CPU optimized plan @ $160 per month, or the 32GB 8 vCPU plan @ $240.

5) Update your DNS records to point to the new Droplet's IP. This should propagate quickly, and with the reduced TTL, users should not end up on the cached old DNS so long as it has propagated and the cache has expired.

6) You can leverage something like this to check propagation: https://dnschecker.org/

7) Validate that the new site is reachable, and that DNS has been updated appropriately. Keep the original Droplet offline so that you can be clear that you're reaching the new Droplet. 

8) I'd suggest keeping the old Droplet for some period of time, then taking a final snapshot and destroying it. Additionally, return the DNS TTL to the original setting, as having a short expiration time is not necessary in most situations.
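Steps 1 and 5 above rest on two bounds. A resolver that cached the record just before the TTL was lowered may keep serving it for up to the old TTL, so the cutover should wait at least that long; after the cutover, a client may see the old IP for at most the new TTL. Both bounds assume resolvers honor the TTL, which not all do. The sketch below only makes that arithmetic explicit; the function names are hypothetical, and the old TTL of 1800 seconds in the example is an assumption chosen to match the "wait at least 30 minutes" advice, not a value taken from the account.

```python
from datetime import datetime, timedelta


def earliest_safe_cutover(ttl_lowered_at: datetime, old_ttl_s: int) -> datetime:
    """Earliest time every TTL-respecting cache holds the record with the NEW TTL.

    A resolver may have cached the record (with the old TTL) just before the
    TTL was lowered, so that copy can live for up to old_ttl_s seconds.
    """
    return ttl_lowered_at + timedelta(seconds=old_ttl_s)


def worst_case_stale_until(cutover_at: datetime, new_ttl_s: int) -> datetime:
    """Latest time a TTL-respecting resolver can still return the OLD IP."""
    return cutover_at + timedelta(seconds=new_ttl_s)


if __name__ == "__main__":
    # Hypothetical timeline: old TTL assumed to be 1800 s, lowered to 30 s.
    lowered = datetime(2019, 8, 28, 22, 0)
    cutover = earliest_safe_cutover(lowered, old_ttl_s=1800)
    print("repoint the A record no earlier than", cutover)
    print("old IP may be served until", worst_case_stale_until(cutover, new_ttl_s=30))
```

Lowering the TTL well before the move (as was done a day or more ahead here) makes the first bound irrelevant in practice, leaving only the short window set by the new TTL.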

Best,
	
Michael Guerin
Manager, Customer Success




On Thu, Aug 29, 2019 at 7:25 PM Joe Puccio  wrote:
Hey Michael, 

Thanks so much for sending over those detailed instructions and my apologies for the late reply; it's been hectic this week as our users head back to school. 

Per your suggestion, on Tuesday night I restarted our Droplet to allow for better balancing on the physical host. It's hard to know for sure since traffic generally declines later in the week, but it appears anecdotally that that may have made a difference in the load averages. 

Last night I updated the TTL of our "coursicle.com" A record to 30 seconds, so that we'll be ready to make the move (possibly this weekend, but more likely in a couple weeks when our usage declines).  

Thank you again for your help and I'll let you know if I hit any other issues. 

Best,

Joe Puccio
Co-founder, Coursicle
joe@coursicle.com



On Aug 29, 2019, at 4:38 PM, Michael Guerin  wrote:

Hey Joe,

Great to hear that things have stabilized!

If you run into any issues during the migration, or have any other questions, I'll be happy to help.

Best,




On Fri, Sep 13, 2019 at 1:47 PM Joe Puccio  wrote:
Hey Michael, 

Hope all is well! Late last night we followed your directions for moving our web server to NYC1 and everything went very smoothly! For added flexibility, we pointed our DNS to a floating IP (really handy).

We're now onto the task of revitalizing our infrastructure. Some of our plans over the next couple of weeks:
- Currently our database is on our web server. We're going to create a Droplet dedicated to our database (i.e. sever our web server from our database).
- Set up a Private Network so that all Droplets stop using the public interface for inter-server communication (we couldn't do this until now, because our Droplets were spread across 3 different DigitalOcean accounts). 
- Investigate using DigitalOcean's Load Balancer service, and figure out what application layer code would need to be changed in order to do so.  

Let me know if you have any suggestions on this kind of stuff!

Best,

Joe Puccio
Co-founder, Coursicle
joe@coursicle.com


On Sep 13, 2019, at 11:50 AM, Michael Guerin  wrote:

Hey Joe,

Depending on your traffic load, you may be best off not using DigitalOcean's load balancer service - we're currently working on a v2 of that product, but the current iteration has run into some challenges at high-scale loads. It's probably worth testing, but in general, if your web server is just on one Droplet right now there's not a big advantage to putting an LB in front.

For your database, you could test out our new managed databases:  https://www.digitalocean.com/docs/databases/mysql/  

If you run into any issues, happy to answer questions! Glad to hear the migration went smoothly!

Best,
	




Hey Michael, 

Got it, that's very good to know. In that case, we may set up our own or wait for v2 of yours. Not sure how to convey the size of our traffic load, but indeed we are running on just one Droplet right now. That said, for uptime reasons, it would be convenient to have a load balancer distributing traffic to multiple web servers so that if we need to do upgrades, our site/app API doesn't have to go down.

We looked at the managed database (we use MySQL, which I know you just added support for), but ultimately we decided we'd rather do the database management in-house since it's so critical to get right (plus we have a rather unconventional setup: mostly MyISAM tables which are read-mostly, and about 4000 of them and growing). We've done a decent number of configuration tweaks in the past couple weeks that appear to have helped a lot, but if we hit a wall, it would certainly be nice to have a database person at DigitalOcean to bounce things off of. I think that would probably be the biggest way DigitalOcean could help us (even more than you already do). 

Thanks again so much for your help and kindness! 

Best,

Joe Puccio
Co-founder, Coursicle
joe@coursicle.com