<?xml version="1.0" encoding="utf-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom">
     <title>BigBinary Blog</title>
     <link href="https://www.bigbinary.com/feed.xml" rel="self"/>
     <link href="https://www.bigbinary.com/"/>
     <updated>2026-05-19T04:31:17+00:00</updated>
     <id>https://www.bigbinary.com/</id>
     <entry>
       <title><![CDATA[Configuring the Kubernetes Horizontal Pod Autoscaler to scale based on custom metrics from Prometheus]]></title>
       <author><name>Sreeram Venkitesh</name></author>
      <link href="https://www.bigbinary.com/blog/prometheus-adapter"/>
      <updated>2024-07-23T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/prometheus-adapter</id>
      <content type="html"><![CDATA[<p>Some of the major upsides of using Kubernetes to manage deployments are its self-healing and autoscaling capabilities. If a deployment has a sudden spike of traffic, Kubernetes will automatically spin up new containers and handle that load gracefully. It will also scale down deployments when the traffic reduces.</p><p>Kubernetes has <a href="https://www.bigbinary.com/blog/solving-scalability-in-neeto-deploy#understanding-kubernetes-autoscalers">a couple of different ways</a> to scale deployments automatically based on the load the application receives. The <a href="https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/">Horizontal Pod Autoscaler (HPA)</a> can be used out of the box in a Kubernetes cluster to increase or decrease the number of Pods of your deployment. By default, HPA supports scaling based on CPU and memory usage, served by the <a href="https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#metrics-server">metrics server</a>.</p><p>While building <a href="https://neeto.com/neetodeploy">NeetoDeploy</a> initially, we'd set it up to scale deployments based on CPU and memory usage, since these were the default metrics supported by the HPA. However, later we wanted to scale deployments based on the average response time of our application.</p><p>This is an example of a case where the metric we want to scale on is not directly related to the CPU or the memory usage. Other examples of this could be network metrics from the load balancer, like the number of requests received by the application. In this blog, we will discuss how we achieved autoscaling of deployments in Kubernetes based on the average response time using <a href="https://github.com/kubernetes-sigs/prometheus-adapter">prometheus-adapter</a>.</p><p>When an application receives a lot of requests suddenly, this creates a spike in the average response time.
The CPU and memory metrics also spike, but they take longer to catch up. In such cases, being able to scale deployments based on the response time will ensure that the spike in traffic is handled gracefully.</p><p><a href="https://prometheus.io/">Prometheus</a> is one of the most popular cloud native monitoring tools and the Kubernetes HPA can be extended to scale deployments based on metrics exposed by Prometheus. We used the <code>prometheus-adapter</code> to build autoscaling based on the average response time in <a href="https://neeto.com/neetodeploy">NeetoDeploy</a>.</p><h2>Setting up the custom metrics</h2><p>We took the following steps to make our HPAs work with Prometheus metrics.</p><ol><li>Installed <code>prometheus-adapter</code> in our cluster.</li><li>Configured the metric we wanted for our HPAs as a custom metric in the <code>prometheus-adapter</code>.</li><li>Confirmed that the metric is added to the <code>custom.metrics.k8s.io</code> API endpoint.</li><li>Configured an HPA with the custom metric.</li></ol><h2>Install prometheus-adapter in the cluster</h2><p><a href="https://github.com/kubernetes-sigs/prometheus-adapter">prometheus-adapter</a> is an implementation of the <code>custom.metrics.k8s.io</code> API using Prometheus.
We used the prometheus-adapter to set up Kubernetes metrics APIs for our Prometheus metrics, which then can be used with our HPAs.</p><p>We installed <code>prometheus-adapter</code> in our cluster using <a href="https://helm.sh/">Helm</a>. We got a template for the values file for the Helm installation <a href="https://github.com/prometheus-community/helm-charts/blob/main/charts/prometheus-adapter/values.yaml">here</a>.</p><p>We made a few changes to the file before we applied it to our cluster and deployed <code>prometheus-adapter</code>:</p><ol><li>We made sure that the Prometheus deployment is configured properly by giving the correct service URL and port.</li></ol><pre><code class="language-yaml"># values.yaml
prometheus:
  # Value is templated
  url: http://prometheus.monitoring.svc.cluster.local
  port: 9090
  path: &quot;&quot;
# ... rest of the file</code></pre><ol start="2"><li>We made sure that the custom metrics that we needed for our HPA are configured under <code>rules.custom</code> in the <code>values.yaml</code> file.
In the following example, we are using the custom metric <code>traefik_service_avg_response_time</code> since we'll be using that to calculate the average response time for each deployment.</li></ol><pre><code class="language-yaml"># values.yaml
rules:
  default: false
  custom:
    - seriesQuery: '{__name__=~&quot;traefik_service_avg_response_time&quot;, service!=&quot;&quot;}'
      resources:
        overrides:
          app_name:
            resource: service
          namespace:
            resource: namespace
      metricsQuery: traefik_service_avg_response_time{&lt;&lt;.LabelMatchers&gt;&gt;}</code></pre><p>Once we configured our <code>values.yaml</code> file properly, we installed <code>prometheus-adapter</code> in our cluster with Helm.</p><pre><code class="language-bash">helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prom-adapter prometheus-community/prometheus-adapter --values values.yaml</code></pre><h2>Query for custom metric</h2><p>Once we got <code>prometheus-adapter</code> running, we queried our cluster to check if the custom metric is coming up in the <code>custom.metrics.k8s.io</code> API endpoint.</p><pre><code class="language-bash">kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq</code></pre><p>The response looked like this:</p><pre><code class="language-json">{
  &quot;kind&quot;: &quot;APIResourceList&quot;,
  &quot;apiVersion&quot;: &quot;v1&quot;,
  &quot;groupVersion&quot;: &quot;custom.metrics.k8s.io/v1beta1&quot;,
  &quot;resources&quot;: [
    {
      &quot;name&quot;: &quot;services/traefik_service_avg_response_time&quot;,
      &quot;singularName&quot;: &quot;&quot;,
      &quot;namespaced&quot;: true,
      &quot;kind&quot;: &quot;MetricValueList&quot;,
      &quot;verbs&quot;: [&quot;get&quot;]
    },
    {
      &quot;name&quot;: &quot;namespaces/traefik_service_avg_response_time&quot;,
      &quot;singularName&quot;: &quot;&quot;,
      &quot;namespaced&quot;: false,
      &quot;kind&quot;: &quot;MetricValueList&quot;,
      &quot;verbs&quot;: [&quot;get&quot;]
    }
  ]
}</code></pre><p>We also queried the metric API for a particular service we've configured the metric for. Here, we're querying the <code>traefik_service_avg_response_time</code> metric for the <code>neeto-chat-web-staging</code> app in the default namespace.</p><pre><code>kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/services/neeto-chat-web-staging/traefik_service_avg_response_time | jq</code></pre><p>The API response gave the following.</p><pre><code class="language-json">{
  &quot;kind&quot;: &quot;MetricValueList&quot;,
  &quot;apiVersion&quot;: &quot;custom.metrics.k8s.io/v1beta1&quot;,
  &quot;metadata&quot;: {},
  &quot;items&quot;: [
    {
      &quot;describedObject&quot;: {
        &quot;kind&quot;: &quot;Service&quot;,
        &quot;namespace&quot;: &quot;default&quot;,
        &quot;name&quot;: &quot;neeto-chat-web-staging&quot;,
        &quot;apiVersion&quot;: &quot;/v1&quot;
      },
      &quot;metricName&quot;: &quot;traefik_service_avg_response_time&quot;,
      &quot;timestamp&quot;: &quot;2024-02-26T19:31:33Z&quot;,
      &quot;value&quot;: &quot;19m&quot;,
      &quot;selector&quot;: null
    }
  ]
}</code></pre><p>From the response, we can see that the average response time at that instant is reported as <code>19m</code>. The <code>m</code> suffix is the Kubernetes quantity notation for milli-units, so this is 0.019 seconds, or <code>19ms</code>.</p><h2>Create the HPA</h2><p>Now that we're sure that <code>prometheus-adapter</code> is able to serve custom metrics under the <code>custom.metrics.k8s.io</code> API, we wired this up with a Horizontal Pod Autoscaler to scale our deployments based on our custom metric.</p><pre><code class="language-yaml">apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-name-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-name-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Object
      object:
        metric:
          name: traefik_service_avg_response_time
          selector: { matchLabels: { app_name: my-app-name } }
        describedObject:
          apiVersion: v1
          kind: Service
          name: my-app-name
        target:
          type: Value
          value: 0.03</code></pre><p>With everything set up, the HPA was able to fetch the custom metric scraped by Prometheus and scale our Pods up and down based on the value of the metric. We also created a recording rule in Prometheus for storing our custom metric queries and dropped the unwanted labels as a best practice. We can use the custom metric stored with the recording rule directly with <code>prometheus-adapter</code> to expose the metrics as an API endpoint in Kubernetes. This is helpful when your custom metric queries are complex.</p><p>If your application runs on Heroku, you can deploy it on NeetoDeploy without any change. If you want to give NeetoDeploy a try, then please send us an email at invite@neeto.com.</p><p>If you have questions about NeetoDeploy or want to see the journey, follow NeetoDeploy on <a href="https://twitter.com/neetodeploy">X</a>. You can also join our <a href="https://launchpass.com/neetohq">community Slack</a> to chat with us about any Neeto product.</p>]]></content>
    </entry><entry>
       <title><![CDATA[How we fixed app downtime issue in NeetoDeploy]]></title>
       <author><name>Abhishek T</name></author>
      <link href="https://www.bigbinary.com/blog/how-we-fixed-app-down-time-in-neeto-deploy"/>
      <updated>2024-07-09T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/how-we-fixed-app-down-time-in-neeto-deploy</id>
      <content type="html"><![CDATA[<p><em>We are building <a href="https://neeto.com/neetoDeploy">NeetoDeploy</a>, a compelling alternative to Heroku. Stay updated by following NeetoDeploy on <a href="https://twitter.com/neetodeploy">Twitter</a> and reading our <a href="https://www.bigbinary.com/blog/categories/neetodeploy">blog</a>.</em></p><p>At <a href="https://www.neeto.com/">neeto</a> we are building 20+ applications, and most of our applications are running in NeetoDeploy. Once we migrated from Heroku to NeetoDeploy, we started getting 520 response codes for our applications. This issue was occurring randomly and rarely.</p><h3>What is a 520 response code?</h3><p>A 520 response code happens when the connection is started on the origin web server, but the request is not completed. This could be due to server crashes or the inability to handle the incoming requests because of insufficient resources.</p><p>When we looked at our logs closely, we found that all the 520 response codes occurred when we restarted or deployed the app. From this, we concluded that the new pods were failing to handle requests from the client initially and working fine after some time.</p><h3>What is wrong with new pods?</h3><p>Once our investigation narrowed down to the new pods, we quickly realized that requests were arriving at the server even when the server was not fully ready yet to take new requests.</p><p>When we create a new pod in Kubernetes without any probes configured, it is marked as &quot;Ready&quot;, and requests are sent to it as soon as its containers start. However, the servers initiated within these containers may require additional time to boot up and to become ready to accept the requests fully.</p><h4>Let's try restarting the application</h4><pre><code class="language-bash">$ kubectl rollout restart deployment bling-staging-web</code></pre><p>As we can see, a new container is getting created for the new pod. The READY status for the new pod is 0.
It means it's not yet READY.</p><pre><code class="language-bash">NAME                               READY  STATUS             RESTARTS  AGE
bling-staging-web-656f74d9d-6kpzz  1/1    Running            0         2m8s
bling-staging-web-79fc6f978-cdjf5  0/1    ContainerCreating  0         5s</code></pre><p>Now we can see that the new pod is marked as READY (1 out of 1), and the old one is terminating.</p><pre><code class="language-bash">NAME                               READY  STATUS             RESTARTS  AGE
bling-staging-web-656f74d9d-6kpzz  0/1    Terminating        0         2m9s
bling-staging-web-79fc6f978-cdjf5  1/1    Running            0         6s</code></pre><p>The new pod is shown as <code>READY</code> as soon as the container was created. But on checking the logs, we could see that the server was still starting up and not ready yet.</p><pre><code>[1] Puma starting in cluster mode...
[1] Installing dependencies...</code></pre><p>From the above observation, we understood that the pod is marked as &quot;READY&quot; right after the container is created. Consequently, requests are received even before the server is fully prepared to serve them, and they get a 520 response code.</p><h2>Solution</h2><p>To fix this issue, we must ensure that pods are marked as &quot;Ready&quot; only after the server is up and ready to accept the requests. We can do this by using Kubernetes <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/">health probes</a>. More than six years ago we wrote <a href="https://www.bigbinary.com/blog/deploying-rails-applications-using-kubernetes-with-zero-downtime">a blog</a> on how we can leverage the readiness and liveness probes of Kubernetes.</p><h3>Adding Startup probe</h3><p>Initially, we only added a <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-startup-probes">Startup probe</a> since we had a problem with the boot-up phase.
You can read more about the configuration settings <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#configure-probes">here</a>.</p><p>The following configuration will add the Startup probe for the deployments:</p><pre><code class="language-yaml">startupProbe:
  failureThreshold: 10
  httpGet:
    path: /health_check
    port: 3000
    scheme: HTTP
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 60
  initialDelaySeconds: 10</code></pre><p><code>/health_check</code> is a route in the application that is expected to return a 200 response code if all is going well. Now, let's restart the application again after adding the Startup probe.</p><p>The container is created for the new pod, but the pod is still not &quot;Ready&quot;.</p><pre><code class="language-bash">NAME                               READY  STATUS            RESTARTS  AGE
bling-staging-web-656f74d9d-6kpzz  1/1    Running           0         2m8s
bling-staging-web-79fc6f978-cdjf5  0/1    Running           0         5s</code></pre><p>The new pod is marked as &quot;Ready&quot;, and the old one is &quot;Terminating&quot;.</p><pre><code class="language-bash">NAME                               READY  STATUS            RESTARTS  AGE
bling-staging-web-656f74d9d-6kpzz  0/1    Terminating       0         2m38s
bling-staging-web-79fc6f978-cdjf5  1/1    Running           0         35s</code></pre><p>If we check the logs, we can see the health check request:</p><pre><code>[1] Puma starting in cluster mode...
[1] Installing dependencies...
[1] * Puma version: 6.3.1 (ruby 3.2.2-p53) (&quot;Mugi No Toki Itaru&quot;)
[1] *  Min threads: 5
[1] *  Max threads: 5
[1] *  Environment: heroku
[1] *   Master PID: 1
[1] *      Workers: 1
[1] *     Restarts: () hot () phased
[1] * Listening on http://0.0.0.0:3000
[1] Use Ctrl-C to stop
[2024-02-10T02:40:48.944785 #23]  INFO -- : [bb9e756a-51cc-4d6b-9a4a-96b0464f6740] Started GET &quot;/health_check&quot; for 192.168.120.195 at 2024-02-10 02:40:48 +0000
[2024-02-10T02:40:48.946148 #23]  INFO -- : [bb9e756a-51cc-4d6b-9a4a-96b0464f6740] Processing by HealthCheckController#healthy as */*
[2024-02-10T02:40:48.949292 #23]  INFO -- : [bb9e756a-51cc-4d6b-9a4a-96b0464f6740] Completed 200 OK in 3ms (Allocations: 691)</code></pre><p>Now, the pod is marked as &quot;Ready&quot; only after the health check succeeds, in other words, only when the server is prepared to accept the requests.</p><h3>Fixing the Startup probe for production applications</h3><p>Once we released the health check for our deployments, we found that health checks were failing for all production applications but working for staging and review applications.</p><p>We were getting the following error in our production applications.</p><pre><code class="language-bash">Startup probe failed: Get &quot;https://192.168.43.231:3000/health_check&quot;: http: server gave HTTP response to HTTPS client
2024-02-12 06:40:04 +0000 HTTP parse error, malformed request: #&lt;Puma::HttpParserError: Invalid HTTP format, parsing fails. Are you trying to open an SSL connection to a non-SSL Puma?&gt;</code></pre><p>From the above logs, it was clear that the issue was related to SSL configuration. On comparing the production environment configuration with the others, we figured out that we had enabled <a href="https://guides.rubyonrails.org/configuring.html#config-force-ssl">force_ssl</a> for production applications.
The <code>force_ssl=true</code> setting ensures that all incoming requests are SSL encrypted and will automatically redirect to their SSL counterparts.</p><p>The following diagram broadly shows the path of an incoming request.</p><p><img src="/blog_images/2024/how-we-fixed-app-down-time-in-neeto-deploy/image4.png" alt="HTTPS request path"></p><p>From the above diagram, we can infer the following things:</p><ul><li>SSL verification is happening in the ingress controller and not in the server.</li><li>Client requests are going through the ingress controller before reaching the server.</li><li>The request from the ingress controller to the pod is an HTTP request.</li><li>The HTTP health check requests are directly sent from Kubelet to the pod and do not go through the ingress controller.</li></ul><p>Here is how our health check request works.</p><ol><li><a href="https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/">Kubelet</a> sends an HTTP request to the server directly.</li><li>Since <code>force_ssl</code> is enabled, the <a href="https://api.rubyonrails.org/v7.1.2/classes/ActionDispatch/SSL.html">ActionDispatch::SSL</a> middleware redirects the request to HTTPS.</li><li>When the HTTPS request reaches the server, <a href="https://puma.io/">Puma</a> throws the <code>Are you trying to open an SSL connection to a non-SSL Puma?</code> error since no SSL certificates are configured with the server.</li></ol><p>The solution to our problem lies in understanding why only the health check request is rejected, whereas the request from the ingress controller is not, even though both are HTTP requests. This is because the ingress controller sets some headers before forwarding to the pod, and the header we are concerned about is <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Forwarded-Proto">X-FORWARDED-PROTO</a>. The <code>X-Forwarded-Proto</code> header contains the HTTP/HTTPS scheme the client used to access the application.
When a client makes an HTTPS request, the ingress controller terminates the SSL/TLS connection and forwards the request to the backend service using plain HTTP after adding the <code>X-Forwarded-Proto</code> header along with the other headers.</p><p>Everything started working after adding the <code>X-Forwarded-Proto</code> header to our startup probe request.</p><pre><code class="language-yaml">startupProbe:
  failureThreshold: 10
  httpGet:
    httpHeaders:
      - name: X-FORWARDED-PROTO
        value: https
    path: &lt;%= health_check_url %&gt;
    port: &lt;%= port %&gt;
    scheme: HTTP
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 60
  initialDelaySeconds: 10</code></pre><p>We also added <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-readiness-probes">Readiness</a> and <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-http-request">Liveness</a> probes for our deployments.</p><p>If your application runs on Heroku, you can deploy it on NeetoDeploy without any change. If you want to give NeetoDeploy a try, then please send us an email at <a href="mailto:invite@neeto.com">invite@neeto.com</a>.</p><p>If you have questions about NeetoDeploy or want to see the journey, follow NeetoDeploy on <a href="https://twitter.com/neetodeploy">Twitter</a>. You can also join our Slack community to chat with us about any Neeto product.</p>]]></content>
    </entry><entry>
       <title><![CDATA[How we added sleep when idle feature to NeetoDeploy and reduced cost]]></title>
       <author><name>Sreeram Venkitesh</name></author>
      <link href="https://www.bigbinary.com/blog/cost-reduction-in-neeto-deploy-by-turning-off-inactive-apps"/>
      <updated>2024-01-19T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/cost-reduction-in-neeto-deploy-by-turning-off-inactive-apps</id>
      <content type="html"><![CDATA[<p><em>We are building <a href="https://neeto.com/neetoDeploy">NeetoDeploy</a>, a compelling Heroku alternative. Stay updated by following NeetoDeploy on <a href="https://twitter.com/neetodeploy">Twitter</a> and reading our <a href="https://www.bigbinary.com/blog/categories/neetodeploy">blog</a>.</em></p><h2>What is the sleep when idle feature</h2><p>&quot;Sleep when idle&quot; is a feature of NeetoDeploy which puts the deployed application to sleep when the server receives no requests for 5 minutes. This helps reduce the cost of the server.</p><p>The &quot;Sleep when idle&quot; feature can be enabled not only for the pull request review applications, but for staging and production applications too. Many folks build applications to learn or as a hobby. In such cases, there is no point in running the server when the server is not likely to get any traffic. Since NeetoDeploy billing is based on usage, the &quot;Sleep when idle&quot; feature helps keep the bill low for the users.</p><p>Let's say you built something and deployed it to production. You shared it with your friends. For a day or two you got a bit of traffic, and after that you moved on to other things. If &quot;sleep when idle&quot; is enabled then you don't need to worry about anything. If the server is not getting any traffic then you will not be billed.</p><h2>How is Neeto using the sleep when idle feature</h2><p>At <a href="https://neeto.com">neeto</a>, we are building 20+ applications at the same time. It means lots of pull requests for all these products and thus lots of PR review apps are created.</p><p>For a long time, we were using Heroku to build the review apps. However, when NeetoDeploy started to become stable, we moved PR review app generation from Heroku to NeetoDeploy.
This helped reduce cost.</p><h2>How to make deployments sleep when idle?</h2><p>This video describes how the &quot;sleep when idle&quot; feature is implemented.</p><iframe width="560" height="315" src="https://youtube.com/embed/trn2DJyTjnw" frameborder="0" title="How we designed NeetoDeploy's 'sleep when idle' feature" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe><p>Keeping the apps running only when they're being used involves two steps:</p><ol><li>Scaling the deployments down and bringing them back up again</li><li>Figuring out when to do the scaling</li></ol><p>The deployments can be scaled easily using the <code>kubectl scale</code> command. For example, if we want to turn our deployment off, we can run the following to update our deployment to zero replicas, essentially destroying all the pods.</p><pre><code class="language-bash">kubectl scale deployment/nginx --replicas=0</code></pre><p>We can also delete our service, ingress or any other resource we might have created for our deployment. The configuration of the deployment itself would still be present in the cluster even when we make it sleep, since the Kubernetes Deployment is not deleted.</p><p>When we want to bring our app back up again, we can use the same command to spin up new pods:</p><pre><code class="language-bash">kubectl scale deployment/nginx --replicas=1</code></pre><p>The challenge was to figure out <em>when</em> to do this. We decided that we'd have a threshold based on the time the app was last accessed by users. If the application is not accessed for more than five minutes, we consider the application to be idle and we will scale it down.
It'll be brought back up when a user tries to access it again.</p><h2>Exploring existing solutions</h2><p>There are existing CNCF projects like <a href="https://knative.dev/">Knative</a> and <a href="https://keda.sh/">Keda</a>, which can potentially be used to achieve what we want here. We spent some time exploring these but realized that these solutions weren't exactly suitable for our requirements. Kubernetes also natively has an <code>HPAScaleToZero</code> <a href="https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates">feature gate</a> which enables the <a href="https://www.bigbinary.com/blog/solving-scalability-in-neeto-deploy#understanding-kubernetes-autoscalers">Horizontal Pod Autoscaler</a> to scale down deployments to zero pods, but this is still in alpha and hence is not available in EKS yet.</p><p>Ultimately, we decided to write our own service for achieving this. The entire backend of NeetoDeploy was designed as <a href="https://www.bigbinary.com/blog/neeto-deploy-zero-to-one">a collection of microservices</a> from day one. So it made sense to build our <em>pod idling service</em> as another microservice that runs in our cluster.</p><h2>Figuring out when to make applications sleep</h2><p>To know when applications can be idled, we need to know when people are accessing the applications from their browsers. Since all the requests to applications deployed on NeetoDeploy would go through our load balancer, it would contain the information of when every app was last accessed.</p><p>We use <a href="https://traefik.io/traefik/">Traefik</a> as our load balancer and we used Traefik's <a href="https://doc.traefik.io/traefik/middlewares/overview/">middlewares</a> to retrieve and process the information of when apps are being accessed. We wrote a custom middleware to send all the request information to the pod idling service, whenever an app is being accessed.
The pod idling service would store all the URLs, along with the timestamp at which they were accessed, in a Redis cache. The following graphic shows how the request information would be collected and stored by the pod idling service into its Redis cache, both of which are running within the cluster.</p><p><img src="/blog_images/2024/cost-reduction-in-neeto-deploy-by-turning-off-inactive-apps/pod-idling-new-architecture.png" alt="The architecture of the pod idling service"></p><p>The pod idling service would then filter the apps that were last accessed more than five minutes ago. It then sends a request to the cluster to scale all these apps down. We'd also delete any related resources like the Services and the IngressRoutes used to configure networking for the deployments.</p><p>We first tested this by running the service manually, and sure enough, all the inactive deployments were filtered and scaled down properly. We then added this as a cron job in the pod idling service, which would run every five minutes. This means that no app would run for more than five minutes if it's not being used.</p><p>But wait! How would we bring the app back up after scaling it down?</p><h2>Building the downtime service</h2><p>As we discussed above, we use Traefik's IngressRoutes to route traffic to the application being accessed. We made use of the <a href="https://doc.traefik.io/traefik/v2.10/routing/routers/#priority_1">priority parameter</a> of IngressRoutes to boot up apps that are sleeping. Essentially, we created a wildcard Traefik IngressRoute that points to a &quot;downtime service&quot; deployment, which is a React app that serves a message of <code>There's nothing here, yet</code> to let users know that the app they're trying to access doesn't exist.
You can see this in action if you visit a random URL in NeetoDeploy, say something like <a href="https://nonexistent-appname.neetodeployapp.com">nonexistent-appname.neetodeployapp.com</a>.</p><p><img src="/blog_images/2024/cost-reduction-in-neeto-deploy-by-turning-off-inactive-apps/downtime-service-page.png" alt="The downtime service page"></p><p>Wildcard IngressRoutes have the least priority by default. So if we create a &quot;catch-all&quot; wildcard IngressRoute, any invalid URL without an IngressRoute of its own can be redirected to a single Service in Kubernetes. This is how we're redirecting non-existent apps to the page shown above. In the following graphic, we can see how a request to a random URL is routed to the downtime service with the wildcard IngressRoute.</p><p><img src="/blog_images/2024/cost-reduction-in-neeto-deploy-by-turning-off-inactive-apps/downtime-service-architecture.png" alt="Architecture of how the downtime service works in NeetoDeploy"></p><p>This also means that if an app is scaled down by the pod idling service and gets its IngressRoute deleted, the next time a user tries to access the app, the request would instead be routed to the downtime service. We need to handle the scale up logic from the downtime service.</p><p>Whenever a user requests a URL that doesn't have an IngressRoute, there are two possibilities.</p><ol><li>The app doesn't exist.</li><li>The app exists, but is currently scaled down.</li></ol><p>The downtime service would first check whether the requested app is present in the cluster in a sleeping state. If not, then the user will be served the &quot;There's nothing here, yet&quot; page. If there is a sleeping deployment, however, we boot it back up. The downtime service sends the scale up request to the cluster. We keep redirecting the user back to the URL till the app is up and running.
This redirection would keep happening until the app is scaled up since we create the Service and IngressRoute only after the pods of the app are running. At this point, the request will be routed to the correct pod by the app's IngressRoute, since it has a higher priority than the wildcard IngressRoute of the downtime service. All of these steps are illustrated in the GIF below:</p><p><img src="/blog_images/2024/cost-reduction-in-neeto-deploy-by-turning-off-inactive-apps/downtime-service.gif" alt="Illustration of how the downtime service works"></p><p>This design worked flawlessly and we were able to bring back scaled down applications with as little as 20-30 seconds of wait time.</p><h2>Conclusion</h2><p>We've been running this setup for almost a year now, and it has been working smoothly so far. The pod idling service and the downtime service started as simple microservices and continue to evolve, adapting to the increasing demand as we grow.</p><p>If your application runs on Heroku, you can deploy it on NeetoDeploy without any change. If you want to give NeetoDeploy a try, then please send us an email at invite@neeto.com.</p><p>If you have questions about NeetoDeploy or want to see the journey, follow NeetoDeploy on <a href="https://twitter.com/neetodeploy">X</a>. You can also join our <a href="https://launchpass.com/neetohq">community Slack</a> to chat with us about any Neeto product.</p>]]></content>
    </entry><entry>
       <title><![CDATA[Building the metrics dashboard in NeetoDeploy with Prometheus]]></title>
       <author><name>Sreeram Venkitesh</name></author>
      <link href="https://www.bigbinary.com/blog/using-prometheus-in-neeto-deploy"/>
      <updated>2024-01-09T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/using-prometheus-in-neeto-deploy</id>
      <content type="html"><![CDATA[<p><em>We are building <a href="https://neeto.com/neetoDeploy">NeetoDeploy</a>, a compelling alternative to Heroku. Stay updated by following NeetoDeploy on <a href="https://twitter.com/neetodeploy">Twitter</a> and reading our <a href="https://www.bigbinary.com/blog/categories/neetodeploy">blog</a>.</em></p><p>One of the features that we wanted in our cloud deployment platform, <strong>NeetoDeploy</strong>, was application metrics. We decided to use <a href="https://prometheus.io/">Prometheus</a> for building this feature. Prometheus is an open source monitoring and alerting toolkit and is a CNCF graduated project. Venturing into the Cloud Native ecosystem of projects apart from Kubernetes was something we had never done before. We ended up learning a lot about Prometheus and how to use it during the course of building this feature.</p><h2>Initial setup</h2><p>We installed Prometheus in our Kubernetes cluster by writing a deployment configuration YAML and applying it to our cluster. We also provisioned an AWS Elastic Block Store volume using a PersistentVolumeClaim to store the metrics data collected by Prometheus. Prometheus needed a <a href="https://github.com/prometheus/prometheus/blob/main/documentation/examples/prometheus-kubernetes.yml">configuration file</a> in which we defined all the targets it would scrape metrics from. This is a YAML file which we stored in a ConfigMap in our cluster.</p><p>Targets in Prometheus can be anything that exposes metrics data in the Prometheus format at a <code>/metrics</code> endpoint. This can be your application servers, Kubernetes API servers or even Prometheus itself. Prometheus scrapes the data at the defined <code>scrape_interval</code> and stores it in the volume as time series data. This can be queried and visualized in the Prometheus dashboard that comes bundled with the Prometheus deployment.</p><p>We used the <code>kubectl port-forward</code> command to test locally that Prometheus was working.
Once everything was tested and confirmed, we exposed Prometheus with an ingress so that we could hit its APIs through that URL.</p><p>Initially we had configured the following targets:</p><ol><li><a href="https://github.com/prometheus/node_exporter">node_exporter</a> from Prometheus, which exposes the metrics of the machine the deployment is running on.</li><li><a href="https://github.com/kubernetes/kube-state-metrics">kube-state-metrics</a>, which listens to the Kubernetes API and generates metrics about all the objects.</li><li><a href="https://traefik.io/traefik/">Traefik</a> for all the network-related metrics (like the number of requests etc.) since we are using Traefik as our ingress controller.</li><li>kubernetes-nodes</li><li>kubernetes-pods</li><li>kubernetes-cadvisor</li><li>kubernetes-service-endpoints</li></ol><p>The last 4 scrape jobs collect metrics from the Kubernetes REST API related to nodes, pods, containers and services respectively.</p><p>For scraping metrics from all of these targets, we had set a resource request of 500 MB of RAM and 0.5 vCPU for our Prometheus deployment.</p><p>After setting up all of this, the Prometheus deployment was running fine, and we were able to see the data from the Prometheus dashboard. Seeing this, we were satisfied and happily started hacking with PromQL, Prometheus's query language.</p><h2>The CrashLoopBackOff crime scene</h2><p><code>CrashLoopBackOff</code> is when a Kubernetes pod goes into a loop of crashing, restarting itself and then crashing again - and this was what was happening to the Prometheus deployment we had created. From what we could see, the pod had crashed, and when it got recreated, Prometheus would initialize itself and do a reload of the <a href="https://prometheus.io/docs/prometheus/latest/storage/">Write Ahead Log (WAL)</a>.</p><p>The WAL is there for adding additional durability to the database.
Prometheus stores the metrics it scrapes in-memory before persisting them to the database as chunks, and the WAL makes sure that the in-memory data will not be lost in the case of a crash. In our case, the Prometheus deployment was crashing and it would get recreated. It would try to load the data from the WAL into memory, and then crash again before this was completed, leading to the CrashLoopBackOff state.</p><p>We tried deleting the WAL blocks manually from the volume, even though this would incur some data loss. This brought the deployment back up, since there was no longer any WAL to replay. However, the deployment went into CrashLoopBackOff again after a while.</p><h2>Investigating the error</h2><p>The first line of approach we took was to monitor the CPU, memory, and disk usage of the deployment. The disk usage seemed to be normal. We had provisioned a 100GB volume and it wasn't anywhere near getting used up. The CPU usage also seemed normal. The memory usage, however, was suspicious.</p><p>After the pods had crashed initially, we recreated the deployment and monitored it using kubectl's <code>--watch</code> flag for following all the pod updates. While doing this we were able to see that the pods were going into CrashLoopBackOff because they were getting OOMKilled first. The <code>OOMKilled</code> error in Kubernetes is when a pod is terminated because it tries to use more memory than it is allotted in its resource limits. We were consistently seeing the <code>OOMKilled</code> error, so memory had to be the culprit here.</p><p>We added Prometheus itself as a target in Prometheus so that we could monitor the memory usage of the Prometheus deployment. The following was the general trend of how Prometheus's memory was increasing over time.
This would go on until the memory crossed the specified limit, and then the pod would go into CrashLoopBackOff.</p><p><img src="/blog_images/2024/using-prometheus-in-neeto-deploy/memory_usage.png" alt="Memory usage of the Prometheus deployment"></p><p>Now that we knew that memory was the issue, we started looking into what was causing the memory leak. When we talked with some folks from the Kubernetes Slack workspace, they suggested that we look at the TSDB status of the Prometheus deployment. We monitored the stats in real time and saw that the number of time series stored in the database was growing by tens of thousands every second! This lined up with the increase in the memory usage graph from earlier.</p><p><img src="/blog_images/2024/using-prometheus-in-neeto-deploy/tsdb.png" alt="Prometheus TSDB stats"></p><h2>How we fixed it</h2><p>We can calculate the memory requirement for Prometheus based on the number of targets we are scraping metrics from and the frequency at which we are scraping the data. The memory requirement of the deployment is a function of both of these parameters. In our case, this was definitely higher than what we could afford to allocate (based on the nodegroup's machine type) since we were scraping a lot of data at a scrape interval of 15 seconds, which was set in the default configuration for Prometheus.</p><p>We increased the scrape interval to 60 seconds and removed all the targets from the Prometheus configuration whose metrics we didn't need for building the dashboard. Within the targets that we were scraping from, we used the <code>metric_relabel_configs</code> option to persist in the database only those metrics which we needed and to drop everything else.
We only needed the <code>container_cpu_usage_seconds_total</code>, <code>container_memory_usage_bytes</code> and the <code>traefik_service_requests_total</code> metrics - so we configured Prometheus so that only these three would be stored in our database, and by extension the WAL.</p><p>We redeployed Prometheus after making these changes and the memory showed great stability afterwards. The following is the memory usage of Prometheus over the last few days. It has not exceeded 1GB.</p><p><img src="/blog_images/2024/using-prometheus-in-neeto-deploy/memory_usage_after_fix.png" alt="Memory usage of the Prometheus deployment after the fix"></p><h2>The aftermath</h2><p>Once Prometheus was stable we were able to build the metrics dashboard with the Prometheus API in a straightforward manner. The metrics dashboard came in useful within a couple of days, when the staging deployment of <a href="https://neetocode.com/">NeetoCode</a> faced downtime. You can see the changes in the metrics from the time when the outage occurred.</p><p><img src="/blog_images/2024/using-prometheus-in-neeto-deploy/neetocode_metrics.png" alt="NeetoCode metrics showing the downtime"></p><p>The quintessential learning that we got from this experience is to always be wary of the resources that are being used up when it comes to tasks like scraping metrics over an extended period of time. We were scraping all the metrics initially in order to explore everything, even though not all of the metrics were being used. But because of this, we were able to read a lot about how Prometheus works internally, and also learn some Prometheus best practices the hard way.</p><p>If your application runs on Heroku, you can deploy it on NeetoDeploy without any change.
If you want to give NeetoDeploy a try, then please send us an email at <a href="mailto:invite@neeto.com">invite@neeto.com</a>.</p><p>If you have questions about NeetoDeploy or want to see the journey, follow NeetoDeploy on <a href="https://twitter.com/neetodeploy">Twitter</a>. You can also join our <a href="https://neetohq.slack.com/">community Slack</a> to chat with us about any Neeto product.</p>]]></content>
    </entry><entry>
       <title><![CDATA[How my server got infected with a crypto mining malware and how I fixed it]]></title>
       <author><name>Sreeram Venkitesh</name></author>
      <link href="https://www.bigbinary.com/blog/how-my-server-got-infected-with-a-crypto-mining-malware-and-how-I-fixed-it"/>
      <updated>2022-09-06T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/how-my-server-got-infected-with-a-crypto-mining-malware-and-how-I-fixed-it</id>
      <content type="html"><![CDATA[<p>I was working on a side project recently, where I faced an issue when running a PostgreSQL database. The database server was getting shut down randomly for no apparent reason. I had deployed my Rails application along with its dependencies, like Redis and PostgreSQL, in one of my EC2 instances in AWS.</p><p>PostgreSQL was running on the machine at the default port of <code>5432</code>. Ports <code>443</code> and <code>80</code> were open to everyone, for handling HTTP/S traffic. Port <code>22</code> was also open to everyone, so that anyone with their public SSH key added to the <code>authorized_keys</code> file on the remote server, or with access to the private key file of the server, could log into the machine remotely.</p><p>For development, I needed to access this remote database locally, so I edited the <code>pg_hba.conf</code> file and opened PostgreSQL to the network. I added a new rule to open port <code>5432</code> so that anyone could connect to the PostgreSQL instance remotely if they had all the credentials. In the screenshot below, you can see all the ports that were open to the public network. This was all working great for me, until one fine day it wasn't.</p><p><img src="/blog_images/2022/how-my-server-got-infected-with-a-crypto-mining-malware-and-how-I-fixed-it/aws-before.png" alt="The networking screen in AWS where you can add inbound port rules."></p><p>I realized that something was wrong when I couldn't connect to the PostgreSQL instance remotely one day. The response I was getting was the standard <code>is PostgreSQL running?</code> error.</p><pre><code class="language-bash">psql: could not connect to server: No such file or directory
Connection refused Is the server running on host ${hostname}
and accepting TCP/IP connections on port 5432?</code></pre><p>I was still able to SSH into the VM so I tried to restart PostgreSQL.
After some investigation I figured out that PostgreSQL came back up momentarily when I ran <code>systemctl restart postgresql</code>, but then went down again.</p><p>Inspecting the processes with <code>htop</code>, I was able to see that all the CPU cores were at 100% usage. Something didn't feel right. Sorting the processes based on the percentage of CPU and memory used, I came across two peculiar processes - <code>kdevtmpfsi</code> and <code>kinsing</code>. A quick Google search showed that this was a crypto mining malware that spreads by exploiting flaws in resources that are exposed to the public. Killing the process was of no use since the malware also adds a cron job to replicate itself so that it can't be stopped.</p><h3>Removing the malware</h3><p>I found all files in the system with <code>kdevtmpfsi</code> and <code>kinsing</code> in their names using the unix <code>find</code> command and deleted them. The malware's files were inside the <code>/tmp</code> directory.</p><pre><code class="language-bash"># Quote the patterns so the shell does not expand them before find runs
find / -name 'kdevtmpfsi*'
find / -name 'kinsing*'</code></pre><p>Then I checked if there were any cron jobs running on the machine with the <code>crontab</code> command. There were some jobs whose purpose was to reload the malware script, even if you delete it. I deleted the jobs related to <code>kdevtmpfsi</code> and <code>kinsing</code>. Another thing I learnt was that in Unix, each user has their own crontab which can run jobs as that particular user.</p><pre><code class="language-bash">crontab -l  # List the current user's cron jobs
crontab -e  # Edit the crontab to delete the malicious jobs</code></pre><h3>Things to pay attention to</h3><p>I made all the passwords stronger, especially for the resources that were being exposed to the public. One of the lessons I learnt was that you can always be more secure, and that you should never compromise on your passwords.
The passwords that I had set for my users were weak, with just a dictionary word, a digit and a special character - something like the format of <code>himalaya7!</code>.</p><p>Instead of opening the required ports to the public network, I exposed them only to the IP addresses from which I needed to access them.</p><p>Notice how the ports for SSH and PostgreSQL are only exposed to the required IP addresses now.</p><p><img src="/blog_images/2022/how-my-server-got-infected-with-a-crypto-mining-malware-and-how-I-fixed-it/aws-after.png" alt="how ports 22 and 5432 are only open to certain IP addresses now"></p><p>I moved the application database to a managed PostgreSQL service rather than running it in a VM by myself. This also means that I need not worry about the performance or uptime, as all of this will be taken care of by AWS itself.</p><p>For extra security, I also set up a reverse proxy so that no one can ping my deployed URL and get the IP address of the VM where the application is running.</p><p>Securing your deployments is as important as any other step when deploying your application and it needs to be a priority right from when you are designing the architecture of your application. Taking care of such small details during development will help you write good code and follow the right patterns from the start.</p>]]></content>
    </entry><entry>
       <title><![CDATA[Cache all files with Cloudflare worker and HMAC auth]]></title>
       <author><name>Ershad Kunnakkadan</name></author>
      <link href="https://www.bigbinary.com/blog/how-to-cache-all-files-using-cloudflare-worker-along-with-hmac-authentication"/>
      <updated>2019-01-29T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/how-to-cache-all-files-using-cloudflare-worker-along-with-hmac-authentication</id>
      <content type="html"><![CDATA[<p><a href="https://www.cloudflare.com/">Cloudflare</a> is a Content Delivery Network (CDN) company that provides various network and security services. In March 2018, they <a href="https://blog.cloudflare.com/introducing-cloudflare-workers/">released</a> the &quot;Cloudflare Workers&quot; feature to the public. Cloudflare Workers allow us to write JavaScript code and run it at Cloudflare edges. This is helpful when we want to pre-process requests before forwarding them to the origin. In this post, we will explain how we implemented <a href="https://en.wikipedia.org/wiki/HMAC">HMAC authentication</a> while caching all files in Cloudflare edges.</p><p>We have a bunch of files hosted in S3 which are served through CloudFront. To reduce the CloudFront bandwidth cost and to make use of a global CDN (we use <code>Price Class 100</code> in CloudFront), we decided to use Cloudflare for file downloads. This would help us cache files in Cloudflare edges and would eventually reduce the bandwidth costs at origin (CloudFront). But to do this, we had to solve a few problems.</p><p>We had been signing CloudFront download URLs to restrict their usage after a period of time. This means the file download URLs are always unique. Since Cloudflare caches files based on URLs, caching will not work when the URLs are signed. We had to remove the URL signing to get it working with Cloudflare, but we can't allow people to continuously use the same download URL. Cloudflare Workers helped us with this.</p><p>We negotiated a deal with Cloudflare and upgraded the subscription to the Enterprise plan. The Enterprise plan lets us define a <a href="https://developers.cloudflare.com/workers/reference/cloudflare-features/">Custom Cache Key</a>, with which we can configure Cloudflare to cache based on a user-defined key. The Enterprise plan also increased the cache file size limits.
We wrote the following Worker code, which configures a custom cache key and authenticates URLs using HMAC.</p><p>A Cloudflare Worker starts by attaching a handler to the <code>&quot;fetch&quot;</code> event.</p><pre><code class="language-javascript">addEventListener(&quot;fetch&quot;, event =&gt; {
  event.respondWith(verifyAndCache(event.request));
});</code></pre><p>The <code>verifyAndCache</code> function can be defined as follows.</p><pre><code class="language-javascript">async function verifyAndCache(request) {
  /**
  source:
  https://jameshfisher.com/2017/10/31/web-cryptography-api-hmac.html
  https://github.com/diafygi/webcrypto-amples#hmac-verify
  https://stackoverflow.com/questions/17191945/conversion-between-utf-8-arraybuffer-and-string
  **/

  // Convert the string to an array of its ASCII values
  function str2ab(str) {
    let uintArray = new Uint8Array(
      str.split(&quot;&quot;).map(function (char) {
        return char.charCodeAt(0);
      })
    );
    return uintArray;
  }

  // Retrieve the token from the query string, which is in the format &quot;&lt;time&gt;-&lt;auth_code&gt;&quot;
  function getFullToken(url, query_string_key) {
    let full_token = url.split(query_string_key)[1];
    return full_token;
  }

  // Fetch the authentication code from token
  function getAuthCode(full_token) {
    let token = full_token.split(&quot;-&quot;);
    return token[1].split(&quot;/&quot;)[0];
  }

  // Fetch timestamp from token
  function getExpiryTimestamp(full_token) {
    let timestamp = full_token.split(&quot;-&quot;);
    return timestamp[0];
  }

  // Fetch file path from URL
  function getFilePath(url) {
    let url_obj = new URL(url);
    return decodeURI(url_obj.pathname);
  }

  const full_token = getFullToken(request.url, &quot;&amp;verify=&quot;);
  const token = getAuthCode(full_token);
  const str =
    getFilePath(encodeURI(request.url)) + &quot;/&quot; + getExpiryTimestamp(full_token);
  const secret = &quot;&lt; HMAC KEY &gt;&quot;;

  // Import the secret string as an HMAC key that uses SHA-256
  let key = await crypto.subtle.importKey(
    &quot;raw&quot;,
    str2ab(secret),
    { name: &quot;HMAC&quot;, hash: { name: &quot;SHA-256&quot; } },
    false,
    [&quot;sign&quot;, &quot;verify&quot;]
  );

  // Sign the &quot;str&quot; with the key imported previously
  let sig = await crypto.subtle.sign({ name: &quot;HMAC&quot; }, key, str2ab(str));

  // Convert the ArrayBuffer &quot;sig&quot; to a string, then to a Base64 digest, and then URL-encode it
  let verif = encodeURIComponent(
    btoa(String.fromCharCode.apply(null, new Uint8Array(sig)))
  );

  // Get time in Unix epoch
  let time = Math.floor(Date.now() / 1000);

  if (time &gt; getExpiryTimestamp(full_token) || verif != token) {
    // Render error response
    const init = {
      status: 403,
    };
    const modifiedResponse = new Response(`Invalid token`, init);
    return modifiedResponse;
  } else {
    let url = new URL(request.url);

    // Generate a cache key from URL excluding the unique query string
    let cache_key = url.host + url.pathname;

    let headers = new Headers(request.headers);

    /**
    Set an optional header/auth token for additional security in origin.
    For example, using AWS Web Application Firewall (WAF), it is possible to create a filter
    that allows requests only with a custom header to pass through CloudFront distribution.
    **/
    headers.set(&quot;X-Auth-token&quot;, &quot;&lt; Optional Auth Token &gt;&quot;);

    /**
    Fetch the file using cache_key. File will be served from cache if it's already there,
    or it will send the request to origin. Please note 'cacheKey' is available only in
    Enterprise plan.
    **/
    const response = await fetch(request, {
      cf: { cacheKey: cache_key },
      headers: headers,
    });
    return response;
  }
}</code></pre><p>Once the worker is added, configure an associated route in <code>&quot;Workers -&gt; Routes -&gt; Add Route&quot;</code> in Cloudflare.</p><p><img src="/blog_images/2019/how-to-cache-all-files-using-cloudflare-worker-along-with-hmac-authentication/cloudflare-add-worker-route.png" alt="Add Cloudflare Worker route"></p><p>Now, all requests will go through the configured Cloudflare worker. Each request will be verified using HMAC authentication and all files will be cached in Cloudflare edges. This reduces bandwidth costs at the origin.</p>]]></content>
    </entry><entry>
       <title><![CDATA[Target Tracking Policy for Auto Scaling]]></title>
       <author><name>Ershad Kunnakkadan</name></author>
      <link href="https://www.bigbinary.com/blog/target-tracking-policy-for-auto-scaling"/>
      <updated>2019-01-15T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/target-tracking-policy-for-auto-scaling</id>
      <content type="html"><![CDATA[<p>In July 2017, AWS <a href="https://aws.amazon.com/about-aws/whats-new/2017/07/introducing-target-tracking-scaling-policies-for-auto-scaling/">introduced</a> Target Tracking Policy for Auto Scaling in EC2. It helps to autoscale based on metrics like Average CPU Utilization, load balancer requests per target, and so on. Simply stated, it scales the resources up and down to keep the metric at a fixed value. For example, if the configured metric is Average CPU Utilization and the value is 60%, the Target Tracking Policy will launch more instances if the Average CPU Utilization goes beyond 60%. It will automatically scale down when the usage decreases. Target Tracking Policy works using a set of CloudWatch alarms which are automatically set when the policy is configured.</p><p>It can be configured in <code>EC2 -&gt; Auto Scaling Groups -&gt; Scaling Policies</code>.</p><p><img src="/blog_images/2019/target-tracking-policy-for-auto-scaling/ec2_target_tracking_policy.png" alt="EC2 Target Tracking Policy"></p><p>We can also configure a warm-up period so that it waits before it launches more instances to keep the metric at the configured value.</p><p>Internally, we use terraform to manage AWS resources.
We can configure Target Tracking Policy using terraform as follows.</p><pre><code class="language-hcl">resource &quot;aws_launch_configuration&quot; &quot;web_cluster&quot; {
  name_prefix = &quot;staging-web-cluster&quot;
  image_id = &quot;&lt;image ID&gt;&quot;
  instance_type = &quot;&lt;instance type&gt;&quot;
  key_name = &quot;&lt;ssh key name&gt;&quot;
  security_groups = [&quot;&lt;security group&gt;&quot;]
  user_data = &quot;&lt;user_data script&gt;&quot;

  root_block_device {
    volume_size = &quot;&lt;volume size&gt;&quot;
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource &quot;aws_autoscaling_group&quot; &quot;web_cluster&quot; {
  name = &quot;staging-web-cluster-asg&quot;
  min_size = &quot;&lt;min ASG size&gt;&quot;
  max_size = &quot;&lt;max ASG size&gt;&quot;
  default_cooldown = &quot;300&quot;
  launch_configuration = &quot;${aws_launch_configuration.web_cluster.name}&quot;
  vpc_zone_identifier = [&quot;&lt;subnet ID&gt;&quot;]
  health_check_type = &quot;EC2&quot;
  health_check_grace_period = 300
  target_group_arns = [&quot;&lt;target group arn&gt;&quot;]
}

resource &quot;aws_autoscaling_policy&quot; &quot;web_cluster_target_tracking_policy&quot; {
  name = &quot;staging-web-cluster-target-tracking-policy&quot;
  policy_type = &quot;TargetTrackingScaling&quot;
  autoscaling_group_name = &quot;${aws_autoscaling_group.web_cluster.name}&quot;
  estimated_instance_warmup = 200

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = &quot;ASGAverageCPUUtilization&quot;
    }

    target_value = &quot;60&quot;
  }
}</code></pre><p>Target Tracking Policy allows us to easily configure and manage autoscaling in EC2. It's particularly helpful while running services like web servers.</p>]]></content>
    </entry><entry>
       <title><![CDATA[Deploying feature branches to have a review app]]></title>
       <author><name>Ershad Kunnakkadan</name></author>
      <link href="https://www.bigbinary.com/blog/deploying-feature-branches-to-have-a-review-app"/>
      <updated>2018-11-27T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/deploying-feature-branches-to-have-a-review-app</id>
      <content type="html"><![CDATA[<p><em>BigBinary has been working with <a href="https://gumroad.com">Gumroad</a> for a while. The following blog post has been posted with permission from Gumroad and we are very grateful to <a href="https://twitter.com/shl">Sahil</a> for allowing us to discuss the work in such an open environment.</em></p><p>A staging environment helps us test the code before pushing it to production. However, it becomes hard to manage the staging environment when more people work on different parts of the application. This can be solved by implementing a system where each feature branch can have its own individual staging environment.</p><p>Heroku has the <a href="https://devcenter.heroku.com/articles/github-integration-review-apps">Review Apps feature</a>, which can deploy different branches separately. <a href="https://gumroad.com">Gumroad</a> doesn't use Heroku, so we built a custom in-house solution.</p><p>The first step was to build the infrastructure. We created a new Auto Scaling Group, Application Load Balancer and route in AWS for the review apps. The load balancer and route are common for all review apps, but a new EC2 instance is created in the ASG when a new review app is commissioned.</p><p>![review app architecture](/blog_images/image review_app_architecture.jpg)</p><p>The main challenge was to forward the incoming requests to the correct server running the review app. This was made possible using <a href="https://www.nginx.com/resources/wiki/modules/lua/">Lua in nginx</a> and <a href="https://www.consul.io/">consul</a>. When a review app is deployed, it writes its IP and port to consul along with the hostname.
Each review app server runs an instance of <a href="https://openresty.org/en/">OpenResty</a> (Nginx + Lua modules) with the following configuration.</p><pre><code class="language-bash">server {
  listen                   80;
  server_name              _;
  server_name_in_redirect  off;
  port_in_redirect         off;

  try_files $uri/index.html $uri $uri.html @app;

  location @app {
    set $upstream &quot;&quot;;

    rewrite_by_lua '
      http   = require &quot;socket.http&quot;
      json   = require &quot;json&quot;
      base64 = require &quot;base64&quot;

      -- read upstream from consul
      host          = ngx.var.http_host
      body, c, l, h = http.request(&quot;http://172.17.0.1:8500/v1/kv/&quot; .. host)
      data          = json.decode(body)
      upstream      = base64.decode(data[1].Value)

      ngx.var.upstream = upstream
    ';

    proxy_buffering   off;
    proxy_set_header  Host $host;
    proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_redirect    off;
    proxy_pass        http://$upstream;
  }
}</code></pre><p>It forwards all incoming requests to the correct IP:PORT after looking it up in consul by hostname.</p><p>The next task was to build a system to deploy the review apps to this infrastructure. We were already using docker in both production and staging environments. We decided to extend it to deploy branches by building a docker image for every branch with the <code>deploy-</code> prefix in the branch name. When such a branch is pushed to GitHub, a CircleCI job is run to build a docker image with the code and all the necessary packages.
This can be configured using a configuration template like this.</p><pre><code class="language-yaml">jobs:
  build_image:
    &lt;&lt;: *defaults
    parallelism: 2
    steps:
      - checkout
      - setup_remote_docker:
          version: 17.09.0-ce
      - run:
          command: |
            ci_scripts/2.0/build_docker_image.sh
          no_output_timeout: 20m

workflows:
  version: 2
  web_app:
    jobs:
      - build_image:
          filters:
            branches:
              only:
                - /deploy-.*/</code></pre><p>It also pushes static assets like JavaScript, CSS and images to an S3 bucket from where they are served directly through a CDN. After building the docker image, another CircleCI job is run to do the following tasks.</p><ul><li>Create a new database in RDS and configure the required credentials.</li><li>Scale up the Review App's Auto Scaling Group by increasing the number of desired instances by 1.</li><li>Run redis, database migration, seed-data population, unicorn and resque instances using <a href="https://nomadproject.io">nomad</a>.</li></ul><p>The ease of deploying a review app helped increase our productivity.</p>]]></content>
    </entry><entry>
       <title><![CDATA[Reducing infrastructure cost by 10% for ECommerce app]]></title>
       <author><name>Ershad Kunnakkadan</name></author>
      <link href="https://www.bigbinary.com/blog/how-we-reduced-infrastructure-cost-e-commerce-project"/>
      <updated>2018-08-29T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/how-we-reduced-infrastructure-cost-e-commerce-project</id>
      <content type="html"><![CDATA[<p>Recently, we got an opportunity to reduce the infrastructure cost of a medium-sized e-commerce project. In this blog we discuss how we reduced the total infrastructure cost by 10%.</p><h3>Changes to MongoDB instances</h3><p>Depending on the requirements, modern web applications use different third-party services. For example, it's easier and more cost-effective to subscribe to a GeoIP lookup service than to build and maintain one. Some third-party services get very expensive as the usage increases, but people don't look for alternatives due to legacy reasons.</p><p>In our case, our client had been paying more than $5,000/month for a third-party MongoDB service. This service charges based on the storage used and we had years of data in it. This data is consumed by a machine learning system to fight fraudulent purchases and users. We had a look at both the ML system and the data in MongoDB and found we actually didn't need all the data in the database. The system never read data older than 30-60 days in some of the biggest mongo collections.</p><p>Since we were already using <a href="https://www.nomadproject.io/">nomad</a> as our scheduler, we wrote a periodic nomad job that runs every week to delete unnecessary data. The nomad job syncs both primary and secondary MongoDB instances to release the free space back to the OS. This helped reduce the monthly bill to $630/month.</p><h2>Changes to MongoDB service provider</h2><p>Then we looked at the MongoDB service provider. It was configured years back when the application was built. There are other vendors who provide the same service for a much cheaper price. We switched our MongoDB to mLab and now the database runs in a $180/month dedicated cluster.
With WiredTiger's <a href="https://docs.mongodb.com/manual/core/wiredtiger/#compression">compression</a> enabled, we use less storage than we did before.</p><h2>Making use of Auto Scaling</h2><p>Auto Scaling can be a powerful tool when it comes to reducing costs. We had been running around 15 large EC2 instances. This was inefficient for the following two reasons.</p><ol><li>A fixed set of instances cannot cope when traffic increases beyond its capacity.</li><li>Resources are underused when traffic is low.</li></ol><p>Auto Scaling solves both issues. For web servers, we switched to smaller instances and used a <a href="https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-target-tracking.html">Target Tracking Scaling Policy</a> to keep the average aggregate CPU utilization at 70%.</p><p>Background job workers made use of a Nomad job we built. It periodically calculated the number of required instances based on the count of pending jobs and each job's queue priority. This number was pushed to CloudWatch as a metric, and the Auto Scaling group scaled based on that. This approach was effective in boosting performance and reducing cost.</p><h2>Buying reserved instances</h2><p>AWS has a feature to reserve instances for services like EC2 and RDS. It's often preferable to buy reserved instances rather than run the application on on-demand instances. We evaluated reserved instance utilization using the <a href="https://console.aws.amazon.com/cost-reports/home?#/ri/utilization">reporting tool</a> and bought the required reserved instances.</p><h2>Looking for cost-effective solutions</h2><p>Sometimes, different solutions to the same problem can have different costs. For example, we had been facing small DDoS attacks regularly and had to rate-limit requests based on IP and other parameters. Since we had been using Cloudflare, we could have used their rate-limiting feature. Performance-wise, it was the best solution, but they charge based on the number of good requests. 
It would be expensive for us since it's a high-traffic application. We looked for other solutions and solved the problem using Rack::Attack. We <a href="https://blog.bigbinary.com/2018/05/15/how-to-mitigate-ddos-using-rack-attack.html">wrote a blog</a> about it some time back. The solution presented in the blog was effective in mitigating the DDoS attacks we faced and didn't cost us anything significant.</p><h2>Requesting custom pricing</h2><p>If you are a comparatively large customer of a third-party service, it's likely that you don't have to pay the published price. Instead, you can request custom pricing. Many companies will be happy to give discounts of 20% to 50% if you can commit to a minimum annual spend. We tried negotiating a new contract for an expensive third-party service and got a deal with a 40% discount compared to their published minimum price.</p><p>Running an infrastructure can be both technically and economically challenging. But if we read between the lines and are willing to update existing systems, we may be surprised by how much money we can save every month.</p>]]></content>
    </entry>
     </feed>