<h1>Serving Authenticated Static Content was pretty expensive before today</h1>
<p>Posted 2016-12-19 at http://tarunjangra.com/2016/12/19/serving-authenticated-static-content</p>
<p>It has always been painful to serve authenticated static content, because we are bound to a programming framework
to handle the authentication job. And once authenticated, </p>
<!--more-->
<p>you have to read the file from disk in your application code and
stream it to the end user with the correct MIME type. This was the only solution I had before today.</p>
<p><img src="http://i.imgur.com/ABtCr5z.jpg" alt="Nginx x-accel module"></p>
<p>While working on a project, I found XSendfile and X-Accel. X-Accel allows an internal redirect to a location
determined by a header returned from a backend.</p>
<p>This allows you to handle authentication, logging, or whatever else you please in your backend, and then have NGINX
serve the contents from the redirected location to the end user, thus freeing up the backend to handle other requests. This
feature is commonly known as X-Sendfile.</p>
<p>NGINX also has this feature, but implemented a little bit differently. In NGINX this feature is called X-Accel-Redirect.</p>
<p>There are two main differences:</p>
<ol>
<li>The header must contain a URI.</li>
<li>The location should be defined as internal; to prevent the client from going directly to the URI.</li>
</ol>
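<p>As a sketch of how the pieces fit together (the location and file paths here are made up for illustration): the backend authenticates the request and answers with only a header, and NGINX serves the file from an internal location.</p>
<figure class="highlight"><pre><code class="language-nginx" data-lang="nginx"># NGINX side: an internal location that clients cannot request directly.
location /protected/ {
    internal;
    alias /var/www/protected-files/;
}</code></pre></figure>
<p>After authenticating, the backend returns an empty body with a header such as "X-Accel-Redirect: /protected/report.pdf", and NGINX streams /var/www/protected-files/report.pdf to the client with the correct MIME type.</p>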
<p>We had been missing this feature in our Elgg development. We will definitely use this module to get better
performance when serving static content to the end user: no more reading the stream from disk in the application and serving it on.</p>
<h1>My First post with Jekyll</h1>
<p>Posted 2016-07-09 at http://tarunjangra.com/2016/07/09/hello-world</p>
<p>In this blog post I do not have anything particular to talk about, so it is just an introduction
to my new blog built on Jekyll. Since it is Jekyll based, I've used Travis-CI for building
and GitHub for hosting this blog.</p>
<!--more-->
<p>As described in <a href="/about-me.html">About Me</a>, I'm passionate about programming, cloud computing, and
entrepreneurship. So that's what I'll be writing about.</p>
<p>So I just want to say "Hello", and thank you for taking the time to read this blog.</p>
<h1>Amazon Elastic Cloud Computing (EC2)</h1>
<p>Posted 2016-03-02 at http://tarunjangra.com/2016/03/02/amazon-ec2</p>
<p>Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides re-sizable compute capacity in the cloud.
Amazon EC2 reduces the time required to obtain and boot new server instances to minutes, allowing you to quickly scale
capacity, both up and down, as your computing requirements change.</p>
<!--more-->
<figure><img src='http://imgur.com/CBUgmAj.png' style='width:600px;max-width:100%;' alt='Amazon EC2'/><figcaption>Amazon EC2</figcaption></figure>
<p>We have been working in IT for the last 10 years. I remember a time when, if we needed a new Active Directory server or a new SQL Server,
we had to go to HP or Dell and order new servers. They then had to be delivered to our data centers, racked,
networked, made internet accessible, and so on, and the provisioning time could be anywhere
from 5 to 10 business days.</p>
<p>Then I started using the public cloud, and it was really exciting to see its capabilities: instead of a 5-to-10-day
lead time, you can literally have a server up and running in a couple of minutes. That is
how cloud computing has changed the IT industry in the last 5 to 10 years. Amazon EC2 changes the economics of
computing by allowing you to pay only for the capacity you actually use, and it provides developers the tools
to build failure-resilient applications and isolate themselves from common failure scenarios. Look at the first
advantage of cloud computing, the utility-based pricing model: you pay only by the hour. If you want to spin up
a development environment, test on it, and then terminate it, you pay only for the 1 or 2 hours the environment is live;
in the old model you would buy the server hardware and be stuck with it.</p>
<h2 id="elastic-compute-cloud-pricing-options">Elastic Compute Cloud Pricing Options</h2>
<ol>
<li><strong>Free Tier</strong>: you get 750 hours free per month on certain micro instances.</li>
<li><strong>On Demand</strong>: allows you to pay a fixed rate by the hour with no commitment.</li>
<li><strong>Reserved</strong>: provides you with a capacity reservation and offers a significant discount on the hourly charge for
an instance, on 1-year or 3-year terms. Reserved is just saying "I need 10 servers of this size, and I am willing
to pay up front and commit for 1 or 3 years"; if you do use reserved instances, you get
massive discounts compared with on demand.</li>
<li><strong>Spot</strong>: enables you to bid whatever price you want to pay for instance capacity, providing even greater savings
if your applications have flexible start and end times.</li>
</ol>
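<p>As a rough sketch of how two of these options look in practice with the (modern) AWS CLI; the AMI ID, instance type, and bid price below are placeholders:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># On Demand: fixed hourly rate, no commitment.
aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type t2.micro --count 1

# Spot: bid a maximum hourly price; you keep the instance while your bid holds.
aws ec2 request-spot-instances --spot-price "0.05" \
  --launch-specification '{"ImageId": "ami-xxxxxxxx", "InstanceType": "t2.micro"}'</code></pre></figure>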
<h2 id="elastic-compute-cloud-on-demand-vs-reserved-vs-spot">Elastic Compute Cloud On Demand vs Reserved vs Spot</h2>
<ol>
<li><strong>On Demand Instances</strong>
<ul>
<li>Users that want the low cost and flexibility of Amazon EC2 without any up-front payment or long-term commitment.</li>
<li>Applications with short term, spike, or unpredictable workloads that cannot be interrupted.</li>
<li>Applications being developed or tested on Amazon EC2 for the first time.</li>
</ul></li>
<li><strong>Reserved Instances</strong>
<ul>
<li>Applications with steady-state or predictable usage. Reserved might be your 3 or 4 web servers that you always want
turned on, while your on-demand instances might be the ones launched as part of an auto-scaling event.</li>
<li>Applications that require reserved capacity.</li>
<li>Users able to make upfront payment to reduce their total computing costs even further.</li>
</ul></li>
<li><strong>Spot Instances</strong>
<ul>
<li>Applications that have flexible start and end times.</li>
<li>Applications that are only feasible at very low compute prices.</li>
<li>Users with urgent computing needs for large amounts of additional capacity.</li>
</ul></li>
</ol>
<h2 id="elastic-compute-cloud-nbsp-on-demand-instances">Elastic Compute Cloud On Demand Instances</h2>
<ol>
<li>General Purpose Instances</li>
<li>Compute Optimized Instances
<ul>
<li>Compute Intensive Applications</li>
</ul></li>
<li>Memory Optimized Instances
<ul>
<li>Database & Memory Caching Applications</li>
</ul></li>
<li>GPU Instances
<ul>
<li>High Performance Parallel Computing (e.g. Hadoop)</li>
</ul></li>
<li>Storage Optimized Instances
<ul>
<li>Data warehousing and Parallel Computing</li>
</ul></li>
</ol>
<h2 id="local-instance-storage-vs-elastic-block-storage">Local Instance Storage vs Elastic Block Storage</h2>
<ol>
<li><strong>Local Instance Storage</strong>: Data stored on a local instance store persists only as long as the instance is alive. If you terminate the
instance, you lose all the data on that virtual hardware.</li>
<li><strong>Elastic Block Storage Backed Storage</strong>: Data that is stored on an Amazon Elastic Block Storage volume will persist independently of the life of the instance.</li>
</ol>
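<p>To illustrate the difference, here is a hypothetical AWS CLI sketch: an EBS volume is created as its own resource and merely attached to an instance, which is why its data outlives the instance (the volume and instance IDs below are placeholders):</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># Create a 100 GB General Purpose SSD volume.
aws ec2 create-volume --size 100 --volume-type gp2 --availability-zone us-east-1a

# Attach it to an instance; terminating the instance later leaves the volume (and data) intact.
aws ec2 attach-volume --volume-id vol-xxxxxxxx --instance-id i-xxxxxxxx --device /dev/sdf</code></pre></figure>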
<h2 id="storage-backed-by-elastic-block-storage">Storage backed by Elastic Block Storage</h2>
<ol>
<li><strong>Provisioned IOPS Solid State Drive</strong>
<ul>
<li>Designed for I/O intensive applications such as large relational or No-SQL databases.</li>
</ul></li>
<li><strong>General purpose Solid State Drive</strong>
<ul>
<li>Designed for 99.999% availability.</li>
<li>Ratio of 3 IOPS per GB, offer single digit millisecond latency, and also have the ability to burst up to 3000 IOPS
for short periods.</li>
</ul></li>
<li><strong>Magnetic</strong>
<ul>
<li>Lowest cost per gigabyte of all Elastic Block Storage volume types. Magnetic volumes are ideal for workloads where data
is accessed infrequently, and applications where the lowest storage cost is important.</li>
</ul></li>
</ol>
<h1>ElasticSearch restore failed when s3-gateway is activated</h1>
<p>Posted 2014-07-11 at http://tarunjangra.com/2014/07/11/elasticSearch-restore-failed-when-s3-gateway-activated</p>
<p>Hufffff, unfortunately I hit this edge case. I have recovered from the situation. Here's my scenario.</p>
<ul>
<li>I am on ElasticSearch Version 1.1.0</li>
<li>I have two data nodes: one holds the primary shards and the other the replicas.</li>
<li>I am taking regular snapshots of my indexes.</li>
<li>I am no longer taking snapshots; instead, I have installed the s3-gateway plugin to keep an S3 bucket updated as persistent index storage.</li>
</ul>
<!--more-->
<p>Because of a bulk import, I had stopped my replica to make the import a little faster. Once the import completed, I saw high CPU and memory usage. Since I believed my indexes were safe because I was using the s3-gateway, I decided to restart the remaining data node. That was a big mistake. When I restarted it, it did not recover all the indexes. We were about to launch our site in the next two hours, and I was left with no index.</p>
<p>Struggling here and there, I came to learn that I was hit by a bug in ElasticSearch. I tried to follow the instructions at the end of the bug thread, where I was supposed to update/edit the metadata file in the S3 bucket. I did that, but no luck.</p>
<p>The problem I found: all indexes and shards are supposed to have _source folders, and I had many indexes and shards where the _source folder was missing. Those indexes were unrecoverable. I had no solution at that point and was literally sweating in an air-conditioned room.</p>
<p>Then one of my colleagues, Narinder Kaur, joined me. She gave me the necessary support while we tried a few more experiments to fix it. Since I had already made one mistake, I took a backup of the existing elasticsearch directory so that I could get back to the same place in case of any further mess, because the solutions we were planning to try were totally unproven.</p>
<p>So, here is the solution we tried, and which actually worked. Wow!</p>
<ol>
<li>I updated my elasticsearch.yml and removed the s3-gateway settings related to my S3 bucket.</li>
<li>I stopped elasticsearch.</li>
<li>I renamed my old cluster directory (elasticsearch) to elasticsearch.original.</li>
<li>I restarted Elasticsearch, and it created a new blank cluster with no indexes.</li>
<li>I created all the required indexes with the same number of shards and replicas I previously had. In my case I had 5 indexes and 5 shards per index.</li>
<li>I stopped elasticsearch again.</li>
<li>I deleted (elasticsearch/nodes/0/indices/{index_name}/{0,1,2,3,4}/{index,translog}) and moved (elasticsearch.original/nodes/0/indices/{index_name}/{0,1,2,3,4}/{index,translog}) to (elasticsearch/nodes/0/indices/{index_name}/{0,1,2,3,4}/{index,translog}).
<strong>Note</strong>: Here, I did not touch the _state folder of the blank indexes. So all my indexes now had a _state folder in each index and each shard.</li>
<li>I copied all indexes this way, as created in step 5.</li>
<li>I restarted ElasticSearch, and found all indexes were recovered.</li>
</ol>
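<p>For step 5, the blank indexes can be recreated over the REST API. A sketch with curl (the index name is a placeholder; syntax as in the ElasticSearch 1.x create-index API):</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"># Create an index with 5 shards and 1 replica, matching the old layout.
curl -XPUT 'http://localhost:9200/my_index' -d '{
  "settings": { "number_of_shards": 5, "number_of_replicas": 1 }
}'</code></pre></figure>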
<p>Observation: you should run all your custom mappings on the blank indexes. I hit some errors because I had not executed my mappings.</p>
<p>Thank god, all the indexes were recovered. And thanks to Narinder Kaur, who got me the support I needed at that time.</p>
<h1>How to install go-daddy ssl certificate on amazon load balancer</h1>
<p>Posted 2012-12-29 at http://tarunjangra.com/2012/12/29/how-to-install-godaddy-ssl-on-ELB</p>
<p>I was struggling to install an SSL certificate on an ELB, and I finally made it. Following are the steps you need to follow.</p>
<h3 id="requirements-amp-prerequisites">Requirements & Prerequisites:</h3>
<ol>
<li>A Linux box with openssl and apache installed.</li>
<li>An open shell terminal on your Linux box.</li>
</ol>
<!--more-->
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">openssl genrsa -des3 -out private.key 2048
openssl req -new -key private.key -out www.your-web-site.com.csr</code></pre></figure>
<p>You will be prompted for some basic information. Make sure the "Common Name" is a fully qualified domain name, like "www.xyz.com".</p>
<ol>
<li>Open <a href="http://www.godaddy.com">GoDaddy</a> and go to the SSL management control panel.</li>
<li>Select your certificate and click the Re-Key button.</li>
<li>Copy the content of "www.your-web-site.com.csr", paste it into the "CSR" field, and press Re-Key.</li>
<li>It will prompt you to download the keys. The available download options are Apache, Nginx and Other; I used "Other" to download the keys to be used on the ELB.</li>
<li>Now unzip the downloaded file. It should contain two *.crt files.</li>
</ol>
<h3 id="now-back-to-your-terminal">Now back to your terminal.</h3>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">openssl rsa -in private.key -out private.pem</code></pre></figure>
<p>Now you will have the following files in your current location.</p>
<ol>
<li>private.key</li>
<li>private.pem</li>
<li>"www.your-web-site.com.csr"</li>
<li>sf_bundle.crt</li>
<li>your-domain.com.crt</li>
</ol>
<p>Now open your load balancer console and add HTTPS support. It will prompt you to add the following values.</p>
<ol>
<li>Certificate Name:* -> Put any friendly name</li>
<li>Private Key:* -> Paste content of private.pem</li>
<li>Public Key Certificate:* -> Paste content of your-domain.com.crt.</li>
<li>Certificate Chain: -> Paste content of sf_bundle.crt</li>
</ol>
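<p>For reference, the same certificate can nowadays be uploaded from the command line instead of the console. A sketch with the modern AWS CLI (the certificate name is arbitrary; the file names match the list above):</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">aws iam upload-server-certificate \
  --server-certificate-name my-godaddy-cert \
  --certificate-body file://your-domain.com.crt \
  --private-key file://private.pem \
  --certificate-chain file://sf_bundle.crt</code></pre></figure>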
<p>Once done, save all these values and you are good to go.</p>
<h1>Logical Volume Manager (LVM) can help if you are out of space</h1>
<p>Posted 2012-11-12 at http://tarunjangra.com/2012/11/12/Logical-Volume-Manager-LVM-can-help-if-you-are-out-of-space</p>
<p>Today I found that my Ubuntu server's home partition was about to fill up. It holds lots of projects we are
working on. Replacing the old hard disk with a new one of bigger size is one solution, but it is very time consuming. It is scary:
copy everything from the old to the new hard drive, install every single application and library my scripts need.</p>
<!--more-->
<p><img src="http://tarunjangra.com/images/assets/LVM_original_description.png" alt="Logical Volume Manager"></p>
<p>Obviously that is a time-consuming process. But thanks to Logical Volume Manager (LVM), and because fortunately I had used LVM when configuring my old hard drive,
I was able to extend my "home" volume in minutes, without the copying and all the boring stuff explained above. My old
hard disk scheme was:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">100MB /boot
73GB PV <span class="o">(</span>Physical Volume <span class="o">)</span>
<span class="m">3</span> GB /myDB <span class="o">(</span> my database directory<span class="o">)</span>
45GB /home <span class="o">(</span>All my projects are located in home<span class="o">)</span></code></pre></figure>
<p>So I was running out of space. What I did: I purchased a new 1TB WD SATA hard drive and connected it to the secondary SATA port.
My Ubuntu box detected the new hard drive. I made sure of it with the following commands.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">fdisk -l
<span class="c"># i got both disks listed, /dev/sda and /dev/sdb (the new one).</span>
pvcreate /dev/sdb
<span class="c"># initialize the new disk as an LVM physical volume.</span>
vgdisplay
<span class="c"># I got the name of the volume group to be used.</span>
vgextend volume-group-name /dev/sdb
<span class="c"># this command puts the new hard drive in the existing volume group.</span>
vgdisplay
<span class="c"># to make sure the new hard drive was actually added to the group.</span>
lvextend -L +500G /dev/volume-group-name/drive-name
<span class="c"># "drive-name" is the logical volume mounted at my "home" dir.</span>
resize2fs /dev/volume-group-name/drive-name
<span class="c"># This took about 10 mins to extend my home by another 500G.</span></code></pre></figure>
<p>So this is how I extended the space. I noticed that while extending, I could still access all the projects on the volume being extended.
There was no crash and no restart (usually forced by Windows for such tasks :) ). The process is efficient enough that you can keep using
your disk even while making this arrangement for more space. Anyway, that is how I got everything working within 10 minutes.
It was a really amazing experience.</p>
<h1>Our development workflow with gitflow</h1>
<p>Posted 2012-01-19 at http://tarunjangra.com/2012/01/19/our-development-workflow-with-git</p>
<p>We have been using git since 2009. Recently we were pushed by a platform requirement to implement a better development workflow, with
better branching, code releases, etc. And we found gitflow, a collection of git extensions providing high-level git
operations.</p>
<!--more-->
<p>I found our experience well worth sharing. Before gitflow, we were using git with a Master
branch only, where all developers would push; the code would move to the development server and, after testing,
be deployed to the production server. That is a bit of a cumbersome process, and as our requirements grew toward better-tracked
development with less effort, we started feeling the need for a serious process. We have followed Vincent Driessen's
branching model.</p>
<p><img src="http://tarunjangra.com/images/assets/git-workflow-gitflow.png" alt="Gitflow"></p>
<p>The master branch will now be our production-ready branch, and the develop branch will be our dev server branch. These two branches
are supposed to live in the system indefinitely. We have learnt to keep some temporary branches, like "feature branches" and
"release branches", which play a great role in the architecture we are working in. We use "Pivotal Tracker" for
our Agile methodology, so when we have a new milestone with multiple stories for a particular feature, the developer
needs to create a new branch named "feature/<feature-name>". This branch is cloned from the develop branch
and stays in the system until the feature is complete, and is then merged back into develop. So over the whole
release we are supposed to complete all Pivotal stories by story ids.</p>
<p>I am looking for some automation where all stories get started when the developer creates the feature branch, and when
he delivers the whole feature and merges the branch back into develop, the status of the
story automatically changes to "Delivered". The QA team will then test and either accept or reject the corresponding story. I know the webhooks provided
by github.com can be used to achieve this with Pivotal Tracker.</p>
<p>Overall, the gitflow methodology makes the development flow quite a bit better than what we were doing earlier.</p>
<h1>Round-robin at application level to Balance MySQL Database Load</h1>
<p>Posted 2011-06-10 at http://tarunjangra.com/2011/06/10/Round-robin-at-application-level-to-Balance-MySQL-Database-Load</p>
<p>The round-robin technique lets you distribute your read queries across any number of available resources, even if the servers are located
in different locations. Huge-traffic sites like Facebook have to have such techniques working in the background to serve as fast as
possible.</p>
<!--more-->
<p>I would like to discuss one of my personal implementation experiences for a large social networking site.
Cloud computing is really helpful, but it also needs a logical approach at the programming level.</p>
<p><img src="http://tarunjangra.com/images/assets/mysql-57-clustering-the-developer-perspective-60-638.jpg" alt="MySQL Round-Robin"></p>
<h3 id="approach-1-six-servers-architecture-on-amazon-cloud">Approach 1: Six servers architecture on amazon cloud.</h3>
<p>WOW! I had implemented 1 load balancer, 1 MySQL master DB, 1 MySQL slave DB and 3 application servers. Such an architecture<br>
can handle huge traffic, since there is a separate application server layer where we can add more application servers anytime
we need: user requests get balanced across the 3 application servers, which return the responses. But my application had one more
problem. When a user clicks a single link, it executes 100+ SQL queries, because of framework overhead plus some intentional queries.
Hmmmm, so the MySQL load is never balanced by this technique alone, and it has to be, because 1 request triggers 100+ SQLs.
So I drilled down to find a solution and decided to separate SQL reads and writes. With this, I got the opportunity
to send the MySQL writes separately, and I brought up one MySQL slave server for reads.</p>
<h3 id="does-this-really-get-me-at-the-end-of-performance-level">Does this really get me at the end of performance level?</h3>
<p>No, because we run read queries much more frequently than writes. So of those 100+ SQLs, only a few are database writes, and my write server
still has idle resources.</p>
<h3 id="here-is-where-round-robin-comes-in">Here is where Round Robin comes in.</h3>
<p>If I could develop logic that distributes my 100+ SQLs across any number of available replicated instances,
that could really work for me. Say I have 5 read servers for 100+ SQLs; then I can distribute around 20 SQLs per server per request.
And as we increase the number of read servers, the system adjusts itself to distribute (SQL queries) / (number of servers) (Qn / Sn).
This way, all of my servers work on every request made to the system, and I get maximum performance from them.
There is no point in having 1000 servers if 1 server answers 1 complete request, because then 999
servers are idle, which is a waste of money. So I implemented this in my PHP application, and that is what really makes it worthwhile to be
on the cloud: using the maximum of your resources.</p>
<h1>How to create custom amazon AMI through CLI Commands</h1>
<p>Posted 2011-05-11 at http://tarunjangra.com/2011/05/11/how-to-create-custome-amazon-AMI</p>
<p>Today I am going to explain how you can create a custom Amazon AMI, so you can launch an instance from it anytime later.
This gives you a clone of your server whenever you need it. I assume you are able to log in to your current running
instance and that you have your private key and certificate downloaded somewhere.</p>
<!--more-->
<p><img src="http://tarunjangra.com/images/assets/ami_lifecycle.png" alt="Aamazon AMI"></p>
<p>Upload your private key and certificate to the running instance.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">scp -i path/of/yourkeypair.pem path/of/cert.pem root@your-instance-host:/mnt
scp -i path/of/yourkeypair.pem path/of/pk.pem root@your-instance-host:/mnt</code></pre></figure>
<p>Log in to your instance and check that the uploaded files are available in /mnt. Then bundle the volume:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">ec2-bundle-vol \
-d /mnt -k /mnt/pk.pem \
-c /mnt/cert.pem \
-u <span class="m">673491274719</span> \
-p name-of-ami</code></pre></figure>
<p>This will take some time and create the desired ami to be uploaded in the bucket. So you can use that later anytime you need.
Now upload your bundle to amazon s3 storage.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">ec2-upload-bundle \
-b <S3-bucket-name> \
-m /mnt/name-of-ami.manifest.xml \
-a <AWS-access-key-id> \
-s <AWS-secret-access-key> \
--location EU</code></pre></figure>
<p>Note: Remember to upload to an S3 bucket in the correct region. Also, if the bucket does not exist, it will be created for you.
(I've used a European bucket as an example.)
Now we need to register the AMI. Do the following:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">ec2-register <S3-bucket-name>/name-of-ami.manifest.xml --region eu-west-1</code></pre></figure>
<p>It will return the new AMI ID (like ami-).
That's it; you are done with your custom AMI.</p>
<h1>Solr setup debian (lenny) + tomcat6 + solr</h1>
<p>Posted 2010-03-10 at http://tarunjangra.com/2010/03/10/Solr-setup-debian-(lenny)-tomcat6-solr</p>
<p>I am working on a task to set up Solr enterprise search for Elgg. The more I dig into this amazing
search utility, the more it surprises me. First I am going to explain how to install Solr with Tomcat 6.x.</p>
<h3 id="requirements">Requirements:</h3>
<ol>
<li>JDK, JRE (OpenJDK, SunJDK)</li>
<li>Tomcat6.x</li>
<li>Latest solr</li>
</ol>
<!--more-->
<h3 id="installation-jdk-jre">Installation JDK,JRE:</h3>
<p>Well, I set up OpenJDK (JDK and JRE) on my lenny server. It is quite easy using the Debian package manager. You can install
them using:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">apt-get install openjdk-6-jre openjdk-6-jdk</code></pre></figure>
<p>That gave me the full JDK and JRE environment. You may need to set the JAVA_HOME environment variable if you do not
install the JDK at the default location. You can do this in ".profile" under your home directory, or in "/etc/profile" to enable it for
all users.</p>
<h3 id="download-tomcate6-x">Download tomcat6.x</h3>
<p>I downloaded the tomcat binary "apache tomcat 6.0.24" and untarred it at "/usr/local/". You can choose any location you like.
So the location of all my tomcat binaries was "/usr/local/tomcat". That's it; you are done with the tomcat installation.
You can start tomcat as:</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">cd</span> /usr/local/tomcat/bin/
./startup.sh</code></pre></figure>
<p>Now open localhost:8080 in your browser; you will see the tomcat server's response. The next step is to install solr
as a tomcat application, which needs some configuration.</p>
<h3 id="installation-amp-configuration-of-solr">Installation & configuration of Solr</h3>
<p>Download apache solr and unzip it at any accessible location. Now create some directories under tomcat as</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">mkdir /usr/local/tomcat/data/solr/elgg/conf -p
mkdir /usr/local/tomcat/data/solr/elgg/data -p</code></pre></figure>
<p>Now we need to copy the "apache-solr-1.4.0.war" file for tomcat deployment. Go to the directory where you unzipped solr;
I found the file at "apache-solr-1.4.0/dist/apache-solr-1.4.0.war".</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">cp apache-solr-1.4.0/dist/apache-solr-1.4.0.war /usr/local/tomcat/data/solr</code></pre></figure>
<p>Now, in /usr/local/tomcat/conf/Catalina/localhost we need to create and save a file which will be read the next time you
start Tomcat, and which properly deploys Solr. Using a text editor of your choice, create a file named "solrelgg.xml" in the
/usr/local/tomcat/conf/Catalina/localhost subdirectory, with the following contents:</p>
<figure class="highlight"><pre><code class="language-xml" data-lang="xml"><span class="nt"><Context</span> <span class="na">docBase=</span><span class="s">"/usr/local/tomcat/data/solr/apache-solr-1.4.0.war"</span>
<span class="na">debug=</span><span class="s">"0"</span> <span class="na">crossContext=</span><span class="s">"true"</span><span class="nt">></span>
<span class="nt"><Environment</span> <span class="na">name=</span><span class="s">"solr/home"</span> <span class="na">type=</span><span class="s">"java.lang.String"</span>
<span class="na">value=</span><span class="s">"/usr/local/tomcat/data/solr/elgg"</span> <span class="na">override=</span><span class="s">"true"</span> <span class="nt">/></span>
<span class="nt"></Context></span></code></pre></figure>
<p>Now go to "apache-solr-1.4.0/example/solr/conf" and copy all the default configuration files into our configuration
directory under tomcat.</p>
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">cd </span>apache-solr-1.4.0/example/solr/conf
cp * -R /usr/local/tomcat/data/solr/elgg/conf
<span class="nb">cd</span> /usr/local/tomcat/data/solr/elgg/conf</code></pre></figure>
<p>Now edit "solrconfig.xml" and find the "solr.data.dir" parameter. Change its value to the new data directory. I gave a relative path, "
../data", so it pointed to the new data directory "/usr/local/tomcat/data/solr/elgg/data". This edit is an optional
step; you can skip it, in which case the data directory will be created at the default location given by the default value of
"solr.data.dir".
Now start the tomcat server using "/usr/local/tomcat/bin/startup.sh" and browse to localhost:8080/solrelgg.</p>
<p>It should show you the "Welcome to Solr!" message with a "Solr Admin" link.
I hope it works for you too. Now the Elgg integration is just a matter of pushing new entities on the create-entity hook and all
the other CRUD operations.</p>