Massoud Mazar

Sharing The Knowledge


VPN DNS resolution problem with CNAME

This problem took me a day to fix and it worth documenting here for others.

On my Mac laptop, I connect to a VPN, which has its own DNS server defined to resolve internal host names. For example, I could use 


and get a good response. So far so good. But when I tried 


I did not get a response. After talking to IT team, it turned out this address was a CNAME record and not a A record. Still, I expected the DNS resolution to correctly return an IP address for this host, but it didn't. More...

Raspberry Pi SMB server to use with Time Machine

Last week I had a bit of free time (which is very rare these days), and decided it is finally time to build a file server to be used for backing up my laptops (both Mac and PC), and also as a general purpose shared drive. After doing some research I learned Apple supports SMB protocol for Time Machine, and SMB is obviously compatible with Windows as well.

My criteria to select the hardware was simple:

  • Gigabit Ethernet
  • USB 3
  • Support for Ubuntu


Mobile Sensors: Easy data collection, labeling and model deployment

Disclaimer: My targets for this article are data scientists which may not be necessarily coming from a software engineering background. Pardon me if you find this over simplified.

Mobile devices provide a rich set of sensors to allow us get a feel of where the device is being used and how. Sensors can tell us about environment, motion and orientation of the device, among other things. A list of sensors supported by Android can be found here.

There are a lot of cool applications for data coming from these sensors and a lot of those applications could benefit from machine learning models to infer more meaning from the sensor data. A famous example is the classic Human Activity Detection using mobile phones which can be found easily on the internet. But what if you need to collect data and label it for a different purpose? Here I show an easy way to build a mobile app which can run on both Android and iOS for both data collection and testing of the trained model. More...

Kafka stream processing: lookup against hive data

Here is a scenario which in my opinion should be very common:

Suppose you need to build an ETL kafka stream which read data from one stream and checks it against a blacklist before writing to destination stream. This blacklist gets updated daily, andhas the same key as your source stream. One way to implement this is to use a Kafka Table (ktable) and join your stream with the table to find the matches.More...

Ingesting 250 million daily IoT messages with Hadoop and Hive 3.0 in Azure: Lessons Learned

250 Million records per day may not be a lot of data for large environments with billions of users, but those companies have huge budgets and countless servers to do it. It's a different story in startup world and you have to squeeze the resources to get the job done with less budget. I will highlight what I learned during optimization of an analytics backend which was designed based on Azure HDInsight. Is Hadoop+Hive most suitable for this purpose is a question for another time and I'm not advocating these technologies, but if you are dealing with them, specially on Azure cloud, I hope this post save you some time.More...

Custom Hadoop RecordReader to read JSON with no line breaks

This past week I had to deal with loading few terra bytes of data into our Spark cluster. This data is stored in a JSON array, and there is no line break to separate individual JSON objects. Spark can easily deal with JSON, but your JSON must be one object per line. I had to write a custom Hadoop RecordReader to work around this issue.More...

Azure HDInsight performance benchmarking

I did a brief performance benchmark of spark execution time in Azure HDInsight spark couple of months ago and the result was very disappointing. Recently I did a much deeper investigation and benchmarking and cost analysis of the Azure HDInsight to see does it make ANY sense to use it, and results do not surprise me at all. More...

GPU assisted Machine Learning: Benchmark

A recent project at work, involving binary classification using a Keras LSTM layer with 1000 nodes which took almost an hour to run initiated my effort to speedup this type of problems. In my previous post, I explained the hardware and software configuration I'm about to use for this benchmark. Now I'm going to run the same training exercise with and without GPU and compare the runtimes. More...

GPU and ML: Setting up CUDA + Ubuntu 18.04 on Supermicro X10 server board

There are lots of blog posts explaining how to setup a Machine Learning system with GPU support, but what I ended up going through I could not find anywhere. Due to specific hardware and software combination I'm using, I had to figure out how to do thing and in what order for this to work. I may have gone through a dozen full reinstalls before I got a stable and working setup. That's why I'm writing it down here so it may save someone else a lot of time.More...

Azure Spark (HDInsight) performance is terrible, here is why

From my recent few posts you can see I'm experimenting with a small Spark cluster I built on my server at home. Although this machine was built with server grade parts, it was built 4 years ago, so not top of the line by any standard. One Xeon processor running at 3.1 GHz with 4 cores, 32 GB of DDR3 RAM and consumer (not server grade) SSD. I'm running 3 VMs on this machine, each one using only one core. Naturally I did not expect Spark processing on my cluster to be performant, but to my surprise, performance of these one core machines beats an Azure's HDInsight cluster with D12 v2 machines which have 4 cores each.More...