Streaming Data to HTTP using Akka Streams with Exponential Backoff on 429 Too Many Requests
HTTP/REST is probably the most used protocol to exchange data between different services, especially in today's microservice world...
August 18, 2018
by Roberto Congiu
Silhouette is probably the best library for implementing authentication and authorization within the Play Framework. Git repo: https://github.com/rcongiu/play-silhouette-basic-auth. It is very powerful, as you can manage a common identity from multiple providers, so you can have users logging into your site from Google, Facebook, JWT, and many other methods. It also allows you to fine […]
October 29, 2017
by Roberto Congiu
(Note: crossposted from my Nuvolatech blog.) If you’ve worked with Spark, you have probably written some custom UDFs or UDAFs. UDFs are ‘User Defined Functions’, which let you introduce complex logic in your queries/jobs, for instance to calculate a digest for a string, or to use a Java/Scala library in your queries.
April 4, 2015
by Roberto Congiu
Sometimes you need to create denormalized data from normalized data, for instance if you have data that looks like CREATE TABLE flat ( propertyId string, propertyName string, roomname1 string, roomsize1 string, roomname2 string, roomsize2 int, .. ) but you want something like CREATE TABLE nested ( propertyId string, propertyName string, rooms array<struct<roomname:string,roomsize:int>> ) […]
January 10, 2015
by Roberto Congiu
Panna cotta is one of my favorite desserts and one you can enjoy at many Italian restaurants here in LA. It looks and sounds fancy, but it’s incredibly easy to make if you just get the right ingredients, in particular the gelatin. It is also very important to get very fresh ingredients, since it’s basically […]
September 17, 2013
by Roberto Congiu
Hive has a rich and complex data model that supports maps, arrays and structs, which can be mixed and matched, leading to arbitrarily nested structures, like in JSON. I wrote about a JSON SerDe in another post, and if you use it, you know it can lead to pretty complicated nested tables. Unfortunately, Hive […]
July 11, 2011
by Roberto Congiu
Today I finished coding another SerDe for Hive which, with my employer’s permission, I published on GitHub here: https://github.com/rcongiu/Hive-JSON-Serde.git. Since the code is still fresh in my mind, I thought I’d write another article on how to write a SerDe, since the official documentation on how to do it is scarce and you’d have to […]
October 27, 2009
by Roberto Congiu
I am currently working to set up an OLAP data warehouse using Hive on top of Hadoop. We have a considerable amount of data that comes from the ad servers on which we need to perform various kinds of analysis. Writing a map-reduce job is not difficult in principle – it’s just time consuming and […]
October 27, 2009
by Roberto Congiu
With the constant increase in the quantity of data that companies collect and need to process, Data Warehousing is a job sector that’s expanding even in the recession. It is also enjoying a second youth, thanks to a number of open source projects that have been slowly but surely gaining popularity in a manner similar […]
June 7, 2009
by Roberto Congiu
One of the first questions that a ‘traditional’ ETL engineer asks when learning Hadoop is, “How do I do a join?” For instance, how can we do in Hadoop something like querying for the names of all employees who are in a California city: SELECT e.name, c.name FROM employees e INNER JOIN cities c […]
March 12, 2019
by rcongiu