Consuming Microsoft Cognitive Services with Swift 4

This post is a direct result of a conversation with a colleague in a taxi in Madrid. We were driving to Santiago Bernabéu (the Real Madrid Stadium) to demonstrate to business leaders the power of artificial intelligence.

The conversation was around the ease of use of Cognitive Services for what we call “native native” developers. We refer to those that use Objective-C, Swift or Java as ‘native native’ as frameworks like ReactNative and Xamarin are also native, but we consider these “XPlat Native”. He argued that the lack of Swift SDKs prevented the adoption of our AI services such as our Vision APIs.

I maintained that all Cognitive Service APIs are well documented, and we provide an easy to consume suit of REST APIs, which any Swift developer worth their salt should be able to use with minimal effort.

Putting money where my mouth is

Having made such a statement, it made sense for me to test if my assertion was correct by building a sample app that integrates with Cognitive Services using Swift.

Introducing Bing Image Downloader. A fully native macOS app for downloading images from Bing, developed using Swift 4.

Screen Shot 2018-05-10 at 11.11.55.png

I’ve put the code on Github for you to download and play with if you’re interested in using Cognitive Services within your Swift apps, but I’ll also explain below how I went about building the app.

Where the magic happens

In the interest of good development practices, I started by creating a Protocol (C# developers should think of these as Interfaces) to define what functions the ImageSearch class will implement.

Protocol

protocol ImageServiceProtocol {
// We will take the results and add them to hard-coded singleton class called AppData. 
func searchForImageTerm(searchTerm : String)

// We pass in a completion handler for processing the results of this func
func searchForImageTerm(searchTerm : String, completion : @escaping ([ImageSearchResult]) -> ())
}

Two Implementations for one problem

I’ve made sure to include two implementations to give you options on how you’d want to interact with Cognitive Services. The approach used in the App makes use of the Singleton class for storing AppData as well as using Alamofire for handling network requests. We’ll look at this approach first.

search For Image Term

This is the public func, which is easiest to consume.

func searchForImageTerm(searchTerm : String) {

    //Search for images and add each result to AppData
    DispatchQueue.global.(qos: .background).async {
        let totalPics = 100
        let picsPerPage = 50 
        let numPages = totalPics / picsPerPage 
        (0 ..< numPages)             
            .compactMap { self.createUrlRequest(searchTerm: searchTerm, pageOffset: $0 }             
            .foreach{ self.fetchRequest(request: $0 as NSURLRequest) }         
        .RunLoop.current.run()     } 
} 

create Url Request

private func createUrlRequest(searchTerm : String, pageOffset : Int) -> URLRequest {

    let encodedQuery = searchTerm.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed)!
    let endPointUrl = "https://api.cognitive.microsoft.com/bing/v7.0/images/search"

    let mkt = "en-us"
    let imageType = "photo"
    let size = "medium" 

    // We should move these variables to app settings
    let imageCount = 100
    let pageCount = 2
    let picsPerPage = totalPics / picsPerPage 

    let url = URL(string: "\(endPointUrl)?q=\(encodedQuery)&count=\(picsPerPage)&offset=\(pageOffset * picsPerPage)&mkt=\(mkt)&imageType=\(imageType)&size=\(size)")!
        
    var request = URLRequest(url: url)
    request.setValue(apiKey, forHTTPHeaderField: "Ocp-Apim-Subscription-Key")
        
    return request
}

fetch Request

This is where we attempt to fetch and parse the response from Bing. If we detect an error, we log it (I’m using SwiftBeaver for logging).

If the response contains data we can decode, we’ll loop through and add each result to our AppData singleton instance.

private func fetchRequest(request : NSURLRequest){
    //This task is responsbile for downloading a page of results
    let task = URLSession.shared.dataTask(with: request as URLRequest){ (data, response, error) -> Void in
            
    //We didn't recieve a response
    guard let data = data, error == nil, response != nil else {
        self.log.error("Fetch Request returned no data : \(request.url?.absoluteString)")
        return
    }
            
    //Check the response code
    guard let httpResponse = response as? HTTPURLResponse,
        (200...299).contains(httpResponse.statusCode) else {
        self.handleServerError(response : response!)
        return
    }
            
    //Convert data to concrete type
    do
    {
        let decoder = JSONDecoder()
        let bingImageSearchResults = try decoder.decode(ImageResultWrapper.self, from: data)
                
        let imagesToAdd = bingImageSearchResults.images.filter { $0.encodingFormat != EncodingFormat.unknown }
            AppData.shared.addImages(imagesToAdd)            
        } catch {
            self.log.error("Error decoding ImageResultWrapper : \(error)")
            self.log.debug("Corrupted Base64 Data: \(data.base64EncodedString())")
        }     
     }
        
     //Tasks are created in a paused state. We want to resume to start the fetch.
     task.resume()
}   

Option two (with no 3rd party dependancies)

As a .NET developer, the next approach threw me for a while and took a little bit of reading about Closures to fully grasp. With this approach, I wanted to return an Array of ImageSearchResult type, but this proved not to be the best approach. Instead, I would need to pass in a function that can handle the array of results instead.

// Search for images with a completion handler for processing the result array
func searchForImageTerm(searchTerm : String, completion : @escaping ([ImageSearchResult]) -> ()) {
        
    //Because Cognitive Services requires a subscription key, we need to create a URLRequest to pass into the dataTask method of a URLSession instance..
    let request = createUrlRequest(searchTerm: searchTerm, pageOffset: 0)
       
    //This task is responsbile for downloading a page of results
    let task = URLSession.shared.dataTask(with: request, completionHandler: { (data, response, error) -> Void in
            
    //We didn't recieve a response
    guard let data = data, error == nil, response != nil else {
        print("something is wrong with the fetch")
        return
    }
            
    //Check the response code
    guard let httpResponse = response as? HTTPURLResponse,
    (200...299).contains(httpResponse.statusCode) else {
        self.handleServerError(response : response!)
        completion([ImageSearchResult]())
        return
    }
            
    //Convert data to concrete type
    do
    {
        let decoder = JSONDecoder()
        let bingImageSearchResults = try decoder.decode(ImageResultWrapper.self, from: data)
                
        //We use a closure to pass back our results.
        completion(bingImageSearchResults.images)
                
    } catch { self.log.error("Decoding ImageResultWrapper \(error)") }
    })
    task.resume()
}

Wrapping Up

You can find the full project on my Github page which contains everything you need to build your own copy of this app (maybe for iOS rather than macOS?).

If you have any questions, then please don’t hesitate to comment or email me!

 

Cross-Platform Desktop UIs with C#

xplatDesktop

I’ve spent the last 4 weeks traveling Europe for the Xamarin European roadshow, and have had the opportunity to meet a few thousand C# developers who share a passion for cross-platform development.

In almost every city, I’ve been asked to recommend a Xamarin.Forms style library for developing desktop applications. In this blog post I’m going to give an overview of the different options available to desktop developers who wish to target Windows, Mac and Linux.

Traditional Approach

The first approach is what we’ve named at Xamarin the ‘traditional’ approach. You’ve probably seen this approach, but for mobile. The general idea is that you should implement your user interface uniquely for each platform you wish to target. This means on Mac, you would use Cocoa (Xamarin.Mac), Windows would use WPF and Linux would use Gtk (Gtk#). This approach will guarantee that your desktop application looks and behaves as the platform users expect. It also means that your application looks great if Apple, Microsoft or the OpenSource community decide to update the look and feel of the underlying OS. It’s also worth noting that with this approach you gain 100% access to every UI control and API available in the UI libraries, and won’t be limited in your ability to create beautiful experiences for your users.

In case you’re in any doubt, this is the approach I recommend you take when developing your apps. This is actually the approach Xamarin has started to use for our new products. You can see this in action with our Profiler and Android Simulator; both of these use WPF on Windows, and Xamarin.Mac (Cocoa) on OS X.

XWT

Much like Xamarin.Forms, Xwt allows you to use one API that maps to the native widgets of the platform. This means your application when running on Windows will be using WPF widgets, on Mac its Cocoa, and Linux is Gtk. This results in a 100% native user interface on three platforms from one codebase. Much like Xamarin.Forms, because its aim is to create a unified API for all desktop platforms, it only maps to a subset of widgets and properties. It’s worth noting that with Xwt you still have the ability to implement a native widget which isn’t mapped as part of the API.

For all platforms you can use the native engine, or the Gtk engine. If you’re wondering what a Gtk app looks like on Windows and Mac, then I recommend downloading Xamarin Studio. This is primarily built using Gtk, and in areas actually uses Xwt. On Windows the native engine is WPF, on OS X its Cocoa, and on Linux it remains Gtk (using Gtk#).

Webview

One other option you might want to consider, is using a WebView for your user interface whilst maintaining a C# backend. This is the approach that Rdio has taken for their OS X client, and to a novice it’s difficult to spot that it’s not a native app. This approach can produce some great looking applications which can even run in the Cloud, but it would be difficult to claim you’ve created an application when the reality is you’ve packaged up a website.

QtSharp

Although this approach is not yet ready for consumption, I thought I would mention it as it’s a project on GitHub that excites me. Much like Xamarin.Mac is a binding to the Cocoa framework, a group of enterprising .Net developers are aiming to create a .Net binding to the Qt library. Having used Qt in a previous life, I can confirm that the UIs can often be a little hit or miss (because it’s a lowest common denominator approach). That said, if you’re developing an internal application, or willing to take the time to craft the UI for each platform (different layouts for each platform) then it can work really well.

The project is still in its infancy, and many developers have tried and failed at this approach. Its not ready for production as yet (it doesn’t appear to even be close) but its a great start. My fingers are crossed that the developers continue to invest their time in the project, and the .Net community gains access to one of the most widely used cross-platform user interfaces frameworks in existence.