G30 Consultants
[Hero image: Clee Hill]

G30 Consultants provides objective consultancy on technology, architecture and the process of designing, developing and deploying software systems. The basis of this authoritative and broad coverage is experience gained over nearly forty years: in computer manufacturing at Apricot and Tandon, systems software publishing at Digital Research and Novell, financial systems at Pegasus Software, startups such as Joost, platform architecture and engineering with BBC Online and STEM publishing, and technology enterprise architecture at Elsevier.

This website shows some of our ongoing projects, publications and documentation; the aim is to show the breadth of technology coverage today. Scientific Research Publishing is a Flipboard magazine on the business of scientific research publishing with roughly 1,500 subscribers. The Walled Garden and Prairie is a Chautauqua on philosophical tensions in architecture, security and infrastructure. The Wiki is a place for us to publish documentation and articles in an open and public way.

This site is cookieless: no cookies are set and no information is captured other than access logs. If the Wiki pages are accessed, Confluence sets cookies; you should be prompted and informed about their use.

Projects

Sevilla

Capability Mapping Tool.

Using a curated taxonomy, the tool maps an organisation to its capabilities, both internal and external. Progress so far includes cleaning up the SIC and NAICS codes.

Common diagrammatic vocabulary of Architecture

Zettel.IO

Zettel.io is a repository for public and private notes, a kind of web Zettelkasten as used by Leibniz. Zettel.io will provide a way to manage notes, papers and slips without regard to the tool or editor used to create them. It is a set of repositories, both public and private, that stores and indexes content with the lowest barrier to entry.

Publications

Scientific Research Publishing

The invisible structure providing Open Access in HEP

Only four years ago, most publications in High-Energy Physics (HEP) were behind paywalls, only accessible to a limited audience of academics. Today, …

CERN revises its Open Access Policy

A revised version of the CERN Open Access Policy was approved by Director-General Fabiola Gianotti on 25 May 2021. The updated policy will help …

Supporting open access research: three new agreements | Library

The Library has signed three new agreements to cover article processing charges (APCs) for open access (OA) articles published by Waterloo …

BMJ and Jisc collaborate to support open access publishing | Research Information

Publishing platforms are digital solutions that are principally designed to help publishers and authors to evaluate and disseminate content Research …

Jisc toolkit helps university presses publish OA | Research Information

Publishing platforms are digital solutions that are principally designed to help publishers and authors to evaluate and disseminate content Research …

https://www.researchinformation.info/news/deal-will-evaluate-uk-journal-subscriptions?s=09

The journal coverage of Web of Science, Scopus and Dimensions: A comparative analysis - Scientometrics

Traditionally, Web of Science and Scopus have been the two most widely used databases for bibliometric analyses. However, during the last few years …

COVID-19: Where is the data?

Levels of open data remain stubbornly low, despite efforts to make research publishing more transparent, say Julien Larregue et al. 22 December …

The Walled Garden & Prairie

This Chautauqua is a series on two views of how to design, implement and operate information systems that are generally seen as incompatible. The aim is to explore how both general approaches make sense in different ways, and in combination rather than in antagonism. In choosing Walled Garden and Prairie to label these two approaches it might seem that I have a negative bias towards one or the other; that really isn't where this is coming from. Walled Garden is often treated as a derogatory term, suggesting hard perimeters, command-and-control hierarchies and long-term planning, but it is also (in its original use) about providing the right conditions for different plants with different requirements, and a structured navigation which allows maintenance and gardening without obstructing or affecting the rest of the garden. In its way the Prairie might seem uncontrolled, borderless, vulnerable and inefficient, but it is also resilient, because there is no single point of failure and burning it down periodically keeps it healthy.

G30 on Data Articles

They Are Asking For What?

Page edited by Simon Lucy

In the preparation for the GDPR and the UK refresh of the Data Protection Act, it seems that some organisations, especially government departments and public bodies, are asking some very awkward questions:

  • Assurances, pledges and loyalty oaths that an organisation is 'GDPR ready'.
  • Copies of the organisation's policies and procedures relating to the GDPR.
  • Claiming a right of audit.
  • Contractually requiring agreement and proof of deletion, 'forgetting' and/or return of data.
  • Contractual requirements which duplicate, or place on the organisation, the burden of handling personal data provided by the contracting party.


There are probably others going the rounds as well. Prefacing what I'm going to say with IANAL (I am not a lawyer), it makes sense to apply some rationality to this and maintain the proper borders between organisations. In general, I do not believe that anyone has to promise anyone else, including the Government, that they are going to obey the law. Extra contractual conditions won't provide any kind of indemnity to either party. Remember that organisations that supply data have a duty of care, to both the owners of the data (individuals) and whoever they share it with, to ensure they have the necessary permissions. There could potentially be a lot of 'indemnities' swapped around, but of course they mean nothing.

The same really holds true for being asked for copies of policies, procedures or even implementations. This isn't new behaviour; some companies seem to love forcing their own procedures down the throats of suppliers or even customers: being asked for the details of ISO 27001 compliance, the procedures for generating, holding and rotating keys, and so on. Each of these requests can be managed with a blanket response: 'We maintain and regularly review our policies in this and other areas in the light of best practice and the regulatory conditions applicable at the time, including any legal requirements. The specifics of our policies, procedures and implementations are naturally commercially sensitive and are not shared.'

This is especially true of a right of audit. Unless the specifics of handling data can be entirely separated by supplier or third party, it would be very difficult to allow a third-party audit without also exposing other data to which the auditor has no reasonable right of access.

The last two points, contractually requiring or appearing to place the burden of performing the mandated deletion or forgetting of personal data, are going to be awkward for the supplier, the third party and the owners of the data themselves. No organisation wants the unmitigated risk of a third party being unable to effectively delete or forget the data they have held, but this in no way absolves the data-collecting company of its own obligations. There are practicalities around the deletion and forgetting of data which can mitigate some of the risk, but first think about the shared risk between a collecting company and the third parties it may use to process, transform or manage that data. Those third parties are not just companies like Cambridge Analytica (which never needed identified information anyway), but any kind of service or processing organisation: lawyers, accountants, consultancies, outsourced IT, offshoring, marketeers, job services, SaaS applications, and so on. All of them share the risk.

Unlike operational risks, business risks should be singletons. If there isn't one policy, procedure and implementation in place, but many, and those are determined by contract, there will be failure. The decision tree for making this sane is fairly obvious, but there are a couple of things that might not be.

  • This is a trust process, so be transparent about what can be transparent, and transparent about where the borders are.
  • Ask the same transparency of your supplier, customer, etc.
  • Eliminate the possibility of using aggregated data before exploring the need to share personal data.
  • Still before sharing personal data, explore pseudonymous data thoroughly (really).
  • Before giving access to personal data, explore the options for escrow: the only copy of the data is managed, possibly by a third party whose business is providing this service and who has no interest in the data itself (just like any other due-diligence process between two or more parties).
  • If you do have to share data, do not (that should be in CAPS), DO NOT ship data; control access instead. This can even work for the third-party processor if they claim to need the data set. But they don't, do they?
  • Deletion, forgetting and archive: agree what they mean before you start doing any of this.
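The pseudonymous-data step in the list above can be sketched with keyed hashing. This is only an illustrative example, not anyone's production design: the key handling, identifier and record fields are all hypothetical, and it assumes a stable string identifier plus a secret key held separately from the data (so a third party receiving the records cannot recompute or reverse the mapping).

```python
import hmac
import hashlib

def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using keyed hashing (HMAC-SHA256).

    The same identifier always maps to the same pseudonym, so records can still
    be joined across data sets, but without the key the mapping cannot be
    recomputed or reversed by the receiving party.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical key: in practice this lives in a key vault, never alongside the data.
key = b"example-secret-key"

# The third party receives only the pseudonym, never the raw identifier.
record = {
    "customer": pseudonymise("jane.doe@example.com", key),
    "order_total": 42.50,
}
```

A useful side effect: destroying the key later severs the link between pseudonyms and individuals, which is one practical route towards 'forgetting' shared data.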

On the last point, which I want to elaborate on in detail one day soon, it's important not to get bound up in definitions of deletion, forgetting and archive that require adjusting the laws of physics. The regulation so far describes all of these states in terms of availability to the 'system' and access or use. Those are the states to concentrate upon. There are complications relating to archives and legacy systems of many years' standing, but there are also ways of handling them.
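Since the regulation describes these conditions in terms of availability to the system rather than physical erasure, one way to reason about them is as explicit record states. The sketch below is my own illustration; the state names and record shape are invented for the example and come from no regulation or standard:

```python
from dataclasses import dataclass
from enum import Enum

class DataState(Enum):
    ACTIVE = "active"        # available to the system and in normal use
    ARCHIVED = "archived"    # retained (e.g. for a legal hold) but withheld from normal processing
    FORGOTTEN = "forgotten"  # personal fields removed or unlinked; non-personal rows may remain
    DELETED = "deleted"      # no longer retrievable by the system at all

@dataclass
class PersonalRecord:
    record_id: str
    state: DataState
    payload: dict

def visible_to_system(record: PersonalRecord) -> bool:
    """Only ACTIVE records are available for ordinary access or use."""
    return record.state is DataState.ACTIVE

def forget(record: PersonalRecord) -> PersonalRecord:
    """Strip the personal payload but keep the row, e.g. for aggregate statistics."""
    return PersonalRecord(record.record_id, DataState.FORGOTTEN, payload={})
```

The point of the model is that 'forgotten' and 'archived' are statements about access and use, which a legacy system can usually implement, rather than about bit-level destruction, which it often cannot.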


CC BY-SA 4.0

Do We Need This?

Page edited by Simon Lucy

The GDPR forces us to look at our data and categorise it as personal, personally identifiable and everything else (keeping in mind that what was once impersonal can become personally identifiable in association), but we often don't question why we collect and store this information. It is already part of existing data protection legislation that only data that is necessary should be collected, and then kept only for as long as it is necessary. Rarely do we consider whether data or metadata is useful in itself once we add it to our model and data stores. Often we start collecting it for some future use which is neither clear, decided nor planned; and once we have it, we keep it because it's data and so must be valuable.

I'm suggesting that we not collect common personal categorisation data unless there is an overriding need, and that in the overwhelming majority of cases there is no such need. This thought was provoked most recently by this Tweet.

Ada Rose Cannon ada@mastodon.social Retweeted ProPublica

As a developer if you are ever asked to do something like this. Pause and look at yourself and what you are enabling. "Someone else would do it so I might as well be paid for it" is not an excuse. Don't build evil. Don't enable evil systems. We need a tech hippocratic oath.

My initial response was:

This sounds easy to avoid, but there are simpler, more basic enablers. Collecting and indexing classifiers such as ethnicity, gender and gender preference allows populations to be targeted for whatever purpose.

It is straightforward not to engage in writing systems and applications that can be used to aid prejudice and foster division, but it's very hard to avoid modelling and designing into data stores categorisations that can be used in ways prejudicial to the owners of that personal data.

But what about needing to know who is affected by this or that prejudice and persecution, so we can protect them and improve their lot? Surely we need to collect information to identify those parts of the population that need help? But do you need to count in order to know the right way to treat everyone? Is the counting and identifying itself the wrong?

For specific needs does someone really need to identify themselves as disabled, or are they really an individual with a requirement?

Personally, I don't place myself in any ethnic or religious category on any form, and I would avoid gender and age if I could. As a modeller and architect, would I argue against collecting this data? I would now.

I would employ all the arguments about not collecting personal data unnecessarily. Does your application or system require gender to be relevant? Really? Age? Should the provision of public services need ethnic data? And so on.

Really ask whether each of these metadata categories is necessary, bearing in mind that each category will likely come from a controlled list, perhaps plus 'other'. What purpose will be served? If it's for some broad statistical use, ask how the category gives meaningful information that actually matters in a statistic. Take Male/Female (I won't fall back on self-described gender issues to begin with; the traditional simple case should suffice): how does it help to know someone ticked either box? Will they buy or be interested in a different product or service? Will they want different information? Will the content be filtered?

If the answer is yes, I'd ask: so you wouldn't sell to someone of the wrong gender? Would you only show pink bikes to girls, and carbon-fibre drop handlebars to boys? I'd hope not, so how does your case differ?

Are the services, products and information that I'm interested in connected in any absolute way with my age, gender, ethnicity or mobility? I don't think so. The actual services, products and information might very well have particular characteristics that I want to filter and search by, but not necessarily because I possess or share those characteristics.

If you're building recommenders, why limit, filter or weight your recommendations based upon any of these largely loose, non-authoritative categories? Aren't behaviour and content far more important? Quite a while ago now, over ten years, we were involved in building a streaming personal video platform, and the advertising people wanted us to include something like 200+ questions on the user's likes and dislikes. That got rejected fairly quickly just in contemplating the registration funnel, but it did betray the then-current assumptions about how to collect, slice and dice population data: analyse the individual, get as much as possible about the individual from the individual, and apply that to the content, product or service. It came straight out of the publishing industry, with its cards for subscribers to punch or circle their characteristics.

Now, of course, we apply it the other way round: the behaviour of the individual and their peers or group, their history of success, failure or abandonment, and the content, product or service chosen by that group, along with many other dimensions of behaviour, are applied to new and evolving content, products and services (all that Big Data stuff). And we don't really need all that categorisation up front; in fact, it could skew the results badly.
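The point about recommending from behaviour rather than declared categories can be illustrated with a minimal item co-occurrence recommender. Everything here, the interaction data, user and item names, and the scoring, is a made-up sketch to show the shape of the idea, not a production design; note there is no demographic field anywhere in the model:

```python
from collections import Counter

# Interaction history only: which user consumed which items.
# No age, gender or ethnicity fields exist in this model at all.
history = {
    "u1": {"film_a", "film_b", "film_c"},
    "u2": {"film_b", "film_c", "film_d"},
    "u3": {"film_a", "film_c"},
}

def recommend(user: str, interactions: dict, top_n: int = 2) -> list:
    """Recommend unseen items, weighted by overlap with other users' histories."""
    seen = interactions[user]
    scores = Counter()
    for other, items in interactions.items():
        if other == user:
            continue
        overlap = len(seen & items)  # behavioural similarity, nothing demographic
        for item in items - seen:
            scores[item] += overlap
    return [item for item, _ in scores.most_common(top_n)]

print(recommend("u3", history))  # prints ['film_b', 'film_d']
```

The weighting comes entirely from shared behaviour, which is the "other way round" approach: no up-front questionnaire or categorisation is needed to produce a useful ranking.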

But for those organisations whose data sets were modelled and collected well before this Big Data magic, with all this carefully researched (or not) data, has it been re-evaluated? Or is it sitting there amongst the rest of the data, consciously or unconsciously splitting your data sets into what might be irrelevant and even misleading subsets?

I think a great many of these characteristics should not be collected and stored, and they should all be re-evaluated periodically. I include official forms in this; actually, I especially include official forms in this.


CC BY-SA 4.0

Contact Us